Spotting patterns among patient records can allow physicians to better diagnose and treat their patients. With access to more records, and more time to parse them, it’s possible that these health care providers could identify, and provide better treatments for, conditions that have been particularly difficult to diagnose. Recent developments in artificial intelligence and natural language processing programs are making it possible to glean information from large volumes of electronic health records, giving doctors an important new tool to help patients.
One of the largest tests of this effort took place recently in a collaboration between the Children’s Hospital of Philadelphia and researchers from Drexel University’s College of Computing & Informatics. The team developed a process that uses NLP and AI programs to parse more than 53 million patient notes in a database of pediatric patients, spot similarities among patient groups and determine their risk of developing certain diseases in the future.
Maryam Daniali, a PhD candidate at CCI and the lead author of the paper, and Scott Haag, PhD, an assistant research professor at CCI and a supervisor at CHOP, developed the data analysis method using machine learning and other AI systems. This collaborative effort among Drexel, the Arcus data-driven research team at CHOP, the University of Pennsylvania and Newcastle University resulted in the successful development of the clinical data analysis process at scale. Daniali recently shared some of her insights on the project and the prospects of using artificial intelligence to improve health care.
What is the current process for comparing medical notes — both internally/in-system and broadly?
By comparing patient notes, health care professionals can gain insights, detect patterns, and make informed treatment decisions.
There are several processes used for comparing patient notes. Traditionally, health care professionals review patient notes side-by-side to identify similarities and differences. Another conventional method of documenting and comparing relevant information across patients is to create summary tables or spreadsheets. This process, however, is relatively slow and places a heavy burden on medical experts.
More recently, text mining techniques, such as natural language processing (NLP) methods, have been used in the clinical domain. Such systems automatically extract and analyze textual data, identify key concepts, and facilitate comparison of the results. NLP systems have shown strong performance on general text data, but they often struggle with clinical data. This is mainly because of the dynamic nature of diseases and the limited availability of open access datasets.
Ontology-based approaches are another family of techniques. They rely on formal representations of knowledge created by experts in the field and can be used to compare patient notes. By mapping patient records to a common ontology, health care professionals can establish relationships, detect overlaps, and identify common elements. Such methods enhance the comparison process but require a huge amount of computation, so they are practical only for hundreds or thousands of patient records.
In our study, we introduce a novel technique that combines text mining (NLP) and ontology-based approaches, and we demonstrate its practicality on hospital-scale data: terabytes of patient data comprising more than 53 million records. Our model also shows greater agreement with clinical experts than conventional techniques.
What sorts of information can this program glean from electronic health records? What is an example of how it could be used?
Our technique transforms electronic health records (EHRs) into low-dimensional vector representations that capture clinical characteristics of patients and their diseases. This allows us to glean insights into various aspects of health care. By analyzing EHR data, we can identify disease patterns, pinpoint risk factors and assess outcomes.
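As a rough illustration of what a vector representation makes possible (this is not the study's code), the sketch below compares patients by the cosine similarity of their vectors. The three-dimensional vectors, the patient ids and the function names are all made up for the example; the interview does not say how many dimensions the study's representations use or which similarity measure it applies.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical patient vectors with invented values, not study data.
patient_vectors = {
    "patient_a": [0.9, 0.1, 0.0],
    "patient_b": [0.8, 0.2, 0.1],
    "patient_c": [0.0, 0.1, 0.9],
}

def most_similar(target, vectors):
    """Return the id of the other patient whose vector is closest to the target's."""
    return max(
        (pid for pid in vectors if pid != target),
        key=lambda pid: cosine_similarity(vectors[target], vectors[pid]),
    )

print(most_similar("patient_a", patient_vectors))
```

Once records are reduced to vectors like these, comparing two patients becomes a cheap arithmetic operation rather than a side-by-side reading of their notes.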
To illustrate the practical application of our technique, let’s consider an example. Suppose there is a group of patients with a genetic disorder related to seizures. By analyzing their medical records using our technique, we can cluster these patients based on their shared characteristics and create distinct groups. With these clusters, we can provide personalized recommendations and treatment plans tailored to each group’s specific needs. Taking this approach ensures that patients receive targeted care, which increases their chances of improving their health.
Another example is revealing hidden subgroups within a population of heart failure patients. These subgroups may have different survival rates or respond differently to various treatments. By identifying these subgroups, health care providers can better understand the nuances of the condition and develop individualized care plans.
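The grouping idea in these two examples can be sketched with a plain k-means clustering over patient vectors. This is an assumption made for illustration: the interview does not name the clustering algorithm used in the study, and the two-dimensional points below are invented.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain k-means; the first k points seed the centroids, so results are deterministic."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels

# Hypothetical patient vectors (invented values): two visibly separate groups.
points = [[0.9, 0.1], [0.1, 0.9], [0.8, 0.2], [0.2, 0.8]]
labels = kmeans(points, k=2)
print(labels)
```

Each label marks a candidate subgroup; whether a subgroup is clinically meaningful is still a question for clinicians, not for the algorithm.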
Additionally, our program offers an interpretable visualization that measures disease similarity. This visualization can improve the accuracy of diagnoses, prognoses and treatment decisions, ultimately enhancing the overall quality of health care.
What makes this (EHR data mining) a good use case for AI?
Using artificial intelligence to mine electronic health records has some exciting benefits. First, it helps doctors and nurses by automating tasks like organizing and analyzing data. This saves time and allows health care professionals to focus more on patient care and important decisions.
AI can also uncover valuable insights from EHR data that might not be obvious to humans. It can find patterns, detect anomalies and make connections between different pieces of information. This can lead to better decision-making, personalized treatments and improved patient outcomes.
EHR data mining with AI is especially useful because it can handle the large and complex datasets found in health care. It can integrate data from various sources and make sense of different formats and terminologies. This means that health care providers can get a more complete picture of a patient’s health and make more informed decisions.
Another remarkable thing about AI in EHR data mining is that it keeps learning and improving over time. As it processes more data, it becomes better at analyzing and predicting outcomes. This continuous learning helps AI stay up to date with the latest medical knowledge and improves its ability to support health care decisions. We believe that the partnership between AI and clinicians can be a game-changer in health care.
What were the key advances in technology and medical record-keeping that enabled this opportunity to apply AI?
The opportunity to apply AI in health care has been made possible by two key advancements: the widespread adoption of EHRs and the development of powerful machine learning algorithms.
EHRs store patient data electronically, providing a wealth of information for analysis. They have become increasingly common over the past decade, with more than 90% of hospitals in the U.S. now using them.
Machine learning algorithms can efficiently process and analyze this data, uncovering valuable insights and improving patient care. In our study, we were able to use advanced NLP algorithms to analyze more than 53 million patient records simultaneously. These advancements have the potential to revolutionize health care by enabling AI to make accurate diagnoses and personalize treatments.
What challenges did you encounter in applying the Arcus program to analyze 53 million notes?
Applying our technique to analyze over 53 million patient notes presented several challenges that required careful consideration. One of the primary hurdles we encountered was the nature of patient notes in EHRs. These notes often contain unstructured data, which means they are not standardized or organized in a uniform manner. This made it challenging to directly compare and extract meaningful insights from the notes.
Another challenge was the presence of ambiguous terms in the patient notes. Some terms had multiple meanings, making it crucial to discern the intended context. Determining whether medical terms were negated also proved to be a complex task. Differentiating between conditions described by clinicians and those reported by patients themselves added further complexity to the analysis. Finally, clinicians and departments recorded patient conditions differently, which made it harder to establish consistency.
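To make the negation problem concrete, here is a deliberately simplified, rule-based sketch: a term counts as negated if a cue word appears within a few tokens before it. The cue list, the window size and the function name are assumptions chosen for the example; this is not the open-source NLP system used in Arcus, and real clinical text needs far more careful handling.

```python
import re

# Illustrative, incomplete list of negation cues (an assumption for this sketch).
NEGATION_CUES = {"no", "not", "denies", "without", "negative"}

def is_negated(sentence, term, window=4):
    """True if a negation cue appears within `window` tokens before `term`."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    term_tokens = term.lower().split()
    n = len(term_tokens)
    if n == 0:
        return False
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == term_tokens:
            preceding = tokens[max(0, i - window):i]
            if NEGATION_CUES & set(preceding):
                return True
    return False

note = "Patient reports chest pain and no fever."
print(is_negated(note, "chest pain"), is_negated(note, "fever"))
```

Even this toy version shows why the task is hard: a fixed window misses cues that are far from the term and can wrongly negate a term that merely follows an unrelated "no".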
From a technological perspective, finding suitable natural language processing (NLP) techniques that could effectively handle clinical domain text was a significant challenge. Many existing NLP techniques are designed for general text and struggle to accurately identify and interpret medical terms within patient notes. In Arcus, we used an open-source NLP system that overcomes some of these difficulties.
Another unique challenge we faced was the absence of true labels or diagnoses for comparison. To address this, we conducted a survey among clinical experts and considered their majority vote as the “gold standard” for comparison. However, even among the experts, there was variability in the results. This underlines the inherent difficulty of clinical tasks and the need for ongoing research and refinement in this domain.
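The majority-vote idea can be sketched as follows. The answer labels, the tie-handling rule (questions with no clear majority are skipped) and the function names are assumptions made for this example; the interview does not describe how the survey was scored.

```python
from collections import Counter

def majority_vote(answers):
    """Return the most common answer, or None if there are no answers or the top two tie."""
    counts = Counter(answers).most_common()
    if not counts:
        return None
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]

def agreement_rate(model_answers, expert_votes):
    """Share of questions with a clear expert majority on which the model matches it."""
    gold = [majority_vote(votes) for votes in expert_votes]
    scored = [(m, g) for m, g in zip(model_answers, gold) if g is not None]
    if not scored:
        return 0.0
    return sum(m == g for m, g in scored) / len(scored)

# Hypothetical survey: each inner list holds the experts' answers to one question.
expert_votes = [["A", "A", "B"], ["B", "B", "B"], ["A", "B"], ["B", "A", "A"]]
model_answers = ["A", "A", "A", "A"]
rate = agreement_rate(model_answers, expert_votes)
print(round(rate, 3))
```

The third question illustrates the variability among experts mentioned above: with a split vote there is no gold answer to compare against.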
Overcoming these challenges required a combination of advanced technological approaches, domain expertise and collaboration with clinical experts. By navigating these obstacles, we extracted valuable insights from the vast amount of patient data and contributed to the advancement of health care knowledge and practices.
What were the most unexpected or exciting findings or trends that you discovered in your analysis?
During our analysis, we made some exciting discoveries that have the potential to revolutionize health care. One of the most significant findings was that using statistics derived from a large collection of patient data provides a more reliable way to measure similarities and differences between diseases. This is a groundbreaking revelation because previous approaches relied only on knowledge graphs created by clinicians to describe diseases and their relationships.
Although they provide a thorough understanding of diseases, these knowledge graphs are not always up to date. For instance, this can be problematic when dealing with a new epidemic. By leveraging a large corpus of patient data, we can obtain a more comprehensive understanding of various conditions, including rare diseases, which can greatly enhance diagnoses and treatment options.
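One simple example of a statistic derived from patient data is how much the patient populations of two conditions overlap. The sketch below uses Jaccard similarity over sets of patient ids. It is an illustration only: the interview does not specify which statistics the study uses, and the conditions and ids here are invented.

```python
def jaccard(a, b):
    """Size of the intersection divided by size of the union (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical mapping from condition to the ids of patients whose notes mention it.
patients_by_condition = {
    "epilepsy": {1, 2, 3, 4},
    "febrile seizures": {2, 3, 4, 5},
    "asthma": {6, 7},
}

def condition_similarity(first, second, index=patients_by_condition):
    """Data-derived similarity between two conditions based on shared patients."""
    return jaccard(index[first], index[second])

print(condition_similarity("epilepsy", "febrile seizures"))
print(condition_similarity("epilepsy", "asthma"))
```

Because a measure like this is computed from the records themselves, it updates as new notes arrive, whereas a hand-built knowledge graph changes only when experts revise it.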
With a more reliable representation of diseases, health care professionals can make more accurate diagnoses and choose appropriate treatment strategies. Our findings have the potential to improve patient outcomes, especially for individuals with rare conditions that may have been challenging to diagnose and treat in the past.
How else could you see AI being applied in the health care field? What are the biggest obstacles facing these applications?
AI is already making significant contributions in various areas of health care. For instance, it assists health care professionals in diagnosing anomalies more accurately and efficiently by analyzing medical images, such as X-rays, MRIs, and CT scans. Furthermore, AI aids in the discovery of new drugs by analyzing vast amounts of data and identifying potential candidates that may have been overlooked. This accelerates drug development and offers hope for faster treatment advancements. AI can also automate administrative tasks, such as appointment scheduling and billing, reducing paperwork and freeing up health care professionals’ time to focus on patient care.
However, several challenges need to be addressed before AI is widely adopted in health care. Data standardization and interoperability remain key obstacles, as health care data is often fragmented across different systems and lacks uniformity. Establishing common standards for seamless data sharing is critical to maximizing AI’s potential. In addition, it is important to establish trust and acceptance among health care professionals and patients. Concerns about accuracy, reliability and the human touch in health care can limit AI adoption. To gain the trust of all stakeholders, it is vital to educate them about AI and demonstrate its benefits while addressing privacy and security concerns.
Regulations and ethical considerations need to be addressed to fully harness the potential of AI in health care. We must ensure that AI algorithms are transparent, explainable and fair, while mitigating biases. It is important for health care organizations, technology developers, policymakers and patients to work collaboratively to establish guidelines and regulations that promote ethical and responsible AI use.
It is my hope that we will be able to overcome these obstacles and unlock the true potential of AI in health care, transforming the delivery of care and ultimately improving patient outcomes.
Media interested in speaking with Daniali or Haag should contact Britt Faulstick, executive director, News & Media Relations, at 215-895-2617 or firstname.lastname@example.org