What Is the Role of Natural Language Processing in Healthcare?

Natural language processing may be the key to effective clinical decision support, but there are many problems to solve before the healthcare industry can make good on NLP's promises.

For many providers, the healthcare landscape is looking more and more like a shifting quagmire of regulatory pitfalls, financial quicksand, and unpredictable eruptions of acrimony from overwhelmed clinicians on the edge of revolt.

The industry is currently hanging in suspense between the anticipated end of the EHR Incentive Programs and the implementation of the MACRA framework, a transition that may not end up being as smooth as CMS could hope for.

Despite the uncertain atmosphere – or, in some cases, because of it – healthcare providers are taking the opportunity to beef up their big data defenses and develop the technological infrastructure required to meet the impending challenges of value-based reimbursement, population health management, and the unstoppable tide of chronic disease.

Analytics are already playing a major part in helping providers navigate this transition, especially when it comes to the revenue and utilization challenges of moving away from the fee-for-service payment environment. 

But clinical analytics and population health management have been a trickier mountain to climb.  Dissatisfaction with electronic health records remains at a fever pitch, and is unlikely to cool off as developers and regulators try to stuff more and more patient safety features, quality measures, and reporting requirements into the same old software.

Providers often lack access to the socioeconomic, behavioral, and environmental data that would help to create truly actionable analytics at the point of care, and consumer excitement over Internet of Things devices and patient-generated health data is only further complicating the question of how to bring meaningful results to end-users without hopelessly cluttering the computer screen.

While it may be tempting to shut off the laptop, silence the smartphone, and return to a simpler time when the consult room only contained the patient, the provider, and a pad of paper, healthcare won’t solve its insight problems by limiting the amount of data that users have to work with. 

Instead, the old quandary of how to turn big data into smart data will be answered by bigger, smarter computers that can analyze a huge variety of data sources more intelligently, and deliver intuitive, streamlined reports to providers so they can focus on using the information for quality patient care.

Natural language processing (NLP) is at the root of this complicated mission.  The ability to analyze and extract meaning from narrative text or other unstructured data sources is a major piece of the big data puzzle, and drives many of the most advanced and innovative health IT tools on the market.

What is natural language processing?

Natural language processing is the overarching term used to describe the process of using computer algorithms to identify key elements in everyday language and extract meaning from unstructured spoken or written input.  NLP is a discipline of computer science that draws on artificial intelligence, computational linguistics, and machine learning.

Some NLP efforts are focused on beating the Turing test by creating algorithmically-based entities that can mimic human-like responses to queries or conversations.  Others try to understand human speech through voice recognition technology, such as the automated customer service applications used by many large companies. 

Still more are centered on providing data to users by identifying and extracting key details from enormously large bodies of information, like super-human speed readers with nearly limitless memory capacity.

Specific tasks for NLP systems may include:

  • Summarizing lengthy blocks of narrative text, such as a clinical note or academic journal article, by identifying key concepts or phrases present in the source material
  • Mapping data elements present in unstructured text to structured fields in an electronic health record in order to improve clinical data integrity
  • Converting data in the other direction from machine-readable formats into natural language for reporting and educational purposes
  • Answering unique free-text queries that require the synthesis of multiple data sources
  • Engaging in optical character recognition to turn images, like PDF documents or scans of care summaries and imaging reports, into text files that can then be parsed and analyzed
  • Conducting speech recognition to allow users to dictate clinical notes or other information that can then be turned into text
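
As a rough illustration of the second task in the list above (mapping elements of free text to structured fields), the following Python sketch pulls a blood pressure reading and a smoking status out of a made-up note using regular expressions. The note text, field names, and patterns are all hypothetical, and real clinical NLP systems rely on far more robust methods than a handful of patterns.

```python
import re

# Hypothetical free-text note; not taken from any real record.
NOTE = "Pt reports chest pain x2 days. BP 142/91. Former smoker, quit 2010. No fever."

# Illustrative patterns only; real notes vary far more than this.
BP_PATTERN = re.compile(r"\bBP\s*:?\s*(\d{2,3})\s*/\s*(\d{2,3})\b", re.IGNORECASE)
SMOKING_PATTERNS = [
    ("never", re.compile(r"\b(never smoked|non-?smoker)\b", re.IGNORECASE)),
    ("former", re.compile(r"\b(former|ex-?)\s*smoker\b", re.IGNORECASE)),
    ("current", re.compile(r"\b(current\s+smoker|smokes)\b", re.IGNORECASE)),
]


def extract_fields(note: str) -> dict:
    """Map a few elements of a free-text note to structured fields."""
    fields = {"systolic": None, "diastolic": None, "smoking_status": None}

    # Blood pressure: first "BP nnn/nn" style reading found in the note.
    bp = BP_PATTERN.search(note)
    if bp:
        fields["systolic"] = int(bp.group(1))
        fields["diastolic"] = int(bp.group(2))

    # Smoking status: first pattern in the list that matches wins.
    for status, pattern in SMOKING_PATTERNS:
        if pattern.search(note):
            fields["smoking_status"] = status
            break

    return fields


print(extract_fields(NOTE))
```

Fields that the patterns cannot find are left as None rather than guessed, which keeps the structured record from claiming more than the note actually says.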

Many natural language processing systems “learn” over time, reabsorbing the results of previous interactions as feedback about which results were accurate and which did not meet expectations. 

These machine learning programs can operate based on statistical probabilities, which weigh the likelihood that a given piece of data is actually what the user has requested.  Based on whether or not that answer meets approval, the probabilities can be adjusted in the future to meet the evolving needs of the end-user.
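
The feedback loop described above can be sketched in a few lines of Python. This is a minimal illustration that assumes a simple smoothed-count estimate of how often a candidate answer meets approval; the class name and the choice of prior are hypothetical, and the article does not say which statistical method any particular product uses.

```python
from dataclasses import dataclass


@dataclass
class FeedbackScore:
    """Running estimate of how likely a candidate answer is to be accepted.

    Uses smoothed counts (one pseudo-count on each side), which is an
    assumption made for illustration, not a method named in the article.
    """
    accepted: int = 1  # prior pseudo-count for "met approval"
    rejected: int = 1  # prior pseudo-count for "did not meet approval"

    @property
    def probability(self) -> float:
        # Share of (smoothed) feedback events in which the answer was approved.
        return self.accepted / (self.accepted + self.rejected)

    def update(self, approved: bool) -> None:
        # Each piece of user feedback shifts the estimate up or down.
        if approved:
            self.accepted += 1
        else:
            self.rejected += 1


score = FeedbackScore()
print(score.probability)  # 0.5 before any feedback
for approved in (True, True, False, True):
    score.update(approved)
print(round(score.probability, 3))
```

With three approvals and one rejection, the estimate moves from 0.5 to 4/6, so the system would rank this kind of answer higher the next time a similar request comes in.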

How can natural language processing help providers make better decisions?

In the healthcare industry, natural language processing has many potential applications.  NLP can enhance the completeness and accuracy of electronic health records by translating free text into standardized data.  It can fill data warehouses and semantic data lakes with meaningful information accessed by free-text query interfaces.  It may be able to make documentation requirements easier by allowing providers to dictate their notes, or generate tailored educational materials for patients ready for discharge.

Computer-assisted coding with an NLP foundation received a great deal of attention during the drawn-out ICD-10 conversion process, when it was viewed as a possible silver bullet for the problems of adding sufficient detail and specificity to clinical documentation.

But perhaps of greatest interest right now, especially to providers in desperate need of point-of-care solutions for incredibly complex patient problems, NLP can be – and is being – used for clinical decision support.

The most famous example of a machine learning NLP whiz-kid in the healthcare industry is IBM Watson, which has dominated headlines in recent months due to its voracious appetite for academic literature and its growing expertise in clinical decision support (CDS) for precision medicine and cancer care.

In 2014, just before IBM set up its dedicated Watson Health division, the Jeopardy!-winning supercomputer partnered with EHR developer Epic and the Carilion Clinic in Virginia to investigate how NLP and machine learning could be used to flag patients with heart disease, the first step for helping clinicians take the right actions for patient care.

“Using unstructured data was found to be important in this project,” explained Paul Hake, who worked for the IBM Smarter Care Analytics Group at the time.  “When physicians are recording information, they’ll just prefer to type everything in one place into the notes section of the EMR.  And so this information is kind of lost.  It’s then almost a manual process to map this unstructured information back into the EMR system so that we can then use it for analytics.” 

“We can run natural language processing algorithms against this data and automatically extract these features or risk factors from the notes in the medical record.”

The system didn’t stop at highlighting pertinent clinical data.  It also identified social and behavioral factors recorded in the clinical note that didn’t make it into the structured templates of the EHR.

“Those are some of the factors that are significant in terms of the risk factors,” said Hake.  “Is the patient depressed?  What’s the living status of the patient?  Are they homeless?  These are some of the factors that turn out to be important in the model, but they are also things that can be missed from a traditional analysis that doesn’t consider this sort of unstructured data.”

The pilot program successfully identified 8,500 patients who were at risk of developing congestive heart failure within the year.  Watson ran through a whopping 21 million records in just six short weeks, and achieved an 85 percent accuracy rate for patient identification.

More recently, Watson has moved up the difficulty ladder to attack cancer and advanced genomics, which involve even larger data sets.  A new partnership with the New York Genome Center and previous work with some of the biggest clinical and cancer care providers in the country are prepping the cognitive computing superstar for a career in CDS.

“Cancer is a natural choice to focus on, because of the number of patients and the available proof points in the space,” said Vanessa Michelini, Distinguished Engineer and Master Inventor leading the genomics division of IBM Watson Health.

“There’s this explosion of data – not just genomic data, but all sorts of data – in the healthcare space, and the industry needs to find the best ways to extract what’s relevant and bring it together to help clinicians make the best decisions for their patients.”

In 2014 alone, there were 140,000 academic articles related to the detection and treatment of cancer, she added.  No human being could possibly read, understand, and remember all that data, let alone distill it into concrete recommendations about what course of therapy has been most successful for treating patients with similar demographics and comorbidities.

Watson has made a name for itself doing just that, but IBM certainly doesn’t have the NLP world all to itself.  Numerous researchers and academic organizations have been exploring the potential of natural language processing for risk stratification, population health management, and decision support, especially over the last decade or so. 

A 2009 article from the Journal of Biomedical Informatics made the case for proactive CDS systems and intelligent data-driven alerts before the EHR Incentive Programs pushed electronic records into the majority of healthcare organizations, and pointed out the vital role that NLP technology would play in making that concept a reality.

“In some cases the facts that should activate a CDS system can be found only in the free text,” wrote three authors from the National Institutes of Health and University of Pittsburgh.  “Notably, medical history, physical examination, and chest radiography results are routinely obtained in free-text form. Indications for further tuberculosis screening could be identified in these clinical notes using NLP methods at no additional cost.”

“In principle, natural language processing could extract the facts needed to actuate many kinds of decision rules. In theory, NLP systems might also be able to represent clinical knowledge and CDS interventions in standardized formats.”

Since then, those theories have been put into action.  A few of the many examples of natural language processing in the clinical decision support and risk stratification realms include:

  • In 2013, the Department of Veterans Affairs used NLP techniques to review more than 2 billion EHR documents for indications of PTSD, depression, and potential self-harm in veteran patients.  The pilot was 80 percent accurate at identifying the difference between records of screenings for suicide and mentions of actual past suicide attempts.
  • Researchers at MIT in 2012 were able to attain a 75 percent accuracy rate for deciphering the semantic meaning of specific clinical terms contained in free-text clinical notes, using a statistical probability model to assess surrounding terms and put ambiguous terms into context. 
  • Natural language processing was able to take the speech patterns of schizophrenic patients and identify which were likely to experience an onset of psychosis with 100 percent accuracy.  The small proof-of-concept study employed an NLP system with “a novel combination of semantic coherence and syntactic assays as predictors of psychosis transition.”
  • At the University of California Los Angeles, researchers analyzed electronic free text to flag patients with cirrhosis.  By combining natural language processing of radiology reports with ICD-9 codes and lab data, the algorithm attained incredibly high levels of sensitivity and specificity.
  • Researchers from the University of Alabama found that NLP identification of reportable cancer cases was 22.6 percent more accurate and precise than manual review of medical records.  The system helped to separate cancer patients whose conditions should be reported to the Cancer Registry Control Panel from cases that did not have to be included in the registry.
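
The studies above report their results as accuracy, sensitivity, or specificity. As a brief refresher on how those figures are derived, the Python sketch below computes all three from counts of correct and incorrect flags. The function name and the counts are hypothetical and are not drawn from any of the studies cited.

```python
def screening_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute standard screening metrics from confusion-matrix counts.

    tp/fn: true cases the system flagged / missed
    tn/fp: non-cases the system left alone / flagged by mistake
    Assumes at least one true case and one non-case, so no denominator is zero.
    """
    return {
        # Share of true cases that the system flagged.
        "sensitivity": tp / (tp + fn),
        # Share of non-cases that the system correctly left unflagged.
        "specificity": tn / (tn + fp),
        # Share of all records that were classified correctly.
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical counts for illustration only.
metrics = screening_metrics(tp=90, fp=20, tn=880, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Accuracy alone can look strong when true cases are rare, which is why sensitivity and specificity are usually reported alongside it.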

What are the challenges of integrating NLP tools into clinical care?

Natural language processing technology is already embedded in products from some electronic health record vendors, including Epic Systems, but unstructured clinical notes and narrative text still present a major problem for computer scientists.

True reliability and accuracy are still in the works, and certain problems such as word disambiguation and fragmented “doctor speak” can stump even the smartest NLP algorithms.

“[Clinical text]… is often ungrammatical, consists of ‘bullet point’ telegraphic phrases with limited context, and lacks complete sentences,” pointed out Hilary Townsend, MSI, in the Journal of AHIMA in 2013. “Clinical notes make heavy use of acronyms and abbreviations, making them highly ambiguous.”

Up to a third of clinical abbreviations in the Unified Medical Language System (UMLS) Metathesaurus have multiple meanings, and more than half of terms, acronyms, or abbreviations typically used in clinical notes are puzzlingly ambiguous, Townsend added.

“For example, ‘discharge’ can signify either bodily excretion or release from a hospital; ‘cold’ can refer to a disease, a temperature sensation, or an environmental condition,” she explained. “Similarly, the abbreviation ‘MD’ can be interpreted as the credential for ‘Doctor of Medicine’ or as an abbreviation for ‘mental disorder.’”

While the human brain can usually decipher these types of differences by relying on the context of the surrounding words for clues, NLP technology still has a long way to go before it can reach the same reliability threshold as the typical flesh-and-blood reader.
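
One simple way to approximate this kind of context reading is to count how many words around an ambiguous term match a list of cues for each possible meaning. The Python sketch below applies that idea to Townsend's "discharge" and "cold" examples. The cue lists and function name are hypothetical, hand-written for illustration; a real system would learn such associations from annotated clinical text and would still make mistakes on terse notes.

```python
import re
from typing import Optional

# Hypothetical cue words for each sense of two ambiguous terms.
SENSE_CUES = {
    "discharge": {
        "release from hospital": {"home", "hospital", "planned", "instructions", "summary"},
        "bodily excretion": {"wound", "purulent", "nasal", "drainage", "yellow"},
    },
    "cold": {
        "disease": {"cough", "congestion", "symptoms", "caught", "sore"},
        "temperature sensation": {"feels", "hands", "feet", "extremities", "touch"},
        "environmental condition": {"weather", "room", "outside", "exposure", "winter"},
    },
}


def disambiguate(term: str, sentence: str) -> Optional[str]:
    """Pick the sense whose cue words overlap most with the sentence."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    best_sense, best_overlap = None, 0
    for sense, cues in SENSE_CUES.get(term, {}).items():
        overlap = len(words & cues)
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    # None means no cue word appeared, so the term stays ambiguous.
    return best_sense


print(disambiguate("discharge", "Purulent discharge noted from wound site"))
print(disambiguate("discharge", "Discharge planned for tomorrow with home instructions"))
print(disambiguate("cold", "Cold noted."))
```

The last call returns None because a telegraphic phrase like "Cold noted." offers no surrounding context at all, which is exactly the situation that makes clinical text difficult.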

In addition to the questionable validity of certain results, EHR developers are having a hard time figuring out how to display clinical decision support data within the workflow.  Inconsequential CDS alerts are already the bane of the majority of physicians, and there is no industry standard for how to create a support tool that will deliver pertinent, meaningful information without disrupting the patient-provider relationship.

Using NLP to fill in the gaps of structured data on the back end is also a challenge.  Poor standardization of data elements, insufficient data governance policies, and infinite variation in the design and programming of electronic health records have left NLP experts with a big job to do.

Where will natural language processing take the healthcare industry in the future?

Even though natural language processing is not entirely up to snuff just yet, the healthcare industry is willing to put in the work to get there.  Cognitive computing and semantic big data analytics projects, both of which typically rely on NLP for their development, are seeing major investments from some recognizable names.

Financial analysts are bullish on the opportunities for NLP and its associated technologies over the next few years.  Allied Market Research predicts that the cognitive computing market will be worth $13.7 billion across multiple industries by 2020, representing a 33.1 percent compound annual growth rate (CAGR) over current levels.

In 2014, natural language processing accounted for 40 percent of the total market revenue, and will continue to be a major opportunity within the field.  Healthcare is already the biggest user of these technologies, and will continue to snap up NLP tools through the rest of the decade.

The same firm also projects $6.5 billion in spending on text analytics by the year 2020.  Predictive analytics drawn from unstructured data will be a significant area of growth.  Potential applications include consumer behavior modeling, disease tracking, and financial forecasting.

MarketsandMarkets is similarly optimistic about the global NLP spend.  The company predicts that natural language processing will be worth $16.07 billion by 2021 all on its own, and also names healthcare as a key vertical.

Eventually, natural language processing tools may be able to bridge the gap between the unfathomable amount of data generated on a daily basis and the limited cognitive capacity of the human mind.  From the most cutting-edge precision medicine applications to the simple task of coding a claim for billing and reimbursement, NLP has nearly limitless potential to turn electronic health records from burden to boon.

The key to its success will be to develop algorithms that are accurate, intelligent, and healthcare-specific – and to create the user interfaces that can display clinical decision support data without turning users’ stomachs.  If the industry meets these dual goals of extraction and presentation, there is no telling what big data doors could be opened in the future.
