Definition

What is an enterprise master patient index (EMPI)?

By

Rahul Awati
Alex DelVecchio, Content Development Strategist
Tayla Holman, Site Editor

Published: Aug 08, 2025

An enterprise master patient index (EMPI) is a database that is used to maintain consistent and accurate information about each patient registered by a healthcare organization across its various departments. It may link several smaller master patient indexes (MPIs) together, such as those from outpatient clinics and rehabilitation facilities, and it can aggregate patient data contained in separate systems within one facility.

EMPI benefits

The key purpose of an EMPI is to ensure that every patient is represented only once, and with consistent demographic identification, across all of a hospital's systems that contain patient data. This includes accurate and up-to-date information about a patient's clinical history, as well as data about their social determinants of health. Keeping this data well organized and easily retrievable by authorized users, such as physicians, enables them to optimize patient care and, ultimately, improve health outcomes.

Another benefit of an EMPI is that it enables healthcare organizations to accurately match and link patient data across various systems, including electronic medical records (EMRs), public health databases and the claims systems of payers, such as insurance companies and government programs.

The holistic and unified view of patients provided by an EMPI enables hospitals to avoid errors that can negatively affect patient health and safety. For example, they can avoid identification errors that may result in giving an organ transplant to the wrong patient or dispensing the wrong medication. By providing a single source of truth for patient data, an EMPI enables providers to avoid creating duplicate patient records or mixing up different patients' records. These duplicates and mix-ups may lead to irreversible and life-threatening decisions.

A comprehensive and updated EMPI can also strengthen integration between various healthcare delivery systems, such as electronic health record (EHR) systems, patient portals, remote monitoring systems and radiology information systems. These integrations enable providers to better coordinate patient care and enhance patient safety. Integrated healthcare systems also reduce duplicate and outdated records, which can help to boost operational efficiency and reduce healthcare costs. An EMPI also helps keep these integrations aligned with industry standards, such as Health Level Seven (HL7), Digital Imaging and Communications in Medicine (DICOM) and Integrating the Healthcare Enterprise (IHE), so that information is exchanged securely and seamlessly between integrated systems.

Graphic of an EMPI: An EMPI creates a single source of patient information across diverse hospital systems and, often, external systems.

How does an EMPI work?

An EMPI receives patient data from various sources, like EHRs, patient registration systems and sometimes external databases. Typically, the following demographic information is captured in an EMPI for each patient: name, address, date of birth, phone number, sex/gender, medical record number, Social Security number, and insurance company or healthcare provider.

The healthcare organization can choose to include other patient information, such as religion, ethnicity/race, employer and next of kin.

An EMPI uses algorithms to look for duplicate records in the organization's patient registration system. The algorithms scan for data elements within the patient's demographic information.

The algorithms compare and link the records from different sources and identify potential matches. Ultimately, this technology determines whether records belong to the same patient or if more research is needed.

Algorithms for identifying duplicates

An EMPI uses two types of algorithms to match patient records: deterministic and probabilistic.

Deterministic matching

Also called exact match logic, deterministic matching looks for an exact match of the data elements in a patient record. For example, two records are considered to match only if they agree on elements like the patient's first and last name and phone number. Records whose elements do not match exactly, such as a full name in one record versus a nickname or maiden name in another, are rejected as nonmatches by the system.
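
The exact match logic described above can be sketched in a few lines. This is a minimal illustration, assuming records are plain dictionaries; the field names and the function name are hypothetical and not taken from any EMPI product.

```python
# Hypothetical field names chosen for illustration only.
MATCH_KEYS = ("first_name", "last_name", "phone")

def deterministic_match(rec_a, rec_b, keys=MATCH_KEYS):
    """Return True only if both records hold identical, non-missing values for every key."""
    return all(
        rec_a.get(k) is not None and rec_a.get(k) == rec_b.get(k)
        for k in keys
    )

a = {"first_name": "Katherine", "last_name": "Lee", "phone": "555-0100"}
b = {"first_name": "Katherine", "last_name": "Lee", "phone": "555-0100"}
c = {"first_name": "Kate", "last_name": "Lee", "phone": "555-0100"}

print(deterministic_match(a, b))  # True
print(deterministic_match(a, c))  # False: a nickname is not an exact match
```

Note that the second comparison fails even though the records plausibly describe the same person, which is the rejection behavior the paragraph describes.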

Probabilistic matching

Probabilistic matching looks for an approximate match rather than an exact match. These algorithms assign a weight to different data elements based on a preset acceptable level of certainty. The weights of the different elements are added to calculate the final match score. This score indicates the likelihood that two or more records belong to the same patient. The higher the final score, the higher the probability that there is a match between two records, even if there are slight variations in some elements.
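
As a rough sketch of weighted scoring, the example below adds up per-field weights, each scaled by how similar the two values are. The weights, field names and the use of Python's difflib for string similarity are all assumptions made for illustration; a production EMPI would derive its weights and comparison functions from its own data.

```python
from difflib import SequenceMatcher

# Illustrative weights only; real systems tune these to their own data.
WEIGHTS = {"last_name": 4.0, "first_name": 3.0, "dob": 5.0, "phone": 2.0}

def similarity(x, y):
    """String similarity in [0, 1]; a missing value counts as 0."""
    if not x or not y:
        return 0.0
    return SequenceMatcher(None, x.lower(), y.lower()).ratio()

def match_score(rec_a, rec_b, weights=WEIGHTS):
    """Sum each field's weight, scaled by how closely the two values agree."""
    return sum(w * similarity(rec_a.get(f), rec_b.get(f)) for f, w in weights.items())

a = {"first_name": "Katherine", "last_name": "Lee", "dob": "1980-02-14", "phone": "555-0100"}
b = {"first_name": "Kathrine", "last_name": "Lee", "dob": "1980-02-14", "phone": "555-0100"}
c = {"first_name": "Robert", "last_name": "Nguyen", "dob": "1975-11-30", "phone": "555-0199"}

print(match_score(a, b))  # close to the maximum despite the misspelled first name
print(match_score(a, c))  # noticeably lower
```

Scaling each weight by a similarity value is one design choice among several; a simpler variant would add the full weight whenever a field agrees within some tolerance and nothing otherwise.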

Issues with deterministic and probabilistic matching algorithms

There are drawbacks to both types of algorithms.

Deterministic matching can be problematic due to a potentially high rejection rate resulting from the system failing to find exact matches.

In practice, patient data is rarely captured properly and consistently -- nicknames may be used in one record, another may have spelling errors in first names, different records may show a different address for the same patient and so on.

A drawback of probabilistic matching is that the algorithm automatically sorts each candidate record into one of three groups: Yes, No or Maybe. The Maybe records must then be manually reviewed by a human to determine whether they truly match. This can involve a fair amount of work, such as additional research and analysis.
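
The three groups can be pictured as two thresholds applied to a match score. The sketch below assumes a score normalized to the range 0 to 1 and uses made-up cutoffs of 0.9 and 0.6; both the cutoffs and the record identifiers are hypothetical.

```python
def classify(score, upper=0.9, lower=0.6):
    """Map a normalized match score in [0, 1] to Yes, No or Maybe."""
    if score >= upper:
        return "Yes"
    if score < lower:
        return "No"
    return "Maybe"

# Hypothetical candidate pairs with their scores.
scores = {("A1", "B7"): 0.97, ("A2", "B3"): 0.72, ("A4", "B9"): 0.15}

# Only the Maybe pairs are routed to a human reviewer.
review_queue = [pair for pair, s in scores.items() if classify(s) == "Maybe"]
print(review_queue)  # [('A2', 'B3')]
```

Moving the two thresholds closer together shrinks the manual review queue but raises the risk of incorrect automatic decisions.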

Another issue with probabilistic matching is that it can result in false positives and false negatives. These results can also lead to wasted resources -- time and human effort -- and cause significant disruptions in healthcare workflows.

One way to overcome these problems is to use referential matching. With referential matching, EMPI systems compare and cross-reference patient records against both internal and external demographic databases -- public or semipublic. This can improve matching accuracy. That said, the accuracy may be limited if those external databases lack the patient information needed to perform records matching. For example, a public database may not have a lot of information about a homeless patient, so in this case, referential matching may not yield accurate matches.
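
To make the idea concrete, here is a simplified sketch of referential matching. It assumes a small in-memory reference data set that lists the names and addresses historically associated with each person; the data, identifiers and function names are invented for illustration and do not reflect how any particular vendor implements the technique.

```python
# Hypothetical reference data: one entry per person, with known historical values.
REFERENCE = {
    "REF-001": {
        "names": {"katherine lee", "kate lee", "katherine smith"},
        "addresses": {"12 oak st", "98 pine ave"},
    },
}

def resolve(rec, reference=REFERENCE):
    """Return the reference ID whose known names and addresses cover the record, else None."""
    name = rec["name"].lower()
    addr = rec["address"].lower()
    for ref_id, person in reference.items():
        if name in person["names"] and addr in person["addresses"]:
            return ref_id
    return None

def referential_match(rec_a, rec_b):
    """Two records match if both resolve to the same reference identity."""
    ref_a, ref_b = resolve(rec_a), resolve(rec_b)
    return ref_a is not None and ref_a == ref_b

a = {"name": "Kate Lee", "address": "12 Oak St"}
b = {"name": "Katherine Smith", "address": "98 Pine Ave"}
c = {"name": "Jordan Doe", "address": "unknown"}

print(referential_match(a, b))  # True: different name and address, same reference identity
print(referential_match(a, c))  # False: c is absent from the reference data
```

The second result shows the limitation noted above: a patient who does not appear in the reference data cannot be matched this way.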

Most EMPIs use blocking rules to enhance the speed and accuracy of records matching. These rules specify certain criteria -- for example, matching on demographics like sex or age. Records that don't meet these criteria are filtered out. Blocking rules enable an EMPI to identify the most likely matches in the fastest possible time.
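
A minimal sketch of blocking follows, assuming a rule that only compares records sharing the same sex and birth year. The rule, field names and sample records are hypothetical; the point is that grouping records first cuts the number of pairs the matching algorithm has to examine.

```python
from collections import defaultdict
from itertools import combinations

def blocking_key(rec):
    """Hypothetical blocking rule: same sex and same birth year."""
    return (rec["sex"], rec["dob"][:4])

def candidate_pairs(records):
    """Yield only pairs of record IDs that share a blocking key."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[blocking_key(rec)].append(rec["id"])
    for ids in blocks.values():
        yield from combinations(ids, 2)

records = [
    {"id": 1, "sex": "F", "dob": "1980-02-14"},
    {"id": 2, "sex": "F", "dob": "1980-02-14"},
    {"id": 3, "sex": "M", "dob": "1980-06-01"},
    {"id": 4, "sex": "F", "dob": "1992-09-30"},
]

pairs = list(candidate_pairs(records))
print(pairs)  # [(1, 2)]
```

Without blocking, these four records would produce six pairwise comparisons; with the rule above, only one pair is passed on for detailed matching. The trade-off is that a rule that is too strict can filter out true matches, for example when a birth year was mistyped.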

Once the required records for the individual are matched, an EMPI links them and assigns a unique identifier to prevent duplicate records or patient mix-ups.

The identifier also enables the system to monitor all records associated with and available for that patient across various systems and applications. In doing so, it can track the data changing or entering these systems and ensure that the master file of the patient's information remains current.
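
One way to picture the linking step is as grouping: every set of records connected by a match receives one shared identifier. The sketch below uses a small union-find structure for the grouping; the "EMPI-000001" identifier format and the source-prefixed record IDs are assumptions made for illustration.

```python
def assign_enterprise_ids(record_ids, matched_pairs):
    """Group linked records and give each group one shared identifier."""
    parent = {r: r for r in record_ids}

    def find(r):
        # Follow parent links to the group's representative record.
        while parent[r] != r:
            parent[r] = parent[parent[r]]  # path compression
            r = parent[r]
        return r

    for a, b in matched_pairs:
        parent[find(a)] = find(b)

    group_ids, result = {}, {}
    for r in record_ids:
        root = find(r)
        # Hypothetical identifier format, numbered in order of first appearance.
        group_ids.setdefault(root, f"EMPI-{len(group_ids) + 1:06d}")
        result[r] = group_ids[root]
    return result

record_ids = ["EHR:100", "LAB:7", "RAD:42", "EHR:200"]
matched_pairs = [("EHR:100", "LAB:7"), ("LAB:7", "RAD:42")]

links = assign_enterprise_ids(record_ids, matched_pairs)
print(links)
```

Here the EHR, lab and radiology records end up under the same identifier even though the EHR and radiology records were never compared directly, while the unmatched record gets an identifier of its own.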

Calculating the EMPI error rate

The EMPI error rate is a percentage value that reflects how many times an EMPI incorrectly identifies or links patient records.

To calculate this value, the total number of duplicate patient records in the database is divided by the total number of records and is multiplied by 100.
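
The calculation above is simple enough to express directly. The function name and the sample counts below are hypothetical.

```python
def empi_error_rate(duplicate_records, total_records):
    """Duplicate records as a percentage of all records in the database."""
    if total_records <= 0:
        raise ValueError("total_records must be positive")
    return 100 * duplicate_records / total_records

# Hypothetical counts: 1,200 duplicates in a database of 15,000 records.
print(empi_error_rate(1_200, 15_000))  # 8.0
```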

This calculation is important because it enables healthcare organizations to assess the accuracy of their EMPI system and analyze the quality and effectiveness of their patient-matching process. They can also identify areas of improvement, look for error root causes and implement appropriate interventions to improve data quality and patient identification.

EMPI errors may be false positives or false negatives. Accordingly, two values can be calculated:

False Positive Match Rate. The percentage of incorrectly matched candidate pairs over a certain time period is calculated by dividing the false positive matched pairs by the total number of records and multiplying the result by 100.
False Negative/Non-Match Rate. The percentage of candidate pairs that should have been matched by the system but were not is calculated by dividing the false negative nonmatched pairs by the total number of records and multiplying the result by 100.
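
As a sketch of how these two rates might be computed, the example below assumes a sample of candidate pairs that has been manually reviewed, so that the system's decision and the true answer are both known for each pair. The function name, input shape and sample numbers are hypothetical; the denominators follow the definitions given above.

```python
def match_error_rates(reviewed_pairs, total_records):
    """Return (false positive rate, false negative rate) as percentages.

    reviewed_pairs: iterable of (system_said_match, truly_same_patient) booleans.
    """
    pairs = list(reviewed_pairs)
    false_pos = sum(1 for predicted, actual in pairs if predicted and not actual)
    false_neg = sum(1 for predicted, actual in pairs if actual and not predicted)
    return 100 * false_pos / total_records, 100 * false_neg / total_records

# Hypothetical review results: 90 correct matches, 4 wrong matches, 6 missed matches.
reviewed = [(True, True)] * 90 + [(True, False)] * 4 + [(False, True)] * 6

fp_rate, fn_rate = match_error_rates(reviewed, total_records=200)
print(fp_rate, fn_rate)  # 2.0 3.0
```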

The average duplicate record rate is approximately 10%, according to the American Health Information Management Association.

EMPI and other healthcare systems

In the U.S., EHRs generally contain or are integrated with an EMPI system. Lab systems, radiology information systems and computerized physician order entry systems that generate unique patient identifiers can also be connected to an EMPI.

EMPIs are databases that store data in a standardized format. For this reason, they can facilitate health information exchange (HIE) between healthcare facilities. Specifically, EMPIs ensure that each patient has a unique identifier within all the healthcare systems of a healthcare organization. This identifier enables care providers to identify the patient and access all relevant information about that patient.

HIE, on the other hand, is a network that enables healthcare systems and organizations to electronically exchange patient health information. Such exchanges enable providers and organizations to access a more complete picture of a patient's health and medical history to coordinate care activities and optimize patient safety and outcomes.

HIE is also essential in accountable care organization (ACO) settings, as it reduces the amount of work ACOs must do to make sense of and report disparate data collected from the various participating healthcare organizations. An EMPI enables health data analytics, making it a key component for HIE in ACOs and also in population health management projects.

Master patient index vs. enterprise master patient index

Like an EMPI, a master patient index is a database containing patient information. MPIs can be embedded into systems, including EMR systems. However, an MPI can only match records within a single application.

The chief function of MPIs is to resolve patient identities in an application when those identities are created or stored within that application. Also, most MPIs are limited in their ability to compare records from sources outside the organization, such as public databases. Finally, MPIs often use unsophisticated patient matching algorithms, so matching accuracy can be low.

In contrast, an EMPI matches and links records across multiple diverse applications and platforms. These records are automatically linked using predefined workflows and thresholds. This not only ensures superior matching accuracy, but also provides a more complete view of a person's demographics and clinical history.

EMPIs may use different types of algorithms to link records and find matches. Moreover, they often include data stewardship capabilities, which help to maintain the integrity of patient records and minimize the need for manual remediation of missing data and data discrepancies. Furthermore, EMPIs flag potential errors and send notifications that users can act on to keep records free from errors and duplicates.

Deploying an EMPI

An EMPI is commonly deployed in either an active or passive mode.

An active EMPI deployment means the application is used on the front end during patient registration and scheduling. A passive EMPI deployment is used on the back end, with identification occurring after the registration process.

Key EMPI vendors and features

Many vendors develop EMPI solutions for healthcare organizations. These include the following:

Rhapsody.
4medica.
Verato.
InterSystems.
HealthViewX.
Surety Systems.

Most of these vendors provide cloud-based software-as-a-service EMPI solutions for fast deployment and ease of use.

Most reliable EMPI products include the following features:

Single source of truth for patient identities.
Advanced and intelligent algorithms to improve patient matching accuracy.
Patient profiling and records standardization to ensure the accuracy of demographic data and minimize identification errors.
Built-in machine learning to recognize patterns from data and produce improved output.
Data analysis capabilities to identify duplicate records and maintain the integrity of the master patient index.
Seamless and secure data exchange with application programming interfaces, Fast Healthcare Interoperability Resources and HL7 to ease patient data management and compliance.
Secure data migration process to further improve matching accuracy and reduce the risk of medical errors.
Seamless integration with existing healthcare IT systems to ensure interoperability and scalability.
Extensive security features, including data encryption -- data at rest, data in flight -- and continuous threat monitoring to ensure data protection and integrity.

Effect of electronic health records on EMPI

An EMPI is often created by and accessed from EHR systems. However, differences in vendors' EHRs often cause irregularities between EMPIs and reduce the possibility of a clean HIE. Therefore, maintaining consistency across different EHR systems and EMPIs is essential for reducing medical errors and improving patient care.

Some U.S. health IT leaders believe that a government- or industry-driven national patient identification system would solve this problem and that it could eventually lead to a national EMPI. Today, patient data is typically kept only in the EMPIs of the hospitals at which patients are registered, as well as in the EHRs of the ambulatory providers and specialists they see.

In contrast, the Centers for Medicare & Medicaid Services issues all licensed providers a unique, 10-digit National Provider Identifier.

