Identity Resolution Facilitates the Creation of Electronic Health Records

Electronic Health Records Promote Better Health Outcomes

It’s clear that universal Electronic Health Records (EHRs), in which every citizen of a country has a health record linked to a national system, promote better health outcomes and higher levels of trust in the medical system. A national EHR would also make medical data far more comprehensive, and therefore more valuable for technologies such as Artificial Intelligence.

Since the passage of the Affordable Care Act in 2010, the US has made rapid progress toward adopting EHRs. Few people in the US expect national EHRs in the near or even medium term, but the hope was that EHRs would at least support more coordinated care among the health care providers within a limited geographical area. This would reduce duplicate medical tests, avoid errors caused by one provider lacking information that another provider has, and generally allow critical information to be exchanged smoothly and easily.

Even though EHRs have proven to be an improvement in the US, doctors and hospitals still have difficulty sharing data. Patients see many different providers, and linking their records across different health IT systems is still not easy. Converting one system’s data so that it is compatible with another’s is a hard problem.

Identity Resolution Can Improve Medical Data Sharing

Fortunately, there is a technology that will support the linking of data records and so facilitate the development of accurate and complete EHRs: Identity Resolution (also known as Entity Resolution).

Identity Resolution recognizes when different records refer to the same person, despite the variations that occur in their data elements. Typical record fields include:

  • First and last names (occasionally a middle name or initial)
  • Date of birth
  • Home address
  • Home and mobile phone numbers
  • Email address
  • Insurance ID
  • etc.
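To make the field list concrete, here is a minimal Python sketch of a patient record, assuming a simple flat structure. The class `PatientRecord`, its attribute names, and the helpers `normalize_phone` and `normalize_email` are hypothetical, invented for illustration. The helpers show the kind of inexpensive cleanup that fields such as phone numbers and email addresses usually need before any comparison.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PatientRecord:
    """Hypothetical flat record holding the fields listed above."""
    record_id: str
    first_name: str
    last_name: str
    middle: Optional[str] = None          # middle name or initial, when present
    date_of_birth: Optional[str] = None   # raw text; formats differ between sources
    address: Optional[str] = None
    phones: List[str] = field(default_factory=list)
    email: Optional[str] = None
    insurance_id: Optional[str] = None


def normalize_phone(raw: str) -> str:
    """Keep digits only; drop a leading US country code from 11-digit numbers."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    return digits


def normalize_email(raw: str) -> str:
    """Trim and lowercase. Assumes the local part is case-insensitive,
    which most mail providers honor but the SMTP standard does not require."""
    return raw.strip().lower()


rec = PatientRecord("r1", "James", "Baker", middle="R.",
                    phones=[normalize_phone("+1 (703) 555-0142")],
                    email=normalize_email(" James.Baker@Example.com "))
print(rec.phones, rec.email)  # ['7035550142'] james.baker@example.com
```

Names, dates, and addresses need more than this kind of cleanup, as the next section shows.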

Linking Patient Records Can Be Tricky

Each field of a patient record offers its own challenges:

  • Names vary: “Jim Baker” vs. “J. Baker” vs. “James R. Baker.” Phenomena like nicknames, initials, and simple typos may be common in the data.
  • Dates need to be handled according to their own characteristics:
    • The order of the components may vary: “October 9, 2017” vs. “9 October, 2017”
    • Numeric conventions differ: “10/01/2017” (U.S.) vs. “01/10/2017” (European) both denote October 1, 2017.
    • What counts as a close match also varies. “August 4, 1938” is a fairly close match to “August 3, 1938,” but “January 3, 1961” is also a close match to “January 3, 1971”: the apparent 10-year gap could be caused by a single fat-fingered keystroke. The matching has to take these kinds of phenomena into account.
  • Addresses are quite complex, e.g.:
    • 7735 8th Street, Columbia, NY 01923 vs. 7735 Eighth St. Columbia, New York 01923-3494

There are four differences here that need to be handled: “8th” vs. “Eighth,” “Street” vs. “St.,” “NY” vs. “New York,” and the short five-digit ZIP code vs. the long ZIP+4 form.
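One common way to absorb these four differences is to normalize each address to a canonical form before comparing. The Python sketch below is a minimal illustration under stated assumptions, not the method of any particular product: the function `normalize_address` and its deliberately tiny lookup tables are hypothetical, and a real system would need complete street-suffix and state tables, plus handling for ambiguous tokens such as “St.” meaning “Saint.”

```python
import re

# Deliberately tiny lookup tables (assumptions for this demo); real systems
# need full street-suffix and state tables.
SUFFIXES = {"street": "st", "drive": "dr", "lane": "ln", "court": "ct"}
STATES = {"new york": "ny", "virginia": "va", "maine": "me",
          "minnesota": "mn", "maryland": "md"}
NUMBER_WORDS = {"one": "1", "two": "2", "three": "3", "four": "4",
                "five": "5", "first": "1st", "second": "2nd", "third": "3rd",
                "fourth": "4th", "fifth": "5th", "eighth": "8th"}


def normalize_address(raw: str) -> str:
    """Map an address to a canonical lowercase token string."""
    text = raw.lower()
    # Replace full state names first, since some ("new york") span two words.
    for name, code in STATES.items():
        text = re.sub(rf"\b{name}\b", code, text)
    # Reduce ZIP+4 ("01923-3494") to the five-digit ZIP code.
    text = re.sub(r"\b(\d{5})-\d{4}\b", r"\1", text)
    # Tokenizing on letters and digits drops commas and periods ("st." -> "st").
    tokens = re.findall(r"[a-z0-9]+", text)
    tokens = [NUMBER_WORDS.get(t, t) for t in tokens]
    tokens = [SUFFIXES.get(t, t) for t in tokens]
    return " ".join(tokens)


a = normalize_address("7735 8th Street, Columbia, NY 01923")
b = normalize_address("7735 Eighth St. Columbia, New York 01923-3494")
print(a)        # 7735 8th st columbia ny 01923
print(a == b)   # True
```

Normalization does not fix misspellings such as “Rasberry” for “Raspberry,” so in practice normalized strings would still be compared with a fuzzy similarity score rather than tested only for equality.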

To establish that two records refer to the same individual, each of the fields above must first be compared, producing a score that measures how close the two values are.
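As an illustration of per-field scoring, the Python sketch below scores name tokens using a nickname table, initial handling, accent folding, and `difflib.SequenceMatcher` for typos, and scores dates of birth by counting differing digits in `YYYYMMDD` form, so that both “August 4 vs. August 3, 1938” and “1961 vs. 1971” count as near misses. The function names, nickname entries, and score values (0.8 for a matching initial, a 0.25 penalty per differing digit) are assumptions chosen for the example, not the AI-based matching that Identity Resolution itself uses.

```python
import unicodedata
from datetime import date
from difflib import SequenceMatcher

# Hypothetical nickname table; real systems rely on much larger curated lists.
NICKNAMES = {"jim": "james", "jimmy": "james", "maggie": "margaret",
             "peggy": "margaret", "pepe": "jose"}


def _canon(token: str) -> str:
    """Fold accents, lowercase, strip periods, and map nicknames to formal names."""
    folded = "".join(c for c in unicodedata.normalize("NFKD", token)
                     if not unicodedata.combining(c))
    token = folded.lower().strip(".")
    return NICKNAMES.get(token, token)


def name_token_score(a: str, b: str) -> float:
    """Similarity in [0, 1] between two single name tokens."""
    a, b = _canon(a), _canon(b)
    if not a or not b:
        return 0.0
    if a == b:
        return 1.0
    # An initial such as "J." is compatible with any name starting with "j".
    if len(a) == 1 or len(b) == 1:
        return 0.8 if a[0] == b[0] else 0.0
    # Otherwise fall back to character-level similarity to absorb typos.
    return SequenceMatcher(None, a, b).ratio()


def date_score(a: date, b: date) -> float:
    """1.0 for identical dates, minus 0.25 per differing digit of YYYYMMDD."""
    sa, sb = a.strftime("%Y%m%d"), b.strftime("%Y%m%d")
    diffs = sum(x != y for x, y in zip(sa, sb))
    return max(0.0, 1.0 - 0.25 * diffs)


print(name_token_score("Jim", "James"))                  # 1.0
print(name_token_score("J.", "James"))                   # 0.8
print(date_score(date(1961, 1, 3), date(1971, 1, 3)))    # 0.75
```

A digit-level comparison is only one plausible choice; it catches single-keystroke typos but would not, for example, recognize a swapped day and month.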

In addition, there must be a way to combine the per-field similarity scores, according to business rules, into a single score, and to use that score to decide whether the two records belong to the same individual.
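A minimal way to express such business rules is a weighted average of the per-field scores plus a decision threshold. In the Python sketch below, the weights, the 0.85 threshold, the date-of-birth veto, and the names `combined_score` and `same_person` are all assumptions for illustration. Fields missing from either record are skipped and the remaining weights renormalized, so an absent address neither helps nor hurts.

```python
from typing import Dict, Optional

# Hypothetical business rules: field weights, a match threshold, and a veto.
WEIGHTS = {"name": 0.40, "dob": 0.35, "address": 0.25}
MATCH_THRESHOLD = 0.85


def combined_score(field_scores: Dict[str, Optional[float]]) -> float:
    """Weighted average over fields scored in both records (None = missing)."""
    present = {f: s for f, s in field_scores.items()
               if s is not None and f in WEIGHTS}
    total_weight = sum(WEIGHTS[f] for f in present)
    if total_weight == 0:
        return 0.0
    return sum(WEIGHTS[f] * s for f, s in present.items()) / total_weight


def same_person(field_scores: Dict[str, Optional[float]],
                threshold: float = MATCH_THRESHOLD) -> bool:
    # Example rule: a completely different date of birth vetoes the match.
    if field_scores.get("dob") == 0.0:
        return False
    return combined_score(field_scores) >= threshold


scores = {"name": 1.0, "dob": 0.75, "address": None}
print(round(combined_score(scores), 3), same_person(scores))  # 0.883 True
```

Raising the threshold makes the decision stricter (fewer false matches, more missed ones); lowering it does the opposite.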

Here are some examples of patient records that show typical variations in the data:

Name | DoB | Address
James Baker | 10/09/71 | 45 Maple St., Brentwood, VA
Baker, Jim | Oct. 9, 1971 | 45 Maple Street, Brentwood, Virginia 22093

Name | DoB | Address
Margaret L. Jones | 11/3/1990 | 6 Park Lane, Hialeya, ME
Maggie Jones | November 3, 1990 | 6 Park Ln, Hialeya, Maine 01923

Name | DoB | Address
Rashid Abdurrahman | 3 March, 1995 | 4 Emory Court, Louisville, MN
Rachid ‘Abd al-Rahman | Mar 3, 1995 | Four Emory Ct., Louisville, Minnesota

Name | DoB | Address
Jose A. Benitez Artola | 3/4/1979 | 2134 Raspberry Dr., Olney, MD
Pepe Benítez | 3 April 1979 | 2134 Rasberry Drive, Olney, Maryland
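The DoB column shows why a single parsing format is not enough: “3/4/1979” means March 4 in U.S. order but April 3 in European order, and only the European reading agrees with “3 April 1979.” One hedged approach, sketched below in Python, is to parse each string under several candidate formats and treat two dates as compatible if any interpretation coincides. The format list and the names `candidate_dates` and `dates_compatible` are assumptions. `datetime.strptime` month names follow the current locale, so the sketch assumes the default English (C) locale, and two-digit years follow Python’s rule of mapping 69-99 to the 1900s and 00-68 to the 2000s.

```python
from datetime import date, datetime
from typing import Set

# Candidate formats covering the DoB strings in the tables above (assumed list).
US_FORMATS = ["%m/%d/%Y", "%m/%d/%y"]
EU_FORMATS = ["%d/%m/%Y", "%d/%m/%y"]
TEXT_FORMATS = ["%B %d, %Y", "%b %d, %Y", "%b. %d, %Y", "%d %B, %Y", "%d %B %Y"]


def candidate_dates(raw: str) -> Set[date]:
    """Return every date the string could denote under the candidate formats."""
    found = set()
    for fmt in US_FORMATS + EU_FORMATS + TEXT_FORMATS:
        try:
            found.add(datetime.strptime(raw.strip(), fmt).date())
        except ValueError:
            pass  # this format does not fit the string; try the next one
    return found


def dates_compatible(a: str, b: str) -> bool:
    """True if at least one interpretation of each string is the same date."""
    return bool(candidate_dates(a) & candidate_dates(b))


print(sorted(candidate_dates("3/4/1979")))
# [datetime.date(1979, 3, 4), datetime.date(1979, 4, 3)]
print(dates_compatible("3/4/1979", "3 April 1979"))  # True
```

Keeping every interpretation, rather than guessing one convention per source, lets the other fields decide which reading is correct.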


How Identity Resolution Offers Highly Accurate Record Linking

In Identity Resolution, patient records are first compared pairwise using AI-based, highly accurate Fuzzy Name Matching, which handles a wide spectrum of variations in person names, addresses, dates, phone numbers, and other fields.

All records are then clustered (linked) according to the similarities calculated by Fuzzy Name Matching, using a very efficient clustering algorithm that can handle massive numbers of records. Each resulting cluster represents a real individual and is assigned a persistent ID.

As new records become available and are added to the system, Identity Resolution determines whether each one belongs to an existing cluster or represents a new patient, in which case a new cluster with a new ID is created.

The clustering algorithm also assigns each cluster a score indicating how closely its records match one another, which lets users trade off recall against precision to suit their particular use cases.
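The post does not say which clustering algorithm Identity Resolution uses, so the Python sketch below substitutes a deliberately simple one: incremental single-linkage clustering, in which a new record joins every existing cluster that contains a record it matches, merging those clusters when it bridges several. The class `IdentityClusters`, its `threshold` parameter, and the `cohesion` score (the weakest pairwise similarity inside a cluster) are hypothetical. The sketch scans every stored record on each insert, which is fine for a demo but not for large volumes; production systems typically add blocking or indexing to avoid that. When clusters merge, it keeps the oldest ID and retires the others, so a real system would also need to record merges so that retired IDs remain resolvable.

```python
from itertools import combinations
from typing import Any, Callable, Dict, List

Similarity = Callable[[Any, Any], float]


class IdentityClusters:
    """Toy incremental single-linkage clustering with stable integer IDs."""

    def __init__(self, similarity: Similarity, threshold: float = 0.85):
        self.similarity = similarity   # hypothetical pairwise scorer returning [0, 1]
        self.threshold = threshold
        self.clusters: Dict[int, List[Any]] = {}
        self._next_id = 1

    def add(self, record: Any) -> int:
        """Place a record and return the ID of the cluster it ends up in."""
        matches = [cid for cid, members in self.clusters.items()
                   if any(self.similarity(record, m) >= self.threshold
                          for m in members)]
        if not matches:
            # No existing cluster matches: treat the record as a new patient.
            cid = self._next_id
            self._next_id += 1
            self.clusters[cid] = [record]
            return cid
        # Keep the oldest matching ID and fold any other matches into it.
        keep = min(matches)
        for cid in matches:
            if cid != keep:
                self.clusters[keep].extend(self.clusters.pop(cid))
        self.clusters[keep].append(record)
        return keep

    def cohesion(self, cid: int) -> float:
        """Weakest pairwise similarity in the cluster (1.0 for singletons)."""
        members = self.clusters[cid]
        if len(members) < 2:
            return 1.0
        return min(self.similarity(a, b) for a, b in combinations(members, 2))


def near(a: int, b: int) -> float:
    """Toy similarity: integers stand in for records; within 1 counts as a match."""
    return 1.0 if abs(a - b) <= 1 else 0.0


ic = IdentityClusters(near, threshold=0.5)
ids = [ic.add(r) for r in (10, 20, 11, 12)]
print(ids, ic.cohesion(1), ic.cohesion(2))  # [1, 2, 1, 1] 0.0 1.0
```

In the demo, records 10 and 12 end up in one cluster only through 11, and the cohesion score of 0.0 flags that chaining. Accepting only high-cohesion clusters, or raising `threshold`, favors precision, while lowering them favors recall.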

In sum, it may be a while before the US gets to universal EHRs, but in the meantime Identity Resolution can substantially improve the sharing of medical information.