How to Choose a Fuzzy Name Matching Product
Fuzzy Name Matching is Critical for Many Applications across Industries
Fuzzy name matching is a technology that matches a name against its potentially many variants. These variants arise from spelling errors, nicknames, transliteration differences, reordering, missing components, abbreviations, and so on.
There are many critical applications across different industry sectors that call for fuzzy name matching. For instance:
- Financial Sector
  - Anti-money laundering (AML)
  - Politically Exposed Persons (PEP)
  - Know Your Customer (KYC)
  - Verification of Payee (VOP)
  - Fraud detection (e.g., credit cards)
- Homeland Security
  - Border security
  - Visa application screening
  - Law enforcement
- Healthcare
  - Patient record matching
  - Fraud detection
- Retail
  - Customer Data Management
  - Customer Stitching
- Background Screening
  - Pre-employment background checks
  - Volunteer screening
- Other:
  - Genealogy research
  - Social media user identity verification
Applications of fuzzy name matching typically fall into one of two buckets:
- Searching a name against a set of records. Sometimes those records are “bad guys” such as terrorists or other sanctioned individuals for AML purposes. Sometimes those records are “good guys” such as customer records that need to be verified or deduplicated.
- Comparing two names to determine if they represent the same name. For instance, in payment transfers where money is sent from one person to another, the sender or payer supplies the payee’s name along with the payee’s bank account information. For verification purposes, the supplied payee name must match the name on the receiving account.
Criteria for Choosing a Fuzzy Name Matching Product
Matching names against a database of names or comparing two names involves meeting several challenges. As we discussed in our blog on fuzzy name matching, there are many different factors that can cause variations in name spellings, from simple misspellings to complex effects arising from names that are originally written in foreign languages. In the following we provide a guide to what you should look for in a fuzzy name matching product.
1. Does it handle the wide variety of name variant phenomena?
For example, these are some of the common reasons why names are written differently:
- Misspellings: James Williams ‒ James Wiliams
- Names with the same sound: Debbie ‒ Debby; Allen – Allan – Alan
- Nicknames: Theodore – Theo – Ted – Teddy; Abigail – Abby – Abbie
- Initials: Samuel Allen Casey – S. A. Casey
- Name order variants: Yamashita Hisao (Surname + Given Name) – Hisao Yamashita (Given Name + Surname)
  - The usual order for Asian names is Surname + Given Name. Western sources almost always keep this order for Chinese and Korean names but usually switch to the Western order (Given Name + Surname) for Japanese names.
- Missing name elements: Richard Louis Cushing ‒ Richard Cushing
- Company abbreviations: International Business Machines ‒ IBM; Smith, Jones & Harrigan LLP – SJH
- Differences in record field entries: for instance, Spanish names often have two last names (patronymic + matronymic). It’s not uncommon for the patronymic to be mistaken for a middle name, so the same person can appear in two records as:
  Record 1: [FirstName: José] [MiddleName: García] [LastName: Fernández]
  Record 2: [FirstName: José] [LastName: García Fernández]
- Etc.
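To make a few of these phenomena concrete, here is a minimal Python sketch that scores two person names while tolerating misspellings, nicknames, initials, reordering, and missing components. It is only an illustration and not how any particular product works: the `NICKNAMES` table, the 0.8 score for an initial, and the 0.1 penalty per missing component are all assumptions chosen for the example, and the edit-based similarity comes from Python's standard `difflib` module.

```python
from difflib import SequenceMatcher

# Hypothetical, tiny nickname table; real systems rely on much larger resources.
NICKNAMES = {
    "ted": "theodore", "teddy": "theodore", "theo": "theodore",
    "abby": "abigail", "abbie": "abigail",
}

def normalize(token: str) -> str:
    """Lowercase, strip punctuation, and map known nicknames to a full form."""
    token = token.lower().strip(".,")
    return NICKNAMES.get(token, token)

def token_similarity(a: str, b: str) -> float:
    """Similarity of two name components, in the range 0 to 1."""
    a, b = normalize(a), normalize(b)
    if not a or not b:
        return 0.0
    if a == b:
        return 1.0
    # An initial partially matches a component starting with the same letter.
    # The 0.8 value is an arbitrary assumption for this sketch.
    if len(a) == 1 or len(b) == 1:
        return 0.8 if a[0] == b[0] else 0.0
    # Character-level similarity absorbs simple misspellings.
    return SequenceMatcher(None, a, b).ratio()

def name_similarity(name1: str, name2: str) -> float:
    """Order-insensitive similarity of two full names, in the range 0 to 1."""
    t1, t2 = name1.split(), name2.split()
    if not t1 or not t2:
        return 0.0
    short, long_ = sorted((t1, t2), key=len)
    # For each component of the shorter name, take its best match in the
    # longer name, so component order does not matter.
    best = [max(token_similarity(s, l) for l in long_) for s in short]
    score = sum(best) / len(best)
    # Mild, assumed penalty for each component missing from the shorter name.
    penalty = 0.1 * (len(long_) - len(short))
    return max(0.0, score - penalty)
```

With this sketch, `name_similarity("Yamashita Hisao", "Hisao Yamashita")` and `name_similarity("Ted Williams", "Theodore Williams")` both return 1.0, while a misspelling such as "James Wiliams" or a dropped middle name scores somewhat lower. Note that purely character-based similarity like this does not cover phonetic variants, transliteration, or company abbreviations; those need dedicated handling.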
2. Can it handle the wide variety of ethnicity-specific name phenomena from around the world?
For example:
- Since Arabic is written in a non-Latin script and there is no universal standard for transliterating it into Latin script, a single Arabic name can have several equivalent spellings in English. The names below are all written the same way in Arabic, but they vary when brought into English:
  Abd al-Rahman vs. Abdul Rahman vs. Abdarrahman
- Transliteration standard differences, as in Chinese. The first transliteration below uses the Pinyin standard while the second one uses Wade-Giles. These are the two major ways of transliterating Chinese characters into Latin script. As you can see, they’re not very similar.
  Xi Jinping vs. Hsi Chin-p’ing
- Spanish last names with both patronymics and matronymics are often shortened by dropping the maternal last name:
  Juan Jiménez Gutiérrez vs. Juan Jiménez
3. Can it handle fuzzy name matching of the entity types you need?
Check whether a fuzzy name matching tool handles the entity types that you need. Here are the common entity types that many applications need to fuzzy match:
- Person
- Organization
- Place
- Address
- Vehicle
- Email Address
- Phone Number
- Date
4. Can it provide a matching score?
A matching score (e.g., 0-1 with 1 representing an exact match) not only quantifies how similar two names are but also lets you set cut-off thresholds.
- Some use cases prefer the fuzzy name matching to return as few false positives as possible, so being able to set the matching threshold for higher precision is important.
- Other use cases require a less stringent threshold because you want to see more potential matches so as not to miss a match (i.e., there should be few false negatives). In this case, you want to be able to set a lower matching threshold.
- Other applications may call for two matching thresholds: an upper one (e.g., .9) at or above which matches are strong enough that they don’t require a human review, a lower one (e.g., .7) below which matches are weak enough that they don’t require a human review either, and the range in between (e.g., .7-.9) for a human to review.
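The two-threshold setup can be sketched in a few lines of Python. This is a minimal illustration, not a product API: the function name `triage`, the parameter names, and the label strings are hypothetical, and the default thresholds simply reuse the .7 and .9 example values from the text.

```python
def triage(score: float, review_floor: float = 0.7, auto_accept: float = 0.9) -> str:
    """Route a match score (0 to 1) into one of three buckets."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if review_floor > auto_accept:
        raise ValueError("review_floor must not exceed auto_accept")
    if score >= auto_accept:
        return "match"      # strong enough that no human review is needed
    if score < review_floor:
        return "no_match"   # weak enough that no human review is needed
    return "review"         # in between: send to a human reviewer

# Example: route a handful of scores.
for s in (0.95, 0.82, 0.40):
    print(s, triage(s))
```

Raising `review_floor` shrinks the review queue at the cost of more missed matches, while lowering it does the opposite, which is the precision versus recall trade-off described in the first two bullets.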
5. Can it handle fuzzy name matching of records with multiple fields?
Does your use case require that you match just names or do you need to match records with additional fields? For example, you may want to match not only a person’s name but also the person’s date of birth, place of birth, nationality, address, etc. If so, you want a product that takes matching results of all the relevant fields into consideration in an intelligent way.
Ideally, the product should generate a matching score for each field individually and then produce a combined score for all fields. In addition, your fuzzy matching product should allow you to set field weights so that you can assign more weight to your more important fields (e.g., name) and a lower weight to others (e.g., address).
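One simple way to combine per-field scores is a weighted average, sketched below in Python. This is only one possible scheme under stated assumptions: the function name `combined_score`, the field names, and the weight values are made up for the example, and fields missing from a record are simply left out of the average rather than counted as mismatches.

```python
def combined_score(field_scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-field match scores (each between 0 and 1).

    Only fields present in field_scores contribute, so a record that lacks
    a field (e.g., no address) is not penalized for it.
    """
    total_weight = sum(weights[field] for field in field_scores)
    if total_weight <= 0:
        raise ValueError("weights of the scored fields must sum to a positive number")
    weighted_sum = sum(score * weights[field] for field, score in field_scores.items())
    return weighted_sum / total_weight

# Assumed weights: the name matters most, the address least.
WEIGHTS = {"name": 0.6, "date_of_birth": 0.3, "address": 0.1}

scores = {"name": 0.9, "date_of_birth": 1.0, "address": 0.5}
print(round(combined_score(scores, WEIGHTS), 2))
```

A weighted average is easy to explain and tune, but other designs are possible, for example letting a clear mismatch on date of birth veto an otherwise strong name match.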
6. How accurate is the fuzzy name matching?
Does it provide low rates of both false positives and false negatives? High accuracy is critical. A high false positive rate overwhelms reviewers and costs time and money, while a high false negative rate leads to missed matches, which can have dire consequences in areas like national security or law enforcement.
Incidentally, false negatives are harder to assess unless you have an answer key that captures all the possible matches in your target name database for the names you’re searching.
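Given such an answer key, both error types can be measured directly. The Python sketch below assumes matches are represented as (query name, record ID) pairs; the function name `evaluate` and the sample data are hypothetical and serve only to show the arithmetic.

```python
def evaluate(returned: set, answer_key: set) -> dict:
    """Compare returned matches with an answer key of true matches.

    Both arguments are sets of (query_name, record_id) pairs.
    """
    true_pos = len(returned & answer_key)    # correct matches returned
    false_pos = len(returned - answer_key)   # returned but not true matches
    false_neg = len(answer_key - returned)   # true matches that were missed
    precision = true_pos / len(returned) if returned else 0.0
    recall = true_pos / len(answer_key) if answer_key else 0.0
    return {
        "false_positives": false_pos,
        "false_negatives": false_neg,
        "precision": precision,
        "recall": recall,
    }

# Made-up example: two queries, three true matches in the answer key.
answer_key = {("q1", "r1"), ("q2", "r5"), ("q2", "r6")}
returned = {("q1", "r1"), ("q1", "r2"), ("q2", "r5")}
print(evaluate(returned, answer_key))
```

Here one returned pair is wrong (a false positive) and one true match is missed (a false negative). Without the answer key, only the false positive could be spotted by reviewing the returned results, which is why false negatives are the harder of the two to assess.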
7. How fast and scalable is the fuzzy name matching?
In many use cases, such as matching air travelers against a terrorist watch list or screening financial transactions for AML, the fuzzy name matching has to be real-time and scalable at peak usage times. In other cases, such as that of a marketing firm working to update a customer database, the matching can happen at a slower pace. Depending on your use case, your fuzzy name matching product must meet your speed and scalability requirements.
8. Is it customizable?
Your application domain may require some specific customizations beyond what the out-of-the-box fuzzy name matching product offers. For example:
- You may need to handle unconventional name aliases, nicknames, or abbreviations.
- You may want to specify certain field values that should be ignored for matching purposes.
- You may want to change the weights of fields to alter matching behaviors.
- You may want to adjust the importance of certain parts of a name for matching to address your specific use case.
If so, you need a fuzzy name matching product that allows you to customize in the ways that you require.
9. Does it support the languages you need?
In some use cases, there’s a requirement to match names in non-Latin scripts (Arabic, Chinese, Cyrillic, etc.) against names in Latin/English script, or vice versa. For example, a bank in the Middle East might have a database with names in Arabic script that have to be matched against English-language watch lists such as those published by OFAC.
Summary
Selecting a fuzzy name matching product requires consideration of a number of factors as indicated above. You need to weigh them carefully and make the necessary trade-offs to get the functionality you desire.
