The Complexities of Building Your Own Name Matching System
Home-Grown Name Matching Often Doesn’t Hit the Mark
The NetOwl team is often approached by customers that have a home-grown Name Matching solution but realize that it is not meeting their needs. They typically report two types of issues:
- One is accuracy: the name phenomena that have to be handled are much more complex than they had anticipated. Their home-grown solution just isn’t accurate enough. It misses correct matches (false negatives) and/or returns too many bad matches (false positives).
- Another issue is the solution’s scalability: home-grown solutions often don’t scale up when the matching volumes increase and/or a real-time response is required.
In this blog we’ll discuss the complexities and pitfalls of building and using a home-grown Name Matching solution.
What Are the Use Cases for Name Matching?
Name Matching typically comes into play where person and organization names need to be matched against lists of bad actors for AML/KYC compliance, risk management, fraud detection, or border security reasons among others.
Another use case is to match against internal databases, such as to determine whether two different customer records are in fact for the same person. The main purpose of this use case is to avoid record duplication or to consolidate records from multiple databases.
Why is Name Matching Challenging?
Name Matching needs to handle the many ways in which a name can vary. Some variations are simple, and some are quite complex. The most basic include:
- Simple misspellings: Dik Simpson vs. Dick Simpson
- Variations in word order: John Dickerson vs. Dickerson, John
- Nicknames: Joseph Thompson vs. Joe Thompson
- Missing components: Mary T. Johnson vs. Mary Johnson
- Initials: John Jackson vs. J. Jackson
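As a rough illustration of the basic variations listed above, here is a minimal Python sketch of a rule-based comparison. It is an assumption-laden toy and not how any particular product works: the `NICKNAMES` table, the function names, and the rule "first and last tokens must agree" are all hypothetical, and the tokenizer only handles unaccented Latin letters.

```python
import re

# Hypothetical, tiny nickname table; a real list would be far larger.
NICKNAMES = {"joe": "joseph"}

def tokens(name):
    """Lowercase, strip punctuation, and reorder 'Surname, Given' to given-first."""
    if "," in name:
        surname, given = name.split(",", 1)
        name = f"{given} {surname}"
    words = re.findall(r"[a-z]+", name.lower())
    return [NICKNAMES.get(w, w) for w in words]

def token_matches(a, b):
    """Two tokens match if they are equal or one is an initial of the other."""
    if a == b:
        return True
    return (len(a) == 1 or len(b) == 1) and a[0] == b[0]

def naive_match(name1, name2):
    """Match when first and last tokens agree; middle components may be missing."""
    t1, t2 = tokens(name1), tokens(name2)
    if len(t1) < 2 or len(t2) < 2:
        return False
    return token_matches(t1[0], t2[0]) and token_matches(t1[-1], t2[-1])

print(naive_match("John Dickerson", "Dickerson, John"))  # word order
print(naive_match("Joseph Thompson", "Joe Thompson"))    # nickname
print(naive_match("Mary T. Johnson", "Mary Johnson"))    # missing component
print(naive_match("John Jackson", "J. Jackson"))         # initial
print(naive_match("Dik Simpson", "Dick Simpson"))        # misspelling
```

The first four comparisons return True, but the misspelling pair returns False because exact token comparison has no tolerance for typos. The initial rule also cuts the other way: it would accept J. Jackson for Jane Jackson just as readily as for John Jackson.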
More complex phenomena include:
- Ethnicity-specific variations
For instance, Spanish names can include the surnames of both the father and the mother: Santiago Ramos-Guzman, where Ramos is the paternal surname and Guzman the maternal surname. The second surname is frequently dropped: Santiago Ramos.
Arabic names frequently have the definite article al- attached to some elements of the name. The article is also frequently dropped: Hamid al-Sistani vs. Hamid Sistani.
For more information and examples of such ethnicity-related complications in name matching, see one of our other blogs.
- Cross-language matching
Some of our customers even have requirements for matching names written in different scripts. An example of this is:
Hebrew: יחזקאל אלון vs. Ezekiel Alon.
Hebrew is written right-to-left, as opposed to the Latin alphabet’s left-to-right. Also, as in other Semitic languages such as Arabic, not all the vowels are written. In our example the name in Hebrew, when transliterated letter by letter into English, reads yhzqal alon (there are no capital letters in Hebrew).
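The letter-by-letter transliteration described above can be sketched in a few lines of Python. The mapping table is a hypothetical, partial one that covers only the letters in the example name, and rendering vav as "o" simply follows the reading "alon" given above (vav can stand for other sounds as well). The name is written with Unicode escapes, in reading order, to avoid display issues with right-to-left text.

```python
# Hypothetical, partial table: only the letters needed for the example name.
HEBREW_TO_LATIN = {
    "\u05d0": "a",  # alef
    "\u05d5": "o",  # vav (rendered "o" here to follow the example)
    "\u05d6": "z",  # zayin
    "\u05d7": "h",  # het
    "\u05d9": "y",  # yod
    "\u05dc": "l",  # lamed
    "\u05df": "n",  # final nun
    "\u05e7": "q",  # qof
}

def transliterate(text):
    """Replace each known Hebrew letter; leave everything else unchanged."""
    return "".join(HEBREW_TO_LATIN.get(ch, ch) for ch in text)

# The example name, stored in logical (reading) order.
HEBREW_NAME = "\u05d9\u05d7\u05d6\u05e7\u05d0\u05dc \u05d0\u05dc\u05d5\u05df"

latin = transliterate(HEBREW_NAME)
print(latin)                    # yhzqal alon
print(latin == "ezekiel alon")  # False
```

Even with a correct letter mapping, the result does not equal "ezekiel alon". The unwritten vowels and the conventional English form of the name still have to be bridged by the matcher.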
For more examples of matching different scripts, see our other blog.
Now imagine a difference in writing systems combined with variations such as simple misspellings and ethnicity-specific phenomena!
Why Home-Grown Solutions Don’t Work for Name Matching
If your organization has a requirement for one of the use cases above, building your own in-house solution for Name Matching may be an attractive option because it seems easy to implement using the “traditional” algorithms and also seems to be “cheaper.”
However, as illustrated by the examples above, the challenges of Name Matching require a team that has long and deep experience in the area. Here are some specific pitfalls:
- Lack of expertise in Name Matching
The wide variety of ethnic names in a country like the U.S. or many other countries requires that your team have a good understanding of their characteristics.
Simple “traditional” algorithms such as edit distance and Metaphone, even when combined with a list of nicknames, cannot by themselves keep false negatives and false positives under control. Your team needs someone who can bring in and apply more advanced AI algorithms that work.
If scalability is important, your team also needs someone who has experience in transitioning the software to commercial-quality production standards.
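To make the limits of edit distance concrete, here is a minimal Python sketch of Levenshtein distance with a fixed threshold. The function names and the `max_edits` parameter are assumptions chosen for illustration, and the Jon Smith / Jan Smith pair is a hypothetical example of two names that may well belong to different people.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]

def similar(name1, name2, max_edits=1):
    """Declare a match when the names are within max_edits of each other."""
    return edit_distance(name1.lower(), name2.lower()) <= max_edits

print(similar("Dik Simpson", "Dick Simpson"))        # True: one missing letter
print(similar("Hamid al-Sistani", "Hamid Sistani"))  # False: three edits apart
print(similar("Jon Smith", "Jan Smith"))             # True: possibly a false positive
```

With a threshold of one edit, the dropped Arabic article becomes a false negative. Raising the threshold to three would catch it, but would also accept many more unrelated names, which is the trade-off a single character-level threshold cannot escape.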
- Measuring accuracy objectively
A good vendor of Name Matching products collects and maintains very large sets of training and test data of name variants for its products. The product is constantly run against the test data to ensure that accuracy (a measure of false positives and false negatives) is increasing and not declining. If you do intend to implement a home-grown solution, your team would need to collect such data and devise a way to measure accuracy quantitatively and regularly. How will a non-specialist organization handle this?
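One common way to quantify false positives and false negatives is to compute precision and recall over labeled name pairs. The Python sketch below assumes that approach; the `evaluate` function, the baseline `exact_match` matcher, and the four labeled pairs are hypothetical placeholders, and a meaningful test set would need to be far larger and more varied.

```python
def evaluate(matcher, labeled_pairs):
    """Score a matcher against (name1, name2, is_same_person) triples."""
    tp = fp = fn = 0
    for name1, name2, is_same in labeled_pairs:
        predicted = matcher(name1, name2)
        if predicted and is_same:
            tp += 1   # correct match
        elif predicted and not is_same:
            fp += 1   # bad match returned (false positive)
        elif not predicted and is_same:
            fn += 1   # correct match missed (false negative)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}

def exact_match(name1, name2):
    """Baseline matcher: case-insensitive string equality."""
    return name1.lower() == name2.lower()

# Hypothetical labeled data for illustration only.
LABELED_PAIRS = [
    ("Dik Simpson", "Dick Simpson", True),
    ("Joseph Thompson", "Joe Thompson", True),
    ("Mary Johnson", "mary johnson", True),
    ("John Jackson", "Jane Jackson", False),
]

print(evaluate(exact_match, LABELED_PAIRS))
```

On this toy data the exact-match baseline returns no bad matches (precision 1.0) but finds only one of the three true matches (recall of one third). Running the same evaluation after every change to the matcher is what shows whether accuracy is improving or regressing.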
- Total cost of ownership
Contrary to popular belief, a home-grown solution is not free if your employees are implementing it while they could be working on other tasks that could use their expertise. Building a name matcher requires a very serious time commitment. It’s not a matter of a few months.
The upkeep of the system rarely ends, either: false negatives, false positives, scalability issues, stability issues, security issues, and so on.
Key developers in building the in-house capability may leave; it may become impossible to maintain the capability at that point.
Organizations need to assess the true cost when considering building their own name matching solution.
Because it is an advanced technology relying on sophisticated algorithms, Name Matching is definitely not a commodity technology. The wise buyer, if they are not a specialist, would do well to acquire a well-designed, highly accurate, and robust product in the space.