How Cross-Language Name Matching Is Crucial for Customer Verification and Vetting



Financial Firms Need to Verify and Vet Customer Identities

Banks and other financial institutions that operate internationally often face the challenge of verifying and vetting their customers’ identities when names are written in different alphabets and scripts.

For instance:

  • A bank onboarding a new customer (or performing a periodic check on an existing one), whether an individual or an organization, must check the name on the customer-supplied identity documentation against data records and sanctions lists for Know Your Customer (KYC) and anti-money laundering (AML) compliance. The customer-supplied documentation, such as a driver’s license or national ID card, may be in a non-Latin script (e.g., Cyrillic or Chinese) and must be matched against Latin-script data records and sanctions lists.
  • A bank in an Arabic-speaking country needs to compare the recipient name in an incoming SWIFT transfer, which is in Latin script, against the name on the customer’s account, which is in Arabic script.

Here are some examples of the types of names that financial institutions would need to compare:

Name in non-Latin script | Name in Latin script
احمد حسين محفوظ | Ahmed Hussein Mahfuz
Джон Хендерсон | John Henderson
メアリー・ルイーズ・フレミング | Mary Louise Fleming
Φρέντερικ Χίλτον | Frederick Hilton
穆萨·巴赫里 | Musa al-Bahri

The process of checking names across different language scripts is called Cross-Language Name Matching.

Why Cross-Language Name Matching Is Hard

A challenge for any screening process is the wide variation in how names are spelled. For a discussion of why this is true even when both names are written in Latin script, see here.

In this post we focus on the particular challenges of Cross-Language Name Matching: matching names written in different writing systems, for example, matching جو بايدن to its English equivalent, Joe Biden.

It’s a mistake to think of Cross-Language Name Matching as a simple, straightforward task: there is no direct one-to-one mapping between the characters of different scripts.

This is true for many reasons, a few of which are:

  • Some languages, such as Arabic, mostly do not write short vowels. For example, the name Muhammad is written as محمد. Transliterated letter by letter into Latin script, it is “mhmd,” so the vowels must be supplied by the reader, and محمد is commonly romanized in many different ways: Muhammad, Muhammed, Muhamad, Mohammad, Mahammad, Mahammed, Mohamad, Mohamed.
  • Languages differ in the sounds they have. For example, both English and Arabic have a letter for b, but only English has one for p. So how is an English name containing p written in Arabic? Arabic typically uses the letter for the phonetically closest sound, b (ب). For example, the English first name Paul is typically written in Arabic as بول. Spelled out letter by letter in Latin script, this is bwl.
  • Russian offers another example of one language lacking a sound that another has. Russian has no h sound, so English names containing h are usually transliterated with the phonetically similar letter х (kh): John Houndsmith -> Джон Хаундсмит, which transliterates back into Latin characters as Jon Khaundsmit. Somewhat surprisingly, Russian can also represent h with an entirely different letter, г, the equivalent of English hard g: Harvard -> Гарвард, pronounced something like “Garvard.” To English-speaking ears, g sounds very different from h, but that is how Russian transcribes it.
  • A name written as one word in its original language may be split into more than one word in English. For example, the Korean personal name 윤석열 contains no spaces, whereas the English transliteration does: Yoon Suk Yeol. English renderings may also hyphenate the second and third elements (Yoon Suk-Yeol) or run them together (Yoon Sukyeol).
  • East Asian personal names usually put the family name first; in the Korean example above, Yoon is the family name. English order puts the family name last, so the same person may also appear as Suk Yeol Yoon.
  • Chinese has many characters with the same or similar pronunciations, so a non-Chinese name may be written in Chinese in several ways. For example, the name Albert can be written as 亚尔培特, 阿尔伯特, or 阿尔贝尔.
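Some of the variation described above (missing vowels, spacing and hyphenation, name order) can be approximated with hand-written rules. The Python sketch below is a hypothetical heuristic for comparing two Latin-script spellings, not the method described later in this post: it reduces each spelling to a consonant skeleton and tries every rotation of the name’s parts. The function names (`tokens`, `skeleton`, `order_variants`, `heuristic_match`) are illustrative assumptions.

```python
import re
from itertools import groupby

VOWELS = set("aeiou")


def tokens(name: str) -> list[str]:
    """Lower-case a Latin-script name and split it on spaces and hyphens."""
    return [t for t in re.split(r"[\s\-]+", name.lower()) if t]


def skeleton(text: str) -> str:
    """Drop vowels and non-letters, then collapse repeated letters ('mm' -> 'm')."""
    consonants = (ch for ch in text if ch.isalpha() and ch not in VOWELS)
    return "".join(ch for ch, _ in groupby(consonants))


def order_variants(name: str) -> set[str]:
    """Join the tokens for every rotation, so 'Suk Yeol Yoon' and 'Yoon Suk Yeol'
    share a form, and spaces or hyphens between parts no longer matter."""
    toks = tokens(name)
    return {"".join(toks[i:] + toks[:i]) for i in range(len(toks))}


def heuristic_match(a: str, b: str) -> bool:
    """True if some rotation of a and some rotation of b share a consonant skeleton."""
    skeletons_a = {skeleton(v) for v in order_variants(a)}
    skeletons_b = {skeleton(v) for v in order_variants(b)}
    return not skeletons_a.isdisjoint(skeletons_b)


print(heuristic_match("Muhammad", "Mohamed"))             # True: both reduce to "mhmd"
print(heuristic_match("Yoon Suk-Yeol", "Suk Yeol Yoon"))  # True
print(heuristic_match("Mark", "Mirko"))                   # True, a false positive
```

The last call shows the weakness of fixed rules: Mark and Mirko are different names, yet both reduce to the skeleton “mrk”. The sketch also does nothing for the Russian letter substitutions or the Chinese homophones described above.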

For more examples of the difficulties of cross-language name matching, see here.

A Machine-Learning-Based, AI Approach to Cross-Language Name Matching

In this approach, Cross-Language Name Matching uses an advanced machine learning algorithm trained on large-scale, real-world name variant data that includes large numbers of names in their original scripts.

Advanced Cross-Language Name Matching performs the matching directly, with no need to transliterate non-Latin names into Latin script first. This avoids the errors (or noise, to use a more technical term) that a transliteration step introduces, so matching names directly across scripts achieves higher accuracy.
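To illustrate the kind of noise a transliteration step can introduce, the Python sketch below applies a hypothetical one-to-one table (covering only the six Arabic letters in محمد and بول) and then compares the result with a common Latin spelling using the standard library’s `difflib.SequenceMatcher`. The table and the helpers `naive_transliterate` and `similarity` are illustrative assumptions, not part of any product described here.

```python
from difflib import SequenceMatcher

# Hypothetical one-to-one table covering only the letters used in the examples.
# Larger transliteration schemes still cannot restore vowels that were never written.
ARABIC_TO_LATIN = {
    "\u0645": "m",  # meem
    "\u062d": "h",  # haa
    "\u062f": "d",  # dal
    "\u0628": "b",  # baa
    "\u0648": "w",  # waw
    "\u0644": "l",  # lam
}


def naive_transliterate(name: str) -> str:
    """Replace each Arabic letter with one Latin letter; other characters pass through."""
    return "".join(ARABIC_TO_LATIN.get(ch, ch) for ch in name)


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in [0, 1] computed by difflib."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


for arabic, latin in [("محمد", "Muhammad"), ("بول", "Paul")]:
    translit = naive_transliterate(arabic)
    print(f"{translit!r} vs {latin!r}: exact={translit == latin.lower()}, "
          f"similarity={similarity(translit, latin):.2f}")
```

Neither transliteration matches its usual English spelling exactly, and “bwl” scores especially low against “Paul”, because a generic string comparison treats both the missing vowels and the p-to-b substitution as errors.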

This approach automatically learns a collection of intelligent, probabilistic name matching rules from the data. Because the rules are learned from real data, they are not limited by what humans know about how names can be rendered in other scripts. They reflect the countless name variants that occur in the real world.
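As a toy illustration of learning matching behavior from data rather than writing it by hand, the Python sketch below estimates how often each Latin spelling is observed for a given original-script name. It is a simple frequency count, far simpler than the machine learning model this post describes; the function names and the training pairs are hypothetical.

```python
from collections import Counter, defaultdict
from collections.abc import Iterable


def learn_variant_probabilities(
    pairs: Iterable[tuple[str, str]],
) -> dict[str, dict[str, float]]:
    """Estimate P(latin spelling | original-script name) by relative frequency
    over observed (original, latin) pairs."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for original, latin in pairs:
        counts[original][latin.lower()] += 1
    model = {}
    for original, spelling_counts in counts.items():
        total = sum(spelling_counts.values())
        model[original] = {latin: n / total for latin, n in spelling_counts.items()}
    return model


def match_probability(model: dict[str, dict[str, float]], original: str, latin: str) -> float:
    """Learned probability that `latin` renders `original`; 0.0 if never observed."""
    return model.get(original, {}).get(latin.lower(), 0.0)


# Hypothetical observations, invented for illustration; not real name statistics.
observed_pairs = [
    ("محمد", "Muhammad"), ("محمد", "Muhammad"),
    ("محمد", "Mohamed"), ("محمد", "Mohammad"),
    ("بول", "Paul"),
]
model = learn_variant_probabilities(observed_pairs)
print(match_probability(model, "محمد", "Mohamed"))   # 0.25
print(match_probability(model, "محمد", "Mahammad"))  # 0.0: never observed
```

A whole-name lookup like this assigns zero probability to any spelling it has not seen, such as Mahammad above. A practical model would need to generalize below the level of whole names, for example over character sequences, to score variants absent from its training data.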

Using this approach, Cross-Language Name Matching allows banks and other financial organizations to effectively match customer data across different scripts whenever a business process calls for it.