The Challenges of Arabic Name Matching for AML


Money laundering is a major problem for financial institutions such as banks and investment companies. Terrorists and organized crime figures, as well as sanctioned governments, frequently attempt to use financial institutions to conceal the origins of funds resulting from illegal activity. To counter this, financial institutions are obliged to conduct due diligence procedures to ensure they are not facilitating money-laundering activities. They must comply with rules and regulations under the rubric of anti-money laundering (AML) and Know Your Customer (KYC).

An important part of this due diligence is screening potential and current customers against sanctions lists, such as those provided by the US Treasury Department’s Office of Foreign Assets Control (OFAC) or the European Union (EU). There are both individuals and companies on those lists, and they are what you might expect: terrorists, international drug traffickers, anyone engaged in the proliferation of weapons of mass destruction, plus rogue countries and regimes.

Name Matching Is Hard, Particularly for Arabic Names

A challenge for this screening process is the high level of variation in the way names are rendered. Some problems are expected in any name matching, such as simple spelling errors and names that sound the same but are spelled differently, e.g., Sean vs. Shaun vs. Shawn. Names from different languages, ethnicities, and scripts pose a special challenge on top of these. Arabic names are especially difficult because of the following issues:

  • Arabic names require transliteration since Arabic, of course, has its own script which is very different from the Latin-based one that most Western languages use. (Transliteration is the conversion of names written in one script into another.)
  • Arabic has sounds that are very different from any English ones, and there is no single agreed-upon standard for transliterating them into Latin script:
    • The Arabic letter ق is a special kind of “k”-sound. It’s like English “k,” but it’s pronounced in the back of the throat. It is typically transliterated in English as either “k” or “q.”
    • Another sound in Arabic that has no counterpart in English is called ‘ayin (the letter representing it is ع). It’s one of the famous Arabic “guttural” sounds (sounds produced in the throat). In English, it is typically marked with an apostrophe or even left out altogether, thus creating spelling variants.
  • Some names are pronounced differently from the way they are written in Arabic. For example, a name like “al-Din” is always written in Arabic with its letter for “l” (“al” is the definite article), but the “l” is actually pronounced like the letter that follows it, giving “ad-Din.” Both “al-Din” and “ad-Din” therefore appear in Latin script, and the hyphen can be present or absent in either.
  • There are dialectal differences in how Arabic letters are pronounced: ج is pronounced like “g” in “good” in Egypt but like “j” in “jungle” elsewhere in the Middle East. The same name can therefore be spelled “Gamel” or “Jamel” in English.
  • One Arabic name can also exhibit different segmentations: “Abd al-Rahman” vs. “Abdul Rahman” vs. “Abdurrahman”. They’re all the same name.
  • Different languages have different rules for transliterating Arabic letters. English transliterates “ش” as “sh”; French transliterates it as “ch.” The problem is that French transliterations frequently show up in English-language documents (if, for example, the English is a translation of the French). You consequently see the variants “Bashir” vs. “Bachir.”
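
The variation described in the list above can be partly absorbed by folding each Latin-script name to a coarse comparison key before matching. The following Python sketch illustrates the idea. It is not a production matcher and not the method described in this article: the function name match_key and every folding rule in it are assumptions chosen to cover only the examples given here.

```python
import re
import unicodedata


def match_key(name: str) -> str:
    """Fold a Latin-script Arabic name to a coarse key (illustrative rules only)."""
    # Lowercase and strip accents or macrons such as "ā" or "é".
    s = unicodedata.normalize("NFKD", name).lower()
    s = "".join(c for c in s if not unicodedata.combining(c))
    # Fold spelling conventions: French "ch" to "sh", "q" to "k", Egyptian "g" to "j".
    s = s.replace("ch", "sh").replace("q", "k").replace("g", "j")
    # Drop apostrophes (used for 'ayin), hyphens, and spaces; this also hides segmentation.
    s = re.sub(r"[^a-z]", "", s)
    # Drop an "l" that sits between a vowel and a consonant it may assimilate to.
    s = re.sub(r"(?<=[aeiou])l(?=[tdrzsn])", "", s)
    # Keep the first letter, drop the remaining vowels, then collapse doubled letters.
    s = s[:1] + re.sub(r"[aeiou]", "", s[1:])
    return re.sub(r"(.)\1+", r"\1", s)


variants = [
    ("al-Din", "ad-Din"),
    ("Gamel", "Jamel"),
    ("Abd al-Rahman", "Abdurrahman"),
    ("Abdul Rahman", "Abdurrahman"),
    ("Bashir", "Bachir"),
]
for a, b in variants:
    print(f"{a!r} / {b!r}: same key = {match_key(a) == match_key(b)}")
```

A key this lossy will also conflate names that are genuinely different, so it is better suited to generating candidate pairs for a finer-grained comparison than to making a final match decision.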

The transliteration problems above arise even when both names are already in Latin characters. Matching as part of an AML process poses a further difficulty: lists like OFAC’s are mostly in Latin characters, but a bank based in the Middle East may have a database that contains names written in Arabic script, so the two names being compared are in different scripts.

How Do You Match محمود الازرقي against Mahmud al-Azraqi?

One way of doing this is to automatically transliterate names written in Arabic into Latin characters and then match the transliterated names against the target databases in Latin characters. However, this approach is less than optimal in terms of matching accuracy because the transliteration step introduces another layer of possible errors. We have found it more accurate to match directly between the names in the two different scripts. Our approach is based on machine learning, where NetOwl has learned the cross-script name matching rules. This machine learning approach uses a rich source of large-scale, real-world data containing names in Latin and Arabic scripts. In this way, customers who have databases of names written in Arabic can conduct due diligence against various sanctions lists with high accuracy.
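
To see why the transliterate-then-match route loses accuracy, consider a deliberately naive version of it. The Python sketch below is an illustration only and is not NetOwl's method: the letter table LETTER_MAP and the helpers naive_romanize and similarity are hypothetical, and the table covers only the letters in the example name. Because written Arabic normally omits short vowels, a letter-by-letter conversion cannot recover them, and the result no longer lines up cleanly with the Latin spelling on the list.

```python
from difflib import SequenceMatcher

# Hypothetical one-to-one letter table covering only the letters in the example name.
LETTER_MAP = {
    "م": "m", "ح": "h", "و": "w", "د": "d", "ا": "a",
    "ل": "l", "ز": "z", "ر": "r", "ق": "q", "ي": "y", " ": " ",
}


def naive_romanize(arabic: str) -> str:
    """Map each Arabic letter to one Latin letter; unknown characters are dropped."""
    return "".join(LETTER_MAP.get(ch, "") for ch in arabic)


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] from the standard library's difflib."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


romanized = naive_romanize("محمود الازرقي")
listed = "Mahmud al-Azraqi"

print(romanized)  # the short vowels of the listed spelling are missing
print(romanized == listed.lower())  # an exact comparison fails
print(round(similarity(romanized, listed), 2))  # only a partial match
```

Any error or arbitrary choice made in the conversion step (which vowel to insert, “q” or “k”, whether to keep the hyphen) carries straight through to the comparison, which is the extra layer of error that matching directly across scripts avoids.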