Name Matching for Screening at National Borders

Name matching for border security is a critical component of modern immigration control and law enforcement. As cross-border travel has become much more common, governments must balance efficiency in processing travelers with the need to identify potential threats.

At the center of this challenge lies the seemingly simple, but technically complex, task of determining whether a traveler’s name matches an entry in a persona non grata database. While the concept appears straightforward, name matching involves linguistic, cultural, and technological challenges that make it a sophisticated and evolving field.

It’s also a big task. Millions of travelers need to be screened at all entry points, including airports, seaports, and land border crossings. One of the major requirements is to screen all passengers against terrorist watchlists, immigration violation records, criminal databases, government lists of other people barred from entering the country, and international alerts such as Interpol notices. Name matching therefore needs to be fast and scalable.

Names Often Exhibit Variants

At its core, the goal of using name matching for border security is to flag individuals who may pose a risk while minimizing false positives that can delay or inconvenience legitimate travelers. One of the primary challenges is variation in spelling. Names can vary for a number of reasons, such as:

  • Simple misspellings:
    • Gillian vs. Gilian
  • Name variants that sound alike but are spelled differently:
    • Christy vs. Christie
  • Nicknames:
    • English: Richard vs. Dick vs. Rick vs. Dickie (the last perhaps marginal)
    • Russian: Ekaterina vs. Katya
    • Spanish: Dolores vs. Lola
  • Initials:
    • John Maxwell Smith vs. John M. Smith
  • Missing name elements:
    • Ali al-Bustani al-Walid vs. Ali Walid — Arabic names can have many components, such as the definite article al-, that are frequently omitted in English.
  • Transliteration variants: Arabic, for instance, is written in a different script from English. When a name is converted from Arabic letters to Latin ones, differences in spelling frequently arise because there is no universally agreed-upon transliteration standard:
    • Mahmood el-Barakat vs. Mahmud al-Barakat
    • Muhammad al-Haddad vs. Mohamed al-Haddad
  • Competing transliteration standards: Even when agreed-upon standards exist, there can be more than one. Chinese, for example, has two main standards, Pinyin and Wade-Giles, which differ considerably:
    • Chinese: Xi Jinping (Pinyin) vs. Hsi Chin-p’ing (Wade-Giles)
  • Use vs. non-use of diacritics: Spanish and many other European languages use diacritics in spelling. These are usually omitted in English:
    • Raúl Jiménez vs. Raul Jimenez
  • Partitioning differences:
    • Abdelrahman Qureishi vs. عبد الرحمن قريشي: Abdelrahman is a single given name and is written as one word in English, but it is written as two words in Arabic (عبد + الرحمن).
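
Two of the variant types above, diacritics and initials, can be handled with simple normalization before any fuzzy matching takes place. The following is a minimal sketch, not a production matcher. The function names (`fold_diacritics`, `normalize`, `same_name_modulo_initials`) are illustrative assumptions, and only Python's standard `unicodedata` module is used.

```python
import unicodedata
from typing import List


def fold_diacritics(name: str) -> str:
    """Remove combining marks, e.g. 'Raúl Jiménez' -> 'Raul Jimenez'."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalize(name: str) -> List[str]:
    """Fold diacritics, lowercase, drop periods, and split into tokens."""
    return fold_diacritics(name).casefold().replace(".", "").split()


def tokens_compatible(a: str, b: str) -> bool:
    """A one-letter token is treated as an initial of the other token."""
    if a == b:
        return True
    if len(a) == 1:
        return b.startswith(a)
    if len(b) == 1:
        return a.startswith(b)
    return False


def same_name_modulo_initials(name_a: str, name_b: str) -> bool:
    """Position-by-position comparison that tolerates initials and diacritics."""
    ta, tb = normalize(name_a), normalize(name_b)
    return len(ta) == len(tb) and all(
        tokens_compatible(x, y) for x, y in zip(ta, tb)
    )


if __name__ == "__main__":
    print(fold_diacritics("Raúl Jiménez"))
    print(same_name_modulo_initials("John Maxwell Smith", "John M. Smith"))
    print(same_name_modulo_initials("Gillian", "Gilian"))
```

Normalization of this kind covers only the mechanical cases. Misspellings such as Gillian vs. Gilian, nicknames, and transliteration variants are not resolved by it and still need a fuzzy or learned matcher.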

Names Follow Different Conventions Across Cultures

Another complication arises from cultural naming conventions. Different cultures follow different rules for structuring the whole name, including the following:

  • Order of given names and family names
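    • In many East Asian cultures (for example Chinese, Korean, and Vietnamese), the family name appears first. Japanese names also follow this order natively, but are often written in the Western order in English-language contexts. In Western cultures, the family name appears last.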
  • Omission of certain surnames:
    • Spanish names typically include both a paternal surname and a maternal surname as part of the legal name. In informal contexts, the maternal surname, which follows the paternal one, is often dropped: Raúl Espinosa Guzmán vs. Raúl Espinosa.

These differences can lead to confusion when names are recorded or compared across systems that assume a name format (e.g. First Name – Middle Name – Surname) that may not be appropriate for some cultures. Effective name matching systems must account for these variations.
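
One simple way to avoid assuming a fixed First Name, Middle Name, Surname layout is to compare names as unordered sets of tokens, and to treat a name whose tokens are all contained in another as a candidate match. The sketch below illustrates the idea under that assumption. The helper names are hypothetical, and a real system would score such candidates rather than accept them outright.

```python
def token_set(name: str) -> frozenset:
    """Lowercase the name and return its whitespace-separated tokens as a set."""
    return frozenset(name.casefold().split())


def order_insensitive_match(name_a: str, name_b: str) -> bool:
    """True when both names contain the same tokens, in any order."""
    return token_set(name_a) == token_set(name_b)


def subset_match(shorter: str, fuller: str) -> bool:
    """True when every token of `shorter` also appears in `fuller`.

    This covers a dropped second surname, at the cost of more candidates.
    """
    short_tokens = token_set(shorter)
    return bool(short_tokens) and short_tokens <= token_set(fuller)


if __name__ == "__main__":
    # Family-name-first vs. family-name-last ordering of the same name.
    print(order_insensitive_match("Xi Jinping", "Jinping Xi"))
    # Maternal surname dropped in an informal context.
    print(subset_match("Raúl Espinosa", "Raúl Espinosa Guzmán"))
```

Ignoring order and allowing subsets is deliberately permissive. It raises recall but also produces more false positives, which is why it is best treated as a candidate-generation step rather than a final decision.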

In addition to unintentional discrepancies, border security agencies must also contend with deliberate attempts to evade detection. Individuals seeking to bypass watchlists may use aliases, fake documents, or slight variations of their real names. This tactic, sometimes referred to as “identity obfuscation,” exploits the limitations of name-based matching systems. To counter it, additional fields such as date of birth and address may need to be taken into account.

Machine Learning to the Rescue

We discussed traditional approaches to name matching in our blog What Is Name Matching?. The latest approach, which achieves very high accuracy by minimizing both false positives and false negatives, uses state-of-the-art machine learning algorithms. It also draws on a very large collection of real-world name variant data containing names from multiple ethnicities.

This approach automatically learns a very large collection of intelligent, probabilistic name matching rules from the name data. Because the rules are learned from real data rather than written by hand, as in the rule-based approach, they are not bound by the limits of human knowledge. Instead, they reflect the countless name variants that occur in the real world.

It is also able to recognize the ethnicity of a name and to construct specific matching models for those names that handle the specific name phenomena of that ethnicity. For example, Arabic ethnicity-specific models can match Khalif and Qaaliif. This ethnicity-specific matching significantly improves accuracy.

Additionally, the machine learning approach handles cross-script name matching effectively. Traditional approaches such as Soundex and edit distance require foreign scripts to be transliterated to a Latin representation before matching is done. This can introduce a large number of transliteration errors into the matching, negatively affecting accuracy. The machine learning approach matches directly between the two scripts, avoiding the need for a Latin transliteration of the non-Latin script.
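
For readers unfamiliar with the two traditional techniques named above, here is a minimal sketch of each: American Soundex and Levenshtein edit distance. It is written from the standard textbook definitions and is not taken from any particular product. Note that this Soundex only considers ASCII letters, which is why non-Latin input would have to be transliterated first.

```python
# Soundex digit groups: letters in the same group sound alike.
_CODES = {}
for _digit, _letters in (("1", "bfpv"), ("2", "cgjkqsxz"), ("3", "dt"),
                         ("4", "l"), ("5", "mn"), ("6", "r")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name: str) -> str:
    """American Soundex: first letter plus three digits, zero-padded."""
    letters = [ch for ch in name.lower() if ch.isascii() and ch.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = _CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code after it is kept
    return (result + "000")[:4]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev_row = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        row = [i]
        for j, cb in enumerate(b, 1):
            row.append(min(prev_row[j] + 1,                # deletion
                           row[j - 1] + 1,                 # insertion
                           prev_row[j - 1] + (ca != cb)))  # substitution
        prev_row = row
    return prev_row[-1]


if __name__ == "__main__":
    print(soundex("Khalif"), soundex("Qaaliif"))
    print(edit_distance("khalif", "qaaliif"))
    print(edit_distance("Gillian", "Gilian"))
```

Because Soundex always keeps the first letter, Khalif and Qaaliif receive different codes (K410 and Q410) even though the remaining digits agree, and their edit distance of 3 is large for names this short. Both techniques handle a simple misspelling such as Gillian vs. Gilian well, but they have no knowledge of language-specific correspondences like K and Q in romanized Arabic.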

For more on cross-script matching, see our blog What is Cross-Language Name Matching?.

Summary

Countries need strong name matching to truly secure their borders. Many, however, are still stuck with an old “traditional” approach to name matching and have to accept high rates of false positives and false negatives. In the latter case, border officials are not even aware of what they are missing. When a border security system runs on such outdated technology, there is a great risk that it is both stopping bona fide travelers and missing bad guys.

In conclusion, name matching is a foundational yet complex element of border security. It involves more than simply comparing strings of text; it requires an understanding of diverse linguistic phenomena and cultural practices. By relying on machine learning algorithms, border security agencies can improve both the effectiveness and fairness of their systems. Ultimately, successful name matching supports the dual goals of protecting national security and facilitating the legitimate movement of people across borders.
