How Name Matching Makes It Easier to Find Your Ancestors

The Increasing Interest in Genealogy

Mapping one’s ancestry has become of great interest to many people. The genealogy industry has grown significantly in recent years and is now a multi-billion dollar market, projected to grow from $7.34 billion in 2025 to $16.16 billion by 2032. A number of global companies and non-profit organizations operate in this space, including Ancestry, MyHeritage, and FamilySearch.

Major drivers for this market growth are the rising global interest in ancestry and heritage and the increasing volume of digitized historical records such as:

- civil registrations (birth, marriage, death)
- census records
- immigration records
- church records (e.g., baptism)
- military records
- ship manifests

For instance, Ancestry currently hosts 60 billion historical records spanning more than 80 countries, making it one of the largest online genealogical databases in the world.

The Challenges of Searching Ancestral Data

Searching historical records can be challenging, though. Because the records span a great variety of data types, formats, and time periods, items such as names vary widely. Here are some of the reasons:

- Names that are missing one or more name components such as middle names: Francisco vs. Francisco Antonio
- Word order differences: John Robinson vs. Robinson, John
- The use of initials: Frederik John Henderson vs. Frederik J. Henderson
- The use of nicknames: Edward James vs. Ted/Teddy James
- Typos, which are more common with less familiar names and spellings
- Changes due to a name being originally in a foreign language: Müller vs. Muller, where this kind of “Americanization” was common among immigrants
- OCR errors from paper documents that may predate the advent of computers
- Records written in a different script in the country of origin: Mikhail Bulgakov vs. Михаил Булгаков
- Cultural naming conventions, such as:
  - Spanish surnames, which include a patronym and a matronym, may drop the latter: Raul Mendoza Juarez vs. Raul Mendoza
  - Arabic names containing the definite article “al” frequently leave it out: Said al-Farouq vs. Said Farouq
  - Asian names where the family name comes first traditionally, e.g., Park Soo-jin, unlike in Western cultures, where the family name comes last: Soo-jin Park
- etc.
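
Some of the simpler variations above can be reduced by normalizing names before comparing them. The following is a minimal Python sketch of that idea, not the trained matching models discussed later in this article. The function names and the tiny nickname table are assumptions made for illustration, and the sketch does not handle typos, OCR errors, other scripts, or family-name-first order.

```python
import unicodedata

# Hypothetical nickname table for illustration; a real one would be far larger.
NICKNAMES = {"ted": "edward", "teddy": "edward"}


def normalize_name(raw: str) -> list[str]:
    """Return lowercase, accent-free name tokens in given-name-first order."""
    # Reorder "Robinson, John" to "John Robinson".
    if "," in raw:
        family, _, given = raw.partition(",")
        raw = f"{given} {family}"
    # Decompose accented letters and drop the combining marks (Müller -> Muller).
    decomposed = unicodedata.normalize("NFKD", raw)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    tokens = stripped.lower().replace(".", " ").replace("-", " ").split()
    # Drop the Arabic definite article and expand known nicknames.
    return [NICKNAMES.get(tok, tok) for tok in tokens if tok != "al"]


def tokens_compatible(a: str, b: str) -> bool:
    """True if the tokens are equal or one is the initial of the other."""
    if a == b:
        return True
    short, long_ = sorted((a, b), key=len)
    return len(short) == 1 and long_.startswith(short)


def names_compatible(name_a: str, name_b: str) -> bool:
    """True if the shorter name's tokens appear, in order, in the longer name."""
    a, b = normalize_name(name_a), normalize_name(name_b)
    short, long_ = (a, b) if len(a) <= len(b) else (b, a)
    pos = 0
    for tok in short:
        # Skip tokens of the longer name (e.g., a middle name) until one fits.
        while pos < len(long_) and not tokens_compatible(tok, long_[pos]):
            pos += 1
        if pos == len(long_):
            return False
        pos += 1
    return True
```

With these definitions, `names_compatible("Frederik John Henderson", "Frederik J. Henderson")` and `names_compatible("Said al-Farouq", "Said Farouq")` both return `True`, while `names_compatible("John Robinson", "Jane Robinson")` returns `False`. Rule-based checks like this give only a yes/no answer, which is one reason scored fuzzy matching is preferable for real searches.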

In addition, to make an unambiguous determination of identity, other fields are frequently included in the search query, such as address, date of birth, or date of death. Unfortunately, these other fields can also exhibit extensive variations:

- DoB: 10/9/1896 (American style) vs. 9/10/1896 (European style)
- PoB: Winchester, Mass. vs. Winchester, Massachusetts vs. Winchester, MA
- Address: One Manchester Place, Liverpool, UK vs. 1 Manchester Pl., Liverpool (Note there are two variations here, plus the presence of a country name in one vs. its absence in the other)
- etc.
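
The date and place variations above can be illustrated with a short Python sketch. It treats a slash-separated date as ambiguous and keeps every valid reading, and it maps a few state abbreviations to one canonical form. The function names and the two-entry abbreviation table are assumptions for illustration only.

```python
from datetime import date

# Hypothetical abbreviation table; only the forms from the example above.
STATE_FORMS = {"mass": "massachusetts", "ma": "massachusetts"}


def possible_dates(text: str) -> set[date]:
    """Return every calendar date that 'A/B/YYYY' could denote."""
    first, second, year = (int(part) for part in text.split("/"))
    candidates = set()
    # Try month/day (American style) and day/month (European style).
    for month, day in ((first, second), (second, first)):
        try:
            candidates.add(date(year, month, day))
        except ValueError:
            pass  # e.g., month 22 does not exist, so that reading is dropped
    return candidates


def dates_may_match(a: str, b: str) -> bool:
    """True if the two date strings share at least one possible reading."""
    return bool(possible_dates(a) & possible_dates(b))


def normalize_place(text: str) -> tuple[str, ...]:
    """Lowercase each comma-separated part and expand known abbreviations."""
    parts = [part.strip().lower().rstrip(".") for part in text.split(",")]
    return tuple(STATE_FORMS.get(part, part) for part in parts)
```

Here `dates_may_match("10/9/1896", "9/10/1896")` returns `True`, and all three spellings of Winchester, Massachusetts normalize to `("winchester", "massachusetts")`. Keeping both readings of a date avoids guessing the convention used by a record whose origin is unknown.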

A person’s relatives may also be included to help the search, and the names of those relatives may vary too.

In view of the fast-growing data volumes and the variability in them, an automated solution is necessary to find better matches at scale.

Name Matching Improves Search

Luckily, an AI technology is available that eases the challenges of matching immensely: Name Matching.

Name Matching handles the above variations and many others. It provides the required fuzzy name matching using machine learning techniques, with a separately trained matching model for each entity type. Each model is specifically trained to match the variation likely to occur in its type (e.g., people, organizations, dates, places, addresses).

Each search query field (e.g., name, date of birth, place of birth) is matched separately against its counterpart in a candidate historical record, and a match-likelihood score is generated. These separate field scores are then combined into a single overall score for the entire record. Business logic rules can set thresholds for how high the overall score and each field score must be for the candidate historical record to be returned as a potential match.
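
The scoring flow described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the standard library's `difflib.SequenceMatcher` stands in for the trained per-field models, the weighted average is one possible way to combine field scores, and the weights, thresholds, and function names are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical weights and thresholds; real values would be set by business rules.
FIELD_WEIGHTS = {"name": 0.5, "dob": 0.3, "pob": 0.2}
MIN_FIELD_SCORE = 0.4
MIN_OVERALL_SCORE = 0.75


def field_score(query_value: str, record_value: str) -> float:
    """Stand-in similarity in [0, 1]; a trained model would replace this."""
    return SequenceMatcher(None, query_value.lower(), record_value.lower()).ratio()


def score_record(query: dict, record: dict) -> tuple[float, dict]:
    """Score each field separately, then combine into one weighted overall score."""
    scores = {f: field_score(query[f], record[f]) for f in FIELD_WEIGHTS}
    overall = sum(FIELD_WEIGHTS[f] * s for f, s in scores.items())
    return overall, scores


def rank_matches(query: dict, records: list[dict]) -> list[tuple[float, dict]]:
    """Keep records that pass both thresholds, best overall score first."""
    ranked = []
    for record in records:
        overall, scores = score_record(query, record)
        if overall >= MIN_OVERALL_SCORE and min(scores.values()) >= MIN_FIELD_SCORE:
            ranked.append((overall, record))
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return ranked
```

Each query and record is a dictionary with `name`, `dob`, and `pob` keys. Requiring a minimum score on every field as well as on the overall score prevents a record with one perfect field and one clearly wrong field from being returned.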

For example, here’s a user’s query searching for an ancestor:

Name:                       DoB:                  PoB:

Sean Andersen               10/9/1925             Brookline, MA

And here is a set of records that may be returned, ranked in decreasing order of likelihood of being a good match:

Name:                       DoB:                  PoB:

Sean M. Andersen            10/09/1925            Brookline, MA

S. Andersen                 11/2/1925             Brookline, MA

Shawn Andersson             10/10/1925            Brookline, Mass.

Name Matching also supports cross-lingual matching, where the names are written in different scripts such as Latin, Cyrillic, Korean, Japanese, and Chinese. Here’s a user query looking for someone who was born in Russia and emigrated to the U.S.:

Name:                       DoB:                   PoB:

Piotr Botvinnik             12/01/1932             Moscow, Russia

And the returned records might look like this:

Name:                       DoB:                   PoB:

Пётр Ботвинник              01/12/1932             Москва, Россия

Pyotr Botwinnik             12/01/1932             Moscow

Peter Botvinnik             11/22/1932             Moscow

Piotr Botvinnik             1/12/1932              Unknown

Russia follows the European date notation scheme of Day/Month/Year. Hence 01/12/1932 in the Russian-language data record is equivalent to 12/01/1932 in the English query. In the third data record returned, Peter is the English equivalent of Piotr. Immigrants will frequently translate their names, or parts of them, into the language of the country they immigrate to.

Summary

Name Matching effectively supports ancestry research over a very large number of data records from different places and times. It offers powerful AI matching algorithms tailored to the requirements of different types of names and is able to rank candidate matches based on all the data fields in a record.
