Got Data Quality Issues? Name Matching Can Fix Them for You

Identity Resolution, Name Matching, Record Management

In the era of Big Data, structured data is a cornerstone for many businesses and government agencies. Key activities such as strategic planning, budget allocation, investment decisions, product development, and marketing campaigns are all informed by structured data. Whether it is customer data, mailing lists, sales leads, patient records, vendor information, product catalogues, or other information about key entities of interest, structured data is one of the most valuable and critical assets in today’s digital world. In practice, however, structured data is often inconsistent, incomplete, or duplicated. It is therefore extremely important for businesses to curate their data to make it as high-quality and reliable as possible.

The Challenge of Data Curation

Curating millions or hundreds of millions of records is not a trivial task. Beyond exact duplicate records, data quality issues include misspellings, abbreviations, initials, missing or extra words and whitespace, transposed letters and digits, nicknames, and discrepancies in word order and punctuation. Data can also be domestic or international, in English or in other languages and scripts, which introduces differences between records caused by transliteration variations. Records can also contain many different types of key attributes, including names of people, organizations, and places, as well as dates, addresses, and a variety of numeric values. Modern curation techniques have to be robust to these variations, not only individually but in combination, and must scale to meet today’s Big Data-sized challenges.
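To make these variations concrete, here is a minimal sketch of fuzzy name comparison. It is not NetOwl's algorithm: it uses Python's standard-library difflib as a simple stand-in, and the function names normalize and similarity are hypothetical. It handles only a few of the issues listed above (case, accents, punctuation, extra whitespace, word order, and small spelling differences).

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace, sort tokens."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = re.sub(r"[^\w\s]", " ", no_accents.lower())
    # Sorting tokens makes "Smith, John" and "John Smith" compare as equal.
    return " ".join(sorted(cleaned.split()))


def similarity(a: str, b: str) -> float:
    """Return a score in [0, 1]; higher means the normalized names are more alike."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


if __name__ == "__main__":
    pairs = [
        ("Smith, John", "john  smith"),   # word order, case, whitespace
        ("Jon Smith", "John Smith"),      # spelling variation
        ("Jon Smith", "Maria Garcia"),    # unrelated names
    ]
    for a, b in pairs:
        print(f"{a!r} vs {b!r}: {similarity(a, b):.2f}")
```

A plain character-level ratio like this does not cover nicknames, initials, or transliteration variants, which is why purpose-built matching algorithms are needed at scale.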

Whether it is to support data analysis, validation, cleanup, updating, deduplication, consolidation of disparate databases, standardization, verification, or suppression of unwanted records, data curation requires sophisticated fuzzy matching algorithms that provide accurate, fast, and robust data matching.

Why Use NetOwl for Name Matching

NetOwl’s Name Matching and Identity Resolution products provide an effective, robust, scalable, and high-accuracy solution for data cleansing. They rely not only on fuzzy matching of entity names but also on other key entity attributes such as date of birth, place of birth, address, and nationality. They use a proprietary search and indexing engine that combines evidence from multiple matching attributes. They also support application-specific business rules that determine which combination of record attributes should be matched and how important each attribute is to the overall matching process.
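The idea of combining evidence from several attributes, with rules that set how much each one counts, can be illustrated with a weighted average. This is a sketch under stated assumptions, not NetOwl's engine: the weights, field names, and the functions field_score and record_score are all hypothetical, and difflib stands in for type-specific matchers.

```python
from difflib import SequenceMatcher

# Hypothetical business rule: how much each attribute contributes to the match.
WEIGHTS = {"name": 0.6, "date_of_birth": 0.25, "nationality": 0.15}


def field_score(a, b):
    """Similarity of two field values in [0, 1], or None if either is missing."""
    if not a or not b:
        return None  # a missing value provides no evidence either way
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def record_score(r1: dict, r2: dict, weights: dict = WEIGHTS) -> float:
    """Weighted average of per-field scores over the fields present in both records."""
    total = 0.0
    used = 0.0
    for field, weight in weights.items():
        score = field_score(r1.get(field), r2.get(field))
        if score is None:
            continue
        total += weight * score
        used += weight
    return total / used if used else 0.0


if __name__ == "__main__":
    a = {"name": "John Smith", "date_of_birth": "1980-02-14", "nationality": "US"}
    b = {"name": "Jon Smith", "date_of_birth": "1980-02-14"}
    c = {"name": "Jon Smith", "date_of_birth": "1975-11-30", "nationality": "CA"}
    print(f"a vs b: {record_score(a, b):.2f}")
    print(f"a vs c: {record_score(a, c):.2f}")
```

Renormalizing by the weights actually used keeps a record with a missing attribute from being penalized as if that attribute had mismatched. That is one possible design choice among several.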

NetOwl is designed for Big Data. It supports scalable, real-time searching of databases with hundreds of millions of records.

NetOwl’s Name Matching and Identity Resolution use NetOwl’s award-winning, machine learning-based, multicultural, cross-lingual name matching product to enable sophisticated matching of various entity types, even across different languages. NetOwl comes with built-in name and key attribute parsing capabilities that allow you to work seamlessly with either “pre-parsed” field data (e.g., “firstname”, “middlename”, “lastname”, “suffix”) or “single string” values, providing optimal matching regardless of how the data is represented, both in the original data and in any search/matching interface.
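The difference between “single string” and “pre-parsed” representations can be shown with a deliberately simple parser. This is only a sketch: the parse_name function and its suffix list are hypothetical, it assumes Western-style given-name-first ordering (or “Last, First”), and it does not reflect how NetOwl parses names.

```python
# Hypothetical, deliberately small list of suffixes for illustration.
SUFFIXES = {"jr", "sr", "ii", "iii", "iv"}


def is_suffix(token: str) -> bool:
    return token.lower().rstrip(".") in SUFFIXES


def parse_name(full: str) -> dict:
    """Split a single-string name into firstname/middlename/lastname/suffix."""
    full = full.strip()
    suffix = ""
    if "," in full:
        head, tail = [part.strip() for part in full.split(",", 1)]
        if is_suffix(tail):
            full, suffix = head, tail      # "John Smith, Jr."
        else:
            full = f"{tail} {head}"        # "Smith, John A." becomes "John A. Smith"
    tokens = full.split()
    if not suffix and len(tokens) > 1 and is_suffix(tokens[-1]):
        suffix = tokens.pop()              # "John Smith Jr."
    return {
        "firstname": tokens[0] if tokens else "",
        "middlename": " ".join(tokens[1:-1]),
        "lastname": tokens[-1] if len(tokens) > 1 else "",
        "suffix": suffix,
    }


if __name__ == "__main__":
    for raw in ["Smith, John A.", "John A. Smith", "John Smith, Jr."]:
        print(raw, "->", parse_name(raw))
```

Once both representations are reduced to the same fields, a record stored as one string can be compared field by field against a record stored in separate columns.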

No matter what type of entity data you have, being able to match on other types of data associated with the record can be crucial for data cleansing. Records focused on person entities may contain not only a “name” field but also other information such as “spouse” (person-name), “employer” (organization-name), “home address” (address), “phone number” (phone), “date of birth” (date), and “place of birth” (place). Similarly, records focused on “companies/organizations” may have key attributes to match on, including “executives” (person-name), “subsidiaries” (organization-name), and “locations” (place and/or address), among others. NetOwl contains type-specific matching algorithms to optimally handle all of the different types of data associated with your records.

NetOwl’s Name Matching and Identity Resolution support matching of data both within and across multiple languages. Names written in any of the following can be matched: Latin-based alphabets, Cyrillic alphabets, Arabic and Persian scripts, and Chinese characters (both “simplified” and “traditional”). NetOwl’s cross-lingual name matching uses a combination of transliteration-based matching algorithms and more direct script-to-script matching algorithms for optimal matching.
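As a rough illustration of the transliteration-based half of that approach, the sketch below maps Russian Cyrillic letters to Latin and then compares the results. Everything here is an assumption made for illustration: the mapping is one simplified romanization among many, the function names are hypothetical, and difflib stands in for a real matcher. It does not represent NetOwl's algorithms, and it does not attempt direct script-to-script matching.

```python
from difflib import SequenceMatcher

# Simplified Russian-to-Latin mapping (one of many possible romanizations).
CYRILLIC_TO_LATIN = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "yo",
    "ж": "zh", "з": "z", "и": "i", "й": "y", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "kh", "ц": "ts", "ч": "ch", "ш": "sh", "щ": "shch",
    "ъ": "", "ы": "y", "ь": "", "э": "e", "ю": "yu", "я": "ya",
}


def transliterate(text: str) -> str:
    """Lowercase the text and replace Cyrillic letters; other characters pass through."""
    return "".join(CYRILLIC_TO_LATIN.get(ch, ch) for ch in text.lower())


def cross_script_similarity(a: str, b: str) -> float:
    """Compare two names after mapping both into lowercase Latin script."""
    return SequenceMatcher(None, transliterate(a), transliterate(b)).ratio()


if __name__ == "__main__":
    print(transliterate("Владимир Путин"))
    # Different romanizations of the same name still score high, but below 1.0.
    print(f"{cross_script_similarity('Сергей', 'Sergei'):.2f}")
```

Because the same name can be romanized in several ways, transliteration alone leaves residual differences, which is why it is paired with fuzzy comparison here.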

NetOwl’s Name Matching and Identity Resolution products are available on premises to safeguard data privacy and offer a REST API for easy integration.

In summary, NetOwl’s Name Matching and Identity Resolution products are well suited to help organizations curate and manage their structured data accurately and in real time.