Entity Extraction Makes Unknowns Known

Searching for the Unknown

People in many professions are often looking for unknowns:

  • To meet their compliance obligations, financial institutions need to investigate whether a potential customer has any past criminal or otherwise questionable involvement.
  • Institutions that make investments, particularly on a worldwide basis, need to have a good understanding of the overall health and stability of an investment target.
  • Law enforcement officers may have a physical description of a suspect, but no name or other identifying data.
  • Intelligence officers are in a similar predicament: they may know the identity of one person in a terrorist network, but not any others.

Unfortunately, much of the information about an unknown sits in unstructured text. Across the private sector and government agencies alike, the volume of text, particularly text generated by social media, has grown well beyond what humans can process. Unstructured data is estimated to make up 80% of all data, and it continues to grow rapidly. It is also arriving faster, which increases the need to process it in real time.

Conventional Search Technology Isn’t Enough

Search technology, particularly traditional keyword search, is excellent in other respects but frequently does not work well when the key piece of data is unknown. You can enter the name of a person or organization in Google or Bing to find them, but without the name, you’re stuck.

Knowledge workers in all sectors would be greatly helped if they could pose more abstract queries to get the information they need. Here are two examples, and there are countless others:

  • Financial analysts at investment banks need to be able to pose queries like “Show me all adverse events that ABC Corporation, including its executives, has been involved in.”
  • Intelligence and law enforcement officers need a more advanced form of search that allows them to make abstract queries such as “Show me all persons associated with MS13.”

Fortunately, there is a technology that enables such queries: Entity Extraction.

How Entity Extraction Finds Unknowns

Entity Extraction is a technology developed over the last couple of decades that finds key entities, relationships, and events in unstructured text and turns them into structured data. Entity Extraction’s first achievement was in recognizing proper names, and then it extended its capability to extract more complex concepts such as relationships and events.

Here are some sample concepts that Entity Extraction typically extracts:

  • Named Entities, including the following:
    • People
    • Organizations
    • Places
    • Dates/Times
    • Numerics (such as phone numbers, money amounts, etc.)
  • Attributes of Entities:
    • For Person
      • Title
      • Age
      • Place of birth
      • Date of birth
      • Nationality
    • For Organization
      • Headquarters location
      • Date of founding
  • Relationships that exist between entities:
    • A person may be associated with a company as an employee.
    • A company may be associated with another company as its subsidiary.
  • Events involving entities:
    • Company merger and acquisition
    • Company or person indicted for a crime
    • Personnel changes in an organization
    • etc.

Extracting an event is the most complex form of extraction. An event typically involves one or more entities as participants and can be assigned the date and the location where it happened (if these are mentioned in the unstructured text). The set of extractable events is based upon a defined ontology.
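To make these concepts concrete, here is a minimal sketch of how entities, relationships, and events might be represented as structured data. The class names and field names (Entity, Relationship, Event, participants, and so on) are assumptions chosen for illustration, not the schema of any particular Entity Extraction product.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Entity:
    """A named entity plus any attributes found in the text (illustrative)."""
    name: str
    kind: str  # e.g. "Person", "Organization", "Place"
    attributes: tuple = ()  # (key, value) pairs such as ("Title", "CEO")


@dataclass(frozen=True)
class Relationship:
    """A typed link between two entities."""
    source: Entity
    relation: str  # e.g. "employee_of", "subsidiary_of"
    target: Entity


@dataclass
class Event:
    """An event with role-labelled participants; date and location are
    optional because the text may not mention them."""
    event_type: str
    participants: dict = field(default_factory=dict)  # role -> Entity
    date: Optional[str] = None
    location: Optional[str] = None


# Placeholder companies, used only to show the shape of the data.
xyz = Entity("XYZ Corp.", "Organization")
abc = Entity("ABC Corp.", "Organization")

acquisition = Event(
    event_type="Corporate Acquisition",
    participants={"Acquiring Company": xyz, "Acquired Company": abc},
    date="June",
)
```

Keeping the participant roles as dictionary keys means the roles can vary by event type, which is one simple way to reflect an ontology that defines different roles for different events.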

For example, here’s an unstructured description of a corporate acquisition: “XYZ Corp. announced that it will acquire ABC Corp. in June. The $2 billion acquisition brings further consolidation to a struggling sector.”

The output of Entity Extraction for these two sentences would be:

Event-Type: Corporate Acquisition

Acquiring Company: XYZ Corp.

Acquired Company: ABC Corp.

Value: $2 billion

Date: June

In effect, Entity Extraction has produced what is equivalent to a database record where the field labels (“Acquiring Company,” etc.) are predictable and can be searched by conventional database queries. Analysts can search for any aspect of what’s been extracted:

  • “Show me all acquisitions where the Acquiring Company was XYZ Corp. in the last two years.”
  • “Show me all acquisitions with a value of $2 billion and up which occurred in the last month.”
  • etc.
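As a rough illustration of the idea, the sketch below turns the two example sentences into a record and then runs two simple queries against it. The regular expressions are a deliberately naive stand-in: production Entity Extraction relies on linguistic and statistical analysis, not a single hand-written pattern. The function names and the choice to store the value as a plain integer are assumptions made for this example.

```python
import re

TEXT = (
    "XYZ Corp. announced that it will acquire ABC Corp. in June. "
    "The $2 billion acquisition brings further consolidation to a struggling sector."
)

# Toy patterns that only cover the phrasing of this one example.
COMPANY = r"[A-Z][A-Za-z]*(?: [A-Z][A-Za-z]*)* Corp\."
ACQUIRE = re.compile(
    rf"(?P<acquirer>{COMPANY}) announced that it will acquire "
    rf"(?P<target>{COMPANY}) in (?P<date>[A-Z][a-z]+)"
)
VALUE = re.compile(r"\$(?P<amount>\d+(?:\.\d+)?) (?P<unit>million|billion)")
UNITS = {"million": 1_000_000, "billion": 1_000_000_000}


def extract_acquisition(text):
    """Return a record for the first acquisition found, or None."""
    match = ACQUIRE.search(text)
    if match is None:
        return None
    record = {
        "Event-Type": "Corporate Acquisition",
        "Acquiring Company": match.group("acquirer"),
        "Acquired Company": match.group("target"),
        "Value": None,
        "Date": match.group("date"),
    }
    value = VALUE.search(text)
    if value:
        # Normalise "$2 billion" to an integer so it can be compared.
        amount = float(value.group("amount")) * UNITS[value.group("unit")]
        record["Value"] = int(amount)
    return record


def acquisitions_by(records, acquirer):
    """All acquisitions where the Acquiring Company matches."""
    return [r for r in records if r["Acquiring Company"] == acquirer]


def acquisitions_at_least(records, minimum):
    """All acquisitions whose value is known and at least `minimum`."""
    return [r for r in records if r["Value"] is not None and r["Value"] >= minimum]


record = extract_acquisition(TEXT)
records = [record]
print(acquisitions_by(records, "XYZ Corp."))
print(acquisitions_at_least(records, 2_000_000_000))
```

Once the text has been reduced to records like this, filtering by date range works the same way, provided the extracted date has been normalised to a full calendar date.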

What Unknown Information Does Entity Extraction Find?

Here are a few more examples of the kinds of unknowns that Entity Extraction can find:

  • A law enforcement agency may already know that a particular individual is affiliated with a known gang, but Entity Extraction can identify previously unknown individuals who have the same affiliation. In this way an entire criminal network can be mapped automatically.
  • Similarly, in a commercial application, Entity Extraction can identify the C-level executives of a company’s competitors from unstructured sources such as news, particularly those leaving or joining, in near real time. In this way companies can maintain good intelligence on their competitors in a dynamic, rapidly changing environment.
  • Data analysts can see which financial institutions have been fined for violating SEC regulations.

The structured data resulting from extraction can now also become input to link analysis, geospatial analysis, and data visualization tools, which have always required structured data as input and couldn’t deal with unstructured data. Entity Extraction provides the bridge.
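To show how extracted relationships can feed link analysis, here is a small sketch that builds a graph from relationship records and walks it outward from a known organization. The records, the placeholder names ("Person A", "Gang X"), the relation labels, and the function names are all invented for illustration; a real deployment would load extractor output and would likely use a dedicated link analysis tool.

```python
from collections import defaultdict, deque

# Hypothetical extracted relationships: (source, relation, target).
relationships = [
    ("Person A", "member_of", "Gang X"),
    ("Person B", "member_of", "Gang X"),
    ("Person C", "associate_of", "Person B"),
    ("Person D", "employee_of", "Company Y"),
]


def build_graph(triples):
    """Build an undirected adjacency map; relation labels are ignored here."""
    graph = defaultdict(set)
    for source, _relation, target in triples:
        graph[source].add(target)
        graph[target].add(source)
    return graph


def connected_to(graph, start, max_hops):
    """Breadth-first search: map each entity within max_hops links of
    `start` to its distance from `start`."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph[node]:
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    del distance[start]
    return distance


network = connected_to(build_graph(relationships), "Gang X", max_hops=2)
for name, hops in sorted(network.items()):
    print(f"{name}: {hops} link(s) from Gang X")
```

In this toy data, Person C never appears in the same record as Gang X, yet surfaces at two links away through Person B. That is the kind of previously unknown connection the structured output makes it possible to find.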

In sum, Entity Extraction is a critical tool for finding previously unknown information buried in mountains of unstructured text.