Entity Extraction for Knowledge Discovery

“Big Data” Has Become the Norm for Many Organizations

The volume of data flowing over the Internet is immense, and it arrives quickly. Organizations of all types have to deal with petabyte-scale amounts of structured data that come from sensors, clickstreams, and the like. Enormous amounts of unstructured data are also being produced on social media and other platforms. In addition, a very large part of the world's knowledge is now online, including an extensive body of scientific and technical knowledge, news and current events, and nearly every other subject area. New material is being added continuously. Hence the term “Big Data.”

Discovering useful knowledge within such large data sources has become a major challenge for organizations. One technology that can help is Entity Extraction (also known as Named Entity Extraction or Named Entity Recognition).

Unstructured Data Is a Key Source of Insight

Knowledge Discovery is the process of finding insights within data. For example, intelligence organizations need to piece together terrorist networks, and unstructured data represents a growing source of valuable intelligence. Common types of unstructured data, often in multiple languages, include the following:

  • Reports
  • Social media data
  • News
  • Chat
  • etc.

The prime goal of intelligence analysis is to find hidden links and associations: not only to locate known terrorists, but also to identify terrorists who were previously unknown and to understand the relationships among them.

Entity Extraction technology plays a big part in reaching this goal.

Unstructured Data Is Hard to Handle. That’s Where Entity Extraction Comes In.

Until recently, unstructured data was very hard for information technology to process. The reason is that natural language is highly ambiguous, imprecise, and vague. When reading a text, a human will realize that “General Electric” is highly unlikely to refer to a person, whereas “General Eisenhower” almost certainly does. Language is filled with these ambiguities. Another challenge of natural language is the multiplicity of ways in which the same concept can be expressed. The syntax and wording of two sentences may differ even though their meaning is essentially identical:

  • “The FBI arrested two armed men” vs. “Two armed men were apprehended by FBI agents”

How Entity Extraction Analyzes Text

Fortunately, Entity Extraction is a technology that can discover new knowledge in unstructured text. It analyzes the text and identifies the most important information it contains.

The knowledge that Entity Extraction identifies can be listed from basic to advanced, as follows:

  • Named Entities are the basic building blocks of all extraction. Entity Extraction typically identifies the following:
    • People
    • Organizations (such as companies, government organizations, etc.)
    • Places (such as countries, states, cities, etc.)
    • Dates/Times
    • Numerics (such as phone numbers, money amounts, etc.)
  • Attributes of Entities:
    • Title of a person
    • Age
    • Place of birth
    • Date of birth
    • Nationality
    • etc.

(For a company, the equivalents are headquarters location, date of founding, etc.)

  • Descriptive phrases linked to named entities that provide a wealth of information about them:
    • For people, these include phrases such as “President and CEO of Google,” used to describe the person “Sundar Pichai.”
    • For companies, examples include “the Chicago-based manufacturer” as a description of “Conagra Brands.”
  • Relationships that exist between entities:
    • A person may be associated with a company as an employee.
    • A company may be associated with another company as its subsidiary.
    • etc.
  • Events involving entities:
    • This is the most complex form of knowledge that extraction can discover in unstructured text. An event typically involves several entities and can be assigned the date and the location at which it happened (if these are mentioned in the unstructured text). In this way, Knowledge Discovery takes uninterpreted, unstructured data and transforms it into knowledge that can be used by other automated processes.
    • An example is provided by a sentence such as:
      • “Just Eat Takeaway acquired Grubhub on June 10, 2020 for $7.3 billion.”

Event Extraction identifies the nature of the event (Corporate Acquisition), the participants and their roles, the date of the event, and the acquisition price. All this data is output in a completely structured format with any linguistic ambiguities removed.
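
To make the idea of structured output concrete, here is a minimal Python sketch of what such a record might look like for the example sentence. The `AcquisitionEvent` fields and the `extract_acquisition` helper are hypothetical names chosen for illustration, not the output format of any particular product. The single regular expression is only a stand-in for the much richer linguistic analysis a real extraction engine performs, and it handles only sentences shaped exactly like this example.

```python
import re
from dataclasses import dataclass
from datetime import date, datetime
from decimal import Decimal
from typing import Optional


@dataclass
class AcquisitionEvent:
    """Hypothetical structured record for a corporate acquisition."""
    event_type: str
    acquirer: str      # participant in the "buyer" role
    acquired: str      # participant in the "target" role
    event_date: date
    price_usd: int


# Toy pattern: "<A> acquired <B> on <Month D, YYYY> for $<N> billion."
_PATTERN = re.compile(
    r"^(?P<acquirer>.+?) acquired (?P<acquired>.+?) "
    r"on (?P<date>[A-Z][a-z]+ \d{1,2}, \d{4}) "
    r"for \$(?P<amount>\d+(?:\.\d+)?) billion\.?$"
)


def extract_acquisition(sentence: str) -> Optional[AcquisitionEvent]:
    """Return a structured event, or None if the sentence does not fit the toy pattern."""
    match = _PATTERN.match(sentence.strip())
    if match is None:
        return None
    return AcquisitionEvent(
        event_type="Corporate Acquisition",
        acquirer=match["acquirer"],
        acquired=match["acquired"],
        # %B parses English month names under the default locale.
        event_date=datetime.strptime(match["date"], "%B %d, %Y").date(),
        # Decimal avoids floating-point rounding when scaling to dollars.
        price_usd=int(Decimal(match["amount"]) * 1_000_000_000),
    )


event = extract_acquisition(
    "Just Eat Takeaway acquired Grubhub on June 10, 2020 for $7.3 billion."
)
print(event)
```

Once the event is in this form, downstream code can filter on `event_date` or total `price_usd` across many events without re-reading the original sentences.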

How Entity Extraction Enables Knowledge Discovery

Entity Extraction is a powerful technology that supports many aspects of Knowledge Discovery. For example:

  • Perhaps the simplest application of Entity Extraction is to identify the most commonly occurring names in a body of unstructured data such as a Twitter feed, emails, or news feeds. When time-stamp information is available, a dashboard interface can be used to track the frequency of items over time. This is a valuable means for a user to monitor the popularity or importance of people or organizations over time.
  • Taking a further step, the more advanced form of extraction, Relationship Extraction, can be used to identify critical links in unstructured data. For example, a law enforcement agency may already know that an individual is affiliated with a known criminal organization, but Relationship Extraction can identify previously unknown individuals who have the same affiliation. In this way an entire criminal network can be established automatically.
  • Similarly, in a commercial application, Event Extraction can identify C-level executive movements (e.g., promotions and retirements) as well as any adverse events that companies or their executives are involved in (e.g., bribery, lawsuits). In this way companies can study their competitors or the risk profiles of potential clients or partners.
  • These structured links can also serve as input to link analysis and visualization tools, which have always required structured data as input and could not deal with unstructured data. Relationship and Event Extraction provide the bridge.
  • Event Extraction can also identify the location where an event has taken place. For example, it can identify all travel events in which an individual has participated, including the departure and arrival points. This enables the complete itinerary of individuals to be mapped, something of crucial importance in areas like law enforcement. This data can then be fed to GIS tools for visualization.
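
The first two applications in the list above can be sketched in a few lines of Python once extraction has already been done. The example below assumes hypothetical extractor output in the form of simple tuples. The names, dates, and the `member_of` relation label are invented placeholders, not real data or the format of any specific product. It counts how often each name appears per day, which is the kind of series a dashboard would plot, and it groups individuals who share an affiliation.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical entity output: (publication day of the document, entity name).
mentions = [
    (date(2024, 1, 1), "Acme Corp"),
    (date(2024, 1, 1), "Jane Doe"),
    (date(2024, 1, 1), "Acme Corp"),
    (date(2024, 1, 2), "Acme Corp"),
    (date(2024, 1, 2), "Jane Doe"),
    (date(2024, 1, 2), "Jane Doe"),
    (date(2024, 1, 2), "Jane Doe"),
]

# Hypothetical relationship output: (person, relation label, organization).
affiliations = [
    ("Jane Doe", "member_of", "Group X"),
    ("John Roe", "member_of", "Group X"),
    ("Ann Poe", "employee_of", "Acme Corp"),
]


def daily_counts(mentions):
    """Map each day to a Counter of how often each name was mentioned that day."""
    counts = defaultdict(Counter)
    for day, name in mentions:
        counts[day][name] += 1
    return dict(counts)


def members_by_org(affiliations, relation="member_of"):
    """Group people by organization for one relation type."""
    groups = defaultdict(set)
    for person, rel, org in affiliations:
        if rel == relation:
            groups[org].add(person)
    return dict(groups)


counts = daily_counts(mentions)
for day in sorted(counts):
    print(day, counts[day].most_common())

groups = members_by_org(affiliations)
for org, people in groups.items():
    print(org, sorted(people))
```

Starting from one known member of a group, looking up that group in the result lists everyone else the extractor linked to it, which is the basic step behind building out a network.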

For other applications of Entity Extraction, see our other blog posts on using Entity Extraction for Knowledge Discovery in support of financial intelligence and threat detection and tracking. These are just a few of the areas where Entity Extraction provides critical capabilities.