Entity Extraction Helps Connect the Dots

April 04, 2019 | Entity Extraction, Homeland Security, Intelligence Analysis

Law enforcement and intelligence agencies receive high volumes of unstructured data containing information that is critical to connecting the dots. For law enforcement, this data comes both from internal sources such as crime reports, case narratives, and interviews, and from the general public through tip lines. Online news and social media posts may also contain information critical to a case. For intelligence agencies, unstructured data comes both from electronic monitoring and from open-source data, where online news and social media are again of critical importance.

Manually reviewing unstructured data takes precious time and resources that are in short supply in law enforcement and intelligence agencies. Imagine trying to read all of it during an emergency, when actionable information has to be produced quickly. Left unprocessed, unstructured data makes it very difficult to find the connections that can uncover a criminal gang or a terrorist plot.

Entity Extraction for Connecting the Dots

Advanced text analytics technologies like Entity Extraction can expedite the processing and review of unstructured data and free up agents and analysts to do more complex analytical work. They can also connect the dots by exploiting critical information from seemingly unrelated data sets.

The most basic level of Entity Extraction is the extraction of elements such as names, phone numbers, vehicle descriptions, license plates, descriptions of people, place names, addresses, and many others. Once these entities have been identified and associated with the original documents as metadata, they become available for advanced semantic search in addition to the usual keywords.
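As a rough illustration of this basic level, the sketch below pulls two kinds of entities out of a tip-line narrative and attaches them to the document as metadata. It is a minimal, rule-based stand-in, not how a production extractor works: the phone pattern covers only one US-style format, and `PLACE_GAZETTEER`, `extract_entities`, and `index_document` are hypothetical names invented for this example.

```python
import re

# Assumption: only US-style numbers such as 555-867-5309 or 555.867.5309.
PHONE_RE = re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b")

# Hypothetical gazetteer; a real system would use a far larger resource.
PLACE_GAZETTEER = {"Springfield", "Riverton"}


def extract_entities(text):
    """Return the entities found in text, grouped by entity type."""
    phones = PHONE_RE.findall(text)
    places = [
        place
        for place in sorted(PLACE_GAZETTEER)
        if re.search(rf"\b{re.escape(place)}\b", text)
    ]
    return {"phone_number": phones, "place_name": places}


def index_document(doc_id, text):
    """Attach extracted entities to the document as searchable metadata."""
    return {"doc_id": doc_id, "text": text, "entities": extract_entities(text)}


if __name__ == "__main__":
    tip = "Caller at 555-867-5309 reported a white van near Springfield."
    print(index_document("tip-001", tip)["entities"])
```

With the entities stored next to the document, a search for a phone number or a place can match on the metadata field rather than on raw keywords.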

More recently, researchers and product vendors have developed more advanced forms of Entity Extraction, including Relationship Extraction and Event Extraction. The former identifies a broad range of relationships (e.g., a person’s associates, a person’s employment with an organization). The latter extracts significant events (e.g., attacks, travel, financial transactions) and also identifies the participants in them, as well as the place and time of the event if the document specifies them. Relationship and Event Extraction enable a far more advanced analysis that goes beyond the simplistic links afforded by co-occurrence of the entities in the same sentence, paragraph, or document. (Despite the introduction of these more advanced techniques involving relationships and events, the usual term for the technology is still Entity Extraction.)

Relationship and Event Extraction allow tools like link analysis, which are themselves critical to connecting the dots, to be populated automatically with relationships and events extracted from unstructured data. Without Entity Extraction, humans would have to read the unstructured data and populate the tools manually, which is a time-consuming process.
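To make the idea of automatic population concrete, here is a minimal sketch that loads extracted relationships into a simple link graph and asks which entities two people have in common. The triple format `(subject, relation, object)`, the function names, and the sample names are all assumptions made for illustration; real link analysis tools have their own import formats.

```python
from collections import defaultdict


def build_link_graph(relations):
    """Build an undirected link graph from (subject, relation, object) triples."""
    graph = defaultdict(set)
    for subject, relation, obj in relations:
        graph[subject].add((relation, obj))
        graph[obj].add((relation, subject))
    return graph


def shared_links(graph, a, b):
    """Return the entities directly linked to both a and b."""
    neighbors_a = {node for _, node in graph.get(a, set())}
    neighbors_b = {node for _, node in graph.get(b, set())}
    return neighbors_a & neighbors_b


if __name__ == "__main__":
    # Hypothetical output of Relationship Extraction over two documents.
    extracted = [
        ("J. Doe", "employed_by", "XYZ"),
        ("R. Roe", "employed_by", "XYZ"),
        ("R. Roe", "associate_of", "A. Poe"),
    ]
    graph = build_link_graph(extracted)
    print(shared_links(graph, "J. Doe", "R. Roe"))
```

Here the two people never appear in the same document, yet the graph links them through their common employer.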

Entity Extraction can also support name normalization. A limitation of Entity Extraction is that extraction proceeds one document at a time, so it doesn’t know that a name in Document A is resolvable to the same or a similar name in Document B. Name normalization allows names to be more easily resolved and aggregated across documents, which supports semantic search, faceted search, and advanced analysis (e.g., timelines, charts). For example, name normalization could remove all corporate designators from the corporate entities extracted, so that occurrences of “XYZ” and “XYZ Inc.” would both be normalized to “XYZ.” This makes aggregation of the data across documents much more straightforward. Relationships and events involving “XYZ” would be grouped together and thus produce a much richer picture of the organization.
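The corporate-designator example can be sketched in a few lines. This is a simplified illustration under stated assumptions: the `DESIGNATORS` list is a small hypothetical sample, and `normalize_org` and `aggregate_mentions` are names made up for this example rather than part of any product.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately short list of corporate designators.
DESIGNATORS = ("inc", "corp", "llc", "ltd", "co")

# Matches a trailing designator, with optional comma and period: ", Inc."
_DESIGNATOR_RE = re.compile(
    r"[,\s]+(?:" + "|".join(DESIGNATORS) + r")\.?$", re.IGNORECASE
)


def normalize_org(name):
    """Strip a trailing corporate designator from an organization name."""
    return _DESIGNATOR_RE.sub("", name.strip()).strip()


def aggregate_mentions(mentions):
    """Group (doc_id, name) mentions by normalized organization name."""
    grouped = defaultdict(set)
    for doc_id, name in mentions:
        grouped[normalize_org(name)].add(doc_id)
    return {name: sorted(doc_ids) for name, doc_ids in grouped.items()}


if __name__ == "__main__":
    mentions = [("doc-A", "XYZ"), ("doc-B", "XYZ Inc."), ("doc-C", "XYZ, LLC")]
    print(aggregate_mentions(mentions))
```

After normalization, all three mentions fall under the single key "XYZ", so relationships and events from the three documents can be viewed together.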

Another important feature of Entity Extraction is geotagging. Place names in particular can refer to different physical locations (e.g., “Springfield” occurs in many U.S. states). Geotagging disambiguates place names by applying machine learning algorithms to the surrounding unstructured context. In addition, geotagging works not only for place names but also for other kinds of entities (e.g., where a person has travelled, where an organization has its headquarters). This enables, for example, an intelligence analyst to track a target’s movements when that information is scattered across multiple unstructured documents. Entity Extraction can also output a confidence score for the geotagged locations. An added bonus is that it is technically feasible to geotag relative location phrases (e.g., “30 miles northwest of Springfield”).
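One way to picture the geotagging of a relative location phrase is as two steps: parse the phrase into a distance, a compass direction, and an anchor place, then offset the anchor's coordinates. The sketch below uses the standard great-circle destination formula on a spherical Earth. It assumes the anchor has already been disambiguated to a latitude and longitude; the coordinates in the usage example are placeholders, and the function names and the phrase pattern (miles and eight compass directions only) are hypothetical.

```python
import math
import re

EARTH_RADIUS_MILES = 3958.8  # mean radius, spherical approximation

BEARINGS = {
    "north": 0.0, "northeast": 45.0, "east": 90.0, "southeast": 135.0,
    "south": 180.0, "southwest": 225.0, "west": 270.0, "northwest": 315.0,
}

# Assumption: phrases shaped like "<number> miles <direction> of <Place Name>".
_PHRASE_RE = re.compile(
    r"(\d+(?:\.\d+)?)\s+miles?\s+"
    r"(northeast|northwest|southeast|southwest|north|south|east|west)"
    r"\s+of\s+([A-Z][a-z]+(?: [A-Z][a-z]+)*)"
)


def parse_relative_location(text):
    """Return (distance_miles, direction, anchor) or None if no phrase found."""
    match = _PHRASE_RE.search(text)
    if match is None:
        return None
    return float(match.group(1)), match.group(2), match.group(3)


def destination_point(lat, lon, distance_miles, bearing_deg):
    """Offset a point by a distance and bearing along a great circle."""
    lat1, lon1 = math.radians(lat), math.radians(lon)
    theta = math.radians(bearing_deg)
    delta = distance_miles / EARTH_RADIUS_MILES  # angular distance
    lat2 = math.asin(
        math.sin(lat1) * math.cos(delta)
        + math.cos(lat1) * math.sin(delta) * math.cos(theta)
    )
    lon2 = lon1 + math.atan2(
        math.sin(theta) * math.sin(delta) * math.cos(lat1),
        math.cos(delta) - math.sin(lat1) * math.sin(lat2),
    )
    return math.degrees(lat2), math.degrees(lon2)


if __name__ == "__main__":
    distance, direction, anchor = parse_relative_location(
        "The vehicle was last seen 30 miles northwest of Springfield."
    )
    # Placeholder coordinates standing in for the disambiguated anchor.
    anchor_lat, anchor_lon = 39.80, -89.64
    print(anchor, destination_point(anchor_lat, anchor_lon, distance, BEARINGS[direction]))
```

The hard part in practice is the step this sketch skips: deciding which "Springfield" the document means, which is where the contextual disambiguation and confidence score described above come in.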

In summary, Entity Extraction is an attractive choice for both law enforcement and intelligence to turn unstructured data into actionable insights that will connect the dots.