Finding the Needle in the Haystack: Entity Extraction and National Security

April 02, 2018 | Entity Extraction, Homeland Security, Intelligence Analysis


Intelligence agencies collect and analyze information to support national security and protect the country’s citizens, economy, and institutions from domestic and foreign threats of all kinds. In a changing world, those threats can include state and non-state actors such as terrorist organizations, hacker collectives, and drug cartels.

Intelligence agencies face the three Vs of Big Data: ever-growing volume, velocity, and variety. Intelligence analysts must sift through vast amounts of unstructured data to identify the critical information necessary to protect our homeland. Unstructured data grows exponentially, with 80% of all data said to be unstructured. Social media alone (e.g., Facebook’s 2 billion users) produces enormous amounts of data. The velocity of Big Data is increasing too, requiring data to be processed in real time. Last but not least is the vast variety of the data. Unstructured data comes in many forms (e.g., news, social media, web and dark-web content, message traffic, email, hard-copy documents) and in many different languages, for some of which there may be few skilled analysts.

The task has an additional layer of complexity. The primary goal of intelligence analysis is the discovery of information not previously known, for instance, the name of a new terrorist. Traditional keyword search techniques, which work well in other scenarios, are often not applicable in intelligence analysis because the names of the entities of interest may not yet be known. Instead, intelligence analysts need more advanced forms of search, such as semantic search, that allow them to ask, for instance, “show me documents containing terror organization members.”
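To make the contrast concrete, here is a minimal Python sketch of keyword search versus concept-based search. It is an illustration only, not NetOwl's API: the `Mention` record, the concept label `person:terror_org_member`, and the sample documents are all hypothetical, and the mentions are assumed to have been produced by some extraction step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """One extracted mention (hypothetical record layout)."""
    doc_id: str
    text: str
    concept: str  # hypothetical ontology label


# Made-up documents and the mentions an extractor is assumed to have produced.
DOCS = {
    "doc1": "Police detained a senior member of the group near the border.",
    "doc2": "John Doe attended a trade conference last week.",
    "doc3": "A newly recruited operative of the cell was seen at the port.",
}
MENTIONS = [
    Mention("doc1", "a senior member of the group", "person:terror_org_member"),
    Mention("doc2", "John Doe", "person:name"),
    Mention("doc3", "A newly recruited operative of the cell", "person:terror_org_member"),
]


def keyword_search(docs, keyword):
    """Return ids of documents whose text contains the literal keyword."""
    needle = keyword.lower()
    return sorted(doc_id for doc_id, text in docs.items() if needle in text.lower())


def concept_search(mentions, concept):
    """Return ids of documents with at least one mention of the given concept."""
    return sorted({m.doc_id for m in mentions if m.concept == concept})


by_keyword = keyword_search(DOCS, "John Doe")
by_concept = concept_search(MENTIONS, "person:terror_org_member")
```

The keyword query only finds a name the analyst already knows, while the concept query returns the two documents that describe unnamed members, which is the case keyword search cannot cover.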

How NetOwl’s Entity Extraction Helps Intelligence Analysis

NetOwl Extractor goes beyond the extraction of names to extract descriptive phrases, links, and events. Descriptive phrases are especially useful when names are not yet known. Links and events connect entities and help analysts gain a better understanding of criminal networks and bad actors. For instance, links and events connect people to other people (e.g., associates), organizations (e.g., a person’s affiliation), and places (e.g., places visited). Based on extensive work with Government analysts, NetOwl’s rich extraction ontology includes hundreds of semantic concepts and offers the broadest set of links and events available.
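As a rough illustration of how extracted links can be used to explore a network, the Python sketch below stores links as simple triples and looks up everything connected to an entity. The `Link` structure, the relation names, and the sample entities are assumptions made for this example and do not reflect NetOwl's actual output format or ontology.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """A hypothetical extracted link between two entities."""
    source: str
    relation: str
    target: str


# Made-up links; note that one entity is a descriptive phrase, not a name.
LINKS = [
    Link("Person A", "associate_of", "Person B"),
    Link("Person B", "affiliated_with", "Organization X"),
    Link("Person A", "visited", "City Y"),
    Link("an unnamed courier", "affiliated_with", "Organization X"),
]


def build_graph(links):
    """Index links by entity, in both directions, for neighbor lookups."""
    graph = defaultdict(set)
    for link in links:
        graph[link.source].add((link.relation, link.target))
        graph[link.target].add((link.relation, link.source))
    return graph


def neighbors(graph, entity, relation=None):
    """Return entities linked to `entity`, optionally filtered by relation."""
    return sorted(
        other
        for rel, other in graph.get(entity, set())
        if relation is None or rel == relation
    )


graph = build_graph(LINKS)
org_members = neighbors(graph, "Organization X", "affiliated_with")
person_a_links = neighbors(graph, "Person A")
```

In this sketch, the unnamed courier surfaces as an affiliate of Organization X even though no name is available, which mirrors the role descriptive phrases play when names are not yet known.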

Additionally, by geocoding location entities mentioned in text and then associating other entities such as people, groups, and artifacts with those locations through extracted links and/or events, NetOwl produces the rich geospatial structured output needed for GIS analysis.
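One way to picture this kind of geospatial output is as GeoJSON, a format many GIS tools can load. The Python sketch below is a simplified stand-in, not NetOwl's implementation: the tiny `GAZETTEER` lookup, its made-up coordinates, and the link triples are all hypothetical, and a real geocoder would also have to resolve ambiguous place names.

```python
import json

# Hypothetical gazetteer mapping place names to (longitude, latitude).
# The coordinates are invented for illustration.
GAZETTEER = {"City Y": (10.0, 50.0)}

# Hypothetical extracted links as (entity, relation, place) triples.
PLACE_LINKS = [
    ("Person A", "visited", "City Y"),
    ("Organization X", "based_in", "City Y"),
    ("Person B", "visited", "Unknown Town"),  # not in the gazetteer
]


def to_geojson(place_links, gazetteer):
    """Group linked entities under each geocoded place as GeoJSON features."""
    features = {}
    for entity, relation, place in place_links:
        if place not in gazetteer:
            continue  # skip places that could not be geocoded
        lon, lat = gazetteer[place]
        feature = features.setdefault(place, {
            "type": "Feature",
            # GeoJSON positions are ordered [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {"name": place, "linked_entities": []},
        })
        feature["properties"]["linked_entities"].append(
            {"entity": entity, "relation": relation}
        )
    return {"type": "FeatureCollection", "features": list(features.values())}


collection = to_geojson(PLACE_LINKS, GAZETTEER)
geojson_text = json.dumps(collection, indent=2)
```

The resulting `geojson_text` holds one point feature for City Y, carrying both linked entities as properties; the ungeocoded place is dropped rather than guessed.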

NetOwl’s Entity Extraction supports nine languages and offers a seamlessly integrated language ID capability: the language of the input text is automatically detected and the text is processed accordingly.

NetOwl has been engineered for high throughput and scalability. It can be deployed on any number of nodes of a public or private cloud, such as those offered by Amazon or Microsoft, and scales processing horizontally with the available computing power.

NetOwl turns unstructured data into structured data that is amenable to advanced analytics, enabling faceted search, profile generation, link analysis, geospatial analysis, and trend analysis among others. Moreover, by structuring unstructured content, NetOwl facilitates the fusion of knowledge from unstructured and structured data sources.
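As a small example of what structured output makes possible, the Python sketch below computes facet counts and applies a facet filter over extracted entity records. The record layout (`doc_id`, `type`, `value`) and the sample data are assumptions for illustration, not NetOwl's schema.

```python
from collections import Counter

# Hypothetical structured records, one per extracted entity mention.
RECORDS = [
    {"doc_id": "doc1", "type": "person", "value": "Person A"},
    {"doc_id": "doc1", "type": "organization", "value": "Organization X"},
    {"doc_id": "doc2", "type": "person", "value": "Person A"},
    {"doc_id": "doc2", "type": "place", "value": "City Y"},
    {"doc_id": "doc3", "type": "organization", "value": "Organization X"},
]


def facet_counts(records, field):
    """Count how many distinct documents carry each value of `field`."""
    seen = {(r["doc_id"], r[field]) for r in records}
    return Counter(value for _, value in seen)


def filter_docs(records, **criteria):
    """Return ids of documents having a record that matches all criteria."""
    return sorted({
        r["doc_id"]
        for r in records
        if all(r.get(key) == wanted for key, wanted in criteria.items())
    })


type_facets = facet_counts(RECORDS, "type")
person_a_docs = filter_docs(RECORDS, type="person", value="Person A")
```

Here `type_facets` gives the per-type document counts a search interface could show as facets, and `person_a_docs` narrows the collection to documents mentioning one entity.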

NetOwl frees the analyst from scanning large amounts of information and delivers the items of critical importance for homeland security, such as entities, links, and events.