Mining Social Media for Situational Awareness with Entity Extraction

Entity Extraction, Homeland Security, Intelligence Analysis, Social Media Analysis

During emergency situations such as natural disasters or man-made events, social media like Twitter, Weibo, or Facebook become active communication channels and a valuable source of timely and actionable information for situational awareness and relief planning efforts by decision makers, emergency responders, and the public.

There are, however, a number of challenges involved in tapping information from social media. First, the data volumes in social media can be staggering and require scalable, Big Data-level solutions. Second, it can be hard to identify the relevant information hidden in such large data volumes. Third, data must be processed in real time given the pressing nature of unfolding crises. Think floods, wildfires, disease outbreaks, terrorist attacks, acts of war, riots, etc., where time is of the essence and can be a matter of life or death. Fourth, the language in social media is often brief and informal, with unreliable grammar, spelling, and punctuation, all of which makes it more difficult to process social media data accurately.

Why Use NetOwl’s Entity Extraction for Situational Awareness?

NetOwl offers a number of advantages for exploiting social media for situational awareness and crisis monitoring.

NetOwl’s Entity Extraction goes beyond the extraction of names to extract descriptive phrases, links, and events. Descriptive phrases are especially useful when the entities involved are not yet known by name, as is often the case in emergency reports that may discuss unnamed victims (e.g., “two children”, “a runner”, “an elderly couple”). NetOwl also extracts informal addresses (e.g., “the third block of 14th Avenue”) and relative locations (e.g., “two miles southwest of Charlottesville”). NetOwl-extracted links connect entities and provide critical information like the location of a victim, the license plate of a vehicle, or the address of a building. Events, such as attacks, riots, deaths, and wildfires, capture unfolding situations and their participants. NetOwl’s rich extraction ontology includes hundreds of semantic concepts and offers the broadest set of links and events available.
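To make the idea of a relative location concrete, the following is a minimal sketch of how a downstream consumer might turn a phrase like “two miles southwest of Charlottesville” into approximate coordinates. It is not NetOwl's method or API, which the text does not describe. The function name offset_point, the bearing table, and the approximate anchor coordinates for Charlottesville are all assumptions made for illustration, and the Earth is treated as a sphere.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; spherical approximation
KM_PER_MILE = 1.609344

# Compass bearings in degrees clockwise from north (illustrative table).
BEARINGS = {
    "north": 0, "northeast": 45, "east": 90, "southeast": 135,
    "south": 180, "southwest": 225, "west": 270, "northwest": 315,
}

def offset_point(lat, lon, distance_km, bearing_deg):
    """Return the (lat, lon) reached by travelling distance_km from
    (lat, lon) along the given initial bearing on a spherical Earth."""
    phi1 = math.radians(lat)
    lam1 = math.radians(lon)
    theta = math.radians(bearing_deg)
    delta = distance_km / EARTH_RADIUS_KM  # angular distance in radians

    phi2 = math.asin(
        math.sin(phi1) * math.cos(delta)
        + math.cos(phi1) * math.sin(delta) * math.cos(theta)
    )
    lam2 = lam1 + math.atan2(
        math.sin(theta) * math.sin(delta) * math.cos(phi1),
        math.cos(delta) - math.sin(phi1) * math.sin(phi2),
    )
    return math.degrees(phi2), math.degrees(lam2)

# Approximate anchor coordinates for Charlottesville, VA (assumed values).
anchor_lat, anchor_lon = 38.0293, -78.4767
lat, lon = offset_point(anchor_lat, anchor_lon,
                        2 * KM_PER_MILE, BEARINGS["southwest"])
print(round(lat, 4), round(lon, 4))
```

The resulting point lies slightly south and west of the anchor. A real system would also need to carry the uncertainty of such an estimate, since “two miles southwest” is rarely a precise measurement.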

Additionally, by geocoding location entities mentioned in text and then associating other entities like people, groups, and events to those locations, NetOwl produces the rich geospatial structured output needed for geographic visualization and GIS analysis.
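As a rough illustration of what such geospatial structured output can look like, the sketch below attaches non-location entities to a geocoded place and emits a GeoJSON FeatureCollection that a GIS tool could load. The record and link layouts, the relation names, and the function to_geojson are hypothetical and do not represent NetOwl's actual output schema; the coordinates are approximate.

```python
import json

# Hypothetical, simplified extraction records (not NetOwl's real schema).
entities = [
    {"id": "e1", "type": "place", "text": "Charlottesville",
     "lat": 38.0293, "lon": -78.4767},
    {"id": "e2", "type": "person", "text": "an elderly couple"},
    {"id": "e3", "type": "event", "text": "wildfire"},
]

# Hypothetical links from a non-location entity to a location entity.
links = [
    {"source": "e2", "target": "e1", "relation": "located_at"},
    {"source": "e3", "target": "e1", "relation": "occurred_at"},
]

def to_geojson(entities, links):
    """Build one GeoJSON point feature per entity-to-place link."""
    by_id = {e["id"]: e for e in entities}
    features = []
    for link in links:
        subject = by_id.get(link["source"])
        place = by_id.get(link["target"])
        # Skip links whose endpoints are missing or not geocoded.
        if subject is None or place is None or "lat" not in place:
            continue
        features.append({
            "type": "Feature",
            # GeoJSON orders coordinates as [longitude, latitude].
            "geometry": {"type": "Point",
                         "coordinates": [place["lon"], place["lat"]]},
            "properties": {
                "entity": subject["text"],
                "entity_type": subject["type"],
                "relation": link["relation"],
                "place": place["text"],
            },
        })
    return {"type": "FeatureCollection", "features": features}

collection = to_geojson(entities, links)
print(json.dumps(collection, indent=2))
```

Each feature carries the associated entity in its properties, so a map layer can be filtered by entity type or relation without going back to the source text.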

NetOwl has been trained extensively on social media data to be robust given imperfect input such as misspellings, missing or misplaced punctuation, partial sentences, and other forms of non-grammatical and noisy input.

NetOwl’s Entity Extraction supports 9 languages and offers a seamlessly integrated language ID capability where the language of the input text is automatically detected and the text is processed accordingly.

NetOwl has been engineered for high throughput and scalability. It can be deployed on any number of nodes of a public or private cloud infrastructure, such as those offered by Amazon, Microsoft, and others, and it scales processing horizontally with the available computing power.

NetOwl turns unstructured data into structured data that is suitable for advanced analytics, enabling faceted search, profile generation, link analysis, geospatial analysis, and trend analysis, among others. Moreover, by structuring unstructured content, NetOwl facilitates the fusion of knowledge from unstructured and structured data sources.
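To show why structured output matters for something like faceted search, here is a minimal sketch that counts how many posts mention each extracted value and then narrows the posts by a chosen facet. The flat record layout and the function names facet_counts and docs_matching are assumptions for illustration only, not part of any NetOwl interface.

```python
from collections import Counter, defaultdict

# Hypothetical flattened extraction results: one row per mention.
records = [
    {"doc": "post-1", "type": "event", "value": "wildfire"},
    {"doc": "post-1", "type": "place", "value": "Charlottesville"},
    {"doc": "post-2", "type": "event", "value": "wildfire"},
    {"doc": "post-2", "type": "place", "value": "Charlottesville"},
    {"doc": "post-2", "type": "event", "value": "wildfire"},  # repeat mention
    {"doc": "post-3", "type": "event", "value": "flood"},
]

def facet_counts(records):
    """Count documents per (type, value), ignoring repeat mentions
    of the same value within a single document."""
    facets = defaultdict(Counter)
    seen = set()
    for r in records:
        key = (r["doc"], r["type"], r["value"])
        if key in seen:
            continue
        seen.add(key)
        facets[r["type"]][r["value"]] += 1
    return facets

def docs_matching(records, entity_type, value):
    """Return the sorted IDs of documents that mention the given value."""
    return sorted({r["doc"] for r in records
                   if r["type"] == entity_type and r["value"] == value})

facets = facet_counts(records)
for entity_type, counts in facets.items():
    print(entity_type, counts.most_common())
print(docs_matching(records, "event", "wildfire"))
```

Counting documents rather than raw mentions keeps a single repetitive post from dominating a facet, which is a design choice rather than a requirement.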

In summary, NetOwl’s extensive, state-of-the-art, scalable, and real-time Entity Extraction makes it possible to exploit social media for enhanced situational awareness and more effective emergency response.