Large-Scale Real-Time Media Monitoring with Entity Extraction


Many applications require users to monitor news from traditional news outlets, social media, RSS feeds, or any of the myriad other sources now available. Examples include:

  • A media company that needs to monitor global news in a multitude of languages;
  • Financial analysts tracking information of economic interest, such as stock price movements and exchange rates, as well as events around the world, such as political unrest, that could affect their portfolios;
  • A data provider collecting adverse news to support due diligence and screening;
  • Medical researchers tracking disease outbreaks;
  • Companies with long international supply chains that need to monitor environmental, political, and other events that could disrupt those chains.

What all the data sources for these and other applications have in common is that they contain enormous amounts of unstructured data, far more than any purely manual review could handle. According to the International Data Corporation, new data is generated at a rate of up to 1.7 MB per person per second. Organizations can nonetheless turn news and social media data to their advantage with Entity Extraction software, which finds the relevant information buried within these large volumes of unstructured data.

10 Things That Your Entity Extraction Software Should Be Able to Do

To be truly useful for media monitoring, Entity Extraction software should be able to:

  1. Automatically aggregate the news items that are on the same topic;
  2. Identify named entities within the news items: at a minimum, the names of people, organizations, and places, as well as time expressions and monetary amounts (plus any terms specific to technical or scientific fields such as epidemiology);
  3. Identify relationships between entities, such as the locations of companies, the links between companies, and the affiliations of people;
  4. Identify a large set of relevant event types, such as political changes, crimes, cybersecurity incidents, conflicts, and natural disasters, along with the participants in each event (“Who did what to whom?”);
  5. Perform geotagging, that is, disambiguate the extracted place names and assign them geographic coordinates. Once place names are geotagged, the extracted information, including events, can be displayed on a map;
  6. Support the real-time generation of alerts for highly relevant news;
  7. Above all, allow users to perform a semantic search over all of the information extracted above. Users need to be able to search for all events of a given type, e.g., all occurrences of disease outbreaks, political unrest, or violence;
  8. Be accurate as measured by standard metrics, maximizing both recall (few false negatives, i.e., little missed information) and precision (few false positives, i.e., few incorrect extractions);
  9. Be fast enough to support real-time monitoring;
  10. Be able to process data in multiple languages.
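The accuracy requirement in item 8 can be made concrete with a short calculation. The sketch below is a minimal illustration, assuming that each extracted entity is represented as a hypothetical (text, type) pair and that an extraction counts as correct only when it exactly matches an entry in a manually annotated reference ("gold") set; real evaluations often use more lenient matching. Precision is the fraction of extracted entities that are correct, TP / (TP + FP), and recall is the fraction of reference entities that were found, TP / (TP + FN). The F1 score, their harmonic mean, is included as a common single-number summary. The function name evaluate_extraction and the sample data are illustrative only, not part of any particular product.

```python
from typing import Set, Tuple

# Hypothetical representation: an entity is a (surface text, entity type) pair.
Entity = Tuple[str, str]


def evaluate_extraction(predicted: Set[Entity], gold: Set[Entity]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) using exact-match comparison against a gold set."""
    true_positives = len(predicted & gold)   # extracted and correct
    false_positives = len(predicted - gold)  # extracted but incorrect
    false_negatives = len(gold - predicted)  # present in the gold set but missed

    extracted = true_positives + false_positives
    relevant = true_positives + false_negatives
    precision = true_positives / extracted if extracted else 0.0
    recall = true_positives / relevant if relevant else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Illustrative annotations for a single news item (made-up data).
    gold = {
        ("Acme Corp", "ORGANIZATION"),
        ("Lagos", "PLACE"),
        ("Jane Doe", "PERSON"),
        ("$5 million", "MONEY"),
    }
    predicted = {
        ("Acme Corp", "ORGANIZATION"),
        ("Lagos", "PLACE"),
        ("June", "PERSON"),  # a month wrongly tagged as a person: a false positive
    }
    p, r, f = evaluate_extraction(predicted, gold)
    print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Here the system finds two of the four reference entities (recall 0.50), and two of its three extractions are correct (precision 0.67), so the script prints precision=0.67 recall=0.50 f1=0.57. Tuning a system to raise one of these metrics often lowers the other, which is why both need to be tracked.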

When your organization depends on timely access to critical news, it is essential to use Entity Extraction software that goes beyond the usual standard and delivers all ten of these capabilities.