Traditional GIS relies on coordinates that are often contained in structured data or document metadata. Until recently, GIS technology has not been able to exploit the large amount of locational information found in unstructured data (anything from memos to email to free text), such as place names that lack explicit geo-coordinates. The only way to incorporate unstructured data into GIS has been intensive manual curation and normalization to make it usable by current tools.

Geotagging is a text analytics technology that removes this roadblock: it assigns absolute physical locations to place names. It relies on a list of place names and their geo-coordinates, known as a gazetteer, but it goes beyond simple look-up to solve one of the key challenges of place names, their ambiguity. One place name can refer to multiple locations; for example, there’s a “Springfield” in many states. Using advanced natural language processing (NLP) and geospatial calculations, Geotagging can resolve the ambiguity by evaluating the overall context supplied by the surrounding text. If “Worcester” or “Boston” is mentioned nearby, then the place is likely to be the “Springfield” in Massachusetts, not the one in Illinois.
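The context-based disambiguation described above can be illustrated with a minimal sketch. It is not the product's actual algorithm: it assumes a tiny hand-made gazetteer with approximate coordinates, and it simply picks the candidate that lies closest to the unambiguous place names mentioned nearby. The names GAZETTEER, haversine_km, and disambiguate are hypothetical.

```python
import math

# Hypothetical toy gazetteer: place name -> list of (region, lat, lon).
# Coordinates are approximate and for illustration only.
GAZETTEER = {
    "Springfield": [("Massachusetts", 42.10, -72.59), ("Illinois", 39.80, -89.64)],
    "Worcester": [("Massachusetts", 42.26, -71.80)],
    "Boston": [("Massachusetts", 42.36, -71.06)],
}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def disambiguate(name, context_names):
    """Pick the candidate for `name` nearest to the unambiguous context places."""
    candidates = GAZETTEER[name]
    # Only context names with exactly one gazetteer entry serve as anchors.
    anchors = [GAZETTEER[c][0] for c in context_names if len(GAZETTEER.get(c, [])) == 1]
    if not anchors:
        return candidates[0]  # no usable context: fall back to the first entry

    def total_distance(candidate):
        return sum(haversine_km(candidate[1], candidate[2], a[1], a[2]) for a in anchors)

    return min(candidates, key=total_distance)


print(disambiguate("Springfield", ["Worcester", "Boston"]))
```

A production system would weigh many more signals (population, document source, linguistic cues), but the sketch shows why nearby mentions of Massachusetts cities pull “Springfield” toward Massachusetts.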

Geotagging also handles more than place names. It can calculate an absolute location from a relative location expressed in a phrase such as “50 miles northeast of Bagram.” Geotagging of relative locations is particularly useful for military and law enforcement applications, which frequently deal with unnamed locations.
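As a rough illustration of how a relative location can be turned into coordinates, the sketch below applies the standard great-circle destination formula on a spherical Earth. The function name offset_location, the compass table, and the approximate coordinates used for Bagram are assumptions made for this example, not details taken from the product.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation
KM_PER_MILE = 1.609344

# Compass directions mapped to bearings in degrees clockwise from north.
BEARINGS = {
    "north": 0, "northeast": 45, "east": 90, "southeast": 135,
    "south": 180, "southwest": 225, "west": 270, "northwest": 315,
}


def offset_location(lat, lon, distance_miles, direction):
    """Return (lat, lon) reached by travelling from a point along a compass direction."""
    bearing = math.radians(BEARINGS[direction])
    delta = distance_miles * KM_PER_MILE / EARTH_RADIUS_KM  # angular distance
    phi1, lambda1 = math.radians(lat), math.radians(lon)
    phi2 = math.asin(
        math.sin(phi1) * math.cos(delta)
        + math.cos(phi1) * math.sin(delta) * math.cos(bearing)
    )
    lambda2 = lambda1 + math.atan2(
        math.sin(bearing) * math.sin(delta) * math.cos(phi1),
        math.cos(delta) - math.sin(phi1) * math.sin(phi2),
    )
    return math.degrees(phi2), math.degrees(lambda2)


# Approximate, illustrative coordinates for Bagram (assumed here).
BAGRAM = (34.95, 69.27)
print(offset_location(BAGRAM[0], BAGRAM[1], 50, "northeast"))
```

The anchor place itself would first be resolved through the gazetteer; only then is the distance and direction applied to obtain the coordinates of the unnamed location.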

Beyond Place Entities: Geotagging Events and Other Types of Entities

Geotagging is perhaps most effective when it is combined with the ability to extract relationships and events.

Relationship Extraction extracts key associations between entities in unstructured text. Consider a piece of unstructured data like “Hobson Corporation has its headquarters in Brussels.” Relationship Extraction will extract both “Hobson Corporation” and “Brussels.” It knows that the former is a company and the latter a location. Relationship Extraction can also use the Geotagging capability described above to determine the absolute physical location of “Brussels” and provide its geo-coordinates.

Relationship Extraction also recognizes that there is a “Located_In” relationship between “Hobson Corporation” and “Brussels.” This information is then structured, which makes it easily convertible into a standard format that a GIS tool can accept and render in visual terms. Based on this input from NetOwl, the tool can, for example, select an appropriate icon for a headquarters, place it on a map, and then display the names of the relevant entities, the corporation and the location, when a user clicks on the icon. All of this is the result of Relationship Extraction.
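To make the idea of a GIS-ready structure concrete, here is a hedged sketch that packages the “Located_In” relationship as a GeoJSON Feature. GeoJSON is chosen here only as one widely supported format; the text does not say which format is actually used. The function located_in_to_feature, the property names, and the approximate coordinates for Brussels are assumptions for illustration.

```python
import json


def located_in_to_feature(entity, entity_type, place, lat, lon):
    """Build a GeoJSON Feature for a Located_In relationship (illustrative schema)."""
    return {
        "type": "Feature",
        # GeoJSON positions are ordered [longitude, latitude].
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": {
            "relationship": "Located_In",
            "entity": entity,
            "entity_type": entity_type,
            "place": place,
        },
    }


# Approximate coordinates for Brussels, assumed for this example.
feature = located_in_to_feature("Hobson Corporation", "company", "Brussels", 50.85, 4.35)
print(json.dumps(feature, indent=2))
```

A mapping tool that reads GeoJSON could use the properties to choose an icon and to populate the label shown for the point.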

Geotagging can also be combined with Event Extraction to locate events on a map. For a sentence like “John Rochambeau visited New York City,” Event Extraction will extract the entities “John Rochambeau” and “New York City,” and also identify that “John Rochambeau” is the person visiting and that “New York City” is the location visited. This capability enables a GIS tool to handle events from unstructured text in much the same way as it handles relationships.

As an added feature, Event Extraction can also identify the time of an event, again if the information is present in the unstructured data. This enables events to be aggregated and plotted on a timeline. For instance, disease outbreak events can be tracked over time visually.
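The aggregation step can be sketched as follows, assuming extracted events arrive as simple records with a type, a place, and an optional date. The record layout, the function monthly_counts, and the sample data are all invented for illustration; events without a date are skipped, since the time is only available when the text states it.

```python
from collections import Counter
from datetime import date

# Made-up sample records standing in for extracted events.
events = [
    {"type": "disease_outbreak", "place": "Town A", "date": date(2020, 1, 14)},
    {"type": "disease_outbreak", "place": "Town B", "date": date(2020, 1, 30)},
    {"type": "disease_outbreak", "place": "Town A", "date": date(2020, 3, 2)},
    {"type": "disease_outbreak", "place": "Town C", "date": None},  # no time in text
    {"type": "meeting", "place": "Town B", "date": date(2020, 1, 5)},
]


def monthly_counts(records, event_type):
    """Count dated events of one type per calendar month, in chronological order."""
    counts = Counter(
        r["date"].strftime("%Y-%m")
        for r in records
        if r["type"] == event_type and r.get("date") is not None
    )
    return sorted(counts.items())


for month, count in monthly_counts(events, "disease_outbreak"):
    print(month, count)
```

The resulting month-by-month counts are the kind of series a timeline view could plot alongside the mapped locations.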

Other event types can also be combined with locational information to place event indicators on the map, such as:

  • Locations where a meeting happened
  • Locations where a bomb exploded

Geotagging means that geospatial applications can provide geospatial intelligence from unstructured data and display a far richer geospatial view that goes well beyond a “set of points” on a map.