What is Geotagging?

What does Geotagging do?

Geotagging finds the geographic coordinates of place names mentioned in unstructured text. For example, consider the following unstructured text:

  • “George Washington’s home was in Mount Vernon”

“Mount Vernon” is a name that refers to an actual physical location. Geotagging determines what that location is and produces the latitude/longitude coordinates for it, in this case, 38°44′07″N 77°05′43″W.
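
Mapping software usually expects signed decimal degrees rather than the degrees/minutes/seconds notation shown above. The following is a minimal sketch of that conversion; the function name `dms_to_decimal` is our own and assumes input formatted exactly like the coordinates in this example.

```python
import re

def dms_to_decimal(dms: str) -> float:
    """Convert a string such as 38°44′07″N to signed decimal degrees."""
    match = re.fullmatch(r"(\d+)°(\d+)′(\d+)″([NSEW])", dms)
    if match is None:
        raise ValueError(f"unrecognized coordinate: {dms!r}")
    degrees, minutes, seconds = (int(g) for g in match.groups()[:3])
    value = degrees + minutes / 60 + seconds / 3600
    # South and west are negative by convention.
    return -value if match.group(4) in "SW" else value

lat = dms_to_decimal("38°44′07″N")
lon = dms_to_decimal("77°05′43″W")
print(round(lat, 4), round(lon, 4))  # 38.7353 -77.0953
```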

Below, we discuss how Geotagging works and why it is valuable.

(Note that the Geotagging we’re discussing is not the addition of geolocation metadata to media such as photos, social media posts, and videos, which is another common meaning of the term. Here it means locating on a map a place name that is mentioned in unstructured text.)

Why is Geotagging Useful?

Geotagging is critical to many Government and commercial applications because it makes texts available for geospatial analysis using a variety of geographic information systems (GIS).

GIS products such as the ones provided by Esri and other companies traditionally visualize information on maps that is already structured. An example in a Government context would be a list of military installations along with their latitudes/longitudes. This list may reside in a database, with the name of each installation and its latitude/longitude stored in separate fields. Such structured locational data can simply be input to a GIS product, if necessary after transforming it into the product’s proprietary input format or an open standard.

By contrast, Geotagging focuses on unstructured data containing locational information. Unstructured data refers to any natural language text produced by humans, such as news, social media content, and web pages. Such text naturally contains place names and other locational information of great value. Geotagging transforms this unstructured locational data into structured locational data.
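
To make "structured locational data" concrete, here is a minimal sketch that wraps one geotagged place name as a GeoJSON Feature, an open format that many GIS tools can read. The choice of GeoJSON and the helper name `to_geojson_feature` are our assumptions for illustration; the coordinates are the Mount Vernon values given earlier, converted to decimal degrees.

```python
import json

def to_geojson_feature(name: str, lat: float, lon: float, source_text: str) -> dict:
    """Wrap one geotagged place name as a GeoJSON Feature."""
    return {
        "type": "Feature",
        # GeoJSON orders coordinates as [longitude, latitude].
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": {"name": name, "source_text": source_text},
    }

feature = to_geojson_feature(
    name="Mount Vernon",
    lat=38.7353,   # decimal form of 38°44′07″N
    lon=-77.0953,  # decimal form of 77°05′43″W
    source_text="George Washington's home was in Mount Vernon",
)
print(json.dumps(feature, indent=2))
```

Keeping the original sentence in the properties lets an analyst click a point on the map and see the text it came from.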

Where is Geotagging Useful?

There are many areas where Geotagging is useful:

  • Intelligence Analysis relies a great deal on geolocational information. Knowing where terrorist activities (bombings, assassinations, etc.) are occurring is essential to understanding the patterns behind them. Visualizing the information in a GIS helps analysts gain insight.
  • International Shipping is an industry that depends on news of adverse events around the world, such as political unrest and bad weather. Awareness of these events would be enhanced by the ability to display the unstructured locational data on a map.
  • Public Health Monitoring would be greatly enhanced if unstructured text sources containing locational data on disease outbreaks could be automatically mapped.

How does Geotagging Work?

Geotagging involves multiple steps:

  • Identify a place name in unstructured text. Entity Extraction is a technology that identifies a name and what kind of name it is (person, organization, place, etc.). In the following example, Abraham Lincoln is recognized as a Person name and Springfield as a City name.
    • “Abraham Lincoln lived in Springfield before he became president”
  • Look up the place name in a gazetteer. A gazetteer is simply a list of place names and their associated latitude/longitude coordinates. Publicly available gazetteers are provided by the United States Geological Survey and the National Geospatial-Intelligence Agency, and there are many private ones as well.
  • Resolve ambiguities among the candidates based on the textual context. Place names are frequently ambiguous, so the gazetteer look-up will sometimes return multiple candidates. For example, the name Springfield is quite common in the U.S.:
    • Springfield (in Massachusetts)
    • Springfield (in Illinois)
    • etc.

Geotagging decides which candidate is the right one by analyzing other place names and indicators in the surrounding context and using them to rank the candidates.
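
The look-up and ranking steps can be sketched in a few lines. This is a toy illustration, not a description of any particular product: the two-entry gazetteer, its approximate coordinates, the hand-picked context cues, and the sample sentence are all assumptions, and the place name is taken as already extracted.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class GazetteerEntry:
    name: str
    region: str
    lat: float
    lon: float
    cues: tuple  # hypothetical context words that point to this entry

# Toy gazetteer; coordinates are approximate and for illustration only.
GAZETTEER = [
    GazetteerEntry("Springfield", "Illinois", 39.78, -89.65, ("illinois", "chicago")),
    GazetteerEntry("Springfield", "Massachusetts", 42.10, -72.59, ("massachusetts", "boston")),
]

def lookup(place_name: str) -> list:
    """Return every gazetteer entry with this name (case-insensitive)."""
    return [e for e in GAZETTEER if e.name.lower() == place_name.lower()]

def rank(candidates: list, context: str) -> list:
    """Order candidates by how many of their cues appear in the context."""
    words = set(re.findall(r"[a-z]+", context.lower()))
    return sorted(candidates, key=lambda e: sum(c in words for c in e.cues), reverse=True)

text = "The train from Chicago arrives in Springfield at noon."
best = rank(lookup("Springfield"), text)[0]
print(best.region, best.lat, best.lon)  # Illinois 39.78 -89.65
```

When no cue appears in the context, this sketch simply keeps the gazetteer order; a real system would need a better tie-breaker.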

Why is Geotagging Challenging?

Geotagging may sound straightforward, but it’s actually quite challenging for the following reasons:

  • Place names in text can be spelled differently from their gazetteer entries, so the matching has to be fuzzy. Middle Eastern and Central Asian place names, for example, frequently appear in English with varying spellings:
    • Mazar-e Sharif vs. Mazir-i Sharif (city in Afghanistan)
  • Place names can be ambiguous. For instance, many person names are also place names: Paris, Austin, Jordan, Washington, Lincoln, Clinton, etc. You don’t want to geotag a person name as a place!
  • Locations may be described in relative terms (e.g., “100 km northeast of Paris”). Geotagging should be capable of calculating the precise geographic location even if the location itself is not named.
  • Gazetteers may not be available for some languages. It is possible to use English-language gazetteers for foreign names, but doing so requires machine translation of the foreign-language names into English.
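
As a rough illustration of fuzzy matching, the sketch below uses the similarity ratio from Python's standard `difflib` module. The gazetteer names and the 0.8 cutoff are assumptions chosen for this example; production systems typically use matching tuned to transliteration variants.

```python
from difflib import get_close_matches

# Hypothetical gazetteer names, lowercased for comparison.
GAZETTEER_NAMES = ["mazar-e sharif", "kabul", "kandahar", "herat"]

def fuzzy_lookup(place_name: str, cutoff: float = 0.8) -> list:
    """Return gazetteer names whose similarity to place_name is at least cutoff."""
    return get_close_matches(place_name.lower(), GAZETTEER_NAMES, n=3, cutoff=cutoff)

print(fuzzy_lookup("Mazir-i Sharif"))  # ['mazar-e sharif']
print(fuzzy_lookup("Springfield"))     # []
```

A lower cutoff catches more spelling variants but also lets in more false matches, so the threshold has to be tuned against real data.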

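A relative description such as “100 km northeast of Paris” can be resolved with the standard great-circle destination formula once the anchor place has coordinates. The sketch below assumes a spherical Earth, approximate coordinates for central Paris, and a compass-word table of our own.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation

# Compass words mapped to bearings in degrees clockwise from north.
BEARINGS = {"north": 0, "northeast": 45, "east": 90, "southeast": 135,
            "south": 180, "southwest": 225, "west": 270, "northwest": 315}

def destination_point(lat, lon, bearing_deg, distance_km):
    """Return the (lat, lon) reached by travelling distance_km along bearing_deg."""
    phi1, lam1 = math.radians(lat), math.radians(lon)
    theta = math.radians(bearing_deg)
    delta = distance_km / EARTH_RADIUS_KM  # angular distance in radians
    phi2 = math.asin(math.sin(phi1) * math.cos(delta)
                     + math.cos(phi1) * math.sin(delta) * math.cos(theta))
    lam2 = lam1 + math.atan2(math.sin(theta) * math.sin(delta) * math.cos(phi1),
                             math.cos(delta) - math.sin(phi1) * math.sin(phi2))
    return math.degrees(phi2), math.degrees(lam2)

paris = (48.8566, 2.3522)  # approximate coordinates of central Paris
lat, lon = destination_point(*paris, BEARINGS["northeast"], 100)
print(round(lat, 1), round(lon, 1))
```

The result lands at roughly 49.5°N, 3.3°E. Parsing the distance and direction out of the text is a separate extraction problem that this sketch does not cover.
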
Advanced Geotagging

Geotagging has usually been limited to place names. By using relationship extraction and event extraction, however, events and other types of entities that are linked to places can also be analyzed geospatially.

  • Examples:
    • Places a person traveled to:
      • “John Robertson traveled to Beirut”
    • Attack events in a particular region
      • “Three terrorists attacked a convoy in Kabul”
    • Locations of outbreaks of diseases
      • “Virginia reported 996 new cases of coronavirus yesterday”
    • Locations of the offices of an organization
      • “XYZ Corporation has its headquarters in Brussels”

To take the first example, Event Extraction will produce the following structured representation for it:

  • TRAVEL_EVENT
    • PERSON: John Robertson
    • DESTINATION: Beirut

As with simple place names, Geotagging can geoenable this now-structured data by placing a person icon for John Robertson on Beirut (or any other place he traveled to) on a map. Events such as an individual’s movements can then be tracked and analyzed in an entirely visual way.
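
The step from an extracted event to a map marker might look like the sketch below, which uses the sentence “John Robertson traveled to Beirut”. The class names, the one-entry gazetteer, and the approximate Beirut coordinates are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TravelEvent:
    person: str
    destination: str

@dataclass
class MapMarker:
    label: str
    icon: str
    lat: float
    lon: float

# Toy gazetteer; coordinates are approximate and for illustration only.
GAZETTEER = {"Beirut": (33.89, 35.50)}

def geoenable(event: TravelEvent) -> Optional[MapMarker]:
    """Attach coordinates to a travel event, or return None if the place is unknown."""
    coords = GAZETTEER.get(event.destination)
    if coords is None:
        return None
    lat, lon = coords
    return MapMarker(label=event.person, icon="person", lat=lat, lon=lon)

marker = geoenable(TravelEvent(person="John Robertson", destination="Beirut"))
print(marker)
```

Returning None for an unknown destination keeps ungeotagged events out of the map layer instead of plotting them at a default position.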

Summary

In sum, Geotagging is a critical technology for maximizing the value of geospatial information found in your unstructured text sources. Crucial information is increasingly found in unstructured, natural language data, and Geotagging allows it to be combined with already structured data for more complete analysis.