How to Choose a Geotagging Product
What is Geotagging?
Geotagging is a technology that identifies, disambiguates, and assigns latitude/longitude coordinates to place names and other locational expressions in unstructured text data. It’s a critical technology for making text data available for geospatial analysis via GIS tools.
We have discussed how geotagging works in another of our blogs. Here we’ll be discussing what you should look for in a geotagging product.
The features that you need depend to some extent on your planned use cases. Typical applications include geospatial intelligence and media monitoring.
Criteria for Choosing a Geotagging Product
There are several factors to be considered in evaluating a geotagging product:
How Accurately Does It Identify Place Names?
It should effectively distinguish place names from other kinds of names, such as person and organization names, to reduce false positives, which are common in traditional geotagging based mostly on gazetteer look-up. For example:
- “Obama” is the name of a person, but it’s also a city in Japan.
- “Frederick” is a person’s first name, but it’s also a city in Maryland.
- “Blackstone” is the name of a company, but it’s also a town in Virginia.
Finding place names with high accuracy is important to many applications. What makes it possible is analysis of the linguistic context in which a name occurs, to determine which type of name (i.e., place, person, or company) is being referred to. Some products rely mostly on long lists of place names from around the world and make no serious attempt to analyze the context, so they produce many false positives that are not place names at all. If high accuracy is important to you, this capability is critical.
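To make the contrast with pure gazetteer look-up concrete, here is a minimal sketch of context-based name typing. It is only an illustration: the cue word lists and the function name guess_name_type are hypothetical, it handles single-word names only, and a real product would rely on far richer linguistic analysis than adjacent-word cues.

```python
# Hypothetical cue words, chosen only for illustration.
PERSON_CUES = {"mr", "ms", "dr", "president", "senator"}
PLACE_CUES = {"in", "near", "at"}
ORG_CUES = {"inc", "group", "corp", "llc"}
PUNCT = ".,;:!?\"'"


def guess_name_type(text, name):
    """Guess the type of a single-word name from the words next to it."""
    tokens = [t.strip(PUNCT) for t in text.split()]
    for i, tok in enumerate(tokens):
        if tok != name:
            continue
        prev_word = tokens[i - 1].lower() if i > 0 else ""
        next_word = tokens[i + 1].lower() if i + 1 < len(tokens) else ""
        if prev_word in PERSON_CUES:       # e.g. "President Obama"
            return "person"
        if next_word in ORG_CUES:          # e.g. "Blackstone Group"
            return "organization"
        if prev_word in PLACE_CUES:        # e.g. "in Obama"
            return "place"
    return "unknown"


print(guess_name_type("President Obama spoke on Tuesday.", "Obama"))
print(guess_name_type("The festival was held in Obama last year.", "Obama"))
print(guess_name_type("Blackstone Group acquired the firm.", "Blackstone"))
```

A plain gazetteer look-up would tag all three mentions as places; even this crude use of context separates them.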
How Well Does It Distinguish Ambiguous Place Names?
It should also be able to distinguish the same place name located in different geographical areas, such as “Springfield” (Is it a city in Massachusetts or Illinois or many other states?) or “Alexandria” (Is it a city in Egypt or Louisiana or many other places?), in order to assign correct latitude/longitude values.
Based on the surrounding text, it should be able to decide which location is the most likely. In the Springfield example, it should examine the context around the mention of Springfield and recognize that a nearby mention of Boston or Worcester (or any other locality in Massachusetts) indicates that Massachusetts is the correct state. Alternatively, if Chicago, Peoria, or another Illinois locality is mentioned, then Illinois is the correct choice.
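The Springfield reasoning can be sketched as a simple vote among nearby place names. The tiny gazetteer, the context table, the approximate coordinates, and the function name disambiguate are all assumptions made for this illustration, not a description of how any particular product works.

```python
from collections import Counter

# Hypothetical mini-gazetteer with approximate coordinates.
GAZETTEER = {
    "Springfield": [
        {"state": "Massachusetts", "lat": 42.10, "lon": -72.59},
        {"state": "Illinois", "lat": 39.78, "lon": -89.65},
    ],
}

# Hypothetical table of unambiguous context places and their states.
CONTEXT_STATE = {
    "Boston": "Massachusetts",
    "Worcester": "Massachusetts",
    "Chicago": "Illinois",
    "Peoria": "Illinois",
}


def disambiguate(name, text):
    """Pick the candidate whose state gets the most votes from context."""
    votes = Counter(
        state for place, state in CONTEXT_STATE.items() if place in text
    )
    # With no votes at all, max() falls back to the first candidate listed.
    return max(GAZETTEER[name], key=lambda c: votes[c["state"]])


print(disambiguate("Springfield", "She drove from Boston to Springfield."))
print(disambiguate("Springfield", "Flights from Chicago to Springfield resumed."))
```

A real system would also weigh factors such as distance between candidates and population, but the voting idea is the same.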
Beyond the above two criteria, an advanced geotagging product should also possess the following capabilities:
Can it handle relative location phrases (e.g., “a town five miles west of Frankfurt”)?
Relative location phrases are common in unstructured text data and some applications, such as intelligence analysis for the military, have a great need for this capability. A geotagging product needs to be able to detect the occurrence of such a phrase and also to calculate the precise physical coordinates of the specified location. In our example above, it needs to identify the location of Frankfurt and then further to calculate the location indicated by “five miles west of Frankfurt.”
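As a rough sketch of the calculation involved, the code below parses a phrase of the form “N miles <direction> of <Place>” and applies the standard great-circle destination formula on a spherical Earth. The regular expression, the one-entry gazetteer with approximate Frankfurt coordinates, and the function names are assumptions for illustration; it handles only whole-number distances and the four cardinal directions.

```python
import math
import re

EARTH_RADIUS_KM = 6371.0   # mean radius, spherical approximation
KM_PER_MILE = 1.609344
BEARINGS = {"north": 0.0, "east": 90.0, "south": 180.0, "west": 270.0}
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "ten": 10}
GAZETTEER = {"Frankfurt": (50.11, 8.68)}   # hypothetical, approximate

PATTERN = re.compile(r"(\w+) miles? (north|south|east|west) of ([A-Z]\w+)")


def destination(lat, lon, bearing_deg, distance_km):
    """Point reached from (lat, lon) along a bearing, on a sphere."""
    phi1, lam1 = math.radians(lat), math.radians(lon)
    theta = math.radians(bearing_deg)
    delta = distance_km / EARTH_RADIUS_KM   # angular distance
    phi2 = math.asin(math.sin(phi1) * math.cos(delta)
                     + math.cos(phi1) * math.sin(delta) * math.cos(theta))
    lam2 = lam1 + math.atan2(
        math.sin(theta) * math.sin(delta) * math.cos(phi1),
        math.cos(delta) - math.sin(phi1) * math.sin(phi2))
    return math.degrees(phi2), math.degrees(lam2)


def resolve_relative(text):
    """Return (lat, lon) for a relative location phrase, or None."""
    match = PATTERN.search(text)
    if not match:
        return None
    amount, direction, anchor = match.groups()
    if anchor not in GAZETTEER:
        return None
    miles = NUMBER_WORDS.get(amount.lower())
    if miles is None:
        try:
            miles = float(amount)
        except ValueError:
            return None
    lat, lon = GAZETTEER[anchor]
    return destination(lat, lon, BEARINGS[direction], miles * KM_PER_MILE)


print(resolve_relative("a town five miles west of Frankfurt"))
```

The result keeps roughly the same latitude as Frankfurt and shifts the longitude westward by a little over a tenth of a degree.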
Can it combine geotagging with relationship and event extraction?
Relationship and event extraction is a highly sophisticated form of extraction that identifies the semantic connections between entities mentioned in unstructured text (we have already discussed relationship and event extraction in other blogs). Combining it with geotagging provides a very powerful capability. For example, the following sentence shows an individual engaged in a travel event with a precise destination:
- “Bill Andrews travelled to Oklahoma City.”
A geotagging product should be able to extract the travel event, as well as the fact that Bill Andrews is doing the travelling, and that the destination is Oklahoma City. In addition, of course, it should select the right latitude/longitude values for Oklahoma City.
This capability matters to an application that is interested in locating where events happen in the world. In effect, if your unstructured data contains multiple travel events involving Bill Andrews, you’ll be able to automatically aggregate that data and get a complete picture of where that individual has been. This is of great importance, in particular, to law enforcement and intelligence applications, as in the past an analyst would have had to go through all that unstructured data and compile it slowly and laboriously.
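The aggregation step can be sketched as a simple grouping of extracted events by traveller. The event record layout below is a hypothetical output format, and the second event and all coordinates are made-up sample data added only so there is something to aggregate.

```python
from collections import defaultdict

# Hypothetical extracted events; coordinates are approximate sample values.
events = [
    {"type": "travel", "traveller": "Bill Andrews",
     "destination": "Oklahoma City", "lat": 35.47, "lon": -97.52},
    {"type": "travel", "traveller": "Bill Andrews",
     "destination": "Tulsa", "lat": 36.15, "lon": -95.99},
]


def destinations_by_traveller(extracted_events):
    """Group geotagged travel destinations under each traveller's name."""
    grouped = defaultdict(list)
    for event in extracted_events:
        if event["type"] != "travel":
            continue
        grouped[event["traveller"]].append(
            (event["destination"], event["lat"], event["lon"]))
    return dict(grouped)


print(destinations_by_traveller(events))
```

The grouped coordinates could then be handed to a GIS tool to plot where each individual has been.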
Can it convert various coordinate systems (e.g., MGRS, UTM) to latitude/longitude values?
It’s important that a geotagging product smoothly handle the various types of coordinate expressions. In particular, it should be able to extract MGRS and UTM coordinates from unstructured text and convert them to latitude/longitude coordinates.
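For a sense of what the conversion involves, here is a sketch of UTM-to-latitude/longitude conversion using the widely published series-expansion formulas for the inverse Transverse Mercator projection. It assumes the WGS84 ellipsoid and takes the zone number and hemisphere as already-parsed inputs; extracting the coordinate from text and decoding MGRS grid squares are omitted. The function name utm_to_latlon is illustrative.

```python
import math

# WGS84 ellipsoid constants and the UTM scale factor.
A = 6378137.0
F = 1 / 298.257223563
K0 = 0.9996
E2 = F * (2 - F)          # first eccentricity squared
EP2 = E2 / (1 - E2)       # second eccentricity squared


def utm_to_latlon(easting, northing, zone, northern=True):
    """Convert UTM coordinates (metres) to (lat, lon) in degrees."""
    x = easting - 500000.0                      # remove false easting
    y = northing if northern else northing - 10000000.0

    # Footpoint latitude from the meridional arc length.
    mu = (y / K0) / (A * (1 - E2 / 4 - 3 * E2**2 / 64 - 5 * E2**3 / 256))
    e1 = (1 - math.sqrt(1 - E2)) / (1 + math.sqrt(1 - E2))
    phi1 = (mu
            + (3 * e1 / 2 - 27 * e1**3 / 32) * math.sin(2 * mu)
            + (21 * e1**2 / 16 - 55 * e1**4 / 32) * math.sin(4 * mu)
            + (151 * e1**3 / 96) * math.sin(6 * mu)
            + (1097 * e1**4 / 512) * math.sin(8 * mu))

    sin1, cos1, tan1 = math.sin(phi1), math.cos(phi1), math.tan(phi1)
    n1 = A / math.sqrt(1 - E2 * sin1**2)
    r1 = A * (1 - E2) / (1 - E2 * sin1**2) ** 1.5
    t1 = tan1**2
    c1 = EP2 * cos1**2
    d = x / (n1 * K0)

    lat = phi1 - (n1 * tan1 / r1) * (
        d**2 / 2
        - (5 + 3 * t1 + 10 * c1 - 4 * c1**2 - 9 * EP2) * d**4 / 24
        + (61 + 90 * t1 + 298 * c1 + 45 * t1**2 - 252 * EP2 - 3 * c1**2)
        * d**6 / 720)
    lon = (d
           - (1 + 2 * t1 + c1) * d**3 / 6
           + (5 - 2 * c1 + 28 * t1 - 3 * c1**2 + 8 * EP2 + 24 * t1**2)
           * d**5 / 120) / cos1

    central_meridian = (zone - 1) * 6 - 180 + 3   # degrees
    return math.degrees(lat), central_meridian + math.degrees(lon)


# A point on the equator at the central meridian of zone 31.
print(utm_to_latlon(500000, 0, 31))
```

In production code a maintained geodesy library would normally be preferable to hand-written formulas; the sketch is only meant to show that the conversion is deterministic once the coordinate string has been recognized.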
Can it geotag in multiple foreign languages?
A geotagging product should have the capability to identify and disambiguate place names in non-English texts (e.g., Arabic, Chinese) and assign geocoordinates to them. As part of this, it needs to translate or transliterate the place names into English and perform fuzzy name matching against the gazetteers when they are available only in English.
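The fuzzy-matching step can be illustrated with Python's standard difflib module, which scores string similarity. This is a generic stand-in rather than the matching method of any particular product; the gazetteer names, the 0.8 cutoff, and the function name match_place are assumptions for the example.

```python
import difflib

# Hypothetical English-only gazetteer entries.
GAZETTEER_NAMES = ["Fallujah", "Baghdad", "Basra", "Alexandria"]


def match_place(name, cutoff=0.8):
    """Return gazetteer names similar to a transliterated place name."""
    by_lowercase = {n.lower(): n for n in GAZETTEER_NAMES}
    hits = difflib.get_close_matches(
        name.lower(), list(by_lowercase), n=3, cutoff=cutoff)
    return [by_lowercase[h] for h in hits]


# Transliteration variants that differ from the gazetteer spelling.
print(match_place("Falluja"))
print(match_place("Basrah"))
```

Transliteration from scripts such as Arabic often yields several plausible English spellings, which is why an exact string comparison against the gazetteer is not enough.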
Does it output a confidence ranking for each possible geocoding candidate?
Sometimes an application will need to have multiple latitude/longitude coordinates returned in the case of ambiguities (think of the Springfield, MA or IL example again). Other applications will only need the best answer.
Which way to go depends on your requirements: you may want the second- or third-best possibilities as well as the top one.
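One way to picture such output is a ranked candidate list from which the caller takes as many entries as it needs. The raw scores below are invented for illustration, and the normalization into a confidence value is just one simple choice; rank_candidates is a hypothetical helper.

```python
def rank_candidates(candidates, top_n=1):
    """Return the top_n candidates with scores normalized to confidences."""
    total = sum(c["score"] for c in candidates)
    if total == 0:
        return []
    ranked = sorted(candidates, key=lambda c: c["score"], reverse=True)
    return [{**c, "confidence": c["score"] / total} for c in ranked[:top_n]]


# Hypothetical raw scores for two readings of "Springfield".
springfield = [
    {"name": "Springfield, IL", "score": 1.0},
    {"name": "Springfield, MA", "score": 3.0},
]

print(rank_candidates(springfield))            # best answer only
print(rank_candidates(springfield, top_n=2))   # best and runner-up
```

An application that only wants the best answer keeps the default, while one that wants to review alternatives raises top_n.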
Does it allow for easy creation of custom gazetteer data?
There are a lot of place names in the world. If you have specific requirements that aren’t captured by the built-in gazetteers in any of the products on the market, then you need to know how easy it is to add new places to the supplied gazetteer.
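In the simplest case, adding custom places amounts to merging your own entries into the supplied gazetteer. The sketch below assumes a gazetteer represented as a dictionary from place name to a list of (lat, lon) candidates; that layout, the entry “Site Alpha”, and all coordinates are hypothetical, and real products will have their own formats and tools.

```python
def merge_gazetteers(built_in, custom):
    """Return a new gazetteer with custom entries listed first."""
    merged = {name: list(entries) for name, entries in built_in.items()}
    for name, entries in custom.items():
        # Custom candidates go ahead of any built-in ones for the same name.
        merged[name] = list(entries) + merged.get(name, [])
    return merged


built_in = {"Springfield": [(42.10, -72.59), (39.78, -89.65)]}
custom = {
    "Site Alpha": [(35.00, -97.00)],      # a place only you know about
    "Springfield": [(37.21, -93.29)],     # an extra candidate you care about
}

merged = merge_gazetteers(built_in, custom)
print(merged["Site Alpha"])
print(merged["Springfield"])
```

When evaluating a product, it is worth checking whether an update like this can be done by an end user or requires vendor involvement.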
This blog has outlined some important criteria to consider in selecting a geotagging product. Accuracy is the most important overall, but other capabilities, such as geotagging expressions more complex than plain place names, handling foreign languages, or easily creating new gazetteer data, are critical to some applications.