
A Study on How Location Information is Expressed in Tweets and Geo-locating Tweets with Less Ambiguity

Sunshin Lee, Mohamed M. Farag, Edward A. Fox


Since many tweets discuss events that have an associated location, it is important to geo-locate those tweets whenever possible. This paper describes how latent location indicative information can help geo-locate tweets with less ambiguity.
Many researchers focus on tweets that have geo-coordinates, but such tweets often account for only around 2% of all tweets, which we show is insufficient to accurately predict national trends. Other researchers focus only on tweets that contain geonames. Further, many studies employ only small datasets, e.g., ones covering several months. Finding a tweet's location is often challenging because of location ambiguity, e.g., “Washington” matches 88 different locations in the U.S.A.
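One simple way to quantify this kind of ambiguity is to count how many distinct gazetteer entries a place name matches. The sketch below assumes a small, hypothetical in-memory gazetteer (`GAZETTEER`) with illustrative entries only; a real system would draw on a full gazetteer, and this is not necessarily how the authors measured ambiguity.

```python
from collections import defaultdict

# Hypothetical toy gazetteer of (place name, state) locations.
# Entries are illustrative only, not a real gazetteer extract.
GAZETTEER = [
    ("Washington", "DC"),
    ("Washington", "PA"),
    ("Washington", "NC"),
    ("Springfield", "IL"),
    ("Springfield", "MA"),
    ("Blacksburg", "VA"),
]

# Index: lower-cased place name -> set of matching locations.
INDEX = defaultdict(set)
for name, state in GAZETTEER:
    INDEX[name.lower()].add((name, state))

def candidates(place_name):
    """Return all gazetteer locations matching a name (empty set if unknown)."""
    # dict.get does not insert missing keys into the defaultdict.
    return INDEX.get(place_name.lower(), set())

def ambiguity(place_name):
    """Number of candidate locations; 1 means the name is unambiguous."""
    return len(candidates(place_name))
```

With this toy data, `ambiguity("Washington")` is 3, `ambiguity("Blacksburg")` is 1, and an unknown name yields 0.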
Furthermore, tweets often contain insufficient explicit location information, because they are brief and often incomplete due to the “140 character” limit. However, location indicative words may carry latent location information. For example, “Water main break near White House” relates to the location “1600 Pennsylvania Ave NW, Washington, DC 20500,” indicated by the keyword ‘White House’.
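The role of a location indicative word can be sketched as narrowing the candidate set produced by a geoname, or supplying a location when no geoname is present. In this minimal sketch both `GAZETTEER` and the keyword table `INDICATIVE` are hypothetical toy data, matching is naive substring search, and the paper does not prescribe this data structure.

```python
# Hypothetical toy gazetteer: lower-cased geoname -> candidate (name, state) locations.
GAZETTEER = {
    "washington": {("Washington", "DC"), ("Washington", "PA"), ("Washington", "NC")},
}

# Hypothetical table: location indicative word -> location it implies.
INDICATIVE = {
    "white house": {("Washington", "DC")},
}

def geolocate(text):
    """Return candidate locations for a tweet, narrowed by indicative words."""
    lowered = text.lower()  # naive substring matching, no tokenization
    geoname_hits = set()
    for name, locations in GAZETTEER.items():
        if name in lowered:
            geoname_hits |= locations
    implied = set()
    for word, locations in INDICATIVE.items():
        if word in lowered:
            implied |= locations
    if not implied:
        return geoname_hits          # geonames only (possibly ambiguous)
    if not geoname_hits:
        return implied               # indicative word alone supplies the location
    narrowed = geoname_hits & implied
    # Fall back to the geoname candidates if the two sources disagree.
    return narrowed if narrowed else geoname_hits
```

Here “Water main break in Washington” still yields three candidates, while adding ‘White House’ to the text, or using ‘White House’ alone, resolves it to Washington, DC.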
We studied over 6 million tweets about water main breaks, sinkholes, potholes, car crashes, and car accidents, covering 17 months. We found that up to 91.1% of tweets have at least one type of direct location information (geo-coordinates or geonames) or location indicative words. In most cases, adding location indicative words helps geo-locate tweets with significantly less ambiguity. We also studied state-level labeling of events' locations based on event-related tweets. The results were relatively consistent with those obtained from tweets that were unambiguously geo-coded at the state level, but differed significantly from those based on geo-coordinates.
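The coverage measurement and the state-level labeling described above can be sketched together. Assumptions, all hypothetical: each tweet is a dict with `text` and an optional `coords` field; `GEONAMES` and `INDICATIVE` map a lower-cased term to the set of states it could refer to; a tweet counts as unambiguously geo-coded at the state level only when its terms narrow to exactly one state; and state labels are tallied per state. The authors' actual pipeline may differ.

```python
from collections import Counter

# Hypothetical toy lookup tables: lower-cased term -> set of candidate states.
GEONAMES = {"washington": {"DC", "PA", "NC"}, "blacksburg": {"VA"}}
INDICATIVE = {"white house": {"DC"}}

def location_info_types(tweet):
    """Return which kinds of location information a tweet carries."""
    text = tweet["text"].lower()
    types = set()
    if tweet.get("coords") is not None:
        types.add("geo-coordinates")
    if any(name in text for name in GEONAMES):
        types.add("geoname")
    if any(word in text for word in INDICATIVE):
        types.add("indicative")
    return types

def coverage(tweets):
    """Fraction of tweets with at least one type of location information."""
    if not tweets:
        return 0.0
    return sum(1 for t in tweets if location_info_types(t)) / len(tweets)

def state_of(tweet):
    """Return a state only if the text's terms narrow it to exactly one.

    Coordinates are ignored here; reverse geocoding is out of scope.
    """
    text = tweet["text"].lower()
    terms = {**GEONAMES, **INDICATIVE}
    candidate_sets = [states for term, states in terms.items() if term in text]
    if not candidate_sets:
        return None
    common = set.intersection(*candidate_sets)
    return next(iter(common)) if len(common) == 1 else None

def label_event_states(tweets):
    """Count unambiguously state-coded tweets per state."""
    return Counter(s for s in map(state_of, tweets) if s is not None)
```

For example, a tweet mentioning only “Washington” stays unlabeled, while one mentioning “White House” is labeled DC.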


This work is licensed under a Creative Commons Attribution 3.0 License.