Home
geo-tag is a module that tags each tweet (in JSON format) with geo-information. The geo-information includes stateID, stateName, countyID, countyName, cityID, cityName, coordinate (longitude, latitude), and source, which records the field the information was inferred from. (We currently consider only locations within the U.S.)
Cloudberry and other clients need the geo-information in the tweets to implement the functions that depend on it.
To infer the geo-information related to each tweet, we use the following strategy:
- Extract the coordinate in two steps. First, check the `coordinates` field and use it if it is present; in that case we set `coordinate_source` to `coordinates`. If it is none, take the second step: read the `coordinates` of the `bounding_box` field and pick a random point from the polygon (a rectangle); in that case we set `coordinate_source` to `bounding_box`. Three modes are available, and the default is UNIFORM_DISTRIBUTION_RANDOM.
- To infer the city, county, and state information, we first check the `place` field in the tweet to get the full cityName and infer the other information from `city.json`, so the `source` is `place`.
- If the `place` field is none, we check whether we have a coordinate. If so, we use STRTREE and `bounding_box` to infer the location from the longitude and latitude, so the `source` is `coordinate`.
- If we do not have a coordinate, we check the `location` field in the `user` field and infer the other information, including an inferred coordinate, from `city.json`, so the `source` is `user` and the `coordinate_source` is `user_location`.
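
The fallback order above can be sketched roughly as follows. This is a minimal illustration, not the module's actual code: the helper names `extract_coordinate` and `choose_source` are hypothetical, the sketch assumes the standard tweet layout in which `bounding_box` sits inside `place` and coordinates are stored as (longitude, latitude), and it samples uniformly in longitude/latitude, which may or may not match the module's UNIFORM_DISTRIBUTION_RANDOM mode. The `city.json` and STRTREE lookups are left out.

```python
import random


def extract_coordinate(tweet, rng=random):
    """Return ((longitude, latitude), coordinate_source), or (None, None).

    Hypothetical helper: it only mirrors the two-step rule described above.
    """
    # Step 1: an exact point in the `coordinates` field.
    point = tweet.get("coordinates")
    if point and point.get("coordinates"):
        lon, lat = point["coordinates"]
        return (lon, lat), "coordinates"

    # Step 2: a random point inside the rectangular bounding box
    # (assumed here to live at tweet["place"]["bounding_box"]).
    place = tweet.get("place") or {}
    ring = (place.get("bounding_box") or {}).get("coordinates")
    if ring:
        lons = [p[0] for p in ring[0]]
        lats = [p[1] for p in ring[0]]
        lon = rng.uniform(min(lons), max(lons))
        lat = rng.uniform(min(lats), max(lats))
        return (lon, lat), "bounding_box"

    return None, None


def choose_source(tweet, coordinate):
    """Return the `source` the city/county/state lookup would rely on."""
    if tweet.get("place"):
        return "place"
    if coordinate is not None:
        return "coordinate"
    if (tweet.get("user") or {}).get("location"):
        return "user"
    return None


# Illustrative tweet; the values are made up for the example.
sample = {
    "coordinates": None,
    "place": {
        "full_name": "Irvine, CA",
        "bounding_box": {
            "type": "Polygon",
            "coordinates": [[[-117.87, 33.60], [-117.68, 33.60],
                             [-117.68, 33.77], [-117.87, 33.77]]],
        },
    },
    "user": {"location": "Irvine, CA"},
}
coordinate, coordinate_source = extract_coordinate(sample)
print(coordinate_source, choose_source(sample, coordinate))
```

Passing a seeded `random.Random` instance as `rng` makes the bounding-box sampling reproducible, which is convenient when testing.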
The module provides a class named TwitterJSONTagger, and its tag_one_tweet function is the interface for using this module. It takes a tweet (in JSON format) as its input parameter.
