Filtering Tweets by location
Introduction
When working with Tweet data, there are two classes of geographical metadata:
- Tweet location - Available when the user shares their location at the time of the Tweet.
- Account location - Based on the ‘home’ location provided by the user in their public profile. This is a free-form character field and may or may not contain metadata that can be geo-referenced.
These are described separately in the next two sections.
Important Notes:
- Geographical coordinates are provided in the [LONG, LAT] order. The one exception is the deprecated ‘geo’ attribute, which has the reverse [LAT, LONG] order.
- All PowerTrack Geo-Operators expect coordinates in the [LONG, LAT] order.
Tweet locations ("geo-tagged" Tweets)
Twitter enables users to specify a location for individual Tweets. PowerTrack offers multiple ways to filter for Tweets by Tweet-specific location data through its various operators (see our documentation for Twitter PowerTrack Operators for details). Tweet-specific location information falls into two general categories:
- Tweets with a specific latitude/longitude “Point” coordinate
- Tweets with a Twitter “Place” (see our blog post on Twitter Places: More Context For Your Tweets and our documentation on Twitter geo objects for more information).
Tweets with a Point coordinate come from GPS-enabled devices, and represent the exact GPS location of the Tweet in question. This type of location does not contain any contextual information about the GPS location being referenced (e.g. associated city, country, etc.), unless the exact location can be associated with a Twitter Place.
Tweets with a Twitter “Place” contain a polygon, consisting of 4 lon-lat coordinates that define the general area (the “Place”) from which the user is posting the Tweet. Additionally, the Place will have a display name, type (e.g. city, neighborhood), and country code corresponding to the country where the Place is located, among other fields.
Important note: Retweets cannot have a Place attached to them, so if you use an operator such as has:geo, you will not match any Retweets.
Twitter Place JSON
Below is example JSON from a Tweet geo-tagged with the "Boulder, CO" Twitter Place.
{
"place": {
"id": "fd70c22040963ac7",
"url": "https:\/\/api.x.com\/1.1\/geo\/id\/fd70c22040963ac7.json",
"place_type": "city",
"name": "Boulder",
"full_name": "Boulder, CO",
"country_code": "US",
"country": "United States",
"contained_within": [
],
"bounding_box": {
"type": "Polygon",
"coordinates": [
[
[-105.301758, 39.964069],
[-105.301758, 40.094551],
[-105.178142, 40.094551],
[-105.178142, 39.964069]
]
]
},
"attributes": {}
}
}
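As a rough illustration of how the Place polygon above can be read, the sketch below reduces the four [LONG, LAT] corners to west, south, east and north limits. The function name place_bounds is hypothetical and not part of any Twitter library; the sample dictionary simply repeats the Boulder values shown above.

```python
from typing import Dict, List, Tuple


def place_bounds(place: Dict) -> Tuple[float, float, float, float]:
    """Return (west, south, east, north) for a Place's bounding_box polygon."""
    # The polygon is a list of rings; the Place box uses a single ring.
    ring: List[List[float]] = place["bounding_box"]["coordinates"][0]
    lons = [vertex[0] for vertex in ring]  # each vertex is [LONG, LAT]
    lats = [vertex[1] for vertex in ring]
    return min(lons), min(lats), max(lons), max(lats)


# Sample data copied from the "Boulder, CO" Place above.
boulder = {
    "bounding_box": {
        "type": "Polygon",
        "coordinates": [[
            [-105.301758, 39.964069],
            [-105.301758, 40.094551],
            [-105.178142, 40.094551],
            [-105.178142, 39.964069],
        ]],
    }
}

west, south, east, north = place_bounds(boulder)
```

The west/south/east/north ordering is chosen here because it matches the argument order of the bounding_box: operator described later.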
Exact location JSON
In the case of Twitter’s enriched native format, the root level “geo” and “coordinates” attributes provide the decimal degree coordinates for the exact location. Tweets containing this metadata can also include “Twitter Place” data described above, although the presence of both is not guaranteed.
Note that the “coordinates” attribute is formatted as [longitude, latitude], while the “geo” attribute is formatted as [latitude, longitude].
{
"geo": {
"type": "Point",
"coordinates": [40.0160921, -105.2812196]
},
"coordinates": {
"type": "Point",
"coordinates": [-105.2812196, 40.0160921]
}
}
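Because the two attributes use opposite coordinate orders, it can help to normalize them in one place. The following is a minimal sketch, assuming the Tweet has already been parsed into a Python dictionary; tweet_lon_lat is a hypothetical helper name, not an API provided by Twitter.

```python
from typing import Dict, Optional, Tuple


def tweet_lon_lat(tweet: Dict) -> Optional[Tuple[float, float]]:
    """Return the exact Tweet location as (longitude, latitude), or None."""
    coords = tweet.get("coordinates")
    if coords and coords.get("coordinates"):
        lon, lat = coords["coordinates"]  # already [LONG, LAT]
        return lon, lat
    geo = tweet.get("geo")
    if geo and geo.get("coordinates"):
        lat, lon = geo["coordinates"]  # deprecated attribute: [LAT, LONG]
        return lon, lat
    return None  # no exact location on this Tweet


# Sample data copied from the exact location JSON above.
tweet = {
    "geo": {"type": "Point", "coordinates": [40.0160921, -105.2812196]},
    "coordinates": {"type": "Point", "coordinates": [-105.2812196, 40.0160921]},
}
location = tweet_lon_lat(tweet)
```

The helper prefers “coordinates” and only falls back to the deprecated “geo” attribute, so the result is always in the [LONG, LAT] order that the PowerTrack geo operators expect.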
Tweet location operators
place:
Filter for specific Places by their name or ID. To discover “Places” associated with a specific area, use Twitter’s reverse_geocode endpoint in the REST API. Then use the Place IDs you find with the place: operator to track Tweets that include the specific Place being referenced. If you use the Place name rather than the ID, ensure that you quote any names that include spaces or punctuation.
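The quoting advice can be sketched as a small helper. This is an illustration only: place_clause is a hypothetical name, and the rule that anything other than letters, digits and underscores triggers quoting is an assumption made for this sketch rather than documented PowerTrack behavior.

```python
import re


def place_clause(name_or_id: str) -> str:
    """Build a place: clause, quoting values with spaces or punctuation."""
    if '"' in name_or_id:
        # Assumption: embedded double quotes are rejected here rather than escaped.
        raise ValueError("double quotes are not handled by this sketch")
    if re.fullmatch(r"\w+", name_or_id):
        return f"place:{name_or_id}"  # plain token or Place ID, no quotes needed
    return f'place:"{name_or_id}"'


by_id = place_clause("fd70c22040963ac7")
by_name = place_clause("Boulder, CO")
```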
place_country:
Each Twitter “Place” comes with a country code, indicating the country in which the Place is located. The place_country: operator allows you to filter on this two-character ISO 3166-1 alpha-2 code (for example, US).
has:geo
The has:geo operator matches for the presence of either Point or Place geo information within the Twitter payload. Note that this does not allow you to specify particular locations or types of geo data; it simply requires that results have Tweet-specific location information of some kind.
point_radius:
The point_radius: operator allows you to specify a circular geographic area and match Tweets containing Tweet-specific location data that fall within that area. To use, define a central lon-lat coordinate, and then set the radius (up to 25 miles). Any Tweet containing a geo Point that falls within this region will be matched. Additionally, Tweets containing Twitter Places will match where the geo polygon defined for the Place falls fully within the defined point-radius area. Places whose polygons fall outside the defined point-radius area to any extent will not match.
Usage resembles the following: point_radius:[lon lat radius]
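To make the point-radius behavior concrete, here is a minimal sketch that builds the clause and approximates the "fully within" test for a Place polygon. The function names are hypothetical, the Earth radius is a rounded mean value, and checking only the four polygon corners with a haversine distance is an assumption for illustration; it is not a description of how PowerTrack evaluates matches internally.

```python
import math
from typing import List

MAX_RADIUS_MI = 25.0      # limit stated for point_radius:
EARTH_RADIUS_MI = 3958.8  # mean Earth radius, an approximation


def point_radius_clause(lon: float, lat: float, radius_mi: float) -> str:
    """Build a point_radius: clause in [LONG LAT radius] order."""
    if not (-180 <= lon <= 180 and -90 <= lat <= 90):
        raise ValueError("expected [LONG, LAT] in decimal degrees")
    if not (0 < radius_mi <= MAX_RADIUS_MI):
        raise ValueError("radius must be greater than 0 and at most 25 miles")
    return f"point_radius:[{lon} {lat} {radius_mi}mi]"


def haversine_mi(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in miles between two [LONG, LAT] points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MI * math.asin(math.sqrt(a))


def place_within_radius(ring: List[List[float]], lon: float, lat: float, radius_mi: float) -> bool:
    """Approximate 'polygon falls fully within the circle' by testing every corner."""
    return all(haversine_mi(lon, lat, v[0], v[1]) <= radius_mi for v in ring)


# Corners of the Boulder Place polygon shown earlier.
BOULDER_RING = [
    [-105.301758, 39.964069],
    [-105.301758, 40.094551],
    [-105.178142, 40.094551],
    [-105.178142, 39.964069],
]

clause = point_radius_clause(-105.292778, 40.019444, 25)
boulder_inside = place_within_radius(BOULDER_RING, -105.292778, 40.019444, 25)
```

With a much smaller radius the same Place would no longer be fully contained, which corresponds to the case above where a Place is not matched.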
bounding_box:
The bounding_box: operator allows you to specify a 4-sided geographic area and match Tweets containing Tweet-specific location data that fall within that area. To use, define lon-lat coordinates for the southwest and northeast corners of the box, such that each side of the box is up to 25 miles in length. Any Tweet containing a geo Point that falls within this region will be matched. Additionally, Tweets containing Twitter Places will match where the geo polygon defined for the Place falls fully within the defined bounding box. Places whose polygons fall outside the bounding box to any extent will not match.
Usage resembles the following: bounding_box:[west_long south_lat east_long north_lat]
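The 25-mile side limit can be checked before a rule is submitted. The sketch below is one way to do that, assuming a flat-Earth approximation of roughly 69.09 miles per degree of latitude; bounding_box_clause is a hypothetical helper, and boxes that cross the antimeridian are not handled.

```python
import math

MILES_PER_DEGREE = 69.09  # approximate length of one degree of latitude
MAX_SIDE_MI = 25.0        # limit stated for bounding_box:


def bounding_box_clause(west: float, south: float, east: float, north: float) -> str:
    """Build a bounding_box: clause after a rough check of the side lengths."""
    if not (west < east and south < north):
        raise ValueError("expected west < east and south < north")
    height_mi = (north - south) * MILES_PER_DEGREE
    # The box is widest along the edge closest to the equator.
    if south <= 0 <= north:
        widest_lat = 0.0
    else:
        widest_lat = min(abs(south), abs(north))
    width_mi = (east - west) * MILES_PER_DEGREE * math.cos(math.radians(widest_lat))
    if height_mi > MAX_SIDE_MI or width_mi > MAX_SIDE_MI:
        raise ValueError(
            f"box is about {width_mi:.1f} x {height_mi:.1f} miles; each side must be 25 miles or less"
        )
    return f"bounding_box:[{west} {south} {east} {north}]"


clause = bounding_box_clause(-105.301758, 39.964069, -105.178505, 40.09455)
```

Because the distance estimate is approximate, a box very close to the limit may pass this check and still be rejected, or the reverse.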
Profile locations (account "home")
Another way to filter Tweets by location is to match on location information in a Twitter user’s profile. Several data fields fall into this category, and all of them are set by the user at the account level. Users change these values infrequently, and the values do not necessarily represent the location the user is currently Tweeting from, although they may.
In addition to the profile location provided by Twitter, Gnip provides an optional Profile Geo enrichment that formalizes the data in the profile location, and makes it more convenient to filter.
Profile location metadata
"user": {
"location": "Denver, CO",
"description": "Part-time fiddler, wanderer, yogi, scubadiver #savegamehenge Full time explorer, festvarian, music lover, nerd, @TwitterBoulder",
"created_at": "Wed Aug 05 05:46:48 +0000 2009",
"utc_offset": null,
"time_zone": "null",
"geo_enabled": true,
"lang": "en"
}
With the Profile Geo enrichment enabled, the above Twitter Profile Location results in the following user.derived.locations attribute in the root-level user object. Note that the user.derived.locations attribute is defined as an array of locations. While only one location is currently provided, the Profile Geo enrichment may in the future be able to resolve multiple locations mentioned in a user’s Profile Location. See the Profile Geo enrichment documentation for more information.
Profile geo metadata
Note that all Profile Geo coordinates are provided in the [Longitude, Latitude] order.
"user": {
"location": "Denver, CO",
"description": "Part-time fiddler, wanderer, yogi, scubadiver #savegamehenge Full time explorer, festvarian, music lover, nerd, @TwitterBoulder",
"derived": {
"locations": [{
"country": "United States",
"country_code": "US",
"locality": "Denver",
"region": "Colorado",
"sub_region": "Denver County",
"full_name": "Denver, Colorado, United States",
"geo": {
"coordinates": [-104.9847, 39.73915],
"type": "point"
}
}]
},
"created_at": "Wed Aug 05 05:46:48 +0000 2009",
"utc_offset": null,
"time_zone": "null",
"geo_enabled": true,
"lang": "en"
}
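Since user.derived.locations is an array and is only present when the enrichment resolved the profile location, consuming code should tolerate its absence. A minimal sketch, assuming the user object has been parsed into a Python dictionary; both function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def profile_locations(user: Dict) -> List[Dict]:
    """Return the Profile Geo locations, or an empty list if none were derived."""
    derived = user.get("derived") or {}
    return derived.get("locations") or []


def first_profile_lon_lat(user: Dict) -> Optional[Tuple[float, float]]:
    """Return (longitude, latitude) of the first derived location that has a point."""
    for location in profile_locations(user):
        coords = (location.get("geo") or {}).get("coordinates")
        if coords:
            lon, lat = coords  # Profile Geo uses [LONG, LAT]
            return lon, lat
    return None


# Trimmed copy of the Profile Geo sample above.
user = {
    "location": "Denver, CO",
    "derived": {
        "locations": [{
            "country_code": "US",
            "locality": "Denver",
            "region": "Colorado",
            "geo": {"coordinates": [-104.9847, 39.73915], "type": "point"},
        }]
    },
}

denver = first_profile_lon_lat(user)
```

Iterating over the array rather than indexing the first element keeps the code working if multiple locations are returned in the future.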
Profile location operators
Profile Geo Operators
The following Operators are available for building rules if you have the Profile Geo enrichment enabled:
has:profile_geo
This filter matches for the presence of Profile Geo enrichment data in a specific Tweet, regardless of the value. This will only match Tweets where the user's "home" setting was successfully geo-referenced to at least the country level. For example, a Tweet from a user with an account home set to "the internet" will not match this Operator, but a home of "USA" will.
profile_country:
Matches Tweets where Gnip’s Profile Geo enrichment data is available, and contains the defined country code. Note: this will only match Tweets where Gnip has been able to provide formal geographic information for the profile location provided by the Twitter user, consistent with the description of the Profile Geo enrichment.
profile_region:
Matches Tweets where Gnip’s Profile Geo enrichment data is available, and includes the specified “region.” Note that profile_region: will perform an exact string match. This will only match Tweets where Gnip has been able to provide formal geographic information for the profile location provided by the Twitter user, consistent with the description of the Profile Geo enrichment.
profile_locality:
Matches Tweets where Gnip’s Profile Geo enrichment data is available, and includes the specified “locality.” Note that profile_locality: will perform an exact string match. This will only match Tweets where Gnip has been able to provide formal geographic information for the profile location provided by the Twitter user, consistent with the description of the Profile Geo enrichment.
profile_subregion:
Matches Tweets where Gnip’s Profile Geo enrichment data is available, and includes the “subRegion” field from the “address” object. In addition to targeting specific counties, this operator can be helpful to filter on a metro area without defining filters for every city and town within the region. This will only match Tweets where Gnip has been able to provide formal geographic information for the profile location provided by the Twitter user, consistent with the description of the Profile Geo enrichment.
Standard profile location operators
The following Operators are available for filtering on location mentions in a user’s Twitter Profile Location and are not dependent on having the Profile Geo enrichment enabled:
bio_location:
The bio_location: operator performs a tokenized match against the user’s account-level location field. Note that this field is user-generated and does not necessarily reflect an actual location, and generally does not change from Tweet to Tweet.
Other geo operators
The following operator filters on fields which, while not explicitly location fields, may contain account-based location information.
bio:
Matches a keyword or phrase within the user bio of a Tweet. This is a tokenized match within the contents of the 'description' field within the User object.
Usage examples
In many cases, you may want to pull in as much data as possible related to a location, regardless of whether this information exists as Tweet-specific information, or information contained in the user’s account metadata. Below are examples of how the various operators can be combined in a rule to catch more types of data. In these examples, the rule is looking for mentions of the hashtag ‘#FlagstaffFire’ that contain various types of location data.
- Geo-tagged Tweets within a bounding box or associated with a Twitter Place that mentions “Boulder”:
#FlagstaffFire (bounding_box:[-105.301758 39.964069 -105.178505 40.09455] OR place:Boulder)
- Tweets from users that have Profile Locations that mention Boulder, CO and not Boulder, NV, using standard Profile Location Operators:
#FlagstaffFire bio_location:boulder -(bio_location:nevada OR bio_location:", NV")
- Tweets from users that have Profile Locations that mention Boulder, CO and not Boulder, NV, using Profile Geo Operators:
#FlagstaffFire profile_locality:boulder profile_region:colorado
- Tweets with any hint of coming from the Boulder, CO area using Profile Geo Operators:
#FlagstaffFire (point_radius:[-105.292778 40.019444 25mi] OR place:"Boulder, CO" OR (profile_locality:boulder profile_region:colorado))
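If rules like these are generated programmatically, the grouping can be kept explicit with a couple of small helpers. This is only a sketch: all_of and any_of are hypothetical names, and it relies on the convention visible in the examples above that a space between clauses means AND while OR is written out.

```python
def all_of(*clauses: str) -> str:
    """Join clauses with spaces, which the rules above use as an implicit AND."""
    return " ".join(clauses)


def any_of(*clauses: str) -> str:
    """Join clauses with OR and wrap the group in parentheses."""
    return "(" + " OR ".join(clauses) + ")"


# Rebuild the "any hint of Boulder, CO" combination from its parts.
profile_match = "(" + all_of("profile_locality:boulder", "profile_region:colorado") + ")"
rule = all_of(
    "#FlagstaffFire",
    any_of(
        "point_radius:[-105.292778 40.019444 25mi]",
        'place:"Boulder, CO"',
        profile_match,
    ),
)
```

Wrapping every OR group in parentheses avoids relying on operator precedence when clauses are combined.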