Twitter entities
Introduction
Entities provide metadata and additional contextual information about content posted on Twitter. The entities section provides arrays of common things included in Tweets: hashtags, user mentions, links, stock tickers (symbols), Twitter polls, and attached media. These arrays are convenient for developers when ingesting Tweets, since Twitter has essentially pre-processed, or pre-parsed, the text body. Instead of needing to explicitly search and find these entities in the Tweet body, your parser can go straight to this JSON section and there they are.
Beyond providing parsing conveniences, the entities section also provides useful ‘value-add’ metadata. For example, if you are using the Enhanced URLs enrichment, URL metadata include fully-expanded URLs, as well as associated website titles and descriptions. Another example is user mentions: the entities metadata include the numeric user ID, which is useful when making requests to many Twitter APIs.
Every Tweet JSON payload includes an entities section, with the minimum set of hashtags, urls, user_mentions, and symbols attributes, even if none of those entities are part of the Tweet message. For example, if you examine the JSON for a Tweet with a body of “Hello World!” and no attached media, the Tweet’s JSON will include the following content with entity arrays containing zero items:
"entities": {
"hashtags": [
],
"urls": [
],
"user_mentions": [
],
"symbols": [
]
}
Notes:
- media and polls entities will only appear when that type of content is part of the Tweet.
- if you are working with native media (photos, videos, or GIFs), the Extended Entities object is the way to go.
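As a minimal sketch of what the notes above imply for a parser, the Python below reads the four always-present arrays directly and treats media and polls as optional. The function name entity_counts and the trimmed "Hello World!" payload are illustrative assumptions, not part of any Twitter library.

```python
from typing import Any, Dict

# Entity arrays described above as always present, even when empty.
ALWAYS_PRESENT = ("hashtags", "urls", "user_mentions", "symbols")
# Entity arrays that only appear when the Tweet has that content.
OPTIONAL = ("media", "polls")


def entity_counts(tweet: Dict[str, Any]) -> Dict[str, int]:
    """Return the number of items in each entity array of a Tweet payload."""
    entities = tweet.get("entities", {})
    counts = {}
    for key in ALWAYS_PRESENT + OPTIONAL:
        # A default of [] keeps the optional arrays from raising KeyError.
        counts[key] = len(entities.get(key, []))
    return counts


# Hypothetical, trimmed payload for a "Hello World!" Tweet with no media.
hello_world = {
    "text": "Hello World!",
    "entities": {"hashtags": [], "urls": [], "user_mentions": [], "symbols": []},
}
counts = entity_counts(hello_world)
```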
Entities object
The entities and extended_entities sections are both made up of arrays of entity objects. Below you will find descriptions for each of these entity objects, including data dictionaries that describe the object attribute names, types, and short description. We’ll also indicate which PowerTrack Operators match these attributes, and include some sample JSON payloads.
A collection of common entities found in Tweets, including hashtags, links, and user mentions. This entities object does include a media attribute, but its implementation in the entities section is only completely accurate for Tweets with a single photo. For all Tweets with more than one photo, a video, or animated GIF, the reader is directed to the extended_entities section.
Entities data dictionary
The entities object is a holder of arrays of other entity sub-objects. After illustrating the entities structure, data dictionaries for these sub-objects, and the Operators that match them, will be provided.
Field | Type | Description |
---|---|---|
hashtags | Array of Hashtag Objects | Represents hashtags which have been parsed out of the Tweet text. Example:
{
"hashtags": [
{
"indices": [
32,
38
],
"text": "nodejs"
}
]
}
|
media | Array of Media Objects | Represents media elements uploaded with the Tweet. Example:
{
"media": [
{
"display_url": "pic.twitter.com/5J1WJSRCy9",
"expanded_url": "https://twitter.com/nolan_test/status/930077847535812610/photo/1",
"id": 930077847535812610,
"id_str": "930077847535812610",
"indices": [
13,
36
],
"media_url": "http://pbs.twimg.com/media/DOhM30VVwAEpIHq.jpg",
"media_url_https": "https://pbs.twimg.com/media/DOhM30VVwAEpIHq.jpg",
"sizes": {
"thumb": {
"h": 150,
"resize": "crop",
"w": 150
},
"large": {
"h": 1366,
"resize": "fit",
"w": 2048
},
"medium": {
"h": 800,
"resize": "fit",
"w": 1200
},
"small": {
"h": 454,
"resize": "fit",
"w": 680
}
},
"type": "photo",
"url": "https://t.co/5J1WJSRCy9"
}
]
}
|
urls | Array of URL Objects | Represents URLs included in the text of a Tweet. Example (without Enhanced URLs enrichment enabled):
{
"urls": [
{
"indices": [
32,
52
],
"url": "http://t.co/IOwBrTZR",
"display_url": "youtube.com/watch?v=oHg5SJ…",
"expanded_url": "http://www.youtube.com/watch?v=oHg5SJYRHA0"
}
]
}
Example (with Enhanced URLs enrichment enabled):
{"urls": [
{
"url": "https://t.co/D0n7a53c2l",
"expanded_url": "http://bit.ly/18gECvy",
"display_url": "bit.ly/18gECvy",
"unwound": {
"url": "https://www.youtube.com/watch?v=oHg5SJYRHA0",
"status": 200,
"title": "RickRoll'D",
"description": "http://www.facebook.com/rickroll548 As long as trolls are still trolling, the Rick will never stop rolling."
},
"indices": [
62,
85
]
}
]
}
|
user_mentions | Array of User Mention Objects | Represents other Twitter users mentioned in the text of the Tweet. Example:
{
"user_mentions": [
{
"name": "Twitter API",
"indices": [
4,
15
],
"screen_name": "twitterapi",
"id": 6253282,
"id_str": "6253282"
}
]
}
|
symbols | Array of Symbol Objects | Represents symbols, i.e. $cashtags, included in the text of the Tweet. Example:
{
"symbols": [
{
"indices": [
12,
17
],
"text": "twtr"
}
]
}
|
polls | Array of Poll Objects | Represents Twitter Polls included in the Tweet. Example:
{"polls": [
{
"options": [
{
"position": 1,
"text": "I read documentation once."
},
{
"position": 2,
"text": "I read documentation twice."
},
{
"position": 3,
"text": "I read documentation over and over again."
}
],
"end_datetime": "Thu May 25 22:20:27 +0000 2017",
"duration_minutes": 60
}
]
}
|
Hashtag object
The entities section will contain a hashtags array containing an object for every hashtag included in the Tweet body, and include an empty array if no hashtags are present.
The PowerTrack # Operator is used to match on the text attribute. The has:hashtags Operator will match if there is at least one item in the array.
Field | Type | Description |
indices | Array of Int | An array of integers indicating the offsets within the Tweet text where the hashtag begins and ends. The first integer represents the location of the # character in the Tweet text string. The second integer represents the location of the first character after the hashtag. Therefore the difference between the two numbers will be the length of the hashtag name plus one (for the ‘#’ character). Example:
"indices":[32,38]
|
text | String | Name of the hashtag, minus the leading ‘#’ character. Example:
"text":"nodejs"
|
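To illustrate the offsets described above, here is a small Python sketch that slices a Tweet's text with an entity's indices. The Tweet text and entity are hypothetical, and the sketch assumes the offsets count the same units as Python string indices (Unicode code points), which this page does not state explicitly.

```python
def slice_entity(text: str, indices: list) -> str:
    """Return the substring covered by an entity's [start, end) offsets."""
    start, end = indices
    return text[start:end]


tweet_text = "Loving #nodejs"  # hypothetical Tweet text
hashtag = {"indices": [7, 14], "text": "nodejs"}  # hypothetical entity for it

covered = slice_entity(tweet_text, hashtag["indices"])
# The slice includes the leading '#', so it is one character longer
# than the entity's text attribute.
```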
Media object
The entities section will contain a media array containing a single media object if any media object has been ‘attached’ to the Tweet. If no native media has been attached, there will be no media array in the entities section. For the following reasons the extended_entities section should be used to process Tweet native media:
+ Media type will always indicate ‘photo’, even when a video or GIF is attached to the Tweet.
+ Even though up to four photos can be attached, only the first one will be listed in the entities section.
The has:media Operator will match if this array is populated.
Field | Type | Description |
display_url | String | URL of the media to display to clients. Example:
"display_url":"pic.twitter.com/rJC5Pxsu"
|
expanded_url | String | An expanded version of display_url. Links to the media display page. Example:
"expanded_url": "http://twitter.com/yunorno/status/114080493036773378/photo/1"
|
id | Int64 | ID of the media expressed as a 64-bit integer. Example:
"id":114080493040967680
|
id_str | String | ID of the media expressed as a string. Example:
"id_str":"114080493040967680"
|
indices | Array of Int | An array of integers indicating the offsets within the Tweet text where the URL begins and ends. The first integer represents the location of the first character of the URL in the Tweet text. The second integer represents the location of the first non-URL character occurring after the URL (or the end of the string if the URL is the last part of the Tweet text). Example:
"indices":[15,35]
|
media_url | String | An http:// URL pointing directly to the uploaded media file. Example:
"media_url":"http://pbs.twimg.com/media/DOhM30VVwAEpIHq.jpg"
For media in direct messages, it is not possible to access images via an authenticated twitter.com session. Please visit this page to learn how to account for this recent change. You cannot directly embed these images in a web page. See Photo Media URL formatting for how to format a photo's URL. |
media_url_https | String | An https:// URL pointing directly to the uploaded media file, for embedding on https pages. Example:
"media_url_https":"https://p.twimg.com/AZVLmp-CIAAbkyy.jpg"
For media in direct messages, it is not possible to access images via an authenticated twitter.com session. Please visit this page to learn how to account for this recent change. You cannot directly embed these images in a web page. See Photo Media URL formatting for how to format a photo's URL. |
sizes | Size Object | An object showing available sizes for the media file. Example:
{
"sizes": {
"thumb": {
"h": 150,
"resize": "crop",
"w": 150
},
"large": {
"h": 1366,
"resize": "fit",
"w": 2048
},
"medium": {
"h": 800,
"resize": "fit",
"w": 1200
},
"small": {
"h": 454,
"resize": "fit",
"w": 680
}
}
}
See Photo Media URL formatting for how to format a photo's URL. |
source_status_id | Int64 | Nullable. For Tweets containing media that was originally associated with a different tweet, this ID points to the original Tweet. Example:
"source_status_id": 205282515685081088
|
source_status_id_str | String | Nullable. For Tweets containing media that was originally associated with a different Tweet, this string-based ID points to the original Tweet. Example:
"source_status_id_str": "205282515685081088"
|
type | String | Type of uploaded media. Possible types include photo, video, and animated_gif. Example:
"type":"photo"
|
url | String | Wrapped URL for the media link. This corresponds with the URL embedded directly into the raw Tweet text, and the values for the indices parameter. Example:
"url":"http://t.co/rJC5Pxsu"
|
Media size objects
All Tweets with native media (photos, video, and GIFs) will include a set of ‘thumb’, ‘small’, ‘medium’, and ‘large’ sizes with height and width pixel sizes. For photos and preview image media URLs, Photo Media URL formatting specifies how to construct different URLs for loading different sized photo media.
Sizes object
Field | Type | Description |
thumb | Size Object | Information for a thumbnail-sized version of the media. Example:
"thumb":{"h":150, "resize":"crop", "w":150}
Thumbnail-sized photo media will be limited to fill a 150x150 boundary and cropped. |
large | Size Object | Information for a large-sized version of the media. Example:
"large":{"h":1366, "resize":"fit", "w":2048}
Large-sized photo media will be limited to fit within a 2048x2048 boundary. |
medium | Size Object | Information for a medium-sized version of the media. Example:
"medium":{"h":800, "resize":"fit", "w":1200}
Medium-sized photo media will be limited to fit within a 1200x1200 boundary. |
small | Size Object | Information for a small-sized version of the media. Example:
"small":{"h":454, "resize":"fit", "w":680}
Small-sized photo media will be limited to fit within a 680x680 boundary. |
Size object
Field | Type | Description |
w | Int | Width in pixels of this size. Example:
"w":150
|
h | Int | Height in pixels of this size. Example:
"h":150
|
resize | String | Resizing method used to obtain this size. A value of fit means that the media was resized to fit one dimension, keeping its native aspect ratio. A value of crop means that the media was cropped in order to fit a specific resolution. Example:
"resize":"crop"
|
Photo Media URL Formatting
Photo media on Twitter can be loaded in different sizes. It is best to load the smallest size image that is large enough to fit into a particular image viewport. To load different sizes, the Size Object and media_url (or media_url_https) need to be combined in a particular format. We'll use the media entity example object already provided for our example in constructing a photo media URL.
The media_url or media_url_https on their own can be loaded, which will result in the medium variant being loaded by default. It is preferable, however, to provide a fully formatted photo media URL when possible.
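One way to apply the "smallest size that is large enough" advice is sketched below in Python. The helper pick_size_name is a hypothetical name, and restricting the choice to sizes whose resize value is fit (so the cropped thumbnail is skipped) is an assumption made for this sketch, not a rule stated on this page.

```python
from typing import Dict


def pick_size_name(sizes: Dict[str, dict], viewport_w: int, viewport_h: int) -> str:
    """Pick the smallest 'fit' size that covers the viewport, else the biggest."""
    # Only consider variants that keep the native aspect ratio.
    fit = [(name, s) for name, s in sizes.items() if s["resize"] == "fit"]
    # Sort by pixel area so the first match is the smallest adequate size.
    fit.sort(key=lambda item: item[1]["w"] * item[1]["h"])
    for name, s in fit:
        if s["w"] >= viewport_w and s["h"] >= viewport_h:
            return name
    # Nothing is large enough: fall back to the largest available size.
    return fit[-1][0]


# Sizes taken from the media entity example on this page.
sizes = {
    "thumb": {"h": 150, "resize": "crop", "w": 150},
    "large": {"h": 1366, "resize": "fit", "w": 2048},
    "medium": {"h": 800, "resize": "fit", "w": 1200},
    "small": {"h": 454, "resize": "fit", "w": 680},
}
```

The returned name can then be used as the name part of a photo media URL.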
There are three parts of a photo media URL:
Base URL | The base URL is the media URL without the file extension. For example: "media_url_https": "https://pbs.twimg.com/media/DOhM30VVwAEpIHq.jpg", The base URL is then: https://pbs.twimg.com/media/DOhM30VVwAEpIHq |
Format | The format is the type of photo the image is formatted as. Possible formats are jpg or png, which is provided as the extension of the media URL. For example: "media_url_https": "https://pbs.twimg.com/media/DOhM30VVwAEpIHq.jpg", The format is then: jpg |
Name | The name is the field name of the size to load. For example, the name when loading the large-sized photo would be: large |
We take these three parts (base URL, format and name) and combine them into the photo media URL to load. There are 2 formats for loading images this way, legacy and modern. All image loads should stop using the legacy format and use the modern format. Using the modern format will result in better CDN hit rate for the caller, thus improving load latencies by being less likely to have to generate and load the media from the Data Center.
Legacy format | The legacy format is deprecated. Photo media loads should all move to the modern format. <base_url>.<format>:<name> For example: https://pbs.twimg.com/media/DOhM30VVwAEpIHq.jpg:large |
Modern format | The modern format for loading photos was established at Twitter in 2015 and has been the de facto standard since 2017. All photo media loads should move to this format. <base_url>?format=<format>&name=<name> For example: https://pbs.twimg.com/media/DOhM30VVwAEpIHq?format=jpg&name=large Note: the items in the query string for the photo media URL are in alphabetical order. If media loading were to add any additional query items, alphabetical ordering would continue to be necessary. For example, if there were a hypothetical new query item called preferred_format, it would go after format and name in the query string. |
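The three parts described above can be combined with a few lines of Python. This is a sketch only: modern_photo_url is a hypothetical helper name, and it assumes the media URL always ends in a file extension such as .jpg or .png.

```python
def modern_photo_url(media_url: str, name: str) -> str:
    """Build a modern-format photo media URL from a media_url(_https) value."""
    # Split off the extension: the left part is the base URL,
    # the right part is the format (jpg or png).
    base_url, fmt = media_url.rsplit(".", 1)
    # Query items are kept in alphabetical order: format, then name.
    return f"{base_url}?format={fmt}&name={name}"


photo_url = modern_photo_url(
    "https://pbs.twimg.com/media/DOhM30VVwAEpIHq.jpg", "large"
)
```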
URL object
The entities section will contain a urls array containing an object for every link included in the Tweet body, and include an empty array if no links are present.
The has:links Operator will match if there is at least one item in the array. The url: Operator is used to match on the expanded_url attribute. If you are using the Expanded URL enrichment, the url: Operator is used to match on the unwound.url (fully unwound URL) attribute. If you are using the Enhanced URL enrichment, the url_title: and url_description: Operators are used to match on the unwound.title and unwound.description attributes.
Field | Type | Description |
display_url | String | URL pasted/typed into Tweet. Example:
"display_url":"bit.ly/2so49n2"
|
expanded_url | String | Expanded version of display_url. Example:
"expanded_url":"http://bit.ly/2so49n2"
|
indices | Array of Int | An array of integers representing offsets within the Tweet text where the URL begins and ends. The first integer represents the location of the first character of the URL in the Tweet text. The second integer represents the location of the first non-URL character after the end of the URL. Example:
"indices":[30,53]
|
url | String | Wrapped URL, corresponding to the value embedded directly into the raw Tweet text, and the values for the indices parameter. Example:
"url":"https://t.co/yzocNFvJuL"
|
If you are using the Expanded and/or Enhanced URL enrichments, the following metadata is available under the unwound attribute:
Field | Type | Description |
url | String | The fully unwound version of the link included in the Tweet. Example:
"url":"https://blog.twitter.com/en_us/topics/insights/2016/using-twitter-as-a-go-to-communication-channel-during-severe-weather-events.html"
|
status | Int | Final HTTP status of the unwinding process, a '200' indicating success. Example:
"status":200
|
title | String | HTML title for the link. Example:
"title":"Using Twitter as a ‘go-to’ communication channel during severe weather"
|
description | String | HTML description for the link. Example:
"description":"Using Twitter as a ‘go-to’ communication channel during severe weather"
|
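Because the unwound attribute is only present when an enrichment is enabled, a consumer may want a fallback order when choosing which URL to use. The Python sketch below prefers the fully unwound URL, then expanded_url, then the wrapped url; the helper name best_url and that preference order are assumptions for illustration.

```python
def best_url(url_entity: dict) -> str:
    """Return the most fully resolved URL available on a URL entity."""
    # 'unwound' is absent unless a URL enrichment is enabled.
    unwound = url_entity.get("unwound") or {}
    return unwound.get("url") or url_entity.get("expanded_url") or url_entity["url"]


# Entities trimmed from the two urls examples earlier on this page.
plain = {
    "url": "http://t.co/IOwBrTZR",
    "expanded_url": "http://www.youtube.com/watch?v=oHg5SJYRHA0",
}
enriched = {
    "url": "https://t.co/D0n7a53c2l",
    "expanded_url": "http://bit.ly/18gECvy",
    "unwound": {"url": "https://www.youtube.com/watch?v=oHg5SJYRHA0", "status": 200},
}
```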
User mention object
The entities section will contain a user_mentions array containing an object for every user mention included in the Tweet body, and include an empty array if no user mention is present.
The PowerTrack @ Operator is used to match on the screen_name attribute. The has:mentions Operator will match if there is at least one item in the array.
Field | Type | Description |
id | Int64 | ID of the mentioned user, as an integer. Example:
"id":6253282
|
id_str | String | ID of the mentioned user, as a string. Example:
"id_str":"6253282"
|
indices | Array of Int | An array of integers representing the offsets within the Tweet text where the user reference begins and ends. The first integer represents the location of the ‘@’ character of the user mention. The second integer represents the location of the first non-screenname character following the user mention. Example:
"indices":[4,15]
|
name | String | Display name of the referenced user. Example:
"name":"Twitter API"
|
screen_name | String | Screen name of the referenced user. Example:
"screen_name":"twitterapi"
|
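As a short sketch, the mentioned users' IDs can be collected from the array like this in Python. Reading id_str rather than id is a choice made here so the values stay exact if they are later passed through tools that store numbers as floating point; the helper name mentioned_user_ids is hypothetical.

```python
def mentioned_user_ids(tweet: dict) -> list:
    """Return the string IDs of all users mentioned in a Tweet payload."""
    return [m["id_str"] for m in tweet["entities"]["user_mentions"]]


# Trimmed payload built from the user_mentions example on this page.
tweet = {
    "entities": {
        "user_mentions": [
            {
                "name": "Twitter API",
                "indices": [4, 15],
                "screen_name": "twitterapi",
                "id": 6253282,
                "id_str": "6253282",
            }
        ]
    }
}
ids = mentioned_user_ids(tweet)
```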
Symbol object
The entities section will contain a symbols array containing an object for every $cashtag included in the Tweet body, and include an empty array if no symbol is present.
The PowerTrack $ Operator is used to match on the text attribute. The has:symbols Operator will match if there is at least one item in the array.
Field | Type | Description |
indices | Array of Int | An array of integers indicating the offsets within the Tweet text where the symbol/cashtag begins and ends. The first integer represents the location of the $ character in the Tweet text string. The second integer represents the location of the first character after the cashtag. Therefore the difference between the two numbers will be the length of the cashtag name plus one (for the ‘$’ character). Example:
"indices":[12,17]
|
text | String | Name of the cashtag, minus the leading ‘$’ character. Example:
"text":"twtr"
|
Poll object
The entities section will contain a polls array containing a single poll object if the Tweet contains a poll. If no poll is included, there will be no polls array in the entities section.
Note that these Poll metadata are only available with the following Enterprise APIs:
- Volume streams (Decahose)
- Real-time PowerTrack
- Historical PowerTrack
- Twitter Search APIs (Full-Archive Search and 30-Day Search)
Field | Type | Description |
options | Array of Option Object | An array of options, each having a poll position, and the text for that position. Example:
{"options": [
{
"position": 1,
"text": "I read documentation once."
}
]
}
|
end_datetime | String | Time stamp (UTC) of when poll ends. Example:
"end_datetime": "Thu May 25 22:20:27 +0000 2017"
|
duration_minutes | Int | Duration of poll in minutes. Example:
"duration_minutes": 60
|
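The end_datetime string can be parsed with Python's standard library, as sketched below. The format string matches the example value shown above; deriving a start time by subtracting duration_minutes from the end time is an inference made for this sketch, and poll_window is a hypothetical helper name. Note that %a and %b depend on the process locale, so this assumes an English (C) locale.

```python
from datetime import datetime, timedelta, timezone

# Matches values such as "Thu May 25 22:20:27 +0000 2017".
POLL_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def poll_window(poll: dict):
    """Return (start, end) datetimes for a poll entity."""
    end = datetime.strptime(poll["end_datetime"], POLL_TIME_FORMAT)
    # int() tolerates the duration arriving as either a number or a string.
    start = end - timedelta(minutes=int(poll["duration_minutes"]))
    return start, end


poll = {"end_datetime": "Thu May 25 22:20:27 +0000 2017", "duration_minutes": 60}
start, end = poll_window(poll)
```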
Retweet and Quote Tweet details
From the Twitter API perspective, Retweet and Quote Tweets are special kinds of Tweets that contain the original Tweet as an embedded object. So Retweets and Quote Tweet objects are parents of a child 'original' Tweet (and thus double the size). Retweets have a top-level "retweeted_status" object, and Quoted Tweets have a "quoted_status" object. For consistency, these top-level Retweet and Quote Tweet objects also have a text property and associated entities. However, the entities at the top level can differ from the entities provided by the embedded 'original' entities. In case of Retweets, new text is prepended to the original Tweet body. For Quoted Tweets, new text is appended to the Tweet body.
In general, the best practice is to retrieve the text, entities, original author and date from the original Tweet in retweeted_status whenever this exists. An exception is getting Twitter entities that are part of the additive Quote. See below for more details and tips.
Retweets
An important detail with Retweets is that no additional Twitter entities can be added to the Tweet. Users cannot add hashtags, URLs or other details when they Retweet. However, the Retweet (top-level) text attribute is composed of the original Tweet text with “RT @username: ” prepended.
In some cases, especially with accounts with long user names, the combination of these new characters and the original Tweet body can easily exceed the original Tweet text length limit of 140 characters. In order to preserve support for 140 character based display and storage, the top-level body truncates the end of the Tweet body and adds an ellipsis (“…”). Consequently, some top-level entities positioned at the end of the original Tweet might be incorrect or missing, for instance in the case of a truncated hashtag or URL entry.
This Tweet, https://twitter.com/FloodSocial/status/907974220298125312, has the following Tweet text:
Just another test Tweet that needs to be exactly 140 characters with trailing URL and hashtag http://wapo.st/2w8iwPQ #Testing
In the above example, both the URL and hashtag were affected. Since the hashtag was completely truncated and the URL partially truncated, these are missing from the top-level entities. You will also notice the additional user_mentions top-level entity coming from the “RT @floodsocial: ” prefix on the text field.
However, the Tweet text and entities in retweeted_status perfectly reflect the original Tweet with no truncation or incorrect entities, hence our recommendation to rely on the nested retweeted_status object for Retweets.
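The recommendation above can be sketched in Python as a helper that returns the nested original Tweet when one exists. The function name canonical_tweet and the heavily trimmed payload are hypothetical; real payloads carry many more fields.

```python
def canonical_tweet(tweet: dict) -> dict:
    """Return the embedded original Tweet for a Retweet, else the Tweet itself."""
    return tweet.get("retweeted_status", tweet)


# Hypothetical, trimmed Retweet payload: the top level lost the hashtag
# to truncation and gained a user mention from the "RT @..." prefix.
retweet = {
    "text": "RT @floodsocial: Just another test Tweet…",
    "entities": {
        "hashtags": [],
        "urls": [],
        "user_mentions": [{"screen_name": "floodsocial"}],
        "symbols": [],
    },
    "retweeted_status": {
        "text": "Just another test Tweet http://wapo.st/2w8iwPQ #Testing",
        "entities": {
            "hashtags": [{"text": "Testing"}],
            "urls": [{"display_url": "wapo.st/2w8iwPQ"}],
            "user_mentions": [],
            "symbols": [],
        },
    },
}
original = canonical_tweet(retweet)
```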
Quote Tweets
Quote Tweets were introduced in 2015, and differ from Retweets in that when you "quote" a Tweet you are adding new content "on top" of a shared Tweet. This new content can include nearly anything an original Tweet can have, including new text, hashtags, mentions, and URLs.
Quote Tweets can contain native media (photos, videos, and GIFs), which will appear under the entities object.
Since Twitter entities can be added, the Quote entities are likely different from the original entities.
In this example, a new URL and hashtag were positioned at the end of the Quote Tweet.
This Tweet, https://twitter.com/FloodSocial/status/907983973225160704, has the following Tweet text:
strange and equally tragic when islands flood... trans-atlantic testing of quote tweets | @thisuser @thatuser http://bit.ly/2vMMDuu #testing
In this case, the top-level entities do not reflect the Quote details.
However, the Tweet text and entities in extended_tweet perfectly reflect the Quote Tweet with no truncation or incorrect entities, hence our recommendation to rely on the nested extended_tweet object for Quote Tweets.
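A minimal Python sketch of that recommendation follows. It assumes the nested extended_tweet object exposes its text as full_text and its entities as entities, which is not spelled out on this page; the helper name and the trimmed payload are hypothetical.

```python
def quote_text_and_entities(tweet: dict):
    """Prefer text and entities from extended_tweet, else use the top level."""
    extended = tweet.get("extended_tweet")
    if extended:
        text = extended.get("full_text", tweet.get("text"))
        entities = extended.get("entities", tweet.get("entities", {}))
        return text, entities
    return tweet.get("text"), tweet.get("entities", {})


# Hypothetical, trimmed Quote Tweet payload.
quote = {
    "text": "strange and equally tragic when islands flood…",
    "entities": {"hashtags": [], "urls": [], "user_mentions": [], "symbols": []},
    "extended_tweet": {
        "full_text": "strange and equally tragic when islands flood... #testing",
        "entities": {
            "hashtags": [{"text": "testing"}],
            "urls": [],
            "user_mentions": [],
            "symbols": [],
        },
    },
}
text, entities = quote_text_and_entities(quote)
```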
Entities for user object
Entities for User Objects describe URLs that appear in the user-defined profile URL and description fields. They do not describe hashtags or user_mentions. Unlike Tweet entities, user entities can apply to multiple fields within their parent object. To disambiguate, you will find parent nodes called url and description that indicate which field contains the entitized URL.
In this example, the user url field contains a t.co link that is fully expanded within the entities/url/urls[0] node of the response. The user does not have a wrapped URL in their description.
JSON example
{
"id": 6253282,
"id_str": "6253282",
"name": "Twitter API",
"screen_name": "twitterapi",
"location": "San Francisco, CA",
"description": "The Real Twitter API. I tweet about API changes, service issues and happily answer questions about Twitter and our API. Don't get an answer? It's on my website.",
"url": "http:\/\/t.co\/78pYTvWfJd",
"entities": {
"url": {
"urls": [
{
"url": "http:\/\/t.co\/78pYTvWfJd",
"expanded_url": "http:\/\/dev.twitter.com",
"display_url": "dev.twitter.com",
"indices": [
0,
22
]
}
]
},
"description": {
"urls": [
]
}
}
}
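Reading the expanded profile link out of this structure can be sketched in Python as below. The helper name expanded_profile_url is hypothetical, and falling back to the wrapped url field when no entity is present is an assumption made for illustration.

```python
def expanded_profile_url(user: dict):
    """Return the expanded profile URL from a user object, if available."""
    urls = user.get("entities", {}).get("url", {}).get("urls", [])
    if urls:
        return urls[0]["expanded_url"]
    # No entitized URL: fall back to the raw (possibly wrapped) profile URL.
    return user.get("url")


# Trimmed copy of the user object shown above.
user = {
    "screen_name": "twitterapi",
    "url": "http://t.co/78pYTvWfJd",
    "entities": {
        "url": {
            "urls": [
                {
                    "url": "http://t.co/78pYTvWfJd",
                    "expanded_url": "http://dev.twitter.com",
                    "display_url": "dev.twitter.com",
                    "indices": [0, 22],
                }
            ]
        },
        "description": {"urls": []},
    },
}
profile_url = expanded_profile_url(user)
```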
Entities for Direct Messages
Entities for Direct Messages are very similar to entities for Tweets. However, there are a few differences concerning the media entities.
Unlike media shared in Tweets, media shared in Direct Messages requires authorization to view. This authorization can be presented via an authenticated twitter.com session or by signing a request with the User’s access token using OAuth 1.0A.
Also, in Tweets, media URLs are only in the media entities, but in Direct Messages, media URLs are in both media and URLs entities.
JSON example
{
"id": 411031503817039874,
"id_str": "411031503817039874",
"text": "test $TWTR @twitterapi #hashtag http:\/\/t.co\/p5dOtmnZyu https:\/\/t.co\/ZSvIEMOPb8",
"created_at": "Thu Dec 12 07:15:21 +0000 2013",
"entities": {
"hashtags": [
{
"text": "hashtag",
"indices": [
23,
31
]
}
],
"symbols": [
{
"text": "TWTR",
"indices": [
5,
10
]
}
],
"urls": [
{
"url": "http:\/\/t.co\/p5dOtmnZyu",
"expanded_url": "http:\/\/dev.twitter.com",
"display_url": "dev.twitter.com",
"indices": [
32,
54
]
},
{
"url": "https:\/\/t.co\/ZSvIEMOPb8",
"expanded_url": "https:\/\/ton.twitter.com\/1.1\/ton\/data\/dm\/411031503817039874\/411031503833792512\/cOkcq9FS.jpg",
"display_url": "pic.twitter.com\/ZSvIEMOPb8",
"indices": [
55,
78
]
}
],
"user_mentions": [
{
"screen_name": "twitterapi",
"name": "Twitter API",
"id": 6253282,
"id_str": "6253282",
"indices": [
11,
22
]
}
],
"media": [
{
"id": 411031503833792512,
"id_str": "411031503833792512",
"indices": [
55,
78
],
"media_url": "https:\/\/ton.twitter.com\/1.1\/ton\/data\/dm\/411031503817039874\/411031503833792512\/cOkcq9FS.jpg",
"media_url_https": "https:\/\/ton.twitter.com\/1.1\/ton\/data\/dm\/411031503817039874\/411031503833792512\/cOkcq9FS.jpg",
"url": "https:\/\/t.co\/ZSvIEMOPb8",
"display_url": "pic.twitter.com\/ZSvIEMOPb8",
"expanded_url": "https:\/\/ton.twitter.com\/1.1\/ton\/data\/dm\/411031503817039874\/411031503833792512\/cOkcq9FS.jpg",
"type": "photo",
"sizes": {
"medium": {
"w": 600,
"h": 450,
"resize": "fit"
},
"large": {
"w": 1024,
"h": 768,
"resize": "fit"
},
"thumb": {
"w": 150,
"h": 150,
"resize": "crop"
},
"small": {
"w": 340,
"h": 255,
"resize": "fit"
}
}
}
]
}
}
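Since media links in Direct Messages appear in both the urls and media arrays, a consumer that wants only ordinary links needs to filter out the duplicates. The Python sketch below matches on the wrapped url value; the helper name non_media_urls and that matching rule are assumptions based on the example above.

```python
def non_media_urls(message: dict) -> list:
    """Return URL entities of a Direct Message that are not media links."""
    entities = message.get("entities", {})
    # Wrapped t.co links that also appear in the media array.
    media_links = {m["url"] for m in entities.get("media", [])}
    return [u for u in entities.get("urls", []) if u["url"] not in media_links]


# Trimmed copy of the Direct Message example above.
message = {
    "entities": {
        "urls": [
            {"url": "http://t.co/p5dOtmnZyu", "expanded_url": "http://dev.twitter.com"},
            {"url": "https://t.co/ZSvIEMOPb8", "display_url": "pic.twitter.com/ZSvIEMOPb8"},
        ],
        "media": [{"url": "https://t.co/ZSvIEMOPb8", "type": "photo"}],
    }
}
links = non_media_urls(message)
```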