Introduction
At its core, X is a public, real-time, and global communication network. Since 2006, X's evolution has been driven both by user conventions and use-patterns and by new product features and enhancements. If you are using X data for historical research, understanding the timeline of this evolution is important for surfacing Posts of interest from the data archive.
X was launched as a simple SMS mobile app and has grown into a comprehensive communication platform with a complete set of APIs. APIs have always been a pillar of the X network; the first API was released soon after X launched. When geo-tagging Posts was first introduced in 2009, it was made available through a Geo API (the ability to ‘geo-tag’ a Post was later integrated into the X.com user interface). Today, X's APIs drive the two-way communication network that has become a source of breaking news and shared information. The opportunities to build on top of this global, real-time communication channel are endless.
X makes available two historical APIs that provide access to every publicly available Post: Historical PowerTrack and the Full-Archive Search API. Both APIs provide a set of operators used to query and collect Posts of interest. These operators match on a variety of the hundreds of attributes associated with every Post, such as the Post's text content, the author’s account name, and links shared in the Post. Posts and their attributes are encoded in JSON, a common text-based data-interchange format. As new features were introduced, new JSON attributes appeared, and new API operators were typically introduced to match on them. If your use-case requires listening to what the world has said on X, the better you understand when operators started having JSON metadata to match on, the more effective your Historical PowerTrack filters can be.
Next, we will introduce some key concepts that set the stage for understanding how updates in Post metadata affect finding your data signal of interest.
Key concepts
From user-conventions to X first-class objects
X users organically introduced new, and now fundamental, communication patterns to the X network. A seminal example is the hashtag, now nearly universal across social networks. Hashtags were introduced as a way to organize conversations and topics. On a network with hundreds of millions of messages a day, tools to find Posts of interest are key, and hashtags have become a fundamental one. Soon after hashtag use grew, hashtags received official status and support from X. Becoming a ‘first-class’ object meant several things. Hashtags became clickable/searchable in the X.com user interface. They also became a member of the X entities family, along with @mentions, attached media, stock symbols, and shared links. These entities are conveniently encoded in pre-parsed JSON arrays, making it easier for developers to process, scan, and store them.
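As a rough illustration of why pre-parsed entities are convenient, the sketch below reads hashtags, @mentions, and $cashtags straight from a Post's JSON instead of re-parsing the message text. It assumes v1.1-style field names (entities.hashtags, entities.user_mentions, entities.symbols); the sample Post is invented for illustration.

```python
from __future__ import annotations

from typing import Any


def extract_entities(post: dict[str, Any]) -> dict[str, list[str]]:
    """Collect hashtags, @mentions, and $cashtags from a Post's pre-parsed entities."""
    entities = post.get("entities", {})
    return {
        # Hashtag entities store the tag text without the leading '#'.
        "hashtags": [h["text"] for h in entities.get("hashtags", [])],
        # Mention entities are identified by the mentioned account's @handle.
        "mentions": [m["screen_name"] for m in entities.get("user_mentions", [])],
        # Cashtags live in the 'symbols' array, also without the leading '$'.
        "cashtags": [s["text"] for s in entities.get("symbols", [])],
    }


# Invented sample Post for illustration only.
post = {
    "text": "Watching $XYZ with @example #markets",
    "entities": {
        "hashtags": [{"text": "markets"}],
        "user_mentions": [{"screen_name": "example"}],
        "symbols": [{"text": "XYZ"}],
        "urls": [],
    },
}
print(extract_entities(post))
# {'hashtags': ['markets'], 'mentions': ['example'], 'cashtags': ['XYZ']}
```

A Post with no entities simply yields empty lists, so the same code works on Posts from before a given entity type existed.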
Retweets are another example of user-driven conventions becoming official objects. Retweeting emerged as a way of ‘forwarding’ content to others. It started as a manual process of copying/pasting a Post and prepending it with an “RT @” pattern. This process was eventually automated via a new Retweet button, complete with new JSON metadata, and the ‘official’ Retweet was born. Other examples include ‘mentions’, sharing of media and web links, and sharing a location with your Post. Each of these use-patterns resulted in new X.com user-interface features, new supporting JSON, and thus new ways to match on Posts; all of these fundamental Post attributes now have PowerTrack Operators that match on them.
Post metadata, mutability, updates, and currency
While a Post message is limited to a fixed number of characters, the JSON description of a Post consists of over 100 attributes: who posted, at what time, whether it’s an original Post or a Retweet, and arrays of first-class objects such as hashtags, mentions, and shared links. For the account that posted, there is a User (or Actor) object with a variety of attributes describing the user’s Profile and other account metadata. Profiles include a short biographical description, a home location (free-form text), preferred language, and an optional website link.
Some account metadata never change (e.g. numeric user ID and created date), some change slowly over time, and other attributes change more frequently. People change jobs and move. Companies update their information. When you are collecting historical Posts, it is important to understand that some metadata reflect their values at the time of posting, while other metadata reflect their values at the time the query is submitted.
With all historical APIs, the user's profile description, display name, and profile 'home' attributes are updated to the values at the time of query.
“Native” media
X.com and the X mobile apps support adding photos and videos to a Post by clicking a button and browsing your photo galleries. Because these are integrated as first-class actions, videos and photos shared this way are referred to as ‘native’ media.
Many query Operators work with these ‘native’ resources, including has:videos, has:images, and has:media. These match only on media content shared via X features. To match on media hosted outside the X platform, use Operators that match on URL metadata.
So, before we dig into the Historical PowerTrack and Full-Archive Search product details, let’s take a tour of how X, as a product and platform, evolved over time.
X timeline
Below is a select timeline of X. Most of these updates fundamentally affected user behavior, Post JSON contents, query Operators, or all three. Looking at X as an API platform, the following events in some way affected the JSON payloads used to encode Posts, and those JSON details in turn affect how the X historical APIs match on them.
Note that this timeline is generally precise but not exhaustive.
2006
- October
- @replies become a convention.
- $cashtags first emerge, but their use for stock ticker mentions does not become common until early 2009. $cashtags became a clickable/searchable link in June 2012.
- November - Favorites introduced.
2007
- January - @replies become a first-class object with a UI reply button and in_reply_to metadata.
- April - Retweets become a convention.
- August - #hashtags emerge as a primary tool for searching and organizing Posts.
2009
- February - $cashtags become a common convention for discussing stock ticker symbols.
- May - Retweet ‘beta’ is introduced with “Via @” prepended to Post body.
- June - Verified accounts introduced.
- August - Retweets become a first-class object with the “RT @” pattern and new retweeted_status metadata.
- October - List feature launched.
- November - Post Geotagging API is launched, providing the first method for users to share location via third-party apps.
2010
- June - X Places introduced for geo-tagging Posts.
- August - Post button for websites is launched, making it easier to share links.
2011
- May - Follow button introduced, making it easier to follow accounts associated with websites.
- August - Native photos introduced.
2012
- June - $Cashtags become a clickable/searchable link.
2014
- March - Photo tagging and up to four photos supported. Extended X Entities metadata was introduced.
- April - Emojis are natively supported in the X UI. Emojis had been commonly used in Posts since at least 2008.
2015
- April - A change in X's ‘post’ user-interface design results in fewer Posts being geo-tagged.
- October - X Polls introduced. Polls originally supported two choices with a 24-hour voting period. In November, Polls started supporting four choices with voting periods from 5 minutes to seven days. Poll metadata was made available (enriched native format only) in February 2017.
2016
- February - Searchable GIFs natively hosted in Post compose.
- May - “Doing More with 140” (dmw140) announced, stating plans for new ways of handling Replies and attached media with respect to a Post's 140-character message.
- June - Native video support
- June - Quoted Retweets generally available.
- June - Stickers introduced for adding to photos.
- September - ‘Native attachments’ introduced with trailing URL not counted towards 140 characters (“dmw140, part 1”).
2017
- February - X Poll metadata included in Post metadata (enriched native format only).
- April - ‘Simplified Replies’ introduced with replied-to-accounts not counted towards 140 characters (“dmw140, part 2”).
2018
- May - GDPR updates: user.time_zone set to null, user.utc_offset set to null, and user.profile_background_image_url set to a default value.
- June - Quote Tweet formatting changes.
2022
- September 29 - The ability to edit Posts is rolled out to a small test group. Edit metadata, including the edit_history and edit_controls objects, are added to the Post object where relevant. These metadata are not returned for Posts created before edit functionality was added, and there are no associated Operators for them. To learn more about how Post edits work, see the Edit Posts fundamentals.
Filtering tips
Being familiar with the X timeline of when and how new features were added can help you create more effective queries. Here, a query means a filter or rule that the X historical APIs apply to the Post archive, using PowerTrack Operators to match on Post JSON. An example is the lang: Operator, which matches Posts in a specified language. X provides a language classification service (supporting over 50 languages), and X APIs include this metadata in the JSON generated for every Post. If a Post is written in Spanish, its “lang” JSON attribute is set to “es”, so a filter with the lang:es clause will only match Posts classified as Spanish.
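The client-side equivalent of a lang:es clause is a simple equality check on the top-level “lang” attribute. This is a minimal sketch with invented sample Posts; it is not how the historical APIs evaluate rules internally.

```python
from __future__ import annotations

from typing import Any, Iterable, Iterator


def filter_by_lang(posts: Iterable[dict[str, Any]], lang: str) -> Iterator[dict[str, Any]]:
    """Yield only Posts whose top-level "lang" attribute equals the given code."""
    for post in posts:
        # Posts lacking the attribute are skipped, just as a lang: clause cannot match them.
        if post.get("lang") == lang:
            yield post


# Invented samples; the third has no language classification at all.
posts = [
    {"id_str": "1", "text": "Hola a todos", "lang": "es"},
    {"id_str": "2", "text": "Hello everyone", "lang": "en"},
    {"id_str": "3", "text": "Post without classification metadata"},
]
print([p["id_str"] for p in filter_by_lang(posts, "es")])  # ['1']
```

Note the third Post: if the metadata was never generated for the period you are querying, the filter silently drops it, which is exactly the kind of gap the timeline helps you anticipate.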
The timeline information can also help you better interpret the Post data you receive. Say you were researching the sharing of content about the 2008 and 2012 Summer Olympics. If you applied only the is:retweet Operator to match on Retweets, no data would match in 2008, while for 2012 there would likely be millions of Retweets. From this you could erroneously conclude that Retweeting was not a user convention in 2008, or that no one Retweeted about those Olympics. Since Retweets became a first-class object in 2009, you need to add an “RT @” rule clause to help identify them in 2008.
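If you post-process collected Posts yourself, the same two-pronged logic applies. This sketch assumes the v1.1-style retweeted_status attribute marks official Retweets, and treats any text containing “RT @” as a manual Retweet; that phrase check is a heuristic and can produce false positives.

```python
from __future__ import annotations

from typing import Any


def is_retweet(post: dict[str, Any]) -> bool:
    """Classify a Post as a Retweet using metadata when present, else text convention."""
    # Official Retweets embed the original Post under "retweeted_status".
    if "retweeted_status" in post:
        return True
    # Older, manual Retweets only carry the "RT @" convention in the message text.
    return "RT @" in post.get("text", "")


# Invented samples covering both eras.
official = {"text": "RT @olympics: Gold!", "retweeted_status": {"text": "Gold!"}}
manual = {"text": "RT @friend: watching the opening ceremony"}
original = {"text": "Watching the opening ceremony"}
print(is_retweet(official), is_retweet(manual), is_retweet(original))  # True True False
```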
Both Retweets and Post language classification are examples of Post attributes with a long history and many product details. Below we discuss these and other attribute classes important to matching on and understanding X data.
Recognizing false negatives
When it comes to writing filters, one important takeaway is that the metadata Operators match on all have “born on” dates. If you build a filter with an Operator that acts on metadata introduced after a Post was posted, you’ll get a false negative for that Post. For example, say you are interested in all Posts that mention ‘snow’ and share a video. If you build a rule with the has:videos Operator, which matches on Posts with native videos, that clause will not match any Posts before 2015.
However, sharing videos was common on X long before 2015. Before then, users shared links to videos hosted elsewhere; in 2015, X built new video-sharing features directly into the platform. To find these earlier Posts of interest, you would include a rule clause such as url:"youtube.com".
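One way to avoid this false negative is to OR the native-media Operator with url: clauses for off-platform hosts. The helper below is a hypothetical sketch that only assembles a PowerTrack-style rule string (space for AND, OR between alternatives, parentheses for grouping); "vimeo.com" is an illustrative extra host, not one named in this article.

```python
from __future__ import annotations


def with_fallback(keyword: str, native_op: str, domains: list[str]) -> str:
    """Build a rule that ORs a native-media Operator with url: clauses,
    so Posts that predate native media can still match."""
    alternatives = [native_op] + [f'url:"{d}"' for d in domains]
    joined = " OR ".join(alternatives)
    return f"{keyword} ({joined})"


print(with_fallback("snow", "has:videos", ["youtube.com", "vimeo.com"]))
# snow (has:videos OR url:"youtube.com" OR url:"vimeo.com")
```

The design choice is to widen the rule rather than run two separate queries, which keeps one consistent result set across the periods before and after native video existed.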
Note that with the Search APIs, some metadata has been ‘backfilled’ as the index was rebuilt. One good example is $cashtags, which became widely used to discuss stock symbols in 2009. After the $cashtag operator was introduced in 2015, the Search index was rebuilt, and in the process the symbol entity was extracted from all Post bodies, including those from early 2006, when $ was used mainly for slang (for example, “I hope it $now$ $oon!”).
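Why does backfilling pick up slang? Any pattern-based extractor sees “$” followed by letters and cannot know whether a ticker was intended. The sketch below uses a deliberately naive regular expression (an assumption for illustration, not X's actual extraction logic) to show the effect.

```python
import re

# Naive cashtag pattern (illustrative only): a "$" not preceded by a word
# character or another "$", followed by 1-6 letters and then a non-letter.
CASHTAG = re.compile(r"(?<![\w$])\$([A-Za-z]{1,6})(?![A-Za-z])")


def naive_cashtags(text: str) -> list[str]:
    """Return the symbol text of every cashtag-like token in the message."""
    return CASHTAG.findall(text)


print(naive_cashtags("I hope it $now$ $oon!"))   # ['now', 'oon']
print(naive_cashtags("Buying more $AAPL today"))  # ['AAPL']
```

Both strings yield "symbols", so a cashtag match on 2006 data will include slang that has nothing to do with stocks.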
Identifying and filtering on Post attributes important to your use-case
Some metadata, such as X account numeric IDs, have existed since day one (and are an example of account metadata that never changes). Other metadata was not introduced until well after X started in 2006. Examples of new metadata being introduced include Retweets metadata, Post locations, URL titles and descriptions, and ‘native’ media. Below are some of the most common types of Post attributes that have been fundamentally affected by these X platform updates.
Filtering/matching behavior for these depends, in most cases, on which historical Post API is used. To help determine which product is the best fit for your research and use-case, the attribute details provided below include high-level product information.
X Profiles
Since X is at its core a global, real-time communication channel, research with Post data commonly emphasizes who is communicating. It is often helpful to know where an X user calls home. Knowing that an account bio mentions certain interests and hobbies can lead you to Posts of interest. It is also very common to want to listen for Posts from accounts of interest. Profile attributes are key to all of these use-cases.
Every account on X has a Profile that includes metadata such as X @handle, display name, a short bio, home location (freeform text entered by a user), number of followers and many others. Some attributes never change, such as numeric user ID and when the account was created. Others usually change day-to-day, week-to-week, or month-to-month, such as number of Posts posted and number of accounts followed and followers. Other account attributes can also change at any time, but tend to change less frequently: display name, home location, and bio.
The JSON payload for every Post includes account profile metadata for the Post's author. If it is a Retweet, it also includes profile metadata for the account that posted the original Post.
The mutability of a Post's profile metadata depends entirely on the historical product used. The Search APIs serve historical Posts with profile settings as they are at the time of retrieval. For Historical PowerTrack, the profile is as it was at the time the Post was posted, except for data before 2011: for Posts older than 2011, the profile metadata reflects the profile as it was in September 2011.
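The rules in the previous paragraph can be encoded as a small lookup, which is handy when annotating a dataset with how much to trust its profile fields. This is a sketch of those stated rules only; the product labels "search" and "historical_powertrack" are hypothetical names, not API values.

```python
from datetime import date


def profile_snapshot(product: str, posted_on: date) -> str:
    """Describe which version of the author's profile a historical Post carries."""
    if product == "search":
        # Search APIs: profile as it is when the Post is retrieved.
        return "as of retrieval"
    if product == "historical_powertrack":
        # Posts older than 2011 carry the profile as of September 2011.
        if posted_on < date(2011, 1, 1):
            return "as of September 2011"
        return "as of posting"
    raise ValueError(f"unknown product: {product!r}")


print(profile_snapshot("historical_powertrack", date(2009, 5, 1)))  # as of September 2011
print(profile_snapshot("historical_powertrack", date(2013, 1, 1)))  # as of posting
print(profile_snapshot("search", date(2009, 5, 1)))                 # as of retrieval
```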
Original Post and Retweets
As noted above, Retweeting began as a manual ‘forwarding’ convention of copying a Post and prepending “RT @”, and was later automated with a Retweet button, making retweeting a first-class Post event. Along with the new Retweet button came new metadata, such as the complete payload of the original Post.
Whether a Post is original or shared is a common filtering ‘switch.’ In some cases, only original content is needed; in others, Post engagement is of primary importance, so Retweets are key. The PowerTrack is:retweet Operator lets users either include or exclude Retweets. If pulling data from before August 2009, users need two strategies for matching (or not matching) Retweets. Before August 2009, the Post message itself needs to be checked, using exact phrase matching, for the “RT @” pattern. For periods after August 2009, the is:retweet Operator is available.
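When a query window straddles August 2009, one approach is to split it into two periods, each with the Retweet clause appropriate to that era. The sketch below assumes August 1, 2009 as the switch-over day (the article gives only the month) and builds PowerTrack-style rule strings, using a leading "-" for exclusion; the function name is hypothetical.

```python
from __future__ import annotations

from datetime import date

# The article states August 2009; pinning it to the 1st is an assumption.
RETWEET_METADATA_START = date(2009, 8, 1)


def retweet_rules(base: str, start: date, end: date,
                  include: bool = True) -> list[tuple[date, date, str]]:
    """Split [start, end) at the Retweet metadata start and return
    (period_start, period_end, rule) tuples for each era."""
    def clause(op: str) -> str:
        return f"{base} {op}" if include else f"{base} -{op}"

    segments = []
    if start < RETWEET_METADATA_START:
        # Before official Retweets: exact-phrase match on the manual convention.
        segments.append((start, min(end, RETWEET_METADATA_START), clause('"RT @"')))
    if end > RETWEET_METADATA_START:
        # After official Retweets: the metadata-based Operator is available.
        segments.append((max(start, RETWEET_METADATA_START), end, clause("is:retweet")))
    return segments


for seg in retweet_rules("olympics", date(2008, 8, 1), date(2012, 9, 1)):
    print(seg)
```

For the Olympics example this yields one "RT @" rule for 2008 through July 2009 and one is:retweet rule from August 2009 onward.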
Post language classifications
The language a Post is written in is a common interest. Post language can help infer a Post's location and often only a specific language is needed for analysis or display. (X profiles also have a preferred language setting.)
For filtering on a Post's language classification, X's historical products (the Search API and Historical PowerTrack) behave quite differently. When the Search archive was built, all Posts were backfilled with the X language classification, so the lang: Operator is available for the entire Post archive. With Historical PowerTrack, X's language classification metadata is available in the archive beginning on March 26, 2013.
Geo-referencing Posts
Being able to tell where a Post was posted (i.e., geo-referencing it) is important to many use-cases. There are three primary methods for geo-referencing Posts:
- Geographical references in a Post message
- Posts geo-tagged by the user
- Account profile ‘home’ location set by a user
If geo-referencing is key to your use-case, be sure to review our filtering Posts by location and Post geo metadata tutorials.
Geographical references in a Post message
Matching on geographic references in the Post message, while often the most challenging method since it depends on local knowledge, is an option for the entire Post archive. Here is an example geo-referenced match from 2006 for the San Francisco area based on a ‘golden gate’ filter:
https://x.com/biz/statuses/28311
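Matching place names in message text is essentially exact-phrase matching, and the hard part is choosing the phrases. This sketch does a case-insensitive, whole-phrase match on a Post's text as a rough client-side analogue of a "golden gate" filter; the phrase list and sample texts are invented for illustration.

```python
import re


def mentions_place(text: str, phrases: list[str]) -> bool:
    """Return True if any place-name phrase appears as whole words in the text."""
    for phrase in phrases:
        # Escape each word, allow any whitespace between words, anchor at word boundaries.
        pattern = r"\b" + r"\s+".join(map(re.escape, phrase.split())) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            return True
    return False


print(mentions_place("Biking across the Golden Gate today", ["golden gate"]))  # True
print(mentions_place("Bought a goldengate-themed mug", ["golden gate"]))       # False
```

Local knowledge enters through the phrase list: neighborhood names, landmarks, and local nicknames all have to be supplied by you.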
Posts geo-tagged by the user
In November 2009, X introduced its Post Geotagging API, which enabled Posts to be geo-tagged with an exact location. In June 2010, X introduced X Places, which represent a geographic area at the venue, neighborhood, or town scale. Approximately 1-2% of Posts are geo-tagged using either method.
The available geo-tagging history depends on the historical API you are using. With the Search APIs, matching on Posts with some Geo Operators is possible starting in March 2010, and with others starting in February 2015. With Historical PowerTrack, geo-referencing starts on September 1, 2011; when the Historical PowerTrack archive was built, geo-tagging metadata from before this date was not included.
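The two geo-tagging methods show up differently in a Post's JSON. This sketch assumes v1.1-style fields: an exact point in "coordinates" (GeoJSON, longitude first) and a Place in "place" with a "bounding_box" polygon. It prefers the exact point and otherwise falls back to the centre of the Place's box, which is one reasonable convention, not an official one. Sample values are invented.

```python
from __future__ import annotations

from typing import Any, Optional


def post_location(post: dict[str, Any]) -> Optional[tuple[float, float]]:
    """Return (longitude, latitude) for a geo-tagged Post, or None if untagged."""
    coords = post.get("coordinates")
    if coords and coords.get("type") == "Point":
        # GeoJSON order is [longitude, latitude].
        lon, lat = coords["coordinates"]
        return (lon, lat)
    place = post.get("place")
    if place and place.get("bounding_box"):
        # Outer ring of the bounding-box polygon; averaging its corners gives the centre.
        ring = place["bounding_box"]["coordinates"][0]
        lons = [p[0] for p in ring]
        lats = [p[1] for p in ring]
        return (sum(lons) / len(lons), sum(lats) / len(lats))
    return None


point_post = {"coordinates": {"type": "Point", "coordinates": [-122.4783, 37.8199]}}
place_post = {"coordinates": None, "place": {"bounding_box": {"type": "Polygon", "coordinates": [
    [[-122.5, 37.7], [-122.5, 37.8], [-122.4, 37.8], [-122.4, 37.7]]]}}}
print(post_location(point_post), post_location(place_post), post_location({}))
```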
Account profile ‘home’ location set by a user
All X users have the opportunity to set their Profile Location, indicating their home location. Millions of X users provide this information, and it significantly increases the amount of geodata in the X Firehose. This location metadata is a non-normalized, user-generated, free-form string. Approximately 30% of accounts have Profile Geo metadata that can be resolved to the country level.
As with Post geo, the methods to match and the time periods available depend on the historical API you are using. Historical PowerTrack lets users attempt their own custom matching on these free-form strings. To make that process easier, X also provides a Profile Geo Enrichment that performs geocoding where possible, providing normalized metadata and corresponding Operators. Profile Geo Operators are available in both Historical PowerTrack and the Search APIs. With Historical PowerTrack, Profile Geo metadata is available starting in June 2014; with the Search APIs, it is available starting in February 2015.
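Custom matching on free-form profile locations quickly runs into ambiguity, which is what the Profile Geo Enrichment is meant to reduce. The sketch below is a naive resolver against a tiny hypothetical gazetteer; it checks each comma-separated part of the string and returns the first country code it recognizes.

```python
from __future__ import annotations

from typing import Optional

# Hypothetical, deliberately tiny gazetteer: lower-cased place names -> country codes.
GAZETTEER = {"san francisco": "US", "sf": "US", "london": "GB", "paris": "FR"}


def guess_country(profile_location: Optional[str]) -> Optional[str]:
    """Naively resolve a free-form profile location to a country code."""
    if not profile_location:
        return None
    for part in profile_location.split(","):
        code = GAZETTEER.get(part.strip().lower())
        if code:
            return code
    return None


print(guess_country("San Francisco, CA"))  # US
print(guess_country("Paris, Texas"))       # FR (wrong: shows why normalization is hard)
print(guess_country("somewhere online"))   # None
```

The "Paris, Texas" result illustrates the core problem: without context, free-form strings are easy to mis-resolve, so a normalized enrichment is usually preferable where it is available.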
Shared links and media
Sharing web page links, photos, and videos has always been a fundamental X use-case. Early in X's history, all of these actions involved including a URL in the Post message itself. In 2011, X integrated photo sharing directly into its user interface, and in 2016 native videos were added.
Given this history, there is a variety of filtering Operators for matching on this content, including a set that matches on whether Posts have shared links, photos, and videos. Also, since most URLs shared on X are shortened (e.g. by a service such as bitly or tinyurl) to use fewer of a Post's characters, X provides data enrichments that generate a complete, expanded URL that can be matched on. For example, to match Posts that include links discussing X and early-warning systems, a filter that references ‘severe weather communication’ would match a Post containing the http://bit.ly/1XV1tG4 URL.
In March 2012, the expanded URL enrichment was introduced. Before then, Post payloads included only the URL as provided by the user, so if the user included a shortened URL, it can be challenging to match on (expanded) URLs of interest. With both Historical PowerTrack and the Search APIs, these metadata are available starting in March 2012.
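The practical effect of the March 2012 cutoff is visible when you extract link domains. This sketch assumes v1.1-style entities.urls entries with a raw "url" and, when the enrichment is present, an "expanded_url"; it prefers the expanded form and falls back to the raw one. The expanded example.com link is invented.

```python
from __future__ import annotations

from typing import Any
from urllib.parse import urlparse


def link_domains(post: dict[str, Any]) -> list[str]:
    """Return the host of each shared link, preferring the expanded URL."""
    domains = []
    for u in post.get("entities", {}).get("urls", []):
        # Use the enrichment's expanded URL if present, else the URL as posted.
        link = u.get("expanded_url") or u.get("url", "")
        host = urlparse(link).hostname or ""
        domains.append(host[4:] if host.startswith("www.") else host)
    return domains


old_post = {"entities": {"urls": [{"url": "http://bit.ly/1XV1tG4"}]}}
new_post = {"entities": {"urls": [{"url": "https://t.co/xyz",
                                   "expanded_url": "https://www.example.com/severe-weather"}]}}
print(link_domains(old_post))  # ['bit.ly']
print(link_domains(new_post))  # ['example.com']
```

For the older Post, only the shortener's domain is visible, so domain-based matching cannot tell you where the link actually led.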
In July 2016, the enhanced URL enrichment was introduced. This enhanced version provides a web site’s HTML title and description in the Post payload, along with Operators for matching on those. With Historical PowerTrack, these metadata become available in July 2016. With the Search APIs, these metadata begin emerging in December 2014.
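Because the same metadata starts at different dates in each product, it can help to check a planned query against those dates before running it. The table below is transcribed from the dates given in this article; the feature keys and product labels are hypothetical names, month-level dates are pinned to the 1st of the month (an assumption), and None means the metadata is available across the entire archive.

```python
from __future__ import annotations

from datetime import date
from typing import Optional

# Earliest availability per product, as stated in this article.
AVAILABLE_FROM: dict[str, dict[str, Optional[date]]] = {
    "historical_powertrack": {
        "lang": date(2013, 3, 26),
        "post_geo": date(2011, 9, 1),
        "profile_geo": date(2014, 6, 1),
        "expanded_url": date(2012, 3, 1),
        "url_title_description": date(2016, 7, 1),
    },
    "search": {
        "lang": None,                       # backfilled for the whole archive
        "post_geo": date(2010, 3, 1),       # some Geo Operators only from February 2015
        "profile_geo": date(2015, 2, 1),
        "expanded_url": date(2012, 3, 1),
        "url_title_description": date(2014, 12, 1),
    },
}


def coverage_gaps(product: str, features: list[str], query_start: date) -> dict[str, date]:
    """Return features whose metadata begins after the query start, with their start dates."""
    table = AVAILABLE_FROM[product]
    gaps = {}
    for feature in features:
        since = table[feature]
        if since is not None and query_start < since:
            gaps[feature] = since
    return gaps


print(coverage_gaps("historical_powertrack", ["lang", "expanded_url"], date(2012, 6, 1)))
print(coverage_gaps("search", ["lang", "expanded_url"], date(2012, 6, 1)))
```

Any feature reported by the check is a likely source of false negatives for the early part of the query window and may need a fallback clause.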
In September 2016, X introduced ‘native attachments’, where a trailing shared link is not counted against the 140-character Post limit. Both URL enrichments still apply to these shared links.
For other product-specific details on URL filtering, see the corresponding articles.
Next steps
Now that we’ve explored the timeline of when key X features were introduced and learned how these metadata changes affect filtering at a high-level, the next step is to get into the many product-specific details: