So what is Snowflake?
Snowflake is a service used to generate unique IDs for objects within Twitter (Tweets, Direct Messages, Users, Collections, Lists, etc.). These IDs are unique 64-bit unsigned integers that are based on time rather than being sequential. The full ID is composed of a timestamp, a worker number, and a sequence number. When consuming the API using JSON, always use the field id_str instead of id. This is because of the way JavaScript and other languages that consume JSON evaluate large integers. If id and id_str ever appear not to match, your environment has already parsed the id integer and munged the number in the process. Read below for more information on how Twitter generates its IDs.
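The composition described above can be sketched in code. The exact bit widths and the custom epoch are not given in this document, so the sketch below assumes the commonly cited layout (41-bit timestamp, 10-bit worker number, 12-bit sequence number, epoch 1288834974657 ms). The function names composeSnowflake and decodeSnowflake are hypothetical and are not part of any Twitter API. BigInt is used throughout so that no precision is lost.

```javascript
// Assumption: bit widths and epoch follow the commonly cited Snowflake layout;
// none of these constants are stated in this document.
const ASSUMED_EPOCH_MS = 1288834974657n;
const WORKER_BITS = 10n;
const SEQUENCE_BITS = 12n;

// Hypothetical helper: pack the three parts into one ID, returned as a string.
function composeSnowflake({ timestampMs, worker, sequence }) {
  const elapsed = BigInt(timestampMs) - ASSUMED_EPOCH_MS;
  const id =
    (elapsed << (WORKER_BITS + SEQUENCE_BITS)) |
    (BigInt(worker) << SEQUENCE_BITS) |
    BigInt(sequence);
  return id.toString();
}

// Hypothetical helper: unpack an ID string (such as id_str) into its parts.
function decodeSnowflake(idStr) {
  const id = BigInt(idStr);
  const sequence = id & ((1n << SEQUENCE_BITS) - 1n);
  const worker = (id >> SEQUENCE_BITS) & ((1n << WORKER_BITS) - 1n);
  const elapsed = id >> (WORKER_BITS + SEQUENCE_BITS);
  return {
    timestampMs: Number(elapsed + ASSUMED_EPOCH_MS),
    worker: Number(worker),
    sequence: Number(sequence),
  };
}

// Round trip with illustrative values.
const sampleParts = { timestampMs: 1656432460105, worker: 378, sequence: 0 };
const sampleId = composeSnowflake(sampleParts);
const decodedParts = decodeSnowflake(sampleId);
```

Because the timestamp occupies the most significant bits, IDs produced under this assumed layout sort roughly by creation time, which is what "based on time" implies.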
The problem
Some programming languages, such as JavaScript, cannot exactly represent integers that need more than 53 bits. You can see this by running a command similar to (90071992547409921).toString() in a browser console, or by running the following JSON snippet through your JSON parser.
{"id": 10765432100123456789, "id_str": "10765432100123456789"}
In affected JSON parsers the ID will not be converted successfully and will lose accuracy. In some parsers there may even be an exception.
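As a minimal sketch of that check in JavaScript, the snippet below parses the example with the built-in JSON.parse and compares the numeric id against id_str. The variable names are illustrative only.

```javascript
// The example snippet from above, as raw JSON text.
const raw = '{"id": 10765432100123456789, "id_str": "10765432100123456789"}';

const parsed = JSON.parse(raw);

// The numeric id is far above Number.MAX_SAFE_INTEGER (2^53 - 1),
// so it cannot be held exactly in a JavaScript number.
const idIsSafe = Number.isSafeInteger(parsed.id);

// Converting the parsed number back to text no longer reproduces the original digits.
const idMatchesString = String(parsed.id) === parsed.id_str;

// The string field keeps every digit and can be turned into a BigInt when
// arithmetic or numeric comparison is needed.
const exactId = BigInt(parsed.id_str);

console.log({ idIsSafe, idMatchesString, exactId: exactId.toString() });
```

In an affected environment such as this one, both checks come out false while id_str still carries the full value.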
The solution
To allow JavaScript and JSON parsers to read the IDs, Twitter objects include a string version of any ID when responding with JSON. Status, User, Direct Message, Saved Search and other IDs in the Twitter API are therefore returned as both an integer and a string in JSON responses.
For example, a status object contains an id and an id_str. The following JSON representation of a status object shows the two versions of the ID fields for each data point.
[
  {
    "coordinates": null,
    "truncated": false,
    "created_at": "Thu Oct 14 22:20:15 +0000 2010",
    "favorited": false,
    "entities": {
      "urls": [],
      "hashtags": [],
      "user_mentions": [
        {
          "name": "Matt Harris",
          "id": 777925,
          "id_str": "777925",
          "indices": [0, 14],
          "screen_name": "themattharris"
        }
      ]
    },
    "text": "@themattharris hey how are things?",
    "annotations": null,
    "contributors": [
      {
        "id": 819797,
        "id_str": "819797",
        "screen_name": "episod"
      }
    ],
    "id": 12738165059,
    "id_str": "12738165059",
    "retweet_count": 0,
    "geo": null,
    "retweeted": false,
    "in_reply_to_user_id": 777925,
    "in_reply_to_user_id_str": "777925",
    "in_reply_to_screen_name": "themattharris",
    "user": {
      "id": 6253282,
      "id_str": "6253282"
    },
    "source": "web",
    "place": null,
    "in_reply_to_status_id": 12738040524,
    "in_reply_to_status_id_str": "12738040524"
  }
]
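Once code reads the _str fields, ordering needs some care: comparing ID strings with the default string comparison is lexicographic, so a shorter ID can sort after a longer one. The sketch below shows one possible way to compare and sort string IDs numerically with BigInt. The helper name compareIds is hypothetical, and the sample values are taken from the JSON examples in this document.

```javascript
// Hypothetical helper: numeric comparison of two ID strings without precision loss.
function compareIds(a, b) {
  const x = BigInt(a);
  const y = BigInt(b);
  if (x < y) return -1;
  if (x > y) return 1;
  return 0;
}

// id_str values from the examples in this document.
const ids = ["10765432100123456789", "12738165059", "12738040524"];

// Numeric order: the 20-digit ID sorts last, as expected.
const numericOrder = [...ids].sort(compareIds);

// Default string order: the 20-digit ID sorts first, which is misleading.
const lexicographicOrder = [...ids].sort();
```

For equality checks and for use as keys in a Map or object, the strings can be used directly and no conversion is needed.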
What developers need to do
First, try decoding the JSON snippet above with the parser your production code uses. Check the output to confirm that the ID has not lost accuracy.
- If your code converts the ID successfully without losing accuracy, you are OK, but you should consider switching to the _str versions of IDs as soon as possible.
- If your code loses accuracy, change your code to use the _str version. If you do not, your code will be unable to interact with the Twitter API reliably.
- In some languages, the JSON parser may throw an exception when reading the ID value. If this happens in your parser, you will need to ‘pre-parse’ the data, removing or replacing ID parameters with their _str versions.
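The ‘pre-parse’ step in the last item could look roughly like the sketch below, which rewrites numeric ID values as strings in the raw JSON text before it reaches the parser. The function name quoteNumericIds and the rule that ID keys are named id or end in _id are assumptions made for illustration. A regular expression is a blunt tool here and could also rewrite matching text inside string values, so treat this as a starting point rather than a complete solution.

```javascript
// Hypothetical helper: wrap numeric values of keys named "id" or ending in "_id"
// in quotes, so the parser never sees them as numbers.
// Assumption: ID keys follow that naming pattern; keys such as "id_str" are left alone.
function quoteNumericIds(jsonText) {
  return jsonText.replace(/"((?:\w+_)?id)"\s*:\s*(\d+)/g, '"$1": "$2"');
}

const rawText = '{"id": 10765432100123456789, "id_str": "10765432100123456789"}';
const safeObject = JSON.parse(quoteNumericIds(rawText));

// After pre-parsing, id is a string holding every original digit.
console.log(safeObject.id === safeObject.id_str);
```

Null values such as "in_reply_to_status_id": null are not matched by the pattern and pass through unchanged.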
Summary
- If you develop in JavaScript, you will have to update your code to read the string version of each ID instead of the integer version.
- If you use a JSON decoder, validate that the example JSON above decodes without throwing exceptions. If exceptions are thrown, you will need to pre-parse the data.