This is step 4 of the learning path, How to detect signal from noise and build powerful filtering rules.
Now that you have established what “signal” means to you (see Step 2) and you have an initial rule set up (see Step 3), it’s important to identify what “noise” you want to exclude from the data returned to you. Excluding noise ensures that you do not consume unwanted or unnecessary data, and it reduces the processing time and effort on your end.
Here are some tactics to consider when looking for ways to reduce noise:
Identify common patterns of noise
- Can you identify any commonality among hashtags, phrases, or keywords when reviewing Tweets that are not of interest to you? Exclude any such noise by explicitly negating these terms in your rule(s).
- Does your rule include keywords or terms that take on a different meaning in different contexts (for example, the word “apple” meaning the fruit or the tech company)? In this case, you will need to find ways to be more specific by negating or adding terms that can help specify the desired context.
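As a minimal sketch of the negation tactic above, the following Python helper appends negated terms to an existing rule. The function names (`negate`, `exclude_noise`) and the example terms are hypothetical and chosen only for illustration; the one thing it relies on from the rule syntax is that a leading `-` negates a keyword, hashtag, or quoted phrase.

```python
def negate(term: str) -> str:
    """Return the negated form of a keyword, hashtag, or phrase."""
    term = term.strip()
    # A multi-word phrase is quoted so that the whole phrase is negated,
    # not just its first word.
    if " " in term and not term.startswith('"'):
        term = f'"{term}"'
    return f"-{term}"


def exclude_noise(rule: str, noise_terms) -> str:
    """Append each noise term to the rule as a negated clause."""
    return " ".join([rule] + [negate(t) for t in noise_terms])


# Illustrative only: keep "apple" the tech company, drop "apple" the fruit.
rule = exclude_noise("apple (iphone OR macbook)", ["pie", "#recipe", "apple juice"])
print(rule)
```

Here the added terms (`iphone`, `macbook`) narrow the context, while the negated ones remove Tweets that share the keyword but not the meaning.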
You may want to filter out Tweets from bots (use case dependent)
- Exclude Tweets from users who have the word "bot" or "TwitterBot" in their profile bio, name, or location:
-bio:bot -bio_name:bot -bio_location:bot -bio:TwitterBot -bio_name:TwitterBot -bio_location:TwitterBot
Please note: Our Developer Policy stipulates that bots must be labelled with the hashtag “#TwitterBot”. This rule allows you to filter out accounts with this label.
- Exclude Tweets from users who are not following many users:
-friends_count:0..10
Please note: Although bot accounts can have a large follower base, they tend to follow very few users themselves. The example above uses the range syntax (0..10) to exclude any account that follows between 0 and 10 users. You can achieve much the same result with a positive clause that requires the account to follow at least 10 users:
friends_count:10
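The bot exclusions above follow a regular pattern (every label crossed with every profile operator), so they can be generated rather than typed by hand. The sketch below is one way to do that; `bot_exclusions` and its parameters are hypothetical names, and the operators are the ones shown in the rule above.

```python
BOT_LABELS = ("bot", "TwitterBot")
PROFILE_OPERATORS = ("bio", "bio_name", "bio_location")


def bot_exclusions(labels=BOT_LABELS, operators=PROFILE_OPERATORS,
                   max_excluded_following=None) -> str:
    """Build negated clauses for every label/operator combination."""
    clauses = [f"-{op}:{label}" for label in labels for op in operators]
    if max_excluded_following is not None:
        # Excludes accounts that follow between 0 and this many users.
        clauses.append(f"-friends_count:0..{max_excluded_following}")
    return " ".join(clauses)


print(bot_exclusions(max_excluded_following=10))
```

Adding a new label (for example, another word that bot accounts commonly use in their profile) then only requires extending `labels`.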
Use attribute filters
- For example, you may want to filter out Tweets from users who aren’t very influential on the platform (which could be defined as users who don’t have many followers). Use filtering operators such as followers_count: and is:verified.
Create narrowcasted rules
- Exclude Retweets. This can be especially helpful as you get started with your rule(s), to reduce the number of Tweets returned and make it easier to run an initial analysis of the data.
-is:retweet
Please note: The above exclusion ensures that the rule matches original content only. It has nothing to do with Tweet engagement metrics.
- Where possible, use “AND” logic (whitespace) instead of “OR” logic.
- Instead of using operators such as has:symbols, has:hashtags, and has:mentions, be more specific and explicitly outline the symbols, hashtags, or mentions that generate signal for your use case.
- Exclude promoted Tweets using the negated -is:nullcast operator.
Please note: “Nullcasted Tweets” are Tweets created through the Ads platform.
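The narrowcasting tactics above can be turned into a quick review checklist for an existing rule. The following sketch is a hypothetical helper (`review_rule` is not part of any Twitter library) that tokenizes a rule naively on whitespace and reports which of the suggestions might apply. Because the tokenizer is naive, an "OR" inside a quoted phrase would also be flagged.

```python
BROAD_OPERATORS = ("has:symbols", "has:hashtags", "has:mentions")
RECOMMENDED_EXCLUSIONS = ("-is:retweet", "-is:nullcast")


def review_rule(rule: str) -> list:
    """Return a list of narrowcasting suggestions for the given rule."""
    # Treat parentheses as separators so grouped terms become plain tokens.
    tokens = rule.replace("(", " ").replace(")", " ").split()
    notes = []
    for op in BROAD_OPERATORS:
        if op in tokens:
            notes.append(f"replace {op} with the specific values you care about")
    if "OR" in tokens:
        notes.append("check whether each OR could be AND (whitespace) instead")
    for exclusion in RECOMMENDED_EXCLUSIONS:
        if exclusion not in tokens:
            notes.append(f"consider adding {exclusion}")
    return notes


for note in review_rule("(coffee OR tea) has:hashtags -is:retweet"):
    print(note)
```

The output is advisory only: whether a given OR or broad operator is actually a problem depends on your use case.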
At this point, a very simplified version of your rule might look like this:
(SIGNAL)     -NOISE
    ↑           ↑
Grouping 1   Grouping 2
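To make the two groupings concrete, here is a small sketch that assembles a rule from one signal expression and several lists of noise clauses. `build_rule` is a hypothetical helper, and the example terms are placeholders rather than recommendations. It rejects any noise clause that is not negated, which guards against accidentally turning noise into a required term.

```python
def build_rule(signal: str, noise_groups) -> str:
    """Wrap the signal in parentheses and append every negated noise clause."""
    clauses = [clause for group in noise_groups for clause in group]
    not_negated = [c for c in clauses if not c.startswith("-")]
    if not_negated:
        raise ValueError(f"noise clauses must be negated: {not_negated}")
    return " ".join([f"({signal})"] + clauses)


rule = build_rule(
    'coffee "cold brew"',                        # Grouping 1: signal
    [
        ["-giveaway", "-#ad"],                   # noisy terms
        ["-bio:bot", "-friends_count:0..10"],    # likely bots
        ["-is:retweet", "-is:nullcast"],         # Retweets and promoted Tweets
    ],                                           # Grouping 2: noise
)
print(rule)
```

Keeping the noise clauses in separate lists makes it easier to see which tactic each exclusion came from when you later tune the rule.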
These tactics are exemplified in our final article in this learning path, "Walkthrough: what this means in practice". However, we still have one more article before we get there.