Scraping Tweets with Twint — No API Key Needed
Simple Twint tutorial where you can code along and easily follow.
Getting to Know Twint?
I came across Twint recently while searching for a way to scrape data from Twitter. Twint (Twitter Intelligence Tool) is a Python library that allows us to scrape Tweets from Twitter profiles without accessing Twitter’s API. You will be surprised by how easy and powerful Twint is.
How does It Work?
Twint essentially relies on Twitter’s search operators. It lets us scrape Tweets from specific users, on topics of interest, hashtags, locations, or trends, and even surface sensitive information such as e-mail addresses or phone numbers. How far you take the tool is up to your creativity.
Perks?
Right off the bat, Twint is easy to use and quick to set up. It can fetch almost all Tweets, while the Twitter API limits you to a user’s last 3,200 Tweets. Plus, you can use it as much as you like, with no Twitter sign-up and no rate limits.
Let’s Get Started!
The installation process is simple. I recommend cloning the Twint repository to your computer and installing all required dependencies from the requirements.txt file.
git clone --depth=1 https://github.com/twintproject/twint.git
cd twint
pip3 install . -r requirements.txt
Once you are done, you are ready to scrape Twitter!
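Twint also comes with a command-line interface, so you can sanity-check the install straight from your terminal before writing any Python. A quick sketch (the -u, -s, and --limit flags come from Twint’s README; results depend on Twitter being reachable):

```shell
# Pull up to 10 of @elonmusk's Tweets that mention "doge"
twint -u elonmusk -s doge --limit 10
```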
Before we begin…
I ran into the error RuntimeError: This event loop is already running while trying to scrape Tweets from a username. This happens in notebook environments such as Jupyter, which already run their own asyncio event loop. To prevent the error, we need to import nest_asyncio and run the following lines of code.
#in case you don't have it in your environment
!pip install nest_asyncio

import nest_asyncio
nest_asyncio.apply()
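To see why this fix is needed: the notebook’s event loop is already running, and Twint tries to start a second blocking run on that same loop. Here is a minimal stdlib-only reproduction of the error, with no Twint or Jupyter involved (the inner/outer coroutines are illustrative, not Twint code):

```python
import asyncio

async def inner():
    return "ok"

async def outer():
    # We are now inside a running loop (like a Jupyter cell).
    # A second blocking run on the same loop is rejected:
    loop = asyncio.get_running_loop()
    coro = inner()
    try:
        loop.run_until_complete(coro)
    except RuntimeError as e:
        coro.close()  # avoid a "never awaited" warning
        return str(e)

msg = asyncio.run(outer())
print(msg)  # This event loop is already running
```

nest_asyncio.apply() patches the loop so that such nested blocking calls are allowed, which is exactly the situation Twint creates inside a notebook.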
Scraping from username…
Let’s try to scrape Tweets from Elon Musk which contain a word ‘doge.’
import twint
# Configure
c = twint.Config()
c.Username = "elonmusk"
c.Search = "doge"
c.Limit = 10

# Run
twint.run.Search(c)
Username is the name of the Twitter account we want to pull from. Search is the keyword we are looking for (this can be a list of words as well). Limit is how many Tweets we want to pull.
In Twint configuration, you can pretty much play around with it. For example, you can choose to scrape within a period of time by adding:
c.Since = '2020-01-01'
c.Until = '2021-01-01'
This means that we want all Tweets from 2020/01/01 to 2021/01/01.
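If you build date ranges programmatically, a small stdlib helper can produce strings in the YYYY-MM-DD format these options expect. A sketch (the date_window helper is hypothetical, not part of Twint):

```python
from datetime import date, timedelta

def date_window(days: int, end: date = date(2021, 1, 1)):
    """Return (since, until) strings covering the `days` before `end`."""
    start = end - timedelta(days=days)
    return start.isoformat(), end.isoformat()

# 2020 was a leap year, so 366 days back from 2021-01-01 is 2020-01-01
since, until = date_window(366)
print(since, until)  # 2020-01-01 2021-01-01
```

You could then assign the results directly, e.g. c.Since = since and c.Until = until.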
You can also be specific about the type of data you want your result to show. For example, c.Images = True shows only Tweets with images, c.Videos = True shows only Tweets with videos, and c.Media = True shows Tweets with either images or videos.
Scraping Popular Tweets…
We can filter our pull by setting a minimum number of interactions.
c = twint.Config()
c.Username = "elonmusk"
c.Popular_tweets = True
c.Min_likes = 5000
c.Min_replies = 1000
c.Min_retweets = 100

# Run
twint.run.Search(c)
Popular_tweets restricts the results to popular Tweets. Min_likes, Min_replies, and Min_retweets show only Tweets that meet the chosen thresholds for the number of likes, replies, and retweets respectively.
Storing Tweets into a DataFrame…
We can also store our pull using pandas.
import pandas as pd

c = twint.Config()
c.Username = "elonmusk"
c.Search = "doge"
c.Pandas = True

# Run
twint.run.Search(c)

elon_df = twint.storage.panda.Tweets_df
Setting c.Pandas = True allows us to transform the data into a pandas DataFrame, which we access through twint.storage.panda.Tweets_df.
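You don’t need a live scrape to experiment with the downstream analysis. Here is a sketch using a small hand-made DataFrame standing in for Tweets_df; the column names tweet and nlikes mirror Twint’s typical output but are assumed here, and the rows are made up:

```python
import pandas as pd

# Stand-in for twint.storage.panda.Tweets_df (fabricated sample data)
elon_df = pd.DataFrame({
    "date": ["2020-04-25", "2020-07-18", "2021-02-04"],
    "tweet": ["doge to the moon", "some other tweet", "who let the doge out"],
    "nlikes": [120_000, 3_000, 450_000],
})

# Keep only Tweets mentioning "doge", most-liked first
doge = (elon_df[elon_df["tweet"].str.contains("doge", case=False)]
        .sort_values("nlikes", ascending=False))
print(doge["tweet"].tolist())  # ['who let the doge out', 'doge to the moon']
```

Once the real Tweets_df is populated, the same filtering and sorting applies unchanged.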
Overall
As you can see from how easy it is to set up, Twint is a powerful tool that lets you generate nearly unlimited Twitter data out of thin air. To learn more about Twint, please visit the Twint GitHub repository.