Scraping Tweets with Twint — No API Key Needed

Ramil Chaimongkolbutr
3 min read · Jun 18, 2021


Simple Twint tutorial where you can code along and easily follow.

Picture from https://www.promptcloud.com/

Getting to Know Twint

I came across Twint recently while searching for a way to scrape data from Twitter. Twint (Twitter Intelligence Tool) is a Python library that allows us to scrape Tweets from Twitter profiles without accessing Twitter’s API. You will be surprised at how easy and powerful Twint is.

How does It Work?

Twint relies on Twitter’s search operators. It lets us scrape Tweets from specific users, on subjects of interest, and by topic, hashtag, location, or trend. It can even surface sensitive information such as e-mail addresses or phone numbers. How far you take the tool is up to your creativity.

Perks?

Right off the bat, Twint is easy to use and the initial setup is fast. It can fetch almost all Tweets, while the Twitter API is limited to the last 3,200 Tweets. Plus, you can use it as much as you like: no Twitter sign-up and no rate limits.

Let’s Get Started!

The installation process is simple. I recommend cloning the Twint repository to your computer and installing all required dependencies from the requirements.txt file.

git clone --depth=1 https://github.com/twintproject/twint.git
cd twint
pip3 install . -r requirements.txt

Once you are done, you are ready to scrape Twitter!
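If you would rather not clone the repository, Twint is also published on PyPI, so a plain pip install works as well (the GitHub version tends to be more up to date than the released package):

```shell
# Install the released Twint package directly from PyPI
pip3 install twint
```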

Before we begin…

I ran into the error RuntimeError: This event loop is already running while trying to scrape Tweets from a username (this commonly happens inside Jupyter notebooks). To prevent the error, we need to import nest_asyncio and run the following lines of code.

#in case you don't have it in your environment
!pip install nest_asyncio
import nest_asyncio
nest_asyncio.apply()

Scraping from username…

Let’s try to scrape Tweets from Elon Musk that contain the word ‘doge.’

Credit: Elon Musk’s Twitter

import twint
# Configure
c = twint.Config()
c.Username = "elonmusk"
c.Search = "doge"
c.Limit = 10
# Run
twint.run.Search(c)

Username is the name of the Twitter account we want to pull from. Search is the keyword we are looking for (this can be a list of words as well). Limit is how many Tweets we want to pull.
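If you want to keep the results rather than just print them, Twint can also write each pull straight to disk via its Store_csv and Output options. A minimal sketch (the file name doge_tweets.csv is my own choice):

```python
import twint

# Configure the same search as before
c = twint.Config()
c.Username = "elonmusk"
c.Search = "doge"
c.Limit = 10

# Write the results to a CSV file instead of only printing them
c.Store_csv = True
c.Output = "doge_tweets.csv"  # any path you like

# Run
twint.run.Search(c)
```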

Example of result

Twint’s configuration gives you plenty to play with. For example, you can scrape within a period of time by adding:

c.Since = '2020-01-01'
c.Until = '2021-01-01'

This means that we want all Tweets from 2020-01-01 up to 2021-01-01.

You can also be specific about the type of Tweets you want in your results. For example, c.Images = True shows only Tweets with images, c.Videos = True shows only Tweets with videos, and c.Media = True shows Tweets with either images or videos.
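These options combine freely. For example, a single configuration might restrict a pull to image Tweets posted during 2020 (a sketch; the exact mix of filters is up to you):

```python
import twint

c = twint.Config()
c.Username = "elonmusk"
c.Since = '2020-01-01'   # only Tweets on or after this date
c.Until = '2021-01-01'   # only Tweets before this date
c.Images = True          # only Tweets that contain images
c.Limit = 20

# Run
twint.run.Search(c)
```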

Scraping Popular Tweets…

We can filter our pull by setting a minimum number of interactions.

c = twint.Config()
c.Username = "elonmusk"
c.Popular_tweets = True
c.Min_likes = 5000
c.Min_replies = 1000
c.Min_retweets = 100
# Run
twint.run.Search(c)

Popular_tweets shows popular Tweets. Min_likes, Min_replies, and Min_retweets only return Tweets that meet the chosen thresholds for likes, replies, and retweets respectively.

Storing Tweets into a DataFrame…

We can also store our pull in a pandas DataFrame.

import pandas as pd
c = twint.Config()
c.Username = "elonmusk"
c.Search = "doge"
c.Pandas = True
# Run
twint.run.Search(c)
elon_df = twint.storage.panda.Tweets_df

Setting c.Pandas = True lets us retrieve the scraped data as a pandas DataFrame via twint.storage.panda.Tweets_df.

Example of the Dataframe
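Once the Tweets are in a DataFrame, ordinary pandas operations apply. The sketch below uses a small hand-made DataFrame standing in for twint.storage.panda.Tweets_df; the column names tweet and nlikes mirror what twint typically produces, but treat them (and the sample rows) as assumptions for illustration:

```python
import pandas as pd

# Hypothetical stand-in for twint.storage.panda.Tweets_df
elon_df = pd.DataFrame({
    "date": ["2021-05-08", "2021-05-09", "2021-06-01"],
    "tweet": ["Doge to the moon", "Who let the Doge out", "Tesla update"],
    "nlikes": [120000, 95000, 40000],
})

# Keep only Tweets mentioning 'doge', case-insensitively
doge_df = elon_df[elon_df["tweet"].str.contains("doge", case=False)]

# Sort the filtered Tweets by like count, most-liked first
doge_df = doge_df.sort_values("nlikes", ascending=False)
print(doge_df["tweet"].tolist())
```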

Overall

As you can see from how easy it is to set up, Twint is a powerful tool that lets us generate virtually unlimited Twitter data out of thin air. To learn more about Twint, please visit the Twint GitHub repository.
