How to Predict the Stock Market by Scraping Social Media Data

Today, I want to talk about how insane social media is. In particular, I want to talk about how much information is buzzing around on Twitter, Facebook, Instagram, and Reddit.

With all this information available, there must be some way to harness it for world-domination (or… just predicting the stock market, I guess.)

Why Use Twitter Data?

Thousands of Tweets are shared on Twitter every second. That adds up to hundreds of millions per day and hundreds of billions per year, all of which is a largely untapped resource for gathering data.

You might be thinking, “Tweets? Really? Aren’t Tweets stupid?” And yes, some Tweets (not gonna lie) are pretty stupid.

potato tweet

But setting aside the number of Tweets that are jokes, memes, or irrelevant to our task, Twitter is also a place where people discuss current events and major news. The other key thing about Twitter is that there is very little delay between an event occurring and people tweeting about it.

Every minute of every day, people are discussing events, companies, and other people. In particular, they are sharing their views and opinions of companies, and those opinions are ultimately a big part of what drives a company’s stock price.

That is how useful Twitter can be in relation to the stock market.


Back in 2017, when a doctor was violently dragged off a United Airlines flight by airport security officers, Twitter was alive with the news almost immediately. People tweeted out videos and opinions of the incident within minutes, and the story spread like wildfire.

It’s incredibly powerful to have people live-tweeting huge events and major scandals as they are happening.

Tweeting about #Godzilla

As an example: if a giant Godzilla appeared in New York City, you’d probably hear about it first through Twitter. It would be quicker than the news channels, your grandma phoning you, or any other form of traditional media.

godzilla tweet

>> These people are working faster than the stock market can react to this kind of news, and that’s really the jackpot of instant information. <<

That means: if there were some way to track Tweets about major events or major controversies, you could hear about serious situations and react to them before the stock market registers them.

I’m just the bearer of good news today, because guess what? A lot of that social media data is out there for you and me to gather!

How to Get Twitter Data?

So you might be wondering, “That sounds wonderful and all, Max. But how do I get that data? I want it – give it to me!”

Don’t fret, young grasshopper. I have all the techniques up my sleeve ready for you to use. There are two specific ways to grab all that social media data:

  1. Web scraping: extracting the data out of the HTML of the websites. (You can check out our extensive step-by-step walkthrough article on how to do that here!)
  2. APIs: certain organizations provide a service (an API) that makes it very easy to get the data you need.

>> Great news for us: public Tweets are available for us to save through the Twitter API. <<
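Here is a rough sketch of what asking the API for Tweets could look like. I'm assuming the Twitter API v2 "recent search" endpoint and a bearer token from a developer account; the endpoint, the parameter names, and the access rules may have changed, so check the current API documentation before relying on this. The function only builds the request, it doesn't send it.

```python
import urllib.parse
import urllib.request

# Assumption: the Twitter API v2 "recent search" endpoint. Verify against current docs.
SEARCH_URL = "https://api.twitter.com/2/tweets/search/recent"


def build_search_request(query, bearer_token, max_results=100):
    """Build (but do not send) a request for recent Tweets matching `query`."""
    params = urllib.parse.urlencode({
        "query": query,
        "max_results": max_results,
        "tweet.fields": "created_at",  # ask for timestamps so we can line Tweets up with prices
    })
    return urllib.request.Request(
        f"{SEARCH_URL}?{params}",
        headers={"Authorization": f"Bearer {bearer_token}"},
    )


# "YOUR_TOKEN_HERE" is a placeholder, not a real credential.
request = build_search_request("United Airlines -is:retweet", "YOUR_TOKEN_HERE")
print(request.full_url)
```

To actually fetch data you would pass the request to `urllib.request.urlopen` and parse the JSON response.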

Your only real task here is to reduce the massive amount of data to the main relevant pieces of information.
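One simple way to cut the pile down is to keep only the Tweets that mention the company you care about. The sketch below uses plain keyword matching; the sample Tweets and the keyword list are made up for illustration, and a real filter would probably need to be smarter than this.

```python
import re


def mentions_company(text, keywords):
    """True if the text contains any keyword as a whole word or phrase (case-insensitive)."""
    lowered = text.lower()
    return any(
        re.search(r"(?<!\w)" + re.escape(keyword.lower()) + r"(?!\w)", lowered)
        for keyword in keywords
    )


# Made-up sample data.
tweets = [
    {"created_at": "2024-01-02T14:30:05", "text": "Just saw the United Airlines video. Unbelievable."},
    {"created_at": "2024-01-02T14:30:40", "text": "I am a potato"},
    {"created_at": "2024-01-02T14:31:10", "text": "Selling my $UAL shares today"},
    {"created_at": "2024-01-02T14:31:30", "text": "We stand united!"},
]
keywords = ["united airlines", "$UAL"]

relevant = [tweet for tweet in tweets if mentions_company(tweet["text"], keywords)]
for tweet in relevant:
    print(tweet["created_at"], tweet["text"])
```

Only the first and third Tweets survive: the potato Tweet has nothing to do with the company, and "united" on its own doesn't match the full phrase.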

There is a long list of ways to do this depending on what exactly you’re looking for and what kind of data you have. Either you want to:

  • only look at specific pieces of social media chatter, or maybe…
  • reduce every social media post down to a statistic

Once you have the social media side of the data, you will need to combine it with the stock market side so you can compare how the two line up in time.
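To make the "reduce every post to a statistic" idea concrete, here is a minimal sketch that scores each post with a tiny word list, averages the scores per minute, and joins them with per-minute prices. The word lists, the posts, and the prices are all invented for illustration; a real project would use a proper sentiment model and real market data.

```python
from collections import defaultdict

# Tiny hand-made word lists, purely illustrative.
POSITIVE = {"great", "love", "good", "win"}
NEGATIVE = {"awful", "boycott", "bad", "scandal"}


def sentiment_score(text):
    """Positive-word count minus negative-word count for one post."""
    words = [word.strip(".,!?#$").lower() for word in text.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def per_minute_stats(posts):
    """Map 'YYYY-MM-DDTHH:MM' to (post count, average sentiment)."""
    buckets = defaultdict(list)
    for post in posts:
        minute = post["created_at"][:16]  # truncate the ISO timestamp to the minute
        buckets[minute].append(sentiment_score(post["text"]))
    return {minute: (len(s), sum(s) / len(s)) for minute, s in buckets.items()}


def combine(stats, prices):
    """Join per-minute stats with per-minute prices on the shared minute key."""
    return [(m, *stats[m], prices[m]) for m in sorted(stats) if m in prices]


# Made-up sample data.
posts = [
    {"created_at": "2024-01-02T14:30:05", "text": "This is awful. Boycott!"},
    {"created_at": "2024-01-02T14:30:40", "text": "Bad look for the company"},
    {"created_at": "2024-01-02T14:31:10", "text": "Still love flying with them"},
]
prices = {"2024-01-02T14:30": 70.10, "2024-01-02T14:31": 69.85}

stats = per_minute_stats(posts)
for row in combine(stats, prices):
    print(row)  # (minute, post count, average sentiment, price)
```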

Gathering stock market data can also be done by the two methods we described above: web scraping or APIs. The data you’re interested in is probably something like the company stock prices, just so you can track the stock’s activity.
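If you go the web scraping route for prices, the core of it is pulling a number out of a page's HTML. Here is a small sketch using Python's built-in `html.parser`. The HTML snippet and the `price` class name are made up; every real site structures its pages differently, so you would need to adapt this to whatever page you are scraping (and check that the site allows it).

```python
from html.parser import HTMLParser


class PriceParser(HTMLParser):
    """Collect the text inside any tag with class="price" (a made-up class name)."""

    def __init__(self):
        super().__init__()
        self._in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "price" in classes:
            self._in_price = True

    def handle_data(self, data):
        if self._in_price and data.strip():
            # Strip the currency symbol and thousands separators before converting.
            self.prices.append(float(data.strip().lstrip("$").replace(",", "")))

    def handle_endtag(self, tag):
        # Simplification: assumes the price tag has no nested tags inside it.
        self._in_price = False


# Hypothetical page fragment.
html = '<div class="quote"><span class="ticker">UAL</span><span class="price">$70.10</span></div>'

parser = PriceParser()
parser.feed(html)
print(parser.prices)
```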

It’s a Data Analysis Party!

Now, finally: it’s time for the fun stuff, a.k.a. data analysis time!

To analyze the data, you’ll need to run a programming language like Python. Honestly, Python is the best language you can use to analyze your data in these kinds of situations. (If you want to find out other reasons why Python is the hands-down king of all coding languages, you can read that article here.)

>> Now, you’ll want to uncover relationships in the data by identifying signals on the social media side that predict outcomes on the stock market side. <<

Woo. That sounded complicated and difficult.

Basically, you should look for events or opinions on social media that translate into activity in the stock market, like a price drop or a price increase. These could show up as very general trends, or as extreme events that result in very sudden, unforeseen shifts in stock prices.
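One basic way to hunt for that kind of relationship is a lagged correlation: compare the social media signal at one time step with the price change a few steps later, and see which delay fits best. This is just one possible approach, not the only one, and the two series below are invented so that the price changes follow sentiment two steps later.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y) if spread_x and spread_y else 0.0


def lagged_correlation(signal, changes, lag):
    """Correlate signal[t] with changes[t + lag]."""
    if lag == 0:
        return pearson(signal, changes)
    return pearson(signal[:-lag], changes[lag:])


# Made-up data: average sentiment per minute and price change per minute.
sentiment = [0, 0, -2, -3, -1, 0, 0, 1]
price_changes = [0, 0, 0, 0, -0.2, -0.3, -0.1, 0]

for lag in range(4):
    print(lag, round(lagged_correlation(sentiment, price_changes, lag), 3))

# Pick the delay where sentiment lines up best with later price changes.
best_lag = max(range(4), key=lambda lag: lagged_correlation(sentiment, price_changes, lag))
print("best lag:", best_lag)
```

A strong correlation at a positive lag suggests the social media signal moved first, but on its own it doesn't prove one caused the other.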

Once you find a connection like this – you can check if Twitter was discussing it before the stock market had reacted to it.

Was Twitter discussing it 2 minutes before the stock price took a hit? 1 minute? 20 seconds? On a stock exchange, that much time to react is an eternity.
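To put a number on that head start, you can find the moment Tweet volume spiked and the moment the price dropped, then subtract. The sketch below does that with two simple rules of thumb; the thresholds (3x the recent average volume, a 1% price drop) and all the data are assumptions picked for illustration.

```python
from datetime import datetime


def first_spike(counts, baseline_window=3, factor=3.0):
    """First timestamp whose count exceeds `factor` times the mean of the preceding window."""
    for i in range(baseline_window, len(counts)):
        window = [count for _, count in counts[i - baseline_window:i]]
        baseline = max(sum(window) / baseline_window, 1.0)  # avoid a zero baseline
        if counts[i][1] > factor * baseline:
            return counts[i][0]
    return None


def first_drop(prices, threshold=-0.01):
    """First timestamp where the price fell by at least 1% versus the previous observation."""
    for (_, previous), (timestamp, current) in zip(prices, prices[1:]):
        if (current - previous) / previous <= threshold:
            return timestamp
    return None


# Made-up data: Tweet counts per 20 seconds and prices per minute.
tweet_counts = [
    ("2024-01-02T14:00:00", 4),
    ("2024-01-02T14:00:20", 5),
    ("2024-01-02T14:00:40", 3),
    ("2024-01-02T14:01:00", 40),
    ("2024-01-02T14:01:20", 90),
]
prices = [
    ("2024-01-02T14:00:00", 70.00),
    ("2024-01-02T14:01:00", 70.02),
    ("2024-01-02T14:02:00", 69.95),
    ("2024-01-02T14:03:00", 68.90),
]

spike_at = first_spike(tweet_counts)
drop_at = first_drop(prices)
lead = datetime.fromisoformat(drop_at) - datetime.fromisoformat(spike_at)
print(f"Twitter led the price drop by {lead.total_seconds():.0f} seconds")
```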


The basic conclusion of this article: setting up a system like this would be a great way to detect major events that might impact the stock market seconds or even minutes before the stock market reflects them. If applied correctly, you could short stocks when you predict prices will plummet, or buy stocks when you predict prices will increase.

So that’s basically it! Those are the fundamentals behind how to predict the stock market by mining social media data.
