Better Collective A/S,
Sankt Annæ Plads 28, 1250 Kobenhavn K,
Denmark (DK)
Phone: +45-29919965
Email: [email protected]
CVR/Org.nr: 27652913
With the World Cup entering its final stages, bettingexpert decided to look into the great abyss that is Twitter and football fans. We set out to uncover which nations attracted the most negativity from fans on Twitter during the group stages of the tournament.
We scraped almost 350,000 English-language tweets directed at the teams, from the start of the World Cup until the group-stage matches were done. We started scraping on November 20 and stopped on December 5. We then ran sentiment analysis on these tweets, which allowed us to conclude which nations were targeted by the most negative tweets and which were getting the most ‘love’ from the fans out there.
Key findings:
Looking at the list of nations and the number of tweets directed at each of them, it’s hardly a surprise that, despite good results and good performances on the pitch, England is targeted by the most negativity on Twitter. The Brits are notoriously hard on ‘The Three Lions’, and this year’s World Cup is no different.
Despite a low number of tweets, the sentiment around Morocco is very positive - possibly as a result of their inspiring and surprising performances on the pitch.
We used an API (Application Programming Interface) to scrape around 1,200,000 tweets related to the nations progressing from the group stages of the World Cup. We kept only the English tweets (around 343,000) and cleaned the data using the Python programming language. Cleaning consisted of removing URLs, hashtags, mentions, punctuation, duplicates, null values, and special characters from the tweets. Once the data was cleaned, we used NLP (Natural Language Processing) techniques, specifically the TextBlob Python library, to analyse the sentiment of the data.
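The cleaning steps listed above could be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the code we actually ran: the function names `clean_tweet` and `clean_corpus`, the regular expressions, and the choice to drop duplicates after cleaning (rather than before) are all assumptions made for the example.

```python
import re
import string

# Assumed patterns for illustration; real tweets may need broader rules.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
TAG_RE = re.compile(r"[@#]\w+")  # mentions (@user) and hashtags (#ENG)
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_tweet(text):
    """Return the cleaned tweet text, or None if the input is null."""
    if text is None:
        return None
    text = URL_RE.sub(" ", text)          # remove URLs
    text = TAG_RE.sub(" ", text)          # remove mentions and hashtags
    text = text.translate(PUNCT_TABLE)    # remove ASCII punctuation
    text = re.sub(r"[^\w\s]", " ", text)  # remove emoji and other symbols
    return " ".join(text.split())         # collapse whitespace


def clean_corpus(tweets):
    """Clean every tweet, dropping nulls, empty results and duplicates."""
    seen = set()
    cleaned = []
    for raw in tweets:
        text = clean_tweet(raw)
        if not text or text in seen:
            continue
        seen.add(text)
        cleaned.append(text)
    return cleaned


sample = [
    "Come on #ENG! https://t.co/abc @England",
    None,
    "Come on #ENG! https://t.co/xyz",
    "What a game!!! 🔥",
]
print(clean_corpus(sample))
```

In this sketch the first and third tweets collapse to the same text once the URL is stripped, so only one copy survives. Whether near-duplicates like these should count once or twice is a judgment call that affects the final percentages.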
TextBlob returns the polarity of a sentence. Polarity lies in the range [-1, 1], where -1 indicates the most negative sentiment and +1 the most positive. Once we had all cleaned tweets labelled as positive or negative, we grouped the data by nation and calculated the percentage of negative and positive tweets for each nation.
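As a rough sketch of the labelling and grouping step, the snippet below turns polarity scores into labels and computes per-nation percentages. Several things here are assumptions rather than a description of our pipeline: the cut-off at 0, the separate "neutral" label for a polarity of exactly 0, the helper names `label` and `sentiment_share`, and the toy scores. The scoring function is passed in as a parameter so that the example runs without any third-party library.

```python
from collections import Counter, defaultdict


def label(polarity):
    """Map a polarity in [-1, 1] to a label (assumed cut-off at 0)."""
    if polarity < 0:
        return "negative"
    if polarity > 0:
        return "positive"
    return "neutral"


def sentiment_share(rows, polarity_fn):
    """rows: iterable of (nation, cleaned_text) pairs.

    Returns {nation: {label: percentage of that nation's tweets}}.
    """
    counts = defaultdict(Counter)
    for nation, text in rows:
        counts[nation][label(polarity_fn(text))] += 1

    shares = {}
    for nation, c in counts.items():
        total = sum(c.values())
        shares[nation] = {
            name: round(100 * c[name] / total, 1)
            for name in ("negative", "positive", "neutral")
        }
    return shares


# Toy scores standing in for a real sentiment model (hypothetical values).
toy_scores = {
    "great win": 0.8,
    "awful defending": -1.0,
    "kick off at eight": 0.0,
    "brilliant": 0.9,
}
rows = [
    ("ENG", "great win"),
    ("ENG", "awful defending"),
    ("ENG", "awful defending"),
    ("ENG", "kick off at eight"),
    ("MAR", "brilliant"),
]
result = sentiment_share(rows, toy_scores.get)
print(result)
```

With TextBlob installed, the toy lookup could be swapped for `lambda text: TextBlob(text).sentiment.polarity`. How tweets scoring exactly 0 are treated matters, because it changes the denominator of every percentage.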
To find the tweets related to each of the nations, we used a hashtag. For example, we used #ENG to find tweets related to the English national team. We started scraping on November 20, 2022, and stopped on December 2, 2022.
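Assigning tweets to nations by hashtag might look something like the sketch below. Only #ENG is taken from the description above. The `HASHTAG_TO_NATION` mapping, the #MAR entry, the case-insensitive matching and the function name `nations_in` are hypothetical and included purely for illustration. Note that this lookup has to run on the raw tweet text, before hashtags are stripped during cleaning.

```python
import re

# Hypothetical mapping; only #ENG is named in the article.
HASHTAG_TO_NATION = {
    "ENG": "England",
    "MAR": "Morocco",
}

HASHTAG_RE = re.compile(r"#(\w+)")


def nations_in(raw_text):
    """Return the set of nations whose hashtag appears in a raw tweet."""
    tags = {tag.upper() for tag in HASHTAG_RE.findall(raw_text)}
    return {HASHTAG_TO_NATION[tag] for tag in tags if tag in HASHTAG_TO_NATION}


print(nations_in("Come on #ENG!"))
print(nations_in("#eng vs #MAR tonight"))
print(nations_in("No hashtags here"))
```

A tweet carrying two team hashtags is attributed to both nations in this sketch, which is one of several reasonable ways to handle match-day tweets that mention both sides.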