Ever since bookmakers began offering odds on football matches, serious bettors and casual punters alike have asked how to predict football results and how to develop profitable football prediction systems and models.
In this article we introduce the keys to creating an accurate football prediction model or system, and look at the most popular ways of answering three questions: how to predict football scores, how to predict football draws and how to develop accurate team ratings.
What You Will Find In This Football Prediction Guide
- How To Start Predicting Football Matches
- Issues To Consider When Predicting Football Matches
- What Is Goal Expectancy?
- What Data Predicts Football Matches
- Goal Expectation Based On Shot Location
Whether you place bets on football matches or post football tips, making a football prediction and placing football bets is about your long-term profit and loss.
We are not too interested in how accurate our methods are in relation to competing prediction models, nor are we interested in a purely mathematical assessment of our approach. All that we are concerned with are two things:
- Does our method identify value bets?
- How many value bets does it identify?
The first point should be obvious. Any system for predicting football matches needs to identify value betting opportunities. If it doesn't, then it's useless. It may find value in only very specific markets, for example 1X2 while being completely deficient in identifying value in other betting markets such as Goal Totals. It's important to keep in mind that you are going against your bookmaker, nobody else. In this sense, it's your methodology against theirs.
The second point often goes without consideration, but it is important. Let's say you have a method for determining the outcome of football games. It finds value, but unfortunately it finds a value bet only once in every fifty scheduled games. So for an entire Premier League season, you would get roughly seven to eight bets. We will discuss this example with a little more depth shortly, but for now, it's important to develop a football betting system that will generate enough bets to make the time spent maintaining it worthwhile.
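The arithmetic behind that figure can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and the count of 380 matches assumes a 20-team league in which every club plays every other club home and away.

```python
def expected_value_bets(matches_per_season: int,
                        matches_per_value_bet: int,
                        leagues: int = 1) -> float:
    """Rough number of value bets a model would flag in a season.

    Assumes value bets turn up at a steady rate of one per
    `matches_per_value_bet` scheduled matches.
    """
    return leagues * matches_per_season / matches_per_value_bet


# A 20-team league: each club hosts the other 19, so 20 * 19 = 380 matches.
premier_league_matches = 20 * 19

print(expected_value_bets(premier_league_matches, 50))  # 7.6
```

A hit rate of one in fifty therefore yields between seven and eight bets per season, which is the figure used above.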
How To Predict Football Results: Issues To Consider
So the question again is, how to predict football scores and results? Let's take a look at the key issues to consider when you begin developing a method for an accurate football prediction.
A key question when asking how to predict football is, what data are we going to use to assess team performance and potential? In terms of football, the most popular of these include:
- Goal Differential
- Shots On Goal
- Shots On Target
- Location Of Shots On Goal
The key questions when it comes to data are:
- Will this data accurately assess team performance?
- Is this data readily available?
We will consider these questions shortly when we discuss how to assess predictive data for football betting. But for now we should refine our line of questioning.
When we ask whether or not a statistical category will assist us in assessing team performance accurately, what exactly are we asking?
What we are really asking is, will this data provide us with an accurate assessment of the number of goals a given team will score and concede in their next match. This is often referred to as Goal Expectancy. We will talk about Goal Expectation later in this article, as well as Poisson Distribution, which is a method for calculating a range of football match probabilities simply using Goal Expectancy figures for the competing teams.
Data Sample Size
This is and will always be a thorny question. At the end of the day, what really matters is that the sample size you choose provides you with a consistent stream of value bets.
When we talk sample size, we are asking how much data is required to establish an accurate assessment of a team's true potential in an upcoming match. This is often referred to in terms of the number of matches a team has played. So for example, you may use goal difference over each club's previous 20 matches, or their shot on goal ratio over the previous 10 matches and so on.
It's also important to consider how we will weight each match in our sample. For example, we could consider each team's goal difference over the last 20 matches, but place a factor of 2 on goal differential in the most recent 10 of those 20 matches, essentially making a club's performance in the last 10 worth twice as much as in the 10 before them. Or we may smooth the weighting, making the most recent match worth a factor of 20, all the way down to making the 20th most recent match worth a factor of 1.
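Both weighting schemes can be sketched in Python as a weighted average of per-match goal differences. This is a minimal illustration rather than a recommended model: the function names are hypothetical, and the sample results at the bottom are made up purely to show the effect of the weights.

```python
def weighted_goal_difference(goal_diffs, weights):
    """Weighted average goal difference per match.

    `goal_diffs` is ordered from the most recent match to the oldest,
    and `weights` holds one factor per match in the same order.
    """
    if len(goal_diffs) != len(weights):
        raise ValueError("need exactly one weight per match")
    total = sum(g * w for g, w in zip(goal_diffs, weights))
    return total / sum(weights)


def step_weights(n=20, recent=10, factor=2.0):
    """Factor of `factor` on the most recent `recent` matches, 1 on the rest."""
    return [factor] * recent + [1.0] * (n - recent)


def linear_weights(n=20):
    """Most recent match worth n, sliding down to 1 for the oldest."""
    return [float(n - i) for i in range(n)]


# Made-up sample: won each of the last 10 by one goal, drew the 10 before that.
diffs = [1] * 10 + [0] * 10

print(sum(diffs) / len(diffs))                              # unweighted: 0.5
print(weighted_goal_difference(diffs, step_weights()))      # about 0.667
print(weighted_goal_difference(diffs, linear_weights()))    # about 0.738
```

The more heavily recent matches are weighted, the faster the rating responds to a change in form, at the cost of being noisier.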
Collecting Football Data
Fortunately there are many sites on the internet offering football data for free or at a more than affordable price.
There is also the option of using web scraping software to collect the data you want from websites that publish it. Keep in mind that sites that publish advanced football data pay high subscription fees for this data and do not appreciate users scraping their resources. So tread politely, i.e. don't make it obvious that you're scraping their data. While it can make the process more time consuming, most web scraping software allows for inconspicuous scraping.
Lastly, and we have already touched on this briefly, there is the issue of technical aptitude. The more capable you are technically, the more likely it is that you'll be able to predict football games accurately and find value bets consistently.
To illustrate this, let's consider our earlier example: a betting model that finds around seven or eight value bets across a Premier League season. As we said earlier, this isn't an efficient betting model if you are updating your spreadsheets manually week after week. It's just not worth your time.
However, if you are a technically capable bettor and have your data updated automatically via a feed, not to mention odds updated via another feed, and have committed to apply your prediction system to fifty football leagues around the world, then you'll have around 300 football bets over the course of a season. If you have an efficient setup such as this, then your betting model is worth your time.
The reality is however, most people are not elite when it comes to computer and internet programming. So our advice would be to improve your technical skills. You do not need to become a high level programmer, but you should work on developing, at the very least, your spreadsheet skills and if possible, your proficiency in working with a database. It's also worth your time learning how to use web scraping software so as to provide yourself with more powerful football data.
Like we said, you do not need to become Bill Gates. But every enhancement you make to your data manipulation skills is going to enhance your chances of predicting football matches and becoming a profitable football bettor.
How To Predict Football Results: What Is Goal Expectancy?
Simply put, goal expectancy is the number of goals we expect a team to score in a given match given their own potential to score, potential to concede and the potential of their opponents to do likewise. Of course, how we calculate this is a matter of serious debate and largely the topic of this article.
While the subject of goal expectancy has been discussed enthusiastically in recent years, with everyone from casual football forum posters to highbrow academics having their say on the topic, all that we must understand here is that the more accurately we predict the goal expectancy of each club in a given football match, the more likely we are to find value bets and, as a result, earn consistent profits from our football betting.
In short, the profitability of any football betting model hinges on its capacity to forecast accurate scorelines match by match and in turn, translate those forecasts into betting odds. This is where what is known as Poisson Distribution comes into play.
What is Poisson Distribution?
In essence, Poisson Distribution calculates the probability of each possible scoreline in an upcoming football match given the goal expectancy for each club competing in the match.
The intricacies of the Poisson Distribution need not be fully understood for you to make use of it, because Microsoft’s Excel has a built-in Poisson function. The syntax for Poisson in Excel is:
=POISSON(x, mean, cumulative)
Mean is our goal expectancy for an individual team. We must also set ‘cumulative’ to FALSE, which results in POISSON returning the probability that a random variable, in this case goals, takes on a value exactly equal to x.
In other words, if you want the probability that a team will score 2 goals in a given match, and your calculated goal expectancy for that team in this match is 2.127 goals, then your formula is:
=POISSON(2, 2.127, FALSE)
The output from this is .2696 – i.e. there is a 26.96% probability of this team scoring exactly 2 goals in this match.
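If you prefer code to a spreadsheet, the same calculation can be sketched in Python using only the standard library. The function name `poisson_pmf` is our own; it is meant to mirror what Excel's POISSON returns when 'cumulative' is FALSE.

```python
import math


def poisson_pmf(goals: int, goal_expectancy: float) -> float:
    """Probability of scoring exactly `goals` given a goal expectancy.

    Standard Poisson formula: exp(-mean) * mean**x / x!
    """
    return (math.exp(-goal_expectancy)
            * goal_expectancy ** goals
            / math.factorial(goals))


print(round(poisson_pmf(2, 2.127), 4))  # 0.2696
```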
To derive meaningful football match odds, though, we need to know the probability of every goal total, or at least every likely one. In the example below we have calculated the likelihood of both Arsenal and Sunderland scoring exact goal totals given their pre-match goal expectancy. For this match the pre-match goal expectancies (based on a hypothetical model) were:
- Arsenal goal expectancy = 2.12673
- Sunderland goal expectancy = 0.75001
Given this predicted scoreline, an application of Poisson Distribution gives the following output:
These are the raw numbers for the probability of both teams scoring an exact goal total. For example, there is a 10.16% chance that Arsenal score 4 goals. From here it is a relatively simple process to calculate the odds for most markets.
We set up a table in a spreadsheet showing the estimated probability of all results. Below is such a table with Arsenal goal totals along the horizontal and Sunderland down the vertical. As we can see, the likelihood that the match will end Arsenal 1 Sunderland 0 is 11.98%.
Not sure what the probabilities translate to in terms of odds? Simply set up a table corresponding with the one above, and display the odds in decimal format.
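As a rough sketch of how such tables can be built outside a spreadsheet, the Python below multiplies the two teams' Poisson probabilities to get a probability for each scoreline, then converts a probability into decimal odds as 1 divided by the probability. The function names and the cap of 10 goals per team are our own assumptions, and like the spreadsheet approach it treats the two teams' goal totals as independent.

```python
import math


def poisson_pmf(goals, goal_expectancy):
    """Probability of exactly `goals` goals for a given goal expectancy."""
    return (math.exp(-goal_expectancy)
            * goal_expectancy ** goals
            / math.factorial(goals))


def score_matrix(home_exp, away_exp, max_goals=10):
    """grid[h][a] = probability the match ends home h, away a."""
    return [[poisson_pmf(h, home_exp) * poisson_pmf(a, away_exp)
             for a in range(max_goals + 1)]
            for h in range(max_goals + 1)]


def decimal_odds(probability):
    """Fair decimal odds (no bookmaker margin) for a probability."""
    return 1.0 / probability


# Goal expectancies from the Arsenal v Sunderland example.
grid = score_matrix(2.12673, 0.75001)

print(round(poisson_pmf(4, 2.12673), 4))  # Arsenal score exactly 4: 0.1016
print(round(grid[1][0], 4))               # Arsenal 1 Sunderland 0: 0.1198
print(decimal_odds(grid[1][0]))           # fair odds for 1-0, roughly 8.35
```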
It’s also a simple process to calculate your odds on all the Under Over markets, e.g. the Under 1.5 outcome is the sum of the probabilities of the 0-0, 1-0 and 0-1 scorelines. The Under 2.5 is those three plus the 1-1, 2-0 and 0-2, and so on.
Fortunately we've done the leg work for you and created a spreadsheet for you to download featuring the probabilities for many popular football betting markets. You can download it here. Simply enter the home and away teams and your calculated goal expectancy for each team for a particular match and the sheet will calculate odds for:
- Match Result 1X2
- Draw No Bet
- Both Teams To Score
- Over Under 2.5 Goals
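For readers who would rather see the logic than a spreadsheet, here is a hedged Python sketch of how these four markets can be derived by summing scoreline probabilities. It is not the spreadsheet's actual implementation; the function names are hypothetical, the scorelines are capped at 15 goals per team, and the two teams' goals are assumed independent.

```python
import math


def poisson_pmf(goals, goal_expectancy):
    """Probability of exactly `goals` goals for a given goal expectancy."""
    return (math.exp(-goal_expectancy)
            * goal_expectancy ** goals
            / math.factorial(goals))


def market_probabilities(home_exp, away_exp, max_goals=15):
    """Sum scoreline probabilities into the popular betting markets."""
    home = draw = away = btts = under25 = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, home_exp) * poisson_pmf(a, away_exp)
            if h > a:
                home += p
            elif h == a:
                draw += p
            else:
                away += p
            if h > 0 and a > 0:      # both teams to score
                btts += p
            if h + a <= 2:           # two goals or fewer is Under 2.5
                under25 += p
    return {
        "home": home, "draw": draw, "away": away,
        # Draw No Bet: stakes are refunded on a draw, so renormalise.
        "dnb_home": home / (home + away),
        "dnb_away": away / (home + away),
        "btts_yes": btts, "btts_no": 1.0 - btts,
        "under_2.5": under25, "over_2.5": 1.0 - under25,
    }


probs = market_probabilities(2.12673, 0.75001)
for market, p in probs.items():
    # Print each probability with its fair decimal odds (1 / probability).
    print(f"{market:10s} {p:.4f}  odds {1.0 / p:.2f}")
```

These are fair odds with no margin; a value bet exists only where the bookmaker's price is longer than the figure your model produces.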
Issues With Poisson
The first issue when it comes to using Poisson Distribution to calculate football match probabilities is that it assumes goals are scored independent of one another. Events may be independent if we are using a random number generator, but it is not the case when referring to football results and the manner in which events on a football field interact and proceed.
The second and somewhat related issue with Poisson Distribution is that it can underestimate the likelihood of either one or both clubs scoring 0 goals. As a result, it can understate the likelihood of a match ending as a draw. If you are interested in this subject, just do a search for Poisson Distribution Zero Inflation Football. There are a number of academic papers written on the subject and it may improve the accuracy of your predictions.
A comprehensive discussion of these issues is beyond the scope of this introductory article, but if you plan to advance beyond the basics of football forecasting and modeling, we recommend you investigate each in more detail.
How To Predict Football Results: What Data To Consider
Randomness plays a key role in the game of football. As a result, football prediction models can never be perfect - indeed there would be little point in playing the games were this the case.
When we ask how to predict football scores and develop accurate goal expectancy ratings, determining the statistical categories we should include is a crucial question.
Incorporating Home Ground Advantage
Let's begin with home ground advantage. There is no debate that home ground advantage exists in any sport where teams play in alternating home and away stadiums. What is debated however is the extent to which playing a match on a home field enhances the chances of the home team winning the contest and further, whether or not certain clubs have a greater home field advantage than others. Our position is that while some clubs may appear to have a greater home ground advantage than others, this is typically variation visible in a limited sample size and that over the long term, all clubs enjoy roughly the same home field advantage.
In terms of football, the advantage for a club playing at home as opposed to playing away is roughly a swing of +0.74 goals. The table below shows home ground advantage in terms of goals for and against, home vs away, across ten of Europe's most popular leagues over the last five seasons. As we can see, the ten-league average is +/- 0.37 goals per match: a club is about 0.37 goals better off at home and 0.37 goals worse off away, hence the 0.74 swing.
| League | Home Goals | Away Goals | Home Advantage |
|---|---|---|---|
| Belgian Pro League | 1.62 | 1.22 | +0.40 |
| English Premier League | 1.54 | 1.19 | +0.35 |
| French Ligue Un | 1.44 | 1.07 | +0.37 |
| Italian Serie A | 1.50 | 1.14 | +0.36 |
| Russian Premier League | 1.39 | 1.08 | +0.31 |
| Spanish La Liga | 1.63 | 1.13 | +0.50 |
Applying this to a simple football predictive model is easy. Let's say we have two teams. We have projected that the home team has a goal expectancy in this match of 1.75 while the away has a goal expectancy of 1.55.
A simple way to incorporate home ground advantage is to take the average of +0.37, split it in two to return 0.185. We add this figure to the home team's goal expectancy and deduct the same figure from the away club, to leave us with:
- Home goal expectancy = 1.935
- Away goal expectancy = 1.365
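The adjustment above is simple enough to express as a small Python helper. The function name is our own and the default of 0.37 is just the league average quoted above; you may well prefer a league-specific figure.

```python
def apply_home_advantage(home_exp, away_exp, advantage=0.37):
    """Split the home advantage evenly between the two teams.

    Half is added to the home side's goal expectancy and
    half is deducted from the away side's.
    """
    half = advantage / 2
    return home_exp + half, away_exp - half


home, away = apply_home_advantage(1.75, 1.55)
print(round(home, 3), round(away, 3))  # 1.935 1.365
```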
Let's now look at possession share data. An article on the topic of predictive value in football statistics in the Guardian featured the following quote:
“Last season Swansea had lots of the ball but little in the attacking third. It is telling that Rob Mastrodomenico of Global Sports Statistics, which uses data and advanced models to help predict future matches, says: "From a purely modelling point of view we don't use possession. Shot-based stats are more relevant if you are looking for a team to score."”
Possession for its own sake is meaningless. What is important is the quality of the possession, and that is very much a subjective value, and one that most of us do not have the resources, specifically time, to evaluate.
Simply put, five minutes of possession on the edge of the opponent’s penalty area should be worth more than the same amount of time spent passing the ball across the field in your own half. At least it’s worth more if you wish to use this category as a goal expectancy parameter, and that is the intention of the exercise.
Using a statistic, such as possession data, simply because it is available doesn’t make sense, but one statistic that is useful is that of shots.
The use of goal differential to determine team strength is without doubt the most widely employed of all football statistics. It is clearly the most available of all statistical categories and over a large enough sample size, goal differential can indeed provide an accurate indicator of a team's overall potential.
Over an inadequate sample size however, goals can tend to be rather random. Yes, the better team usually beats the inferior team, but not always. Over 90 minutes, an inferior team can often prevail. Much of football's great appeal is in its ability to throw up unexpected results.
So while a relegation battler defeating a title contender 1-0 may provide great theatre, over a small sample size such as one game, goal differential is likely to deceive us.
So how many matches do we need to take into account before goal differential begins to give us an accurate picture of each team's quality? This is a complicated question. It is roughly the case that goal differential over 40 league matches will provide an accurate assessment of each team's quality. That's one Premier League season and then into the next. Quite a sum of games. We could use this sample size to account for each team's quality, however such a large sample size is not going to respond with any subtlety to one team's decline and another's rise.
While in an academic sense goal differential over a larger sample size of 40 matches may assess teams fairly over those 40 matches, bookmakers are not basing their odds on events that happened over a season ago. They are assessing each team on data that is far more insightful and that allows them to assess teams over a smaller sample size, keeping themselves ahead of most of those betting into the market.
Shots On Goal
Unfortunately shots, while more relevant if you are looking for a team to score, aren’t perfect statistics. Not all shots are equal. Again from the same Guardian article:
“You can plunge even deeper. Sam Green, an advanced data analyst at Opta, has used a database of thousands of matches to develop a model that quantifies the chance of a shot going in depending on its location.”
The article cites the example of a Newcastle vs Reading match in 2013. Newcastle lost 2-1 at home to Reading, despite having 56% possession and 16 shots to Reading's 7. But, as Sam Green points out: "Reading created two excellent chances – Pavel Pogrebnyak's miss in the 27th minute (goal probability 49%) and Adam Le Fondre's opener (from point-blank range, 69%), as well as his second (17%) – while Newcastle only had one very good chance: Papiss Cissé's shot in the 30th minute (from just outside the six-yard box, 34%)."
Using his model, Newcastle had a goal expectation of 1.4, with Reading slightly better at 1.6. The bald stats told one story, the more detailed analysis another.
As the above section says, the probability of a shot resulting in a goal varies significantly, yet the numbers in the box score simply give a total number. Over a period of time, we can say (the number varies from league to league) that for every n shots on target, we can expect one goal.
If we ignore the strength of a shot (where it was taken and how it was taken, by foot or by head), broad numbers suggest that a goal is scored once for every ten shots taken and once for every three shots taken on target.
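Those broad conversion rates give a very crude goal expectation from shot counts alone. The Python sketch below applies them to the shot totals from the Newcastle vs Reading match quoted earlier; the constant and function names are our own, and the rates are the rough figures above rather than league-specific values.

```python
SHOTS_PER_GOAL = 10            # roughly one goal per ten shots
SHOTS_ON_TARGET_PER_GOAL = 3   # roughly one goal per three shots on target


def goals_from_shots(shots):
    """Crude goal expectation that treats every shot as equal."""
    return shots / SHOTS_PER_GOAL


def goals_from_shots_on_target(shots_on_target):
    """Crude goal expectation from shots on target only."""
    return shots_on_target / SHOTS_ON_TARGET_PER_GOAL


# Shot counts from the Newcastle v Reading example: 16 shots to 7.
print(goals_from_shots(16))  # Newcastle: 1.6
print(goals_from_shots(7))   # Reading: 0.7
```

Treating every shot as equal puts Newcastle well ahead, whereas the location-based model quoted above had Reading slightly in front at 1.6 to 1.4. That gap is exactly what shot quality is meant to capture.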
Shot on goal data has become the new rock 'n' roll in football analytics. It's easy to see why. Shot data helps us to understand how a match was played, and as a result provides us with more meaningful data than goals scored alone. If a team wins 1-0 but is outshot 3-15, which statistic would you want to rely upon when determining the quality of both teams?
While shots (and shots on goal) will improve the accuracy of our goal expectancy ratings, how much improvement might any model see if we could incorporate a ‘strength of shot’ into the calculation?
How To Predict Football Results: Shot Location
Compared to basic shots taken stats, the maths involved in calculating goal expectation based on shot location is slightly more cumbersome with the method reliant upon a fair amount of data to fire it up. The concept however is intuitive and easy to grasp. In the case of goal expectation models based on shot location, we suggest using three simple inputs to determine the chances of a particular outcome.
The outcome we are interested in is, of course a goal, as opposed to a shot that strikes the post, a shot that is saved, a shot that is blocked or simply misses the goal completely.
Many factors go into determining the likelihood of success of a goal attempt, from where the shot is aimed to the strength of the prevailing wind. The most important however are:
- The perpendicular distance to the goal line in yards
- The horizontal distance from a centred vertical, again in yards
- Whether the ball was kicked or headed.
These three pieces of information go a long way to explaining how successful a goal attempt will be based on the actual outcome of many real life efforts. So we are just a couple of steps away from attaching an average likelihood to any goal opportunity.
The first step involves a bit of simple addition, or subtraction in the case of shots with the boot, and finally there’s a graph to read off the answer.
Goal Chance Based On Location/Method
|Distance (yards)||Factor For Perpendicular Distance||Factor For Horizontal Distance||Factor For Attempt With Foot||Factor For Attempt With Head|
The chart above lists the factors we need to prepare a shot location for conversion into a likelihood of scoring a goal. To clarify let's consider two examples.
Firstly, imagine a shot from the edge of the box and in line with one of the posts. The perpendicular distance from the goal line is therefore 18 yards and the corresponding factor in the second column above is 2.2.
The shot is in line with a post, the goal is 8 yards wide and so the horizontal component of the shot is 4 yards from a central line through penalty and kick off spots. The factor for the 4 yard horizontal component is 0.52 from the third column.
The final factor from the fourth column, for a shot with the foot is always minus 0.32, regardless of distance. If we add these three factors together we get 2.4.
If we now consider a header from the penalty spot, the same process of summing the factors gives us 2.28 from (1.47+0+0.81). This time, note that the factor for a header contained in the fifth column is positive, rather than negative as in the case of shots with the foot.
The final step merely requires that the sum of the three factors is matched on the plot above to the corresponding probability that a goal is scored.
For the shot from the edge of the box, 2.4 can be found just before halfway between 2 and 3 on the horizontal axis, and if we take a vertical line up to the curved line and then move horizontally to the vertical axis, we arrive at a point just below 0.1. The value is around 0.08, so there’s about an 8% chance that the shot will result in a goal. This equates to about 1 goal in 12 attempts.
The value of 2.28 for the header from the penalty spot comes in at a slightly greater than 9% scoring chance.
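The two worked examples can be reproduced in Python. Note the assumptions: the factor values are passed in by hand, taken from the examples above rather than looked up from the full chart, and the curve is assumed to be a logistic function of the summed factors, p = 1 / (1 + e^sum). That formula is our own guess at the shape of the plotted curve, chosen because it matches both readings quoted above; it is not confirmed as the model's actual equation.

```python
import math


def goal_probability(perpendicular_factor, horizontal_factor, method_factor):
    """Convert the three summed factors into a scoring probability.

    ASSUMPTION: the curve is logistic, p = 1 / (1 + exp(sum)).
    This reproduces the two readings quoted in the text but is
    not confirmed as the original model's formula.
    """
    total = perpendicular_factor + horizontal_factor + method_factor
    return 1.0 / (1.0 + math.exp(total))


FOOT = -0.32   # factor for an attempt with the foot (from the text)
HEAD = 0.81    # factor for an attempt with the head (from the text)

# Shot from the edge of the box, in line with a post: 2.2 + 0.52 - 0.32 = 2.4
shot = goal_probability(2.2, 0.52, FOOT)

# Header from the penalty spot: 1.47 + 0 + 0.81 = 2.28
header = goal_probability(1.47, 0.0, HEAD)

print(f"shot:   {shot:.3f}")    # about 0.083, i.e. roughly 1 goal in 12
print(f"header: {header:.3f}")  # about 0.093, slightly above 9%
```

A larger factor sum means a lower chance of scoring, which is why distance from goal and a wide angle both push the total up.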
This is a simple model, which omits many possible variables. However, often even such approximations have a happy knack of quickly capturing much of the likely outcome and that is the case here. With matches containing many more attempts than goals and a proliferation of shot data now available, this process allows a game to be re-evaluated into terms that can begin to see if a victory was deserved or rather lucky.
At the very least it gives a feel for the chances of a player who lines himself up to shoot from 25 yards out, even if he is in a more favourable central position on the field... although you could probably add a couple of percentage points to his chances if he also has the wind at his back.
An Example: Chelsea 1 Everton 0, February 2014
The thin line between three points or one point was admirably illustrated in a 2014 Premier League match between Chelsea and Everton. Chelsea won the match 1-0, scoring the only goal of the match in the 92nd minute, from the heart of the six yard box. The Blues had comprehensively out-shot their visitors by 25 to 8, but many had only a small chance of being successful. Only an Ivanovic shot on the hour and Terry’s last gasp toe poke had a goal expectation in excess of 25%. The remaining 23 efforts, mostly from distance were much less threatening.
In the tables below we can see the shots and headers attempted by both sides in the match, along with the vertical distance in yards from the goal line (the X coordinate) and the horizontal component (the Y coordinate) as measured from the centre of the goal. The probability that a shot may result in a goal has then been calculated.
A crude measure of Chelsea’s overall goal expectation can be gained by totalling the individual probabilities for each attempt. It was certainly a case of leaving the best until last for the home side: Willian’s opening effort after 4 minutes had only a 2.6% chance of ending up in the net, but Terry’s winning strike had a hefty 43% chance of producing a goal. The cumulative goal expectation of all 25 shots came to just under 1.7 goals.
By contrast, Everton’s 8 attempts had a cumulative goal expectation that was just shy of six tenths of a goal, with the best chances falling to Mirallas and McCarthy with two first half shots, each having goal expectations of around 12%.
Chelsea Shots/Goal Expectation
|Minute||Team||X Yards||Y Yards||Shot or Header?||Goal Expectation|
Everton Shots/Goal Expectation
|Minute||Team||X Yards||Y Yards||Shot or Header?||Goal Expectation|
A shot model such as this allows us to move from a mere shot differential of 25-8 in Chelsea’s favour, to account for how likely each shot was to result in a goal. Chelsea also “won” the goal expectation contest by 1.7 to 0.6 goals.
Given this data and analysis, Chelsea were a better side on the day by a margin of 1.1 goals. So the margin of victory appears fair, although the total number of goals (2.3) is inflated in the model.
So again, we have refined our assessment of a given football match, with a more accurate interpretation to incorporate into our team by team goal expectancy ratings. Which of the three options do you think offers the most accurate assessment of both teams' performance and potential?
- Goals only: Chelsea 1 Everton 0
- Shots only: Chelsea 25 Everton 8
- Shot location: Chelsea 1.7 Everton 0.6
As football data becomes increasingly granular and detailed, the future of football data analysis and predictive modelling is anybody's guess.
How To Predict Football Results: Conclusion
While there are almost limitless ways to make a football prediction, the fundamentals stand firm. Our primary concern when predicting football matches is in assessing the quality of each club in a given league. We do this by analysing statistical categories that provide a more accurate assessment of a team's ability to both score goals and to concede them. Given a reasonable enough sample size, we can then predict football match results based on goal expectancy for both competing clubs.
While this article is far from exhaustive, it is hoped that it may serve as an introduction outlining the key concepts, issues and developments in football modeling, providing a foundation from which to begin developing your own approach to predicting football matches.
Whichever way you want to approach football forecasting, your model or system must identify value bets, and enough of them to make maintaining your model or system worth your time. In order to achieve this, your predictive approach must assess team performance, and therefore future potential, more accurately than bookmakers do.