Statistical Noise And Predicting Football Performance
Which football statistics are meaningful and which are relatively meaningless when it comes to predicting performance? Today on the blog, Cassini takes a look at modern football analytics: which statistics can help us in our betting, and which won't.
As many of you will know, last week was the occasion of the latest annual MIT Sloan Sports Analytics Conference, an event that grows larger each year, as more and more attention and money are invested in sports.
Every day, it seems that there are more and more statistics available for sports including, of course, football. As anyone who has read my earlier bettingexpert articles on Elo and Poisson will know, I am a big proponent of using statistics as the basis for pricing up football matches, but I am not someone who ‘worships at the altar of advanced statistics’.
The nature of the game means that randomness is significant and models can never be perfect - indeed there would be little point in playing the games were this the case - but a statistically based model is a good place to start.
The question is, what statistics should we include in our models? Several ideas were discussed in my earlier articles published here, but a column by Sean Ingle in The Guardian last week caught my attention, specifically the discussion on possession and on shot quality.
One paragraph from the article said this:
“Last season Swansea had lots of the ball but little in the attacking third. It is telling that Rob Mastrodomenico of Global Sports Statistics, which uses data and advanced models to help predict future matches, says: ‘From a purely modelling point of view we don't use possession. Shot-based stats are more relevant if you are looking for a team to score.’
“Opta's figures back that up. Of the 181 games won in the Premier League before last weekend, the team who had the most possession only won 103 – 57% in total. The team who had more shots on target than their opponents won 128 matches – 71% of the total.”
I don’t use possession at all in my model, and apparently I am in good company. The issue for me is that possession for its own sake is meaningless; what is important is the quality of the possession, and that is very much a subjective value, and one that most of us do not have the resources, specifically time, to evaluate.
Simply put, five minutes of possession on the edge of the opponent’s penalty area should be worth more than the same amount of time spent passing the ball across the field in your own half. At least it’s worth more if you wish to use this category as a goal expectancy parameter, and that is the intention of the exercise. Using a statistic simply because it is available doesn’t make sense, but one statistic that is useful is that of shots.
Unfortunately shots, while more relevant if you are looking for a team to score, aren’t perfect statistics. Not all shots are equal. Again from Sean Ingle’s article:
“You can plunge even deeper. Sam Green, an advanced data analyst at Opta, has used a database of thousands of matches to develop a model that quantifies the chance of a shot going in depending on its location.
“When Newcastle lost 2-1 at home to Reading last month they had 56% possession and 16 shots to seven. But, as Green points out: ‘Reading created two excellent chances – Pavel Pogrebnyak's miss in the 27th minute (goal probability 49%) and Adam Le Fondre's opener (from point-blank range, 69%), as well as his second (17%) – while Newcastle only had one very good chance: Papiss Cissé's shot in the 30th minute (from just outside the six-yard box, 34%).’
“Using his model, Newcastle had a goal expectancy of 1.4, with Reading slightly better at 1.6. The bald stats told one story, the more detailed analysis another.”
As the above section says, the probability of a shot resulting in a goal varies significantly, yet the numbers in the box score simply give a total number. Over a period of time, we can say (the number varies from league to league) that for every n shots on target, we can expect one goal, and I do use these numbers myself, but how much improvement might any model see if we could incorporate a ‘strength of shot’ into the calculation?
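For anyone who wants to see the two approaches side by side, here is a minimal Python sketch. It assumes, and this is my assumption rather than anything confirmed about Sam Green's model, that a quality-based goal expectancy is simply the sum of the individual shot probabilities. The three probabilities are the ones quoted for Reading above; the shots-per-goal figure of 3.0 is purely hypothetical, and treating all three chances as shots on target is done only for illustration.

```python
def expectancy_from_shot_count(shots_on_target, shots_per_goal):
    """Count-based estimate: one goal expected for every n shots on target."""
    if shots_per_goal <= 0:
        raise ValueError("shots_per_goal must be positive")
    return shots_on_target / shots_per_goal


def expectancy_from_shot_quality(shot_probabilities):
    """Quality-based estimate: sum of per-shot goal probabilities (assumed method)."""
    return sum(shot_probabilities)


# Goal probabilities quoted in the article for Reading's three named chances.
# Reading's other shots are not itemised in the source, so they are left out.
reading_named_chances = [0.49, 0.69, 0.17]

# Hypothetical league-wide figure, for illustration only.
SHOTS_PER_GOAL = 3.0

count_based = expectancy_from_shot_count(len(reading_named_chances), SHOTS_PER_GOAL)
quality_based = expectancy_from_shot_quality(reading_named_chances)

print(f"Count-based estimate:   {count_based:.2f}")    # 1.00
print(f"Quality-based estimate: {quality_based:.2f}")  # 1.35
```

The three named chances alone sum to 1.35, so under this assumption Reading's remaining shots would account for the rest of the quoted 1.6.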
My view is that overall it would make little difference. Any model should take the strength of the opposition into account, and a stronger opponent is exactly that because they concede fewer goals and, usually, allow fewer shots too. Eight shots and one goal by a visiting team at Old Trafford is worth more than sixteen shots and two goals at home to Queens Park Rangers.
Possibly these ‘strength of shot’ statistics are available somewhere, but for most of us hobbyist punters, the extra time needed to factor these in is unlikely to be worthwhile. The small differences to the final goal expectancies are likely to average themselves out in most cases, and when estimating goal expectancy for the next fixture, most models will look back over a number of games, perhaps with added weighting for more recent matches, or for matches against similar teams.
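As a rough illustration of looking back over a number of games with added weighting for more recent matches, here is a minimal Python sketch using exponential decay. The decay scheme, the function name and the sample values are all hypothetical choices of mine; other weighting schemes would do the job equally well.

```python
def weighted_goal_expectancy(per_match_values, decay=0.9):
    """Recency-weighted average of per-match goal expectancies.

    per_match_values is ordered oldest first. The most recent match gets
    weight 1, the one before it gets `decay`, the one before that
    `decay ** 2`, and so on. A decay of 1.0 gives a plain average.
    """
    if not per_match_values:
        raise ValueError("need at least one match")
    if not 0 < decay <= 1:
        raise ValueError("decay must be in (0, 1]")
    n = len(per_match_values)
    weights = [decay ** (n - 1 - i) for i in range(n)]
    weighted_sum = sum(w * v for w, v in zip(weights, per_match_values))
    return weighted_sum / sum(weights)


# Hypothetical goal expectancies from a team's last four matches, oldest first.
recent_matches = [1.2, 0.8, 1.6, 1.4]

plain_average = weighted_goal_expectancy(recent_matches, decay=1.0)
recency_weighted = weighted_goal_expectancy(recent_matches, decay=0.8)

print(f"Plain average:    {plain_average:.2f}")
print(f"Recency weighted: {recency_weighted:.2f}")
```

With a long enough look-back, a small per-match error in goal expectancy has limited influence on the weighted figure, which is the point made above about small differences averaging out.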
Location, And More
There’s one more issue for me, which is that the ‘quality of shot’ rating has to be subjective. A model that ‘quantifies the chance of a shot going in depending on its location’ alone is flawed. Unlike with property, location is not everything when taking a shot. Little details like whether one player, or more, is in the way, or whether the goalkeeper is out of position etc. are very much part of the equation.
I’m sure these are factored into the probability formula and we don’t have all the details, but some sports differentiate between scores based on ‘type’, for example the fast-break in basketball. A fast-break lay-up counts for the same two points as a lay-up that follows a drive to the basket against a zone defence, and may be scored from the exact same spot, but because the scores are different in character, they are tracked separately.
Similarly in football, a shot created from a set piece, such as a free-kick or corner, is quite different in nature from a shot from a ‘fast-break’. The circumstances of a goal matter too, which is why I don’t put as much weight on a late goal that stretches the margin to two goals as I do on a goal from a tied position that wins a game.
A team pressing for an equaliser leaves itself open, something that is seen more often in ice hockey, where a team trailing by one or two actually withdraws the goalkeeper in favour of another attacker. When an attack breaks down, it’s not unusual to see the puck slide from a distance into an empty net. Should that goal carry the same weight as earlier goals?
As the recent Manchester United v Real Madrid Champions League game showed, the impact of a red card is also huge. A goal scored against ten men might well be legitimately devalued.
Consider that the probability of Manchester United advancing was 70% prior to the red card, and 50% immediately after.
And then there is the question of statistics that are assigned value in some quarters, but are summarily dismissed in others. One such statistic is that pertaining to historical results of matches between two teams. Justification for a bet is supported by a statement along the lines of: Fulham have played Stoke City five times since 2010 and four of those matches were won by the home side. Thus the evens available on the next fixture is great value.
My personal opinion is that data like this is mildly interesting, but not necessarily of any great value. If the two teams had met five times in the past two weeks, fielding identical teams with the same managers and other conditions identical, then there might be something in the statistic, but when the initial match in the sample was three years ago, it’s meaningless. The chances are that the game from two or three seasons ago bears almost no similarities to the game ahead, other than the names of the two clubs. Personnel, both playing and managerial, change as do playing styles and the relative positions of both teams.
The problem with using statistics like this is that they are misleading. Four wins out of five for the home side in this fixture doesn’t mean that there is an 80% chance of the home team prevailing. The home team MIGHT have an 80% chance of winning, but this wouldn’t be because of a result from two or more years ago. It will be because they are rated much stronger and in better form and possibly more motivated to win than the visitors.
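To put a rough number on how little five matches can tell us, here is a minimal Python sketch using the Wilson score interval, a standard way of estimating the range of plausible win rates from a small sample. It generously assumes the five meetings were independent and played under identical conditions, which, as argued above, they were not. The function name is my own, and z = 1.96 corresponds to roughly 95% confidence.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a proportion (z=1.96 is roughly 95% confidence)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z ** 2 / trials
    centre = (p + z ** 2 / (2 * trials)) / denom
    half_width = z * math.sqrt(p * (1 - p) / trials + z ** 2 / (4 * trials ** 2)) / denom
    return centre - half_width, centre + half_width


# Four home wins in five meetings, as in the Fulham v Stoke City example.
low, high = wilson_interval(4, 5)
print(f"Plausible home win rate: {low:.0%} to {high:.0%}")
```

Even under those generous assumptions, the range runs from a little under 40% to the mid-90s, which is far too wide to tell us whether evens is value.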
And just today, I read courtesy of Infostrada Sports that the Copa del Rey final will be played at the Santiago Bernabeu, and that Atletico have WON ALL 3 (emphatic bolding theirs, not mine) finals versus Real which have been played at that venue. Mildly interesting as a curiosity, but a help to you for making money? No – unless Joe Punter rushes out and backs Atletico based on this ‘statistic’. The three finals were held in 1960, 1961 and 1992!
Rankings Don’t Count
One final comment on statistics and their usefulness is this thought on the subject of rankings. They are fundamentally flawed and absolutely worthless. The problem is that when teams are ranked, the mind sees the gap between teams as the same, e.g. first to second looks very much like the same difference as between 15th and 16th.
However, this is often misleading. Look at this season’s EPL table for an example. Manchester United are head and shoulders (12 points) clear of the next best ranked team, yet the same number of points spans (at the time of writing) nine teams, 7th place all the way down to 15th.
To conclude, statistics are an essential tool in the search for value, but think about the statistics you choose to use, how you incorporate them, and make sure that they are actually useful. It’s the signal we are after, and not the noise.
You can follow Cassini on Twitter @calciocassini
And visit Cassini's blog: GreenAllOver.blogspot