The Emergence Of Shot On Goal Analysis


Shooting data is gaining greater and greater attention from the football mainstream. Today on the blog Mark Taylor takes a look at its rise and development.


Anyone interested in using numbers to explain and predict the diverse and often unpredictable events that happen during the course of a football match finds themselves living in exciting times. The plethora of new data, much of it free and some available for a small monthly fee, has opened up the arena of analytics to anyone who has the desire to delve deeper. Where other sports, most notably baseball, have already trodden, football is beginning to take similarly confident strides.

Before the internet, very little data existed in a readily available format. Data began and ended with goals scored and allowed, but it was an indication of the latent desire to investigate the beautiful game that even this sparse data was put to great use. The Poisson distribution, which estimates the likelihood of an exact number of events occurring if we know the average rate at which that event occurs, was never part of a mainstream mathematics education. But it fitted goal scoring (almost) perfectly and soon became a fast-track route into modelling individual match scores.
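As a rough illustration of how a Poisson model turns average scoring rates into match probabilities, here is a minimal Python sketch. The scoring rates of 1.5 and 1.1 goals per game are made-up figures, the function names are my own, and the two sides' goal counts are assumed to be independent, which is a simplification rather than a settled fact.

```python
import math


def poisson_pmf(k: int, rate: float) -> float:
    """Probability of exactly k events when the average rate is `rate`."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


def scoreline_probability(home_goals: int, away_goals: int,
                          home_rate: float, away_rate: float) -> float:
    """Probability of an exact score, assuming the two goal counts are independent."""
    return poisson_pmf(home_goals, home_rate) * poisson_pmf(away_goals, away_rate)


def match_outcome_probabilities(home_rate: float, away_rate: float,
                                max_goals: int = 10):
    """Sum scoreline probabilities into (home win, draw, away win).

    Scores above max_goals per side are ignored, so the three numbers
    add up to very slightly less than 1.
    """
    home_win = draw = away_win = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = scoreline_probability(h, a, home_rate, away_rate)
            if h > a:
                home_win += p
            elif h == a:
                draw += p
            else:
                away_win += p
    return home_win, draw, away_win


if __name__ == "__main__":
    # Hypothetical average goals per game for each side.
    print(scoreline_probability(1, 0, home_rate=1.5, away_rate=1.1))
    print(match_outcome_probabilities(home_rate=1.5, away_rate=1.1))
```

The only inputs are the two average rates, which is why goals scored and allowed were enough to get such models off the ground.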

Here Comes The Data

The data explosion that culminated in Manchester City releasing for free a whole season’s worth of individual player (and hence team) data in 2012 probably began when Joe Buchdahl began his still influential Football Data site. Match odds, numbers of corners, fouls and cards were suddenly available in CSV format for the first time, but it was the inclusion of shooting data that kick-started an interest in the second level of granular data that existed below goals.

Data need not be complex or detailed to yield rich rewards. The rudimentary counting figures for shots, shots on target and goals scored provided by the Football Data site can easily be used to estimate shooting accuracy and conversion rates. Goals are relatively rare events, but by moving one step back in the process we can quickly build up a larger collection of data for the precursors of goals.
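To make the arithmetic concrete, here is a small Python sketch that turns the three counting figures into rates. The definitions used (accuracy as shots on target per shot, conversion as goals per shot) are common conventions rather than the only ones in use, and the season totals in the example are invented.

```python
def shooting_rates(shots: int, shots_on_target: int, goals: int) -> dict:
    """Derive simple rates from a side's raw counting figures."""
    if shots <= 0:
        raise ValueError("at least one shot is needed to compute a rate")
    return {
        # Share of all attempts that were on target.
        "accuracy": shots_on_target / shots,
        # Share of all attempts that ended up as goals.
        "conversion": goals / shots,
        # Share of on-target attempts that ended up as goals.
        "on_target_conversion": goals / shots_on_target if shots_on_target else 0.0,
    }


if __name__ == "__main__":
    # Invented season totals, for illustration only.
    print(shooting_rates(shots=400, shots_on_target=130, goals=40))
```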

The Seductive Nature Of Shooting Data

Much of the groundwork for analysing both the quality and quantity of shots has been undertaken in hockey, and translating such figures into predictive metrics in football was a logical step to take once shooting data became readily available. Both shooting efficiency and shooting frequency are excellent indicators of future performance, once external factors such as game states are accounted for and the ever-present need to regress all counting data towards a side’s more usual mean is applied.
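One simple way to regress a rate towards a mean is to blend the observed figure with the league average, giving the observed figure more weight as the sample grows. The Python sketch below does this; the function name, the weighting constant of 300 shots and the example numbers are all assumptions for illustration, and in practice the constant would need to be estimated from data.

```python
def regress_to_mean(observed_rate: float, sample_size: int,
                    league_rate: float, prior_weight: float) -> float:
    """Shrink an observed rate towards the league rate.

    prior_weight is the number of 'league average' attempts mixed in
    with the side's own attempts; a larger value means heavier regression.
    """
    if sample_size < 0 or prior_weight < 0 or sample_size + prior_weight == 0:
        raise ValueError("sample_size and prior_weight must be non-negative and not both zero")
    total = sample_size + prior_weight
    return (sample_size * observed_rate + prior_weight * league_rate) / total


if __name__ == "__main__":
    # A side converting 15% of 100 shots in a league that converts 10%,
    # with a hypothetical prior weight of 300 shots.
    print(regress_to_mean(0.15, 100, 0.10, 300))
```

With few shots the estimate sits close to the league figure, and it only moves towards the side's own number as the evidence accumulates.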

Inevitably, some teams over- or underperform against the expectations of a shot-based model. But just as apparent home specialists appear one season, only to return to normal levels in subsequent seasons, most over- or underperformers are dragged back to produce future results more in keeping with their posted shooting statistics. However, some sides continually buck the average trends, implying that simply counting shots and goals, while effective at predicting the course of many teams, can omit more subtle and repeatable facets of certain teams’ week-to-week performance.

A Case Study: Stoke Under Pulis

If we collect enough shooting data, the individual samples for the majority of teams will even out in overall quality, but if we want to drill down and look at how the persistent outliers differ from the pack we need to add pitch position to the mix. Stoke, throughout most of their Premiership life under Tony Pulis, invariably outperformed shot-based models. Their shot count was eclipsed by most other sides, yet they were extremely efficient, and this vital efficiency advantage persistently proved the difference between survival and a return to the Championship.

All becomes clear if we add detail to their shooting statistics and chart the area of the pitch from where they take their chances. In a typical Pulis led season, Stoke took just under 10% of their chances from inside the six yard box, compared to a typical figure of a shade over 3% for Arsenal. So what Stoke lacked in quantity, they compensated for in quality of opportunity, because the closer to goal you are when you shoot, the more likely you are to score.

In 2011/12 the average shot in the Premiership was made 15 yards from goal and 9 horizontal yards away from a perpendicular line drawn through the penalty spot. Stoke were, on average, a yard closer and two yards nearer to the centre of the goal, reinforcing their tendency to put the ball into the most dangerous areas possible.
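Averages like these can be computed directly from a list of shot coordinates. The Python sketch below assumes a coordinate system of my own choosing, not necessarily the one used by any data provider: x is the sideways offset in yards from the line through the penalty spot (negative to one side, positive to the other) and y is the distance in yards from the goal line. The three shots in the example are invented.

```python
from statistics import mean


def average_shot_position(shots):
    """Return (average sideways offset, average distance from goal) in yards.

    shots is a sequence of (x, y) pairs. The absolute value of x is used,
    so shots from the left and the right do not cancel each other out.
    """
    if not shots:
        raise ValueError("at least one shot is required")
    sideways = mean(abs(x) for x, _ in shots)
    distance = mean(y for _, y in shots)
    return sideways, distance


if __name__ == "__main__":
    # Three invented shots: (sideways offset, distance from goal line).
    example = [(-4.0, 6.0), (10.0, 18.0), (-7.0, 21.0)]
    print(average_shot_position(example))
```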

To highlight the positional shooting advantage enjoyed by Stoke, in the table below I’ve listed the average x,y coordinate for all shots for each side in Stoke City games against their Premiership rivals during 2011/12. Only Wolves managed a more advantageous shooting position. Numerically, The Potters were often outgunned, but they did consistently shoot from better positions than their rivals.

Team          Distance from centre of goal (yards)    Distance from goal (yards)
Wolves        7.1                                     13.2
Stoke         6.5                                     14
Blackpool     10.2                                    14.3
Arsenal       8.5                                     14.6
Tottenham     6.2                                     15
Blackburn     5.7                                     15.2
West Brom     8.7                                     15.4
Everton       7.4                                     15.6
Aston Villa   6.7                                     15.9
Bolton        6.6                                     16
Wigan         10.1                                    16.1
West Ham      7.8                                     16.2
Man Utd       8.4                                     16.2
Man City      8.6                                     17.2
Sunderland    8.3                                     17.3
Fulham        6.9                                     17.4
Chelsea       6.6                                     17.6
Birmingham    7                                       17.6
Liverpool     8.5                                     17.7
Newcastle     8                                       17.8

Extreme, style-based outliers such as Stoke are relatively easy to spot, but such variations in the position from where sides are able to take their chances are common across all teams. We can therefore increase the complexity of our shooting model, and hopefully its validity, by batching together shooting areas such as the six yard box and attempting to establish an average conversion rate for chances created in each area, against which we can grade all sides in the division.
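A zone-based model of this kind can be sketched in a few lines of Python. The zone names, the pooled shot records and the team's shot counts below are all invented for illustration, and the function names are my own; a real model would use a full season of data and probably finer zones.

```python
from collections import defaultdict


def zone_conversion_rates(shots):
    """Average conversion rate per zone from pooled (zone, scored) records."""
    attempts = defaultdict(int)
    goals = defaultdict(int)
    for zone, scored in shots:
        attempts[zone] += 1
        goals[zone] += int(scored)
    return {zone: goals[zone] / attempts[zone] for zone in attempts}


def expected_goal_tally(team_shot_counts, rates):
    """Goals an average side would score from this mix of shooting zones."""
    return sum(count * rates[zone] for zone, count in team_shot_counts.items())


if __name__ == "__main__":
    # Invented division-wide records: 10 six yard box shots (4 scored)
    # and 20 penalty area shots (3 scored).
    pooled = ([("six_yard_box", True)] * 4 + [("six_yard_box", False)] * 6
              + [("penalty_area", True)] * 3 + [("penalty_area", False)] * 17)
    rates = zone_conversion_rates(pooled)

    # A hypothetical side's shot counts by zone and its actual goals.
    team_shots = {"six_yard_box": 5, "penalty_area": 10}
    actual_goals = 5
    expected = expected_goal_tally(team_shots, rates)
    print(rates, expected, actual_goals - expected)
```

The difference between actual and expected goals is the over- or underperformance that remains once shooting position has been accounted for.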

We are now in a position to more confidently compare the rate at which sides convert their chances once account is taken of the position from which these chances originate. Once again regression towards a mean is an ever present attraction for the very best and the very worst. A side which still over or under performs our new improved model is likely to be very good or very bad, but it is also likely to be fortunate or not, as well. In predicting future performance we should expect the skill to persist, but not the luck. And this is even more applicable to individual players, where sample size is considerably diluted compared to team numbers.

The Next Stage Of Analysis

Moving a model from one based on raw counting figures, to one based around shooting from areas on the pitch, and ultimately to one using x,y defined coordinates is only constrained by the availability of data. Many online sites and apps provide detailed visuals of each individual shot made in a game, and most now go the extra yard by including blocked shots as well. Collecting such data in a form that is readily usable is a chore, but the rewards, whether an enhanced predictive model or merely the ability to fully appreciate the importance of in-game actions such as shots and shooting position, are well worth the effort.

Pitch position, however, is merely the most important of numerous factors that may go towards defining the likelihood of success for a shot. Defensive pressure is an obvious secondary factor, along with keeper position and the power or placement of the attempt. Defenders are also more likely to find themselves in shooting positions as their side chases a game, and they tend to be less proficient shooters than out-and-out strikers.

Who To Follow, Who To Read

Individual and team shooting efficiency is a particularly fast-moving and well-developed topic within the analytics community. Recent months have seen an explosion in interest, and many bloggers have extensively investigated some of the topics I have briefly touched on in this post.

James Grayson’s blog provides an excellent primer for those interested in using hockey style methods to project future performances. James highlights the need to regress measurements towards a mean, thus avoiding the pitfalls inherent in taking even season long numbers at face value.

11tegen11 has taken full account of the position from where chances originate to identify the teams from the top Dutch league which consistently create high quality chances.

Simon Gleave also writes extensively about Dutch football and often incorporates shooting analysis in his posts, as well as appreciating the importance of regressing figures.

Differentgame has produced a zone based shooting model and is currently exploring numerous uses to which it can be put. Some really exciting conclusions about goal keeping ability, as well as scoring potential are being posted here.

Danny Pugsley has also written extensively about the hockey based approach and this article is particularly recommended for the copious amounts of raw data from this season. It also provides a hint as to why Pulis’ fifth season at Stoke also proved to be his last as he struggled to adapt Stoke’s usual approach.

Counterattack9 has explored the vitally important, but difficult to represent numerically, relationship between strikers and the defensive pressure they have to overcome, with this cutting-edge post based around a Brazilian goal from the ongoing Confederations Cup.

Statsbettor has also taken a zonal approach to analysing the likely talent gap that exists within the higher echelons of individual Premiership finishing ability.

Mixedknuts has seamlessly added shooting metrics to an entertaining ride through the potential transfer targets around the European leagues during the summer break.

Lastly, I have myself posted extensively about shooting and conversion models and the effect of game states, venue and shot placement at my own blog The Power of Goals.



And follow Mark on Twitter: @MarkTaylor0

NFL and football fan. I've seen my two favourite sides, Stoke and the San Diego Chargers play at the new Wembley....and both lost.