Using NFL Statistics To Predict Game Outcomes
How can we use passing and rushing statistics to determine NFL game probabilities? How can we apply logistic regression to do so? Today on the blog Mark Taylor takes us through the process.
In a previous post I introduced a simple set of metrics that can be used to estimate the talent levels, on both offense and defense, of American Football teams. After a few games, sides will have experienced differing levels of opposition, and their raw game statistics may either be inflated because they have faced weaker opponents or depressed because they have faced some of the best.
If we wish to use recent matches to evaluate a team it is essential that such strength of schedule issues are addressed and even more so in the NFL, where a severely incomplete fixture list means that these issues persist into the post season.
If a side has passed for an average of seven yards per play against a set of defences which allow just six yards per play, then we can be fairly confident that we are looking at an above average passing team. But if the same seven yards per passing attempt was achieved against defences that allow eight yards, then they’ve failed to exploit a soft schedule and are most probably below average.
Descriptive vs Predictive Statistics
Statistics can be further broken down into two kinds. Some explain the result of the matches in which they were recorded and include events such as a fumble recovery, which may not be a repeatable talent. These are descriptive statistics. Other figures correlate with scoring and match winning events that may occur in the future. These are predictive stats.
The numbers showcased in the previous post have proven over seasons to have many of the attributes required of predictive stats and as well as producing early season rankings which correlate well with final standings, they can also be easily used to estimate game outcomes and spreads for individual matches.
As we approach Super Bowl weekend, Baltimore and San Francisco, the two top rated sides by these simple stats at the end of September, are contesting their respective Championship games. Also, six of the last eight sides remaining at the Divisional stage were already lying within the top ten for sides ranked by defensive and offensive efficiency in the previous post. Identifying likely post season contenders at such an early stage can lead to inflated prices about single figure contenders come January, but the bread and butter of NFL betting remains match betting, usually against the spread.
Probably more than any sport, American Football is a carefully constructed confrontation between one defense and an opposing offense. Match ups, whether between individuals or units, often determine the outcome of a game, and passing or rushing efficiency stats, when similarly paired, can highlight any pregame weaknesses. An average passing side, when faced with a well below average passing defense, can produce exceptional aerial yards per attempt in that particular game because of a favourable match up. Similarly, a usually explosive offense can become laboured and below par if faced by an outstanding defensive unit.
Applying Logistic Regression To Predict Outcomes
If we record the efficiency stats which both sides took into a game, along with the outcome, we can use a form of regression called logistic regression, which uses inputs to predict categorical outcomes, such as wins or losses. Game winning probabilities for future games can then be calculated, which lead naturally to expected margins of victory or defeat.
Seattle’s narrow defeat at Atlanta was probably the most dramatic game of an action packed Divisional weekend and provides an ideal illustration of how to use strength of schedule corrected efficiency stats to produce game odds.
An Example: Seattle at Atlanta – NFC Divisional Playoff
Over the course of the season, Seattle had rushed for 4.80 yards per carry against a selection of defences which had allowed, in all of their combined games, almost 30,000 yards on nearly 7,000 attempts, for an average of just 4.25 yards per carry. Seattle had outrushed the average achieved by the defences they had faced by a considerable margin. The easiest way to illustrate this extremely good rushing ability is to divide Seattle’s rushing yardage per carry by the yardage per carry allowed by Seattle’s opponents. 4.8/4.25 gives 1.13, so Seattle’s rushing efficiency could be considered to be 113% of average.
On the other side of the match up, Atlanta’s defense allowed teams which habitually rushed for 4.27 yards per carry to rush for 4.8 yards per carry, and a similar calculation shows that the Falcons allowed 112% of average yardage on defense. Whereas figures above 1 indicate a tendency to produce above average performances by the offense, similar numbers in excess of 1 for the defense indicate below average performance.
Atlanta were, therefore, involved in a potentially damaging ground match up. They were well below average when trying to prevent yardage on the ground and Seattle were well above average when they themselves ran the football. The simplest way to combine the offensive rushing record of Seattle and the defensive run stopping ability of Atlanta is to multiply the two together. 1.12 x 1.13 gives a final rushing expectancy in this game for the west coast visitors of 1.27.
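The rushing arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function names are my own and are not part of the original model, and the inputs are the yards per carry figures quoted in the text.

```python
def efficiency_ratio(actual_yards_per_play, opposition_average):
    """Yards per play relative to what the opposition usually allows
    (for an offense) or usually gains (for a defense). 1.0 is average."""
    return actual_yards_per_play / opposition_average


def matchup_expectancy(offense_ratio, defense_ratio):
    """Combine an offense with the defense it faces by multiplying the ratios."""
    return offense_ratio * defense_ratio


# Seattle rushed for 4.80 against defences that allowed 4.25 on average.
seattle_rush_offense = round(efficiency_ratio(4.80, 4.25), 2)

# Atlanta allowed 4.80 to teams that habitually rushed for 4.27.
atlanta_rush_defense = round(efficiency_ratio(4.80, 4.27), 2)

# Expected Seattle rushing efficiency in this particular game.
seattle_rush_expectancy = round(
    matchup_expectancy(seattle_rush_offense, atlanta_rush_defense), 2
)

print(seattle_rush_offense, atlanta_rush_defense, seattle_rush_expectancy)
```

The same two functions apply to passing by swapping in yards per pass attempt.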
The Schedule Adjusted Raw Yardage Stats of Each Team Prior To Kickoff in Atlanta.
| Team | Offensive Run Efficiency | Offensive Pass Efficiency | Defensive Run Efficiency | Defensive Pass Efficiency | Win % | Points Spread |
By the same means, Seattle’s passing expectancy in last Sunday’s early game came to 1.22 compared to rushing figures of 0.89 and passing figures of 0.93 for Atlanta. These combined numbers are couched in offensive terms, so Seattle’s 1.27 on the ground and 1.22 through the air, gave them potentially an advantage over Atlanta’s 0.89 and 0.93, even when home field, a week’s extra rest and time zone effects had been accounted for.
Quantifying this advantage is made relatively easy by the NFL's reluctance to allow tied games. Very few matches remain scoreless in overtime, so we have a sport where there are almost always only two possible results for the home team. So we can use logistic regression, which demands just two possible outcomes and a representative sample of historical match ups to predict how often such a game as Seattle's at Atlanta would result in a home win.
For those interested in the technical details of the regression, the respective rushing and passing coefficients for the home side are currently 1.04 and 1.91, those for the visitors are -0.99 and -1.88, and the constant is 0.22. If we apply those numbers to last week's NFC Divisional game between Seattle and Atlanta we get -0.62. We'll call this number X. The final step to convert X to a win probability for the home team is to insert it into this equation:
Home Win Probability = e^(X) / (1 + e^(X))
For our example the win probability for the home team, Atlanta, came to 0.35. Seattle was therefore a 0.65 chance, giving them a likely average margin of victory of about 4 points and value against the spread, where Atlanta were favoured by about a field goal. On the day the Falcons drove 40 yards in the final 25 seconds to beat the Seahawks by 2.
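For readers who want to reproduce the Seattle at Atlanta calculation, here is a minimal sketch in Python. It assumes the rounded coefficients and match up expectancies quoted above; the names are illustrative and the original regression may well have been run in different software.

```python
import math

# Coefficients as quoted in the text (rounded). The layout is illustrative.
CONSTANT = 0.22
HOME_RUSH, HOME_PASS = 1.04, 1.91
AWAY_RUSH, AWAY_PASS = -0.99, -1.88


def linear_score(home_rush, home_pass, away_rush, away_pass):
    """The number the text calls X: constant plus weighted expectancies."""
    return (CONSTANT
            + HOME_RUSH * home_rush + HOME_PASS * home_pass
            + AWAY_RUSH * away_rush + AWAY_PASS * away_pass)


def home_win_probability(x):
    """Logistic transform: e^X / (1 + e^X)."""
    return math.exp(x) / (1 + math.exp(x))


# Atlanta (home): 0.89 rushing, 0.93 passing.
# Seattle (away): 1.27 rushing, 1.22 passing.
x = linear_score(0.89, 0.93, 1.27, 1.22)
atlanta_win = home_win_probability(x)
seattle_win = 1 - atlanta_win

print(f"X = {x:.2f}, Atlanta = {atlanta_win:.2f}, Seattle = {seattle_win:.2f}")
```

With these rounded inputs X comes out at roughly -0.63, close to the -0.62 quoted, and the home win probability rounds to 0.35.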
For those interested in looking deeper into the use of logistic regression to forecast game outcomes, even those where a draw isn’t uncommon, the Wikipedia article on logistic regression is a good starting point.
Logistic regression is a very powerful tool to predict game outcomes, but it also allows us to both validate our model and discover the relative importance of each facet of the game. For example, if we match two sides together with identical expected rushing and passing expectations on the day, we find that the home team is predicted to win about 57% of the time and that is the historical win rate for home sides. So our model appears to reflect reality.
Secondly, home teams which can pass the ball 10% better than average, with their rushing expectation and both of their opponents' expectations held at exactly the average of 1, are predicted to win 64% of the time. However, if they instead run the ball 10% better than average, with all other match day parameters held to par, they only have a 61% chance of victory. Recent NFL history seems to be telling us, via the regression results, that it is better to be a good passing side (and conversely have a good pass defense) than it is to have an identical level of excellence on the ground.
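These sanity checks are easy to repeat. The sketch below assumes the rounded coefficients quoted earlier in the post, and the function name is my own invention for illustration.

```python
import math


def home_win_probability(home_rush, home_pass, away_rush, away_pass):
    """Logistic model using the rounded coefficients quoted in the post."""
    x = (0.22
         + 1.04 * home_rush + 1.91 * home_pass
         - 0.99 * away_rush - 1.88 * away_pass)
    return math.exp(x) / (1 + math.exp(x))


# Two exactly average sides: only home field separates them.
even_matchup = home_win_probability(1.0, 1.0, 1.0, 1.0)

# Home side 10% better than average through the air, everything else at par.
passing_edge = home_win_probability(1.0, 1.1, 1.0, 1.0)

# Home side 10% better than average on the ground, everything else at par.
rushing_edge = home_win_probability(1.1, 1.0, 1.0, 1.0)

print(f"even: {even_matchup:.3f}")
print(f"passing edge: {passing_edge:.3f}")
print(f"rushing edge: {rushing_edge:.3f}")
```

Using the rounded coefficients, the even match up comes out at about 57%, the passing edge at about 62% and the rushing edge at about 60%. The small differences from the figures in the text may come from rounding in the published coefficients, but the ordering is the same: an edge in passing is worth more than an identical edge in rushing.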
Having set the scene in the initial post and extended the statistics here to include individual game prediction, later in January I will use the method to break down the Super Bowl, which is likely to see the Patriots (who are favoured by 6 by the model) take on San Francisco (who are favoured by 5), although Baltimore and Atlanta will hope to upset the odds this weekend.
Read more of Mark's work on his The Power Of Goals blog
And follow Mark on Twitter: @MarkTaylor0