How Cluster Luck Reveals Overrated MLB Playoff Teams In 2013


With the MLB playoffs set to commence this week, today Ed Feng tells us how to identify which clubs may have been a touch lucky and unlucky through the 2013 season.


As the Major League Baseball postseason approaches, you're looking to evaluate which teams will succeed in the playoffs. You want to identify teams that have earned their place in the postseason versus those that got lucky.

Fortunately, we have some powerful tools to analyse luck for baseball teams.

Pythagorean Expectation

First, we have the idea of Pythagorean expectation. This simple formula turns runs scored and allowed into an expected win total. (If you've seen the movie Moneyball, you may remember how they butchered the explanation of this equation, the only one in the entire movie. Pythagorean expectation does not give the number of wins required to make the playoffs.)

Deviations from Pythagorean expectation are a sign of luck for a baseball team. For example, if a team has won more games than expected, such as this year's New York Yankees, then it has been lucky. Most likely, the Yankees won more than their fair share of one-run games while their losses came by large margins.
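To make the idea concrete, here is a minimal sketch of Pythagorean luck in Python. It assumes the classic exponent of 2 (the article does not say which exponent is used, and other values are common), and the team record and run totals are hypothetical, chosen only for illustration.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Expected winning percentage from runs scored and allowed.

    The exponent of 2 is the classic choice and is an assumption here.
    """
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def pythagorean_luck(wins, losses, runs_scored, runs_allowed, exponent=2.0):
    """Actual wins minus expected wins. Positive means a lucky team."""
    games = wins + losses
    expected_wins = games * pythagorean_win_pct(runs_scored, runs_allowed, exponent)
    return wins - expected_wins


# Hypothetical team: 85-77 record, 650 runs scored, 671 runs allowed.
luck = pythagorean_luck(85, 77, 650, 671)
print(f"Wins above Pythagorean expectation: {luck:.1f}")
```

A team that allows more runs than it scores but still finishes above .500 shows up here with a clearly positive number.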

However, there is another, lesser known aspect of luck in baseball. While Pythagorean expectation uses actual runs scored and allowed, we can ask how many runs a team should have scored or allowed. The sabermetric community has developed ways to estimate the number of expected runs from box score statistics.

More importantly, deviations of actual runs from expectation have predictive power. Let me explain.

How Runs Are Created In Baseball

A team scores runs through a two step process:

  • get batters on base
  • drive those runners home

The number of base runners is relatively easy to estimate from the box score. Almost all base runners result from a hit or walk. But how can one use box score statistics to estimate the second quantity, the rate at which base runners are driven home?

Not surprisingly, Bill James took a first stab at this question. He said runners are driven home in proportion to a rate closely related to a team's slugging percentage: total bases (1 base for a single, 2 for a double, etc.) divided by plate appearances. You can show that the runs created formula of Bill James is on base percentage times total bases.
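As a rough sketch, the basic version of runs created can be written in a few lines of Python. It assumes the simplest form of the formula, in which plate appearances are just at bats plus walks and only hits and walks put runners on base. The season totals are hypothetical.

```python
def runs_created(hits, walks, total_bases, at_bats):
    """Basic runs created: on base percentage times total bases.

    Assumes plate appearances = at bats + walks, a simplification
    that ignores hit batters, sacrifices and so on.
    """
    plate_appearances = at_bats + walks
    on_base_pct = (hits + walks) / plate_appearances
    return on_base_pct * total_bases


# Hypothetical season totals for one team.
rc = runs_created(hits=1400, walks=500, total_bases=2200, at_bats=5500)
print(f"Runs created: {rc:.0f}")
```

Written out, this is base runners (hits plus walks) multiplied by total bases per plate appearance, which matches the two step process described above.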

One can improve on these ideas to develop a more accurate formula for runs created. Dave Smyth suggested a separate contribution for home runs, since these hits guarantee at least one run. In addition to home runs, he added a second term inspired by the two step process from above. In this second term, home run hitters are no longer counted as base runners.
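The structure of Base Runs can be sketched as follows. The coefficients below come from one commonly cited version of Smyth's formula. The article does not say which variant was used for its numbers, so treat them as an assumption. The season totals are hypothetical.

```python
def base_runs(hits, walks, total_bases, at_bats, home_runs):
    """One commonly cited version of Base Runs (coefficients are an assumption).

    runs = A * B / (B + C) + D
      A: base runners, excluding home run hitters
      B: advancement factor that drives runners home
      C: outs
      D: home runs, each worth at least one run
    """
    a = hits + walks - home_runs
    b = (1.4 * total_bases - 0.6 * hits - 3 * home_runs + 0.1 * walks) * 1.02
    c = at_bats - hits
    d = home_runs
    return a * b / (b + c) + d


# Hypothetical season totals for one team.
expected_runs = base_runs(hits=1400, walks=500, total_bases=2200,
                          at_bats=5500, home_runs=160)
print(f"Base Runs estimate: {expected_runs:.0f}")
```

Note how the home run term stands alone: a team whose only hits were home runs would have no base runners in the first term, and its estimate would equal its home run total.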

Over the last 12 seasons of major league baseball (2001 through 2012), Smyth's Base Runs formula overestimates the actual runs a team scores by 7.24 runs per season on average. Since teams have scored 748 runs on average, the error is less than 1%, a remarkable result.

The same runs created formula also applies to a team's defence. Base Runs overestimates the number of runs a team allows by 7.52 runs over the season, again an error of about 1%.

Cluster Luck

Deviations of a team's actual runs from the Base Runs prediction are due to "cluster luck", a term coined by Joe Peta, the author of Trading Bases. Some teams happen to cluster their hits and walks together, resulting in more runs. Others tend to spread their hits and walks more evenly through the innings, resulting in fewer runs.

To provide evidence that deviations from Base Runs are luck, I looked at year to year correlations in the deviation of actual runs from expected runs. Over the last 11 seasons, the deviation one season explains 4.4% of the variance of the deviation the next season (correlation coefficient of 0.21).
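For readers who want to run the same kind of check, here is a minimal sketch of the calculation. The two lists of deviations are hypothetical placeholders, not the data behind the figures above. The last line simply confirms that a correlation of 0.21 corresponds to about 4.4% of variance explained.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical deviations (actual minus expected runs) for the same
# six teams in consecutive seasons.
this_season = [60, -50, 27, -34, 5, -12]
next_season = [10, 5, -20, 8, -3, 15]

r = pearson_r(this_season, next_season)
variance_explained = r ** 2
print(f"r = {r:.2f}, variance explained = {variance_explained:.1%}")

# The figures quoted in the text: r = 0.21 gives r squared of about 0.044.
reported_variance_explained = 0.21 ** 2
```

A small share of variance explained means that knowing a team's cluster luck one season tells you very little about its cluster luck the next.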

The deviation of actual runs from expected runs regresses strongly to the mean each season. Hence, teams should not expect this cluster luck to continue. In addition, teams should not expect cluster luck from the regular season to continue into the playoffs, the basis for this postseason preview.

Let's look at teams that lady luck might leave this postseason.

St. Louis Cardinals

The Cardinals have scored 60 more runs than expected. At the end of each season, the standard deviation of actual from expected runs is 23 runs. St. Louis has exceeded expectations by about 2.6 standard deviations, a trend that will not continue into the playoffs.
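The size of that outlier is simple arithmetic on the two numbers above, sketched here in Python.

```python
def standard_deviations_from_expected(deviation, standard_deviation):
    """How many standard deviations a run deviation represents."""
    return deviation / standard_deviation


# Figures from the text: 60 extra runs, standard deviation of 23 runs.
cardinals_z = standard_deviations_from_expected(60, 23)
print(round(cardinals_z, 2))  # 2.61
```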

The Cardinals get on base with the best teams in the majors. However, they do not hit for power, with a slugging percentage near the major league average. Lady luck has clustered together their hits in an extraordinary way this season. Despite that average power, the Cardinals have scored more runs than every team except Boston and Detroit.

Tampa Bay Rays

Of all potential playoff teams, cluster luck has hit the Rays the hardest. They have scored 50 fewer runs than expected while giving up almost 6 more.

These 56 runs have put the Rays behind the Red Sox in the AL East. In addition, Boston has benefited from 27 runs of cluster luck in their favour. While cluster luck does not put these two teams at the same level, it does bring Tampa Bay closer to the top of the division.
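The bookkeeping behind these numbers can be sketched as follows. The function name is made up for illustration, and Boston's 27 runs are treated as a net figure because the text does not split them between offence and defence.

```python
def net_cluster_luck(extra_runs_scored, extra_runs_allowed):
    """Net cluster luck in runs. Positive values favour the team.

    extra_runs_scored: actual minus expected runs scored
    extra_runs_allowed: actual minus expected runs allowed
    """
    return extra_runs_scored - extra_runs_allowed


# Tampa Bay: 50 fewer runs scored and almost 6 more allowed than expected.
rays = net_cluster_luck(extra_runs_scored=-50, extra_runs_allowed=6)

# Boston: 27 runs of cluster luck in their favour (net figure from the text).
red_sox = 27

# Total swing in run differential between the two teams due to cluster luck.
swing = red_sox - rays
print(rays, swing)  # -56 83
```

Removing cluster luck from both teams would therefore narrow the gap in run differential between Boston and Tampa Bay by roughly 83 runs.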

Pittsburgh Pirates

I started studying run creation formulas in 2011. Back then, Pittsburgh was threatening to end 18 years of losing baseball. I wanted to know whether they were getting lucky.

Dave Smyth's Base Runs said that Pittsburgh had given up 34 fewer runs than expected. The Pirates quickly collapsed and had their 19th straight losing season.

In 2013, Pittsburgh notched a winning season for the first time in 20 seasons. Moreover, analytics suggests that they have not gotten lucky. The Pirates have scored 34 fewer runs than projected while giving up 1 fewer run than expected. Even accounting for cluster luck, their pitching and defence have been spectacular.

Current 2013 World Series Odds - Odds as at 1st October 2013.

Team         Opening Odds   bet365   YouWin   Ladbrokes   William Hill   BetVictor
Boston              34.00     4.40     5.25        4.50           4.50        4.50
LA Dodgers          10.00     5.50     5.50        5.50           5.00        5.50
Detroit             10.00     6.00     5.50        6.00           5.50        5.50
St. Louis           21.00     6.50     7.50        7.00           7.00        7.00
Oakland             34.00     8.50     9.00        8.00           9.00        9.00
Atlanta             17.00     8.00     8.50        8.50           9.00        9.50
Tampa Bay           19.00    16.00    17.00       17.00          17.00       17.00
Cincinnati          13.00    19.00    21.00       19.00          21.00       21.00
Pittsburgh          67.00    17.00    19.00       19.00          21.00       21.00
Cleveland           81.00    17.00    19.00       19.00          21.00       23.00



Follow Ed on Twitter: @thepowerrank

And check out Ed's US sports team rankings and ratings at


I'm the founder of The Power Rank, a sports analytics website. It's based on a ranking algorithm I developed from my Ph.D. thesis at Stanford. I freelance with Betting Expert and Sports Illustrated.