What Is A Meaningful Sample Size?
How much data do you need in order to make accurate predictions? What is a meaningful sample size when conducting your betting analysis? Today on the blog Martin Eastwood discusses how much data is enough.
Imagine you have developed a new betting system and you want to know whether it is successful or not - how long do you need to test it for? 10 games, 100 games, 1000 games or even longer?
What if you have been looking through sports data to try and find an edge and you think you have spotted a trend? Is it real or just a coincidence?
How about Paul the Octopus predicting the winners at the World Cup? Is he just lucky or is he actually a talented tipster?
One of the most crucial aspects of being able to make such decisions statistically is understanding the effect of sample size. As a general rule of thumb, the more data you have then the more accurate your answer will be and the smaller the differences you can accurately detect. Having too little data may be dangerous as it can lead to results being inconclusive or even misleading and incorrect.
How Large Is A Meaningful Sample Size?
The question of how big a sample size needs to be, or more commonly how small a sample size you can get away with, is probably the single most asked question in statistics and to answer it properly requires some thought about the data and the factors influencing them.
First of all you really need to understand the question you want to answer and the precision you need to do it. For example, let’s say you are a poker player who has made a profit and you want to know if your playing history shows you to be a good player or just a lucky player. Do you need to work out your win rate to within a single percentage point or just show that it is above a certain level? The more precise you want to be and the smaller the differences you are looking to identify, then the more data you will need.
If your data are limited then you may need to compromise on your analysis. It is always better to answer a more general question accurately than to over-complicate things and end up with the wrong conclusion because you did not have enough data.
You also need to understand the data set that you have available for analysis. How variable is the data? What is the data’s average? How are the data distributed? There are a number of different ways to calculate sample sizes and they will all need this sort of information to be of any real use.
So let’s look at an example using a simple sample size estimation technique. The first thing we need to do is decide how precise we need to be by choosing the margin of error we are prepared to tolerate. For example, as you approach an election a pollster may report that 60% of voters will vote for political party A with a margin of error of +/-2.5%. The smaller you want this margin of error to be then the more data you will need to use.
Next we need to choose the alpha value, which is one minus the confidence level you want to work with. The smaller the alpha then the greater your confidence that the true result you are looking for lies within the margin of error you chose.
Alpha is normally reported as a proportion so if we give our poll above an alpha of 0.1 it means our confidence level is (1.0 - 0.1) * 100 = 90%. In other words, if we carried out the poll many times we would expect the true result to fall within our margin of error in about 90% of them. Again, the smaller the alpha you choose then the more data you will need to get the correct answer.
Finally, the third thing we need is a measure of the amount of variability in your data, such as its standard deviation. The more variable your data is then the harder it is to identify small differences so the more results you will need to analyse.
How Much Data Is Enough?
This is where it all gets a bit complicated as we need to know how variable our data is before we know how much data we actually need to collect. But if we don’t know how much data to collect then how can we look at how variable it is? At this point we have to be pragmatic and just make an estimate of the variability. We also need to convert alpha into something called a critical standard score (also known as z), which is the number of standard deviations either side of the mean that covers the chosen confidence level on a normal distribution. Without getting too tangled up in the mathematics (you need a normal distribution calculator if you are interested in finding out a bit more about it), for our example above an alpha of 0.1 gives a value of around 1.645 for z.
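If you do not have a normal distribution calculator to hand, the value of z can be checked with a few lines of Python. This is only a minimal sketch: the function name critical_z is made up for illustration, and it assumes a two-sided margin of error, so alpha is split equally between the two tails of the normal curve.

```python
from statistics import NormalDist


def critical_z(alpha: float) -> float:
    """Two-sided critical standard score for a given alpha (hypothetical helper)."""
    # Half of alpha sits in each tail, so look up the point that leaves
    # alpha / 2 in the upper tail of the standard normal distribution.
    return NormalDist().inv_cdf(1 - alpha / 2)


for alpha in (0.10, 0.05, 0.01):
    print(f"alpha={alpha:.2f}  confidence={1 - alpha:.0%}  z={critical_z(alpha):.3f}")
```

An alpha of 0.1 gives the 1.645 used in the example, and the smaller alphas give larger values of z, which is why higher confidence needs more data.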
Now all we have to do is the final calculation. For a yes or no outcome such as a vote, the variance is p * (1 - p), where p is the expected proportion, so:
z = 1.645
p = 60%, which is equivalent to 0.6
Margin of error (ME) = 2.5%, which is equivalent to 0.025
n = number of samples
n = ( z^2 * p * (1 - p) ) / ME^2
n = (2.706 * 0.24) / 0.000625
n = 1039
And there we have it, to be 90% certain that you are within +/-2.5% of the actual value you need 1039 samples to analyse.
More than you were expecting?
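As a cross-check, here is a minimal Python sketch of the standard textbook calculation for a proportion, n = z^2 * p * (1 - p) / ME^2. The function name and its arguments are made up for illustration. It assumes a yes or no outcome such as a vote, takes the expected share p = 0.6 from the pollster's example, and falls back to p = 0.5 (the most variable case, and so the most cautious) when you have no estimate at all.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(margin_of_error: float, alpha: float, p: float = 0.5) -> int:
    """Samples needed to estimate a proportion to within +/- margin_of_error.

    Hypothetical helper: p is the expected proportion, and p = 0.5 is the
    worst case because p * (1 - p) is largest there.
    """
    # Critical standard score for a two-sided interval.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # Variance of a yes/no outcome is p * (1 - p).
    n = z ** 2 * p * (1 - p) / margin_of_error ** 2
    # You cannot collect a fraction of a sample, so round up.
    return math.ceil(n)


print(sample_size_for_proportion(0.025, alpha=0.10, p=0.6))
print(sample_size_for_proportion(0.025, alpha=0.10))
print(sample_size_for_proportion(0.01, alpha=0.10, p=0.6))
```

The last line shows how quickly the numbers climb: because the margin of error is squared, tightening it from 2.5% to 1% multiplies the sample size by just over six.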
False Positives and False Negatives
One of the key aspects of estimating sample size is the confidence level you wish to work with. This is essentially how confident you are that the true result you are looking at lies within your chosen margin of error. For example, if you were comparing the goal scoring rates of football teams you may wish to be 95% certain that one team is more likely to score in the second half of a match than another.
Of course, this still leaves us with a 5% chance that the difference we are observing is just down to random variance or luck. Seeing statistical difference like this where it doesn’t exist is known as a Type One error or a false positive. The chances of these errors occurring can be reduced by using higher confidence levels. For example, moving up to a confidence level of 99% means that false positives will only happen one percent of the time instead. However, this increased confidence requires more data to be analysed.
The opposite of this is a Type Two error, also known as a false negative, which is seeing no statistical difference when a difference does actually exist. The rate at which these false negatives occur has its own equivalent of the confidence level known as power, which is all too often overlooked when sample sizes are calculated.
There is no set value for power but traditionally statistical analyses are designed in such a way that they have an 80% probability of detecting a difference when there really is a difference to be detected. In other words there is a 20% probability that a false negative will occur and the statistical difference will be missed.
This may seem quite high as it means one in five analyses will potentially fail to detect the differences they are intended to identify. It also feels especially high when you consider that confidence levels are normally set to 95% in order to minimise the risk of false positives to just 5%.
So where does this 80% come from? Traditionally statisticians have considered false positives to be around four times more serious than false negatives and therefore requiring more stringent safeguards. So if the risk of a false positive result is 5% then the risk of a false negative need only be 20%. This then gives a reasonable balance of risk between the two while keeping sample sizes at realistic levels.
Statistical power also depends on a number of other factors, of which the two most important are the size of the difference you wish to detect and the variability of the data you are using. The larger the difference is then the easier it is to detect and the smaller the sample size you can get away with. However, the more variable the data is then the more difficult it is to separate statistical difference from random chance and so a greater amount of data is needed.
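To get a feel for how these factors interact, the sketch below estimates power for a simple comparison of two groups. It is an illustration rather than a definitive method: approx_power is a made-up name, the calculation assumes roughly normal data and a two-sided test using the normal approximation, and effect_size means the difference you want to detect divided by the standard deviation of the data.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(effect_size: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided comparison of two group means.

    Hypothetical helper using the normal approximation. It ignores the tiny
    chance of a significant result in the wrong direction.
    """
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    # With n samples in each group, the standard error of the difference
    # between the two means is sd * sqrt(2 / n).
    return norm.cdf(effect_size * sqrt(n_per_group / 2) - z_crit)


for n in (16, 32, 64, 128):
    print(f"n per group = {n:>3}  power = {approx_power(0.5, n):.2f}")
```

With a difference of half a standard deviation, power only reaches the traditional 80% at around 64 samples per group. Halving the difference or doubling the standard deviation both halve effect_size, which pushes the required sample size up fourfold.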
Determining Sample Size Using Power
There are plenty of different ways to determine sample size using power, of which probably the simplest is Lehr’s Rule shown below, which gives a confidence level of 95% and power of 80%...
n = 16 / delta^2
...where n is the number of samples needed in each of the two sets of data and delta is the expected difference we wish to detect between them divided by the overall standard deviation of the data.
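If you are wondering where the 16 comes from, it is a rounded-up version of 2 * (z_alpha + z_power)^2, the constant in the usual normal-approximation formula for comparing two groups. The short sketch below reproduces it, assuming a two-sided test at the 95% confidence level and 80% power. The variable names are just for illustration.

```python
from statistics import NormalDist

norm = NormalDist()

# Critical standard score for a two-sided test at 95% confidence.
z_alpha = norm.inv_cdf(1 - 0.05 / 2)
# Standard score that leaves 80% of the distribution below it (the power).
z_power = norm.inv_cdf(0.80)

# The factor of 2 is there because two groups are being compared,
# each contributing its own sampling variability.
constant = 2 * (z_alpha + z_power) ** 2
print(round(constant, 2))  # about 15.7, which the rule rounds up to 16
```

Changing 0.80 to 0.90 in the sketch shows the price of higher power: the constant rises to around 21.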
An Example: Two Football Teams
Let’s take a look at a somewhat contrived example and compare two teams, one with a win rate of 65% and one with a win rate of 68%, with an overall standard deviation of 5% across the data. Although one team has a higher win rate, how much data do we need before we can be statistically confident they are really better than the opposition?
n = 16 / delta^2
delta = (68 - 65) / 5 = 0.6
delta^2 = 0.36
n = 16 / 0.36
n = 44
So we need around 44 games for each team before a gap of this size can be told apart from luck.
As the margins become smaller the sample sizes grow larger and larger as it becomes more difficult to detect the difference. This is important to realise with sports data as by the time you are comparing teams of similar abilities, sample sizes can become larger than the number of games actually played in a season and champions often turn out to be just the luckiest rather than the best.
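To see how quickly the numbers grow, here is a small Python sketch that applies Lehr's Rule to a range of win-rate gaps. It keeps the standard deviation of 5 percentage points from the contrived example above, and the function name lehr_sample_size is made up for illustration.

```python
def lehr_sample_size(difference: float, std_dev: float) -> float:
    """Samples per group from Lehr's Rule (95% confidence, 80% power)."""
    # delta is the difference expressed in standard deviations.
    delta = difference / std_dev
    return 16 / delta ** 2


STD_DEV = 5.0  # percentage points, taken from the example above

for gap in (5.0, 3.0, 2.0, 1.0, 0.5):
    n = lehr_sample_size(gap, STD_DEV)
    print(f"gap of {gap} points -> about {n:.0f} games per team")
```

Because delta is squared, halving the gap quadruples the data required: a one point gap needs about 400 games per team and a half point gap about 1600.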
Be careful that you are using enough data when analysing teams and are not just being fooled by random chance or variability.
Follow Martin on Twitter: @penaltyblog
And read more of his work on his blog pena.lt/y/