In statistics and in particular statistical theory, unbiased estimation of a standard deviation is the calculation from a statistical sample of an estimated value of the standard deviation (a measure of statistical dispersion) of a population of values, in such a way that the expected value of the calculation equals the true value. Except in some important situations, outlined later, the task has little relevance to applications of statistics since its need is avoided by standard procedures, such as the use of significance tests and confidence intervals, or by using Bayesian analysis.
However, for statistical theory, it provides an exemplar problem in the context of estimation theory which is simple to state but for which results cannot be obtained in closed form. It also provides an example where imposing the requirement for unbiased estimation might be seen as just adding inconvenience, with no real benefit.
Background
In statistics, the standard deviation of a population of numbers is often estimated from a random sample drawn from the population. This is the sample standard deviation, which is defined by
$$ s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}, $$
where $\{x_1, x_2, \ldots, x_n\}$ is the sample (formally, realizations from a random variable X) and $\bar{x}$ is the sample mean.
One way of seeing that this is a biased estimator of the standard deviation of the population is to start from the result that s^{2} is an unbiased estimator for the variance σ^{2} of the underlying population if that variance exists and the sample values are drawn independently with replacement. The square root is a nonlinear function, and only linear functions commute with taking the expectation. Since the square root is a strictly concave function, it follows from Jensen's inequality that
$$ \operatorname{E}[s] = \operatorname{E}\left[\sqrt{s^2}\right] < \sqrt{\operatorname{E}[s^2]} = \sigma, $$
so the square root of the sample variance is, on average, an underestimate of σ.
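A quick way to see this bias numerically is a Monte Carlo sketch. The Python below assumes independent draws from a normal population with σ = 1 (normality is an assumption made only for the illustration; the Jensen argument above does not need it), and the helper name `simulate_bias` and the trial count are illustrative choices. It averages s^{2} and s over many samples of size n = 5.

```python
import random
import statistics


def simulate_bias(n, trials=20_000, sigma=1.0, seed=0):
    """Monte Carlo averages of s^2 and s for i.i.d. normal samples of size n.

    Hypothetical helper for illustration; s^2 uses Bessel's correction (n - 1).
    """
    rng = random.Random(seed)
    var_total = 0.0
    sd_total = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        s2 = statistics.variance(sample)  # sum of squared deviations / (n - 1)
        var_total += s2
        sd_total += s2 ** 0.5             # s is the square root of s^2
    return var_total / trials, sd_total / trials


mean_s2, mean_s = simulate_bias(n=5)
print(f"average s^2 = {mean_s2:.3f}  (sigma^2 = 1)")
print(f"average s   = {mean_s:.3f}  (sigma = 1)")
```

The first average should come out close to 1, while the second falls short of 1; for normal data it should land near 0.94, the value of c_{4}(5) in the table below.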
The use of n − 1 instead of n in the formula for the sample variance is known as Bessel's correction; it corrects the bias in the estimation of the population variance, and some, but not all, of the bias in the estimation of the population standard deviation.
It is not possible to find an estimate of the standard deviation which is unbiased for all population distributions, as the bias depends on the particular distribution. Much of the following relates to estimation assuming a normal distribution.
Bias correction
Results for the normal distribution
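When the random variable is normally distributed, a minor correction exists to eliminate the bias. To derive the correction, note that for normally distributed X, Cochran's theorem implies that $(n-1)s^2/\sigma^2$ has a chi-squared distribution with n − 1 degrees of freedom, and thus its square root, $\sqrt{n-1}\,s/\sigma$, has a chi distribution with n − 1 degrees of freedom. Consequently, calculating the expectation of this last expression and rearranging constants,
$$ \operatorname{E}[s] = c_4(n)\,\sigma, $$
where the correction factor c_{4}(n) is the scale mean of the chi distribution with n − 1 degrees of freedom, $\mu_1/\sqrt{n-1}$, with μ_{1} the mean of that chi distribution. This depends on the sample size n, and is given as follows:^{[1]}
$$ c_4(n) = \sqrt{\frac{2}{n-1}}\,\frac{\Gamma\left(\frac{n}{2}\right)}{\Gamma\left(\frac{n-1}{2}\right)} = 1 - \frac{1}{4n} - \frac{7}{32n^2} - \frac{19}{128n^3} + O(n^{-4}), $$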
where Γ(·) is the gamma function. An unbiased estimator of σ can be obtained by dividing s by c_{4}(n). As n grows large, c_{4}(n) approaches 1, and even for smaller values the correction is minor. The figure shows a plot of c_{4}(n) versus sample size. The table below gives numerical values of c_{4} and algebraic expressions for some values of n; more complete tables may be found in most textbooks^{[citation needed]} on statistical quality control.
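Sample size n | Expression of c_{4}(n) | Numerical value
2 | $\sqrt{\frac{2}{\pi}}$ | 0.7978845608
3 | $\frac{\sqrt{\pi}}{2}$ | 0.8862269255
4 | $2\sqrt{\frac{2}{3\pi}}$ | 0.9213177319
5 | $\frac{3}{4}\sqrt{\frac{\pi}{2}}$ | 0.9399856030
6 | $\frac{8}{3}\sqrt{\frac{2}{5\pi}}$ | 0.9515328619
7 | $\frac{15}{16}\sqrt{\frac{\pi}{3}}$ | 0.9593687891
8 | $\frac{16}{5}\sqrt{\frac{2}{7\pi}}$ | 0.9650304561
9 | $\frac{35}{64}\sqrt{\pi}$ | 0.9693106998
10 | $\frac{128}{105}\sqrt{\frac{2}{\pi}}$ | 0.9726592741
100 | | 0.9974779761
1000 | | 0.9997497811
10000 | | 0.9999749978
2k | $\sqrt{\frac{2}{\pi(2k-1)}}\,\frac{2^{2k-2}\,((k-1)!)^2}{(2k-2)!}$ |
2k+1 | $\sqrt{\frac{\pi}{k}}\,\frac{(2k-1)!}{2^{2k-1}\,((k-1)!)^2}$ |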
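The gamma-function expression for c_{4}(n) is straightforward to evaluate numerically. The following Python sketch (the function names `c4` and `sd_unbiased_normal` are hypothetical) works through `math.lgamma` rather than `math.gamma` so that large n, such as 10000, does not overflow; it should reproduce the numerical column of the table above. Dividing by c_{4}(n) removes the bias only under the normal, independent-sampling assumption stated in the text.

```python
import math
import statistics


def c4(n: int) -> float:
    """Correction factor c4(n) = sqrt(2/(n-1)) * Gamma(n/2) / Gamma((n-1)/2)."""
    if n < 2:
        raise ValueError("c4(n) needs a sample size of at least 2")
    # Work with log-gamma so that large n does not overflow math.gamma.
    log_ratio = math.lgamma(n / 2) - math.lgamma((n - 1) / 2)
    return math.sqrt(2.0 / (n - 1)) * math.exp(log_ratio)


def sd_unbiased_normal(sample):
    """s / c4(n): unbiased for sigma only when the data are i.i.d. normal."""
    return statistics.stdev(sample) / c4(len(sample))


for n in (2, 3, 10, 100, 10_000):
    print(n, f"{c4(n):.10f}")
```

For a two-point sample such as [1.0, 3.0], s = √2 and c_{4}(2) = √(2/π), so the corrected estimate is √π.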
It is important to keep in mind that this correction only produces an unbiased estimator for normally and independently distributed X. When this condition is satisfied, another result about s involving c_{4}(n) is that the standard error of s is^{[2]}^{[3]} $\sigma\sqrt{1 - c_4^2}$, while the standard error of the unbiased estimator is $\sigma\sqrt{c_4^{-2} - 1}$.
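The two standard-error formulas can be checked by simulation. This Python sketch assumes i.i.d. normal samples with σ = 1 and n = 10; `standard_errors` and `simulated_spread` are hypothetical helper names. It compares σ√(1 − c_{4}²) and σ√(c_{4}^{−2} − 1) with the observed spread of s and of s/c_{4}(n) over repeated samples.

```python
import math
import random
import statistics


def c4(n):
    """sqrt(2/(n-1)) * Gamma(n/2) / Gamma((n-1)/2), via log-gamma."""
    return math.sqrt(2.0 / (n - 1)) * math.exp(
        math.lgamma(n / 2) - math.lgamma((n - 1) / 2))


def standard_errors(n, sigma=1.0):
    """Normal-theory standard errors of s and of s / c4(n)."""
    k = c4(n)
    return sigma * math.sqrt(1 - k * k), sigma * math.sqrt(k ** -2 - 1)


def simulated_spread(n, trials=20_000, sigma=1.0, seed=2):
    """Empirical standard deviation of s and of s / c4(n) across repeated samples."""
    rng = random.Random(seed)
    k = c4(n)
    s_values = [statistics.stdev([rng.gauss(0.0, sigma) for _ in range(n)])
                for _ in range(trials)]
    return statistics.pstdev(s_values), statistics.pstdev([s / k for s in s_values])


n = 10
print("formula:  ", standard_errors(n))
print("simulated:", simulated_spread(n))
```

The formula and simulated values should agree to within a few percent. The unbiased estimator pays for removing the bias with a slightly larger standard error, since σ√(c_{4}^{−2} − 1) = σ√(1 − c_{4}²)/c_{4}.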
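Rule of thumb for the normal distribution
If calculation of the function c_{4}(n) appears too difficult, there is a simple rule of thumb^{[4]} to take the estimator
$$ \hat{\sigma} = \sqrt{\frac{1}{n - 1.5}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}. $$
The formula differs from the familiar expression for s only by having n − 1.5 instead of n − 1 in the denominator. This expression is only approximate; in fact
$$ \operatorname{E}[\hat{\sigma}] = \sigma\left(1 + \frac{1}{16n^2} + \frac{3}{16n^3} + O(n^{-4})\right). $$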
The bias is relatively small: for n = 3 it is equal to 2.3%, and for n = 9 the bias is already 0.1%.
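Because σ̂ = s·√((n − 1)/(n − 1.5)), its exact expectation under normality follows from E[s] = c_{4}(n)σ, so the size of the bias can be computed directly rather than read from the series. A Python sketch with hypothetical helper names, comparing the exact relative bias with the two leading series terms:

```python
import math


def c4(n):
    """sqrt(2/(n-1)) * Gamma(n/2) / Gamma((n-1)/2), via log-gamma."""
    return math.sqrt(2.0 / (n - 1)) * math.exp(
        math.lgamma(n / 2) - math.lgamma((n - 1) / 2))


def rule_of_thumb_bias(n):
    """Exact relative bias E[sigma_hat]/sigma - 1 of sqrt(SS / (n - 1.5)), normal data.

    sigma_hat = s * sqrt((n - 1) / (n - 1.5)) and E[s] = c4(n) * sigma.
    """
    return c4(n) * math.sqrt((n - 1) / (n - 1.5)) - 1.0


def series_bias(n):
    """Leading terms 1/(16 n^2) + 3/(16 n^3) of the asymptotic expansion."""
    return 1 / (16 * n ** 2) + 3 / (16 * n ** 3)


for n in (3, 5, 9, 20):
    print(n, f"exact {rule_of_thumb_bias(n):.4%}", f"series {series_bias(n):.4%}")
```

For n = 3 the exact relative bias is about 2.3% and for n = 9 about 0.11%; the truncated series is noticeably low at n = 3 but much closer by n = 20, as expected of an asymptotic expansion.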
Other distributions
In cases where statistically independent data are modelled by a parametric family of distributions other than the normal distribution, the population standard deviation will, if it exists, be a function of the parameters of the model. One general approach to estimation would be maximum likelihood. Alternatively, it may be possible to use the Rao–Blackwell theorem as a route to finding a good estimate of the standard deviation. In neither case would the estimates obtained usually be unbiased. Notionally, theoretical adjustments might be obtainable to lead to unbiased estimates but, unlike those for the normal distribution, these would typically depend on the estimated parameters.
If the requirement is simply to reduce the bias of an estimated standard deviation, rather than to eliminate it entirely, then two practical approaches are available, both within the context of resampling. These are jackknifing and bootstrapping. Both can be applied either to parametrically based estimates of the standard deviation or to the sample standard deviation.
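As an illustration of the resampling route, here is a minimal Python sketch of the standard jackknife bias correction, n·θ̂ − (n − 1)·(mean of the leave-one-out estimates), applied to the sample standard deviation. The name `jackknife_sd` is hypothetical and no particular library implementation is implied. Bootstrapping would instead replace the n leave-one-out samples with resamples drawn with replacement.

```python
import statistics


def jackknife_sd(sample):
    """Jackknife bias-corrected estimate of the standard deviation.

    Applies n*theta - (n-1)*mean(theta_(i)) with theta = s (the n - 1 form).
    """
    n = len(sample)
    if n < 3:
        raise ValueError("need at least 3 observations (each leave-one-out sample needs 2)")
    theta = statistics.stdev(sample)
    # Recompute s with each observation left out in turn.
    leave_one_out = [statistics.stdev(sample[:i] + sample[i + 1:]) for i in range(n)]
    return n * theta - (n - 1) * statistics.fmean(leave_one_out)


data = [1.0, 2.0, 3.0, 4.0, 5.0]
print(statistics.stdev(data), jackknife_sd(data))
```

For the data 1, 2, 3, 4, 5 it raises s ≈ 1.581 to about 1.647. The jackknife removes only the leading O(1/n) term of the bias, so the result is a reduced-bias rather than an unbiased estimate.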
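For non-normal distributions an approximate (up to O(n^{−1}) terms) formula for the unbiased estimator of the standard deviation is
$$ \hat{\sigma} = \sqrt{\frac{1}{n - 1.5 - \tfrac{1}{4}\gamma_2}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}, $$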
where γ_{2} denotes the population excess kurtosis. The excess kurtosis may be either known beforehand for certain distributions, or estimated from the data.
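A Python sketch of the kurtosis-adjusted estimator above. When γ_{2} is not known, the code assumes the simple moment estimate m_{4}/m_{2}² − 3 as a plug-in; this is only one of several possible kurtosis estimators and is itself biased in small samples. Function names are hypothetical.

```python
import math
import statistics


def excess_kurtosis_plugin(sample):
    """Simple moment estimate m4 / m2^2 - 3 (biased; one of several possible estimators)."""
    mean = statistics.fmean(sample)
    n = len(sample)
    m2 = sum((x - mean) ** 2 for x in sample) / n
    m4 = sum((x - mean) ** 4 for x in sample) / n
    return m4 / m2 ** 2 - 3.0


def sd_kurtosis_adjusted(sample, excess_kurtosis=None):
    """Approximate bias-corrected SD: sqrt(SS / (n - 1.5 - gamma2 / 4))."""
    n = len(sample)
    if excess_kurtosis is None:
        excess_kurtosis = excess_kurtosis_plugin(sample)  # assumed plug-in estimate
    mean = statistics.fmean(sample)
    ss = sum((x - mean) ** 2 for x in sample)  # sum of squared deviations
    denom = n - 1.5 - excess_kurtosis / 4
    if denom <= 0:
        raise ValueError("denominator n - 1.5 - gamma2/4 must be positive")
    return math.sqrt(ss / denom)


data = [1.0, 2.0, 3.0, 4.0, 5.0]
print(sd_kurtosis_adjusted(data, 0.0), sd_kurtosis_adjusted(data))
```

Passing `excess_kurtosis=0.0` recovers the normal-case rule of thumb, √(SS/(n − 1.5)). The guard on the denominator matters because a large estimated kurtosis in a small sample can push n − 1.5 − γ_{2}/4 to zero or below.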
Effect of autocorrelation (serial correlation)
The material above, to stress the point again, applies only to independent data. However, real-world data often does not meet this requirement; it is autocorrelated (also known as serial correlation). As one example, the successive readings of a measurement instrument that incorporates some form of “smoothing” (more correctly, low-pass filtering) process will be autocorrelated, since any particular value is calculated from some combination of the earlier and later readings.
Estimates of the variance, and standard deviation, of autocorrelated data will be biased. The expected value of the sample variance is^{[5]}
$$ \operatorname{E}[s^2] = \sigma^2\left[1 - \frac{2}{n-1}\sum_{k=1}^{n-1}\left(1 - \frac{k}{n}\right)\rho_k\right], $$
where n is the sample size (number of measurements) and ρ_{k} is the autocorrelation function (ACF) of the data. (Note that the expression in the brackets is simply one minus the average expected autocorrelation for the readings.) If the ACF consists of positive values then the estimate of the variance (and its square root, the standard deviation) will be biased low. That is, the actual variability of the data will be greater than that indicated by an uncorrected variance or standard deviation calculation. It is essential to recognize that, if this expression is to be used to correct for the bias, by dividing the estimate by the quantity in brackets above, then the ACF must be known analytically, not via estimation from the data. This is because the estimated ACF will itself be biased.^{[6]}
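The bracketed factor is easy to evaluate once the ACF is known analytically, as the paragraph above requires. In this Python sketch, `acf` is an assumed caller-supplied function returning ρ_{k}, and the helper names are hypothetical.

```python
from typing import Callable


def variance_bias_factor(n: int, acf: Callable[[int], float]) -> float:
    """Bracket in E[s^2] = sigma^2 * [1 - 2/(n-1) * sum_{k=1}^{n-1} (1 - k/n) rho_k].

    `acf(k)` must return the *known* autocorrelation at lag k.
    """
    total = sum((1 - k / n) * acf(k) for k in range(1, n))
    return 1 - 2 / (n - 1) * total


def corrected_variance(s2: float, n: int, acf: Callable[[int], float]) -> float:
    """Divide the observed sample variance by the bracketed factor (known ACF assumed)."""
    return s2 / variance_bias_factor(n, acf)


# Independent data: every rho_k = 0, so the factor is exactly 1.
print(variance_bias_factor(10, lambda k: 0.0))
# Positive autocorrelation: factor < 1, i.e. s^2 is biased low on average.
print(variance_bias_factor(10, lambda k: 0.9 ** k))
```

With ρ_{k} = 0.9^{k} and n = 10 the factor is about 0.30, so before correction s^{2} would on average capture only about 30% of σ^{2}.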
Example of bias in standard deviation
To illustrate the magnitude of the bias in the standard deviation, consider a dataset that consists of sequential readings from an instrument that uses a specific digital filter whose ACF is known to be given by
$$ \rho_k = (1 - \alpha)^k, \qquad k = 0, 1, 2, \ldots, $$
where α is the parameter of the filter, and it takes values from zero to unity. Thus the ACF is positive and geometrically decreasing.
The figure shows the ratio of the estimated standard deviation to its known value (which can be calculated analytically for this digital filter), for several settings of α as a function of sample size n. Changing α alters the variance reduction ratio of the filter, which is known to be
$$ \mathrm{VRR} = \frac{\alpha}{2 - \alpha}, $$
so that smaller values of α result in more variance reduction, or “smoothing.” The bias is indicated by values on the vertical axis different from unity; that is, if there were no bias, the ratio of the estimated to known standard deviation would be unity. Clearly, for modest sample sizes there can be significant bias (a factor of two, or more).
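The text does not name the filter; as an assumption for illustration, the Python sketch below uses a first-order exponentially weighted moving average, y_t = α x_t + (1 − α) y_{t−1}, driven by unit-variance white noise. That filter has exactly the ACF ρ_{k} = (1 − α)^{k} and the variance ratio α/(2 − α) quoted above, so the known standard deviation is available in closed form. `ewma_sd_ratio` is a hypothetical name.

```python
import math
import random
import statistics


def ewma_sd_ratio(alpha, n, trials=5_000, seed=1):
    """Average of s / sigma for n consecutive outputs of an EWMA filter.

    Assumed filter: y_t = alpha * x_t + (1 - alpha) * y_{t-1}, with x_t i.i.d. N(0, 1).
    Its output has ACF rho_k = (1 - alpha)**k and variance alpha / (2 - alpha).
    """
    rng = random.Random(seed)
    sigma_y = math.sqrt(alpha / (2 - alpha))  # known output standard deviation
    total = 0.0
    for _ in range(trials):
        y = rng.gauss(0.0, sigma_y)  # start in the stationary distribution
        readings = []
        for _ in range(n):
            y = alpha * rng.gauss(0.0, 1.0) + (1 - alpha) * y
            readings.append(y)
        total += statistics.stdev(readings) / sigma_y
    return total / trials


for alpha in (0.1, 0.5, 1.0):
    print(alpha, round(ewma_sd_ratio(alpha, n=10), 3))
```

With α = 1 the filter passes its input through unchanged, so the ratio should sit near c_{4}(10) ≈ 0.973; with α = 0.1 it drops well below 0.6, consistent with the factor-of-two bias described above.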
Variance of the mean
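It is often of interest to estimate the variance or standard deviation of an estimated mean rather than the variance of a population. When the data are autocorrelated, this has a direct effect on the theoretical variance of the sample mean, which is^{[7]}
$$ \operatorname{Var}[\bar{x}] = \frac{\sigma^2}{n}\left[1 + 2\sum_{k=1}^{n-1}\left(1 - \frac{k}{n}\right)\rho_k\right]. $$
The variance of the sample mean can then be estimated by substituting an estimate of σ^{2}. One such estimate can be obtained from the equation for E[s^{2}] given above. First define the following constants, assuming, again, a known ACF:
$$ \gamma_1 \equiv 1 - \frac{2}{n-1}\sum_{k=1}^{n-1}\left(1 - \frac{k}{n}\right)\rho_k, \qquad \gamma_2 \equiv 1 + 2\sum_{k=1}^{n-1}\left(1 - \frac{k}{n}\right)\rho_k, $$
so that
$$ \operatorname{E}[s^2] = \sigma^2\gamma_1 \quad\Longrightarrow\quad \operatorname{E}\left[\frac{s^2}{\gamma_1}\right] = \sigma^2. $$
This says that the expected value of the quantity obtained by dividing the observed sample variance by the correction factor γ_{1} gives an unbiased estimate of the variance. Similarly, rewriting the expression above for the variance of the mean,
$$ \operatorname{Var}[\bar{x}] = \frac{\sigma^2}{n}\gamma_2, $$
and substituting the estimate for σ^{2} gives^{[8]}
$$ \operatorname{Var}[\bar{x}] = \operatorname{E}\left[\frac{s^2}{\gamma_1}\cdot\frac{\gamma_2}{n}\right] = \operatorname{E}\left[\frac{s^2}{n}\left(\frac{n-1}{\frac{n}{\gamma_2} - 1}\right)\right], $$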
which is an unbiased estimator of the variance of the mean in terms of the observed sample variance and known quantities. Note that, if the autocorrelations are identically zero, this expression reduces to the well-known result for the variance of the mean for independent data. The effect of the expectation operator in these expressions is that the equality holds in the mean (i.e., on average).
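A Python sketch of γ_{1}, γ_{2}, and the variance-of-the-mean estimator, again assuming a known ACF passed in as a function; the names are hypothetical.

```python
def gamma_constants(n, acf):
    """Return (gamma1, gamma2) for a known ACF, where acf(k) gives rho_k."""
    weighted = sum((1 - k / n) * acf(k) for k in range(1, n))  # sum_{k=1}^{n-1} (1 - k/n) rho_k
    gamma1 = 1 - 2 / (n - 1) * weighted
    gamma2 = 1 + 2 * weighted
    return gamma1, gamma2


def variance_of_mean_estimate(s2, n, acf):
    """Estimate of Var[x_bar], unbiased in the mean: (s^2/n) * (n - 1) / (n/gamma2 - 1)."""
    _, gamma2 = gamma_constants(n, acf)
    return (s2 / n) * (n - 1) / (n / gamma2 - 1)


# Independent data (zero ACF): reduces to s^2 / n.
print(variance_of_mean_estimate(4.0, 8, lambda k: 0.0))
# Positively autocorrelated data inflate the estimate.
print(variance_of_mean_estimate(4.0, 8, lambda k: 0.5 ** k))
```

With a zero ACF, γ_{2} = 1 and the estimate reduces to s^{2}/n. The denominator n/γ_{2} − 1 stays positive unless the readings are perfectly correlated (ρ_{k} = 1 at every lag).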
Estimating the standard deviation of the population
Having the expressions above involving the variance of the population, and of an estimate of the mean of that population, it would seem logical to simply take the square root of these expressions to obtain unbiased estimates of the respective standard deviations. However, it is the case that, since expectations are integrals,
$$ \operatorname{E}[s] \ne \sqrt{\operatorname{E}[s^2]} = \sigma\sqrt{\gamma_1}. $$
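Instead, assume a function θ exists such that an unbiased estimator of the standard deviation can be written
$$ \sigma = \frac{\operatorname{E}[s]}{\theta\sqrt{\gamma_1}} \quad\Longrightarrow\quad \hat{\sigma} = \frac{s}{\theta\sqrt{\gamma_1}}, $$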
and θ depends on the sample size n and the ACF. In the case of NID (normally and independently distributed) data, the radicand is unity and θ is just the c_{4} function given in the first section above. As with c_{4}, θ approaches unity as the sample size increases (as does γ_{1}).
It can be demonstrated via simulation modeling that ignoring θ (that is, taking it to be unity) and using
$$ \operatorname{E}[s] \approx \sigma\sqrt{\gamma_1} \quad\Longrightarrow\quad \hat{\sigma} \approx \frac{s}{\sqrt{\gamma_1}} $$
removes all but a few percent of the bias caused by autocorrelation, making this a reduced-bias estimator, rather than an unbiased estimator. In practical measurement situations, this reduction in bias can be significant, and useful, even if some relatively small bias remains. The figure above, showing an example of the bias in the standard deviation vs. sample size, is based on this approximation; the actual bias would be somewhat larger than indicated in those graphs since the transformation bias θ is not included there.
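The reduced-bias estimator s/√γ_{1} can be tried on simulated readings. The Python sketch below assumes, for illustration only, a first-order exponentially weighted moving average filter, y_t = α x_t + (1 − α) y_{t−1}, chosen because it has the ACF (1 − α)^{k} and a known output standard deviation √(α/(2 − α)). The names `gamma1`, `reduced_bias_sd`, and `compare` are hypothetical. It reports the average of s/σ and of the reduced-bias estimate divided by σ.

```python
import math
import random
import statistics


def gamma1(n, acf):
    """1 - 2/(n-1) * sum_{k=1}^{n-1} (1 - k/n) * acf(k), for a known ACF."""
    return 1 - 2 / (n - 1) * sum((1 - k / n) * acf(k) for k in range(1, n))


def reduced_bias_sd(readings, acf):
    """s / sqrt(gamma1): removes the autocorrelation bias but ignores theta."""
    return statistics.stdev(readings) / math.sqrt(gamma1(len(readings), acf))


def compare(alpha, n, trials=5_000, seed=3):
    """Average s/sigma and reduced_bias_sd/sigma for simulated EWMA readings (assumed filter)."""
    rng = random.Random(seed)
    sigma_y = math.sqrt(alpha / (2 - alpha))  # known SD of the filter output

    def acf(k):
        return (1 - alpha) ** k

    raw_total = reduced_total = 0.0
    for _ in range(trials):
        y = rng.gauss(0.0, sigma_y)  # start in the stationary distribution
        readings = []
        for _ in range(n):
            y = alpha * rng.gauss(0.0, 1.0) + (1 - alpha) * y
            readings.append(y)
        raw_total += statistics.stdev(readings) / sigma_y
        reduced_total += reduced_bias_sd(readings, acf) / sigma_y
    return raw_total / trials, reduced_total / trials


raw, reduced = compare(alpha=0.5, n=20)
print(f"mean s/sigma = {raw:.3f}, mean reduced-bias/sigma = {reduced:.3f}")
```

Because γ_{1} depends only on n and the ACF, the correction rescales every estimate by the same constant; whatever shortfall remains in the second number is the transformation bias θ that this estimator deliberately ignores.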
Estimating the standard deviation of the mean
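The unbiased variance of the mean in terms of the population variance and the ACF is given by
$$ \operatorname{Var}[\bar{x}] = \frac{\sigma^2}{n}\gamma_2, $$
and since there are no expected values here, in this case the square root can be taken, so that
$$ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}\sqrt{\gamma_2}. $$
Using the unbiased estimate expression above for σ, an estimate of the standard deviation of the mean will then be
$$ \hat{\sigma}_{\bar{x}} = \frac{s}{\theta\sqrt{n}}\,\frac{\sqrt{\gamma_2}}{\sqrt{\gamma_1}}. $$
If the data are NID, so that the ACF vanishes, this reduces to
$$ \hat{\sigma}_{\bar{x}} = \frac{s}{c_4\sqrt{n}}. $$
In the presence of a nonzero ACF, ignoring the function θ as before leads to the reduced-bias estimator
$$ \hat{\sigma}_{\bar{x}} \approx \frac{s}{\sqrt{n}}\sqrt{\frac{\gamma_2}{\gamma_1}} = \frac{s}{\sqrt{n}}\sqrt{\frac{n-1}{\frac{n}{\gamma_2} - 1}}, $$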
which again can be demonstrated to remove a useful majority of the bias.
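A Python sketch of the reduced-bias standard error of the mean, assuming a known ACF supplied as a function; the readings and the name `sd_of_mean_reduced_bias` are hypothetical. Note that with a zero ACF this θ-free version returns s/√n, not the exact NID result s/(c_{4}√n).

```python
import math
import statistics


def sd_of_mean_reduced_bias(readings, acf):
    """(s / sqrt(n)) * sqrt((n - 1) / (n / gamma2 - 1)), ignoring theta.

    acf(k) must give the known autocorrelation at lag k; with acf = 0 this is s / sqrt(n).
    """
    n = len(readings)
    gamma2 = 1 + 2 * sum((1 - k / n) * acf(k) for k in range(1, n))
    s = statistics.stdev(readings)
    return s / math.sqrt(n) * math.sqrt((n - 1) / (n / gamma2 - 1))


readings = [4.0, 3.0, 5.0, 7.0, 2.0, 9.0, 11.0, 7.0]  # hypothetical readings
print(sd_of_mean_reduced_bias(readings, lambda k: 0.0))       # NID case: s / sqrt(n)
print(sd_of_mean_reduced_bias(readings, lambda k: 0.5 ** k))  # assumed positive ACF: larger
```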
References
 ^ Ben W. Bolch, "More on unbiased estimation of the standard deviation", The American Statistician, 22(3), p. 27 (1968)
 ^ Duncan, A. J., Quality Control and Industrial Statistics 4th Ed., Irwin (1974) ISBN 0256015589, p.139
 ^ N. L. Johnson, S. Kotz, and N. Balakrishnan, Continuous Univariate Distributions, Volume 1, 2nd edition, Wiley and Sons, 1994. ISBN 0471584959. Chapter 13, Section 8.2
 ^ Richard M. Brugger, "A Note on Unbiased Estimation of the Standard Deviation", The American Statistician (23) 4 p. 32 (1969)
 ^ Law and Kelton, Simulation Modeling and Analysis, 2nd Ed. McGraw-Hill (1991), p.284, ISBN 0070366985. This expression can be derived from its original source in Anderson, The Statistical Analysis of Time Series, Wiley (1971), ISBN 0471047457, p.448, Equation 51.
 ^ Law and Kelton, p.286. This bias is quantified in Anderson, p.448, Equations 52–54.
 ^ Law and Kelton, p.285. This equation can be derived from Theorem 8.2.3 of Anderson. It also appears in Box, Jenkins, Reinsel, Time Series Analysis: Forecasting and Control, 4th Ed. Wiley (2008), ISBN 9780470272848, p.31.
 ^ Law and Kelton, p.285
 Douglas C. Montgomery and George C. Runger, Applied Statistics and Probability for Engineers, 3rd edition, Wiley and sons, 2003. (see Sections 7–2.2 and 16–5)
External links
 A Java interactive graphic showing the Helmert PDF from which the bias correction factors are derived.
 Monte Carlo simulation demo for unbiased estimation of standard deviation.
 http://www.itl.nist.gov/div898/handbook/pmc/section3/pmc32.htm What are Variables Control Charts?
This article incorporates public domain material from the National Institute of Standards and Technology website https://www.nist.gov.