Sampling error

In statistics, sampling errors are incurred when the statistical characteristics of a population are estimated from a subset, or sample, of that population. Since the sample does not include all members of the population, statistics of the sample (often known as estimators), such as means and quartiles, generally differ from the statistics of the entire population (known as parameters). The difference between the sample statistic and population parameter is considered the sampling error.^[1] For example, if one measures the height of a thousand individuals from a country of one million people, the average height of the thousand is typically not the same as the average height of all one million.
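A minimal simulation of the example, assuming a hypothetical population of one million heights drawn from a normal distribution with mean 170 cm and standard deviation 10 cm (invented values for illustration only), shows that the mean of a 1,000-person simple random sample differs from the population mean by a small, nonzero amount:

```python
import random
import statistics

def sampling_error_of_mean(population, n, rng):
    """Return (sample mean - population mean) for one simple random sample of size n."""
    sample = rng.sample(population, n)  # draw n members without replacement
    return statistics.fmean(sample) - statistics.fmean(population)

rng = random.Random(42)  # fixed seed so the run is reproducible
# Hypothetical population: one million heights (cm) from an assumed normal distribution.
population = [rng.gauss(170.0, 10.0) for _ in range(1_000_000)]

error = sampling_error_of_mean(population, 1_000, rng)
print(f"sampling error of the mean: {error:+.3f} cm")
```

Rerunning with a different seed gives a different error each time; that sample-to-sample variation is what the rest of the article is concerned with estimating and controlling.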

Since sampling is almost always done to estimate population parameters that are unknown, exact measurement of the sampling error is by definition impossible; however, it can often be estimated, either by general methods such as bootstrapping, or by specific methods incorporating some assumptions (or guesses) regarding the true population distribution and its parameters.
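As an example of the second kind of method: if the observations are assumed to be independent draws from a population with finite variance, the standard error of the sample mean can be estimated from a single sample as s/√n, where s is the sample standard deviation and n the sample size. A minimal sketch under that assumption, using hypothetical height data:

```python
import math
import random
import statistics

def standard_error_of_mean(sample):
    """Model-based estimate of the standard error of the mean, s / sqrt(n).

    Assumes independent, identically distributed observations (or a
    population much larger than the sample).
    """
    n = len(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    return s / math.sqrt(n)

rng = random.Random(0)
sample = [rng.gauss(170.0, 10.0) for _ in range(1_000)]  # hypothetical heights in cm
print(f"estimated standard error: {standard_error_of_mean(sample):.3f} cm")
```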

YouTube Encyclopedic

Transcription

Taking probability samples of large populations is considered common practice in the social sciences. Although a random selection process is generally the best way of getting a representative sample from a population, it doesn’t guarantee a perfect sample. We must acknowledge that even the best random samples will always be a little different from the true population. We call that “sampling error”. It occurs when we take a random sample rather than observe every subject in a population. Let’s pretend that you are conducting a telephone survey on how much people spend during their summer vacation. You call 1,000 randomly selected U.S. households and, just by dumb luck, after 100 phone calls you happen to get hold of Mark Zuckerberg, founder of Facebook, and he agrees to take your survey. Unlikely, but possible. Let’s also pretend that after calling 600 people you also get hold of Oprah Winfrey. Again, unlikely, but the point I’m trying to make is that if, just by random chance or luck, we get slightly too many rich people in our sample, or too few, our sample will look a little different from the true population. That difference is called sampling error. When collecting a sample, we can’t avoid sampling error, but we can estimate its size, and there are ways of reducing it. The margin of error that you commonly see with survey results is an estimate of sampling error. Because it is just an estimate, there is a small chance, usually 5% or less, that the actual sampling error is larger than the margin of error stated in a report. We can reduce sampling error by increasing the sample size, that is, by selecting more subjects to observe. As your sample size increases, your sampling error decreases. But increasing your sample size also increases costs, both in time and in money. And after about 1,000 cases you start to get less bang for your buck.
As you can see in this chart, after 1,000 cases, even if you more than double your sample size to 2,500 subjects, you only reduce your margin of error by about 1 percentage point. You can also reduce sampling error with a good sampling design. For example, if your overall population has distinct subpopulations, then sampling each subpopulation independently (stratified sampling) may reduce sampling error. But these techniques can only reduce sampling error so far. The only way to remove sampling error completely would be to observe every element in a population, which is impractical if not, in some cases, impossible. We simply must acknowledge that survey samples are imperfect, but they are generally a very efficient and accurate way of studying a large, complex population.
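The transcript's figures are consistent with the usual 95% margin of error for a sample proportion, z·√(p(1−p)/n) with z ≈ 1.96 and the worst case p = 0.5. A minimal sketch, assuming simple random sampling from a population much larger than the sample:

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a sample proportion.

    Assumes simple random sampling from a population much larger than n;
    p = 0.5 gives the largest (most conservative) margin.
    """
    return z * math.sqrt(p * (1 - p) / n)

for n in (100, 500, 1000, 2500):
    print(f"n = {n:>4}: ±{100 * margin_of_error(n):.1f} percentage points")
```

Going from 1,000 to 2,500 respondents shrinks the margin from about 3.1 to about 2.0 percentage points, because the margin falls only with the square root of the sample size.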

Description

The sampling error is the error caused by observing a sample instead of the whole population.^[1] The sampling error is the difference between a sample statistic used to estimate a population parameter and the actual but unknown value of the parameter.^[2]

Effective Sampling

In statistics, a truly random sample means selecting individuals from a population with equal probability; in other words, picking individuals from a group without bias. Failing to do this correctly can result in sampling bias, which can dramatically increase the sampling error in a systematic way. For example, attempting to measure the average height of the entire human population of the Earth, but measuring a sample only from one country, could result in a large over- or under-estimation. In reality, obtaining an unbiased sample can be difficult, as many factors (in this example, country, age, gender, and so on) may strongly bias the estimator, and it must be ensured that none of these factors plays a part in the selection process.
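A small simulation, using two equally sized hypothetical countries with invented mean heights of 165 cm and 175 cm, contrasts a simple random sample of the whole population with a sample drawn from only one country:

```python
import random
import statistics

rng = random.Random(1)

# Hypothetical world made of two equally sized countries whose mean heights differ.
country_a = [rng.gauss(165.0, 8.0) for _ in range(50_000)]
country_b = [rng.gauss(175.0, 8.0) for _ in range(50_000)]
world = country_a + country_b
true_mean = statistics.fmean(world)

# Unbiased design: every person in the world has the same chance of selection.
random_sample = rng.sample(world, 1_000)
# Biased design: only people from one country can be selected.
biased_sample = rng.sample(country_a, 1_000)

random_error = statistics.fmean(random_sample) - true_mean
biased_error = statistics.fmean(biased_sample) - true_mean
print(f"simple random sample error: {random_error:+.2f} cm")
print(f"one-country sample error:   {biased_error:+.2f} cm")
```

The one-country estimate is off by roughly 5 cm, and enlarging that sample would not help: it converges to the mean of the wrong population.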

Even in a perfectly unbiased sample, sampling error will still exist because of the remaining random (statistical) component; consider that measuring only two or three individuals and taking the average would produce a wildly varying result each time. The likely size of the sampling error can generally be reduced by taking a larger sample.^[3]
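The effect of sample size can be seen by drawing many samples of a given size from a hypothetical population (normal, mean 170 cm, standard deviation 10 cm; assumed values) and measuring how much the sample means vary from one sample to the next:

```python
import random
import statistics

def spread_of_sample_means(population, n, trials, rng):
    """Standard deviation of the sample mean across repeated samples of size n."""
    means = [statistics.fmean(rng.sample(population, n)) for _ in range(trials)]
    return statistics.stdev(means)

rng = random.Random(7)
population = [rng.gauss(170.0, 10.0) for _ in range(100_000)]  # hypothetical heights, cm

for n in (3, 30, 300):
    spread = spread_of_sample_means(population, n, trials=2_000, rng=rng)
    print(f"n = {n:>3}: sample means spread by about {spread:.2f} cm")
```

For this population the spread comes out close to 10/√n cm, so a 100-fold larger sample narrows it about 10-fold.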

Sample Size Determination

In practice, the cost of increasing a sample size may be prohibitive. Since the sampling error can often be estimated beforehand as a function of the sample size, various methods of sample size determination are used to weigh the predicted accuracy of an estimator against the predicted cost of taking a larger sample.
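For the common case of estimating a proportion, one standard method inverts the margin-of-error formula E = z·√(p(1−p)/n) to get n = z²p(1−p)/E² for a target margin E. A minimal sketch, assuming simple random sampling from a large population, 95% confidence (z = 1.96), the conservative p = 0.5, and a purely hypothetical fixed cost per interview:

```python
import math

def required_sample_size(margin, p=0.5, z=1.96):
    """Smallest n whose approximate 95% margin of error for a proportion is <= margin.

    Inverts margin = z * sqrt(p * (1 - p) / n); assumes simple random sampling
    from a large population.
    """
    return math.ceil(z**2 * p * (1 - p) / margin**2)

cost_per_interview = 25.0  # hypothetical cost in dollars, for illustration only
for margin in (0.05, 0.03, 0.015):
    n = required_sample_size(margin)
    print(f"±{margin:.1%}: n = {n:>5}, cost ≈ ${n * cost_per_interview:,.0f}")
```

It gives 385 respondents for ±5% and 1,068 for ±3%; halving the target margin roughly quadruples the required sample size, and with it the cost.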

Bootstrapping and Standard Error

As discussed, a sample statistic, such as an average or percentage, will generally be subject to sample-to-sample variation.^[1] By comparing many samples, or splitting a larger sample up into smaller ones (potentially with overlap), the spread of the resulting sample statistics can be used to estimate the standard error of the statistic.
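One widely used concrete form of this idea is the nonparametric bootstrap, which draws many resamples of the same size from the observed sample with replacement and takes the standard deviation of the statistic across those resamples as its standard error. A minimal sketch with hypothetical height data:

```python
import random
import statistics

def bootstrap_standard_error(sample, statistic, n_resamples, rng):
    """Bootstrap estimate of the standard error of `statistic`.

    Each resample has the same size as the original sample and is drawn
    from it with replacement; the standard deviation of the statistic
    over the resamples estimates its standard error.
    """
    n = len(sample)
    replicates = [statistic(rng.choices(sample, k=n)) for _ in range(n_resamples)]
    return statistics.stdev(replicates)

rng = random.Random(3)
sample = [rng.gauss(170.0, 10.0) for _ in range(500)]  # hypothetical heights, cm

se_mean = bootstrap_standard_error(sample, statistics.fmean, 2_000, rng)
se_median = bootstrap_standard_error(sample, statistics.median, 2_000, rng)
print(f"bootstrap SE of the mean:   {se_mean:.3f} cm")
print(f"bootstrap SE of the median: {se_median:.3f} cm")
```

Because it needs no formula, the same function works for statistics such as the median, whose standard error is awkward to derive analytically. The bootstrap estimate itself carries some Monte Carlo noise, which shrinks as the number of resamples grows.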

In Genetics

The term "sampling error" has also been used in a related but fundamentally different sense in the field of genetics, for example in the bottleneck effect or founder effect, when natural disasters or migrations dramatically reduce the size of a population, resulting in a smaller population that may or may not fairly represent the original one. This is a source of genetic drift, as certain alleles become more or less common, and has been referred to as "sampling error",^[4] despite not being an "error" in the statistical sense.
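The genetic sense can be illustrated by treating the founding of a new population as random sampling of allele copies. A minimal sketch, assuming diploid founders drawn at random from a large source population in which a hypothetical allele has frequency 0.3, and ignoring selection, mutation, and non-random mating:

```python
import random

def founder_allele_frequency(source_freq, n_founders, rng):
    """Allele frequency among a random group of diploid founders.

    Each founder carries two allele copies, each independently the allele of
    interest with probability source_freq (an assumed simplification).
    """
    copies = 2 * n_founders
    hits = sum(rng.random() < source_freq for _ in range(copies))
    return hits / copies

rng = random.Random(5)
for n_founders in (5, 50, 5_000):
    freqs = [founder_allele_frequency(0.3, n_founders, rng) for _ in range(5)]
    print(n_founders, [round(f, 2) for f in freqs])
```

With 5 founders the allele frequency in the new population can land far from 0.3, while with 5,000 it stays close; the smaller the founding group, the larger the chance shift in allele frequencies.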

References

^ Särndal, C.-E.; Swensson, B.; Wretman, J. (1992). Model Assisted Survey Sampling. Springer-Verlag. ISBN 0-387-40620-4.
^ Burns, N.; Grove, S. K. (2009). The Practice of Nursing Research: Appraisal, Synthesis, and Generation of Evidence (6th ed.). St. Louis, MO: Saunders Elsevier. ISBN 978-1-4557-0736-2.
^ Scheuren, Fritz (2005). "What is a Margin of Error?". What is a Survey? (PDF). Washington, D.C.: American Statistical Association. Archived from the original (PDF) on 2013-03-12. Retrieved 2008-01-08.
^ Campbell, Neil A.; Reece, Jane B. (2002). Biology. Benjamin Cummings. pp. 450–451. ISBN 0-536-68045-0.

This page was last edited on 20 October 2023, at 18:44