
# Sampling distribution

In statistics, a sampling distribution or finite-sample distribution is the probability distribution of a given random-sample-based statistic. If an arbitrarily large number of samples, each involving multiple observations (data points), were separately used in order to compute one value of a statistic (such as, for example, the sample mean or sample variance) for each sample, then the sampling distribution is the probability distribution of the values that the statistic takes on. In many contexts, only one sample is observed, but the sampling distribution can be found theoretically.

Sampling distributions are important in statistics because they provide a major simplification en route to statistical inference. More specifically, they allow analytical considerations to be based on the probability distribution of a statistic, rather than on the joint probability distribution of all the individual sample values.
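The idea of "repeatedly computing a statistic over many samples" can be illustrated with a short simulation, sketched here using Python's standard library; the Uniform(0, 10) population, the sample size of 30, and the number of samples are illustrative choices, not part of the definition.

```python
import random
import statistics

random.seed(0)

# An illustrative population: Uniform(0, 10), with population mean 5.0.
def sample_mean(n):
    return statistics.mean(random.uniform(0, 10) for _ in range(n))

# Compute the statistic (the sample mean) for many independent samples;
# the resulting collection approximates the sampling distribution.
means = [sample_mean(30) for _ in range(5000)]

center = statistics.mean(means)   # close to the population mean, 5.0
spread = statistics.stdev(means)  # much smaller than the population spread
```

The spread of `means` is far narrower than the spread of individual observations, which is exactly the simplification the sampling distribution provides.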

• The Sampling Distribution of the Sample Mean (fast version)
• Sampling distribution of the sample mean | Probability and Statistics | Khan Academy
• Finding Probability of a Sampling Distribution of Means Example 1
• Sampling distribution example problem | Probability and Statistics | Khan Academy
• Statistics: Using Excel to Calculate Sample Means (Sampling Distribution)

#### Transcription

Let's take a look at the sampling distribution of the sample mean, X bar. I'm going to assume that we are sampling from an infinite population, or that we are sampling only a small fraction of a finite population. This is true the vast majority of the time, but the underlying mathematics changes a little bit if we are sampling a larger fraction of a finite population. Suppose we are sampling from a population with mean mu and standard deviation sigma, and we let X bar be a random variable representing the sample mean of n independently drawn observations from this distribution. A couple of important points. First, the mean of the sampling distribution of the sample mean is simply equal to the population mean; symbolically, the mean of the sampling distribution of X bar is equal to the mean of the population from which we are sampling. Second, the standard deviation of the sampling distribution of X bar is equal to sigma over the square root of n. Why are these two things the case? It's not too tough to show mathematically, and I look at that in another video. One more important point: if the population from which we are sampling is normally distributed, then the sampling distribution of the sample mean X bar is also normal. We're going to bring these things together in some probability calculations, and we would standardize in the usual way. For a single value of a normally distributed random variable X, we standardize by saying Z is equal to X minus its mean over its standard deviation, as we've done previously. For the mean of n observations, we do a similar thing.
Z is equal to the sample mean X bar minus its mean, which is mu, over the standard deviation of X bar, and this is simply X bar minus mu over sigma divided by the square root of n. So if we're talking about the mean of n observations, we need to remember to include the square root of n in the denominator. Let's look at an example. The number of calories in an order of poutine at a certain fast food restaurant is approximately normally distributed with a mean of 740 calories and a standard deviation of 20 calories. If you are unfamiliar with poutine, it is french fries, cheese curds, and gravy. Very tasty, but also not so good for you. We're going to do a couple of probability calculations. First: what is the probability a randomly selected order has at least 760 calories? Well, if we let the random variable X represent the number of calories in a randomly selected order of poutine, then X is approximately normally distributed with a mean of 740 and a standard deviation of 20, and what we want is the probability that X is greater than or equal to 760. We've done this type of standardization before: Z is equal to X minus mu over sigma, so this is the probability that Z is greater than or equal to 760 minus 740 over sigma, which is twenty. And this is the probability that Z is greater than or equal to 1. If we drew out our standard normal curve, because Z has a standard normal distribution, and went to our normal table or a computer to find the area to the right of one, we would see that this is approximately 0.159 to three decimal places. A different type of question: what is the probability that the mean of 9 randomly selected orders is at least 760 calories?
This is fundamentally different, because we're talking about the mean of nine of these orders, so we want to know the probability that X bar is greater than or equal to 760. We do a very similar thing to what we did up top, but we have to remember that our Z is equal to X bar minus mu over sigma divided by the square root of n, because we're talking about X bar here, not a single observation. So when we standardize here, we say the probability that Z is greater than or equal to 760 minus 740 over twenty divided by the square root of nine. This is the probability that Z is greater than or equal to three, where Z has a standard normal distribution. So we draw this out: we have zero here, we have 3 out in the right tail, and we're looking for this area. This turns out to be approximately 0.0013 if you put that into a computer or use your standard normal table. Visually, here's what's happening. The solid line represents the distribution of a single poutine, which was approximately normally distributed with a mean of 740 and a standard deviation of 20. What we found in the first question was the area to the right of 760 under this distribution. But when we're talking about the mean of 9 poutines, the distribution is a little different.
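
The two probability calculations worked through above can be reproduced numerically. This sketch builds a standard normal survival function on `math.erfc`; the helper name `normal_sf` is my own, not part of the video.

```python
import math

def normal_sf(z):
    """P(Z >= z) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))

mu, sigma = 740, 20  # calories, from the poutine example

# Single order: z = (760 - 740) / 20 = 1
p_single = normal_sf((760 - mu) / sigma)                 # ≈ 0.159

# Mean of n = 9 orders: z = (760 - 740) / (20 / sqrt(9)) = 3
n = 9
p_mean = normal_sf((760 - mu) / (sigma / math.sqrt(n)))  # ≈ 0.0013
```

Dividing by sigma over the square root of n, rather than sigma alone, is what shrinks the tail probability from about 0.159 to about 0.0013.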
It's very similar: this distribution has a mean of 740, the same mean as that of a single observation, but the standard deviation of the sampling distribution of X bar is sigma over the square root of n, which is twenty over the square root of nine, or simply twenty over 3. So the standard deviation of this distribution is less than the standard deviation for a single value, and we can see that because it has a higher peak and less area out in the far tails. What we found in that second question was simply the little bit of area under the dotted-line distribution out to the right of 760. One last thing before I let you go: there is a very important concept in statistics called the central limit theorem, and the gist of it is that the sample mean will be approximately normally distributed for large sample sizes, regardless of the distribution from which we are sampling. So even when we're sampling from non-normal populations, if we have a large enough sample size, the sample mean will be approximately normally distributed. This is an extremely important concept in the world of statistics and probability, deserving of its own video, and I have a video for that, but I would be remiss if I didn't at least mention it here.
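
The central limit theorem mentioned at the end can be seen in a quick simulation: sampling from a heavily skewed exponential population (rate 1, an illustrative choice) still produces approximately normal sample means once the sample size is large.

```python
import random
import statistics

random.seed(1)

# Exponential population with mean 1: heavily skewed, clearly non-normal.
def mean_of_sample(n):
    return statistics.mean(random.expovariate(1.0) for _ in range(n))

means = [mean_of_sample(100) for _ in range(2000)]

# The sample means cluster around the population mean of 1.0, and roughly
# 68% fall within one standard error (1 / sqrt(100) = 0.1) of it, as the
# normal approximation predicts.
center = statistics.mean(means)
within_one_se = sum(abs(m - 1.0) <= 0.1 for m in means) / len(means)
```

The "roughly 68% within one standard error" check is the same yardstick one would apply to an exactly normal distribution, which is the point of the theorem.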

## Introduction

The sampling distribution of a statistic is the distribution of that statistic, considered as a random variable, when derived from a random sample of size ${\displaystyle n}$. It may be considered as the distribution of the statistic for all possible samples from the same population of a given sample size. The sampling distribution depends on the underlying distribution of the population, the statistic being considered, the sampling procedure employed, and the sample size used. There is often considerable interest in whether the sampling distribution can be approximated by an asymptotic distribution, which corresponds to the limiting case either as the number of random samples of finite size, taken from an infinite population and used to produce the distribution, tends to infinity, or when just one equally-infinite-size "sample" is taken of that same population.

For example, consider a normal population with mean ${\displaystyle \mu }$ and variance ${\displaystyle \sigma ^{2}}$. Assume we repeatedly take samples of a given size from this population and calculate the arithmetic mean ${\displaystyle \scriptstyle {\bar {x}}}$ for each sample – this statistic is called the sample mean. The distribution of these means, or averages, is called the "sampling distribution of the sample mean". This distribution is normal ${\displaystyle \scriptstyle {\mathcal {N}}(\mu ,\,\sigma ^{2}/n)}$ (n is the sample size) since the underlying population is normal, although sampling distributions may also often be close to normal even when the population distribution is not (see central limit theorem). An alternative to the sample mean is the sample median. When calculated from the same population, it has a different sampling distribution to that of the mean and is generally not normal (but it may be close for large sample sizes).
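The contrast drawn above between the sample mean and the sample median can be checked by simulation; the population parameters and sample size below (mu = 0, sigma = 1, n = 25) are illustrative.

```python
import random
import statistics

random.seed(2)
mu, sigma, n = 0.0, 1.0, 25  # illustrative normal population and sample size

samples = [[random.gauss(mu, sigma) for _ in range(n)] for _ in range(4000)]
means = [statistics.mean(s) for s in samples]
medians = [statistics.median(s) for s in samples]

# Both statistics center on mu, but the median's sampling distribution is
# wider: for a normal population its variance is roughly (pi/2) * sigma^2 / n,
# versus sigma^2 / n for the mean.
spread_mean = statistics.stdev(means)      # close to sigma / sqrt(n) = 0.2
spread_median = statistics.stdev(medians)  # larger, roughly 0.25
```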

The mean of a sample from a population having a normal distribution is an example of a simple statistic taken from one of the simplest statistical populations. For other statistics and other populations the formulas are more complicated, and often they do not exist in closed form. In such cases the sampling distributions may be approximated through Monte Carlo simulations[1][p. 2], bootstrap methods, or asymptotic distribution theory.
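
As one example of such a simulation-based approximation, here is a bootstrap sketch for the standard error of a sample median; the lognormal data and the helper name `bootstrap_medians` are illustrative choices of mine, not prescribed by the text.

```python
import random
import statistics

random.seed(3)

# A single observed sample; the lognormal population here is illustrative.
data = [random.lognormvariate(0.0, 0.5) for _ in range(60)]

def bootstrap_medians(data, reps=2000):
    """Resample with replacement and recompute the median each time."""
    n = len(data)
    return [statistics.median(random.choices(data, k=n)) for _ in range(reps)]

# The spread of the bootstrap replicates estimates the standard error of the
# sample median, a statistic with no simple closed-form sampling distribution.
replicates = bootstrap_medians(data)
se_estimate = statistics.stdev(replicates)
```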

## Standard error

The standard deviation of the sampling distribution of a statistic is referred to as the standard error of that quantity. For the case where the statistic is the sample mean, and samples are uncorrelated, the standard error is:

${\displaystyle \sigma _{\bar {x}}={\frac {\sigma }{\sqrt {n}}}}$

where ${\displaystyle \sigma }$ is the standard deviation of the population distribution of that quantity and ${\displaystyle n}$ is the sample size (number of items in the sample).

An important implication of this formula is that the sample size must be quadrupled (multiplied by 4) to achieve half (1/2) the measurement error. When designing statistical studies where cost is a factor, this may have a role in understanding cost–benefit tradeoffs.
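The quadrupling rule follows directly from the square root in the formula; a minimal check (sigma = 12 is an arbitrary illustrative value):

```python
import math

def standard_error(sigma, n):
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

sigma = 12.0  # arbitrary illustrative population standard deviation

se_small = standard_error(sigma, 100)  # 12 / 10 = 1.2
se_large = standard_error(sigma, 400)  # 12 / 20 = 0.6; quadrupling n halves it
```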

## Examples

| Population | Statistic | Sampling distribution |
|---|---|---|
| Normal: ${\displaystyle {\mathcal {N}}(\mu ,\sigma ^{2})}$ | Sample mean ${\displaystyle {\bar {X}}}$ from samples of size n | ${\displaystyle {\bar {X}}\sim {\mathcal {N}}{\Big (}\mu ,\,{\frac {\sigma ^{2}}{n}}{\Big )}}$, or, if ${\displaystyle \sigma }$ is not known, ${\displaystyle {\bar {X}}\sim {\mathcal {T}}{\Big (}\mu ,\,{\frac {S^{2}}{n}}{\Big )}}$, where ${\displaystyle S}$ is the standard deviation of the sample and ${\displaystyle {\mathcal {T}}}$ is the Student's t-distribution |
| Bernoulli: ${\displaystyle \operatorname {Bernoulli} (p)}$ | Sample proportion of "successful trials" ${\displaystyle {\bar {X}}}$ | ${\displaystyle n{\bar {X}}\sim \operatorname {Binomial} (n,p)}$ |
| Two independent normal populations: ${\displaystyle {\mathcal {N}}(\mu _{1},\sigma _{1}^{2})}$ and ${\displaystyle {\mathcal {N}}(\mu _{2},\sigma _{2}^{2})}$ | Difference between sample means, ${\displaystyle {\bar {X}}_{1}-{\bar {X}}_{2}}$ | ${\displaystyle {\bar {X}}_{1}-{\bar {X}}_{2}\sim {\mathcal {N}}\!\left(\mu _{1}-\mu _{2},\,{\frac {\sigma _{1}^{2}}{n_{1}}}+{\frac {\sigma _{2}^{2}}{n_{2}}}\right)}$ |
| Any absolutely continuous distribution F with density ƒ | Median ${\displaystyle X_{(k)}}$ from a sample of size n = 2k − 1, where the sample is ordered ${\displaystyle X_{(1)}}$ to ${\displaystyle X_{(n)}}$ | ${\displaystyle f_{X_{(k)}}(x)={\frac {(2k-1)!}{(k-1)!^{2}}}f(x){\Big (}F(x)(1-F(x)){\Big )}^{k-1}}$ |
| Any distribution with distribution function F | Maximum ${\displaystyle M=\max \ X_{k}}$ from a random sample of size n | ${\displaystyle F_{M}(x)=P(M\leq x)=\prod P(X_{k}\leq x)=\left(F(x)\right)^{n}}$ |
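
The result for the sample maximum can be checked by simulation: for a Uniform(0, 1) population, F(x) = x, so the CDF of the maximum of a sample of size n is x to the power n. The sample size, evaluation point, and trial count below are illustrative.

```python
import random

random.seed(4)

n = 5          # sample size
x = 0.8        # evaluation point
trials = 20000

# For a Uniform(0, 1) population, F(x) = x, so the CDF of the maximum of a
# sample of size n is F_M(x) = x ** n.
hits = sum(max(random.random() for _ in range(n)) <= x for _ in range(trials))
empirical = hits / trials
theoretical = x ** n  # 0.8 ** 5 = 0.32768
```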

## Statistical inference

In the theory of statistical inference, the idea of a sufficient statistic provides the basis of choosing a statistic (as a function of the sample data points) in such a way that no information is lost by replacing the full probabilistic description of the sample with the sampling distribution of the selected statistic.

In frequentist inference, for example in the development of a statistical hypothesis test or a confidence interval, the availability of the sampling distribution of a statistic (or an approximation to this in the form of an asymptotic distribution) can allow the ready formulation of such procedures, whereas the development of procedures starting from the joint distribution of the sample would be less straightforward.

In Bayesian inference, when the sampling distribution of a statistic is available, one can consider replacing the final outcome of such procedures, specifically the conditional distributions of any unknown quantities given the sample data, by the conditional distributions of any unknown quantities given selected sample statistics. Such a procedure would involve the sampling distribution of the statistics. The results would be identical provided the statistics chosen are jointly sufficient statistics.

## References

1. ^ Mooney, Christopher Z. (1999). Monte Carlo simulation. Thousand Oaks, Calif.: Sage. ISBN 9780803959439.