
In probability theory, the law of large numbers (LLN) is a theorem that describes the result of performing the same experiment a large number of times. According to the law, the average of the results obtained from a large number of trials should be close to the expected value, and will tend to become closer as more trials are performed.
The LLN is important because it guarantees stable long-term results for the averages of some random events. For example, while a casino may lose money in a single spin of the roulette wheel, its earnings will tend towards a predictable percentage over a large number of spins. Any winning streak by a player will eventually be overcome by the parameters of the game. It is important to remember that the law only applies (as the name indicates) when a large number of observations is considered. There is no principle that a small number of observations will coincide with the expected value or that a streak of one value will immediately be "balanced" by the others (see the gambler's fallacy).
Examples
For example, a single roll of a fair, six-sided die produces one of the numbers 1, 2, 3, 4, 5, or 6, each with equal probability. Therefore, the expected value of a single die roll is
$$\frac{1 + 2 + 3 + 4 + 5 + 6}{6} = 3.5.$$
According to the law of large numbers, if a large number of six-sided dice are rolled, the average of their values (sometimes called the sample mean) is likely to be close to 3.5, with the precision increasing as more dice are rolled.
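As a rough illustration, the following Python sketch simulates rolls of a fair six-sided die and prints the sample mean for increasing numbers of rolls. The function name `average_of_rolls`, the seed, and the sample sizes are illustrative choices made here, not something taken from the article's sources.

```python
import random

def average_of_rolls(n_rolls, rng):
    """Return the sample mean of n_rolls simulated rolls of a fair six-sided die."""
    total = sum(rng.randint(1, 6) for _ in range(n_rolls))
    return total / n_rolls

rng = random.Random(0)  # arbitrary fixed seed so that a run can be repeated
for n in (10, 100, 10_000, 100_000):
    # The printed averages should drift towards 3.5 as n grows.
    print(n, average_of_rolls(n, rng))
```

Small samples can land noticeably away from 3.5; only the larger ones are expected to sit close to it.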
It follows from the law of large numbers that the empirical probability of success in a series of Bernoulli trials will converge to the theoretical probability. For a Bernoulli random variable, the expected value is the theoretical probability of success, and the average of n such variables (assuming they are independent and identically distributed (i.i.d.)) is precisely the relative frequency.
For example, a fair coin toss is a Bernoulli trial. When a fair coin is flipped once, the theoretical probability that the outcome will be heads is equal to 1/2. Therefore, according to the law of large numbers, the proportion of heads in a "large" number of coin flips "should be" roughly 1/2. In particular, the proportion of heads after n flips will almost surely converge to 1/2 as n approaches infinity.
Although the proportion of heads (and tails) approaches 1/2, almost surely the absolute difference in the number of heads and tails will become large as the number of flips becomes large. That is, the probability that the absolute difference is a small number approaches zero as the number of flips becomes large. Also, almost surely the ratio of the absolute difference to the number of flips will approach zero. Intuitively, the expected absolute difference grows, but at a slower rate than the number of flips.
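The contrast between the proportion of heads and the absolute difference between heads and tails can be seen in a small simulation. The sketch below is only an illustration: the helper `heads_vs_tails`, the seed, and the flip counts are assumptions made for this example. In a single run the gap need not grow monotonically, but it is typically of the order of the square root of the number of flips, so the gap divided by the number of flips shrinks.

```python
import random

def heads_vs_tails(n_flips, rng):
    """Flip a fair coin n_flips times; return (proportion of heads, |heads - tails|)."""
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    tails = n_flips - heads
    return heads / n_flips, abs(heads - tails)

rng = random.Random(42)  # arbitrary fixed seed
for n in (100, 10_000, 1_000_000):
    proportion, gap = heads_vs_tails(n, rng)
    print(f"n={n:>9}  proportion={proportion:.4f}  |heads-tails|={gap}  gap/n={gap / n:.5f}")
```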
History
The Italian mathematician Gerolamo Cardano (1501–1576) stated without proof that the accuracies of empirical statistics tend to improve with the number of trials.^{[1]} This was then formalized as a law of large numbers. A special form of the LLN (for a binary random variable) was first proved by Jacob Bernoulli.^{[2]} It took him over 20 years to develop a sufficiently rigorous mathematical proof which was published in his Ars Conjectandi (The Art of Conjecturing) in 1713. He named this his "Golden Theorem" but it became generally known as "Bernoulli's Theorem". This should not be confused with Bernoulli's principle, named after Jacob Bernoulli's nephew Daniel Bernoulli. In 1837, S.D. Poisson further described it under the name "la loi des grands nombres" ("The law of large numbers").^{[3]}^{[4]} Thereafter, it was known under both names, but the "Law of large numbers" is most frequently used.
After Bernoulli and Poisson published their efforts, other mathematicians also contributed to refinement of the law, including Chebyshev,^{[5]} Markov, Borel, Cantelli, Kolmogorov and Khinchin. Markov showed that the law can apply to a random variable that does not have a finite variance under some other weaker assumption, and Khinchin showed in 1929 that if the series consists of independent identically distributed random variables, it suffices that the expected value exists for the weak law of large numbers to be true.^{[6]}^{[7]} These further studies have given rise to two prominent forms of the LLN. One is called the "weak" law and the other the "strong" law, in reference to two different modes of convergence of the cumulative sample means to the expected value; in particular, as explained below, the strong form implies the weak.^{[6]}
Forms
Two different versions of the law of large numbers are described below; they are called the strong law of large numbers and the weak law of large numbers. Stated for the case where X_{1}, X_{2}, ... is an infinite sequence of i.i.d. Lebesgue integrable random variables with expected value E(X_{1}) = E(X_{2}) = ... = µ, both versions of the law state that, with virtual certainty, the sample average
$$\overline{X}_n = \frac{1}{n}(X_1 + \cdots + X_n)$$
converges to the expected value
$$\overline{X}_n \to \mu \quad \text{as } n \to \infty.$$
(law. 1)
(Lebesgue integrability of X_{j} means that the expected value E(X_{j}) exists according to Lebesgue integration and is finite. It does not mean that the associated probability measure is absolutely continuous with respect to Lebesgue measure.)
An assumption of finite variance Var(X_{1}) = Var(X_{2}) = ... = σ^{2} < ∞ is not necessary. Large or infinite variance will make the convergence slower, but the LLN holds anyway. This assumption is often used because it makes the proofs easier and shorter.
Mutual independence of the random variables can be replaced by pairwise independence in both versions of the law.^{[8]}
The difference between the strong and the weak version is concerned with the mode of convergence being asserted. For interpretation of these modes, see Convergence of random variables.
Weak law
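The weak law of large numbers (also called Khinchin's law) states that the sample average converges in probability towards the expected value^{[9]}
$$\overline{X}_n \xrightarrow{P} \mu \quad \text{when } n \to \infty.$$
(law. 2)
That is, for any positive number ε,
$$\lim_{n \to \infty} \Pr\left( \left| \overline{X}_n - \mu \right| > \varepsilon \right) = 0.$$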
Interpreting this result, the weak law states that for any nonzero margin specified, no matter how small, with a sufficiently large sample there will be a very high probability that the average of the observations will be close to the expected value; that is, within the margin.
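The statement about margins can be explored numerically. The sketch below estimates, by repeated simulation, the probability that the mean of n die rolls misses 3.5 by more than a chosen margin. The function `deviation_probability`, the margin 0.25, the number of repetitions, and the seed are all assumptions picked for illustration; the output is a Monte Carlo estimate, not an exact probability.

```python
import random

def deviation_probability(n, eps, trials, rng):
    """Estimate Pr(|mean of n die rolls - 3.5| > eps) from `trials` repeated experiments."""
    misses = 0
    for _ in range(trials):
        mean = sum(rng.randint(1, 6) for _ in range(n)) / n
        if abs(mean - 3.5) > eps:
            misses += 1
    return misses / trials

rng = random.Random(1)  # arbitrary fixed seed
for n in (10, 100, 1000):
    # The estimated miss probability should fall towards zero as n grows.
    print(n, deviation_probability(n, eps=0.25, trials=1000, rng=rng))
```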
As mentioned earlier, the weak law applies in the case of i.i.d. random variables, but it also applies in some other cases. For example, the variance may be different for each random variable in the series, keeping the expected value constant. If the variances are bounded, then the law applies, as shown by Chebyshev as early as 1867. (If the expected values change during the series, then we can simply apply the law to the average deviation from the respective expected values. The law then states that this converges in probability to zero.) In fact, Chebyshev's proof works so long as the variance of the average of the first n values goes to zero as n goes to infinity.^{[7]} As an example, assume that each random variable in the series follows a Gaussian distribution with mean zero, but with variance equal to $2n/\log(n+1)$, which is not bounded. At each stage, the average will be normally distributed (as the average of a set of normally distributed variables). The variance of the sum is equal to the sum of the variances, which is asymptotic to $n^2/\log n$. The variance of the average is therefore asymptotic to $1/\log n$ and goes to zero.
An example where the law of large numbers does not apply is the Cauchy distribution. For instance, let the random numbers equal the tangent of an angle uniformly distributed between −90° and +90°; such numbers follow a standard Cauchy distribution. The median is zero, but the expected value does not exist, and indeed the average of n such variables has the same distribution as one such variable. It does not converge in probability towards zero (or any other value) as n goes to infinity.
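The failure of averaging for tangent-of-uniform-angle variables can be seen in simulation. The sketch below repeats the experiment many times for several values of n and reports the interquartile range of the resulting sample means. The helper names, seed, and repetition counts are assumptions made for this illustration. For a standard Cauchy variable the quartiles are −1 and +1, so the reported spread is expected to stay near 2 for every n instead of shrinking.

```python
import math
import random

def cauchy_sample_mean(n, rng):
    """Mean of n draws of tan(angle), with the angle uniform between -90 and +90 degrees."""
    return sum(math.tan(rng.uniform(-math.pi / 2, math.pi / 2)) for _ in range(n)) / n

def interquartile_range(values):
    """Crude interquartile range read off the sorted sample."""
    ordered = sorted(values)
    return ordered[(3 * len(ordered)) // 4] - ordered[len(ordered) // 4]

rng = random.Random(2)  # arbitrary fixed seed
for n in (1, 10, 200):
    means = [cauchy_sample_mean(n, rng) for _ in range(2000)]
    print(n, round(interquartile_range(means), 2))
```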
There are also examples of the weak law applying even though the expected value does not exist. See the section "Differences between the weak law and the strong law".
Strong law
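The strong law of large numbers states that the sample average converges almost surely to the expected value^{[10]}
$$\overline{X}_n \xrightarrow{\text{a.s.}} \mu \quad \text{when } n \to \infty.$$
(law. 3)
That is,
$$\Pr\left( \lim_{n \to \infty} \overline{X}_n = \mu \right) = 1.$$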
What this means is that the probability that, as the number of trials n goes to infinity, the average of the observations converges to the expected value, is equal to one.
The proof is more complex than that of the weak law.^{[11]} This law justifies the intuitive interpretation of the expected value (for Lebesgue integration only) of a random variable when sampled repeatedly as the "long-term average".
Almost sure convergence is also called strong convergence of random variables. This version is called the strong law because random variables which converge strongly (almost surely) are guaranteed to converge weakly (in probability). The strong law therefore implies the weak law, but not vice versa: when the conditions of the strong law hold, the sample average converges both strongly (almost surely) and weakly (in probability). However, the weak law is known to hold under certain conditions where the strong law does not hold, and then the convergence is only weak (in probability).
The strong law of large numbers can itself be seen as a special case of the pointwise ergodic theorem.
The strong law applies to independent identically distributed random variables having an expected value (like the weak law). This was proved by Kolmogorov in 1930. It can also apply in other cases. Kolmogorov also showed, in 1933, that if the variables are independent and identically distributed, then for the average to converge almost surely on something (this can be considered another statement of the strong law), it is necessary that they have an expected value (and then of course the average will converge almost surely on that).^{[12]}
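If the summands are independent but not identically distributed, then
$$\overline{X}_n - \operatorname{E}\left[ \overline{X}_n \right] \xrightarrow{\text{a.s.}} 0,$$
provided that each X_{k} has a finite second moment and
$$\sum_{k=1}^{\infty} \frac{1}{k^2} \operatorname{Var}[X_k] < \infty.$$
This statement is known as Kolmogorov's strong law; see e.g. Sen & Singer (1993, Theorem 2.3.10).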
An example of a series where the weak law applies but not the strong law is when X_{k} is plus or minus $\sqrt{k/\log\log\log k}$ (starting at sufficiently large k so that the denominator is positive) with probability 1/2 for each.^{[12]} The variance of X_{k} is then $k/\log\log\log k$. Kolmogorov's strong law does not apply because the partial sum in his criterion up to k=n is asymptotic to $\log n/\log\log\log n$ and this is unbounded.
If we replace the random variables with Gaussian variables having the same variances, namely $k/\log\log\log k$, then the average at any point will also be normally distributed. The width of the distribution of the average will tend toward zero (standard deviation asymptotic to $1/\sqrt{2\log\log\log n}$), but for a given ε, there is a probability, which does not go to zero with n, that the average sometime after the nth trial will come back up to ε. Since this probability does not go to zero, it must have a positive lower bound p(ε), which means there is a probability of at least p(ε) that the average will attain ε after n trials. It will happen with probability p(ε)/2 before some m which depends on n. But even after m, there is still a probability of at least p(ε) that it will happen. (This seems to indicate that p(ε)=1 and the average will attain ε an infinite number of times.)
Differences between the weak law and the strong law
The weak law states that for a specified large n, the average $\overline{X}_n$ is likely to be near μ. Thus, it leaves open the possibility that $|\overline{X}_n - \mu| > \varepsilon$ happens an infinite number of times, although at infrequent intervals. (Not necessarily $|\overline{X}_n - \mu| \neq 0$ for all n.)
The strong law shows that this almost surely will not occur. In particular, it implies that with probability 1, we have that for any ε > 0 the inequality $|\overline{X}_n - \mu| < \varepsilon$ holds for all large enough n.^{[13]}
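The "for all large enough n" behaviour can be illustrated, though not proved, by following a single simulated path of fair coin flips and recording the largest deviation of the running proportion of heads from 1/2 beyond several starting points. The function `tail_sup_deviations`, the seed, and the horizon of 200,000 flips are assumptions made for this sketch; a finite simulation can only hint at a statement about infinitely many trials.

```python
import random

def tail_sup_deviations(n_total, starts, rng):
    """On one simulated path of fair coin flips, return for each start in `starts`
    the largest |proportion of heads - 1/2| over all n with start <= n <= n_total."""
    worst = {start: 0.0 for start in starts}
    heads = 0
    for n in range(1, n_total + 1):
        heads += rng.random() < 0.5  # True counts as 1
        deviation = abs(heads / n - 0.5)
        for start in starts:
            if n >= start and deviation > worst[start]:
                worst[start] = deviation
    return worst

rng = random.Random(3)  # arbitrary fixed seed
for start, worst in tail_sup_deviations(200_000, (10, 1000, 100_000), rng).items():
    print(f"from n={start:>7}: largest deviation {worst:.4f}")
```

Because all starting points are evaluated on the same path, the reported maxima can only decrease as the starting point moves later.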
The strong law does not hold in the following cases, but the weak law does.^{[14]}^{[15]}^{[16]}
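1. Let X be an exponentially distributed random variable with parameter 1. The random variable $\sin(X) e^{X} X^{-1}$ has no expected value according to Lebesgue integration, but using conditional convergence and interpreting the integral as a Dirichlet integral, which is an improper Riemann integral, we can say:
$$\operatorname{E}\left( \frac{\sin(X) e^{X}}{X} \right) = \int_{0}^{\infty} \frac{\sin(x) e^{x}}{x} e^{-x} \, dx = \frac{\pi}{2}.$$
2. Let X be a geometrically distributed random variable with probability 0.5. The random variable $2^{X} (-1)^{X} X^{-1}$ does not have an expected value in the conventional sense because the infinite series is not absolutely convergent, but using conditional convergence, we can say:
$$\operatorname{E}\left( \frac{2^{X} (-1)^{X}}{X} \right) = \sum_{x=1}^{\infty} \frac{2^{x} (-1)^{x}}{x} 2^{-x} = -\ln(2).$$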
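3. If the cumulative distribution function of a random variable is
$$1 - F(x) = \frac{e}{2x \ln(x)}, \quad x \geq e; \qquad F(x) = \frac{e}{-2x \ln(-x)}, \quad x \leq -e,$$
 then it has no expected value, but the weak law is true.^{[17]}^{[18]}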
Uniform law of large numbers
Suppose f(x,θ) is some function defined for θ ∈ Θ, and continuous in θ. Then for any fixed θ, the sequence {f(X_{1},θ), f(X_{2},θ), ...} will be a sequence of independent and identically distributed random variables, such that the sample mean of this sequence converges in probability to E[f(X,θ)]. This is the pointwise (in θ) convergence.
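The uniform law of large numbers states the conditions under which the convergence happens uniformly in θ. If^{[19]}^{[20]}
 Θ is compact,
 f(x,θ) is continuous at each θ ∈ Θ for almost all x, and is a measurable function of x at each θ,
 there exists a dominating function d(x) such that E[d(X)] < ∞ and $\|f(x,\theta)\| \leq d(x)$ for all θ ∈ Θ,
then E[f(X,θ)] is continuous in θ, and
$$\sup_{\theta \in \Theta} \left\| \frac{1}{n} \sum_{i=1}^{n} f(X_i, \theta) - \operatorname{E}[f(X, \theta)] \right\| \xrightarrow{P} 0.$$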
This result is useful to derive consistency of a large class of estimators (see Extremum estimator).
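A small numerical sketch may help to show what uniform convergence means. It uses an example chosen here purely for illustration: f(x,θ) = (x − θ)^2 with X uniform on (0, 1) and Θ = [0, 1], for which E[f(X,θ)] = 1/12 + (1/2 − θ)^2 and d(x) = 1 serves as a dominating function. The supremum over Θ is approximated on a finite grid, which is a further simplification.

```python
import random

def sup_deviation(n, rng, grid_size=101):
    """Largest gap, over a grid of theta in [0, 1], between the sample mean of
    f(X_i, theta) = (X_i - theta)**2 and its expectation, for X uniform on (0, 1)."""
    xs = [rng.random() for _ in range(n)]
    worst = 0.0
    for j in range(grid_size):
        theta = j / (grid_size - 1)
        sample_mean = sum((x - theta) ** 2 for x in xs) / n
        # E[(X - theta)^2] = Var(X) + (E[X] - theta)^2 with Var(X) = 1/12, E[X] = 1/2
        expectation = 1 / 12 + (0.5 - theta) ** 2
        worst = max(worst, abs(sample_mean - expectation))
    return worst

rng = random.Random(4)  # arbitrary fixed seed
for n in (10, 100, 1000, 10_000):
    # The worst-case gap over the whole grid should shrink as n grows.
    print(n, sup_deviation(n, rng))
```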
Borel's law of large numbers
Borel's law of large numbers, named after Émile Borel, states that if an experiment is repeated a large number of times, independently under identical conditions, then the proportion of times that any specified event occurs approximately equals the probability of the event's occurrence on any particular trial; the larger the number of repetitions, the better the approximation tends to be. More precisely, if E denotes the event in question, p its probability of occurrence, and N_{n}(E) the number of times E occurs in the first n trials, then with probability one,^{[21]}
$$\frac{N_n(E)}{n} \to p \quad \text{as } n \to \infty.$$
This theorem makes rigorous the intuitive notion of probability as the long-run relative frequency of an event's occurrence. It is a special case of any of several more general laws of large numbers in probability theory.
Chebyshev's inequality. Let X be a random variable with finite expected value μ and finite nonzero variance σ^{2}. Then for any real number k > 0,
$$\Pr(|X - \mu| \geq k\sigma) \leq \frac{1}{k^2}.$$
Proof of the weak law
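Given X_{1}, X_{2}, ... an infinite sequence of i.i.d. random variables with finite expected value E(X_{1}) = E(X_{2}) = ... = µ < ∞, we are interested in the convergence of the sample average
$$\overline{X}_n = \tfrac{1}{n}(X_1 + \cdots + X_n).$$
The weak law of large numbers states:
Theorem:
$$\overline{X}_n \xrightarrow{P} \mu \quad \text{when } n \to \infty.$$
(law. 2)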
Proof using Chebyshev's inequality assuming finite variance
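This proof uses the assumption of finite variance $\operatorname{Var}(X_i) = \sigma^2$ (for all i). The independence of the random variables implies no correlation between them, and we have that
$$\operatorname{Var}(\overline{X}_n) = \operatorname{Var}\left( \tfrac{1}{n}(X_1 + \cdots + X_n) \right) = \frac{1}{n^2} \operatorname{Var}(X_1 + \cdots + X_n) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}.$$
The common mean μ of the sequence is the mean of the sample average:
$$\operatorname{E}(\overline{X}_n) = \mu.$$
Using Chebyshev's inequality on $\overline{X}_n$ results in
$$\Pr\left( \left| \overline{X}_n - \mu \right| \geq \varepsilon \right) \leq \frac{\sigma^2}{n\varepsilon^2}.$$
This may be used to obtain the following:
$$\Pr\left( \left| \overline{X}_n - \mu \right| < \varepsilon \right) = 1 - \Pr\left( \left| \overline{X}_n - \mu \right| \geq \varepsilon \right) \geq 1 - \frac{\sigma^2}{n\varepsilon^2}.$$
As n approaches infinity, the expression approaches 1. And by definition of convergence in probability, we have obtained
$$\overline{X}_n \xrightarrow{P} \mu \quad \text{when } n \to \infty.$$
(law. 2)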
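The Chebyshev bound used in this proof also gives a concrete, if conservative, sample size: the probability that the sample mean misses μ by ε or more is at most σ^2/(nε^2), so it is at most δ once n ≥ σ^2/(δε^2). The sketch below evaluates this for a fair six-sided die, whose variance is 35/12. The function name `rolls_needed` and the chosen values of ε and δ are assumptions made for illustration; far fewer rolls usually suffice in practice, since Chebyshev's inequality is not tight.

```python
import math

def rolls_needed(sigma2, eps, delta):
    """Smallest n with sigma2 / (n * eps**2) <= delta, i.e. the sample size at which
    Chebyshev's inequality guarantees Pr(|sample mean - mu| >= eps) <= delta."""
    return math.ceil(sigma2 / (delta * eps ** 2))

SIGMA2_DIE = 35 / 12  # variance of one roll of a fair six-sided die

for eps in (0.5, 0.1, 0.01):
    # Shrinking the margin by a factor of 10 multiplies the required n by about 100.
    print(eps, rolls_needed(SIGMA2_DIE, eps, delta=0.05))
```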
Proof using convergence of characteristic functions
By Taylor's theorem for complex functions, the characteristic function of any random variable, X, with finite mean μ, can be written as
$$\varphi_X(t) = 1 + it\mu + o(t), \quad t \to 0.$$
All X_{1}, X_{2}, ... have the same characteristic function, so we will simply denote this φ_{X}.
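Among the basic properties of characteristic functions there are
$$\varphi_{\frac{1}{n}X}(t) = \varphi_X\left( \tfrac{t}{n} \right) \quad \text{and} \quad \varphi_{X+Y}(t) = \varphi_X(t)\,\varphi_Y(t)$$
 if X and Y are independent.
These rules can be used to calculate the characteristic function of $\overline{X}_n$ in terms of φ_{X}:
$$\varphi_{\overline{X}_n}(t) = \left[ \varphi_X\left( \tfrac{t}{n} \right) \right]^n = \left[ 1 + i\mu\tfrac{t}{n} + o\left( \tfrac{t}{n} \right) \right]^n \to e^{it\mu}, \quad \text{as } n \to \infty.$$
The limit e^{itμ} is the characteristic function of the constant random variable μ, and hence by the Lévy continuity theorem, $\overline{X}_n$ converges in distribution to μ:
$$\overline{X}_n \xrightarrow{\mathcal{D}} \mu \quad \text{for } n \to \infty.$$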
μ is a constant, which implies that convergence in distribution to μ and convergence in probability to μ are equivalent (see Convergence of random variables). Therefore,
$$\overline{X}_n \xrightarrow{P} \mu \quad \text{when } n \to \infty.$$
(law. 2)
This shows that the sample mean converges in probability to the derivative of the characteristic function at the origin, as long as the latter exists.
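The convergence of characteristic functions in this argument can be checked numerically for a specific distribution. The sketch below uses a Bernoulli variable with success probability p, an example chosen here for illustration, whose characteristic function is (1 − p) + p·e^{it} and whose mean is p. It compares the characteristic function of the mean of n such variables with e^{itp}, the characteristic function of the constant p. The function names and the values of p and t are assumptions made for this example.

```python
import cmath

def bernoulli_cf(t, p):
    """Characteristic function E[exp(i t X)] of a Bernoulli(p) random variable."""
    return (1 - p) + p * cmath.exp(1j * t)

def sample_mean_cf(t, p, n):
    """Characteristic function of the mean of n i.i.d. Bernoulli(p) variables,
    obtained from the scaling and product rules as [phi_X(t / n)]**n."""
    return bernoulli_cf(t / n, p) ** n

p, t = 0.3, 2.0
limit = cmath.exp(1j * t * p)  # characteristic function of the constant p
for n in (1, 10, 100, 10_000):
    # The distance to the limit should shrink as n grows.
    print(n, abs(sample_mean_cf(t, p, n) - limit))
```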
See also
 Asymptotic equipartition property
 Central limit theorem
 Infinite monkey theorem
 Law of averages
 Law of the iterated logarithm
 Lindy effect
 Regression toward the mean
 Sortition
 Law of truly large numbers
Notes
 ^ Mlodinow, L. The Drunkard's Walk. New York: Random House, 2008. p. 50.
 ^ Jakob Bernoulli, Ars Conjectandi: Usum & Applicationem Praecedentis Doctrinae in Civilibus, Moralibus & Oeconomicis, 1713, Chapter 4, (Translated into English by Oscar Sheynin)
 ^ Poisson names the "law of large numbers" (la loi des grands nombres) in: S.D. Poisson, Probabilité des jugements en matière criminelle et en matière civile, précédées des règles générales du calcul des probabilités (Paris, France: Bachelier, 1837), p. 7. He attempts a two-part proof of the law on pp. 139–143 and pp. 277 ff.
 ^ Hacking, Ian. (1983) "19th-century Cracks in the Concept of Determinism", Journal of the History of Ideas, 44 (3), 455–475 JSTOR 2709176
 ^ Tchebichef, P. (1846). "Démonstration élémentaire d'une proposition générale de la théorie des probabilités". Journal für die reine und angewandte Mathematik (Crelles Journal). 1846 (33): 259–267. doi:10.1515/crll.1846.33.259.
 ^ ^{a} ^{b} Seneta 2013.
 ^ ^{a} ^{b} Yuri Prohorov. "Law of large numbers". Encyclopedia of Mathematics.
 ^ Etemadi, N.Z. (1981). "An elementary proof of the strong law of large numbers". Wahrscheinlichkeitstheorie verw Gebiete. 55 (1): 119–122. doi:10.1007/BF01013465.
 ^ Loève 1977, Chapter 1.4, p. 14
 ^ Loève 1977, Chapter 17.3, p. 251
 ^ "The strong law of large numbers – What's new". Terrytao.wordpress.com. Retrieved 2012-06-09.
 ^ ^{a} ^{b} Yuri Prokhorov. "Strong law of large numbers". Encyclopedia of Mathematics.
 ^ Ross (2009)
 ^ Lehmann, Erich L; Romano, Joseph P (2006-03-30). Weak law converges to constant. ISBN 9780387276052.
 ^ "A NOTE ON THE WEAK LAW OF LARGE NUMBERS FOR EXCHANGEABLE RANDOM VARIABLES" (PDF). Dguvl Hun Hong and Sung Ho Lee.
 ^ "weak law of large numbers: proof using characteristic functions vs proof using truncation VARIABLES".
 ^ Mukherjee, Sayan. "Law of large numbers" (PDF). Archived from the original (PDF) on 2013-03-09. Retrieved 2014-06-28.
 ^ Geyer, Charles J. "Law of large numbers" (PDF).
 ^ Newey & McFadden 1994, Lemma 2.4
 ^ Jennrich, Robert I. (1969). "Asymptotic Properties of Non-Linear Least Squares Estimators". The Annals of Mathematical Statistics. 40 (2): 633–643. doi:10.1214/aoms/1177697731.
 ^ Wen, L. (1991). "An Analytic Technique to Prove Borel's Strong Law of Large Numbers". American Mathematical Monthly.
References
 Grimmett, G. R.; Stirzaker, D. R. (1992). Probability and Random Processes, 2nd Edition. Clarendon Press, Oxford. ISBN 0198536658.
 Richard Durrett (1995). Probability: Theory and Examples, 2nd Edition. Duxbury Press.
 Martin Jacobsen (1992). Videregående Sandsynlighedsregning (Advanced Probability Theory) 3rd Edition. HCØtryk, Copenhagen. ISBN 8791180716.
 Loève, Michel (1977). Probability theory 1 (4th ed.). Springer Verlag.
 Newey, Whitney K.; McFadden, Daniel (1994). Large sample estimation and hypothesis testing. Handbook of econometrics, vol. IV, Ch. 36. Elsevier Science. pp. 2111–2245.
 Ross, Sheldon (2009). A first course in probability (8th ed.). Prentice Hall press. ISBN 9780136033134.
 Sen, P. K; Singer, J. M. (1993). Large sample methods in statistics. Chapman & Hall, Inc.
 Seneta, Eugene (2013), "A Tricentenary history of the Law of Large Numbers", Bernoulli, 19 (4): 1088–1121, arXiv:1309.6488, doi:10.3150/12-BEJSP12
External links
 Hazewinkel, Michiel, ed. (2001) [1994], "Law of large numbers", Encyclopedia of Mathematics, Springer Science+Business Media B.V. / Kluwer Academic Publishers, ISBN 9781556080104
 Weisstein, Eric W. "Weak Law of Large Numbers". MathWorld.
 Weisstein, Eric W. "Strong Law of Large Numbers". MathWorld.
 Animations for the Law of Large Numbers by Yihui Xie using the R package animation
 Apple CEO Tim Cook said something that would make statisticians cringe. "We don't believe in such laws as laws of large numbers. This is sort of, uh, old dogma, I think, that was cooked up by somebody [..]" said Tim Cook, while Business Insider explained: "However, the law of large numbers has nothing to do with large companies, large revenues, or large growth rates. The law of large numbers is a fundamental concept in probability theory and statistics, tying together theoretical probabilities that we can calculate to the actual outcomes of experiments that we empirically perform."