Negative binomial distribution

Different texts adopt slightly different definitions for the negative binomial distribution. They can be distinguished by whether the support starts at k = 0 or at k = r, whether p denotes the probability of a success or of a failure, and whether r represents success or failure, so it is crucial to identify the specific parametrization used in any given text.

Probability mass function (figure): The orange line represents the mean, which is equal to 10 in each of these plots; the green line shows the standard deviation.

Notation: $\mathrm {NB} (r,\,p)$
Parameters: r > 0, the number of failures until the experiment is stopped (integer, but the definition can also be extended to reals); p ∈ (0,1), the success probability in each experiment (real)
Support: k ∈ { 0, 1, 2, 3, … }, the number of successes
Probability mass function: $k\mapsto {k+r-1 \choose k}\cdot (1-p)^{r}p^{k},$ involving a binomial coefficient
Cumulative distribution function: $k\mapsto 1-I_{p}(k+1,\,r),$ the regularized incomplete beta function
Mean: ${\frac {pr}{1-p}}$
Mode: ${\begin{cases}{\big \lfloor }{\frac {p(r-1)}{1-p}}{\big \rfloor }&{\text{if}}\ r>1\\0&{\text{if}}\ r\leq 1\end{cases}}$
Variance: ${\frac {pr}{(1-p)^{2}}}$
Skewness: ${\frac {1+p}{\sqrt {pr}}}$
Excess kurtosis: ${\frac {6}{r}}+{\frac {(1-p)^{2}}{pr}}$
Moment-generating function: ${\biggl (}{\frac {1-p}{1-pe^{t}}}{\biggr )}^{\!r}{\text{ for }}t<-\log p$
Characteristic function: ${\biggl (}{\frac {1-p}{1-pe^{i\,t}}}{\biggr )}^{\!r}{\text{ with }}t\in \mathbb {R}$
Probability-generating function: ${\biggl (}{\frac {1-p}{1-pz}}{\biggr )}^{\!r}{\text{ for }}|z|<{\frac {1}{p}}$
Fisher information: ${\frac {r}{(1-p)^{2}p}}$

In probability theory and statistics, the negative binomial distribution is a discrete probability distribution of the number of successes in a sequence of independent and identically distributed Bernoulli trials before a specified (non-random) number of failures (denoted r) occurs. For example, if we define a 1 as failure, all non-1s as successes, and we throw a die repeatedly until 1 appears the third time (r = three failures), then the probability distribution of the number of non-1s that appeared will be a negative binomial distribution.

The Pascal distribution (after Blaise Pascal) and Polya distribution (for George Pólya) are special cases of the negative binomial distribution. A convention among engineers, climatologists, and others is to use "negative binomial" or "Pascal" for the case of an integer-valued stopping-time parameter r, and use "Polya" for the real-valued case.

For occurrences of "contagious" discrete events, like tornado outbreaks, the Polya distributions can be used to give more accurate models than the Poisson distribution by allowing the mean and variance to be different, unlike the Poisson. "Contagious" events have positively correlated occurrences causing a larger variance than if the occurrences were independent, due to a positive covariance term.


Transcription

Let's take a look at the negative binomial distribution, another important discrete probability distribution. Let's take a look at a simple example to start. A coin is tossed repeatedly until heads comes up for the sixth time. What is the probability this happens on the 15th toss? Well we could calculate this probability using the negative binomial distribution.

Before we look at calculating probabilities using the negative binomial distribution, let's look at how it relates to a couple of other important discrete probability distributions. The geometric distribution is the distribution of the number of trials needed to get the first success in repeated independent Bernoulli trials. Well the negative binomial distribution generalizes this. The negative binomial distribution is the distribution of the number of trials needed to get the rth success in repeated independent Bernoulli trials. So if we're interested in the number of trials to get the second success then r would be equal to 2, and if we're interested in the number of trials to get the 12th success then r would be equal to 12, and that r is simply going to depend on the problem at hand.

A random variable that has a negative binomial distribution can sometimes be confused for one that has a binomial distribution. And that confusion can cause some problems, so let's look at the differences here. The binomial distribution is the distribution of the number of successes, so the number of successes is our random variable X, in a fixed number of independent Bernoulli trials. And that fixed number of trials we typically call n. But in the negative binomial distribution, the number of successes is the fixed number, and we're going to call that r, and the number of trials needed to get that number of successes is the random variable X. The negative binomial distribution can be defined a little bit differently. For instance, sometimes it's described as the distribution of the number of failures needed to get that fixed number of successes. But we're going to use the definition here. So what we're interested in is the probability distribution of the number of trials needed to get this fixed number of successes in repeated independent Bernoulli trials.

Let's break down that notion of independent Bernoulli trials a little bit further. Suppose we have independent trials and each trial results in one of two possible mutually exclusive outcomes (and we're going to label those success and failure). The probability of success on any given trial is little p and this stays constant from trial to trial. The probability of failure is simply 1-p, and capital X is a random variable representing the trial number of the rth success. In order for the rth success to occur on the xth trial, we need a couple of events to occur. First of all, in the first x-1 trials we're going to need to have r-1 successes. And we can calculate the probability of that using the binomial formula. So this is just the binomial formula here. But we also need the xth trial to be a success, and that has probability p. And to calculate the probability the rth success occurs on the xth trial, we're simply going to multiply these two probabilities together, because the trials are assumed to be independent. So then the probability a random variable X takes on the value little x is the product of those two probabilities we just looked at. And then we have our probability mass function for the negative binomial distribution. We need to list out what values X can take on. So this is for x being equal to r, which is the smallest possible value of x (if we need r successes we're going to need at least r trials) and then r+1 and so on off to infinity. There's not going to be an upper bound on that. And it can be shown that the mean of this probability distribution is simply r/p and the variance of this probability distribution is r(1-p)/p^2.

Let's look at an example here. A person conducting telephone surveys must get 3 more completed surveys before their job is finished. On each randomly dialed number there's a 9% chance of reaching an adult who will complete the survey. And this is close to reality for some types of surveys. What is the probability the 3rd completed survey occurs on the 10th call? Here we're dialing random numbers, and knowing the outcome of one randomly dialed call tells us nothing about the outcome of another randomly dialed call and so these trials are independent. On any individual call, we're either going to get the survey completed or we're not and so we've got the success and failure aspect on any individual trial. And we are interested in the probability of getting the 3rd success on the 10th trial. So what we want to know is the probability that the random variable X, representing the trial number of the third success, takes on the value 10. And the conditions of a negative binomial distribution are satisfied here. We have this 9% chance of completing the survey on any one individual call, so then little p is equal to 0.09. And we're interested in the trial number of the third success, and so r is 3. Here's our formula for the negative binomial distribution. And we have an r of 3 and a p of 0.09. And we're interested in the probability that a random variable X takes on the value 10. Well this is going to be equal to 10-1 choose 3-1 times 0.09 raised to the third power times 1-0.09 raised to the 10-3. And this, to 5 decimal places, is 0.01356.

Here I've plotted out the probability distribution of the number of calls required to get the 3rd success. And that probability that we just calculated for 10 calls is right about there. Note that the smallest value here is 3. If we need 3 successes, well we're going to need at least three trials. Over here on this side I've truncated the plot at 120. The values the random variable can take on go off to infinity here, but the probabilities start to get pretty small so I just chopped it off there for visual purposes. The mean number of calls required to get that 3rd success is going to be r/p, which is 3/0.09, and that works out to 33 and a third. So on average we're going to have to make about 33 calls before we can go home for the day or move on to something else.
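The telephone-survey calculation in the transcript can be reproduced with a few lines of Python. This is only a sketch: the function name `trials_pmf` is made up for illustration, and it uses the transcript's convention (X is the trial number of the rth success, p is the success probability), which differs from the convention used in the rest of this article.

```python
from math import comb

def trials_pmf(x: int, r: int, p: float) -> float:
    """P(X = x), where X is the trial on which the r-th success occurs."""
    if x < r:
        return 0.0  # at least r trials are needed for r successes
    # r-1 successes among the first x-1 trials, then a success on trial x.
    return comb(x - 1, r - 1) * p**r * (1 - p) ** (x - r)

# Third completed survey on the tenth call, with a 9% completion chance.
prob = trials_pmf(10, 3, 0.09)
mean_calls = 3 / 0.09  # mean r/p quoted in the transcript

print(f"{prob:.5f}")        # 0.01356
print(f"{mean_calls:.2f}")  # 33.33
```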

Definition

Suppose there is a sequence of independent Bernoulli trials. Thus, each trial has two potential outcomes called "success" and "failure". In each trial the probability of success is p and of failure is (1 − p). We are observing this sequence until a predefined number r of failures has occurred. Then the random number of successes we have seen, X, will have the negative binomial (or Pascal) distribution:

$X\sim \operatorname {NB} (r,p)$ When applied to real-world problems, outcomes of success and failure may or may not be outcomes we ordinarily view as good and bad, respectively. Suppose we used the negative binomial distribution to model the number of days a certain machine works before it breaks down. In this case "success" would be the result on a day when the machine worked properly, whereas a breakdown would be a "failure". If we used the negative binomial distribution to model the number of goal attempts an athlete makes before scoring r goals, though, then each unsuccessful attempt would be a "success", and scoring a goal would be "failure". If we are tossing a coin, then the negative binomial distribution can give the number of heads ("success") we are likely to encounter before we encounter a certain number of tails ("failure"). In the probability mass function below, p is the probability of success, and (1 − p) is the probability of failure.

The Excel function negbinomdist reverses the terms "success" and "failure".

Probability mass function

The probability mass function of the negative binomial distribution is

$f(k;r,p)\equiv \Pr(X=k)={\binom {k+r-1}{k}}p^{k}(1-p)^{r}$ where r is the number of failures, k is the number of successes, and p is the probability of success. Here the quantity in parentheses is the binomial coefficient, and is equal to

${\binom {k+r-1}{k}}={\frac {(k+r-1)!}{(r-1)!\,(k)!}}={\frac {(k+r-1)(k+r-2)\dotsm (r)}{(k)!}}.$ This quantity can alternatively be written in the following manner, explaining the name "negative binomial":

{\begin{aligned}&{\frac {(k+r-1)\dotsm (r)}{(k)!}}\\[6pt]={}&(-1)^{k}{\frac {(-r)(-r-1)(-r-2)\dotsm (-r-k+1)}{(k)!}}=(-1)^{k}{\binom {-r}{k}}.\end{aligned}} Note that by the last expression and the binomial series, for every 0 ≤ p < 1,

$(1-p)^{-r}=\sum _{k=0}^{\infty }{\binom {-r}{k}}(-p)^{k}=\sum _{k=0}^{\infty }{\binom {k+r-1}{k}}p^{k},$ hence the terms of the probability mass function indeed add up to one.

To understand the above definition of the probability mass function, note that the probability for every specific sequence of k successes and r failures is $(1-p)^{r}p^{k}$, because the outcomes of the k + r trials are supposed to happen independently. Since the rth failure always comes last, it remains to choose the k trials with successes out of the remaining k + r − 1 trials. The above binomial coefficient, due to its combinatorial interpretation, gives precisely the number of all these sequences of length k + r − 1.
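As a rough illustration of the probability mass function above, the following sketch evaluates it for the die example (a 1 counts as a failure, r = 3) and checks numerically that the probabilities add up to one. The function name `nb_pmf` and the truncation point of 500 terms are assumptions made for this example only.

```python
from math import comb

def nb_pmf(k: int, r: int, p: float) -> float:
    """P(X = k): k successes (probability p each) before the r-th failure."""
    return comb(k + r - 1, k) * p**k * (1 - p) ** r

r, p = 3, 5 / 6  # die example: success = any face other than 1

# The support is infinite; 500 terms leave a negligible tail for these values.
total = sum(nb_pmf(k, r, p) for k in range(500))
print(total)

# Probability of seeing no successes at all before the third failure.
print(nb_pmf(0, r, p))  # equals (1/6)**3
```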

Expectation

The expected number of successes in a negative binomial distribution with parameters (r, p) is rp/(1 − p). To see this intuitively, imagine that an experiment simulating the negative binomial is performed many times. That is, a set of trials is performed until r failures are obtained, then another set of trials, and then another, and so on. Write down the number of trials performed in each experiment: a, b, c, … and set a + b + c + … = N. Now we would expect about N(1 − p) failures in total. Say the experiment was performed n times. Then there are nr failures in total. So we would expect nr = N(1 − p), and hence N/n = r/(1 − p). Note that N/n is just the average number of trials per experiment; that is what we mean by "expectation". The average number of successes per experiment is N/n − r, which therefore has expected value r/(1 − p) − r = rp/(1 − p). This agrees with the mean listed for NB(r, p) at the start of this article.
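The repeated-experiment argument can be imitated directly by simulation. The sketch below is illustrative only: the helper name `simulate_successes`, the seed, and the parameter values are arbitrary choices. For r = 3 and p = 0.6 one would expect about rp/(1 − p) = 4.5 successes and r/(1 − p) = 7.5 trials per experiment.

```python
import random

def simulate_successes(r: int, p: float, rng: random.Random) -> int:
    """Run Bernoulli trials until the r-th failure; return the success count."""
    successes = failures = 0
    while failures < r:
        if rng.random() < p:
            successes += 1
        else:
            failures += 1
    return successes

rng = random.Random(0)  # fixed seed so the run is repeatable
r, p, n = 3, 0.6, 100_000

counts = [simulate_successes(r, p, rng) for _ in range(n)]
avg_successes = sum(counts) / n  # should be close to r*p/(1-p)
avg_trials = avg_successes + r   # should be close to r/(1-p)

print(avg_successes, r * p / (1 - p))
print(avg_trials, r / (1 - p))
```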

Variance

When counting the number of successes given the number r of failures, the variance is rp/(1 − p)². When counting the number of failures before the r-th success, the variance is r(1 − p)/p².
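Both moments can be checked numerically by summing over a truncated support. In this sketch the name `nb_pmf`, the parameter values, and the cut-off of 400 terms are assumptions for illustration; the second convention (failures before the r-th success) is obtained simply by swapping the roles of p and 1 − p.

```python
from math import comb

def nb_pmf(k: int, r: int, p: float) -> float:
    """P(X = k): k successes (probability p each) before the r-th failure."""
    return comb(k + r - 1, k) * p**k * (1 - p) ** r

r, p = 4, 0.3
ks = range(400)  # truncated support; the tail is negligible here

mean = sum(k * nb_pmf(k, r, p) for k in ks)
var = sum((k - mean) ** 2 * nb_pmf(k, r, p) for k in ks)
print(mean, r * p / (1 - p))      # successes given r failures
print(var, r * p / (1 - p) ** 2)

# Failures before the r-th success: same formula with p and 1-p swapped.
mean_f = sum(k * nb_pmf(k, r, 1 - p) for k in ks)
var_f = sum((k - mean_f) ** 2 * nb_pmf(k, r, 1 - p) for k in ks)
print(var_f, r * (1 - p) / p**2)
```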

Alternative formulations

Some sources may define the negative binomial distribution slightly differently from the primary one here. The most common variations are where the random variable X is counting different things. These variations can be seen in the table here:

| | X is counting... | Probability mass function | Formula | Alternate formula (using equivalent binomial) | Alternate formula (simplified using ${\textstyle n=k+r}$ ) | Support |
|---|---|---|---|---|---|---|
| 1 | k successes, given r failures | ${\textstyle f(k;r,p)\equiv \Pr(X=k)=}$ | ${\textstyle {\binom {k+r-1}{k}}p^{k}(1-p)^{r}}$ | ${\textstyle {\binom {k+r-1}{r-1}}p^{k}(1-p)^{r}}$ | ${\textstyle {\binom {n-1}{k}}p^{k}(1-p)^{r}}$ | ${\text{for }}k=0,1,2,\dotsc$ |
| 2 | n trials, given r failures | ${\textstyle f(n;r,p)\equiv \Pr(X=n)=}$ | ${\textstyle {\binom {n-1}{r-1}}p^{n-r}(1-p)^{r}}$ | ${\textstyle {\binom {n-1}{n-r}}p^{n-r}(1-p)^{r}}$ | | ${\text{for }}n=r,r+1,r+2,\dotsc$ |
| 3 | r failures, given k successes | ${\textstyle f(r;k,p)\equiv \Pr(X=r)=}$ | ${\textstyle {\binom {k+r-1}{r}}p^{k}(1-p)^{r}}$ | ${\textstyle {\binom {k+r-1}{k-1}}p^{k}(1-p)^{r}}$ | ${\textstyle {\binom {n-1}{r}}p^{k}(1-p)^{r}}$ | ${\text{for }}r=0,1,2,\dotsc$ |
| 4 | n trials, given k successes | ${\textstyle f(n;k,p)\equiv \Pr(X=n)=}$ | ${\textstyle {\binom {n-1}{k-1}}p^{k}(1-p)^{n-k}}$ | ${\textstyle {\binom {n-1}{n-k}}p^{k}(1-p)^{n-k}}$ | | ${\text{for }}n=k,k+1,k+2,\dotsc$ |
| 5 | k successes, given n trials | ${\textstyle f(k;n,p)\equiv \Pr(X=k)=}$ | This is the binomial distribution: ${\textstyle {\binom {n}{k}}p^{k}(1-p)^{r}}$ | | | ${\text{for }}k=0,1,2,\dotsc ,n$ |

Each of these definitions of the negative binomial distribution can be expressed in slightly different but equivalent ways. The first alternative formulation is simply an equivalent form of the binomial coefficient, that is: ${\textstyle {\binom {a}{b}}={\binom {a}{a-b}}\quad {\text{for }}\ 0\leq b\leq a}$ . The second alternate formulation somewhat simplifies the expression by recognizing that the total number of trials is simply the number of successes and failures, that is: ${\textstyle n=k+r}$ . These second formulations may be more intuitive to understand, however they are perhaps less practical as they have more terms.

1. This definition is where X is the number k of successes given a set of r failures, and is the primary way the negative binomial distribution is defined in this article. The second alternative formula clearly shows the relationship of the negative binomial distribution to the binomial distribution. The only difference is that in the binomial coefficient of the negative binomial distribution, there are n − 1 trials to choose from (instead of n) when evaluating the number of ways that k successes can occur. This is because when you are evaluating the number of ways you can obtain k successes before you reach r failures, the last trial must be a failure. As such, the other events have one fewer positions available when counting possible orderings.
2. The second definition is where X is the total number n of trials needed to get r failures. Since the total number of trials is equal to the number of successes plus the number of failures, the formulation is the same. The only difference is that the distribution is shifted by r. As such, the mean, the median, and the mode are also shifted by r.
3. The definition where X is the number of r failures that occur for a given number of k successes. This definition is very similar to the primary definition used in this article, only that k successes and r failures are switched when considering what is being counted and what is given. Note however, that p still refers to the probability of "success".
4. The definition where X is the number of n trials that occur for a given number of k successes. This definition is very similar to definition #2, only that k successes is given instead of r failures. Note however, that p still refers to the probability of "success".
• The definition of the negative binomial distribution can be extended to the case where the parameter r can take on a positive real value. Although it is impossible to visualize a non-integer number of "failures", we can still formally define the distribution through its probability mass function. The problem of extending the definition to real-valued (positive) r boils down to extending the binomial coefficient to its real-valued counterpart, based on the gamma function:
${\binom {k+r-1}{k}}={\frac {(k+r-1)(k+r-2)\dotsm (r)}{k!}}={\frac {\Gamma (k+r)}{k!\,\Gamma (r)}}$ Now, after substituting this expression in the original definition, we say that X has a negative binomial (or Pólya) distribution if it has a probability mass function:
$f(k;r,p)\equiv \Pr(X=k)={\frac {\Gamma (k+r)}{k!\,\Gamma (r)}}p^{k}(1-p)^{r}\quad {\text{for }}k=0,1,2,\dotsc$ Here r is a real, positive number.

In negative binomial regression, the distribution is specified in terms of its mean, ${\textstyle m={\frac {pr}{1-p}}}$ , which is then related to explanatory variables as in linear regression or other generalized linear models. From the expression for the mean m, one can derive ${\textstyle p={\frac {m}{m+r}}}$ and ${\textstyle 1-p={\frac {r}{m+r}}}$ . Then, substituting these expressions in the one for the probability mass function when r is real-valued, yields this parametrization of the probability mass function in terms of m:

$\Pr(X=k)={\frac {\Gamma (r+k)}{k!\,\Gamma (r)}}\left({\frac {r}{r+m}}\right)^{r}\left({\frac {m}{r+m}}\right)^{k}\quad {\text{for }}k=0,1,2,\dotsc$ The variance can then be written as ${\textstyle m+{\frac {m^{2}}{r}}}$ . Some authors prefer to set ${\textstyle \alpha ={\frac {1}{r}}}$ , and express the variance as ${\textstyle m+\alpha m^{2}}$ . In this context, and depending on the author, either the parameter r or its reciprocal α is referred to as the "dispersion parameter", "shape parameter" or "clustering coefficient", or the "heterogeneity" or "aggregation" parameter. The term "aggregation" is particularly used in ecology when describing counts of individual organisms. Decrease of the aggregation parameter r towards zero corresponds to increasing aggregation of the organisms; increase of r towards infinity corresponds to absence of aggregation, as can be described by Poisson regression.
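The mean-based parametrization above is easy to evaluate with the log-gamma function, which also handles non-integer r. The following is a sketch under stated assumptions: the function name `nb_pmf_mean`, the parameter values, and the truncation at 1000 terms are illustrative choices, not part of any particular regression package.

```python
from math import exp, lgamma, log

def nb_pmf_mean(k: int, m: float, r: float) -> float:
    """P(X = k) for mean m > 0 and real-valued dispersion parameter r > 0."""
    # Work on the log scale to avoid overflow in the gamma functions.
    log_coeff = lgamma(r + k) - lgamma(k + 1) - lgamma(r)
    return exp(log_coeff + r * log(r / (r + m)) + k * log(m / (r + m)))

m, r = 5.0, 2.5
ks = range(1000)  # truncated support; the tail is negligible here

total = sum(nb_pmf_mean(k, m, r) for k in ks)
mean = sum(k * nb_pmf_mean(k, m, r) for k in ks)
var = sum((k - mean) ** 2 * nb_pmf_mean(k, m, r) for k in ks)

print(total)              # close to 1
print(mean, m)            # mean recovers m
print(var, m + m**2 / r)  # variance m + m^2/r
```

With α = 1/r the last line is the same quantity as m + αm², so smaller r means stronger overdispersion relative to a Poisson distribution with the same mean.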

• Sometimes the distribution is parameterized in terms of its mean μ and variance σ²:
{\begin{aligned}&p={\frac {\sigma ^{2}-\mu }{\sigma ^{2}}},\\[6pt]&r={\frac {\mu ^{2}}{\sigma ^{2}-\mu }},\\[3pt]&\Pr(X=k)={k+{\frac {\mu ^{2}}{\sigma ^{2}-\mu }}-1 \choose k}\left({\frac {\sigma ^{2}-\mu }{\sigma ^{2}}}\right)^{k}\left({\frac {\mu }{\sigma ^{2}}}\right)^{\mu ^{2}/(\sigma ^{2}-\mu )}.\end{aligned}}

Occurrence

Waiting time in a Bernoulli process

For the special case where r is an integer, the negative binomial distribution is known as the Pascal distribution. It is the probability distribution of a certain number of failures and successes in a series of independent and identically distributed Bernoulli trials. For k + r Bernoulli trials with success probability p, the negative binomial gives the probability of k successes and r failures, with a failure on the last trial. In other words, the negative binomial distribution is the probability distribution of the number of successes before the rth failure in a Bernoulli process, with probability p of successes on each trial. A Bernoulli process is a discrete time process, and so the number of trials, failures, and successes are integers.

Consider the following example. Suppose we repeatedly throw a die, and consider a 1 to be a "failure". The probability of success on each trial is 5/6. The number of successes before the third failure belongs to the infinite set { 0, 1, 2, 3, ... }. That number of successes is a negative-binomially distributed random variable.

When r = 1 we get the probability distribution of number of successes before the first failure (i.e. the probability of the first failure occurring on the (k + 1)st trial), which is a geometric distribution:

$f(k;r,p)=(1-p)\cdot p^{k}\!$

Overdispersed Poisson

The negative binomial distribution, especially in its alternative parameterization described above, can be used as an alternative to the Poisson distribution. It is especially useful for discrete data over an unbounded positive range whose sample variance exceeds the sample mean. In such cases, the observations are overdispersed with respect to a Poisson distribution, for which the mean is equal to the variance. Hence a Poisson distribution is not an appropriate model. Since the negative binomial distribution has one more parameter than the Poisson, the second parameter can be used to adjust the variance independently of the mean. See Cumulants of some discrete probability distributions.

An application of this is to annual counts of tropical cyclones in the North Atlantic or to monthly to 6-monthly counts of wintertime extratropical cyclones over Europe, for which the variance is greater than the mean. In the case of modest overdispersion, this may produce substantially similar results to an overdispersed Poisson distribution.

The negative binomial distribution is also commonly used to model data in the form of discrete sequence read counts from high-throughput RNA and DNA sequencing experiments.

Related distributions

• The geometric distribution (on { 0, 1, 2, 3, ... }) is a special case of the negative binomial distribution, with
$\operatorname {Geom} (p)=\operatorname {NB} (1,\,1-p).\,$

Poisson distribution

Consider a sequence of negative binomial random variables where the stopping parameter r goes to infinity, whereas the probability of success in each trial, p, goes to zero in such a way as to keep the mean of the distribution constant. Denoting this mean as λ, the parameter p will be p = λ/(r + λ), because

$\lambda =r\,{\frac {p}{1-p}}\quad \Rightarrow \quad p={\frac {\lambda }{r+\lambda }}.$ Under this parametrization the probability mass function will be

$f(k;r,p)={\frac {\Gamma (k+r)}{k!\cdot \Gamma (r)}}p^{k}(1-p)^{r}={\frac {\lambda ^{k}}{k!}}\cdot {\frac {\Gamma (r+k)}{\Gamma (r)\;(r+\lambda )^{k}}}\cdot {\frac {1}{\left(1+{\frac {\lambda }{r}}\right)^{r}}}$ Now if we consider the limit as r → ∞, the second factor will converge to one, and the third to the exponential function:

$\lim _{r\to \infty }f(k;r,p)={\frac {\lambda ^{k}}{k!}}\cdot 1\cdot {\frac {1}{e^{\lambda }}},$ which is the mass function of a Poisson-distributed random variable with expected value λ.

In other words, the alternatively parameterized negative binomial distribution converges to the Poisson distribution and r controls the deviation from the Poisson. This makes the negative binomial distribution suitable as a robust alternative to the Poisson, which approaches the Poisson for large r, but which has larger variance than the Poisson for small r.
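The convergence can be observed numerically. The sketch below is illustrative only: the helper names, λ = 4, and the comparison range k < 40 are arbitrary choices. It compares the negative binomial probabilities with p = λ/(r + λ) against the Poisson probabilities for increasing r and records the largest pointwise difference.

```python
from math import exp, factorial, lgamma, log

def nb_pmf(k: int, r: float, p: float) -> float:
    """P(X = k) with real-valued r, using the gamma-function form."""
    log_coeff = lgamma(k + r) - lgamma(k + 1) - lgamma(r)
    return exp(log_coeff + k * log(p) + r * log(1 - p))

def poisson_pmf(k: int, lam: float) -> float:
    return lam**k * exp(-lam) / factorial(k)

lam = 4.0

def max_gap(r: float) -> float:
    """Largest pointwise difference between NB(r, lam/(r+lam)) and Poisson(lam)."""
    p = lam / (r + lam)
    return max(abs(nb_pmf(k, r, p) - poisson_pmf(k, lam)) for k in range(40))

gaps = [max_gap(r) for r in (1, 10, 100, 1000, 10000)]
for r, gap in zip((1, 10, 100, 1000, 10000), gaps):
    print(r, gap)  # the gap shrinks as r grows
```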

$\operatorname {Poisson} (\lambda )=\lim _{r\to \infty }\operatorname {NB} \left(r,{\frac {\lambda }{r+\lambda }}\right).$

Gamma–Poisson mixture

The negative binomial distribution also arises as a continuous mixture of Poisson distributions (i.e. a compound probability distribution) where the mixing distribution of the Poisson rate is a gamma distribution. That is, we can view the negative binomial as a Poisson(λ) distribution, where λ is itself a random variable, distributed as a gamma distribution with shape = r and scale θ = p/(1 − p) or correspondingly rate β = (1 − p)/p.

To display the intuition behind this statement, consider two independent Poisson processes, “Success” and “Failure”, with intensities p and 1 − p. Together, the Success and Failure processes are equivalent to a single Poisson process of intensity 1, where an occurrence of the process is a success if a corresponding independent coin toss comes up heads with probability p; otherwise, it is a failure. If r is a counting number, the coin tosses show that the count of successes before the rth failure follows a negative binomial distribution with parameters r and p. The count is also, however, the count of the Success Poisson process at the random time T of the rth occurrence in the Failure Poisson process. The Success count follows a Poisson distribution with mean pT, where T is the waiting time for r occurrences in a Poisson process of intensity 1 − p, i.e., T is gamma-distributed with shape parameter r and intensity 1 − p. Thus, the negative binomial distribution is equivalent to a Poisson distribution with mean pT, where the random variate T is gamma-distributed with shape parameter r and intensity 1 − p. The preceding paragraph follows, because λ = pT is gamma-distributed with shape parameter r and intensity (1 − p)/p.

The following formal derivation (which does not depend on r being a counting number) confirms the intuition.

{\begin{aligned}f(k;r,p)&=\int _{0}^{\infty }f_{\operatorname {Poisson} (\lambda )}(k)\cdot f_{\operatorname {Gamma} \left(r,\,{\frac {1-p}{p}}\right)}(\lambda )\;\mathrm {d} \lambda \\[8pt]&=\int _{0}^{\infty }{\frac {\lambda ^{k}}{k!}}e^{-\lambda }\cdot \lambda ^{r-1}{\frac {e^{-\lambda (1-p)/p}}{{\big (}{\frac {p}{1-p}}{\big )}^{r}\,\Gamma (r)}}\;\mathrm {d} \lambda \\[8pt]&={\frac {(1-p)^{r}p^{-r}}{k!\,\Gamma (r)}}\int _{0}^{\infty }\lambda ^{r+k-1}e^{-\lambda /p}\;\mathrm {d} \lambda \\[8pt]&={\frac {(1-p)^{r}p^{-r}}{k!\,\Gamma (r)}}\ p^{r+k}\,\Gamma (r+k)\\[8pt]&={\frac {\Gamma (r+k)}{k!\;\Gamma (r)}}\;p^{k}(1-p)^{r}.\end{aligned}} Because of this, the negative binomial distribution is also known as the gamma–Poisson (mixture) distribution.

Note: The negative binomial distribution was originally derived as a limiting case of the gamma-Poisson distribution.
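The mixture integral can also be checked numerically. The following sketch approximates the integral of the Poisson probability against a gamma density with shape r and rate (1 − p)/p by a simple midpoint rule and compares it with the closed form. The function names, the integration limit of 80, the step count, and the parameter values are all assumptions chosen for illustration.

```python
from math import exp, lgamma, log

def nb_pmf(k: int, r: float, p: float) -> float:
    log_coeff = lgamma(k + r) - lgamma(k + 1) - lgamma(r)
    return exp(log_coeff + k * log(p) + r * log(1 - p))

def mixture_pmf(k: int, r: float, p: float,
                upper: float = 80.0, steps: int = 40_000) -> float:
    """Midpoint-rule approximation of the Poisson-gamma mixture integral."""
    rate = (1 - p) / p
    h = upper / steps
    total = 0.0
    for i in range(steps):
        lam = (i + 0.5) * h
        log_poisson = k * log(lam) - lam - lgamma(k + 1)
        log_gamma_pdf = (r * log(rate) + (r - 1) * log(lam)
                         - rate * lam - lgamma(r))
        total += exp(log_poisson + log_gamma_pdf)
    return total * h

r, p = 2.5, 0.4  # r need not be an integer
approx = mixture_pmf(3, r, p)
exact = nb_pmf(3, r, p)
print(approx, exact)
```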

Distribution of a sum of geometrically distributed random variables

If $Y_{r}$ is a random variable following the negative binomial distribution with parameters r and p, and support {0, 1, 2, ...}, then $Y_{r}$ is a sum of r independent variables following the geometric distribution (on {0, 1, 2, ...}) with parameter 1 − p. As a result of the central limit theorem, $Y_{r}$ (properly scaled and shifted) is therefore approximately normal for sufficiently large r.
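The sum-of-geometrics statement can be verified by convolving the geometric probability mass function with itself r times. This is a sketch with made-up helper names and arbitrary parameter values; the lists are truncated at K terms, which does not affect the first K entries of a convolution of distributions on {0, 1, 2, ...}.

```python
from math import comb

def nb_pmf(k: int, r: int, p: float) -> float:
    return comb(k + r - 1, k) * p**k * (1 - p) ** r

def geometric_pmf(j: int, p: float) -> float:
    """Successes (probability p each) before the first failure."""
    return (1 - p) * p**j

def convolve(a: list, b: list) -> list:
    """First len(a) terms of the distribution of a sum of two independent counts."""
    return [sum(a[i] * b[j - i] for i in range(j + 1)) for j in range(len(a))]

r, p, K = 4, 0.35, 30
geom = [geometric_pmf(j, p) for j in range(K)]

dist = geom
for _ in range(r - 1):  # add r - 1 further geometric variables
    dist = convolve(dist, geom)

max_diff = max(abs(dist[k] - nb_pmf(k, r, p)) for k in range(K))
print(max_diff)  # differences are at floating-point rounding level
```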

Furthermore, if $B_{s+r}$ is a random variable following the binomial distribution with parameters s + r and 1 − p, then

{\begin{aligned}\Pr(Y_{r}\leq s)&{}=1-I_{p}(s+1,r)\\[5pt]&{}=1-I_{p}((s+r)-(r-1),(r-1)+1)\\[5pt]&{}=1-\Pr(B_{s+r}\leq r-1)\\[5pt]&{}=\Pr(B_{s+r}\geq r)\\[5pt]&{}=\Pr({\text{after }}s+r{\text{ trials, there are at least }}r{\text{ successes}}).\end{aligned}} In this sense, the negative binomial distribution is the "inverse" of the binomial distribution.
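The identity between the negative binomial cumulative probability and the binomial tail can be checked by direct summation, without evaluating the incomplete beta function. The helper names and parameter values below are assumptions for illustration; note that the binomial variable counts events of probability 1 − p.

```python
from math import comb

def nb_cdf(s: int, r: int, p: float) -> float:
    """P(Y_r <= s): at most s successes before the r-th failure."""
    return sum(comb(k + r - 1, k) * p**k * (1 - p) ** r for k in range(s + 1))

def binom_tail(n: int, q: float, lo: int) -> float:
    """P(B >= lo) for B ~ Binomial(n, q)."""
    return sum(comb(n, j) * q**j * (1 - q) ** (n - j) for j in range(lo, n + 1))

r, p, s = 3, 0.6, 5
lhs = nb_cdf(s, r, p)
rhs = binom_tail(s + r, 1 - p, r)  # binomial with parameters s + r and 1 - p
print(lhs, rhs)
```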

The sum of two independent negative-binomially distributed random variables with parameters $r_{1}$ and $r_{2}$ and the same value for parameter p is negative-binomially distributed with the same p but with r-value $r_{1}+r_{2}$.

The negative binomial distribution is infinitely divisible, i.e., if Y has a negative binomial distribution, then for any positive integer n, there exist independent identically distributed random variables $Y_{1},\dots ,Y_{n}$ whose sum has the same distribution that Y has.

Representation as compound Poisson distribution

The negative binomial distribution NB(r,p) can be represented as a compound Poisson distribution: Let $\{Y_{n},\,n\in \mathbb {N} _{0}\}$ denote a sequence of independent and identically distributed random variables, each one having the logarithmic distribution Log(p), with probability mass function

$f(k;r,p)={\frac {-p^{k}}{k\ln(1-p)}},\qquad k\in {\mathbb {N} }.$ Let N be a random variable, independent of the sequence, and suppose that N has a Poisson distribution with mean λ = −r ln(1 − p). Then the random sum

$X=\sum _{n=1}^{N}Y_{n}$ is NB(r,p)-distributed. To prove this, we calculate the probability generating function $G_{X}$ of X, which is the composition of the probability generating functions $G_{N}$ and $G_{Y_{1}}$. Using

$G_{N}(z)=\exp(\lambda (z-1)),\qquad z\in \mathbb {R} ,$ and

$G_{Y_{1}}(z)={\frac {\ln(1-pz)}{\ln(1-p)}},\qquad |z|<{\frac {1}{p}},$ we obtain

{\begin{aligned}G_{X}(z)&=G_{N}(G_{Y_{1}}(z))\\[4pt]&=\exp {\biggl (}\lambda {\biggl (}{\frac {\ln(1-pz)}{\ln(1-p)}}-1{\biggr )}{\biggr )}\\[4pt]&=\exp {\bigl (}-r(\ln(1-pz)-\ln(1-p)){\bigr )}\\[4pt]&={\biggl (}{\frac {1-p}{1-pz}}{\biggr )}^{r},\qquad |z|<{\frac {1}{p}},\end{aligned}} which is the probability generating function of the NB(r,p) distribution.
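The composition of generating functions can be spot-checked numerically at a few points inside |z| < 1/p. The function names and the chosen values of r, p and z in this sketch are illustrative assumptions.

```python
from math import exp, log

def pgf_poisson(z: float, lam: float) -> float:
    return exp(lam * (z - 1))

def pgf_log(z: float, p: float) -> float:
    """PGF of the logarithmic distribution Log(p), valid for |z| < 1/p."""
    return log(1 - p * z) / log(1 - p)

def pgf_nb(z: float, r: float, p: float) -> float:
    return ((1 - p) / (1 - p * z)) ** r

r, p = 2.5, 0.4
lam = -r * log(1 - p)  # Poisson mean used in the compound representation

zs = [-1.0, 0.0, 0.5, 1.0, 2.0]  # all satisfy |z| < 1/p = 2.5
diffs = [abs(pgf_poisson(pgf_log(z, p), lam) - pgf_nb(z, r, p)) for z in zs]
print(max(diffs))  # agreement up to floating-point rounding
```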

The following table describes four distributions related to the number of successes in a sequence of draws:

| | With replacements | No replacements |
|---|---|---|
| Given number of draws | binomial distribution | hypergeometric distribution |
| Given number of failures | negative binomial distribution | negative hypergeometric distribution |

Properties

Cumulative distribution function

The cumulative distribution function can be expressed in terms of the regularized incomplete beta function:

$F(k;r,p)\equiv \Pr(X\leq k)=1-I_{p}(k+1,r)=I_{1-p}(r,k+1).$

Sampling and point estimation of p

Suppose p is unknown and an experiment is conducted where it is decided ahead of time that sampling will continue until r successes are found. A sufficient statistic for the experiment is k, the number of failures.

In estimating p, the minimum variance unbiased estimator is

${\widehat {p}}={\frac {r-1}{r+k-1}}.$ The maximum likelihood estimate of p is

${\widetilde {p}}={\frac {r}{r+k}},$ but this is a biased estimate. Its inverse, (r + k)/r, is an unbiased estimate of 1/p, however.
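The bias statements can be checked by computing the exact expectations of both estimators. This sketch follows the convention of this subsection (sampling stops at the r-th success, k counts failures, p is the success probability); the function name `failures_pmf`, the parameter values, and the truncation at 600 terms are assumptions for illustration.

```python
from math import comb

def failures_pmf(k: int, r: int, p: float) -> float:
    """P(K = k): k failures before the r-th success (success probability p)."""
    return comb(k + r - 1, k) * p**r * (1 - p) ** k

r, p = 5, 0.3
ks = range(600)  # truncated support; the tail is negligible here

mvue_mean = sum((r - 1) / (r + k - 1) * failures_pmf(k, r, p) for k in ks)
mle_mean = sum(r / (r + k) * failures_pmf(k, r, p) for k in ks)
inv_mle_mean = sum((r + k) / r * failures_pmf(k, r, p) for k in ks)

print(mvue_mean, p)         # unbiased: expectation equals p
print(mle_mean, p)          # biased: expectation exceeds p
print(inv_mle_mean, 1 / p)  # unbiased for 1/p
```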

Relation to the binomial theorem

Suppose Y is a random variable with a binomial distribution with parameters n and p. Assume p + q = 1, with p, q ≥ 0. Then the binomial theorem implies that

$1=1^{n}=(p+q)^{n}=\sum _{k=0}^{n}{n \choose k}p^{k}q^{n-k}.$ Using Newton's binomial theorem, this can equally be written as:

$(p+q)^{n}=\sum _{k=0}^{\infty }{n \choose k}p^{k}q^{n-k},$ in which the upper bound of summation is infinite. In this case, the binomial coefficient

${n \choose k}={n(n-1)(n-2)\cdots (n-k+1) \over k!}$ is defined when n is a real number, instead of just a positive integer. But in our case of the binomial distribution it is zero when k > n. We can then say, for example

$(p+q)^{8.3}=\sum _{k=0}^{\infty }{8.3 \choose k}p^{k}q^{8.3-k}.$ Now suppose r > 0 and we use a negative exponent:

$1=p^{r}\cdot p^{-r}=p^{r}(1-q)^{-r}=p^{r}\sum _{k=0}^{\infty }{-r \choose k}(-q)^{k}.$ Then all of the terms are positive, and the term

$p^{r}{-r \choose k}(-q)^{k}$ is just the probability that the number of failures before the rth success is equal to k, provided r is an integer. (If r is a negative non-integer, so that the exponent is a positive non-integer, then some of the terms in the sum above are negative, so we do not have a probability distribution on the set of all nonnegative integers.)

Now we also allow non-integer values of r. This gives a proper negative binomial distribution, a generalization of the Pascal distribution that coincides with it when r happens to be a positive integer.
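The generalized binomial coefficient and the resulting terms are easy to evaluate numerically. The sketch below is only an illustration: the helper names `gen_binom` and `nb_term` and the values of p and r are assumptions. It checks that the terms reduce to the familiar form with an ordinary binomial coefficient when r is a positive integer, and that they remain positive and sum to 1 for a non-integer r > 0.

```python
from math import comb, isclose

def gen_binom(a, k):
    """Generalized binomial coefficient a(a-1)...(a-k+1)/k! for real a, integer k >= 0."""
    out = 1.0
    for j in range(k):
        out *= (a - j) / (j + 1)
    return out

def nb_term(k, r, p):
    """The term p^r * C(-r, k) * (-q)^k, with q = 1 - p."""
    q = 1 - p
    return p**r * gen_binom(-r, k) * (-q)**k

p = 0.4  # illustrative value

# Integer r: the term equals C(k + r - 1, k) p^r q^k.
for k in range(20):
    assert isclose(nb_term(k, 3, p), comb(k + 2, k) * p**3 * (1 - p)**k)

# Non-integer r > 0: terms are still positive and sum to 1 (sum truncated at 400 terms).
terms = [nb_term(k, 2.5, p) for k in range(400)]
assert all(t > 0 for t in terms)
assert isclose(sum(terms), 1.0)

# The (p + q)^8.3 expansion from the text, with p = 0.3 and q = 0.7 (truncated series).
series = sum(gen_binom(8.3, k) * 0.3**k * 0.7**(8.3 - k) for k in range(200))
assert isclose(series, 1.0)
```

The signs cancel because C(−r, k) carries a factor (−1)^k, which is matched by the (−q)^k factor.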

Recall from above that

The sum of two independent negative-binomially distributed random variables with r-values r1 and r2 and the same value of the parameter p is negative-binomially distributed with the same p but with r-value r1 + r2.

This property persists when the definition is thus generalized, and affords a quick way to see that the negative binomial distribution is infinitely divisible.
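The additivity claim for real-valued r can be illustrated by convolving two mass functions directly. This is a sketch under stated assumptions: the mass function is written with gamma functions, Γ(k + r) / (k! Γ(r)) · p^r (1 − p)^k, which equals the term above for r > 0, and the names `nb_pmf` and `convolve_at` as well as the parameter values are hypothetical.

```python
from math import exp, isclose, lgamma, log

def nb_pmf(k, r, p):
    """P(K = k) for real r > 0: Gamma(k + r) / (k! Gamma(r)) * p^r * (1 - p)^k."""
    return exp(lgamma(k + r) - lgamma(k + 1) - lgamma(r)
               + r * log(p) + k * log(1 - p))

def convolve_at(k, r1, r2, p):
    """P(X1 + X2 = k) for independent X1 ~ NB(r1, p) and X2 ~ NB(r2, p)."""
    return sum(nb_pmf(j, r1, p) * nb_pmf(k - j, r2, p) for j in range(k + 1))

# Non-integer r-values (illustrative): NB(0.7, p) + NB(1.8, p) should match NB(2.5, p).
p = 0.4
for k in range(30):
    assert isclose(convolve_at(k, 0.7, 1.8, p), nb_pmf(k, 2.5, p), rel_tol=1e-9)
```

Applying the same identity repeatedly shows that NB(r, p) is the distribution of a sum of n independent NB(r/n, p) variables for every n, which is what infinite divisibility requires.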

Recurrence relation

The following recurrence relation holds:

${\begin{cases}(k+1)\Pr(k+1)-p\Pr(k)(k+r)=0,\\[5pt]\Pr(0)=(1-p)^{r}\end{cases}}$

Parameter estimation

Maximum likelihood estimation

The maximum likelihood estimator only exists for samples for which the sample variance is larger than the sample mean. The likelihood function for N iid observations (k1, ..., kN) is

$L(r,p)=\prod _{i=1}^{N}f(k_{i};r,p)$ from which we calculate the log-likelihood function

$\ell (r,p)=\sum _{i=1}^{N}\ln {(\Gamma (k_{i}+r))}-\sum _{i=1}^{N}\ln(k_{i}!)-N\ln {(\Gamma (r))}+\sum _{i=1}^{N}k_{i}\ln {(p)}+Nr\ln(1-p).$ To find the maximum we take the partial derivatives with respect to r and p and set them equal to zero:

${\frac {\partial \ell (r,p)}{\partial p}}=\left[\sum _{i=1}^{N}k_{i}{\frac {1}{p}}\right]-Nr{\frac {1}{1-p}}=0$ and
${\frac {\partial \ell (r,p)}{\partial r}}=\left[\sum _{i=1}^{N}\psi (k_{i}+r)\right]-N\psi (r)+N\ln {(1-p)}=0$ where

$\psi (k)={\frac {\Gamma '(k)}{\Gamma (k)}}$ is the digamma function.

Solving the first equation for p gives:

$p={\frac {\sum _{i=1}^{N}k_{i}}{Nr+\sum _{i=1}^{N}k_{i}}}$ Substituting this in the second equation gives:

${\frac {\partial \ell (r,p)}{\partial r}}=\left[\sum _{i=1}^{N}\psi (k_{i}+r)\right]-N\psi (r)+N\ln {\left({\frac {r}{r+\sum _{i=1}^{N}k_{i}/N}}\right)}=0$ This equation cannot be solved for r in closed form. If a numerical solution is desired, an iterative technique such as Newton's method can be used. Alternatively, the expectation–maximization algorithm can be used.
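A numerical solution can be sketched with the standard library alone. The example below is an illustration rather than a reference implementation: it uses bisection instead of Newton's method for simplicity, and it avoids a digamma routine by relying on the identity ψ(k + r) − ψ(r) = 1/r + 1/(r + 1) + ... + 1/(r + k − 1) for integer k. The function names, the data, and the choice to compute the sample variance with divisor N are all assumptions.

```python
import math

def score_r(r, ks):
    """Left-hand side of the equation for r after substituting the solution for p.

    Uses psi(k + r) - psi(r) = sum_{j=0}^{k-1} 1 / (r + j) for integer k.
    """
    n = len(ks)
    mean = sum(ks) / n
    digamma_part = sum(1.0 / (r + j) for k in ks for j in range(k))
    return digamma_part + n * math.log(r / (r + mean))

def fit_nb(ks, tol=1e-12):
    """Return (r_hat, p_hat) by bisection on score_r (hypothetical helper)."""
    n = len(ks)
    mean = sum(ks) / n
    var = sum((k - mean) ** 2 for k in ks) / n  # divisor N: an assumption
    if var <= mean:
        raise ValueError("sample variance must exceed sample mean")
    lo, hi = 1e-8, 1.0
    while score_r(hi, ks) > 0:      # expand until the root is bracketed
        hi *= 2.0
        if hi > 1e12:
            raise ValueError("no sign change found")
    while hi - lo > tol * hi:       # score_r(lo) > 0 >= score_r(hi) throughout
        mid = 0.5 * (lo + hi)
        if score_r(mid, ks) > 0:
            lo = mid
        else:
            hi = mid
    r_hat = 0.5 * (lo + hi)
    p_hat = mean / (r_hat + mean)   # same as sum(k) / (N r + sum(k))
    return r_hat, p_hat

ks = [0, 1, 1, 2, 3, 5, 8, 13]      # made-up counts, for illustration only
r_hat, p_hat = fit_nb(ks)
print(r_hat, p_hat)
```

Bisection is slower than Newton's method but needs no derivative of the score and cannot overshoot into negative r, which makes it a convenient starting point.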

Examples

Selling candy

Pat Collis is required to sell candy bars to raise money for the 6th grade field trip. There are thirty houses in the neighborhood, and Pat is not supposed to return home until five candy bars have been sold. So the child goes door to door, selling candy bars. At each house, there is a 0.4 probability of selling one candy bar and a 0.6 probability of selling nothing.

What's the probability of selling the last candy bar at the nth house?

Selling candy enough times is what defines our stopping criterion (as opposed to failing to sell it), so k in this case represents the number of failures and r represents the number of successes. Recall that the NegBin(r, p) distribution describes the probability of k failures and r successes in k + r Bernoulli(p) trials with success on the last trial. Selling five candy bars means getting five successes. The number of trials (i.e. houses) this takes is therefore k + 5 = n. The random variable we are interested in is the number of houses, so we substitute k = n − 5 into a NegBin(5, 0.4) mass function and obtain the following mass function of the distribution of houses (for n ≥ 5):

$f(n)={(n-5)+5-1 \choose n-5}\;0.4^{5}\;(1-0.4)^{n-5}={n-1 \choose n-5}\;2^{5}\;{\frac {3^{n-5}}{5^{n}}}.$

What's the probability that Pat finishes on the tenth house?

$f(10)=0.1003290624.$

What's the probability that Pat finishes on or before reaching the eighth house?

To finish on or before the eighth house, Pat must finish at the fifth, sixth, seventh, or eighth house. Sum those probabilities:

$f(5)=0.01024$
$f(6)=0.03072$
$f(7)=0.055296$
$f(8)=0.0774144$
$\sum _{j=5}^{8}f(j)=0.17367.$

What's the probability that Pat exhausts all 30 houses in the neighborhood?

This can be expressed as the probability that Pat does not finish on the fifth through the thirtieth house:

$1-\sum _{j=5}^{30}f(j)=1-I_{0.4}(5,30-5+1)\approx 1-0.99849=0.00151.$

Length of hospital stay

Hospital length of stay is an example of real-world data that can be modelled well with a negative binomial distribution.