Negative multinomial distribution

Notation	${\textrm {NM}}(x_{0},\,\mathbf {p} )$
Parameters	$x_{0}>0$ — the number of failures before the experiment is stopped, $\mathbf {p}$ ∈ R^m — m-vector of "success" probabilities, p₀ = 1 − (p₁+…+p_m) — the probability of a "failure".
Support	$x_{i}\in \{0,1,2,\ldots \},1\leq i\leq m$
PMF	$\Gamma \!\left(\sum _{i=0}^{m}{x_{i}}\right){\frac {p_{0}^{x_{0}}}{\Gamma (x_{0})}}\prod _{i=1}^{m}{\frac {p_{i}^{x_{i}}}{x_{i}!}},$ where Γ(x) is the Gamma function.
Mean	${\tfrac {x_{0}}{p_{0}}}\,\mathbf {p}$
Variance	${\tfrac {x_{0}}{p_{0}^{2}}}\,\mathbf {pp} '+{\tfrac {x_{0}}{p_{0}}}\,\operatorname {diag} (\mathbf {p} )$
MGF	${\bigg (}{\frac {p_{0}}{1-\sum _{j=1}^{m}p_{j}e^{t_{j}}}}{\bigg )}^{\!x_{0}}$
CF	${\bigg (}{\frac {p_{0}}{1-\sum _{j=1}^{m}p_{j}e^{it_{j}}}}{\bigg )}^{\!x_{0}}$

In probability theory and statistics, the negative multinomial distribution is a generalization of the negative binomial distribution (NB(x₀, p)) to more than two outcomes.^[1]

As with the univariate negative binomial distribution, if the parameter $x_{0}$ is a positive integer, the negative multinomial distribution has an urn model interpretation. Suppose we have an experiment that generates m+1≥2 possible outcomes, {X₀,...,X_m}, each occurring with non-negative probabilities {p₀,...,p_m} respectively. If sampling proceeded until n observations were made, then {X₀,...,X_m} would have been multinomially distributed. However, if the experiment is stopped once X₀ reaches the predetermined value x₀ (assuming x₀ is a positive integer), then the distribution of the m-tuple {X₁,...,X_m} is negative multinomial. These variables are not multinomially distributed because their sum X₁+...+X_m is not fixed, being a draw from a negative binomial distribution.
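The urn scheme can be sketched as a short simulation. This is only an illustration under stated assumptions: the function name `sample_negative_multinomial`, the parameter values, and the seed are hypothetical choices, and $x_{0}$ is taken to be a positive integer. The simulated sample mean is compared with the theoretical mean $(x_{0}/p_{0})\,\mathbf {p}$.

```python
import random

def sample_negative_multinomial(x0, p, rng):
    """Draw one (X_1, ..., X_m) by running trials until x0 failures occur.

    p holds the m "success" probabilities; the failure probability is
    p0 = 1 - sum(p). Assumes x0 is a positive integer and p0 > 0.
    """
    p0 = 1.0 - sum(p)
    weights = [p0] + list(p)          # index 0 is the "failure" outcome
    counts = [0] * (len(p) + 1)
    while counts[0] < x0:             # stop once X_0 reaches x0
        outcome = rng.choices(range(len(weights)), weights=weights)[0]
        counts[outcome] += 1
    return counts[1:]                 # drop X_0, which always equals x0

# Illustrative parameters (assumptions, not taken from the article).
rng = random.Random(0)
x0, p = 3, [0.2, 0.3]
draws = [sample_negative_multinomial(x0, p, rng) for _ in range(20000)]

# Compare the empirical mean with the theoretical mean (x0 / p0) * p.
sample_mean = [sum(d[i] for d in draws) / len(draws) for i in range(len(p))]
theoretical_mean = [x0 * pi / (1.0 - sum(p)) for pi in p]
```

With these values the theoretical mean is (1.2, 1.8), and the sample mean should agree with it up to simulation noise.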

Properties

Marginal distributions

If the m-dimensional vector $\mathbf {X}$ is partitioned as follows

\mathbf {X} ={\begin{bmatrix}\mathbf {X} ^{(1)}\\\mathbf {X} ^{(2)}\end{bmatrix}}{\text{ with sizes }}{\begin{bmatrix}n\times 1\\(m-n)\times 1\end{bmatrix}}

and ${\boldsymbol {p}}$ is partitioned accordingly,

{\boldsymbol {p}}={\begin{bmatrix}{\boldsymbol {p}}^{(1)}\\{\boldsymbol {p}}^{(2)}\end{bmatrix}}{\text{ with sizes }}{\begin{bmatrix}n\times 1\\(m-n)\times 1\end{bmatrix}}

and let

q=1-\sum _{i}p_{i}^{(2)}=p_{0}+\sum _{i}p_{i}^{(1)}

The marginal distribution of ${\boldsymbol {X}}^{(1)}$ is $\mathrm {NM} (x_{0},{\boldsymbol {p}}^{(1)}/q)$, with failure probability $p_{0}/q$. That is, the marginal distribution is also negative multinomial, with ${\boldsymbol {p}}^{(2)}$ removed and the remaining probabilities (including $p_{0}$) rescaled by $1/q$ so that they sum to one.

The univariate marginal $m=1$ is said to have a negative binomial distribution.
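The marginal property can be checked numerically in a small case. The sketch below is illustrative only: the helper `nm_pmf`, the parameter values, and the truncation point of the infinite sum are assumptions made for the example. It sums the bivariate PMF over $X_{2}$ and compares the result with the PMF of $\mathrm {NM} (x_{0},p_{1}/q)$, where $q=1-p_{2}$.

```python
from math import lgamma, log, exp

def nm_pmf(x, x0, p):
    """PMF of NM(x0, p) at the count vector x, computed on the log scale."""
    p0 = 1.0 - sum(p)
    log_prob = lgamma(x0 + sum(x)) - lgamma(x0) + x0 * log(p0)
    for xi, pi in zip(x, p):
        log_prob += xi * log(pi) - lgamma(xi + 1)
    return exp(log_prob)

# Illustrative parameters; x0 need not be an integer.
x0, p = 2.5, [0.2, 0.3]
q = 1.0 - p[1]                        # q = p0 + p1

# Marginal of X_1 by summing out X_2 (sum truncated at 200 terms,
# beyond which the terms are negligible for these parameters).
marginal_by_summation = [
    sum(nm_pmf([x1, x2], x0, p) for x2 in range(200)) for x1 in range(5)
]

# Marginal of X_1 from the closed form NM(x0, p1 / q).
marginal_closed_form = [nm_pmf([x1], x0, [p[0] / q]) for x1 in range(5)]
```

The two lists should agree to within floating-point error.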

Conditional distributions

The conditional distribution of $\mathbf {X} ^{(1)}$ given $\mathbf {X} ^{(2)}=\mathbf {x} ^{(2)}$ is ${\textstyle \mathrm {NM} (x_{0}+\sum {x_{i}^{(2)}},\mathbf {p} ^{(1)})}$. That is,

\Pr(\mathbf {x} ^{(1)}\mid \mathbf {x} ^{(2)},x_{0},\mathbf {p} )=\Gamma \!\left(\sum _{i=0}^{m}{x_{i}}\right){\frac {(1-\sum _{i=1}^{n}{p_{i}^{(1)}})^{x_{0}+\sum _{i=1}^{m-n}x_{i}^{(2)}}}{\Gamma (x_{0}+\sum _{i=1}^{m-n}x_{i}^{(2)})}}\prod _{i=1}^{n}{\frac {(p_{i}^{(1)})^{x_{i}^{(1)}}}{(x_{i}^{(1)})!}}.
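As a sanity check of the conditional formula, the ratio of the joint PMF to the marginal PMF of $X_{2}$ can be compared with the PMF of $\mathrm {NM} (x_{0}+x_{2},p_{1})$ in the bivariate case. The helper `nm_pmf`, the parameter values, and the truncation of the marginal sum are assumptions made for this sketch.

```python
from math import lgamma, log, exp

def nm_pmf(x, x0, p):
    """PMF of NM(x0, p) at the count vector x, computed on the log scale."""
    p0 = 1.0 - sum(p)
    log_prob = lgamma(x0 + sum(x)) - lgamma(x0) + x0 * log(p0)
    for xi, pi in zip(x, p):
        log_prob += xi * log(pi) - lgamma(xi + 1)
    return exp(log_prob)

# Illustrative parameters and conditioning value.
x0, p = 2.5, [0.2, 0.3]
x2 = 4

# Pr(X_2 = x2), obtained by summing the joint PMF over X_1 (truncated sum).
marginal_x2 = sum(nm_pmf([x1, x2], x0, p) for x1 in range(200))

# Conditional PMF of X_1 given X_2 = x2, as joint / marginal.
conditional_by_ratio = [nm_pmf([x1, x2], x0, p) / marginal_x2 for x1 in range(6)]

# Closed form: NM(x0 + x2, p1), whose failure probability is 1 - p1.
conditional_closed_form = [nm_pmf([x1], x0 + x2, [p[0]]) for x1 in range(6)]
```

Note that the conditional failure probability is $1-p_{1}=p_{0}+p_{2}$, which is what `nm_pmf` uses when passed the single probability `p[0]`.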

Independent sums

If $\mathbf {X} _{1}\sim \mathrm {NM} (r_{1},\mathbf {p} )$ and $\mathbf {X} _{2}\sim \mathrm {NM} (r_{2},\mathbf {p} )$ are independent, then $\mathbf {X} _{1}+\mathbf {X} _{2}\sim \mathrm {NM} (r_{1}+r_{2},\mathbf {p} )$. Conversely, it follows from the form of the characteristic function, which is an $x_{0}$-th power, that the negative multinomial is infinitely divisible.
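The sum property can be illustrated by convolving two PMFs directly. In this sketch the helper `nm_pmf`, the parameter values, and the evaluation point are hypothetical choices; the convolution is a finite sum, so no truncation is involved.

```python
from itertools import product
from math import lgamma, log, exp

def nm_pmf(x, x0, p):
    """PMF of NM(x0, p) at the count vector x, computed on the log scale."""
    p0 = 1.0 - sum(p)
    log_prob = lgamma(x0 + sum(x)) - lgamma(x0) + x0 * log(p0)
    for xi, pi in zip(x, p):
        log_prob += xi * log(pi) - lgamma(xi + 1)
    return exp(log_prob)

# Illustrative parameters: two independent vectors sharing the same p.
r1, r2, p = 1.5, 2.0, [0.2, 0.3]
x = (3, 2)                            # point at which to evaluate the sum's PMF

# Pr(X_1 + X_2 = x) by summing over every split a + (x - a) = x.
convolution = sum(
    nm_pmf(a, r1, p) * nm_pmf([x[0] - a[0], x[1] - a[1]], r2, p)
    for a in product(range(x[0] + 1), range(x[1] + 1))
)

# Closed form from the sum property: NM(r1 + r2, p).
closed_form = nm_pmf(x, r1 + r2, p)
```

The two values should coincide up to rounding.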

Aggregation

If

\mathbf {X} =(X_{1},\ldots ,X_{m})\sim \operatorname {NM} (x_{0},(p_{1},\ldots ,p_{m}))

then, if the random variables with subscripts i and j are dropped from the vector and replaced by their sum,

\mathbf {X} '=(X_{1},\ldots ,X_{i}+X_{j},\ldots ,X_{m})\sim \operatorname {NM} (x_{0},(p_{1},\ldots ,p_{i}+p_{j},\ldots ,p_{m})).

This aggregation property may be used to derive the marginal distribution of $X_{i}$ mentioned above.
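A small numerical check of the aggregation property, for $m=3$ with the last two components merged, might look as follows. The helper `nm_pmf` and all numeric values are assumptions chosen for illustration.

```python
from math import lgamma, log, exp

def nm_pmf(x, x0, p):
    """PMF of NM(x0, p) at the count vector x, computed on the log scale."""
    p0 = 1.0 - sum(p)
    log_prob = lgamma(x0 + sum(x)) - lgamma(x0) + x0 * log(p0)
    for xi, pi in zip(x, p):
        log_prob += xi * log(pi) - lgamma(xi + 1)
    return exp(log_prob)

# Illustrative parameters.
x0, p = 2.0, [0.1, 0.2, 0.3]
a, s = 2, 4                           # target event: X_1 = a and X_2 + X_3 = s

# Probability of the event by summing over all ways to split s between X_2 and X_3.
aggregated_by_summation = sum(nm_pmf([a, k, s - k], x0, p) for k in range(s + 1))

# Closed form: (X_1, X_2 + X_3) ~ NM(x0, (p1, p2 + p3)).
aggregated_closed_form = nm_pmf([a, s], x0, [p[0], p[1] + p[2]])
```

The agreement follows from the binomial theorem applied to $(p_{2}+p_{3})^{s}$.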

Correlation matrix

The entries of the correlation matrix are

\rho (X_{i},X_{i})=1.

\rho (X_{i},X_{j})={\frac {\operatorname {cov} (X_{i},X_{j})}{\sqrt {\operatorname {var} (X_{i})\operatorname {var} (X_{j})}}}={\sqrt {\frac {p_{i}p_{j}}{(p_{0}+p_{i})(p_{0}+p_{j})}}}.
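The correlation entries can be recovered from the covariance matrix $\tfrac {x_{0}}{p_{0}^{2}}\mathbf {pp} '+\tfrac {x_{0}}{p_{0}}\operatorname {diag} (\mathbf {p} )$. The following sketch uses hypothetical function names (`nm_covariance`, `nm_correlation`) and illustrative parameters, and normalizes the covariance in the usual way.

```python
from math import sqrt

def nm_covariance(x0, p):
    """Covariance matrix (x0 / p0^2) p p' + (x0 / p0) diag(p) as nested lists."""
    p0 = 1.0 - sum(p)
    m = len(p)
    return [
        [x0 / p0**2 * p[i] * p[j] + (x0 / p0 * p[i] if i == j else 0.0)
         for j in range(m)]
        for i in range(m)
    ]

def nm_correlation(x0, p):
    """Correlation matrix obtained by normalizing the covariance matrix."""
    cov = nm_covariance(x0, p)
    m = len(p)
    return [
        [cov[i][j] / sqrt(cov[i][i] * cov[j][j]) for j in range(m)]
        for i in range(m)
    ]

# Illustrative parameters.
x0, p = 3.0, [0.1, 0.2, 0.3]
corr = nm_correlation(x0, p)
```

The off-diagonal entries are all positive and do not depend on $x_{0}$, in line with the closed form above.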

Parameter estimation

Method of Moments

If we let the mean vector of the negative multinomial be

{\boldsymbol {\mu }}={\frac {x_{0}}{p_{0}}}\mathbf {p}

and covariance matrix

{\boldsymbol {\Sigma }}={\tfrac {x_{0}}{p_{0}^{2}}}\,\mathbf {p} \mathbf {p} '+{\tfrac {x_{0}}{p_{0}}}\,\operatorname {diag} (\mathbf {p} ),

then it is easy to show through properties of determinants that

{\textstyle |{\boldsymbol {\Sigma }}|={\frac {1}{p_{0}}}\prod _{i=1}^{m}{\mu _{i}}.}

From this, it can be shown that

x_{0}={\frac {\sum {\mu _{i}}\prod {\mu _{i}}}{|{\boldsymbol {\Sigma }}|-\prod {\mu _{i}}}}

and

\mathbf {p} ={\frac {|{\boldsymbol {\Sigma }}|-\prod {\mu _{i}}}{|{\boldsymbol {\Sigma }}|\sum {\mu _{i}}}}{\boldsymbol {\mu }}.

Substituting sample moments yields the method of moments estimates

{\hat {x}}_{0}={\frac {\left(\sum _{i=1}^{m}{{\bar {x}}_{i}}\right)\prod _{i=1}^{m}{{\bar {x}}_{i}}}{|{\boldsymbol {S}}|-\prod _{i=1}^{m}{{\bar {x}}_{i}}}}

and

{\hat {\mathbf {p} }}=\left({\frac {|{\boldsymbol {S}}|-\prod _{i=1}^{m}{{\bar {x}}_{i}}}{|{\boldsymbol {S}}|\sum _{i=1}^{m}{{\bar {x}}_{i}}}}\right){\boldsymbol {\bar {x}}}.
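The estimators above can be sketched in a few lines. This is a minimal illustration, not a robust implementation: the function names `determinant` and `method_of_moments` are hypothetical, the determinant is computed by plain Gaussian elimination, and the example feeds in exact population moments rather than sample moments so that the true parameters are recovered. With real data, the denominator $|\mathbf {S} |-\prod {\bar {x}}_{i}$ may be zero or negative, in which case the estimates are not usable.

```python
from math import prod

def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]    # work on a copy
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det                # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det

def method_of_moments(mean, cov):
    """Return (x0_hat, p_hat) from a mean vector and a covariance matrix."""
    prod_mu, sum_mu = prod(mean), sum(mean)
    det = determinant(cov)
    x0_hat = sum_mu * prod_mu / (det - prod_mu)
    scale = (det - prod_mu) / (det * sum_mu)
    return x0_hat, [scale * mu for mu in mean]

# Illustrative check with exact moments of NM(3, (0.2, 0.3)).
x0, p = 3.0, [0.2, 0.3]
p0 = 1.0 - sum(p)
mean = [x0 * pi / p0 for pi in p]
cov = [[x0 / p0**2 * p[i] * p[j] + (x0 / p0 * p[i] if i == j else 0.0)
        for j in range(2)] for i in range(2)]
x0_hat, p_hat = method_of_moments(mean, cov)
```

In practice `mean` and `cov` would be replaced by the sample mean vector ${\boldsymbol {\bar {x}}}$ and the sample covariance matrix $\mathbf {S}$.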

Related distributions

Negative binomial distribution
Multinomial distribution
Inverted Dirichlet distribution, a conjugate prior for the negative multinomial
Dirichlet negative multinomial distribution

References

1. Le Gall, F. (2006). The modes of a negative multinomial distribution. Statistics & Probability Letters 76 (6): 619–624. ISSN 0167-7152. doi:10.1016/j.spl.2005.09.009.

Waller LA and Zelterman D. (1997). Log-linear modeling with the negative multinomial distribution. Biometrics 53: 971–82.

From Wikipedia, the free encyclopedia