# Generalized Dirichlet distribution

In statistics, the generalized Dirichlet distribution (GD) is a generalization of the Dirichlet distribution with a more general covariance structure and almost twice the number of parameters. Random vectors with a GD distribution are completely neutral.[1]

The density function of ${\displaystyle p_{1},\ldots ,p_{k-1}}$ is

${\displaystyle \left[\prod _{i=1}^{k-1}B(a_{i},b_{i})\right]^{-1}p_{k}^{b_{k-1}-1}\prod _{i=1}^{k-1}\left[p_{i}^{a_{i}-1}\left(\sum _{j=i}^{k}p_{j}\right)^{b_{i-1}-(a_{i}+b_{i})}\right]}$

where we define ${\displaystyle p_{k}=1-\sum _{i=1}^{k-1}p_{i}}$. Here ${\displaystyle B(x,y)}$ denotes the Beta function. This reduces to the standard Dirichlet distribution if ${\displaystyle b_{i-1}=a_{i}+b_{i}}$ for ${\displaystyle 2\leqslant i\leqslant k-1}$ (${\displaystyle b_{0}}$ is arbitrary).

For example, if k=4, then the density function of ${\displaystyle p_{1},p_{2},p_{3}}$ is

${\displaystyle \left[\prod _{i=1}^{3}B(a_{i},b_{i})\right]^{-1}p_{1}^{a_{1}-1}p_{2}^{a_{2}-1}p_{3}^{a_{3}-1}p_{4}^{b_{3}-1}\left(p_{2}+p_{3}+p_{4}\right)^{b_{1}-\left(a_{2}+b_{2}\right)}\left(p_{3}+p_{4}\right)^{b_{2}-\left(a_{3}+b_{3}\right)}}$

where ${\displaystyle p_{1}+p_{2}+p_{3}<1}$ and ${\displaystyle p_{4}=1-p_{1}-p_{2}-p_{3}}$.
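
The density above can be evaluated numerically. The following is a minimal sketch, assuming the Connor and Mosimann parametrization given here; the names `log_beta` and `gd_log_pdf` are illustrative and do not come from any library. It works on the log scale to avoid underflow.

```python
import math


def log_beta(x, y):
    """Logarithm of the Beta function B(x, y)."""
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)


def gd_log_pdf(p, a, b):
    """Log-density of the generalized Dirichlet distribution.

    p holds p_1, ..., p_{k-1}; a and b hold a_1..a_{k-1} and b_1..b_{k-1}.
    p_k is implied as 1 - sum(p). Indexing is 0-based in the code.
    """
    m = len(p)                      # m = k - 1
    p_k = 1.0 - sum(p)
    full = list(p) + [p_k]          # p_1, ..., p_k
    log_density = -sum(log_beta(a[i], b[i]) for i in range(m))
    log_density += (b[m - 1] - 1.0) * math.log(p_k)
    for i in range(m):
        log_density += (a[i] - 1.0) * math.log(full[i])
        if i > 0:
            # For the first term the tail sum is 1, so b_0 never matters.
            tail = sum(full[i:])    # p_i + ... + p_k
            log_density += (b[i - 1] - (a[i] + b[i])) * math.log(tail)
    return log_density
```

With a single free proportion (k = 2) the function reduces to the log-density of a beta distribution, which gives a quick sanity check.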

Connor and Mosimann chose this form of the PDF for the following reason. Define random variables ${\displaystyle z_{1},\ldots ,z_{k-1}}$ with ${\displaystyle z_{1}=p_{1},z_{2}=p_{2}/\left(1-p_{1}\right),z_{3}=p_{3}/\left(1-(p_{1}+p_{2})\right),\ldots ,z_{i}=p_{i}/\left(1-\left(p_{1}+\cdots +p_{i-1}\right)\right)}$. Then ${\displaystyle p_{1},\ldots ,p_{k}}$ have the generalized Dirichlet distribution as parametrized above if the ${\displaystyle z_{i}}$ are independent beta random variables with parameters ${\displaystyle a_{i},b_{i}}$, ${\displaystyle i=1,\ldots ,k-1}$.
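
This construction also suggests a way to draw samples: generate the independent beta variables and invert the definition of the ${\displaystyle z_{i}}$. The sketch below assumes only Python's standard `random` module; the function name `sample_gd` is hypothetical.

```python
import random


def sample_gd(a, b, rng=random):
    """Draw (p_1, ..., p_k) from a generalized Dirichlet distribution.

    a and b hold the k-1 beta parameters. Each z_i ~ Beta(a_i, b_i) is the
    share of the mass still unallocated that goes to component i.
    """
    remaining = 1.0                 # 1 - (p_1 + ... + p_{i-1})
    p = []
    for a_i, b_i in zip(a, b):
        z = rng.betavariate(a_i, b_i)
        p.append(z * remaining)     # p_i = z_i * (1 - p_1 - ... - p_{i-1})
        remaining *= 1.0 - z
    p.append(remaining)             # p_k takes whatever is left
    return p
```

Passing a seeded `random.Random` instance as `rng` makes the draws reproducible.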

## Alternative form given by Wong

Wong [2] gives the slightly more concise form for ${\displaystyle x_{1}+\cdots +x_{k}\leqslant 1}$

${\displaystyle \prod _{i=1}^{k}{\frac {x_{i}^{\alpha _{i}-1}\left(1-x_{1}-\cdots -x_{i}\right)^{\gamma _{i}}}{B(\alpha _{i},\beta _{i})}}}$

where ${\displaystyle \gamma _{j}=\beta _{j}-\alpha _{j+1}-\beta _{j+1}}$ for ${\displaystyle 1\leqslant j\leqslant k-1}$ and ${\displaystyle \gamma _{k}=\beta _{k}-1}$. Note that Wong defines a distribution over a ${\displaystyle k}$-dimensional space (implicitly defining ${\displaystyle x_{k+1}=1-\sum _{i=1}^{k}x_{i}}$), while Connor and Mosimann use a ${\displaystyle (k-1)}$-dimensional space with ${\displaystyle x_{k}=1-\sum _{i=1}^{k-1}x_{i}}$.
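
As a worked instance, for ${\displaystyle k=2}$ Wong's form written out in full is

${\displaystyle {\frac {x_{1}^{\alpha _{1}-1}\left(1-x_{1}\right)^{\beta _{1}-\alpha _{2}-\beta _{2}}\,x_{2}^{\alpha _{2}-1}\left(1-x_{1}-x_{2}\right)^{\beta _{2}-1}}{B(\alpha _{1},\beta _{1})\,B(\alpha _{2},\beta _{2})}}}$

which agrees with the Connor and Mosimann density for three proportions on identifying ${\displaystyle p_{i}=x_{i}}$, ${\displaystyle a_{i}=\alpha _{i}}$, ${\displaystyle b_{i}=\beta _{i}}$ and ${\displaystyle p_{3}=1-x_{1}-x_{2}}$.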

## General moment function

If ${\displaystyle X=\left(X_{1},\ldots ,X_{k}\right)\sim GD_{k}\left(\alpha _{1},\ldots ,\alpha _{k};\beta _{1},\ldots ,\beta _{k}\right)}$, then

${\displaystyle E\left[X_{1}^{r_{1}}X_{2}^{r_{2}}\cdots X_{k}^{r_{k}}\right]=\prod _{j=1}^{k}{\frac {\Gamma \left(\alpha _{j}+\beta _{j}\right)\Gamma \left(\alpha _{j}+r_{j}\right)\Gamma \left(\beta _{j}+\delta _{j}\right)}{\Gamma \left(\alpha _{j}\right)\Gamma \left(\beta _{j}\right)\Gamma \left(\alpha _{j}+\beta _{j}+r_{j}+\delta _{j}\right)}}}$

where ${\displaystyle \delta _{j}=r_{j+1}+r_{j+2}+\cdots +r_{k}}$ for ${\displaystyle j=1,2,\ldots ,k-1}$ and ${\displaystyle \delta _{k}=0}$. Thus

${\displaystyle E\left(X_{j}\right)={\frac {\alpha _{j}}{\alpha _{j}+\beta _{j}}}\prod _{m=1}^{j-1}{\frac {\beta _{m}}{\alpha _{m}+\beta _{m}}}.}$
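
The moment formula is straightforward to evaluate with log-gamma functions. The following sketch assumes Wong's parametrization as used in this section; `gd_moment` is an illustrative name, not a library function. Setting a single exponent to 1 and the rest to 0 recovers the mean given above.

```python
import math


def gd_moment(alpha, beta, r):
    """E[X_1^r_1 * ... * X_k^r_k] for X ~ GD_k(alpha; beta).

    alpha, beta and r are sequences of length k.
    """
    k = len(alpha)
    log_moment = 0.0
    delta = 0.0                     # delta_k = 0
    # Walk from j = k down to j = 1 so delta_j can be accumulated.
    for j in range(k - 1, -1, -1):
        a_j, b_j, r_j = alpha[j], beta[j], r[j]
        log_moment += (
            math.lgamma(a_j + b_j)
            + math.lgamma(a_j + r_j)
            + math.lgamma(b_j + delta)
            - math.lgamma(a_j)
            - math.lgamma(b_j)
            - math.lgamma(a_j + b_j + r_j + delta)
        )
        delta += r_j                # now equals r_j + ... + r_k
    return math.exp(log_moment)
```

For example, with `alpha = [2, 3, 4]` and `beta = [12, 9, 5]` the distribution coincides with a standard Dirichlet with parameters (2, 3, 4, 5), so the first three means should be 2/14, 3/14 and 4/14.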

## Reduction to standard Dirichlet distribution

As stated above, if ${\displaystyle b_{i-1}=a_{i}+b_{i}}$ for ${\displaystyle 2\leqslant i\leqslant k-1}$ then the distribution reduces to a standard Dirichlet. This condition is different from the usual case, in which setting the additional parameters of the generalized distribution to zero results in the original distribution. However, in the case of the GD, this results in a very complicated density function.
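
As a worked instance of the reduction, take ${\displaystyle k=3}$ in the Connor and Mosimann form, so that the only condition is ${\displaystyle b_{1}=a_{2}+b_{2}}$. The factor ${\displaystyle (p_{2}+p_{3})^{b_{1}-(a_{2}+b_{2})}}$ then equals 1, and the normalizing constant simplifies as

${\displaystyle B(a_{1},a_{2}+b_{2})\,B(a_{2},b_{2})={\frac {\Gamma (a_{1})\Gamma (a_{2}+b_{2})}{\Gamma (a_{1}+a_{2}+b_{2})}}\cdot {\frac {\Gamma (a_{2})\Gamma (b_{2})}{\Gamma (a_{2}+b_{2})}}={\frac {\Gamma (a_{1})\Gamma (a_{2})\Gamma (b_{2})}{\Gamma (a_{1}+a_{2}+b_{2})}}}$

so the density becomes

${\displaystyle {\frac {\Gamma (a_{1}+a_{2}+b_{2})}{\Gamma (a_{1})\Gamma (a_{2})\Gamma (b_{2})}}\,p_{1}^{a_{1}-1}p_{2}^{a_{2}-1}p_{3}^{b_{2}-1}}$

which is the standard Dirichlet density with parameters ${\displaystyle (a_{1},a_{2},b_{2})}$.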

## Bayesian analysis

Suppose ${\displaystyle X=\left(X_{1},\ldots ,X_{k}\right)\sim GD_{k}\left(\alpha _{1},\ldots ,\alpha _{k};\beta _{1},\ldots ,\beta _{k}\right)}$ is generalized Dirichlet, and that ${\displaystyle Y\mid X}$ is multinomial with ${\displaystyle n}$ trials (here ${\displaystyle Y=\left(Y_{1},\ldots ,Y_{k}\right)}$). Writing ${\displaystyle Y_{j}=y_{j}}$ for ${\displaystyle 1\leqslant j\leqslant k}$ and ${\displaystyle y_{k+1}=n-\sum _{i=1}^{k}y_{i}}$, the joint posterior of ${\displaystyle X\mid Y}$ is a generalized Dirichlet distribution with

${\displaystyle X\mid Y\sim GD_{k}\left({\alpha '}_{1},\ldots ,{\alpha '}_{k};{\beta '}_{1},\ldots ,{\beta '}_{k}\right)}$

where ${\displaystyle {\alpha '}_{j}=\alpha _{j}+y_{j}}$ and ${\displaystyle {\beta '}_{j}=\beta _{j}+\sum _{i=j+1}^{k+1}y_{i}}$ for ${\displaystyle 1\leqslant j\leqslant k.}$
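
The conjugate update amounts to a few sums over the observed counts. The following is a minimal sketch; the function name `gd_posterior` and the convention of passing all ${\displaystyle k+1}$ counts in one list are assumptions made for illustration.

```python
def gd_posterior(alpha, beta, y):
    """Posterior GD parameters after multinomial counts y.

    alpha and beta have length k; y has length k + 1, with the last entry
    being y_{k+1} = n - (y_1 + ... + y_k).
    """
    k = len(alpha)
    if len(beta) != k or len(y) != k + 1:
        raise ValueError("expected len(beta) == k and len(y) == k + 1")
    # alpha'_j = alpha_j + y_j
    alpha_post = [alpha[j] + y[j] for j in range(k)]
    # beta'_j = beta_j + y_{j+1} + ... + y_{k+1}
    beta_post = [beta[j] + sum(y[j + 1:]) for j in range(k)]
    return alpha_post, beta_post
```

For instance, a `GD_2(1, 1; 1, 1)` prior with counts `y = [3, 2, 5]` gives posterior parameters `alpha' = [4, 3]` and `beta' = [8, 6]`.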

## Sampling experiment

Wong gives the following system as an example of how the Dirichlet and generalized Dirichlet distributions differ. He posits that a large urn contains balls of ${\displaystyle k+1}$ different colours. The proportion of each colour is unknown. Write ${\displaystyle X=(X_{1},\ldots ,X_{k})}$, where ${\displaystyle X_{j}}$ is the proportion of balls of colour ${\displaystyle j}$ in the urn.

Experiment 1. Analyst 1 believes that ${\displaystyle X\sim D(\alpha _{1},\ldots ,\alpha _{k},\alpha _{k+1})}$ (i.e., ${\displaystyle X}$ is Dirichlet with parameters ${\displaystyle \alpha _{i}}$). The analyst then makes ${\displaystyle k+1}$ glass boxes and puts ${\displaystyle \alpha _{i}}$ marbles of colour ${\displaystyle i}$ in box ${\displaystyle i}$ (it is assumed that the ${\displaystyle \alpha _{i}}$ are integers ${\displaystyle \geq 1}$). Then analyst 1 draws a ball from the urn, observes its colour (say colour ${\displaystyle j}$) and puts it in box ${\displaystyle j}$. He can identify the correct box because the boxes are transparent and the colours of the marbles within are visible. The process continues until ${\displaystyle n}$ balls have been drawn. The posterior distribution is then Dirichlet with parameters being the number of marbles in each box.

Experiment 2. Analyst 2 believes that ${\displaystyle X}$ follows a generalized Dirichlet distribution: ${\displaystyle X\sim GD(\alpha _{1},\ldots ,\alpha _{k};\beta _{1},\ldots ,\beta _{k})}$. All parameters are again assumed to be positive integers. The analyst makes ${\displaystyle k+1}$ wooden boxes. The boxes have two areas: one for balls and one for marbles. The balls are coloured but the marbles are not coloured. Then for ${\displaystyle j=1,\ldots ,k}$, he puts ${\displaystyle \alpha _{j}}$ balls of colour ${\displaystyle j}$, and ${\displaystyle \beta _{j}}$ marbles, into box ${\displaystyle j}$. He then puts a ball of colour ${\displaystyle k+1}$ in box ${\displaystyle k+1}$. The analyst then draws a ball from the urn. Because the boxes are wood, the analyst cannot tell which box to put the ball in (as he could in experiment 1 above); he also has a poor memory and cannot remember which box contains which colour of ball. He has to discover which box is the correct one to put the ball in. He does this by opening box 1 and comparing the balls in it to the drawn ball. If the colours differ, the box is the wrong one. The analyst puts a marble (sic) in box 1 and proceeds to box 2. He repeats the process until the balls in the box match the drawn ball, at which point he puts the ball (sic) in the box with the other balls of matching colour. The analyst then draws another ball from the urn and repeats until ${\displaystyle n}$ balls are drawn. The posterior is then generalized Dirichlet with parameters ${\displaystyle \alpha }$ being the number of balls, and ${\displaystyle \beta }$ the number of marbles, in each box.

Note that in experiment 2, changing the order of the boxes has a non-trivial effect, unlike experiment 1.
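
The bookkeeping in experiment 2 can be simulated directly. The sketch below is an illustration only: the function name `run_experiment_2`, the 0-based colour labels, and the assumption that box ${\displaystyle k+1}$ starts with no marbles are not from Wong's description.

```python
def run_experiment_2(alpha, beta, draws):
    """Simulate the wooden-box procedure of experiment 2.

    alpha, beta: initial ball and marble counts for boxes 1..k.
    draws: sequence of drawn colours, labelled 0..k (0-based), where
           label k stands for colour k+1.
    Returns the final (balls, marbles) counts for boxes 1..k.
    """
    k = len(alpha)
    balls = list(alpha) + [1]       # box k+1 holds one ball of colour k+1
    marbles = list(beta) + [0]      # assumed: no marbles in box k+1
    for colour in draws:
        for box in range(k + 1):
            if box == colour:
                balls[box] += 1     # colours match: the ball goes in
                break
            marbles[box] += 1       # wrong box: add a marble, move on
    return balls[:k], marbles[:k]
```

The final counts depend only on how many balls of each colour were drawn, not on the order of the draws. They do depend on the order of the boxes, since a draw of a given colour adds a marble to every box that is opened before the matching one.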