Goodness of fit

The goodness of fit of a statistical model describes how well it fits a set of observations. Measures of goodness of fit typically summarize the discrepancy between observed values and the values expected under the model in question. Such measures can be used in statistical hypothesis testing, e.g. to test for normality of residuals, to test whether two samples are drawn from identical distributions (see Kolmogorov–Smirnov test), or whether outcome frequencies follow a specified distribution (see Pearson's chi-square test). In the analysis of variance, one of the components into which the variance is partitioned may be a lack-of-fit sum of squares.

Transcription

I'm thinking about buying a restaurant, so I go and ask the current owner, what is the distribution of the number of customers you get each day? And he says, oh, I've already figured that out. And he gives me this distribution over here, which essentially says 10% of his customers come in on Monday, 10% on Tuesday, 15% on Wednesday, so forth, and so on. They're closed on Sunday. So this is 100% of the customers for a week. If you add that up, you get 100%. I obviously am a little bit suspicious, so I decide to see how well this distribution that he's describing actually fits observed data. So I actually observe the number of customers, when they come in during the week, and this is what I get from my observed data. So to figure out whether I want to accept or reject his hypothesis right here, I'm going to do a little bit of a hypothesis test. So I'll make the null hypothesis that the owner's distribution-- so that's this thing right here-- is correct. And then the alternative hypothesis is going to be that it is not correct, that it is not a correct distribution, that I should not feel reasonably OK relying on this. It's not the correct-- I should reject the owner's distribution. And I want to do this with a significance level of 5%. Or another way of thinking about it, I'm going to calculate a statistic based on this data right here. And it's going to be a chi-square statistic. Or another way to view it is that the statistic that I'm going to calculate has approximately a chi-square distribution. And given that it does have a chi-square distribution with a certain number of degrees of freedom, and we're going to calculate that, what I want to see is whether the probability of getting this result, or getting a result like this or a result more extreme, is less than 5%. If the probability of getting a result like this or something less likely than this is less than 5%, then I'm going to reject the null hypothesis, which is essentially just rejecting the owner's distribution.
If I don't get that, if I say, hey, the probability of getting a chi-square statistic that is this extreme or more is greater than my alpha, than my significance level, then I'm not going to reject it. I'm going to say, well, I have no reason to really assume that he's lying. So let's do that. So to calculate the chi-square statistic, what I'm going to do is-- so here we're assuming the owner's distribution is correct. So assuming the owner's distribution was correct, what would have been the expected observed? So we have expected percentage here, but what would have been the expected observed? So let me write this right here. Expected. I'll add another row, Expected. So we would have expected 10% of the total customers in that week to come in on Monday, 10% of the total customers of that week to come in on Tuesday, 15% to come in on Wednesday. Now to figure out what the actual number is, we need to figure out the total number of customers. So let's add up these numbers right here. So we have-- I'll get the calculator out. So we have 30 plus 14 plus 34 plus 45 plus 57 plus 20. So there's a total of 200 customers who came into the restaurant that week. So let me write this down. So this is equal to-- so I wrote the total over here. Ignore this right here. I had 200 customers come in for the week. So what was the expected number on Monday? Well, on Monday, we would have expected 10% of the 200 to come in. So this would have been 20 customers, 10% times 200. On Tuesday, another 10%. So we would have expected 20 customers. Wednesday, 15% of 200, that's 30 customers. On Thursday, we would have expected 20% of 200 customers, so that would have been 40 customers. Then on Friday, 30%, that would have been 60 customers. And then on Saturday 15% again. 15% of 200 would have been 30 customers. So if this distribution is correct, this is the actual number that I would have expected.
Now to calculate chi-square statistic, we essentially just take-- let me just show it to you, and instead of writing chi, I'm going to write capital X squared. Sometimes someone will write the actual Greek letter chi here. But I'll write the x squared here. And let me write it this way. This is our chi-square statistic, but I'm going to write it with a capital X instead of a chi because this is going to have approximately a chi-squared distribution. I can't assume that it's exactly, so this is where we're dealing with approximations right here. But it's fairly straightforward to calculate. For each of the days, we take the difference between the observed and expected. So it's going to be 30 minus 20-- I'll do the first one color coded-- squared divided by the expected. So we're essentially taking the square of almost you could kind of do the error between what we observed and expected or the difference between what we observed and expect, and we're kind of normalizing it by the expected right over here. But we want to take the sum of all of these. So I'll just do all of those in yellow. So plus 14 minus 20 squared over 20 plus 34 minus 30 squared over 30 plus-- I'll continue over here-- 45 minus 40 squared over 40 plus 57 minus 60 squared over 60, and then finally, plus 20 minus 30 squared over 30. I just took the observed minus the expected squared over the expected. I took the sum of it, and this is what gives us our chi-square statistic. Now let's just calculate what this number is going to be. So this is going to be equal to-- I'll do it over here so you don't run out of space. So we'll do this a new color. We'll do it in orange. This is going to be equal to 30 minus 20 is 10 squared, which is 100 divided by 20, which is 5. I might not be able to do all of them in my head like this. Plus, actually, let me just write it this way just so you can see what I'm doing. This right here is 100 over 20 plus 14 minus 20 is negative 6 squared is positive 36. 
So plus 36 over 20. Plus 34 minus 30 is 4, squared is 16. So plus 16 over 30. Plus 45 minus 40 is 5 squared is 25. So plus 25 over 40. Plus the difference here is 3 squared is 9, so it's 9 over 60. Plus we have a difference of 10 squared is plus 100 over 30. And this is equal to-- and I'll just get the calculator out for this-- this is equal to, we have 100 divided by 20 plus 36 divided by 20 plus 16 divided by 30 plus 25 divided by 40 plus 9 divided by 60 plus 100 divided by 30 gives us 11.44. So let me write that down. So this right here is going to be 11.44. This is my chi-square statistic, or we could call it a big capital X squared. Sometimes you'll have it written as a chi-square, but this statistic is going to have approximately a chi-square distribution. Anyway, with that said, let's figure out, if we assume that it has roughly a chi-square distribution, what is the probability of getting a result this extreme or at least this extreme, I guess is another way of thinking about it. Or another way of saying, is this a more extreme result than the critical chi-square value that there's a 5% chance of getting a result that extreme? So let's do it that way. Let's figure out the critical chi-square value. And if this is more extreme than that, then we will reject our null hypothesis. So let's figure out our critical chi-square values. So we have an alpha of 5%. And actually the other thing we have to figure out is the degrees of freedom. The degrees of freedom, we're taking one, two, three, four, five, six sums, so you might be tempted to say the degrees of freedom are six. But one thing to realize is that if you had all of this information over here, you could actually figure out this last piece of information, so you actually have five degrees of freedom. 
When you have just kind of n data points like this, and you're measuring kind of the observed versus expected, your degrees of freedom are going to be n minus 1, because you could figure out that nth data point just based on everything else that you have, all of the other information. So our degrees of freedom here are going to be 5. It's n minus 1. So our significance level is 5%. And our degrees of freedom is also going to be equal to 5. So let's look at our chi-square distribution. We have a degree of freedom of 5. We have a significance level of 5%. And so the critical chi-square value is 11.07. So let's go with this chart. So we have a chi-squared distribution with a degree of freedom of 5. So that's this distribution over here in magenta. And we care about a critical value of 11.07. So this is right here. Oh, you actually even can't see it on this. So if I were to keep drawing this magenta thing all the way over here, if the magenta line just kept going, over here, you'd have 8. Over here you'd have 10. Over here, you'd have 12. 11.07 is maybe some place right over there. So what it's saying is the probability of getting a result at least as extreme as 11.07 is 5%. So we could write it even here. Our critical chi-square value is equal to-- we just saw-- 11.07. Let me look at the chart again. 11.07. The result we got for our statistic is even less likely than that. The probability is less than our significance level. So then we are going to reject. So the probability of getting that is-- let me put it this way-- 11.44 is more extreme than our critical chi-square level. So it's very unlikely that this distribution is true. So we will reject what he's telling us. We will reject this distribution. It's not a good fit based on this significance level.
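
The arithmetic in the transcription can be checked with a short Python sketch. The counts and percentages are those quoted in the transcription; the critical value 11.07 is taken from the transcription rather than computed, and the variable names are illustrative only.

```python
# Observed customer counts, Monday to Saturday, as quoted in the transcription.
observed = [30, 14, 34, 45, 57, 20]
# The owner's claimed share of weekly customers for each day.
claimed_share = [0.10, 0.10, 0.15, 0.20, 0.30, 0.15]

total = sum(observed)  # 200 customers in the week
expected = [share * total for share in claimed_share]

# Chi-square statistic: sum of (observed - expected)^2 / expected.
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Six categories whose counts must sum to the total, so 6 - 1 = 5.
degrees_of_freedom = len(observed) - 1

# 5% critical value for 5 degrees of freedom, as read from the table
# in the transcription (not computed here).
critical_value = 11.07

reject_null = chi_square > critical_value
print(f"chi-square = {chi_square:.2f}, df = {degrees_of_freedom}, reject = {reject_null}")
# prints: chi-square = 11.44, df = 5, reject = True
```

The statistic exceeds the critical value only narrowly, so the conclusion depends on the chosen 5% significance level.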

Fit of distributions

In assessing whether a given distribution is suited to a data set, the following tests and their underlying measures of fit can be used:

Bayesian information criterion
Kolmogorov–Smirnov test
Cramér–von Mises criterion
Anderson–Darling test
Berk–Jones tests^[1]^[2]
Shapiro–Wilk test
Chi-squared test
Akaike information criterion
Hosmer–Lemeshow test
Kuiper's test
Kernelized Stein discrepancy^[3]^[4]
Zhang's Z_K, Z_C and Z_A tests^[5]
Moran test
Density-based empirical likelihood ratio tests^[6]

Regression analysis

In regression analysis, more specifically regression validation, the following topics relate to goodness of fit:

Coefficient of determination (the R-squared measure of goodness of fit)
Lack-of-fit sum of squares
Mallows's Cp criterion
Prediction error
Reduced chi-square

Categorical data

The following are examples that arise in the context of categorical data.

Pearson's chi-square test

Pearson's chi-square test uses a measure of goodness of fit which is the sum of differences between observed and expected outcome frequencies (that is, counts of observations), each squared and divided by the expectation:

\chi ^{2}=\sum _{i=1}^{n}{\frac {(O_{i}-E_{i})^{2}}{E_{i}}}

where:

O_i = an observed count for bin i
E_i = an expected count for bin i, asserted by the null hypothesis.

The expected frequency is calculated by:

E_{i}\,=\,{\bigg (}F(Y_{u})\,-\,F(Y_{l}){\bigg )}\,N

where:

F = the cumulative distribution function for the probability distribution being tested.
Y_u = the upper limit for class i,
Y_l = the lower limit for class i, and
N = the sample size

The resulting value can be compared with a chi-square distribution to determine the goodness of fit. The chi-square distribution has (k − c) degrees of freedom, where k is the number of non-empty cells and c is the number of estimated parameters (including location and scale parameters and shape parameters) for the distribution plus one. For example, for a 3-parameter Weibull distribution, c = 4.
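
As a minimal sketch of the steps above, the following Python code bins a sample, computes the expected counts from a cumulative distribution function, and forms the statistic and its degrees of freedom. The sample values and class limits are invented for illustration, and a normal distribution with both parameters estimated from the data is assumed (so c = 2 + 1 = 3); none of these choices come from the text.

```python
import math


def normal_cdf(x, mu, sigma):
    """CDF of a normal distribution; plays the role of F in the text."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


# Hypothetical sample, invented purely for illustration.
sample = [4.1, 4.8, 5.0, 5.2, 5.5, 5.9, 6.0, 6.1, 6.3, 6.4,
          6.6, 6.8, 7.0, 7.1, 7.3, 7.5, 7.9, 8.2, 8.6, 9.4]
n = len(sample)  # N, the sample size

# Two parameters (location and scale) are estimated from the data.
mu = sum(sample) / n
sigma = math.sqrt(sum((x - mu) ** 2 for x in sample) / n)

# Class limits (Y_l, Y_u). The outer classes are open-ended so that
# the expected counts add up to N.
edges = [-math.inf, 5.5, 6.5, 7.5, math.inf]
classes = list(zip(edges[:-1], edges[1:]))

# O_i: observed count in each class, using Y_l <= x < Y_u.
observed = [sum(1 for x in sample if lo <= x < hi) for lo, hi in classes]

# E_i = (F(Y_u) - F(Y_l)) * N
expected = [(normal_cdf(hi, mu, sigma) - normal_cdf(lo, mu, sigma)) * n
            for lo, hi in classes]

chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

k = sum(1 for o in observed if o > 0)  # number of non-empty cells
c = 2 + 1                              # estimated parameters plus one
degrees_of_freedom = k - c

print(observed, [round(e, 2) for e in expected])
print(f"chi-square = {chi_square:.3f}, df = {degrees_of_freedom}")
```

The statistic would then be compared with a chi-square distribution with k − c degrees of freedom, here 4 − 3 = 1. With a sample this small the chi-square approximation is rough, so the numbers serve only to illustrate the mechanics.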

Binomial case

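A binomial experiment is a sequence of independent trials in which each trial results in one of two outcomes, success or failure. There are n trials, each with probability of success p. Writing N_i for the observed count in cell i and p_i for the probability of that cell (in the binomial case there are k = 2 cells, success and failure), and provided that np_i ≫ 1 for every i (where i = 1, 2, ..., k), then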

\chi ^{2}=\sum _{i=1}^{k}{\frac {(N_{i}-np_{i})^{2}}{np_{i}}}=\sum _{\mathrm {all\ cells} }^{}{\frac {(\mathrm {O} -\mathrm {E} )^{2}}{\mathrm {E} }}.

This has approximately a chi-square distribution with k − 1 degrees of freedom. The k − 1 degrees of freedom are a consequence of the restriction ${\textstyle \sum N_{i}=n}$. There are k observed cell counts, but once any k − 1 of them are known, the remaining one is uniquely determined. In other words, only k − 1 cell counts are freely determined, hence k − 1 degrees of freedom.
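
As an illustration of why one degree of freedom is lost, consider the two-cell case k = 2. The labels p_1 = p and p_2 = 1 − p are notation introduced here for the sketch. Because N_2 = n − N_1, the second deviation is the negative of the first, N_2 − n(1 − p) = −(N_1 − np), and the two terms of the sum collapse into one:

```latex
\begin{aligned}
\chi^{2} &= \frac{(N_{1}-np)^{2}}{np}+\frac{(N_{2}-n(1-p))^{2}}{n(1-p)} \\
         &= (N_{1}-np)^{2}\left(\frac{1}{np}+\frac{1}{n(1-p)}\right) \\
         &= \frac{(N_{1}-np)^{2}}{np(1-p)}
\end{aligned}
```

The result is the square of the standardized success count, which depends on the single free quantity N_1. This is consistent with k − 1 = 1 degree of freedom.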

G-test

G-tests are likelihood-ratio tests of statistical significance that are increasingly being used in situations where Pearson's chi-square tests were previously recommended.^[7]

The general formula for G is

G=2\sum _{i}{O_{i}\cdot \ln \left({\frac {O_{i}}{E_{i}}}\right)},

where ${\textstyle O_{i}}$ and ${\textstyle E_{i}}$ are the same as for the chi-square test, ${\textstyle \ln }$ denotes the natural logarithm, and the sum is taken over all non-empty cells. Furthermore, the total observed count should be equal to the total expected count:

\sum _{i}O_{i}=\sum _{i}E_{i}=N

where ${\textstyle N}$ is the total number of observations.
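
The formula can be sketched in Python as follows. For illustration, the observed and expected counts are borrowed from the restaurant example in the transcription; the Pearson statistic is computed alongside G only for comparison, and the variable names are assumptions of this sketch.

```python
import math

# Observed and expected counts from the restaurant example (Monday to Saturday).
observed = [30, 14, 34, 45, 57, 20]
expected = [20, 20, 30, 40, 60, 30]

# The total observed count should equal the total expected count.
assert sum(observed) == sum(expected)

# G = 2 * sum of O_i * ln(O_i / E_i), taken over non-empty cells only.
g_statistic = 2.0 * sum(
    o * math.log(o / e) for o, e in zip(observed, expected) if o > 0
)

# Pearson's chi-square statistic on the same counts, for comparison.
pearson = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print(f"G = {g_statistic:.2f}, Pearson chi-square = {pearson:.2f}")
```

On these counts the two statistics are close to each other. Both would be referred to the same chi-square distribution, here with 5 degrees of freedom.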

G-tests have been recommended at least since the 1981 edition of the popular statistics textbook by Robert R. Sokal and F. James Rohlf.^[8]

References

1. Berk, Robert H.; Jones, Douglas H. (1979). "Goodness-of-fit test statistics that dominate the Kolmogorov statistics". Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete. 47 (1): 47–59. doi:10.1007/BF00533250.
2. Moscovich, Amit; Nadler, Boaz; Spiegelman, Clifford (2016). "On the exact Berk-Jones statistics and their p-value calculation". Electronic Journal of Statistics. 10 (2). arXiv:1311.3190. doi:10.1214/16-EJS1172.
3. Liu, Qiang; Lee, Jason; Jordan, Michael (20 June 2016). "A Kernelized Stein Discrepancy for Goodness-of-fit Tests". Proceedings of the 33rd International Conference on Machine Learning. New York, New York, USA: Proceedings of Machine Learning Research. pp. 276–284.
4. Chwialkowski, Kacper; Strathmann, Heiko; Gretton, Arthur (20 June 2016). "A Kernel Test of Goodness of Fit". Proceedings of the 33rd International Conference on Machine Learning. New York, New York, USA: Proceedings of Machine Learning Research. pp. 2606–2615.
5. Zhang, Jin (2002). "Powerful goodness-of-fit tests based on the likelihood ratio". J. R. Stat. Soc. B. 64 (2): 281–294. doi:10.1111/1467-9868.00337. Retrieved 5 November 2018.
6. Vexler, Albert; Gurevich, Gregory (2010). "Empirical Likelihood Ratios Applied to Goodness-of-Fit Tests Based on Sample Entropy". Computational Statistics and Data Analysis. 54 (2): 531–545. doi:10.1016/j.csda.2009.09.025.
7. McDonald, J.H. (2014). "G–test of goodness-of-fit". Handbook of Biological Statistics (Third ed.). Baltimore, Maryland: Sparky House Publishing. pp. 53–58.
8. Sokal, R. R.; Rohlf, F. J. (1981). Biometry: The Principles and Practice of Statistics in Biological Research (Second ed.). W. H. Freeman. ISBN 0-7167-2411-1.

From Wikipedia, the free encyclopedia