# Homoscedasticity

Plot with random data showing homoscedasticity: at each value of x, the y-value of the dots has about the same variance.

In statistics, a sequence (or a vector) of random variables is homoscedastic (/ˌhoʊmoʊskəˈdæstɪk/) if all its random variables have the same finite variance. This is also known as homogeneity of variance. The complementary notion is called heteroscedasticity. The spellings homoskedasticity and heteroskedasticity are also frequently used.[1]

Assuming a variable is homoscedastic when in reality it is heteroscedastic (/ˌhɛtəroʊskəˈdæstɪk/) results in unbiased but inefficient point estimates and in biased estimates of standard errors, and may result in overestimating the goodness of fit as measured by the Pearson coefficient.

## Assumptions of a regression model

A standard assumption in a linear regression, ${\displaystyle y_{i}=X_{i}\beta +\epsilon _{i},i=1,\ldots ,N,}$ is that the variance of the disturbance term ${\displaystyle \epsilon _{i}}$ is the same across observations, and in particular does not depend on the values of the explanatory variables ${\displaystyle X_{i}.}$[2] This is one of the assumptions under which the Gauss–Markov theorem applies and ordinary least squares (OLS) gives the best linear unbiased estimator ("BLUE"). Homoscedasticity is not required for the coefficient estimates to be unbiased, consistent, and asymptotically normal, but it is required for OLS to be efficient.[3] It is also required for the standard errors of the estimates to be unbiased and consistent, so it is required for accurate hypothesis testing, e.g. for a t-test of whether a coefficient is significantly different from zero.
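The claims above can be checked with a small Monte Carlo sketch. The setup is an assumption chosen for illustration and does not come from the source: a single regressor, normal disturbances whose standard deviation grows as 0.1·x², and 2000 replications. Under these assumptions the average slope estimate should sit close to the true slope, while the conventional OLS standard error should understate the actual spread of the slope estimates.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 200, 2000            # illustrative sample size and replication count
alpha, beta = 1.0, 2.0         # hypothetical true intercept and slope

x = rng.uniform(1.0, 10.0, size=n)   # regressor, held fixed across replications
sd = 0.1 * x**2                      # assumed heteroscedastic disturbance sd
xc = x - x.mean()
sxx = xc @ xc

# Each row of y is one simulated sample of size n.
eps = rng.normal(0.0, sd, size=(reps, n))
y = alpha + beta * x + eps

# Closed-form OLS slope and intercept for every replication.
slopes = (y @ xc) / sxx
intercepts = y.mean(axis=1) - slopes * x.mean()

# Conventional standard error, which assumes a constant error variance.
resid = y - intercepts[:, None] - slopes[:, None] * x
s2 = (resid**2).sum(axis=1) / (n - 2)
conventional_se = np.sqrt(s2 / sxx)

mean_slope = slopes.mean()
empirical_sd = slopes.std(ddof=1)
mean_conventional_se = conventional_se.mean()

print("mean slope estimate:      ", mean_slope)
print("empirical sd of slopes:   ", empirical_sd)
print("mean conventional OLS se: ", mean_conventional_se)
```

The gap between the last two printed numbers is the bias in the standard errors that the text describes. Its size depends entirely on the assumed variance pattern.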

A more formal way to state the assumption of homoskedasticity is that the diagonal entries of the variance-covariance matrix of ${\displaystyle \epsilon }$ must all be the same number: ${\displaystyle E\epsilon _{i}\epsilon _{i}=\sigma ^{2}}$, where ${\displaystyle \sigma ^{2}}$ is the same for all i.[4] Note that this still allows the off-diagonal entries, the covariances ${\displaystyle E\epsilon _{i}\epsilon _{j}}$ with i ≠ j, to be nonzero. That is a separate violation of the Gauss–Markov assumptions, known as serial correlation.

## Examples

The matrices below are covariances of the disturbance, with entries ${\displaystyle E\epsilon _{i}\epsilon _{j}}$, when there are just three observations across time. The disturbance in matrix A is homoskedastic; this is the simple case where OLS is the best linear unbiased estimator. The disturbances in matrices B and C are heteroskedastic. In matrix B, the variance is time-varying, increasing steadily across time; in matrix C, the variance depends on the value of x. The disturbance in matrix D is homoskedastic because the diagonal variances are constant, even though the off-diagonal covariances are non-zero and ordinary least squares is inefficient for a different reason: serial correlation.

${\displaystyle A=\sigma ^{2}{\begin{bmatrix}1&0&0\\0&1&0\\0&0&1\\\end{bmatrix}}\;\;\;\;\;\;\;B=\sigma ^{2}{\begin{bmatrix}1&0&0\\0&2&0\\0&0&3\\\end{bmatrix}}\;\;\;\;\;\;\;C=\sigma ^{2}{\begin{bmatrix}x_{1}&0&0\\0&x_{2}&0\\0&0&x_{3}\\\end{bmatrix}}\;\;\;\;\;\;\;D=\sigma ^{2}{\begin{bmatrix}1&\rho &\rho ^{2}\\\rho &1&\rho \\\rho ^{2}&\rho &1\\\end{bmatrix}}}$
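As a rough illustration, the four matrices can be built in NumPy and classified by looking only at their diagonals. The values of `sigma2`, `rho`, and `x` below are made up for the example, and the helper names `is_homoscedastic` and `has_serial_correlation` are hypothetical, not part of any library.

```python
import numpy as np

def is_homoscedastic(cov):
    """True if every diagonal entry (variance) of cov is the same."""
    d = np.diag(np.asarray(cov, dtype=float))
    return bool(np.allclose(d, d[0]))

def has_serial_correlation(cov, tol=1e-12):
    """True if any off-diagonal entry (covariance) is nonzero."""
    cov = np.asarray(cov, dtype=float)
    off_diagonal = cov - np.diag(np.diag(cov))
    return bool(np.any(np.abs(off_diagonal) > tol))

sigma2 = 2.0                      # illustrative common variance scale
rho = 0.5                         # illustrative correlation parameter
x = np.array([1.0, 4.0, 9.0])     # illustrative regressor values

A = sigma2 * np.eye(3)
B = sigma2 * np.diag([1.0, 2.0, 3.0])
C = sigma2 * np.diag(x)
D = sigma2 * np.array([[1.0,    rho,  rho**2],
                       [rho,    1.0,  rho],
                       [rho**2, rho,  1.0]])

for name, m in zip("ABCD", (A, B, C, D)):
    print(name, is_homoscedastic(m), has_serial_correlation(m))
```

Only the diagonal decides homoscedasticity, which is why D passes the first check even though its off-diagonal entries are nonzero.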

If y is consumption, x is income, and ${\displaystyle \epsilon }$ captures the whims of the consumer, and we are estimating ${\displaystyle y_{i}=\beta x_{i}+\epsilon _{i},}$ then if richer consumers' whims affect their spending more in absolute dollars, we might have ${\displaystyle \operatorname {Var} (\epsilon _{i})=x_{i}\sigma ^{2},}$ rising with income, as in matrix C above.[5]
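The consumption example can be simulated as a sketch. All numbers here (`beta`, `sigma2`, the income range, the sample size) are invented for illustration. The model is fitted through the origin because the equation in the text has no intercept, and the residual variance is then compared between the lower-income and higher-income halves of the sample.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000                      # illustrative sample size
beta, sigma2 = 0.8, 4.0        # hypothetical slope and variance scale

x = rng.uniform(10.0, 100.0, size=n)          # income
eps = rng.normal(0.0, np.sqrt(sigma2 * x))    # Var(eps_i) = x_i * sigma2
y = beta * x + eps                            # consumption

# OLS through the origin, matching y_i = beta * x_i + eps_i.
beta_hat = (x @ y) / (x @ x)
resid = y - beta_hat * x

# Compare residual variance below and above the median income.
low = x < np.median(x)
var_low = resid[low].var()
var_high = resid[~low].var()

print("estimated beta:", beta_hat)
print("residual variance, lower incomes: ", var_low)
print("residual variance, higher incomes:", var_high)
```

Under these assumptions the residual variance in the higher-income half should be clearly larger, which is the pattern matrix C encodes.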

## Testing

Residuals can be tested for homoscedasticity using the Breusch–Pagan test, which performs an auxiliary regression of the squared residuals on the independent variables. The explained sum of squares from this auxiliary regression, divided by two, is the test statistic; it is compared against a chi-squared distribution with degrees of freedom equal to the number of independent variables. The null hypothesis of this chi-squared test is homoscedasticity, and the alternative hypothesis indicates heteroscedasticity.

Since the Breusch–Pagan test is sensitive to departures from normality and to small sample sizes, the Koenker–Bassett or 'generalized Breusch–Pagan' test is commonly used instead. It takes the R-squared value from the auxiliary regression and multiplies it by the sample size to obtain the test statistic, which is compared against a chi-squared distribution with the same degrees of freedom. Unlike the Koenker–Bassett test, the Breusch–Pagan test requires that the squared residuals first be divided by the residual sum of squares divided by the sample size.[6] Groupwise heteroscedasticity can be tested with the Goldfeld–Quandt test.
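The two statistics described above can be sketched with plain NumPy. This is a minimal illustration that follows the description in the text, not a reference implementation. The function name `heteroscedasticity_tests` is hypothetical, the auxiliary regression is assumed to include a constant, and the simulated data are invented. The value 3.841 is the 5% critical value of a chi-squared distribution with one degree of freedom.

```python
import numpy as np

def _fitted(X, y):
    """Fitted values from an OLS regression of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ coef

def heteroscedasticity_tests(x, y):
    """Return (Breusch-Pagan statistic, Koenker-Bassett statistic, df)."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    y = np.asarray(y, dtype=float)
    n, k = x.shape
    X = np.column_stack([np.ones(n), x])   # add a constant term

    # Squared residuals from the main regression.
    e2 = (y - _fitted(X, y)) ** 2

    # Breusch-Pagan: scale e2 by RSS/n, regress on X, take ESS / 2.
    g = e2 / (e2.sum() / n)
    ess = np.sum((_fitted(X, g) - g.mean()) ** 2)
    bp = ess / 2.0

    # Koenker-Bassett: n * R^2 from regressing the unscaled e2 on X.
    rss_aux = np.sum((e2 - _fitted(X, e2)) ** 2)
    tss_aux = np.sum((e2 - e2.mean()) ** 2)
    kb = n * (1.0 - rss_aux / tss_aux)

    return bp, kb, k

rng = np.random.default_rng(0)
x = rng.uniform(1.0, 10.0, size=2000)
y_het = 1.0 + 2.0 * x + rng.normal(0.0, x)               # error sd grows with x
y_hom = 1.0 + 2.0 * x + rng.normal(0.0, 5.0, size=2000)  # constant error sd

bp_het, kb_het, df = heteroscedasticity_tests(x, y_het)
bp_hom, kb_hom, _ = heteroscedasticity_tests(x, y_hom)

critical_5pct_df1 = 3.841
print("heteroscedastic data:", bp_het, kb_het, "df =", df)
print("homoscedastic data:  ", bp_hom, kb_hom, "df =", df)
```

A statistic above the critical value rejects the null hypothesis of homoscedasticity at the 5% level. For more than one regressor the critical value would need to come from the chi-squared distribution with the matching degrees of freedom.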

## Homoscedastic distributions

Two or more normal distributions, ${\displaystyle N(\mu _{i},\Sigma _{i})}$, are homoscedastic if they share a common covariance (or correlation) matrix, ${\displaystyle \Sigma _{i}=\Sigma _{j},\ \forall i,j}$. Homoscedastic distributions are especially useful for deriving statistical pattern recognition and machine learning algorithms. One popular example is Fisher's linear discriminant analysis.
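A minimal sketch of the shared-covariance idea, assuming two bivariate normal classes with hypothetical means and a common covariance matrix `Sigma`. The helper `pooled_covariance` is an illustrative name; it computes the pooled within-class covariance, the single estimate of the common matrix that linear discriminant analysis relies on when the classes are homoscedastic.

```python
import numpy as np

def pooled_covariance(groups):
    """Pooled within-class covariance of a list of (n_i, d) sample arrays."""
    total_dof = sum(len(g) - 1 for g in groups)
    scatter = sum((len(g) - 1) * np.cov(g, rowvar=False) for g in groups)
    return scatter / total_dof

rng = np.random.default_rng(0)
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])      # illustrative shared covariance matrix

# Two classes with different means but the same covariance.
class_a = rng.multivariate_normal([0.0, 0.0], Sigma, size=5000)
class_b = rng.multivariate_normal([3.0, -1.0], Sigma, size=5000)

pooled = pooled_covariance([class_a, class_b])

print("covariance of class a:\n", np.cov(class_a, rowvar=False))
print("covariance of class b:\n", np.cov(class_b, rowvar=False))
print("pooled covariance:\n", pooled)
```

When the per-class covariances differ noticeably from each other, the homoscedasticity assumption is doubtful and pooling them is harder to justify.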

The concept of homoscedasticity can be applied to distributions on spheres.[7]