Average absolute deviation

The average absolute deviation (AAD) of a data set is the average of the absolute deviations from a central point. It is a summary statistic of statistical dispersion or variability. In the general form, the central point can be a mean, median, mode, or the result of any other measure of central tendency or any reference value related to the given data set. AAD includes the mean absolute deviation and the median absolute deviation (both abbreviated as MAD).

Measures of dispersion

Several measures of statistical dispersion are defined in terms of the absolute deviation. The term "average absolute deviation" does not uniquely identify a measure of statistical dispersion, because several measures can be used to summarize the absolute deviations and several measures of central tendency can serve as the reference point. Thus, to uniquely identify the absolute deviation it is necessary to specify both the measure of deviation and the measure of central tendency. The statistical literature has not adopted a standard notation: both the mean absolute deviation around the mean and the median absolute deviation around the median have been denoted by their initials "MAD", which can cause confusion because the two generally take considerably different values.

Mean absolute deviation around a central point

The mean absolute deviation of a set {x₁, x₂, ..., x_n} is

{\frac {1}{n}}\sum _{i=1}^{n}|x_{i}-m(X)|.

The choice of measure of central tendency, $m(X)$ , has a marked effect on the value of the mean deviation. For example, for the data set {2, 2, 3, 4, 14}:

Measure of central tendency $m(X)$	Mean absolute deviation
Arithmetic mean = 5	${\frac {|2-5|+|2-5|+|3-5|+|4-5|+|14-5|}{5}}=3.6$
Median = 3	${\frac {|2-3|+|2-3|+|3-3|+|4-3|+|14-3|}{5}}=2.8$
Mode = 2	${\frac {|2-2|+|2-2|+|3-2|+|4-2|+|14-2|}{5}}=3.0$
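
The three rows of the table can be reproduced with a few lines of Python. This is only an illustrative sketch: the helper name `mean_abs_dev` is an assumption introduced here, not a standard library function.

```python
from statistics import mean, median, mode


def mean_abs_dev(data, center):
    """Mean of the absolute deviations of `data` from the point `center`."""
    return sum(abs(x - center) for x in data) / len(data)


data = [2, 2, 3, 4, 14]

# One entry per row of the table: the central point m(X) changes, the formula does not.
results = {
    "mean": mean_abs_dev(data, mean(data)),
    "median": mean_abs_dev(data, median(data)),
    "mode": mean_abs_dev(data, mode(data)),
}

for name, value in results.items():
    print(f"{name:>6}: {value:.1f}")
```

Passing the central point as an argument keeps the general form visible: the same function covers any reference value, not only the three shown.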

Mean absolute deviation around the mean

The mean absolute deviation (MAD), also referred to as the "mean deviation" or sometimes "average absolute deviation", is the mean of the data's absolute deviations around the data's mean: the average (absolute) distance from the mean. "Average absolute deviation" can refer to either this usage, or to the general form with respect to a specified central point (see above).

It has been proposed that the MAD be used in place of the standard deviation because it corresponds better to real life.^[1] Because the MAD is a simpler measure of variability than the standard deviation, it can be useful in school teaching.^[2]^[3]

As a measure of forecast accuracy, the MAD is closely related to the mean squared error (MSE), which is the average squared error of the forecasts. Although the two measures are closely related, the MAD is more commonly used because it is both easier to compute (avoiding the need for squaring)^[4] and easier to understand.^[5]

For the normal distribution, the ratio of mean absolute deviation from the mean to standard deviation is ${\textstyle {\sqrt {2/\pi }}=0.79788456\ldots }$ . Thus, if X is a normally distributed random variable with expected value 0, then (see Geary (1935)^[6]):

w={\frac {E|X|}{\sqrt {E(X^{2})}}}={\sqrt {\frac {2}{\pi }}}.

In other words, for a normal distribution, the mean absolute deviation is about 0.8 times the standard deviation. However, in-sample measurements of the ratio of mean absolute deviation to standard deviation for a Gaussian sample of size n lie within the bounds

w_{n}\in [0,1],

with a bias for small n.^[7]
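
The limiting ratio can be checked empirically with a small Monte Carlo sketch. The sample size, the seed, and the function name `mad_to_sd_ratio` are arbitrary choices made for this illustration; both the deviation and the standard deviation are computed with divisor n.

```python
import math
import random


def mad_to_sd_ratio(sample):
    """Ratio of the mean absolute deviation about the mean to the standard deviation."""
    n = len(sample)
    m = sum(sample) / n
    mad = sum(abs(x - m) for x in sample) / n
    sd = math.sqrt(sum((x - m) ** 2 for x in sample) / n)
    return mad / sd


rng = random.Random(0)  # fixed seed so the run is repeatable
sample = [rng.gauss(0.0, 1.0) for _ in range(200_000)]

ratio = mad_to_sd_ratio(sample)
target = math.sqrt(2 / math.pi)  # 0.79788456...

print(f"sample ratio: {ratio:.4f}, sqrt(2/pi): {target:.4f}")
```

For a sample this large the two printed values should agree to roughly two decimal places; much smaller samples show more scatter.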

The mean absolute deviation from the mean is less than or equal to the standard deviation; one way of proving this relies on Jensen's inequality.

Proof

Jensen's inequality states that $\varphi \left(\mathbb {E} [Y]\right)\leq \mathbb {E} \left[\varphi (Y)\right]$ for any convex function φ. Taking $\varphi (y)=y^{2}$ and $Y=\vert X-\mu \vert$ , where μ is the mean of X, gives:

\left(\mathbb {E} |X-\mu |\right)^{2}\leq \mathbb {E} \left(|X-\mu |^{2}\right)

\left(\mathbb {E} |X-\mu |\right)^{2}\leq \operatorname {Var} (X)

Since both sides are non-negative, and the square root is a monotonically increasing function on the non-negative domain:

\mathbb {E} |X-\mu |\leq {\sqrt {\operatorname {Var} (X)}}

For a general case of this statement, see Hölder's inequality.
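
As a numerical illustration of the inequality (not a substitute for the proof), the sketch below compares the two quantities for the example data set {2, 2, 3, 4, 14}, treating it as a whole population so that the standard deviation uses divisor n.

```python
from statistics import fmean, pstdev

data = [2, 2, 3, 4, 14]

mu = fmean(data)                             # arithmetic mean, 5.0
mad = fmean([abs(x - mu) for x in data])     # mean absolute deviation about the mean
sd = pstdev(data)                            # population standard deviation (divisor n)

print(f"MAD = {mad:.4f}, SD = {sd:.4f}, MAD <= SD: {mad <= sd}")
```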

Mean absolute deviation around the median

The median is the point about which the mean deviation is minimized. The MAD median offers a direct measure of the scale of a random variable around its median:

D_{\text{med}}=E|X-{\text{median}}|

The sample version of this quantity is the maximum likelihood estimator of the scale parameter $b$ of the Laplace distribution.

Since the median minimizes the average absolute distance, we have $D_{\text{med}}\leq D_{\text{mean}}$ : the mean absolute deviation from the median is less than or equal to the mean absolute deviation from the mean. In fact, the mean absolute deviation from the median is always less than or equal to the mean absolute deviation from any other fixed number.
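
A short sketch of why the median is the minimizer, under the added assumption (not made in the text above) that X is continuous with density f and distribution function F. The symbols c, for a candidate central point, and g, for the resulting mean absolute deviation, are introduced here for illustration only.

```latex
\begin{aligned}
g(c) &= E|X-c| = \int_{-\infty}^{c}(c-x)\,f(x)\,dx+\int_{c}^{\infty}(x-c)\,f(x)\,dx,\\
g'(c) &= \int_{-\infty}^{c}f(x)\,dx-\int_{c}^{\infty}f(x)\,dx = F(c)-\bigl(1-F(c)\bigr) = 2F(c)-1.
\end{aligned}
```

The derivative is non-decreasing in c and vanishes where F(c) = 1/2, that is, at a median. Hence g attains its minimum there, and g(median) ≤ g(c) for every fixed c, including c equal to the mean.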

By using the general dispersion function, Habib (2011) defined the MAD about the median as

D_{\text{med}}=E|X-{\text{median}}|=2\operatorname {Cov} (X,I_{O})

where the indicator function is

I_{O}:={\begin{cases}1&{\text{if }}X>{\text{median}},\\0&{\text{otherwise}}.\end{cases}}

This representation allows for obtaining MAD median correlation coefficients.^{[citation needed]}
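
The covariance representation can be checked on a small data set treated as a discrete uniform population. The data {1, 2, 4, 7} and the helper `mean_of` are assumptions chosen for this sketch. The data set is picked so that exactly half of the values lie above the median; for empirical data where that fails (for example an odd number of points, or ties at the median) the two sides need not agree exactly.

```python
from fractions import Fraction
from statistics import median


def mean_of(values):
    """Exact arithmetic mean of an iterable of numbers, as a Fraction."""
    values = list(values)
    return sum(values, Fraction(0)) / len(values)


data = [Fraction(v) for v in (1, 2, 4, 7)]
med = median(data)  # midpoint of 2 and 4

# Indicator I_O: 1 when the value lies above the median, 0 otherwise.
indicator = [1 if x > med else 0 for x in data]

d_med = mean_of(abs(x - med) for x in data)
cov = mean_of(x * i for x, i in zip(data, indicator)) - mean_of(data) * mean_of(indicator)

print(d_med, 2 * cov)
```

Exact fractions are used so that the comparison does not depend on floating-point rounding.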

Median absolute deviation around a central point

While in principle the mean or any other central point could be taken as the central point for the median absolute deviation, most often the median value is taken instead.

Median absolute deviation around the median

The median absolute deviation (also MAD) is the median of the absolute deviations from the median. It is a robust estimator of dispersion.

For the example {2, 2, 3, 4, 14}: 3 is the median, so the absolute deviations from the median are {1, 1, 0, 1, 11} (reordered as {0, 1, 1, 1, 11}) with a median of 1, in this case unaffected by the value of the outlier 14, so the median absolute deviation is 1.
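
The worked example translates directly into code. The function name `median_abs_dev` is an assumption of this sketch, and the second data set, with the outlier inflated to 1400, is a hypothetical variation added to show the robustness mentioned above.

```python
from statistics import median


def median_abs_dev(data):
    """Median of the absolute deviations of `data` from its own median."""
    center = median(data)
    return median(abs(x - center) for x in data)


original = [2, 2, 3, 4, 14]
inflated = [2, 2, 3, 4, 1400]  # same data with a much larger outlier

print(median_abs_dev(original))
print(median_abs_dev(inflated))
```

Both calls give the same value, because the outlier only affects the largest absolute deviation, which the median ignores.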

For a symmetric distribution, the median absolute deviation is equal to half the interquartile range.

Maximum absolute deviation

The maximum absolute deviation around an arbitrary point is the maximum of the absolute deviations of a sample from that point. While not strictly a measure of central tendency, the maximum absolute deviation can be found using the formula for the average absolute deviation as above with $m(X)=\max(X)$ , where $\max(X)$ is the sample maximum.

Minimization

The measures of statistical dispersion derived from absolute deviation characterize various measures of central tendency as minimizing dispersion: The median is the measure of central tendency most associated with the absolute deviation. Some location parameters can be compared as follows:

L² norm statistics: the mean minimizes the mean squared error;
L¹ norm statistics: the median minimizes the average absolute deviation;
L^∞ norm statistics: the mid-range minimizes the maximum absolute deviation;
trimmed L^∞ norm statistics: for example, the midhinge (average of first and third quartiles), which minimizes the median absolute deviation of the whole distribution, also minimizes the maximum absolute deviation of the distribution after the top and bottom 25% have been trimmed off.
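
The first three entries of this comparison can be illustrated with a brute-force search over candidate central points. This is a sketch only: the grid of candidates (0.0 to 15.0 in steps of 0.1) and the function names are assumptions made for the example, and a grid search finds the minimizer only up to the grid resolution.

```python
data = [2, 2, 3, 4, 14]
candidates = [k / 10 for k in range(0, 151)]  # 0.0, 0.1, ..., 15.0


def mean_squared_error(c):
    return sum((x - c) ** 2 for x in data) / len(data)


def mean_abs_deviation(c):
    return sum(abs(x - c) for x in data) / len(data)


def max_abs_deviation(c):
    return max(abs(x - c) for x in data)


best_l2 = min(candidates, key=mean_squared_error)    # expected: the mean
best_l1 = min(candidates, key=mean_abs_deviation)    # expected: the median
best_linf = min(candidates, key=max_abs_deviation)   # expected: the mid-range

print(best_l2, best_l1, best_linf)
```

For this data set the mean is 5, the median is 3 and the mid-range is (2 + 14) / 2 = 8, and each lies on the grid, so the search recovers them exactly.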

Estimation

The mean absolute deviation of a sample is a biased estimator of the mean absolute deviation of the population. For the absolute deviation to be an unbiased estimator, the expected value (average) of all the sample absolute deviations must equal the population absolute deviation. However, it does not. For the population 1, 2, 3, both the population absolute deviation about the median and the population absolute deviation about the mean are 2/3. Averaged over all samples of size 3 that can be drawn with replacement from the population, the sample absolute deviation about the mean is 44/81, while the sample absolute deviation about the median is 4/9. Both averages are below 2/3, so the absolute deviation is a biased estimator.

However, this argument is based on the notion of mean-unbiasedness. Each measure of location has its own form of unbiasedness (see entry on biased estimator). The relevant form of unbiasedness here is median unbiasedness.
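
The figures in the 1, 2, 3 example can be reproduced by enumerating every sample. The sketch assumes ordered samples of size 3 drawn with replacement (27 in total), and uses exact fractions; the helper name `mean_abs_dev` is introduced here for illustration.

```python
from fractions import Fraction
from itertools import product
from statistics import median

population = [1, 2, 3]


def mean_abs_dev(values, center):
    """Exact mean absolute deviation of `values` from `center`."""
    return sum(abs(Fraction(x) - center) for x in values) / len(values)


# All 27 ordered samples of size 3 drawn with replacement.
samples = list(product(population, repeat=3))

avg_about_mean = sum(
    mean_abs_dev(s, Fraction(sum(s), len(s))) for s in samples
) / len(samples)
avg_about_median = sum(
    mean_abs_dev(s, median(s)) for s in samples
) / len(samples)

population_value = mean_abs_dev(population, Fraction(2))  # mean and median are both 2

print(population_value, avg_about_mean, avg_about_median)
```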

References

[1] Taleb, Nassim Nicholas (2014). "What scientific idea is ready for retirement?". Edge. Archived from the original on 2014-01-16. Retrieved 2014-01-16.
[2] Kader, Gary (March 1999). "Means and MADS". Mathematics Teaching in the Middle School. 4 (6): 398–403. Archived from the original on 2013-05-18. Retrieved 2013-02-20.
[3] Franklin, Christine; Kader, Gary; Mewborn, Denise; Moreno, Jerry; Peck, Roxy; Perry, Mike; Scheaffer, Richard (2007). Guidelines for Assessment and Instruction in Statistics Education (PDF). American Statistical Association. ISBN 978-0-9791747-1-1. Archived (PDF) from the original on 2013-03-07. Retrieved 2013-02-20.
[4] Nahmias, Steven; Olsen, Tava Lennon (2015). Production and Operations Analysis (7th ed.). Waveland Press. p. 62. ISBN 9781478628248. "MAD is often the preferred method of measuring the forecast error because it does not require squaring."
[5] Stadtler, Hartmut; Kilger, Christoph; Meyr, Herbert, eds. (2014). Supply Chain Management and Advanced Planning: Concepts, Models, Software, and Case Studies. Springer Texts in Business and Economics (5th ed.). Springer. p. 143. ISBN 9783642553097. "the meaning of the MAD is easier to interpret."
[6] Geary, R. C. (1935). "The ratio of the mean deviation to the standard deviation as a test of normality". Biometrika. 27 (3/4): 310–332.
[7] See also Geary's 1936 and 1947 papers: Geary, R. C. (1936). "Moments of the ratio of the mean deviation to the standard deviation for normal samples". Biometrika. 28 (3/4): 295–307; and Geary, R. C. (1947). "Testing for normality". Biometrika. 34 (3/4): 209–242.

External links

Advantages of the mean absolute deviation
