From Wikipedia, the free encyclopedia

Winsorizing or winsorization is the transformation of statistics by limiting extreme values in the statistical data to reduce the effect of possibly spurious outliers. It is named after the engineer-turned-biostatistician Charles P. Winsor (1895–1951). The effect is the same as clipping in signal processing.

The distribution of many statistics can be heavily influenced by outliers. A typical strategy is to set all outliers to a specified percentile of the data; for example, a 90% winsorization would see all data below the 5th percentile set to the 5th percentile, and data above the 95th percentile set to the 95th percentile. Winsorized estimators are usually more robust to outliers than their more standard forms, although there are alternatives, such as trimming, that will achieve a similar effect.
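The percentile strategy described above can be sketched with NumPy as follows. This is a minimal illustration, not a reference implementation: the function name `winsorize_by_percentile` and its parameters are assumptions made for this example, and it relies on NumPy's default (linearly interpolated) percentile definition, which can give slightly different cut-offs from other conventions on small samples.

```python
import numpy as np

def winsorize_by_percentile(values, lower_pct=5.0, upper_pct=95.0):
    """Clamp values to the [lower_pct, upper_pct] percentile range.

    Hypothetical helper for illustration only. The defaults correspond
    to a 90% winsorization.
    """
    data = np.asarray(values, dtype=float)
    # Percentile cut-offs, using NumPy's default linear interpolation.
    low, high = np.percentile(data, [lower_pct, upper_pct])
    # Values outside the cut-offs are replaced by the cut-offs;
    # nothing is removed, so the sample size is unchanged.
    return np.clip(data, low, high)

sample = list(range(101))  # 0, 1, ..., 100
clipped = winsorize_by_percentile(sample)
print(clipped.min(), clipped.max(), clipped.size)
```

The use of `np.clip` reflects the analogy with clipping in signal processing: values are limited to a range rather than discarded.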

Example

Consider the data set consisting of:

{92, 19, 101, 58, 1053, 91, 26, 78, 10, 13, −40, 101, 86, 85, 15, 89, 89, 28, −5, 41}       (N = 20, mean = 101.5)

The data below the 5th percentile lies between −40 and −5, while the data above the 95th percentile lies between 101 and 1053, so the affected values are −40 and 1053. A 90% winsorization replaces −40 with −5 and 1053 with 101, which results in the following:

{92, 19, 101, 58, 101, 91, 26, 78, 10, 13, −5, 101, 86, 85, 15, 89, 89, 28, −5, 41}       (N = 20, mean = 55.65)
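The arithmetic in this example can be checked with a short sketch in plain Python. It assumes a count-based rule in which the `k` smallest and `k` largest values are replaced by their nearest remaining neighbours (here `k = 1`, that is, 5% of 20 values at each end); the helper name `winsorize_counts` is hypothetical.

```python
def winsorize_counts(values, k=1):
    """Replace the k smallest and k largest values by their nearest
    remaining neighbours (hypothetical helper for illustration)."""
    data = list(values)
    if k == 0:
        return data
    ordered = sorted(data)
    low = ordered[k]        # smallest value that is kept as-is
    high = ordered[-k - 1]  # largest value that is kept as-is
    # Clamp every value into [low, high], preserving the original order.
    return [min(max(v, low), high) for v in data]

data = [92, 19, 101, 58, 1053, 91, 26, 78, 10, 13,
        -40, 101, 86, 85, 15, 89, 89, 28, -5, 41]

winsorized = winsorize_counts(data, k=1)
original_mean = sum(data) / len(data)
winsorized_mean = sum(winsorized) / len(winsorized)
print(original_mean, winsorized_mean)
```

The sample size stays at 20 because values are replaced rather than removed; only the two extreme entries change.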

Python can winsorize data using the NumPy and SciPy libraries:

import numpy as np
from scipy.stats.mstats import winsorize

a = np.array([92, 19, 101, 58, 1053, 91, 26, 78, 10, 13, -40, 101, 86, 85, 15, 89, 89, 28, -5, 41])
# limits gives the fraction of values to replace at each end (5% low, 5% high)
winsorize(a, limits=[0.05, 0.05])

R can winsorize data using the DescTools library:

library(DescTools)
a <- c(92, 19, 101, 58, 1053, 91, 26, 78, 10, 13, -40, 101, 86, 85, 15, 89, 89, 28, -5, 41)
DescTools::Winsorize(a, probs = c(0.05, 0.95))

Distinction from trimming

Winsorizing is not equivalent to simply excluding data, which is a simpler procedure called trimming or truncation. It is instead a method of censoring data.

In a trimmed estimator, the extreme values are discarded; in a winsorized estimator, the extreme values are instead replaced by certain percentiles (the trimmed minimum and maximum).

Thus a winsorized mean is not the same as a truncated mean. For instance, the 10% trimmed mean is the average of the 5th to 95th percentile of the data, while the 90% winsorized mean sets the bottom 5% to the 5th percentile, the top 5% to the 95th percentile, and then averages the data. In the previous example the trimmed mean would be obtained from the smaller set:

{92, 19, 101, 58, 91, 26, 78, 10, 13, 101, 86, 85, 15, 89, 89, 28, −5, 41}       (N = 18, mean = 56.5)
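A minimal sketch of the trimming step, for comparison with winsorizing: the extreme values are dropped, so the sample shrinks from 20 to 18 values. The helper name `trim` and its parameter `k` (the number of values removed at each end) are assumptions made for this illustration.

```python
def trim(values, k=1):
    """Drop the k smallest and k largest values
    (hypothetical helper for illustration)."""
    ordered = sorted(values)
    return ordered[k:len(ordered) - k]

data = [92, 19, 101, 58, 1053, 91, 26, 78, 10, 13,
        -40, 101, 86, 85, 15, 89, 89, 28, -5, 41]

trimmed = trim(data, k=1)
trimmed_mean = sum(trimmed) / len(trimmed)
print(len(trimmed), trimmed_mean)
```

Note that this helper returns the values in sorted order, which does not affect the mean.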

In this case, the winsorized mean can equivalently be expressed as a weighted average of the truncated mean and the 5th and 95th percentiles (for the 90% winsorized mean, 0.05 times the 5th percentile, 0.9 times the 10% trimmed mean, and 0.05 times the 95th percentile), though in general winsorized statistics need not be expressible in terms of the corresponding trimmed statistic.
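The weighted-average relationship can be verified numerically for the example data. The sketch below assumes the count-based reading used in the example, with `k = 1` value replaced at each end of `n = 20`, so the weights are k/n = 0.05 for each replacement value and (n − 2k)/n = 0.9 for the trimmed mean. Variable names are chosen for illustration.

```python
data = [92, 19, 101, 58, 1053, 91, 26, 78, 10, 13,
        -40, 101, 86, 85, 15, 89, 89, 28, -5, 41]

n, k = len(data), 1
ordered = sorted(data)

low = ordered[k]           # replacement for the k smallest values (-5)
high = ordered[n - k - 1]  # replacement for the k largest values (101)
core = ordered[k:n - k]    # the values that trimming would keep

trimmed_mean = sum(core) / len(core)

# Direct winsorized mean: k copies of low, the core, k copies of high.
winsorized_mean = (k * low + sum(core) + k * high) / n

# The same quantity written as a weighted average.
weighted = (k / n) * low + ((n - 2 * k) / n) * trimmed_mean + (k / n) * high

print(trimmed_mean, winsorized_mean, weighted)
```

The two expressions agree up to floating-point rounding because the sum over the core values equals (n − 2k) times the trimmed mean.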

More formally, they are distinct because the order statistics are not independent.

This page was last edited on 24 June 2019, at 18:27
Basis of this page is in Wikipedia. Text is available under the CC BY-SA 3.0 Unported License. Non-text media are available under their specified licenses. Wikipedia® is a registered trademark of the Wikimedia Foundation, Inc. WIKI 2 is an independent company and has no affiliation with Wikimedia Foundation.