Nonparametric statistics

From Wikipedia, the free encyclopedia

Nonparametric statistics is a type of statistical analysis that makes minimal assumptions about the underlying distribution of the data being studied. Often these models are infinite-dimensional, rather than finite-dimensional as in parametric statistics.[1] Nonparametric statistics can be used for descriptive statistics or statistical inference. Nonparametric tests are often used when the assumptions of parametric tests are evidently violated.[2]

Definitions

The term "nonparametric statistics" has been defined imprecisely in the following two ways, among others:

  1. The first meaning of nonparametric involves techniques that do not rely on data belonging to any particular parametric family of probability distributions.

    These include, among others:

    • Methods that are distribution-free, i.e. that do not rely on the assumption that the data are drawn from a given parametric family of probability distributions.
    • Statistics defined as a function of a sample, without dependence on a parameter.

    An example is order statistics, which are based on the ordinal ranking of observations.

    The following discussion is taken from Kendall's Advanced Theory of Statistics.[3]

    Statistical hypotheses concern the behavior of observable random variables.... For example, the hypothesis (a) that a normal distribution has a specified mean and variance is statistical; so is the hypothesis (b) that it has a given mean but unspecified variance; so is the hypothesis (c) that a distribution is of normal form with both mean and variance unspecified; finally, so is the hypothesis (d) that two unspecified continuous distributions are identical.

    It will have been noticed that in the examples (a) and (b) the distribution underlying the observations was taken to be of a certain form (the normal) and the hypothesis was concerned entirely with the value of one or both of its parameters. Such a hypothesis, for obvious reasons, is called parametric.

    Hypothesis (c) was of a different nature, as no parameter values are specified in the statement of the hypothesis; we might reasonably call such a hypothesis non-parametric. Hypothesis (d) is also non-parametric but, in addition, it does not even specify the underlying form of the distribution and may now be reasonably termed distribution-free. Notwithstanding these distinctions, the statistical literature now commonly applies the label "non-parametric" to test procedures that we have just termed "distribution-free", thereby losing a useful classification.

  2. The second meaning of non-parametric involves techniques that do not assume that the structure of a model is fixed. Typically, the model grows in size to accommodate the complexity of the data. In these techniques, individual variables are typically assumed to belong to parametric distributions, and assumptions about the types of associations among variables are also made. These techniques include, among others:
    • non-parametric regression, in which the structure of the relationship between variables is treated non-parametrically, although there may still be parametric assumptions about the distribution of model residuals.
    • non-parametric hierarchical Bayesian models, such as models based on the Dirichlet process, which allow the number of latent variables to grow as necessary to fit the data, but where individual variables still follow parametric distributions and even the process controlling the rate of growth of latent variables follows a parametric distribution.
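
As a rough illustration of the first meaning, order statistics, and the median read off them, can be computed by sorting alone, with no parametric family assumed. The Python sketch below (function names are hypothetical, chosen for this example) also computes the classical distribution-free interval [x_(k), x_(n+1-k)] for a population median, whose coverage probability depends only on n and k, assuming the underlying distribution is continuous.

```python
from math import comb


def order_statistics(sample):
    """Return x_(1) <= x_(2) <= ... <= x_(n), i.e. the sample in rank order."""
    return sorted(sample)


def sample_median(sample):
    """Median taken from the middle order statistic(s)."""
    xs = order_statistics(sample)
    n = len(xs)
    if n == 0:
        raise ValueError("empty sample")
    mid = n // 2
    return xs[mid] if n % 2 else (xs[mid - 1] + xs[mid]) / 2


def median_interval(sample, k):
    """Return ((x_(k), x_(n+1-k)), coverage) for the population median.

    The coverage holds for any continuous distribution: the interval misses
    the median only if at most k-1 observations fall below it or at most k-1
    fall above it, and each of those counts is Binomial(n, 1/2).
    """
    xs = order_statistics(sample)
    n = len(xs)
    if not 1 <= k <= (n + 1) // 2:
        raise ValueError("need 1 <= k <= (n + 1) // 2")
    tail = sum(comb(n, j) for j in range(k)) / 2 ** n
    return (xs[k - 1], xs[n - k]), 1 - 2 * tail


data = [7, 2, 9, 4, 5]
print(order_statistics(data))      # [2, 4, 5, 7, 9]
print(sample_median(data))         # 5
print(median_interval(data, 1))    # ((2, 9), 0.9375)
```

With k = 1 the interval is simply [minimum, maximum], which covers the population median with probability 1 - 2^(1-n) whatever the continuous distribution is.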

Applications and purpose

Non-parametric methods are widely used for studying populations that have a ranked order (such as movie reviews receiving one to four "stars"). The use of non-parametric methods may be necessary when data have a ranking but no clear numerical interpretation, such as when assessing preferences. In terms of levels of measurement, non-parametric methods are applicable to ordinal data.
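
A minimal sketch of how rank-based methods treat ordinal data such as star ratings: the values are replaced by their ranks, tied values share the average (mid) rank, and a test statistic is computed from those ranks. The Mann–Whitney U statistic is used here only as a familiar example of a rank test; the ratings are made-up illustration data and the function names are hypothetical.

```python
def midranks(values):
    """Rank values from 1 to n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Find the run order[i..j] of tied values.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """U statistic for sample x: the number of pairs (x_i, y_j) with
    x_i > y_j, ties counting 1/2, obtained from the pooled midranks."""
    ranks = midranks(list(x) + list(y))
    rank_sum_x = sum(ranks[: len(x)])
    return rank_sum_x - len(x) * (len(x) + 1) / 2


# Hypothetical 1-to-4 star ratings for two films.
film_a = [4, 3, 4, 2, 4]
film_b = [2, 1, 3, 2]
print(mann_whitney_u(film_a, film_b))  # 17.5
```

Here U = 17.5 out of a possible 5 × 4 = 20 pairs, so film A's rating beats film B's in most pairwise comparisons. Only the ordering of the star values matters, never their numerical spacing. A p-value would additionally need the null distribution of U (exact, or a normal approximation with a tie correction), which this sketch omits.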

As non-parametric methods make fewer assumptions, their applicability is much more general than that of the corresponding parametric methods. In particular, they may be applied in situations where less is known about the application in question. Because they rely on fewer assumptions, non-parametric methods are also more robust.

Non-parametric methods are sometimes considered simpler to use and more robust than parametric methods, even when the assumptions of parametric methods are justified. This is due to their more general nature, which may make them less susceptible to misuse and misunderstanding. Non-parametric methods can be considered a conservative choice, as they remain valid even when the assumptions of the corresponding parametric methods are not met, whereas parametric methods can produce misleading results when their assumptions are violated.
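
The robustness point can be illustrated with a small, made-up sample in which a single value is mistyped: the mean and standard deviation shift drastically, while the median and the median absolute deviation (MAD), which depend only on the ordering of the data, do not. This sketch uses Python's standard `statistics` module; `mad` is a helper defined here for illustration.

```python
from statistics import mean, median, stdev


def mad(xs):
    """Median absolute deviation from the median (unscaled)."""
    m = median(xs)
    return median(abs(x - m) for x in xs)


clean = [12, 15, 14, 13, 16]
corrupted = [12, 15, 14, 13, 160]  # hypothetical typo: 16 entered as 160

for name, xs in (("clean", clean), ("corrupted", corrupted)):
    print(name, "mean:", mean(xs), "median:", median(xs),
          "stdev:", round(stdev(xs), 2), "MAD:", mad(xs))

# One bad value moves the mean from 14 to 42.8,
# while the median stays at 14 and the MAD stays at 1.
```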

The wider applicability and increased robustness of non-parametric tests come at a cost: in cases where a parametric test's assumptions are met, non-parametric tests have less statistical power. In other words, a larger sample size can be required to draw conclusions with the same degree of confidence.
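
The loss of power can be seen in a small Monte Carlo sketch under assumptions chosen purely for illustration: normally distributed data with known standard deviation 1, samples of size 20, a true shift of 0.5, and two-sided tests at the 5% level. The exact sign test, which uses only the signs of the observations, is compared with a z-test that exploits the normality assumption. A z-test rather than a t-test is used only so that the sketch needs nothing beyond the Python standard library. The helper names are hypothetical.

```python
import random
from math import comb
from statistics import NormalDist


def sign_test_p(xs, m0=0.0):
    """Exact two-sided sign test p-value for H0: population median == m0."""
    plus = sum(x > m0 for x in xs)
    minus = sum(x < m0 for x in xs)  # values equal to m0 are discarded
    n = plus + minus
    if n == 0:
        return 1.0
    k = min(plus, minus)
    # Under H0 the number of '+' signs is Binomial(n, 1/2).
    tail = sum(comb(n, j) for j in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def z_test_p(xs, mu0=0.0, sigma=1.0):
    """Two-sided z-test p-value for H0: mean == mu0, assuming normal data
    with known standard deviation sigma."""
    n = len(xs)
    z = (sum(xs) / n - mu0) / (sigma / n ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def estimated_power(test, shift, n=20, reps=2000, alpha=0.05, seed=1):
    """Fraction of simulated N(shift, 1) samples for which `test` rejects
    H0 (center 0) at level alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        xs = [rng.gauss(shift, 1.0) for _ in range(n)]
        rejections += test(xs) <= alpha
    return rejections / reps


print("sign test:", estimated_power(sign_test_p, shift=0.5))
print("z-test:   ", estimated_power(z_test_p, shift=0.5))
```

With these settings the z-test should reject the false null hypothesis noticeably more often than the sign test (theory puts them at roughly 0.6 and 0.4 respectively). Equivalently, the sign test would need a larger sample to reach the same power, even though it stays valid when normality fails.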

Non-parametric models

Non-parametric models differ from parametric models in that the model structure is not specified a priori but is instead determined from data. The term non-parametric is not meant to imply that such models completely lack parameters but that the number and nature of the parameters are flexible and not fixed in advance.
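
Non-parametric regression gives a concrete case. In a kernel regression such as the Nadaraya–Watson estimator, the fitted "model" is the training sample itself together with a bandwidth, so its size grows with the data rather than being fixed in advance. The following is a minimal Python sketch assuming a Gaussian kernel and a user-chosen bandwidth `h`; the function name and example data are hypothetical, and bandwidth selection (for example by cross-validation) is not shown.

```python
from math import exp


def nadaraya_watson(xs, ys, h):
    """Return a predictor x0 -> weighted average of ys, with Gaussian-kernel
    weights centred at x0. The fitted model stores every training pair, so
    its size grows with the sample instead of being fixed in advance."""
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    pairs = list(zip(xs, ys))

    def predict(x0):
        # Nearby training points receive larger weights.
        weights = [exp(-0.5 * ((x - x0) / h) ** 2) for x, _ in pairs]
        total = sum(weights)
        if total == 0.0:
            raise ValueError("x0 is too far from the data for this bandwidth")
        return sum(w * y for w, (_, y) in zip(weights, pairs)) / total

    return predict


# Hypothetical data following a curved trend y = x^2 with no noise.
xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
ys = [x * x for x in xs]
f = nadaraya_watson(xs, ys, h=0.3)
print([round(f(x0), 2) for x0 in (0.75, 1.25, 2.75)])
```

No functional form such as "y is linear in x" is specified; the shape of the fit comes entirely from the data, and only the smoothness is controlled by `h`.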

Methods

Non-parametric (or distribution-free) inferential statistical methods are mathematical procedures for statistical hypothesis testing which, unlike parametric statistics, make no assumptions about the probability distributions of the variables being assessed. The most frequently used tests include

History

Early nonparametric statistics include the median (13th century or earlier; used in estimation by Edward Wright in 1599; see Median § History) and the sign test, used by John Arbuthnot (1710) to analyze the human sex ratio at birth (see Sign test § History).[4][5]

Notes

  1. Wasserman, L. (2006), All of Nonparametric Statistics, Springer Texts in Statistics, Springer. doi:10.1007/0-387-30623-4.
  2. Pearce, J.; Derrick, B. (2019), "Preliminary testing: The devil of statistics?", Reinvention: An International Journal of Undergraduate Research, 12 (2). doi:10.31273/reinvention.v12i2.339.
  3. Stuart, A.; Ord, J.K.; Arnold, S. (1999), Kendall's Advanced Theory of Statistics, Volume 2A: Classical Inference and the Linear Model (Sixth ed.), Arnold, §20.2–20.3.
  4. Conover, W.J. (1999), "Chapter 3.4: The Sign Test", Practical Nonparametric Statistics (Third ed.), Wiley, pp. 157–176. ISBN 0-471-16068-7.
  5. Sprent, P. (1989), Applied Nonparametric Statistical Methods (Second ed.), Chapman & Hall. ISBN 0-412-44980-3.

This page was last edited on 31 March 2024, at 13:32
Basis of this page is in Wikipedia. Text is available under the CC BY-SA 3.0 Unported License. Non-text media are available under their specified licenses. Wikipedia® is a registered trademark of the Wikimedia Foundation, Inc. WIKI 2 is an independent company and has no affiliation with Wikimedia Foundation.