
One- and two-tailed tests

From Wikipedia, the free encyclopedia

A two-tailed test applied to the normal distribution.
A one-tailed test, showing the p-value as the size of one tail.

In statistical significance testing, a one-tailed test and a two-tailed test are alternative ways of computing the statistical significance of a parameter inferred from a data set, in terms of a test statistic. A two-tailed test is appropriate if the estimated value may be more than or less than the reference value, for example, whether a test taker may score above or below the historical average. A one-tailed test is appropriate if the estimated value may depart from the reference value in only one direction, for example, whether a machine produces more than one-percent defective products. Alternative names are one-sided and two-sided tests; the terminology "tail" is used because the extreme portions of distributions, where observations lead to rejection of the null hypothesis, are small and often "tail off" toward zero as in the normal distribution or "bell curve", pictured on the right.
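As a minimal numerical sketch (the scenario and numbers here are illustrative, not taken from the article), the two kinds of p-value can be computed for a z statistic under a standard normal null using only Python's standard library:

```python
# One- vs two-tailed p-values for a z statistic under the
# standard normal null distribution (illustrative numbers).
import math

def phi(z):
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

z = 2.0  # hypothetical observed statistic, e.g. (sample mean - mu0) / SE

p_one_upper = 1 - phi(z)       # one-tailed: only "sufficiently large" counts
p_one_lower = phi(z)           # one-tailed: only "sufficiently small" counts
p_two = 2 * (1 - phi(abs(z)))  # two-tailed: extreme in either direction

print(round(p_one_upper, 4))  # 0.0228
print(round(p_two, 4))        # 0.0455
```

For the symmetric normal distribution the two-tailed p-value is exactly twice the matching one-tailed one, as the article discusses below.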




One-tailed tests are used for asymmetric distributions that have a single tail, such as the chi-squared distribution, which are common in measuring goodness-of-fit, or for one side of a distribution that has two tails, such as the normal distribution, which is common in estimating location; this corresponds to specifying a direction. Two-tailed tests are only applicable when there are two tails, such as in the normal distribution, and correspond to considering either direction significant.[1][2][3]

In the approach of Ronald Fisher, the null hypothesis H0 is rejected when the p-value of the test statistic is sufficiently extreme (relative to the test statistic's sampling distribution) and thus judged unlikely to be the result of chance. In a one-tailed test, "extreme" is decided beforehand as meaning either "sufficiently small" or "sufficiently large"; values in the other direction are considered not significant. In a two-tailed test, "extreme" means "either sufficiently small or sufficiently large", and values in either direction are considered significant.[4] For a given test statistic there is a single two-tailed test and two one-tailed tests, one for each direction. Given data significant at a given level in a two-tailed test, the corresponding one-tailed test for the same test statistic will consider the data either twice as significant (half the p-value), if the data lie in the direction specified by the test, or not significant at all (p-value above 0.5), if the data lie in the opposite direction.

For example, if flipping a coin, testing whether it is biased towards heads is a one-tailed test: data of "all heads" would be seen as highly significant, while data of "all tails" would be not significant at all (p = 1). By contrast, testing whether it is biased in either direction is a two-tailed test, and either "all heads" or "all tails" would be seen as highly significant data. In medical testing, one is generally interested in whether a treatment results in outcomes better than chance, which suggests a one-tailed test; but a worse outcome is also interesting to the scientific field, so one should use a two-tailed test, which instead tests whether the treatment results in outcomes different from chance, either better or worse.[5] In the archetypal lady tasting tea experiment, Fisher tested whether the lady in question was better than chance at distinguishing two types of tea preparation, not whether her ability was different from chance, and thus he used a one-tailed test.

Coin flipping example

In coin flipping, the null hypothesis is a sequence of Bernoulli trials with probability 0.5, yielding a random variable X which is 1 for heads and 0 for tails, and a common test statistic is the sample mean (of the number of heads). If testing for whether the coin is biased towards heads, a one-tailed test would be used – only large numbers of heads would be significant. In that case a data set of five heads (HHHHH), with sample mean of 1, has a (1/2)^5 = 1/32 ≈ 0.03 chance of occurring (5 consecutive flips with 2 outcomes each), and thus would have p = 1/32 ≈ 0.03 and would be significant (rejecting the null hypothesis) if using 0.05 as the cutoff. However, if testing for whether the coin is biased towards heads or tails, a two-tailed test would be used, and a data set of five heads (sample mean 1) is as extreme as a data set of five tails (sample mean 0), so the p-value would be 2/32 = 1/16 ≈ 0.06, and this would not be significant (not rejecting the null hypothesis) if using 0.05 as the cutoff.
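The arithmetic above can be checked with a short standard-library sketch that computes the exact binomial tail probabilities:

```python
# Exact one- and two-tailed p-values for five flips of a fair coin.
import math

def p_at_least(k, n, p=0.5):
    """P(number of heads >= k) under the null hypothesis of a fair coin."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(k, n + 1))

n = 5
# One-tailed test (biased towards heads): only many heads is extreme.
p_one = p_at_least(5, n)  # HHHHH: (1/2)^5 = 1/32
# Two-tailed test: five tails (sample mean 0) is exactly as extreme as
# five heads, so both tails contribute.
p_two = 2 * p_one         # 2/32 = 1/16

print(p_one)  # 0.03125
print(p_two)  # 0.0625
```

With a 0.05 cutoff, the one-tailed result (0.03125) rejects the null hypothesis while the two-tailed result (0.0625) does not, matching the discussion above.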


p-value of chi-squared distribution for different number of degrees of freedom

The p-value was introduced by Karl Pearson[6] in the Pearson's chi-squared test, where he defined P (original notation) as the probability that the statistic would be at or above a given level. This is a one-tailed definition, and the chi-squared distribution is asymmetric, only assuming positive or zero values, and has only one tail, the upper one. It measures goodness of fit of data with a theoretical distribution, with zero corresponding to exact agreement with the theoretical distribution; the p-value thus measures how likely the fit would be this bad or worse.
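As an illustrative sketch of Pearson's one-tailed P, the chi-squared upper-tail probability can be computed with the standard library alone; the closed form used below holds only for an even number of degrees of freedom, an assumption made here for simplicity:

```python
# One-tailed (upper) p-value for the chi-squared distribution.
# For even df = 2k the survival function has the closed form
#   P(X >= x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
import math

def chi2_sf(x, df):
    """Upper-tail probability of chi-squared, valid for even df only."""
    assert df % 2 == 0 and df > 0
    k = df // 2
    return math.exp(-x / 2) * sum((x / 2) ** i / math.factorial(i)
                                  for i in range(k))

# With df = 2 the tail reduces to exp(-x/2); a statistic near 5.99
# sits at the conventional 5% level.
print(round(chi2_sf(5.99, 2), 4))  # 0.05
```

Because the chi-squared distribution only takes non-negative values and has a single (upper) tail, this one-tailed p-value is the only natural choice, as the text notes.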

Normal distribution, showing two tails

The distinction between one-tailed and two-tailed tests was popularized by Ronald Fisher in the influential book Statistical Methods for Research Workers,[7] where he applied it especially to the normal distribution, which is a symmetric distribution with two equal tails. The normal distribution is commonly used in estimating location, rather than goodness of fit, and its two tails correspond to the estimate of location being above or below the theoretical location (e.g., sample mean compared with theoretical mean). In the case of a symmetric distribution such as the normal distribution, the one-tailed p-value is exactly half the two-tailed p-value:[7]

Some confusion is sometimes introduced by the fact that in some cases we wish to know the probability that the deviation, known to be positive, shall exceed an observed value, whereas in other cases the probability required is that a deviation, which is equally frequently positive and negative, shall exceed an observed value; the latter probability is always half the former.

Fisher emphasized the importance of measuring the tail – the observed value of the test statistic and all more extreme values – rather than simply the probability of the specific outcome itself, in his The Design of Experiments (1935).[8] He explains that this is because a specific set of data may be unlikely under the null hypothesis even when more extreme outcomes, taken together, are not; seen in this light, data that are specifically unlikely but not extreme should not be considered significant.
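Fisher's distinction between the probability of the specific outcome and the probability of the tail can be illustrated with a small binomial sketch (the numbers are hypothetical, chosen so the two quantities fall on opposite sides of 0.05):

```python
# Point probability of a specific outcome vs the tail probability
# (that outcome and all more extreme), for 20 flips of a fair coin.
import math

def binom_pmf(k, n, p=0.5):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

n = 20
k = 14  # hypothetical observed number of heads

point = binom_pmf(k, n)                               # P(X = 14)
tail = sum(binom_pmf(i, n) for i in range(k, n + 1))  # P(X >= 14)

print(round(point, 4))  # 0.037
print(round(tail, 4))   # 0.0577
```

The specific outcome alone looks "significant" at the 5% level, but the tail that Fisher argues should carry the significance judgement does not.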

Specific tests

If the test statistic follows a Student's t-distribution under the null hypothesis – which is common where the underlying variable follows a normal distribution with an unknown scaling factor – then the test is referred to as a one-tailed or two-tailed t-test. If the test is performed using the actual population mean and variance, rather than estimates from a sample, it would be called a one-tailed or two-tailed Z-test.

The statistical tables for t and for Z provide critical values for both one- and two-tailed tests. That is, they provide the critical values that cut off an entire region at one or the other end of the sampling distribution as well as the critical values that cut off the regions (of half the size) at both ends of the sampling distribution.
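As a sketch of where such table entries come from, the familiar Z critical values can be recovered by numerically inverting the standard normal CDF with bisection (assuming the conventional 5% significance level; standard library only):

```python
# Recover one- and two-tailed Z critical values by bisection.
import math

def phi(z):
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

def z_critical(tail_area):
    """z such that the upper-tail area beyond z equals tail_area."""
    lo, hi = 0.0, 10.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if 1 - phi(mid) > tail_area:
            lo = mid  # tail still too big: move right
        else:
            hi = mid
    return hi

alpha = 0.05
one_tailed = z_critical(alpha)      # cuts off alpha in one tail
two_tailed = z_critical(alpha / 2)  # cuts off alpha/2 in each tail

print(round(one_tailed, 3))  # 1.645
print(round(two_tailed, 3))  # 1.96
```

The two-tailed critical value (about 1.96) is larger than the one-tailed one (about 1.645) because the same total significance level is split between the two tails.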

References

  1. ^ Kock, N. (2015). One-tailed or two-tailed P values in PLS-SEM? International Journal of e-Collaboration, 11(2), 1-7.
  2. ^ Mundry, R.; Fischer, J. (1998). "Use of Statistical Programs for Nonparametric Tests of Small Samples Often Leads to Incorrect P Values: Examples from Animal Behaviour". Animal Behaviour. 56 (1): 256–259. doi:10.1006/anbe.1998.0756.
  3. ^ Pillemer, D. B. (1991). "One-versus two-tailed hypothesis tests in contemporary educational research". Educational Researcher. 20 (9): 13–17. doi:10.3102/0013189X020009013.
  4. ^ Freund, John E. (1984). Modern Elementary Statistics (6th ed.). Prentice Hall. ISBN 0-13-593525-3. (Section "Inferences about Means", chapter "Significance Tests", p. 289.)
  5. ^ Bland, J. M.; Bland, D. G. (1994). "Statistics Notes: One and two sided tests of significance". BMJ.
  6. ^ Pearson, Karl (1900). "On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling" (PDF). Philosophical Magazine. Series 5. 50 (302): 157–175. doi:10.1080/14786440009463897.
  7. ^ a b Fisher, Ronald (1925). Statistical Methods for Research Workers. Edinburgh: Oliver & Boyd. ISBN 0-05-002170-2.
  8. ^ Fisher, Ronald A. (1971) [1935]. The Design of Experiments (9th ed.). Macmillan. ISBN 0-02-844690-9.
This page was last edited on 25 November 2018, at 23:27
Basis of this page is in Wikipedia. Text is available under the CC BY-SA 3.0 Unported License. Non-text media are available under their specified licenses. Wikipedia® is a registered trademark of the Wikimedia Foundation, Inc. WIKI 2 is an independent company and has no affiliation with Wikimedia Foundation.