Total sum of squares

From Wikipedia, the free encyclopedia

In statistical data analysis the total sum of squares (TSS or SST) is a quantity that appears as part of a standard way of presenting results of such analyses. For a set of observations $y_i$, $i \leq n$, it is defined as the sum over all squared differences between the observations and their overall mean $\bar{y}$:[1]

$$\mathrm{TSS} = \sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2$$
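
As a concrete illustration, here is a minimal Python sketch of this definition (it uses the same nine data points that the Khan Academy transcript further down the page works through):

    import numpy as np

    # The nine observations from the transcript below: three groups of three
    y = np.array([3, 2, 1, 5, 3, 4, 5, 6, 7], dtype=float)

    # TSS: sum of squared differences between each observation
    # and their overall mean
    tss = np.sum((y - y.mean()) ** 2)
    print(tss)  # 30.0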

For wide classes of linear models, the total sum of squares equals the explained sum of squares plus the residual sum of squares. For proof of this in the multivariate OLS case, see partitioning in the general OLS model.
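
The partition is easy to check numerically. The following sketch fits an ordinary least squares line (with an intercept, which the identity requires) to synthetic data and verifies that the total sum of squares equals the explained plus the residual sum of squares; the data-generating numbers are arbitrary choices for the example:

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=50)
    y = 2.0 + 1.5 * x + rng.normal(size=50)  # synthetic linear data

    # OLS fit with an intercept column
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    y_hat = X @ beta

    tss = np.sum((y - y.mean()) ** 2)      # total sum of squares
    ess = np.sum((y_hat - y.mean()) ** 2)  # explained sum of squares
    rss = np.sum((y - y_hat) ** 2)         # residual sum of squares
    print(np.isclose(tss, ess + rss))      # True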

In analysis of variance (ANOVA) the total sum of squares is the sum of the so-called "within-samples" sum of squares and the "between-samples" sum of squares, i.e., partitioning of the sum of squares. In multivariate analysis of variance (MANOVA) the following equation applies:[2]

$$\mathbf{T} = \mathbf{W} + \mathbf{B}$$

where T is the total sum of squares and products (SSP) matrix, W is the within-samples SSP matrix and B is the between-samples SSP matrix. Similar terminology may also be used in linear discriminant analysis, where W and B are respectively referred to as the within-groups and between-groups SSP matrices.[2]
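
As a sketch of this decomposition, the following Python example builds the three SSP matrices for made-up bivariate data (three hypothetical groups; the group sizes and locations are arbitrary) and checks that T = W + B:

    import numpy as np

    rng = np.random.default_rng(1)
    # Three hypothetical groups of 10 bivariate observations each
    groups = [rng.normal(loc=m, size=(10, 2)) for m in (0.0, 1.0, 2.0)]

    data = np.vstack(groups)
    grand_mean = data.mean(axis=0)

    # T: deviations of every observation from the grand mean
    T = (data - grand_mean).T @ (data - grand_mean)

    # W: deviations of each observation from its own group mean
    W = sum((g - g.mean(axis=0)).T @ (g - g.mean(axis=0)) for g in groups)

    # B: group means around the grand mean, weighted by group size
    B = sum(len(g) * np.outer(g.mean(axis=0) - grand_mean,
                              g.mean(axis=0) - grand_mean) for g in groups)

    print(np.allclose(T, W + B))  # True: T = W + B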

YouTube Encyclopedic

  • ANOVA 1: Calculating SST (total sum of squares) | Probability and Statistics | Khan Academy
  • Total Sum of Squares
  • ANOVA 2: Calculating SSW and SSB (total sum of squares within and between) | Khan Academy

Transcription

In this video and the next few videos, we're just really going to be doing a bunch of calculations about this data set right over here. And hopefully, just going through those calculations will give you an intuitive sense of what the analysis of variance is all about. Now, the first thing I want to do in this video is calculate the total sum of squares. So I'll call that SST. SS-- sum of squares total. And you could view it as really the numerator when you calculate variance. So you're just going to take the distance between each of these data points and the mean of all of these data points, square them, and just take that sum. We're not going to divide by the degrees of freedom, which you would normally do if you were calculating sample variance.

Now, what is this going to be? Well, the first thing we need to do, we have to figure out the mean of all of this stuff over here. And I'm actually going to call that the grand mean. And I'm going to show you in a second that it's the same thing as the mean of the means of each of these data sets. So let's calculate the grand mean. So it's going to be 3 plus 2 plus 1 plus 5 plus 3 plus 4 plus 5 plus 6 plus 7. And then we have nine data points here so we'll divide by 9. And what is this going to be equal to? 3 plus 2 plus 1 is 6. 6 plus-- let me just add. So these are 6. 5 plus 3 plus 4 is 12. And then 5 plus 6 plus 7 is 18. And then 6 plus 12 is 18 plus another 18 is 36, divided by 9 is equal to 4.

And let me show you that that's the exact same thing as the mean of the means. So the mean of this group 1 over here-- let me do it in that same green-- the mean of group 1 over here is 3 plus 2 plus 1. That's that 6 right over here, divided by 3 data points so that will be equal to 2. The mean of group 2, the sum here is 12. We saw that right over here. 5 plus 3 plus 4 is 12, divided by 3 is 4 because we have three data points. And then the mean of group 3, 5 plus 6 plus 7 is 18 divided by 3 is 6. So if you were to take the mean of the means, which is another way of viewing this grand mean, you have 2 plus 4 plus 6, which is 12, divided by 3 means here. And once again, you would get 4. So you could view this as the mean of all of the data in all of the groups or the mean of the means of each of these groups.

But either way, now that we've calculated it, we can actually figure out the total sum of squares. So let's do that. So it's going to be equal to 3 minus 4-- the 4 is this 4 right over here-- squared plus 2 minus 4 squared plus 1 minus 4 squared. Now, I'll do these guys over here in purple. Plus 5 minus 4 squared plus 3 minus 4 squared plus 4 minus 4 squared. Let me scroll over a little bit. Now, we only have three left, plus 5 minus 4 squared plus 6 minus 4 squared plus 7 minus 4 squared.

And what does this give us? So up here, this is going to be equal to 3 minus 4. Difference is 1. You square it. It's actually negative 1, but you square it, you get 1, plus you get negative 2 squared is 4, plus negative 3 squared. Negative 3 squared is 9. And then we have here in the magenta 5 minus 4 is 1 squared is still 1. 3 minus 4 squared is 1. You square it again, you still get 1. And then 4 minus 4 is just 0. So we could-- well, I'll just write the 0 there just to show you that we actually calculated that. And then we have these last three data points. 5 minus 4 squared. That's 1. 6 minus 4 squared. That is 4, right? That's 2 squared. And then plus 7 minus 4 is 3 squared is 9.

So what's this going to be equal to? So I have 1 plus 4 plus 9 right over here. That's 5 plus 9. This right over here is 14, right? 5 plus-- yup, 14. And then we also have another 14 right over here because we have a 1 plus 4 plus 9. So that right over there is also 14. And then we have 2 over here. So it's going to be 28-- 14 times 2, 14 plus 14 is 28-- plus 2 is 30. Is equal to 30.

So our total sum of squares-- and actually, if we wanted the variance here, we would divide this by the degrees of freedom. And we've learned multiple times the degrees of freedom here so let's say that we have-- so we know that we have m groups over here. So let me just write it as m and I'm not going to prove things rigorously here, but I want to show you where some of these strange formulas that show up in statistics books actually come from without proving it rigorously. More to give you the intuition. So we have m groups here. And each group here has n members. So how many total members do we have here? Well, we had m times n or 9, right? 3 times 3 total members.

So our degrees of freedom-- and remember, you have however many data points you had minus 1 degrees of freedom because if you know the mean of means, if you assume you knew that, then only 9 minus 1, only eight of these are going to give you new information because if you know that, you could calculate the last one. Or it really doesn't have to be the last one. If you have the other eight, you could calculate this one. If you have eight of them, you could always calculate the ninth one using the mean of means. So one way to think about it is that there's only eight independent measurements here. Or if we want to talk generally, there are m times n-- so that tells us the total number of samples-- minus 1 degrees of freedom. And if we were actually calculating the variance here, we would just divide 30 by m times n minus 1 or this is another way of saying eight degrees of freedom for this exact example. We would take 30 divided by 8 and we would actually have the variance for this entire group, for the group of nine when you combine them.

I'll leave you here in this video. In the next video, we're going to try to figure out how much of this total variance, how much of this total squared sum, total variation comes from the variation within each of these groups versus the variation between the groups. And I think you get a sense of where this whole analysis of variance is coming from. It's the sense that, look, there's a variance of this entire sample of nine, but some of that variance-- if these groups are different in some way-- might come from the variation from being in different groups versus the variation from being within a group. And we're going to calculate those two things and we're going to see that they're going to add up to the total squared sum variation.
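
The arithmetic in the transcript can be reproduced in a few lines of Python (the same three groups of three data points):

    import numpy as np

    groups = np.array([[3, 2, 1],   # group 1, mean 2
                       [5, 3, 4],   # group 2, mean 4
                       [5, 6, 7]],  # group 3, mean 6
                      dtype=float)

    grand_mean = groups.mean()                # 4.0, also the mean of the group means
    sst = np.sum((groups - grand_mean) ** 2)  # 30.0

    m, n = groups.shape                        # m groups with n members each
    print(grand_mean, sst, sst / (m * n - 1))  # 4.0 30.0 3.75 (8 degrees of freedom)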

References

  1. ^ Everitt, B. S. (2002). The Cambridge Dictionary of Statistics. Cambridge University Press. ISBN 0-521-81099-X.
  2. ^ a b Mardia, K. V.; Kent, J. T.; Bibby, J. M. (1979). Multivariate Analysis. Academic Press. ISBN 0-12-471252-5. Especially chapters 11 and 12.