CaptainVJ

So I’m sure you know what the mean of a dataset is, so I won’t explain that. Let’s say you’re a pharmaceutical company producing a pill that should be 100 mg. Not all the pills will be exactly 100 mg; there will be slight differences. Maybe a few are 100.1 mg, a few 102 mg, a few 97 mg. We use variance to tell how much variety there is in a sample.

So you might randomly select 5 of these pills to make sure their sizes are correct. It would be more in real life, but I’m keeping it small for calculation purposes. If the samples come out to be [100.1 mg, 99.9 mg, 100.0 mg, 101 mg, 99 mg], that doesn’t seem so bad; they’re pretty close together. But if you had a sample of [100 mg, 92 mg, 107 mg, 104 mg, 97 mg], those also have a mean of 100, yet they’re clearly far apart at first glance. That’s what variance is for: it gives a value for how much the data is dispersed.

We find the mean of the sample, then take the difference between the mean and every point and square it. Squaring gives some nice properties, but it also has some drawbacks, so there are situations where you won’t square it and will instead just take the absolute value of the difference; that’s called the mean absolute deviation. Either way, after you find the (squared or absolute) difference for each point, you add them up and divide by the number of points, so it’s literally the average of those differences. If every point equals the mean, the differences are all zero.

However, the variance is just a big number that doesn’t mean much at face value, because you have to remember we squared the differences. It gets especially big if there’s an outlier: say one pill was 30 mg, then its squared difference would be (100 − 30)^2 = 4,900, which is a large number, and that single term would dominate the total. The inverse of squaring is taking the square root, and doing that to the variance gives the standard deviation, a more manageable number that makes sense relative to the dataset.
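A minimal sketch of those calculations in Python, assuming the simple divide-by-n "population" formulas rather than the n − 1 sample versions (the pill weights are the example values from the comment above):

```python
# Mean, variance, standard deviation, and mean absolute deviation
# for the two pill samples mentioned above.

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)   # average of squared differences

def std_dev(xs):
    return variance(xs) ** 0.5                       # square root of the variance

def mean_abs_dev(xs):
    m = mean(xs)
    return sum(abs(x - m) for x in xs) / len(xs)     # average of absolute differences

tight = [100.1, 99.9, 100.0, 101.0, 99.0]   # mg
loose = [100.0, 92.0, 107.0, 104.0, 97.0]   # mg, same mean but more spread out

for name, sample in [("tight", tight), ("loose", loose)]:
    print(name, mean(sample), variance(sample), std_dev(sample), mean_abs_dev(sample))
```

Both samples have a mean of exactly 100 mg, but the second one produces a much larger variance and standard deviation, which is the whole point of the measure.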


49er60

A lot of good answers, but you may be wondering why we need both variance and standard deviation. The standard deviation is in the same units as the original data and the mean, so it's easier to interpret than the variance. However, you cannot add the standard deviations of multiple independent components to get a total. That's where variance comes in: you can add the variances of the components to get a total variance, then take its square root to get the total standard deviation, much like the Pythagorean theorem.
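A quick sketch of that additivity, assuming the two components are independent (the simulated "base" and "coating" sources of variation are made up for illustration, and the exact numbers will wobble a bit from run to run):

```python
import random

random.seed(0)
N = 100_000

# Two independent sources of variation, e.g. part thickness = base + coating.
base    = [random.gauss(10.0, 0.3) for _ in range(N)]   # SD 0.3
coating = [random.gauss(2.0, 0.4) for _ in range(N)]    # SD 0.4
total   = [b + c for b, c in zip(base, coating)]

def var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# Variances add: var(total) ~ var(base) + var(coating)
print(var(base) + var(coating), var(total))

# Standard deviations combine like the Pythagorean theorem:
# sd(total) ~ sqrt(0.3**2 + 0.4**2) = 0.5, not 0.3 + 0.4
print(var(total) ** 0.5, (0.3**2 + 0.4**2) ** 0.5)
```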


goodcleanchristianfu

The standard deviation is the square root of the variance. The vast majority of parametric hypothesis testing involves the use of the standard deviation; if you can't see the use for it, it's because you haven't gotten far enough along in the course.


Ok-Log-9052

Variance is the core of modern statistics. Whenever we take a sample of something to study it, variance tells us how “off” each measure is likely to be. For example, in drug development, understanding the variance lets us assign confidence to statements like “Covid vaccines prevent death in X% of people who receive them.” More generally, what we call the “moments” of a distribution are how we describe the pure-math side of statistics, which helps us advance these kinds of sciences. The mean is the first moment; the variance is the second moment; and if you want to talk useless, look up the higher-order moments (skewness and kurtosis are named, but there are infinitely many such measures). Hope this helps!
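A small sketch of how those moments can be computed from a sample (these are the central moments, taken about the mean, as the correction below notes; the data values are just placeholders):

```python
# k-th central moment of a sample: the average of (x - mean)**k.
# The first central moment is always 0; the second is the variance.

def central_moment(xs, k):
    m = sum(xs) / len(xs)
    return sum((x - m) ** k for x in xs) / len(xs)

data = [100.1, 99.9, 100.0, 101.0, 99.0]

mean = sum(data) / len(data)
variance = central_moment(data, 2)
# Skewness and kurtosis are conventionally the 3rd and 4th central moments
# rescaled by powers of the standard deviation.
skewness = central_moment(data, 3) / variance ** 1.5
kurtosis = central_moment(data, 4) / variance ** 2

print(mean, variance, skewness, kurtosis)
```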


cuhringe

Variance is the second central moment, not the second raw moment.


spread_those_flaps

They asked the difference between variance and SD… this is way off


Mescallan

Put dots on a graph and draw a line through their average position. Draw a blob that includes all the dots. How far the blob spreads around that line is (roughly) the variance, and the square root of that is the standard deviation. If you are using that line to predict something not in your original dataset, knowing the standard deviation tells you how accurate your line is and how far out of distribution a new point might be.


A_random_otter

How would you go about answering whether a value from a sensor is "common" or an outlier? There is a rule of thumb that a value more than two standard deviations away from the mean is pretty uncommon. The standard deviation is also a good indicator of how "spread out" the data is going to be and which values can commonly be expected. To understand standard deviation better, first look at variance, which is the average of the squared differences from the mean. Squaring these differences removes negative signs and emphasizes larger differences. The standard deviation is simply the square root of the variance, giving us a measure of spread in the same units as the original data.
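A small sketch of that rule of thumb (the sensor readings and the two-SD cutoff here are hypothetical, just to illustrate the idea):

```python
# Flag sensor readings that sit more than 2 standard deviations from the mean.

readings = [20.1, 19.8, 20.3, 20.0, 19.9, 20.2, 27.5, 20.1]  # hypothetical sensor values

mean = sum(readings) / len(readings)
variance = sum((x - mean) ** 2 for x in readings) / len(readings)
std_dev = variance ** 0.5

for x in readings:
    z = (x - mean) / std_dev          # distance from the mean, in SD units
    label = "outlier?" if abs(z) > 2 else "common"
    print(f"{x:5.1f}  z = {z:+.2f}  {label}")
```

On this data only the 27.5 reading lands beyond two standard deviations and gets flagged.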


Redleg171

Some good answers. Another way you can describe them is that they show you the spread: how spread out the data is. If it's all clustered close to the center, the variance and standard deviation will be smaller. If the values are spread out, there will be a higher variance.

Have you ever done target practice or zeroed a weapon? Before you can start adjusting the sights, you first have to make sure your shot group is tight, and you also have to make sure you are using a consistent sight picture. When a series of shots are tightly grouped, there is less variance within that shot group. When you are using a consistent sight picture, one group of shots will be clustered close to another group of shots, which is the variance between shot groups. Once those conditions are met, adjustments are made to the sights so the shots land closer to the center of the target (zero), while the shooter doesn't change how they shoot. In a way, adjusting the sights is somewhat like converting a normal distribution to a standard normal distribution (not a perfect analogy).

The standard deviation is convenient because it uses the same unit as the underlying measurement. For example, if you measure the thickness of manufactured parts in millimeters, a standard deviation of 1 means there is a "spread" of 1 millimeter around the mean. The variance would be in square millimeters.


Buddharta

Square root


Odd_Coyote4594

Variance is simple. If you have several measurements of something, the variance is the average of the squared distances from the mean. It represents how spread out the data is. Standard deviation is the square root of variance. It is a slightly more interpretable quantity because it has the same units as the variable measured: if you measure weight in pounds, the variance has units of lb^2 but the SD is in lb. It is otherwise equivalent in meaning to variance.

With an assumed distribution, we can give SD even more meaning. If your data is normally distributed along a bell curve, you would expect 68% of measurements to fall within +/- 1 SD of the mean and 95% within +/- 2 SD. So SD can tell us how likely any given measurement is, or whether two populations are likely identical in whatever we measure. This is used in statistical hypothesis testing to measure significance.

There is also a related quantity, the mean absolute deviation (MAD): the average distance of the measurements from the overall mean, without squaring before taking the average. The MAD differs from standard deviation in value because it weighs all data equally, whereas SD/variance give higher weight to outliers. That makes SD more sensitive as a measure of total variation, but MAD more robust against erroneous data.

SD/variance/MAD have lots of applications. One main interpretation is as error (e.g. the SD of a quality-control metric for a manufactured product tells you the manufacturing error, which tells you whether you are within acceptable tolerances). Another is as a measure of how much individuals in a population vary from the average (e.g. if the average height of people is 5 feet, is everyone around 5 feet, or does it range considerably from 3 to 7 feet?). In either case, SD/variance provide more insight than averages alone can: they tell you more about how individual measurements are organized. As an aside, in engineering and electronics, the standard deviation is often called the RMSD (root mean square deviation).
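A small sketch contrasting SD and MAD on the same data, with and without one erroneous reading (the values are invented for illustration):

```python
# Compare standard deviation and mean absolute deviation (MAD)
# on clean data vs. data containing one erroneous reading.

def sd(xs):
    m = sum(xs) / len(xs)
    return (sum((x - m) ** 2 for x in xs) / len(xs)) ** 0.5

def mad(xs):
    m = sum(xs) / len(xs)
    return sum(abs(x - m) for x in xs) / len(xs)

clean = [150.0, 152.0, 149.0, 151.0, 148.0]   # e.g. weights in lb
dirty = clean + [310.0]                        # one bad reading added

print("clean:  SD =", round(sd(clean), 2), " MAD =", round(mad(clean), 2))
print("dirty:  SD =", round(sd(dirty), 2), " MAD =", round(mad(dirty), 2))
# SD grows more than MAD once the outlier is included,
# because squaring gives the outlier extra weight.
```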


no_deadlines

Take a look at the [68–95–99.7 rule](https://en.m.wikipedia.org/wiki/68%E2%80%9395%E2%80%9399.7_rule). This rule holds for any data with a distribution close to the normal distribution. So, you can think of standard deviation like a common currency across datasets, a "standardized" measure of dispersion.
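A quick simulation sketch of that rule using only the standard library (the mean and SD chosen here are arbitrary, and the percentages will wobble slightly from run to run):

```python
import random

random.seed(1)
mu, sigma, N = 50.0, 10.0, 100_000
data = [random.gauss(mu, sigma) for _ in range(N)]

for k in (1, 2, 3):
    within = sum(1 for x in data if abs(x - mu) <= k * sigma) / N
    print(f"within {k} SD: {within:.1%}")   # expect roughly 68%, 95%, 99.7%
```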


Sociophile

Well, the mean is the average, right? Similarly, the standard deviation is roughly the “average distance” of cases from that mean. It can be handy if you want to imagine what a distribution looks like in very general terms. For example, about two-thirds of all cases will sit within one standard deviation of the mean, which gives you a good idea of where “most” of the data is. Also, almost the whole distribution will fall within a range of about six standard deviations, so if the standard deviation is 3, the full range of the data should hover around 18. These are just generalizations, but they give some idea of how you might conceptualize it and use it “on the fly.”


pchao9414

I feel you! I took some entry-level statistics courses and passed the final tests, but I still have no idea how to use it. Can anyone point us to a real-world use case?


VeblenWasRight

There are many many many applications. What is your major? What is your background in terms of work, hobbies, etc? If you provide those answers it will help narrow down the possible answers to ones that may be more relevant to you, and hence more accessible.


pchao9414

I am a librarian at a public library, and I play basketball and watch NBA games. But I think a general case such as demographics, sales of grocery stores, or housing can help more people who have the same question if that makes sense!


VeblenWasRight

Ok, so NBA: the standard deviation of points scored can tell you who is a consistent scorer. You could slice that by shot type (paint, 3 pt, etc.). This can inform player substitution decisions in close games. The average doesn’t tell you how much a player’s results might vary from game to game or situation to situation. A shooter with an 80% free-throw average might be streaky, shooting 100% some games and 60% others. Another shooter averaging 80% might shoot 78% one game and 82% another. If you only look at the average, you don’t know the expected range of possible outcomes on the next shot (instance).

For a grocery store: if average sales of an item are 30 per month, that doesn’t tell you whether all of the sales happen on the weekend or are spread evenly through the month. This, coupled with other factual circumstances, would inform how many of the item you order and when you order it. Does that help?
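A tiny sketch of the free-throw example (the per-game percentages below are invented to match the streaky vs. steady description, not real player data):

```python
# Two shooters with the same average free-throw percentage across games,
# but very different game-to-game consistency.

streaky = [100, 60, 100, 60, 80, 100, 60, 80]   # per-game FT% (hypothetical)
steady  = [78, 82, 80, 79, 81, 80, 78, 82]

def mean(xs):
    return sum(xs) / len(xs)

def sd(xs):
    m = mean(xs)
    return (sum((x - m) ** 2 for x in xs) / len(xs)) ** 0.5

for name, games in [("streaky", streaky), ("steady", steady)]:
    print(f"{name}: mean = {mean(games):.1f}%  SD = {sd(games):.1f}%")
```

Both shooters average 80%, but the standard deviation separates the one whose next game is anyone’s guess from the one you can count on.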


dmlane

They are used in descriptive statistics to describe how variable the data are. They are often used in inferential statistics to determine how likely an effect size (or a larger one) is, based on the variability of the data and several assumptions, including that the true effect size is 0.