What Is Variance?
Variance is a statistical measure that quantifies how spread out a set of data values is from their mean (average). A low variance indicates that data points tend to be close to the mean, while a high variance indicates that data points are spread out over a wider range. Mathematically, variance is the average of the squared differences from the mean: σ² = Σ(xᵢ − μ)² / N for a population, or s² = Σ(xᵢ − x̄)² / (n−1) for a sample.
Variance is one of the most fundamental concepts in statistics and probability theory. It forms the basis for standard deviation, confidence intervals, hypothesis testing, and analysis of variance (ANOVA). In finance, variance measures investment volatility. In manufacturing, it quantifies process consistency. In science, it indicates measurement precision. Understanding variance is essential for anyone working with data.
Population vs. Sample Variance
Population variance (σ²) is used when you have data for every member of the entire population. The denominator is N, the total population size. Sample variance (s²) is used when you have a subset (sample) of the population. The denominator is n−1 (called Bessel's correction), which compensates for the fact that a sample tends to underestimate the true population variance.
In practice, sample variance is much more common because we rarely have access to entire populations. When analyzing survey data, experimental results, or any finite dataset drawn from a larger population, use sample variance (n−1). Use population variance only when your data represents the complete set of values you care about — for example, all test scores from a specific class, or all sales from a specific quarter.
Variance and Standard Deviation
Standard deviation is simply the square root of variance: σ = √σ² or s = √s². While variance measures spread in squared units (making it harder to interpret directly), standard deviation is in the same units as the original data, making it more intuitive. For example, if test scores have a variance of 100, the standard deviation is 10 points.
The empirical rule (68-95-99.7 rule) states that for normally distributed data: approximately 68% of values fall within 1 standard deviation of the mean, 95% within 2 standard deviations, and 99.7% within 3 standard deviations. This makes standard deviation a powerful tool for understanding the distribution of data and identifying outliers.
Step-by-Step Calculation
To calculate variance: (1) Find the mean of your data by summing all values and dividing by the count. (2) For each data point, subtract the mean and square the result. (3) Sum all the squared differences. (4) Divide by N (population) or n−1 (sample). The result is the variance. Take the square root for the standard deviation.
Example: Data = {4, 8, 6, 5, 3, 7, 8, 9}. Mean = 50/8 = 6.25. Squared differences: (4−6.25)²=5.0625, (8−6.25)²=3.0625, (6−6.25)²=0.0625, (5−6.25)²=1.5625, (3−6.25)²=10.5625, (7−6.25)²=0.5625, (8−6.25)²=3.0625, (9−6.25)²=7.5625. Sum = 31.5. Population variance = 31.5/8 = 3.9375. Sample variance = 31.5/7 = 4.5.
Applications of Variance
Finance: Variance of investment returns measures volatility (risk). Higher variance means more unpredictable returns. Portfolio diversification aims to reduce overall variance by combining assets with low correlation.
Quality control: Manufacturing processes use variance to ensure products meet specifications. Low variance indicates consistent production. Statistical process control (SPC) charts monitor variance over time to detect process changes.
Research: Variance is central to statistical inference. Analysis of Variance (ANOVA) compares means across groups by analyzing their variances. T-tests, F-tests, and chi-square tests all use variance in their calculations.
Machine learning: In algorithms like PCA, variance determines which features carry the most information. Bias-variance tradeoff is a fundamental concept in model selection and tuning.