Questions: 4–6 per exam
Difficulty: Medium
Importance: Key for Class 12 Boards and for the statistics sections of competitive exams
Overview
Correlation is a statistical measure that quantifies the strength and direction of the relationship between two variables. It is a cornerstone topic for board examinations because it connects data analysis with algebraic computation. Mastery involves both reading scatter diagrams and computing the coefficients precisely: Karl Pearson's r for linear relationships and Spearman's coefficient for ranked data.
Scatter Diagram
A scatter diagram provides a graphical representation of the relationship between two variables plotted on an X-Y plane. It is used as the first step to visualize whether the correlation is positive, negative, or non-existent.
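- Points rising from left to right indicate positive correlation
- Points falling from left to right indicate negative correlation
- Points scattered with no visible pattern indicate no correlation
- Points lying close to a straight line indicate a strong linear relationship (a consistent rate of change)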
Karl Pearson’s Coefficient of Correlation
Pearson's r measures the degree of linear association between two quantitative variables. The coefficient always lies between -1 and +1 inclusive, where -1 represents a perfect negative (inverse) linear relationship and +1 a perfect positive (direct) linear relationship.
- Formula: r = Cov(X,Y) / (σx * σy), where σx and σy are the standard deviations of X and Y
- Deviation form: r = Σ((x-x̄)(y-ȳ)) / sqrt(Σ(x-x̄)² * Σ(y-ȳ)²)
- Value of 0 signifies no linear relationship
- Highly sensitive to outliers
- Unit-free measurement
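The deviation formula can be checked with a short calculation. Below is a minimal Python sketch using only the standard library; the function name `pearson_r` and the sample data are illustrative assumptions, not taken from any textbook exercise.

```python
import math

def pearson_r(x, y):
    """Karl Pearson's r via the deviation formula:
    r = Σ(x - x̄)(y - ȳ) / sqrt(Σ(x - x̄)² * Σ(y - ȳ)²)
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be equal-length sequences with at least 2 values")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    dx = [xi - x_bar for xi in x]  # deviations from the mean of X
    dy = [yi - y_bar for yi in y]  # deviations from the mean of Y
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    sum_dx2 = sum(a * a for a in dx)
    sum_dy2 = sum(b * b for b in dy)
    denom = math.sqrt(sum_dx2 * sum_dy2)
    if denom == 0:
        raise ValueError("r is undefined when either variable is constant")
    return sum_dxdy / denom

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 3))  # 0.775
assert -1 <= r <= 1  # the range check from the exam tip
```

By hand: x̄ = 3, ȳ = 4, Σ(x-x̄)(y-ȳ) = 6, Σ(x-x̄)² = 10 and Σ(y-ȳ)² = 6, so r = 6 / sqrt(60) ≈ 0.775, a fairly strong positive correlation.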
Spearman’s Rank Correlation
When data are qualitative but can be ranked, or contain extreme outliers, Spearman's rank correlation is preferred over Pearson's r. It uses the ranks assigned to the observations rather than their actual values.
- Formula: R = 1 - (6 * ΣD²) / (n * (n² - 1)), where n is the number of pairs of observations
- D is the difference between the two ranks of each pair (rank in X minus rank in Y)
- Used when data is ordinal in nature
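- Tied values: give each tied value the average of the ranks it would otherwise occupy, then add (m³ - m)/12 to ΣD² for every group of m tied values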
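The ranking steps, including the tie correction, can be sketched in Python as below. The helper names (`average_ranks`, `tie_correction`, `spearman_rank_corr`) are hypothetical. The code ranks from the smallest value upward; many textbooks give rank 1 to the largest value instead, which leaves R unchanged as long as both variables are ranked in the same direction.

```python
from collections import Counter

def average_ranks(values):
    """Rank from smallest (rank 1) upward; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of equal values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # sorted positions i..j correspond to ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def tie_correction(values):
    """Sum of (m³ - m)/12 over every group of m tied values."""
    return sum((m ** 3 - m) / 12 for m in Counter(values).values() if m > 1)

def spearman_rank_corr(x, y):
    """R = 1 - 6 * (ΣD² + correction factors) / (n³ - n)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be equal-length sequences with at least 2 values")
    rx, ry = average_ranks(x), average_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    cf = tie_correction(x) + tie_correction(y)  # 0 when there are no ties
    return 1 - 6 * (sum_d2 + cf) / (n ** 3 - n)

# Two X values are tied at 20, so both receive rank 2.5
x = [10, 20, 20, 30]
y = [1, 2, 3, 4]
print(round(spearman_rank_corr(x, y), 3))  # 0.9
```

Here ΣD² = 0.5 and the single tied pair (m = 2) adds (8 - 2)/12 = 0.5, so R = 1 - 6 * 1 / 60 = 0.9.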
Formula Sheet
Pearson r: Σ((x-x̄)(y-ȳ)) / sqrt(Σ(x-x̄)² * Σ(y-ȳ)²)
Spearman R: 1 - (6 * ΣD²) / (n³ - n)
Spearman R with tied ranks: 1 - 6 * (ΣD² + Σ(m³ - m)/12) / (n³ - n)
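The two Pearson forms agree because the 1/n factors in the covariance and in the standard deviations cancel, and the two Spearman denominators are the same polynomial. The derivation below assumes the population (divide-by-n) definitions; dividing by n - 1 throughout cancels in the same way.

```latex
\mathrm{Cov}(X,Y) = \frac{1}{n}\sum (x-\bar{x})(y-\bar{y}), \qquad
\sigma_x = \sqrt{\frac{1}{n}\sum (x-\bar{x})^2}, \qquad
\sigma_y = \sqrt{\frac{1}{n}\sum (y-\bar{y})^2}

r = \frac{\frac{1}{n}\sum (x-\bar{x})(y-\bar{y})}
         {\sqrt{\frac{1}{n}\sum (x-\bar{x})^2}\,\sqrt{\frac{1}{n}\sum (y-\bar{y})^2}}
  = \frac{\sum (x-\bar{x})(y-\bar{y})}{\sqrt{\sum (x-\bar{x})^2 \,\sum (y-\bar{y})^2}}

n(n^2 - 1) = n^3 - n
```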
Exam Tip
Always verify that your calculated r value stays between -1 and 1; any result outside this range is a definitive signal of an arithmetic error in your summation or standard deviation calculation.
Common Mistakes
- Interpreting a correlation coefficient of 0 as no relationship at all, rather than specifically no linear relationship.
- Forgetting to apply the 'tied rank' correction factor in Spearman's method when duplicate values appear in the dataset.
- Mistaking correlation for causation; assuming that because two variables move together, one causes the other.
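The first mistake is easy to demonstrate. In the sketch below (hypothetical data), y is completely determined by x through y = x², yet Pearson's r comes out as exactly 0 because the relationship is not linear. The compact `pearson_r` here is an illustrative helper implementing the same deviation formula as in the Pearson section.

```python
import math

def pearson_r(x, y):
    """Deviation form of Pearson's r (assumes neither variable is constant)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [-2, -1, 0, 1, 2]
y = [v ** 2 for v in x]  # a perfect, but curved, relationship
print(pearson_r(x, y))   # 0.0
```

The positive and negative cross-deviations cancel exactly (-4 + 1 + 0 - 1 + 4 = 0), so r = 0 even though a scatter diagram would show a clear U-shaped pattern.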