Homework 5
Theory
The Cauchy-Schwarz Inequality
Statement and Significance
The Cauchy-Schwarz inequality is one of the most fundamental results in mathematics, with profound applications in statistics, particularly in understanding correlation coefficients. For real numbers $a_1, a_2, \ldots, a_n$ and $b_1, b_2, \ldots, b_n$:
\[\left(\sum_{i=1}^{n} a_i b_i\right)^2 \leq \left(\sum_{i=1}^{n} a_i^2\right)\left(\sum_{i=1}^{n} b_i^2\right)\]
This inequality establishes that the absolute value of the inner product of two vectors is at most the product of their magnitudes, which directly explains why correlation coefficients are bounded between -1 and 1.
Proof via Quadratic Function Method
The most elegant proof constructs a quadratic function that must be non-negative, then applies the discriminant condition.
Proof: Consider the quadratic function in $t$: \(f(t) = \sum_{i=1}^{n} (a_i - t b_i)^2\) Since this represents a sum of squares, we have $f(t) \geq 0$ for all real $t$. Expanding the quadratic:
\[f(t) = \sum_{i=1}^{n} (a_i^2 - 2ta_ib_i + t^2b_i^2) = \sum_{i=1}^{n} a_i^2 - 2t\sum_{i=1}^{n} a_ib_i + t^2\sum_{i=1}^{n} b_i^2\]
Let $A = \sum_{i=1}^{n} a_i^2$, $B = \sum_{i=1}^{n} b_i^2$, and $C = \sum_{i=1}^{n} a_ib_i$.
Then: $f(t) = A - 2Ct + Bt^2$
If $B = 0$, then every $b_i = 0$ and the inequality holds trivially, so assume $B > 0$. Since $f(t) \geq 0$ for all $t$, this quadratic has at most one real root, which means its discriminant is non-positive:
\[\Delta = (2C)^2 - 4AB \leq 0\]
\[4C^2 \leq 4AB\]
\[C^2 \leq AB\]
Therefore:
\[\left(\sum_{i=1}^{n} a_i b_i\right)^2 \leq \left(\sum_{i=1}^{n} a_i^2\right)\left(\sum_{i=1}^{n} b_i^2\right)\]
Connection to Correlation Coefficient
The correlation coefficient is defined as:
\[r = \frac{\sum_{i=1}^{n} (a_i - \bar{a})(b_i - \bar{b})}{\sqrt{\sum_{i=1}^{n} (a_i - \bar{a})^2 \sum_{i=1}^{n} (b_i - \bar{b})^2}}\]
By applying Cauchy-Schwarz to the centered variables $(a_i - \bar{a})$ and $(b_i - \bar{b})$, we establish that $|r| \leq 1$. The denominator serves as the normalizing factor that ensures the correlation coefficient remains bounded, making it a standardized measure of linear association.
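As a quick numerical sanity check, the following sketch (the vector length and random seed are arbitrary illustrative choices) verifies both the inequality and the resulting bound $|r| \leq 1$ on randomly generated data:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=1000)
b = rng.normal(size=1000)

# Cauchy-Schwarz: (sum a_i b_i)^2 <= (sum a_i^2)(sum b_i^2)
lhs = np.dot(a, b) ** 2
rhs = np.dot(a, a) * np.dot(b, b)
print(lhs <= rhs)  # True

# Correlation of the centered variables is bounded by 1 in absolute value
ac, bc = a - a.mean(), b - b.mean()
r = np.dot(ac, bc) / np.sqrt(np.dot(ac, ac) * np.dot(bc, bc))
print(abs(r) <= 1)  # True
```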
Independence vs Uncorrelatedness
Fundamental Conceptual Differences
Independence is a comprehensive probabilistic concept describing the complete absence of any relationship between random variables. Two random variables $X$ and $Y$ are independent if knowledge of one provides no information about the other. Formally: \(P(X \in A, Y \in B) = P(X \in A)P(Y \in B)\) for all measurable sets $A$ and $B$.
Uncorrelatedness is a more limited condition that only addresses linear relationships. Variables are uncorrelated when their covariance equals zero:
\[\text{Cov}(X,Y) = E[XY] - E[X]E[Y] = 0\]
Logical Relationships
The relationship between these concepts is asymmetric:
- Independence ⟹ Uncorrelatedness: If $X$ and $Y$ are independent, then $E[XY] = E[X]E[Y]$, making them uncorrelated.
- Uncorrelatedness ⟹ Independence: This implication is FALSE in general. Variables can have zero linear correlation while maintaining strong nonlinear dependencies.
Classical Counterexample
Consider $X \sim \text{Uniform}(-1, 1)$ and $Y = X^2$. Then:
- $E[XY] = E[X^3] = 0$ (odd function over symmetric interval)
- $E[X] = 0$, so $\text{Cov}(X,Y) = 0$
- Yet $Y$ is completely determined by $X$, demonstrating perfect dependence
This example illustrates how uncorrelatedness only captures the “linear shadow” of dependence relationships.
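A short simulation makes the counterexample concrete (a minimal sketch; the sample size and seed are arbitrary): the sample covariance of $X$ and $Y = X^2$ is essentially zero even though $Y$ is a deterministic function of $X$.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=100_000)
y = x ** 2  # Y is completely determined by X

# Sample covariance is close to its theoretical value of zero ...
print(np.cov(x, y)[0, 1])   # approximately 0
# ... yet knowing X removes all uncertainty about Y
print(np.var(y - x ** 2))   # exactly 0: no residual variation in Y given X
```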
Measures of Dependence
Linear Dependence Measures
Pearson Correlation Coefficient:
\[r = \frac{\text{Cov}(X,Y)}{\sqrt{\text{Var}(X)\text{Var}(Y)}}\]
- Range: $[-1, 1]$
- Captures only linear relationships
- $r = 0$ indicates uncorrelatedness, not independence
Nonlinear Dependence Measures
Spearman’s Rank Correlation:
\[\rho_s = \text{Corr}(\text{rank}(X), \text{rank}(Y))\]
- Detects monotonic relationships
- More robust to outliers than Pearson correlation
- Based on rank transformations rather than raw values
Kendall’s Tau:
\[\tau = \frac{2}{n(n-1)}\sum_{i<j}\text{sign}(x_i-x_j)\,\text{sign}(y_i-y_j)\]
- Based on concordant and discordant pairs
- Interpretable as probability difference
- Alternative measure of rank correlation
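To contrast these rank-based measures with Pearson correlation, here is a small sketch using SciPy; the monotonic but nonlinear relationship $y = e^x$ is just an illustrative choice.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = rng.normal(size=500)
y = np.exp(x)  # strictly increasing in x, but far from linear

# Pearson is noticeably below 1 because the relationship is not linear ...
r, _ = stats.pearsonr(x, y)
# ... while rank-based measures equal 1 for any strictly increasing relationship
rho_s, _ = stats.spearmanr(x, y)
tau, _ = stats.kendalltau(x, y)
print(f"Pearson r = {r:.3f}, Spearman rho = {rho_s:.3f}, Kendall tau = {tau:.3f}")
```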
Mutual Information:
\[I(X;Y) = \iint p_{X,Y}(x,y)\log\frac{p_{X,Y}(x,y)}{p_X(x)p_Y(y)}\,dx\,dy\]
- Measures reduction in uncertainty about one variable given knowledge of another
- $I(X;Y) = 0$ if and only if $X$ and $Y$ are independent
- Captures all types of dependencies, including highly nonlinear relationships
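A crude plug-in estimate of mutual information can be obtained by discretizing both samples onto a grid. This is only a sketch: the histogram estimator is biased, the bin count of 20 is an arbitrary choice, and the helper function name is ours.

```python
import numpy as np

def mutual_information(x, y, bins=20):
    """Histogram-based plug-in estimate of I(X;Y) in nats."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()                # joint cell probabilities
    px = pxy.sum(axis=1, keepdims=True)      # marginal of X
    py = pxy.sum(axis=0, keepdims=True)      # marginal of Y
    nz = pxy > 0                             # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

rng = np.random.default_rng(3)
x = rng.uniform(-1, 1, size=50_000)
print(mutual_information(x, rng.uniform(-1, 1, size=50_000)))  # near 0: independent
print(mutual_information(x, x ** 2))                           # clearly positive: Y = X^2 is dependent
```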
Distance Correlation (Székely & Rizzo):
\[\text{dCor}(X,Y) = \frac{\text{dCov}(X,Y)}{\sqrt{\text{dVar}(X)\,\text{dVar}(Y)}}\]
- Defined through a weighted distance between the joint characteristic function and the product of the marginal characteristic functions (equivalently, Brownian covariance)
- $\text{dCor}(X,Y) = 0$ if and only if $X$ and $Y$ are independent
- Powerful for detecting nonlinear dependencies that traditional correlation misses
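Sample distance correlation can be computed directly from its definition using double-centered pairwise distance matrices. A NumPy-only sketch (O(n^2) memory, so the sample is kept small; the function name is ours):

```python
import numpy as np

def distance_correlation(x, y):
    """Sample distance correlation of two 1-D samples."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    a = np.abs(x[:, None] - x[None, :])                  # pairwise distances in x
    b = np.abs(y[:, None] - y[None, :])                  # pairwise distances in y
    A = a - a.mean(0) - a.mean(1)[:, None] + a.mean()    # double centering
    B = b - b.mean(0) - b.mean(1)[:, None] + b.mean()
    dcov2 = (A * B).mean()                               # squared distance covariance
    dvar_x, dvar_y = (A * A).mean(), (B * B).mean()      # squared distance variances
    return np.sqrt(dcov2 / np.sqrt(dvar_x * dvar_y))

rng = np.random.default_rng(4)
x = rng.uniform(-1, 1, size=1000)
print(distance_correlation(x, x ** 2))                         # clearly positive: detects Y = X^2
print(distance_correlation(x, rng.uniform(-1, 1, size=1000)))  # small: consistent with independence
```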
Maximal Information Coefficient (MIC):
- Estimates maximum mutual information over all possible grids
- Designed to capture various functional relationships
- Computationally intensive but effective for exploratory data analysis
Copula-Based Measures
Copulas provide a framework for separating dependence structure from marginal distributions:
- Gaussian Copula: Preserves linear correlation structure and is suitable when dependence follows multivariate normal patterns.
- Archimedean Copulas: Including Clayton, Gumbel, and Frank copulas, these capture different types of tail dependencies and asymmetric relationships.
- Empirical Copulas: Provide nonparametric estimation of dependence structure without distributional assumptions.
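To illustrate how copulas separate dependence from marginals, here is a sketch of sampling from a Gaussian copula and attaching arbitrary marginals (the correlation of 0.8 and the exponential/uniform marginals are illustrative choices):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
cov = np.array([[1.0, 0.8], [0.8, 1.0]])

# 1. Draw from a bivariate normal carrying the desired dependence structure
z = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=10_000)

# 2. Push each coordinate through the standard normal CDF: uniforms coupled by a Gaussian copula
u = stats.norm.cdf(z)

# 3. Attach arbitrary marginals via inverse CDFs; the copula (dependence structure) is unchanged
x = stats.expon.ppf(u[:, 0])    # exponential marginal
y = stats.uniform.ppf(u[:, 1])  # uniform marginal on [0, 1]

# Rank correlation is a property of the copula alone, so it survives the marginal transforms
rho_z, _ = stats.spearmanr(z[:, 0], z[:, 1])
rho_xy, _ = stats.spearmanr(x, y)
print(rho_z, rho_xy)  # essentially equal
```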
Special Cases and Practical Implications
The Multivariate Normal Exception
For jointly normal random variables, uncorrelatedness and independence are equivalent. This unique property of the multivariate normal distribution explains why Pearson correlation is widely used despite its limitations in general settings.
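A Monte Carlo sketch of the contrast (the events, sample size, and seed are arbitrary choices): for jointly normal variables with zero correlation, empirical joint probabilities factorize, unlike in the uncorrelated-but-dependent example $Y = X^2$ above.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200_000

# Jointly normal with zero correlation (hence independent)
z = rng.multivariate_normal([0.0, 0.0], np.eye(2), size=n)
x, y = z[:, 0], z[:, 1]
print(np.mean((x > 0.5) & (y > -0.3)),
      np.mean(x > 0.5) * np.mean(y > -0.3))     # nearly equal: P(A and B) = P(A)P(B)

# Uncorrelated but dependent: X uniform on (-1, 1), Y = X^2
u = rng.uniform(-1, 1, size=n)
print(np.mean((u > 0.5) & (u ** 2 > 0.5)),
      np.mean(u > 0.5) * np.mean(u ** 2 > 0.5))  # clearly different: factorization fails
```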
Conclusion
The distinction between independence and uncorrelatedness is fundamental to understanding variable relationships in statistics. While the Cauchy-Schwarz inequality provides the mathematical foundation for bounded correlation measures, true independence requires the complete absence of any predictive relationship, linear or nonlinear.
Practice
HW5 - Unified Stochastic Process Simulator