Homework 5
Theory
The Cauchy-Schwarz Inequality
Statement and Significance
The Cauchy-Schwarz inequality is one of the most fundamental results in mathematics, with profound applications in statistics, particularly in understanding correlation coefficients. For real numbers $a_1, a_2, \ldots, a_n$ and $b_1, b_2, \ldots, b_n$:
\[\left(\sum_{i=1}^{n} a_i b_i\right)^2 \leq \left(\sum_{i=1}^{n} a_i^2\right)\left(\sum_{i=1}^{n} b_i^2\right)\]
This inequality establishes that the absolute value of the inner product of two vectors is at most the product of their magnitudes, which directly explains why correlation coefficients are bounded between -1 and 1.
Proof via Quadratic Function Method
The most elegant proof constructs a quadratic function that must be non-negative, then applies the discriminant condition.
Proof: Consider the quadratic function in $t$: \(f(t) = \sum_{i=1}^{n} (a_i - t b_i)^2\) Since this represents a sum of squares, we have $f(t) \geq 0$ for all real $t$. Expanding the quadratic:
\[f(t) = \sum_{i=1}^{n} (a_i^2 - 2ta_ib_i + t^2b_i^2) = \sum_{i=1}^{n} a_i^2 - 2t\sum_{i=1}^{n} a_ib_i + t^2\sum_{i=1}^{n} b_i^2\]
Let $A = \sum_{i=1}^{n} a_i^2$, $B = \sum_{i=1}^{n} b_i^2$, and $C = \sum_{i=1}^{n} a_ib_i$.
Then: $f(t) = A - 2Ct + Bt^2$
If $B = 0$, then every $b_i = 0$ and the inequality holds trivially, so assume $B > 0$. Since $f(t) \geq 0$ for all $t$, this quadratic has at most one real root, which means its discriminant is non-positive:
\[\Delta = (2C)^2 - 4AB \leq 0\]
\[4C^2 \leq 4AB\]
\[C^2 \leq AB\]
Therefore:
\[\left(\sum_{i=1}^{n} a_i b_i\right)^2 \leq \left(\sum_{i=1}^{n} a_i^2\right)\left(\sum_{i=1}^{n} b_i^2\right)\]
Connection to Correlation Coefficient
The correlation coefficient is defined as:
\[r = \frac{\sum_{i=1}^{n} (a_i - \bar{a})(b_i - \bar{b})}{\sqrt{\sum_{i=1}^{n} (a_i - \bar{a})^2 \sum_{i=1}^{n} (b_i - \bar{b})^2}}\]
By applying Cauchy-Schwarz to the centered variables $(a_i - \bar{a})$ and $(b_i - \bar{b})$, we establish that $|r| \leq 1$. The denominator serves as the normalizing factor that ensures the correlation coefficient remains bounded, making it a standardized measure of linear association.
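As a quick numerical sanity check, the following sketch (the vector length and random seed are arbitrary illustrative choices) verifies both the inequality and the resulting bound $|r| \leq 1$ on randomly generated data:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=1000)
b = rng.normal(size=1000)

# Cauchy-Schwarz: (sum a_i b_i)^2 <= (sum a_i^2)(sum b_i^2)
lhs = np.dot(a, b) ** 2
rhs = np.dot(a, a) * np.dot(b, b)
print(lhs <= rhs)  # True

# Correlation of the centered variables is bounded by 1 in absolute value
ac, bc = a - a.mean(), b - b.mean()
r = np.dot(ac, bc) / np.sqrt(np.dot(ac, ac) * np.dot(bc, bc))
print(abs(r) <= 1)  # True
```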
Independence vs Uncorrelatedness
Fundamental Conceptual Differences
Independence is a comprehensive probabilistic concept describing the complete absence of any relationship between random variables. Two random variables $X$ and $Y$ are independent if knowledge of one provides no information about the other. Formally: \(P(X \in A, Y \in B) = P(X \in A)P(Y \in B)\) for all measurable sets $A$ and $B$.
Uncorrelatedness is a more limited condition that only addresses linear relationships. Variables are uncorrelated when their covariance equals zero:
\[\text{Cov}(X,Y) = E[XY] - E[X]E[Y] = 0\]
Logical Relationships
The relationship between these concepts is asymmetric:
- Independence ⟹ Uncorrelatedness: If $X$ and $Y$ are independent, then $E[XY] = E[X]E[Y]$, making them uncorrelated.
- Uncorrelatedness ⟹ Independence: This implication is FALSE in general. Variables can have zero linear correlation while maintaining strong nonlinear dependencies.
Classical Counterexample
Consider $X \sim \text{Uniform}(-1, 1)$ and $Y = X^2$. Then:
- $E[XY] = E[X^3] = 0$ (odd function over symmetric interval)
- $E[X] = 0$, so $\text{Cov}(X,Y) = 0$
- Yet $Y$ is completely determined by $X$, demonstrating perfect dependence
This example illustrates how uncorrelatedness only captures the “linear shadow” of dependence relationships.
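A short simulation makes the counterexample concrete (a minimal sketch; the sample size and seed are arbitrary): the sample covariance of $X$ and $Y = X^2$ is essentially zero even though $Y$ is a deterministic function of $X$.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=100_000)
y = x ** 2  # Y is completely determined by X

# Sample covariance is close to its theoretical value of zero ...
print(np.cov(x, y)[0, 1])   # approximately 0
# ... yet knowing X removes all uncertainty about Y
print(np.var(y - x ** 2))   # exactly 0: no residual variation in Y given X
```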
Measures of Dependence
Linear Dependence Measures
Pearson Correlation Coefficient:
\[r = \frac{\text{Cov}(X,Y)}{\sqrt{\text{Var}(X)\text{Var}(Y)}}\]
- Range: $[-1, 1]$
- Captures only linear relationships
- $r = 0$ indicates uncorrelatedness, not independence
Nonlinear Dependence Measures
Spearman’s Rank Correlation:
\[\rho_s = \text{Corr}(\text{rank}(X), \text{rank}(Y))\]
- Detects monotonic relationships
- More robust to outliers than Pearson correlation
- Based on rank transformations rather than raw values
Kendall’s Tau:
\[\tau = \frac{2}{n(n-1)}\sum_{i<j}\text{sign}(x_i-x_j)\,\text{sign}(y_i-y_j)\]
- Based on concordant and discordant pairs
- Interpretable as probability difference
- Alternative measure of rank correlation
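To contrast these rank-based measures with Pearson correlation, here is a small sketch using SciPy; the monotonic but nonlinear relationship $y = e^x$ is just an illustrative choice.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = rng.normal(size=500)
y = np.exp(x)  # strictly increasing in x, but far from linear

# Pearson is noticeably below 1 because the relationship is not linear ...
r, _ = stats.pearsonr(x, y)
# ... while rank-based measures equal 1 for any strictly increasing relationship
rho_s, _ = stats.spearmanr(x, y)
tau, _ = stats.kendalltau(x, y)
print(f"Pearson r = {r:.3f}, Spearman rho = {rho_s:.3f}, Kendall tau = {tau:.3f}")
```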
Mutual Information:
\[I(X;Y) = \iint p_{X,Y}(x,y)\log\frac{p_{X,Y}(x,y)}{p_X(x)p_Y(y)}\,dx\,dy\]
- Measures reduction in uncertainty about one variable given knowledge of another
- $I(X;Y) = 0$ if and only if $X$ and $Y$ are independent
- Captures all types of dependencies, including highly nonlinear relationships
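A crude plug-in estimate of mutual information can be obtained by discretizing both samples onto a grid. This is only a sketch: the histogram estimator is biased, the bin count of 20 is an arbitrary choice, and the helper function name is ours.

```python
import numpy as np

def mutual_information(x, y, bins=20):
    """Histogram-based plug-in estimate of I(X;Y) in nats."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()                # joint cell probabilities
    px = pxy.sum(axis=1, keepdims=True)      # marginal of X
    py = pxy.sum(axis=0, keepdims=True)      # marginal of Y
    nz = pxy > 0                             # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

rng = np.random.default_rng(3)
x = rng.uniform(-1, 1, size=50_000)
print(mutual_information(x, rng.uniform(-1, 1, size=50_000)))  # near 0: independent
print(mutual_information(x, x ** 2))                           # clearly positive: Y = X^2 is dependent
```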
Distance Correlation (Székely & Rizzo):
\[\text{dCor}(X,Y) = \frac{\text{dCov}(X,Y)}{\sqrt{\text{dVar}(X)\,\text{dVar}(Y)}}\]
- Defined through a weighted distance between the joint characteristic function and the product of the marginal characteristic functions (equivalently, Brownian covariance)
- $\text{dCor}(X,Y) = 0$ if and only if $X$ and $Y$ are independent
- Powerful for detecting nonlinear dependencies that traditional correlation misses
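Sample distance correlation can be computed directly from its definition using double-centered pairwise distance matrices. A NumPy-only sketch (O(n^2) memory, so the sample is kept small; the function name is ours):

```python
import numpy as np

def distance_correlation(x, y):
    """Sample distance correlation of two 1-D samples."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    a = np.abs(x[:, None] - x[None, :])                  # pairwise distances in x
    b = np.abs(y[:, None] - y[None, :])                  # pairwise distances in y
    A = a - a.mean(0) - a.mean(1)[:, None] + a.mean()    # double centering
    B = b - b.mean(0) - b.mean(1)[:, None] + b.mean()
    dcov2 = (A * B).mean()                               # squared distance covariance
    dvar_x, dvar_y = (A * A).mean(), (B * B).mean()      # squared distance variances
    return np.sqrt(dcov2 / np.sqrt(dvar_x * dvar_y))

rng = np.random.default_rng(4)
x = rng.uniform(-1, 1, size=1000)
print(distance_correlation(x, x ** 2))                         # clearly positive: detects Y = X^2
print(distance_correlation(x, rng.uniform(-1, 1, size=1000)))  # small: consistent with independence
```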
Maximal Information Coefficient (MIC):
- Estimates maximum mutual information over all possible grids
- Designed to capture various functional relationships
- Computationally intensive but effective for exploratory data analysis
Copula-Based Measures
Copulas provide a framework for separating dependence structure from marginal distributions:
- Gaussian Copula: Preserves linear correlation structure and is suitable when dependence follows multivariate normal patterns.
- Archimedean Copulas: Including Clayton, Gumbel, and Frank copulas, these capture different types of tail dependencies and asymmetric relationships.
- Empirical Copulas: Provide nonparametric estimation of dependence structure without distributional assumptions.
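To illustrate how copulas separate dependence from marginals, here is a sketch of sampling from a Gaussian copula and attaching arbitrary marginals (the correlation of 0.8 and the exponential/uniform marginals are illustrative choices):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
cov = np.array([[1.0, 0.8], [0.8, 1.0]])

# 1. Draw from a bivariate normal carrying the desired dependence structure
z = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=10_000)

# 2. Push each coordinate through the standard normal CDF: uniforms coupled by a Gaussian copula
u = stats.norm.cdf(z)

# 3. Attach arbitrary marginals via inverse CDFs; the copula (dependence structure) is unchanged
x = stats.expon.ppf(u[:, 0])    # exponential marginal
y = stats.uniform.ppf(u[:, 1])  # uniform marginal on [0, 1]

# Rank correlation is a property of the copula alone, so it survives the marginal transforms
rho_z, _ = stats.spearmanr(z[:, 0], z[:, 1])
rho_xy, _ = stats.spearmanr(x, y)
print(rho_z, rho_xy)  # essentially equal
```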
Special Cases and Practical Implications
The Multivariate Normal Exception
For jointly normal random variables, uncorrelatedness and independence are equivalent. This unique property of the multivariate normal distribution explains why Pearson correlation is widely used despite its limitations in general settings.
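A Monte Carlo sketch of the contrast (the events, sample size, and seed are arbitrary choices): for jointly normal variables with zero correlation, empirical joint probabilities factorize, unlike in the uncorrelated-but-dependent example $Y = X^2$ above.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200_000

# Jointly normal with zero correlation (hence independent)
z = rng.multivariate_normal([0.0, 0.0], np.eye(2), size=n)
x, y = z[:, 0], z[:, 1]
print(np.mean((x > 0.5) & (y > -0.3)),
      np.mean(x > 0.5) * np.mean(y > -0.3))     # nearly equal: P(A and B) = P(A)P(B)

# Uncorrelated but dependent: X uniform on (-1, 1), Y = X^2
u = rng.uniform(-1, 1, size=n)
print(np.mean((u > 0.5) & (u ** 2 > 0.5)),
      np.mean(u > 0.5) * np.mean(u ** 2 > 0.5))  # clearly different: factorization fails
```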
Conclusion
The distinction between independence and uncorrelatedness is fundamental to understanding variable relationships in statistics. While the Cauchy-Schwarz inequality provides the mathematical foundation for bounded correlation measures, true independence requires the complete absence of any predictive relationship, linear or nonlinear.
Practice
HW5 - Unified Stochastic Process Simulator