Homework 10
Theory
1. Sampling Mean and Variance
Sample Mean
Given a random sample $X_1, X_2, \ldots, X_n$ from a population with mean $\mu$ and variance $\sigma^2$, the sample mean is defined as:
\[\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i\]
Key Properties:
- Expected Value: $E[\bar{X}] = \mu$ (unbiased estimator of population mean)
- Variance: $\text{Var}(\bar{X}) = \frac{\sigma^2}{n}$ (decreases with sample size)
- Standard Error: $SE(\bar{X}) = \frac{\sigma}{\sqrt{n}}$
Distribution Properties:
- Central Limit Theorem: For sufficiently large $n$, regardless of the population distribution (provided $\sigma^2 < \infty$), $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ is approximately $N(0, 1)$
- Normal Population: If $X_i \sim N(\mu, \sigma^2)$, then $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$ exactly for any $n$
The sample mean serves as the foundation for statistical inference, enabling generalizations about populations based on sample data; the simulation sketch below illustrates these properties.
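As a quick numerical check, here is a minimal sketch (assuming NumPy; the exponential population and all constants are illustrative choices, not part of the homework) showing $E[\bar{X}] \approx \mu$ and $SE(\bar{X}) \approx \sigma/\sqrt{n}$ even for a non-normal population:

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 50, 10_000

# Exponential population: mu = 1, sigma^2 = 1 (deliberately non-normal)
samples = rng.exponential(scale=1.0, size=(reps, n))
xbar = samples.mean(axis=1)  # one sample mean per replication

print("E[xbar]  ~", xbar.mean())       # should be close to mu = 1
print("SE(xbar) ~", xbar.std(ddof=1))  # should be close to sigma/sqrt(n)
print("theory   =", 1 / np.sqrt(n))    # 1/sqrt(50) ~ 0.1414
```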
Sample Variance
The unbiased sample variance is defined as:
\[S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2\]
Key Properties:
- Expected Value: $E[S^2] = \sigma^2$ (unbiased estimator of population variance)
- Degrees of Freedom: The denominator is $n-1$ because one degree of freedom is lost when $\mu$ is estimated by $\bar{X}$
Distribution Properties (for Normal Populations):
- Chi-square Distribution: The scaled sample variance follows $\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$
- Independence: $\bar{X}$ and $S^2$ are statistically independent
- t-distribution: The sample mean standardized by the sample standard deviation satisfies $T = \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}$
These properties make the sample variance critical for hypothesis testing and constructing confidence intervals.
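A minimal simulation sketch (assuming NumPy and SciPy; the parameters are arbitrary) checking both the unbiasedness of $S^2$ and the $\chi^2_{n-1}$ law for normal data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, reps, mu, sigma = 10, 20_000, 5.0, 2.0

x = rng.normal(mu, sigma, size=(reps, n))
s2 = x.var(axis=1, ddof=1)  # unbiased sample variance, divides by n - 1

print("E[S^2] ~", s2.mean(), "(target: sigma^2 =", sigma**2, ")")

# (n-1) S^2 / sigma^2 should follow chi-square with n-1 degrees of freedom;
# a Kolmogorov-Smirnov test should not reject at conventional levels.
scaled = (n - 1) * s2 / sigma**2
print(stats.kstest(scaled, stats.chi2(df=n - 1).cdf))
```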
Statistical Inference Applications
The sampling distributions of the mean and variance enable:
- Confidence Intervals: Using the t-distribution when $\sigma$ is unknown (see the sketch after this list)
- Hypothesis Testing: Testing claims about population parameters
- Quality Control: Monitoring process stability through control charts
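For example, a two-sided $t$-interval for $\mu$ combines $\bar{X}$, $S$, and the $t_{n-1}$ quantile. A sketch assuming SciPy, with the helper `t_confidence_interval` introduced purely for illustration:

```python
import numpy as np
from scipy import stats

def t_confidence_interval(x, level=0.95):
    """Two-sided t-interval for the population mean (sigma unknown)."""
    n = len(x)
    xbar, s = x.mean(), x.std(ddof=1)
    tcrit = stats.t.ppf(1 - (1 - level) / 2, df=n - 1)
    half = tcrit * s / np.sqrt(n)
    return xbar - half, xbar + half

x = np.random.default_rng(2).normal(10.0, 3.0, size=25)
print(t_confidence_interval(x))  # should cover mu = 10 about 95% of the time
```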
2. Lebesgue-Stieltjes Integration
General Framework
The Lebesgue-Stieltjes integral provides a unified approach to integration that generalizes both Riemann and Lebesgue integrals. For a function $g(x)$ and a right-continuous, non-decreasing function $F(x)$:
\[\int_{-\infty}^{\infty} g(x) \, dF(x)\]
This framework elegantly handles both discrete and continuous cases, as well as mixed distributions with discontinuities.
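Concretely, the integral can be approximated by Stieltjes sums $\sum_i g(x_i)\,[F(x_{i+1}) - F(x_i)]$ over a fine partition, with jumps of $F$ picked up automatically. A sketch, assuming an illustrative mixed CDF (half a standard normal, half a point mass at $1$):

```python
import numpy as np
from scipy.stats import norm

def F(x):
    """Mixed CDF: half a standard normal, half a point mass at x = 1."""
    return 0.5 * norm.cdf(x) + 0.5 * (x >= 1.0)

# Stieltjes sum for E[X] = int x dF(x) over a fine partition of [-10, 10].
grid = np.linspace(-10, 10, 200_001)
increments = np.diff(F(grid))           # F(x_{i+1}) - F(x_i), jumps included
midpoints = 0.5 * (grid[:-1] + grid[1:])
print(midpoints @ increments)           # close to 0.5*0 + 0.5*1 = 0.5
```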
Applications to Probability Theory
1. Unified Expectation Formula: The expectation of a random variable $X$ with cumulative distribution function $F_X(x)$ is:
\[E[X] = \int_{-\infty}^{\infty} x \, dF_X(x)\]
This single formula encompasses:
- Discrete Case: $E[X] = \sum_{i} x_i P(X = x_i) = \int x \, dF_X(x)$
- Continuous Case: $E[X] = \int_{-\infty}^{\infty} x f_X(x) \, dx = \int x \, dF_X(x)$
2. General Transformation Formula: For any measurable function $g$:
\[E[g(X)] = \int_{-\infty}^{\infty} g(x) \, dF_X(x)\]
3. Variance Representation:
\[\text{Var}(X) = E[(X - \mu)^2] = \int_{-\infty}^{\infty} (x - \mu)^2 \, dF_X(x)\]
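Continuing the numerical Stieltjes-sum idea from the earlier sketch, both $E[g(X)]$ and $\text{Var}(X)$ reduce to sums against the increments of $F$; a sketch for the same assumed mixed CDF, with $g(x) = x^2$:

```python
import numpy as np
from scipy.stats import norm

# Same illustrative mixed CDF as above: half N(0,1), half an atom at 1.
F = lambda x: 0.5 * norm.cdf(x) + 0.5 * (x >= 1.0)

grid = np.linspace(-10, 10, 200_001)
dF = np.diff(F(grid))                # increments F(x_{i+1}) - F(x_i)
mid = 0.5 * (grid[:-1] + grid[1:])

mean = mid @ dF                      # E[X]   = int x dF(x)     -> ~0.5
second = (mid**2) @ dF               # E[g(X)] with g(x) = x^2  -> ~1.0
print("Var(X) ~", second - mean**2)  # analytic value: 0.75
```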
4. Probability Measure Definition: For any measurable set $A$:
\[P(X \in A) = \int_A dF_X(x)\]
Applications to Measure Theory
1. Measure Construction: Every right-continuous, non-decreasing function $F$ generates a unique measure $\mu_F$ on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ via:
\[\mu_F((a,b]) = F(b) - F(a)\]
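A small sketch of this construction, reusing the assumed mixed $F$ from the sketches above: the measure of any half-open interval is just an increment of $F$, and jumps of $F$ are counted automatically.

```python
import numpy as np
from scipy.stats import norm

F = lambda x: 0.5 * norm.cdf(x) + 0.5 * (x >= 1.0)

def mu_F(a, b):
    """Measure of the half-open interval (a, b] induced by F."""
    return F(b) - F(a)

print(mu_F(0.0, 1.0))          # includes the 0.5 atom at x = 1: ~0.67
print(mu_F(-np.inf, np.inf))   # total mass 1
```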
2. Lebesgue Decomposition: Any distribution function can be uniquely decomposed as:
\[F(x) = F_{ac}(x) + F_s(x) + F_d(x)\]
where:
- $F_{ac}$: absolutely continuous part (corresponds to density)
- $F_s$: singular continuous part (continuous but not absolutely continuous)
- $F_d$: discrete part (has jumps; the sketch below recovers such a jump numerically)
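For the mixed $F$ used in the earlier sketches, $F_s \equiv 0$ and the discrete part is a single atom at $x = 1$; its mass can be recovered as the jump $F(x) - F(x^-)$:

```python
from scipy.stats import norm

F = lambda x: 0.5 * norm.cdf(x) + 0.5 * (x >= 1.0)

def jump(x, eps=1e-9):
    """Approximate F(x) - F(x-): positive exactly at atoms of the discrete part."""
    return F(x) - F(x - eps)

print(jump(1.0))  # ~0.5, the mass of the atom at 1
print(jump(0.0))  # ~0.0, F is continuous there
```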
3. Radon-Nikodym Theorem Application: When measures $\mu_F$ and $\mu_G$ satisfy $\mu_F \ll \mu_G$ (absolute continuity):
\[\int g \, d\mu_F = \int g \cdot \frac{d\mu_F}{d\mu_G} \, d\mu_G\]
This is fundamental for:
- Likelihood Ratios: Comparing probability models (a small Monte Carlo sketch follows this list)
- Conditional Expectations: Defining $E[X \mid \mathcal{G}]$ as a Radon-Nikodym derivative
- Change of Variables: Transformations of random variables
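A minimal Monte Carlo sketch of this identity, assuming two normal models chosen purely for illustration: $E_F[h(X)]$ is estimated by sampling from $G$ and weighting by the density ratio $dF/dG$.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)

f = norm(loc=2.0, scale=1.0)   # target model F
g = norm(loc=0.0, scale=2.0)   # sampling model G (must dominate F)
h = lambda x: x**2

x = g.rvs(size=200_000, random_state=rng)
weights = f.pdf(x) / g.pdf(x)  # Radon-Nikodym derivative dF/dG evaluated at x

print("E_F[h(X)] ~", np.mean(h(x) * weights))  # target: 1 + 2^2 = 5
```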
Connection to Sampling Theory
The Lebesgue-Stieltjes framework provides the theoretical foundation for sampling distributions:
1. Empirical Distribution Function: For a sample $X_1, \ldots, X_n$, the empirical CDF is:
\[F_n(x) = \frac{1}{n} \sum_{i=1}^n \mathbf{1}_{\{X_i \leq x\}}\]
2. Glivenko-Cantelli Theorem:
\[\sup_x |F_n(x) - F(x)| \to 0 \text{ almost surely as } n \to \infty\]
3. Law of Large Numbers (Strong Form):
\[\bar{X}_n = \frac{1}{n} \sum_{i=1}^n X_i \to E[X] = \int x \, dF(x) \text{ almost surely}\]
This connection demonstrates how the rigorous measure-theoretic foundation supports the practical results used in statistical inference, ensuring the validity of sampling-based conclusions even in complex probability spaces.
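To see the Glivenko-Cantelli theorem numerically, a sketch (assuming NumPy and SciPy) computing the exact value of $\sup_x |F_n(x) - F(x)|$ for standard normal samples of growing size:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)

for n in (100, 1_000, 10_000, 100_000):
    x = np.sort(rng.standard_normal(n))
    # The empirical CDF equals i/n just after the i-th order statistic,
    # so the sup of |F_n - F| is attained at the sample points.
    upper = np.arange(1, n + 1) / n - norm.cdf(x)
    lower = norm.cdf(x) - np.arange(0, n) / n
    print(n, max(upper.max(), lower.max()))  # shrinks roughly like 1/sqrt(n)
```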