The Marchenko-Pastur Animation
The animation on the home page visualizes a fundamental result from random matrix theory: the Marchenko-Pastur law. As the matrix dimension grows, the histogram of eigenvalues of a sample covariance matrix converges to a deterministic limiting density. The code uses a few nice tricks to do this efficiently in the browser.
The Marchenko-Pastur Law

Let $X$ be an $n \times m$ matrix with i.i.d. $N(0,1)$ entries, and form the sample covariance matrix $$S = \frac{1}{m} X X^T.$$ As $n, m \to \infty$ with $n/m \to \gamma \in (0, 1]$, the empirical distribution of the eigenvalues of $S$ converges almost surely to the Marchenko-Pastur distribution with density $$f(x) = \frac{1}{2\pi\gamma x}\sqrt{(\lambda_+ - x)(x - \lambda_-)}$$ where $$x \in [\lambda_-, \lambda_+],\ \text{and} \ \lambda_\pm = (1 \pm \sqrt{\gamma})^2.$$ The animation uses $\gamma = 1/2$ (i.e., $m = 2n$), giving support $$[(1 - 1/\sqrt{2})^2,\; (1 + 1/\sqrt{2})^2].$$
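The limiting density is simple to evaluate numerically, which is how the smooth curve overlaying the histogram can be drawn. A minimal sketch (the function name `mpDensity` is illustrative, not from the site's code):

```javascript
// Marchenko-Pastur density for aspect ratio gamma in (0, 1].
function mpDensity(x, gamma) {
  const lo = Math.pow(1 - Math.sqrt(gamma), 2); // lambda_minus
  const hi = Math.pow(1 + Math.sqrt(gamma), 2); // lambda_plus
  if (x <= lo || x >= hi) return 0; // density vanishes off the support
  return Math.sqrt((hi - x) * (x - lo)) / (2 * Math.PI * gamma * x);
}
```

For $\gamma = 1/2$ the support is roughly $[0.086,\, 2.914]$, so sampling `mpDensity(x, 0.5)` over that interval traces the curve shown in the animation.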
The Dumitriu-Edelman Tridiagonal Model

Naively, one would form the full $n \times m$ Gaussian matrix $X$, compute $S = \frac{1}{m}XX^T$, and find its eigenvalues: $O(nm)$ work to generate $X$, $O(n^2 m)$ to form $S$, and $O(n^3)$ for the eigendecomposition. The animation instead exploits a result of Dumitriu and Edelman: there exists an $n \times n$ tridiagonal matrix whose eigenvalues have exactly the same joint distribution as those of $S$.
The construction is as follows. Let $B$ be the $n \times n$ bidiagonal matrix $$B = \begin{pmatrix} \chi_{m} & & & \\ \chi_{n-1} & \chi_{m-1} & & \\ & \chi_{n-2} & \chi_{m-2} & \\ & & \ddots & \ddots \end{pmatrix}$$ where the $\chi_k$ entries are independent chi-distributed random variables with $k$ degrees of freedom. Then the eigenvalues of $$T = \frac{1}{m} B B^T$$ have the same joint distribution as those of $S = \frac{1}{m}XX^T$ (Theorem 3.1 in Dumitriu-Edelman). Since $B$ is bidiagonal, $T = \frac{1}{m}BB^T$ is tridiagonal. In the code, with 0-indexing:
- $B_{k,k} \sim \chi_{m-k}$
- $B_{k+1,k} \sim \chi_{n-1-k}$
This reduces the problem to generating $O(n)$ random variates and working with a tridiagonal matrix, rather than an $n \times m$ dense one.
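Since $B$ is lower bidiagonal with diagonal $d_k$ and subdiagonal $e_k$, the tridiagonal $T = \frac{1}{m}BB^T$ has diagonal $(d_k^2 + e_k^2)/m$ and off-diagonal $d_k e_{k+1}/m$, so it can be assembled entry-by-entry from $O(n)$ chi draws. A sketch of that assembly, using a naive chi sampler (square root of a sum of $k$ squared normals, valid for integer $k$) in place of the gamma-based sampler the site's code uses:

```javascript
// Standard normal via the Marsaglia polar method (spare cache omitted here).
function randNormal() {
  let u, v, s;
  do {
    u = 2 * Math.random() - 1;
    v = 2 * Math.random() - 1;
    s = u * u + v * v;
  } while (s >= 1 || s === 0);
  return u * Math.sqrt((-2 * Math.log(s)) / s);
}

// Naive chi_k sampler for integer k, for illustration only.
function randChi(k) {
  let s = 0;
  for (let i = 0; i < k; i++) { const z = randNormal(); s += z * z; }
  return Math.sqrt(s);
}

// Sample the tridiagonal T = (1/m) B B^T without ever forming B or X.
function sampleTridiagonal(n, m) {
  const d = new Float64Array(n); // B diagonal:    d[k] ~ chi_{m-k}
  const e = new Float64Array(n); // B subdiagonal: e[k] ~ chi_{n-k}, e[0] unused
  for (let k = 0; k < n; k++) d[k] = randChi(m - k);
  for (let k = 1; k < n; k++) e[k] = randChi(n - k);
  const diag = new Float64Array(n);     // T[k][k]   = (d_k^2 + e_k^2) / m
  const off = new Float64Array(n - 1);  // T[k][k+1] = d_k * e_{k+1} / m
  for (let k = 0; k < n; k++) diag[k] = (d[k] * d[k] + e[k] * e[k]) / m;
  for (let k = 0; k < n - 1; k++) off[k] = (d[k] * e[k + 1]) / m;
  return { diag, off };
}
```

A quick sanity check: $\mathbb{E}[\operatorname{tr} T] = \frac{1}{m}\big(\sum_{k=0}^{n-1}(m-k) + \sum_{k=1}^{n-1}(n-k)\big) = n$, matching $\mathbb{E}[\operatorname{tr} S] = n$.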
Random Variable Generation

The code needs to sample chi-distributed random variables, which it builds up from simpler distributions:
Normal random variables are generated via the Marsaglia polar method. Sample $u, v$ uniformly on $(-1,1)$ until $0 < s = u^2 + v^2 < 1$. Then $u\sqrt{-2\log s/s}$ and $v\sqrt{-2\log s/s}$ are independent $N(0,1)$ draws; since the method produces them in pairs, the second is cached as a spare for the next call.
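A sketch of the polar method with the spare cached in module state (names are illustrative):

```javascript
// Marsaglia polar method for N(0,1). Each rejection loop yields a pair of
// independent draws; one is returned and the other cached for the next call.
let spare = null;
function randNormal() {
  if (spare !== null) {
    const z = spare;
    spare = null;
    return z;
  }
  let u, v, s;
  do {
    u = 2 * Math.random() - 1;
    v = 2 * Math.random() - 1;
    s = u * u + v * v;
  } while (s >= 1 || s === 0); // reject points outside the unit disk (and the origin)
  const f = Math.sqrt((-2 * Math.log(s)) / s);
  spare = v * f; // cache the second draw of the pair
  return u * f;
}
```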
Gamma random variables use the Marsaglia-Tsang method. For $\Gamma(\alpha, 1)$ with $\alpha \geq 1$: set $d = \alpha - 1/3$ and $c = 1/\sqrt{9d}$, then repeatedly draw $x \sim N(0,1)$ with $1 + cx > 0$ and form $v = (1 + cx)^3$; accept $dv$ if a uniform $u$ passes the fast squeeze test $u < 1 - 0.0331\,x^4$, or, failing that, the exact test $\log u < \tfrac{1}{2}x^2 + d(1 - v + \log v)$. For $\alpha < 1$, use the identity $$\Gamma(\alpha, 1) \stackrel{d}{=} \Gamma(\alpha+1,1) \cdot U^{1/\alpha}$$ where $U \sim \text{Uniform}(0,1)$.
Chi random variables follow from the relation $$\chi_k = \sqrt{2 \cdot \Gamma(k/2, 1)}.$$
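The gamma and chi samplers above can be sketched together. This is a self-contained illustration of the Marsaglia-Tsang method and the chi relation, not the site's actual code; the inline `randNormal` omits the spare cache for brevity:

```javascript
// Standard normal via the polar method (no spare cache, for brevity).
function randNormal() {
  let u, v, s;
  do {
    u = 2 * Math.random() - 1;
    v = 2 * Math.random() - 1;
    s = u * u + v * v;
  } while (s >= 1 || s === 0);
  return u * Math.sqrt((-2 * Math.log(s)) / s);
}

// Marsaglia-Tsang sampler for Gamma(alpha, 1).
function randGamma(alpha) {
  if (alpha < 1) {
    // Boost: Gamma(alpha) =d Gamma(alpha + 1) * U^(1/alpha).
    return randGamma(alpha + 1) * Math.pow(Math.random(), 1 / alpha);
  }
  const d = alpha - 1 / 3;
  const c = 1 / Math.sqrt(9 * d);
  for (;;) {
    let x, v;
    do {
      x = randNormal();
      v = 1 + c * x;
    } while (v <= 0); // require 1 + cx > 0 so v^3 is a valid candidate
    v = v * v * v;
    const u = Math.random();
    if (u < 1 - 0.0331 * x * x * x * x) return d * v; // fast squeeze test
    if (Math.log(u) < 0.5 * x * x + d * (1 - v + Math.log(v))) return d * v; // exact test
  }
}

// chi_k = sqrt(2 * Gamma(k/2, 1)).
function randChi(k) {
  return Math.sqrt(2 * randGamma(k / 2));
}
```

Note that `randChi` works for any real $k > 0$, which is what makes the gamma-based route preferable to summing squared normals.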
Sturm Sequences for Eigenvalue Counting

Rather than computing all $n$ eigenvalues and binning them, the code counts how many eigenvalues fall in each histogram bin directly. For a symmetric tridiagonal matrix with diagonal entries $a_k$ and off-diagonal entries $b_k$, the Sturm sequence recurrence $$q_0 = a_0 - x, \qquad q_k = (a_k - x) - \frac{b_{k-1}^2}{q_{k-1}}$$ has the property that the number of negative $q_k$ values equals the number of eigenvalues strictly less than $x$ (a zero $q_k$ is conventionally replaced by a tiny negative number to keep the recurrence defined). Evaluating this at each bin boundary and taking differences gives the bin counts in $O(n)$ per boundary—much cheaper than a full eigendecomposition.
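A sketch of the counting step and the differencing that produces bin counts (function names are illustrative):

```javascript
// Count eigenvalues of the symmetric tridiagonal matrix with diagonal a
// (length n) and off-diagonal b (length n - 1) that are strictly below x,
// using the Sturm sequence recurrence.
function countBelow(a, b, x) {
  let count = 0;
  let q = 1; // previous pivot, seeded so the first step gives a[0] - x
  for (let k = 0; k < a.length; k++) {
    const sub = k > 0 ? (b[k - 1] * b[k - 1]) / q : 0;
    q = a[k] - x - sub;
    if (q === 0) q = -1e-300; // zero pivot: treat as a tiny negative
    if (q < 0) count++;
  }
  return count;
}

// Histogram bin counts from the counting function: evaluate at each bin
// edge and difference the results.
function binCounts(a, b, edges) {
  const below = edges.map((x) => countBelow(a, b, x));
  return below.slice(1).map((c, i) => c - below[i]);
}
```

Each call to `countBelow` is $O(n)$ with no allocation, so filling a histogram with $B$ bins costs $O(nB)$ total.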
References

- Z. Bai and J. W. Silverstein, Spectral Analysis of Large Dimensional Random Matrices, 2nd ed., Springer Series in Statistics, Springer, 2010.
- I. Dumitriu and A. Edelman, Matrix models for beta ensembles, J. Math. Phys. 43 (2002), 5830–5847.
- G. Marsaglia and T. A. Bray, A convenient method for generating normal variables, SIAM Review 6 (1964), 260–264.
- G. Marsaglia and W. W. Tsang, A simple method for generating gamma variables, ACM Trans. Math. Software 26 (2000), 363–372.