The Marchenko-Pastur Animation
The animation on the home page visualizes a fundamental result from random matrix theory: the Marchenko-Pastur law. As the matrix dimension grows, the histogram of eigenvalues of a sample covariance matrix converges to a deterministic limiting density. The code uses a few nice tricks to do this efficiently in the browser.
The Marchenko-Pastur Law

Let $X$ be an $n \times m$ matrix with i.i.d. $N(0,1)$ entries, and form the sample covariance matrix $$S = \frac{1}{m} X X^T.$$ As $n, m \to \infty$ with $n/m \to \gamma \in (0, 1]$, the empirical distribution of the eigenvalues of $S$ converges almost surely to the Marchenko-Pastur distribution with density $$f(x) = \frac{1}{2\pi\gamma x}\sqrt{(\lambda_+ - x)(x - \lambda_-)}$$ where $$x \in [\lambda_-, \lambda_+],\ \text{and} \ \lambda_\pm = (1 \pm \sqrt{\gamma})^2.$$ The animation uses $\gamma = 1/2$ (i.e., $m = 2n$), giving support $$[(1 - 1/\sqrt{2})^2,\; (1 + 1/\sqrt{2})^2].$$
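The limiting density is simple to evaluate numerically, which is how the smooth curve overlaying the histogram can be drawn. A minimal sketch (the function name `mpDensity` is illustrative, not from the site's code):

```javascript
// Marchenko-Pastur density for aspect ratio gamma in (0, 1].
function mpDensity(x, gamma) {
  const lo = Math.pow(1 - Math.sqrt(gamma), 2); // lambda_minus
  const hi = Math.pow(1 + Math.sqrt(gamma), 2); // lambda_plus
  if (x <= lo || x >= hi) return 0; // density vanishes off the support
  return Math.sqrt((hi - x) * (x - lo)) / (2 * Math.PI * gamma * x);
}
```

For $\gamma = 1/2$ the support is roughly $[0.086,\, 2.914]$, so sampling `mpDensity(x, 0.5)` over that interval traces the curve shown in the animation.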
The Dumitriu-Edelman Tridiagonal Model

Naively, one would form the full $n \times m$ Gaussian matrix $X$, compute $S = \frac{1}{m}XX^T$, and find its eigenvalues: $O(nm)$ work to generate $X$, $O(n^2 m)$ to form $S$, and $O(n^3)$ for the eigendecomposition. The animation instead exploits a result of Dumitriu and Edelman: there exists an $n \times n$ tridiagonal matrix whose eigenvalues have exactly the same joint distribution as those of $S$.
The construction is as follows. Let $B$ be the $n \times n$ bidiagonal matrix $$B = \begin{pmatrix} \chi_{m} & & & \\ \chi_{n-1} & \chi_{m-1} & & \\ & \chi_{n-2} & \chi_{m-2} & \\ & & \ddots & \ddots \end{pmatrix}$$ where the $\chi_k$ entries are independent chi-distributed random variables with $k$ degrees of freedom. Then the eigenvalues of $$T = \frac{1}{m} B B^T$$ have the same joint distribution as those of $S = \frac{1}{m}XX^T$ (Theorem 3.1 in Dumitriu-Edelman). Since $B$ is bidiagonal, $T = \frac{1}{m}BB^T$ is tridiagonal. In the code, with 0-indexing:
- $B_{k,k} \sim \chi_{m-k}$
- $B_{k+1,k} \sim \chi_{n-1-k}$
This reduces the problem to generating $O(n)$ random variates and working with a tridiagonal matrix, rather than an $n \times m$ dense one.
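Since $B$ is lower bidiagonal with diagonal $d_k$ and subdiagonal $e_k$, the tridiagonal $T = \frac{1}{m}BB^T$ has diagonal $(d_k^2 + e_k^2)/m$ and off-diagonal $d_k e_{k+1}/m$, so it can be assembled entry-by-entry from $O(n)$ chi draws. A sketch of that assembly, using a naive chi sampler (square root of a sum of $k$ squared normals, valid for integer $k$) in place of the gamma-based sampler the site's code uses:

```javascript
// Standard normal via the Marsaglia polar method (spare cache omitted here).
function randNormal() {
  let u, v, s;
  do {
    u = 2 * Math.random() - 1;
    v = 2 * Math.random() - 1;
    s = u * u + v * v;
  } while (s >= 1 || s === 0);
  return u * Math.sqrt((-2 * Math.log(s)) / s);
}

// Naive chi_k sampler for integer k, for illustration only.
function randChi(k) {
  let s = 0;
  for (let i = 0; i < k; i++) { const z = randNormal(); s += z * z; }
  return Math.sqrt(s);
}

// Sample the tridiagonal T = (1/m) B B^T without ever forming B or X.
function sampleTridiagonal(n, m) {
  const d = new Float64Array(n); // B diagonal:    d[k] ~ chi_{m-k}
  const e = new Float64Array(n); // B subdiagonal: e[k] ~ chi_{n-k}, e[0] unused
  for (let k = 0; k < n; k++) d[k] = randChi(m - k);
  for (let k = 1; k < n; k++) e[k] = randChi(n - k);
  const diag = new Float64Array(n);     // T[k][k]   = (d_k^2 + e_k^2) / m
  const off = new Float64Array(n - 1);  // T[k][k+1] = d_k * e_{k+1} / m
  for (let k = 0; k < n; k++) diag[k] = (d[k] * d[k] + e[k] * e[k]) / m;
  for (let k = 0; k < n - 1; k++) off[k] = (d[k] * e[k + 1]) / m;
  return { diag, off };
}
```

A quick sanity check: $\mathbb{E}[\operatorname{tr} T] = \frac{1}{m}\big(\sum_{k=0}^{n-1}(m-k) + \sum_{k=1}^{n-1}(n-k)\big) = n$, matching $\mathbb{E}[\operatorname{tr} S] = n$.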
Random Variable Generation

The code needs to sample chi-distributed random variables, which it builds up from simpler distributions:
Normal random variables are generated via the Marsaglia polar method. Sample $u, v$ uniformly on $(-1,1)$ until $0 < s = u^2 + v^2 < 1$. Then $u\sqrt{-2\log s/s}$ and $v\sqrt{-2\log s/s}$ are independent $N(0,1)$ draws; since the method produces them in pairs, the second is cached as a spare for the next call.
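A sketch of the polar method with the spare cached in module state (names are illustrative):

```javascript
// Marsaglia polar method for N(0,1). Each rejection loop yields a pair of
// independent draws; one is returned and the other cached for the next call.
let spare = null;
function randNormal() {
  if (spare !== null) {
    const z = spare;
    spare = null;
    return z;
  }
  let u, v, s;
  do {
    u = 2 * Math.random() - 1;
    v = 2 * Math.random() - 1;
    s = u * u + v * v;
  } while (s >= 1 || s === 0); // reject points outside the unit disk (and the origin)
  const f = Math.sqrt((-2 * Math.log(s)) / s);
  spare = v * f; // cache the second draw of the pair
  return u * f;
}
```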
Gamma random variables use the Marsaglia-Tsang method. For $\Gamma(\alpha, 1)$ with $\alpha \geq 1$: set $d = \alpha - 1/3$ and $c = 1/\sqrt{9d}$, then repeatedly draw $x \sim N(0,1)$ with $1 + cx > 0$ and form $v = (1 + cx)^3$; accept $dv$ if a uniform $u$ passes the fast squeeze test $u < 1 - 0.0331\,x^4$, or, failing that, the exact test $\log u < \tfrac{1}{2}x^2 + d(1 - v + \log v)$. For $\alpha < 1$, use the identity $$\Gamma(\alpha, 1) \stackrel{d}{=} \Gamma(\alpha+1,1) \cdot U^{1/\alpha}$$ where $U \sim \text{Uniform}(0,1)$.
Chi random variables follow from the relation $$\chi_k = \sqrt{2 \cdot \Gamma(k/2, 1)}.$$
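The gamma and chi samplers above can be sketched together. This is a self-contained illustration of the Marsaglia-Tsang method and the chi relation, not the site's actual code; the inline `randNormal` omits the spare cache for brevity:

```javascript
// Standard normal via the polar method (no spare cache, for brevity).
function randNormal() {
  let u, v, s;
  do {
    u = 2 * Math.random() - 1;
    v = 2 * Math.random() - 1;
    s = u * u + v * v;
  } while (s >= 1 || s === 0);
  return u * Math.sqrt((-2 * Math.log(s)) / s);
}

// Marsaglia-Tsang sampler for Gamma(alpha, 1).
function randGamma(alpha) {
  if (alpha < 1) {
    // Boost: Gamma(alpha) =d Gamma(alpha + 1) * U^(1/alpha).
    return randGamma(alpha + 1) * Math.pow(Math.random(), 1 / alpha);
  }
  const d = alpha - 1 / 3;
  const c = 1 / Math.sqrt(9 * d);
  for (;;) {
    let x, v;
    do {
      x = randNormal();
      v = 1 + c * x;
    } while (v <= 0); // require 1 + cx > 0 so v^3 is a valid candidate
    v = v * v * v;
    const u = Math.random();
    if (u < 1 - 0.0331 * x * x * x * x) return d * v; // fast squeeze test
    if (Math.log(u) < 0.5 * x * x + d * (1 - v + Math.log(v))) return d * v; // exact test
  }
}

// chi_k = sqrt(2 * Gamma(k/2, 1)).
function randChi(k) {
  return Math.sqrt(2 * randGamma(k / 2));
}
```

Note that `randChi` works for any real $k > 0$, which is what makes the gamma-based route preferable to summing squared normals.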
Sturm Sequences for Eigenvalue Counting

Rather than computing all $n$ eigenvalues and binning them, the code counts how many eigenvalues fall in each histogram bin directly. For a symmetric tridiagonal matrix with diagonal entries $a_k$ and off-diagonal entries $b_k$, the Sturm sequence recurrence $$q_0 = a_0 - x, \qquad q_k = (a_k - x) - \frac{b_{k-1}^2}{q_{k-1}}$$ has the property that the number of negative $q_k$ values equals the number of eigenvalues strictly less than $x$ (a zero $q_k$ is conventionally replaced by a tiny negative number to keep the recurrence defined). Evaluating this at each bin boundary and taking differences gives the bin counts in $O(n)$ per boundary—much cheaper than a full eigendecomposition.
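A sketch of the counting step and the differencing that produces bin counts (function names are illustrative):

```javascript
// Count eigenvalues of the symmetric tridiagonal matrix with diagonal a
// (length n) and off-diagonal b (length n - 1) that are strictly below x,
// using the Sturm sequence recurrence.
function countBelow(a, b, x) {
  let count = 0;
  let q = 1; // previous pivot, seeded so the first step gives a[0] - x
  for (let k = 0; k < a.length; k++) {
    const sub = k > 0 ? (b[k - 1] * b[k - 1]) / q : 0;
    q = a[k] - x - sub;
    if (q === 0) q = -1e-300; // zero pivot: treat as a tiny negative
    if (q < 0) count++;
  }
  return count;
}

// Histogram bin counts from the counting function: evaluate at each bin
// edge and difference the results.
function binCounts(a, b, edges) {
  const below = edges.map((x) => countBelow(a, b, x));
  return below.slice(1).map((c, i) => c - below[i]);
}
```

Each call to `countBelow` is $O(n)$ with no allocation, so filling a histogram with $B$ bins costs $O(nB)$ total.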
References

- Z. Bai and J. W. Silverstein, Spectral Analysis of Large Dimensional Random Matrices, 2nd ed., Springer Series in Statistics, Springer, 2010.
- I. Dumitriu and A. Edelman, Matrix models for beta ensembles, J. Math. Phys. 43 (2002), 5830–5847.
- G. Marsaglia and T. A. Bray, A convenient method for generating normal variables, SIAM Review 6 (1964), 260–264.
- G. Marsaglia and W. W. Tsang, A simple method for generating gamma variables, ACM Trans. Math. Software 26 (2000), 363–372.