Definition
Let \(\mathbf{x} \in \mathbb{R}^N\) be a vector whose elements, \(X_i\), are random variables (i.e., \(\mathbf{x}\) is a random vector). As elements of \(\mathbf{x}\), we say that \(X_1, X_2, \dots, X_N\) are jointly distributed. Define the vector \(\mathbf{\mu}\) and matrix \(\mathbf{\Sigma}\) as \[
\begin{align*}
\mathbf{\mu} \in \mathbb{R}^N &= (\mu_1, \mu_2, \dots, \mu_N)^\top = (\mathbb{E}(X_1), \mathbb{E}(X_2), \dots, \mathbb{E}(X_N))^\top \\
\mathbf{\Sigma} \in \mathbb{R}^{N \times N} &= Cov(\mathbf{x}) = \mathbb{E}[(\mathbf{x} - \mathbf{\mu})(\mathbf{x} - \mathbf{\mu})^\top] = \Bigl[Cov(X_i, X_j) \Bigr]_{i, j = 1}^N
\end{align*}
\]
and suppose \(X_i \sim \mathcal{N}(\mu_i, \mathbf{\Sigma}_{i, i})\). If every linear combination of \(\mathbf{x}\)’s elements results in a univariate normal variable, we say that \(\mathbf{x}\) comes from the multivariate Gaussian distribution parameterized by \(\mathbf{\mu}\) and \(\mathbf{\Sigma}\), written as \(\mathbf{x} \sim \mathcal{N}_N (\mathbf{\mu}, \mathbf{\Sigma}).\)
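To make the definition concrete, here is a minimal NumPy sketch with a made-up two-dimensional \(\mathbf{\mu}\) and \(\mathbf{\Sigma}\) (the numbers are purely illustrative); the diagonal of \(\mathbf{\Sigma}\) carries the marginal variances, so the definition implies \(X_1 \sim \mathcal{N}(1, 2)\) and \(X_2 \sim \mathcal{N}(-2, 1)\) in this example.

```python
import numpy as np

# Made-up two-dimensional example: mu holds the marginal means,
# Sigma the pairwise covariances (symmetric, positive definite).
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])

# The diagonal entries are the marginal variances Sigma_{i,i},
# so X_1 ~ N(1, 2) and X_2 ~ N(-2, 1).
print(np.diag(Sigma))  # [2. 1.]
```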
Why does the algorithm work?
The claim is that \(\mathbf{x}\) is a draw from a multivariate Gaussian distribution parameterized by \(\mathbf{\mu}\) and \(\mathbf{\Sigma}\), i.e., \(\mathbf{x} \sim \mathcal{N}_N (\mathbf{\mu}, \mathbf{\Sigma})\). So, why does this work? First, we can show that \(\mathbb{E}(\mathbf{x}) = \mathbf{\mu}\):
\[
\begin{align*}
\mathbb{E}(\mathbf{x}) &= \mathbb{E}(L\mathbf{z} + \mathbf{\mu}) \\
&= \mathbb{E}(L\mathbf{z}) + \mathbb{E}(\mathbf{\mu}) \\
&= L \cdot \mathbb{E}(\mathbf{z}) + \mathbf{\mu} \\
&= L \cdot 0 + \mathbf{\mu} \\
&= \mathbf{\mu}.
\end{align*}
\]
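As a quick numerical sanity check on this result, here is a minimal NumPy sketch; the \(\mathbf{\mu}\), \(\mathbf{\Sigma}\), and sample size are arbitrary choices, and \(L\) is obtained from NumPy's Cholesky decomposition, one matrix satisfying \(LL^\top = \mathbf{\Sigma}\). The sample mean of many draws of \(\mathbf{x} = L\mathbf{z} + \mathbf{\mu}\) should sit close to \(\mathbf{\mu}\).

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
L = np.linalg.cholesky(Sigma)        # lower-triangular L with L @ L.T == Sigma

# Draw z ~ N(0, I) row by row and apply x = L z + mu to each draw.
z = rng.standard_normal((200_000, 2))
x = z @ L.T + mu

print(x.mean(axis=0))                # should be close to mu = [1, -2]
```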
Next, we can see that \(Cov(\mathbf{x}) = \mathbf{\Sigma}\):
A reminder that \(Cov(\mathbf{x})\) is not a scalar; rather, it is the covariance matrix (or variance-covariance matrix) of \(\mathbf{x}\).
\[
\begin{align*}
Cov(\mathbf{x}) &= \mathbb{E}[(\mathbf{x} - \mathbf{\mu})(\mathbf{x} - \mathbf{\mu})^\top] \\
&= \mathbb{E}[(L\mathbf{z} + \mathbf{\mu} - \mathbf{\mu})(L\mathbf{z} + \mathbf{\mu} - \mathbf{\mu})^\top] \\
&= \mathbb{E}[L\mathbf{z}\mathbf{z}^\top L^\top] \\
&= L\ \mathbb{E}[\mathbf{z}\mathbf{z}^\top]\ L^\top \\
&= L\ \mathbb{E}[(\mathbf{z} - 0)(\mathbf{z} - 0)^\top]\ L^\top \\
&= L\ Cov(\mathbf{z})\ L^\top \\
&= L I L^\top \\
&= L L^\top \\
&= \mathbf{\Sigma}.
\end{align*}
\]
This proof hinges on the definition of covariance and the independence of the \(Z_i\)’s. Remember, \(Cov(Z_i, Z_i) = Var(Z_i) = 1\), and \(Cov(Z_i, Z_j) = 0\) when \(i \neq j\) since \(Z_i \perp Z_j\).
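The same simulation idea can corroborate the covariance result: the sample covariance of the draws should approach \(LL^\top = \mathbf{\Sigma}\). A minimal NumPy sketch, again with made-up \(\mathbf{\mu}\) and \(\mathbf{\Sigma}\):

```python
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
L = np.linalg.cholesky(Sigma)

# Cov(z) = I because the Z_j are independent standard normals.
z = rng.standard_normal((200_000, 2))
x = z @ L.T + mu

print(np.cov(x, rowvar=False))       # sample covariance, close to Sigma
print(L @ L.T)                       # reconstructs Sigma exactly
```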
The following references ((https://math.stackexchange.com/users/10117/lepidopterist), n.d.), ((https://math.stackexchange.com/users/21550/hkbattousai), n.d.), (ibilion[at]purdue.edu 2022) (specifically this section) were helpful to me as I worked through understanding the algorithm and the proofs for the two results above.
Having shown that the expectation and (co)variance of \(\mathbf{x}\) are what we’d expect, we still need to show that \(\mathbf{x}\) qualifies as multivariate normal. A random vector is multivariate normal if every linear combination of its elements is normally distributed. By construction, \(\mathbf{x}\) is a linear transformation of a standard normal vector, \(\mathbf{z}\). We can decompose \(\mathbf{x}\)’s \(i\)-th element, \(X_i\), as follows:
\[
\begin{align*}
\sigma_i^2 &= \sum_{j = 1}^N L_{i,j}^2 = (LL^\top)_{i,i} = \mathbf{\Sigma}_{i,i} \\
X_i &= \sum_{j = 1}^N L_{i,j} Z_j + \mu_i.
\end{align*}
\]
That is, \(X_i\) is a linear combination of the independent \(Z_j\)’s plus a constant. We can then show that \(X_i \sim \mathcal{N}(\mu_i, \sigma_i^2)\): each term \(L_{i,j} Z_j\) is a scaled standard normal, the sum of independent normals is again normal (more on that below), and adding the constant \(\mu_i\) shifts the mean. This follows from theorems regarding the transformation of random variables (Casella and Berger 2024). I’ll demonstrate the scale-and-shift step for an arbitrary \(Z \sim \mathcal{N}(0, 1)\), \(\mu\), and \(\sigma > 0\) (omitting the subscript \(i\) for brevity). The probability density function (pdf) of the standard normal, \(f_Z(z)\), is defined as \[
f_Z(z) = \frac{1}{\sqrt{2\pi}}\ e^{-z^2 / 2}.
\]
Define \(X' = g(Z) = \sigma Z\). Then, \(g^{-1}(x) = \frac{x}{\sigma}\), and \(\frac{d}{dx}g^{-1}(x) = \frac{1}{\sigma}\). Thus, the probability density function \(f_{X'}(x)\) is
\[
\begin{align*}
f_{X'}(x) &= f_Z(g^{-1}(x)) \frac{d}{dx}g^{-1}(x) \\
&= f_Z \Bigl(\frac{x}{\sigma} \Bigr) \cdot \frac{1}{\sigma} \\
&= \frac{1}{\sqrt{2\pi}}\ e^{-(\frac{x}{\sigma})^2 \cdot \frac{1}{2}} \cdot \frac{1}{\sigma} \\
&= \frac{1}{\sqrt{2\pi} \sigma}\ e^{-\frac{x^2}{2\sigma^2}}.
\end{align*}
\]
This is the pdf for \(\mathcal{N}(0, \sigma^2)\). Now, define \(X = h(X') = X' + \mu\), where \(h^{-1}(x) = x - \mu\) and \(\frac{d}{dx}\ h^{-1}(x) = 1\). As with the above, we have \[
f_X(x) = f_{X'}(h^{-1}(x)) \frac{d}{dx}\ h^{-1}(x) = f_{X'}(x - \mu) = \frac{1}{\sqrt{2\pi} \sigma}\ e^{-\frac{(x - \mu)^2}{2\sigma^2}},
\]
which is the classical definition of the normal distribution’s probability density function. This means that scaling a normal variable by a constant, or adding a constant to it, results in another normal variable. The remaining ingredient is that the sum of independent normal variables (and, more generally, of jointly normal variables) is also normal; I won’t go through the proof here, but some versions can be found on Wikipedia. Putting these pieces together, each \(X_i = \sum_j L_{i,j} Z_j + \mu_i\) is normal with mean \(\mu_i\) and variance \(\sigma_i^2 = \mathbf{\Sigma}_{i,i}\).
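To see the scale-and-shift result numerically, here is a minimal sketch (the particular \(\mu\), \(\sigma\), and sample size are arbitrary, and SciPy is used only for the distribution check): draws of \(\sigma Z + \mu\) should have mean \(\mu\), variance \(\sigma^2\), and be indistinguishable from \(\mathcal{N}(\mu, \sigma^2)\) in a goodness-of-fit test.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
mu, sigma = 3.0, 1.5

# Transform standard normal draws: X = sigma * Z + mu.
z = rng.standard_normal(200_000)
x = sigma * z + mu

print(x.mean(), x.var())             # ~3.0 and ~2.25
# Kolmogorov-Smirnov test against N(mu, sigma^2); a large p-value means
# no evidence that the transformed draws differ from that distribution.
print(stats.kstest(x, stats.norm(loc=mu, scale=sigma).cdf))
```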
The same reasoning applies to an arbitrary linear combination \(Y = a_1 X_1 + \cdots + a_N X_N = \mathbf{a}^\top(L\mathbf{z} + \mathbf{\mu})\): it is again a linear combination of the independent \(Z_j\)’s plus a constant, and hence a normal variable. Thus, \(\mathbf{x}\) satisfies the definition above, and we can conclude that it is a draw from the multivariate Gaussian distribution.
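Finally, a minimal sketch of this last claim, with an arbitrary coefficient vector \(\mathbf{a}\) and the same made-up \(\mathbf{\mu}\) and \(\mathbf{\Sigma}\) as before: the linear combination \(Y = \mathbf{a}^\top \mathbf{x}\) should behave like a univariate normal with mean \(\mathbf{a}^\top \mathbf{\mu}\) and variance \(\mathbf{a}^\top \mathbf{\Sigma} \mathbf{a}\).

```python
import numpy as np

rng = np.random.default_rng(3)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
L = np.linalg.cholesky(Sigma)
a = np.array([0.5, -1.0])            # arbitrary coefficients

x = rng.standard_normal((200_000, 2)) @ L.T + mu
y = x @ a                            # Y = a_1 X_1 + a_2 X_2 for each draw

print(y.mean(), a @ mu)              # empirical vs. theoretical mean
print(y.var(), a @ Sigma @ a)        # empirical vs. theoretical variance
```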