PCA and SVD

Data setup

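\(x \in \mathbb{R}^p\) is a random vector of \(p\) features for one subject drawn from the population.
\(M \in \mathbb{R}^{n \times p}\) is the sample matrix (i.e. \(n\) subjects in rows, \(p\) features in columns).
Let \(M_0 = M - \bar{M}\) denote the centered \(M\), where \(\bar{M}\) repeats the column means in every row, so that each feature (column) of \(M_0\) has mean 0.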
Population covariance: \[\Sigma = cov(x) = E(xx^\top) - E(x)E(x^\top) = E((x - E(x))(x-E(x))^\top)\]
Covariance estimate: \[\hat{\Sigma} = \frac{1}{n-1} M_0^\top M_0\]
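
As a quick sanity check, the \(p \times p\) cross-product of the centered sample matrix divided by \(n - 1\) should reproduce R's built-in `cov()`. The sketch below uses the four numeric columns of `iris`; the names `M`, `M0` and `Sigma_hat` are illustrative and only mirror the notation here.

```r
# Sample matrix M (n x p) and its column-centered version M0
M <- as.matrix(iris[, 1:4])
M0 <- scale(M, center = TRUE, scale = FALSE)
n <- nrow(M0)

# p x p covariance estimate: t(M0) %*% M0 / (n - 1)
Sigma_hat <- crossprod(M0) / (n - 1)

# Compare the dimensions and the values with the built-in estimator
print(dim(Sigma_hat))
print(all.equal(Sigma_hat, cov(M), check.attributes = FALSE))
```

Note that the product has to be taken as `t(M0) %*% M0` to obtain a feature-by-feature matrix; `M0 %*% t(M0)` would instead be \(n \times n\).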

Eigenvalue decomposition of \((n-1) \hat{\Sigma} = M_0^\top M_0\): \[(n-1) \hat{\Sigma} = B\Lambda B^\top\]

Singular value decomposition of \(M_0\): \[M_0 = UDV^\top\]

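\[(n-1) \hat{\Sigma} = M_0^\top M_0 = (UDV^\top)^\top UDV^\top = VDU^\top UDV^\top = VD^2V^\top\] The last step uses \(U^\top U = I\). Therefore \(B = V\) (up to the sign of each column) and \(\Lambda = D^2\): the eigenvalues of \(M_0^\top M_0\) are the squared singular values of \(M_0\).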

airis <- iris[, 1:4] ## the four numeric features
biris <- scale(airis, center = TRUE, scale = FALSE) ## center each column to mean 0
asvd <- svd(biris)
print(asvd$d^2) ## squared singular values

## [1] 630.008014  36.157941  11.653216   3.551429

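Sigma <- cov(biris) * (nrow(biris) - 1) ## (n - 1) * covariance estimate, i.e. t(biris) %*% biris
aeigen <- eigen(Sigma)
print(aeigen$values) ## eigenvalues match the squared singular values above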

## [1] 630.008014  36.157941  11.653216   3.551429
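
The eigenvalues agree, and the same comparison can be sketched for the directions. The example below checks that the right singular vectors of the centered data match the eigenvectors of the cross-product matrix up to sign, and that `prcomp()` reports standard deviations equal to the singular values divided by \(\sqrt{n-1}\). The object names (`sv`, `eg`, `pc`) are illustrative.

```r
M0 <- scale(as.matrix(iris[, 1:4]), center = TRUE, scale = FALSE)
n <- nrow(M0)

sv <- svd(M0)               # M0 = U D V'
eg <- eigen(crossprod(M0))  # t(M0) %*% M0 = B Lambda B'

# Eigenvectors are only defined up to sign, so compare absolute values
print(all.equal(abs(sv$v), abs(eg$vectors), check.attributes = FALSE))

# prcomp() runs an SVD on the centered data and rescales by sqrt(n - 1)
pc <- prcomp(iris[, 1:4], center = TRUE, scale. = FALSE)
print(all.equal(pc$sdev^2, sv$d^2 / (n - 1)))
```

Comparing absolute values is a simple way to ignore the sign ambiguity; it works here because the four eigenvalues are distinct.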

Covariance matrix:
- The eigenvalue decomposition is unaffected by whether the data are centered first, since the covariance matrix is already computed from centered data.
The SVD result is affected:
- If the data are not centered first, the principal component directions also reflect the mean structure.
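
A minimal sketch of this difference, again on the four numeric `iris` columns (variable names are illustrative): `cov()` gives the same eigenvalues for raw and centered input, while `svd()` applied to the raw matrix gives different singular values and a leading direction that lies close to the vector of column means.

```r
M <- as.matrix(iris[, 1:4])                   # raw, uncentered data
M0 <- scale(M, center = TRUE, scale = FALSE)  # column-centered data

# cov() centers internally, so both inputs give the same eigenvalues
ev_raw <- eigen(cov(M))$values
ev_centered <- eigen(cov(M0))$values
print(all.equal(ev_raw, ev_centered))

# svd() does not center: squared singular values differ between the two
d2_raw <- svd(M)$d^2
d2_centered <- svd(M0)$d^2
print(d2_raw)
print(d2_centered)

# Cosine between the first right singular vector of the raw data
# and the unit vector pointing along the column means
v1_raw <- svd(M)$v[, 1]
mean_dir <- colMeans(M) / sqrt(sum(colMeans(M)^2))
cos_mean <- abs(sum(v1_raw * mean_dir))
print(cos_mean)
```

A cosine close to 1 indicates that the first "component" of the uncentered SVD mostly captures the mean of the data rather than the variation around it.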