Data setup
- \(x\in \mathbb{R}^p\) is a population subject
- \(M \in \mathbb{R}^{n \times p}\) is the sample matrix (i.e. \(n\) subjects in rows and \(p\) features in columns)
- Let \(M_0 = M - \bar{M}\) denote the centered \(M\), where every row of \(\bar{M}\) holds the column means of \(M\). (Each feature then has mean 0.)
- Population covariance: \[\Sigma = \operatorname{cov}(x) = E(xx^\top) - E(x)E(x)^\top = E((x - E(x))(x-E(x))^\top)\]
- Covariance estimate: \[\hat{\Sigma} = \frac{1}{n-1} M_0^\top M_0 \in \mathbb{R}^{p \times p}\]
PCA
- Eigenvalue decomposition of \((n-1) \hat{\Sigma} = M_0^\top M_0\): \[(n-1) \hat{\Sigma} = B\Lambda B^\top\]
Equivalence between PCA and SVD
Write the SVD of the centered data as \(M_0 = UDV^\top\). Then \[(n-1) \hat{\Sigma} = M_0^\top M_0 = (UDV^\top)^\top UDV^\top = VDU^\top UDV^\top = VD^2V^\top\] Therefore \(\Lambda = D^2\) and \(B = V\) (up to the sign of each column).
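Verify using the iris data
airis <- iris[, 1:4] ## keep the four numeric features
biris <- scale(airis, center = TRUE, scale = FALSE) ## center each column to mean 0
asvd <- svd(biris)
print(asvd$d^2) ## squared singular values
## [1] 630.008014 36.157941 11.653216 3.551429
Sigma <- cov(biris) * (nrow(biris) - 1) ## (n-1) times the covariance estimate
aeigen <- eigen(Sigma)
print(aeigen$values) ## eigenvalues, to compare with the squared singular values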
## [1] 630.008014 36.157941 11.653216 3.551429
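The check above compares only the eigenvalues. As a minimal sketch, assuming the built-in `iris` data and base R only, the directions can be compared too: the right singular vectors should agree with the eigenvectors up to sign, and `prcomp()` should report the singular values divided by \(\sqrt{n-1}\). The names `X0`, `sv`, `eg`, and `pc` are introduced here for illustration.

```r
# Centered iris measurements (rows = samples, columns = features)
X0 <- scale(iris[, 1:4], center = TRUE, scale = FALSE)
n  <- nrow(X0)

sv <- svd(X0)                    # X0 = U D V^T
eg <- eigen(cov(X0) * (n - 1))   # (n-1) * Sigma_hat = B Lambda B^T
pc <- prcomp(iris[, 1:4], center = TRUE, scale. = FALSE)

# Eigenvectors are defined only up to sign, so compare absolute values
same_directions <- isTRUE(all.equal(abs(sv$v), abs(eg$vectors),
                                    check.attributes = FALSE))

# prcomp() reports standard deviations, i.e. d / sqrt(n - 1)
same_sdev <- isTRUE(all.equal(pc$sdev, sv$d / sqrt(n - 1)))

print(c(same_directions = same_directions, same_sdev = same_sdev))
```

Comparing absolute values is a shortcut that works here because the four eigenvalues are distinct; with repeated eigenvalues the eigenvectors are not unique and this comparison could fail.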
What if we don't center each feature to mean 0?
- Covariance matrix: the eigenvalue decomposition is not affected, because the covariance estimate already centers the data.
- SVD: the result is affected, because the singular values and principal component directions of the uncentered data also reflect the mean structure.
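The two points above can be checked numerically. This is a minimal sketch, again assuming the built-in `iris` data. The names `X`, `X0`, `d2_raw`, `d2_centered`, `v1`, and `mu` are illustrative and not part of the original code.

```r
X <- as.matrix(iris[, 1:4])      # raw, uncentered data
X0 <- scale(X, center = TRUE, scale = FALSE)

# cov() subtracts the column means itself, so centering beforehand changes nothing
cov_unchanged <- isTRUE(all.equal(cov(X), cov(X0), check.attributes = FALSE))

# SVD of the raw matrix decomposes X^T X, which still contains the mean structure
d2_raw      <- svd(X)$d^2
d2_centered <- svd(X0)$d^2
svd_changed <- !isTRUE(all.equal(d2_raw, d2_centered))

# Cosine between the leading right singular vector of the raw data
# and the unit vector pointing along the feature means
v1 <- svd(X)$v[, 1]
mu <- colMeans(X) / sqrt(sum(colMeans(X)^2))
cosine_v1_mu <- abs(sum(v1 * mu))

print(c(cov_unchanged = cov_unchanged, svd_changed = svd_changed))
print(cosine_v1_mu)
```

A cosine close to 1 would indicate that, without centering, the first "component" mostly describes where the data cloud sits rather than how it varies.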
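Do we need to scale each feature to standard deviation 1?
- Yes. Otherwise, features measured on different scales are not comparable, and the features with the largest variance dominate the leading components.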
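To see the effect of scaling, here is a minimal sketch that runs `prcomp()` on the `iris` measurements with and without standardization and compares the share of variance per component. The names `pc_raw`, `pc_scaled`, `share_raw`, and `share_scaled` are illustrative; `scale.` is the `prcomp()` argument that standardizes each column.

```r
X <- iris[, 1:4]

pc_raw    <- prcomp(X, center = TRUE, scale. = FALSE)  # covariance-based PCA
pc_scaled <- prcomp(X, center = TRUE, scale. = TRUE)   # correlation-based PCA

# Share of total variance captured by each component
share_raw    <- pc_raw$sdev^2 / sum(pc_raw$sdev^2)
share_scaled <- pc_scaled$sdev^2 / sum(pc_scaled$sdev^2)
print(round(rbind(raw = share_raw, scaled = share_scaled), 3))

# After standardization, PCA amounts to an eigendecomposition
# of the correlation matrix
same_as_cor <- isTRUE(all.equal(pc_scaled$sdev^2, eigen(cor(X))$values))
print(same_as_cor)
```

Without scaling, the first component is pulled toward the features with the largest raw variance, so its share of variance is larger than in the standardized fit.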
Conclusion:
For PCA and SVD, centering and scaling the data (mean 0 and sd 1) is necessary!