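Data setup

  • \(M_0\): the \(p \times n\) data matrix (features in rows, \(n\) samples in columns), with each feature centered to mean 0.
  • \(\hat{\Sigma}\): the \(p \times p\) sample covariance matrix, so that \((n-1) \hat{\Sigma} = M_0 M_0^\top\).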

PCA

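  • Eigenvalue decomposition of \((n-1) \hat{\Sigma} = M_0 M_0^\top\), where the columns of \(B\) are the eigenvectors and \(\Lambda\) is the diagonal matrix of eigenvalues:

\[(n-1) \hat{\Sigma} = B\Lambda B^\top\]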

SVD

  • SVD on \(M_0\)

\[M_0 = UDV^\top\]

Equivalence between PCA and SVD

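Since \(V^\top V = I\) and \(D\) is diagonal,

\[(n-1) \hat{\Sigma} = M_0 M_0^\top = UDV^\top (UDV^\top)^\top = UDV^\top VDU^\top = UD^2U^\top\]

Comparing with \((n-1) \hat{\Sigma} = B\Lambda B^\top\), we get \(\Lambda = D^2\) and \(B = U\) (up to the sign of each column).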

Verify using the iris data

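airis <- iris[, 1:4]
biris <- scale(airis, center = TRUE, scale = FALSE) ## center each column to mean 0
## biris is n x p (samples in rows), i.e. the transpose of M_0 above;
## the singular values are the same for a matrix and its transpose
asvd <- svd(biris)
print(asvd$d^2)
## [1] 630.008014  36.157941  11.653216   3.551429
Sigma <- cov(biris) * (nrow(biris) - 1) ## (n - 1) * sample covariance
aeigen <- eigen(Sigma)
print(aeigen$values)
## [1] 630.008014  36.157941  11.653216   3.551429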
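
The check above compares eigenvalues only. As a minimal sketch, the directions can be compared as well. Here the names `M0`, `usvd` and `beig` are illustrative, `M0` is built as the transpose of the centered iris matrix so that it matches the \(p \times n\) notation, and the eigenvectors are compared to the left singular vectors in absolute value because each column is only determined up to sign.

```r
# M0 follows the notation above: features in rows, samples in columns
M0 <- t(scale(iris[, 1:4], center = TRUE, scale = FALSE))

usvd <- svd(M0)                 # M0 = U D V^T
beig <- eigen(tcrossprod(M0))   # M0 M0^T = (n - 1) * Sigma-hat = B Lambda B^T

# Eigenvalues should equal the squared singular values
values_match <- isTRUE(all.equal(beig$values, usvd$d^2))

# Eigenvectors should equal the left singular vectors up to a sign per column
vectors_match <- isTRUE(all.equal(abs(beig$vectors), abs(usvd$u),
                                  check.attributes = FALSE))

print(values_match)
print(vectors_match)
```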

What if we don’t center each feature to mean 0?

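  • Covariance matrix
    • The eigenvalue decomposition is not affected, because computing the covariance matrix already centers the data.
  • The SVD result is affected:
    • Without centering, the cross-product of the data matrix is no longer \((n-1) \hat{\Sigma}\), so the singular values change and the principal component directions also reflect the mean structure.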
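
A small sketch of both points on the iris data, assuming the same four numeric columns as before. The variable names (`raw`, `cen`, `cos_mean`) are illustrative. The last quantity is the absolute cosine between the first right singular vector of the uncentered data and the vector of column means, which gives a rough sense of how much the leading direction follows the mean.

```r
raw <- as.matrix(iris[, 1:4])                    # not centered
cen <- scale(raw, center = TRUE, scale = FALSE)  # each column centered to mean 0

# cov() subtracts the column means internally, so centering changes nothing
same_cov <- isTRUE(all.equal(cov(raw), cov(cen), check.attributes = FALSE))

# svd() works on the matrix as given, so the two results differ
svd_raw <- svd(raw)
svd_cen <- svd(cen)
d2_raw <- svd_raw$d^2
d2_cen <- svd_cen$d^2

# Absolute cosine between the first uncentered direction and the mean vector
mu <- colMeans(raw)
cos_mean <- abs(sum(svd_raw$v[, 1] * mu)) / sqrt(sum(mu^2))

print(same_cov)
print(d2_raw)
print(d2_cen)
print(cos_mean)
```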

Do we need to scale each feature to standard deviation 1?

  • Yes. Otherwise, features are not comparable: those measured on larger scales have larger variances and dominate the leading components.
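
As a hedged illustration on the iris data, the sketch below compares the SVD of the centered data with the SVD of the centered and standardized data. Dividing \(D^2\) by \(n-1\) should recover the eigenvalues of `cov(x)` in the first case and of `cor(x)` in the second. The names `pc_cov`, `pc_cor` and `loadings1` are illustrative.

```r
x <- as.matrix(iris[, 1:4])
n <- nrow(x)

# Feature variances before any scaling
print(apply(x, 2, var))

# Centered only: equivalent to PCA on the covariance matrix
pc_cov <- svd(scale(x, center = TRUE, scale = FALSE))
# Centered and scaled to sd 1: equivalent to PCA on the correlation matrix
pc_cor <- svd(scale(x, center = TRUE, scale = TRUE))

# d^2 / (n - 1) should match the eigenvalues of cov(x) and cor(x)
cov_match <- isTRUE(all.equal(pc_cov$d^2 / (n - 1), eigen(cov(x))$values))
cor_match <- isTRUE(all.equal(pc_cor$d^2 / (n - 1), eigen(cor(x))$values))
print(cov_match)
print(cor_match)

# First-component loadings under each choice (signs are arbitrary)
loadings1 <- cbind(centered = pc_cov$v[, 1], standardized = pc_cor$v[, 1])
rownames(loadings1) <- colnames(x)
print(loadings1)
```

Comparing the two columns of `loadings1` shows how much the weight given to each feature depends on whether the features were standardized first.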

Conclusion:

For PCA and SVD, scaling the data (mean 0 and sd 1) is necessary!