Zhiguang Huo (Caleb)
Wednesday October 20th, 2021
\[Y = X\beta + \varepsilon\]
Least squares estimator (LS): \[\hat{\beta} = \arg \min_\beta \frac{1}{2}\| Y - X\beta\|_2^2\]
Maximum likelihood estimator (MLE): \[\hat{\beta} = \arg \max_\beta L(\beta; X, Y)\]
For the linear regression model, the LS estimator coincides with the MLE: \[\hat{\beta} = (X^\top X)^{-1} X^\top Y,\] assuming \(X^\top X\) is invertible.
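The simulation below uses predictors x1, x2, x3 whose generating code is not shown; here is a hedged reconstruction consistent with the collinearity diagnostics that follow (the seed and sd values are assumptions):

set.seed(32608)                  # arbitrary seed (assumption)
n <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- x2 + rnorm(n, sd = 0.1)    # x3 is nearly collinear with x2
cor(x2, x3)                      # presumably the correlation printed below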
## [1] 0.9957515

x <- data.frame(x1,x2,x3)
y <- 2*x1 + 3*x2 + 4*x3 + rnorm(n, sd = 3)
xyData <- cbind(y,x)
lmFit <- lm(y~x1 + x2 + x3, data=xyData)
summary(lmFit)

## 
## Call:
## lm(formula = y ~ x1 + x2 + x3, data = xyData)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -10.2838  -1.7470   0.0118   1.8627   6.9231 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -1.5342     1.7695  -0.867   0.3881    
## x1            2.2746     0.3116   7.299 8.39e-11 ***
## x2            5.2712     3.1574   1.669   0.0983 .  
## x3            1.8343     3.1201   0.588   0.5580    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.9 on 96 degrees of freedom
## Multiple R-squared:  0.8679, Adjusted R-squared:  0.8638 
## F-statistic: 210.3 on 3 and 96 DF,  p-value: < 2.2e-16

## Loading required package: carData

##         x1         x2         x3 
##   1.014903 118.852908 119.013351
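The variance inflation factors above and the reduced model summary that follows were presumably produced along these lines (a hedged sketch; the original code is not shown):

library(car)                               # provides vif(); loading it pulls in carData
vif(lmFit)                                 # VIF near 119 for x2 and x3 flags severe collinearity
lmFit2 <- lm(y ~ x1 + x2, data = xyData)   # drop x3, which is nearly collinear with x2
summary(lmFit2)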
## Call:
## lm(formula = y ~ x1 + x2, data = xyData)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -10.3139  -1.7410  -0.1042   1.8065   7.0280 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -1.5612     1.7629  -0.886    0.378    
## x1            2.2572     0.3092   7.301 7.97e-11 ***
## x2            7.1196     0.2895  24.595  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.891 on 97 degrees of freedom
## Multiple R-squared:  0.8674, Adjusted R-squared:  0.8647 
## F-statistic: 317.4 on 2 and 97 DF,  p-value: < 2.2e-16

Regularization methods provide better solutions to this collinearity problem.
\[\hat{\beta} = \arg \min_\beta \frac{1}{2}\| Y - X\beta\|_2^2 + \lambda \| \beta\|^2_2\]

- \(\|a\|_2 = \sqrt{a_1^2 + \ldots + a_p^2}\).
\[\hat{\beta} = \arg \min_\beta \frac{1}{2}\| Y - X\beta\|_2^2 + \lambda \| \beta\|^2_2\]
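Like the LS estimator, the ridge estimator has a closed-form solution under this parameterization: \[\hat{\beta}^{ridge} = (X^\top X + 2\lambda I)^{-1} X^\top Y.\] The shrunken estimates printed below can be obtained with, e.g., MASS::lm.ridge; a hedged sketch (the \(\lambda\) that produced the printed values is not shown, so \(\lambda = 2\) here is an assumption):

library(MASS)
ridgeFit <- lm.ridge(y ~ x1 + x2 + x3, data = xyData, lambda = 2)
coef(ridgeFit)   # intercept plus shrunken slope estimates; exact values depend on lambda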
## (Intercept)          x1          x2          x3 
##   0.9529628   2.0535563   3.4643002   3.2671640

The data is from the book The Elements of Statistical Learning.
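The prostate cancer data analyzed below can be loaded from, e.g., the ElemStatLearn companion package (an assumption about the source; any copy of the dataset works):

library(ElemStatLearn)   # companion data package for the book
data(prostate)
str(prostate)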
## 'data.frame':    97 obs. of  10 variables:
##  $ lcavol : num  -0.58 -0.994 -0.511 -1.204 0.751 ...
##  $ lweight: num  2.77 3.32 2.69 3.28 3.43 ...
##  $ age    : int  50 58 74 58 62 50 64 58 47 63 ...
##  $ lbph   : num  -1.39 -1.39 -1.39 -1.39 -1.39 ...
##  $ svi    : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ lcp    : num  -1.39 -1.39 -1.39 -1.39 -1.39 ...
##  $ gleason: int  6 6 7 6 6 6 6 6 6 6 ...
##  $ pgg45  : int  0 0 20 0 0 0 0 0 0 0 ...
##  $ lpsa   : num  -0.431 -0.163 -0.163 -0.163 0.372 ...
##  $ train  : logi  TRUE TRUE TRUE TRUE TRUE TRUE ...

##  (Intercept)       lcavol      lweight          age         lbph          svi 
##  0.181560862  0.564341279  0.622019787 -0.021248185  0.096712523  0.761673403 
##          lcp      gleason        pgg45 
## -0.106050939  0.049227933  0.004457512

## 
## Call:
## lm(formula = lpsa ~ ., data = prostate)
## 
## Coefficients:
## (Intercept)       lcavol      lweight          age         lbph          svi  
##    0.181561     0.564341     0.622020    -0.021248     0.096713     0.761673  
##         lcp      gleason        pgg45  
##   -0.106051     0.049228     0.004458

##  (Intercept)       lcavol      lweight          age         lbph          svi 
## -0.023822581  0.470383515  0.595477805 -0.015328827  0.082534382  0.663639476 
##          lcp      gleason        pgg45 
## -0.022092251  0.066864682  0.003190709

## (Intercept)   lcavol  lweight      age     lbph      svi      lcp  gleason 
## 2.478387 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 
##    pgg45 
## 0.000000

library(lars)
## Loaded lars 1.2

x <- as.matrix(prostate[,1:8])
y <- prostate[,9]
lassoFit <- lars(x, y) ## lar for least angle regression
coef(lassoFit, s=2, mode="lambda") ## get beta estimate when lambda = 2

##    lcavol   lweight       age      lbph       svi       lcp   gleason     pgg45 
## 0.4506712 0.2910916 0.0000000 0.0000000 0.3475427 0.0000000 0.0000000 0.0000000

library(lars)
x <- as.matrix(prostate[,1:8])
y <- prostate[,9]
lassoFit <- lars(x, y) ## lar for least angle regression
coef(lassoFit, s=0.5, mode="fraction") ## get beta estimate when the L1 bound mu = 0.5*||beta_LS||_1

##       lcavol      lweight          age         lbph          svi          lcp 
## 0.4746756454 0.4143707391 0.0000000000 0.0000000000 0.4499667198 0.0000000000 
##      gleason        pgg45 
## 0.0000000000 0.0001502949

\[\hat{\beta}^{IC} = \arg\min_{\beta \in \mathbb{R}^p} \frac{1}{2} \|y - X\beta\|_2^2 + \lambda \|\beta\|_0\]
From the book The Elements of Statistical Learning.
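A standard way to compare these penalties (a worked special case, not from the original notes): assume the columns of \(X\) are orthonormal, so \(X^\top X = I\) and \(\hat{\beta}^{LS} = X^\top y\). Then each penalized problem decouples coordinate-wise, and with the penalties \(\lambda\|\beta\|_2^2\), \(\lambda\|\beta\|_1\), and \(\lambda\|\beta\|_0\),

\[\hat{\beta}^{ridge}_j = \frac{\hat{\beta}^{LS}_j}{1 + 2\lambda}, \qquad \hat{\beta}^{lasso}_j = \mathrm{sign}(\hat{\beta}^{LS}_j)\big(|\hat{\beta}^{LS}_j| - \lambda\big)_+, \qquad \hat{\beta}^{IC}_j = \hat{\beta}^{LS}_j \, 1\big(|\hat{\beta}^{LS}_j| > \sqrt{2\lambda}\big).\]

Ridge shrinks every coefficient proportionally, the lasso soft-thresholds, and the \(\ell_0\) penalty hard-thresholds.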
\[\hat{\beta}^{elastic} = \arg\min_{\beta \in \mathbb{R}^p} \frac{1}{2} \|y - X\beta\|_2^2 + \lambda_2 \|\beta\|_2^2 + \lambda_1 \|\beta\|_1\]
\[\hat{\beta}^{elastic} = \arg\min_{\beta \in \mathbb{R}^p} \frac{1}{2} \|y - X\beta\|_2^2 + \lambda ( (1 - \alpha ) \|\beta\|_2^2 + \alpha \|\beta\|_1)\]
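The solution paths plotted below are presumably produced by glmnet fits along these lines (a hedged sketch; the \(\alpha\) values are assumptions, and glmnet scales its penalty slightly differently from the formulas above):

library(glmnet)
# x, y: the prostate predictors and lpsa response constructed earlier
fit.lasso <- glmnet(x, y, alpha = 1)    # lasso
fit.ridge <- glmnet(x, y, alpha = 0)    # ridge
fit.elnet <- glmnet(x, y, alpha = 0.5)  # elastic net (equal mix)

In practice \(\lambda\) is usually chosen by cross-validation, e.g. cv.glmnet(x, y, alpha = 1).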
## Loading required package: Matrix
## Loaded glmnet 4.1

par(mfrow=c(2,2))
# For plotting options, type '?plot.glmnet' in R console
plot(fit.lasso, xvar="lambda", main = "LASSO")
plot(fit.ridge, xvar="lambda", main = "Ridge")
plot(fit.elnet, xvar="lambda", main = "Elastic Net")

\[\min_{\beta=(\beta_{(1)},\dots,\beta_{(G)}) \in \mathbb{R}^p} \frac{1}{2} ||y-X\beta||_2^2 + \lambda \sum_{i=1}^G \sqrt{p_{(i)}} ||\beta_{(i)}||_2,\]
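This is the group lasso, where the coefficients are partitioned into \(G\) groups, \(p_{(i)}\) is the size of group \(i\), and whole groups are selected in or out together. A hedged sketch using the grpreg package (the grouping of the eight prostate predictors is purely illustrative):

library(grpreg)
group <- rep(1:4, each = 2)                          # hypothetical grouping of the 8 predictors
grpFit <- grpreg(x, y, group = group, penalty = "grLasso")
coef(grpFit, lambda = 0.05)                          # group lasso coefficients at one lambda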
\[\min_{\beta \in \mathbb{R}^p} \frac{1}{2} || y - \beta ||_2^2 + \lambda \sum_{i=1}^{p-1} |\beta_i - \beta_{i+1}|\]
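This is the 1D fused lasso (fused lasso signal approximator), which penalizes differences between neighboring coefficients and so favors piecewise-constant fits. A hedged sketch using the genlasso package (the signal below is a made-up example):

library(genlasso)
set.seed(32608)
theta <- rep(c(0, 3, 1), each = 20)       # true piecewise-constant signal
yNoisy <- theta + rnorm(60, sd = 0.5)
flFit <- fusedlasso1d(yNoisy)             # path of solutions over lambda
betaHat <- coef(flFit, lambda = 1)$beta   # fitted piecewise-constant estimate at lambda = 1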
Consider a general setting \[\min_{\beta \in \mathbb{R}^p} f(\beta) + \lambda ||D\beta||_1\] where \(f: \mathbb{R}^p \rightarrow \mathbb{R}\) is a smooth convex function and \(D \in \mathbb{R}^{m\times p}\) is a penalty matrix.
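For example, the 1D fused lasso above is the special case \(f(\beta) = \frac{1}{2}\|y - \beta\|_2^2\) with \(D\) the first-difference matrix

\[D = \begin{pmatrix} 1 & -1 & & & \\ & 1 & -1 & & \\ & & \ddots & \ddots & \\ & & & 1 & -1 \end{pmatrix} \in \mathbb{R}^{(p-1)\times p},\]

so that \(\|D\beta\|_1 = \sum_{i=1}^{p-1} |\beta_i - \beta_{i+1}|\).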