In this question we will apply the sparse \(k\)-means algorithm and the sparse hierarchical clustering algorithm to the iris data. The iris data is directly available in R. It contains 150 samples and 4 features, and its fifth column is the species label. Below is the head of the iris data.
head(iris)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
To demonstrate the effectiveness of feature selection in the sparse \(k\)-means algorithm and the sparse hierarchical clustering algorithm, we add \(p_2 = 10\) noise features drawn from a standard normal distribution. The noise features are combined with the four original features, and the combined data (iris2) is then scaled so that each feature has mean 0 and standard deviation 1. You only need iris2 as the input data throughout this HW. The code below prepares iris2 as well as iris_label.
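n <- nrow(iris)
p2 <- 10
set.seed(32611)
# p2 independent standard normal noise features, one row per sample
iris_noise <- matrix(rnorm(n*p2), ncol=p2)
# true species labels, used only for coloring the plots
iris_label <- iris$Species
# four original measurements followed by the noise features
iris2 <- cbind(iris[,1:4], iris_noise)
# center each feature to mean 0 and scale it to sd 1
iris2 <- scale(iris2)
iris2 <- as.matrix(iris2)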
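Before clustering, it can help to confirm that the prepared data look as described. The following is a minimal sketch of such a check, using base R only. It repeats the preparation steps so that it runs on its own; the checks themselves are illustrative and not part of the assignment.

```r
# Rebuild iris2 and iris_label as in the preparation code above.
n <- nrow(iris)
p2 <- 10
set.seed(32611)
iris_noise <- matrix(rnorm(n * p2), ncol = p2)
iris_label <- iris$Species
iris2 <- as.matrix(scale(cbind(iris[, 1:4], iris_noise)))

dim(iris2)               # number of samples, number of features (4 original + 10 noise)
range(colMeans(iris2))   # column means should be numerically zero
apply(iris2, 2, sd)      # every column should have standard deviation 1
table(iris_label)        # number of samples per species
```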
Apply the sparse \(k\)-means algorithm to the iris2 data to cluster the samples. Fix the number of clusters at \(K = 3\) and the tuning parameter at \(wbounds = 1.9\). Draw the feature selection plot, similar to https://caleb-huo.github.io/teaching/2018SPRING/lectures/sparseClustering1.html#(29). Also perform PCA on iris2, visualize the data using the first two principal components, and color each sample according to its species. Add an appropriate legend to the plot.
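One possible way to approach this task is sketched below. It assumes the `sparcl` package is installed and that its `KMeansSparseCluster()` function is the intended implementation; the assignment text does not name a package, so treat that choice, the seed before the fit, the color choices, and the plot titles as assumptions. The data preparation is repeated so the sketch runs on its own.

```r
library(sparcl)  # assumption: sparse clustering implementation used here

# Rebuild iris2 and iris_label as in the preparation code.
n <- nrow(iris)
p2 <- 10
set.seed(32611)
iris_noise <- matrix(rnorm(n * p2), ncol = p2)
iris_label <- iris$Species
iris2 <- as.matrix(scale(cbind(iris[, 1:4], iris_noise)))

# Sparse k-means with K = 3 clusters and L1 bound wbounds = 1.9.
set.seed(32611)  # assumption: makes the random starts reproducible
km_out <- KMeansSparseCluster(iris2, K = 3, wbounds = 1.9, silent = TRUE)
# The result may be wrapped in a list with one element per wbounds value;
# unwrap it if the fitted fields are not at the top level.
km_fit <- if (!is.null(km_out$ws)) km_out else km_out[[1]]

# Feature selection plot: one weight per feature (features 1-4 are the
# original measurements, features 5-14 are the added noise).
plot(as.numeric(km_fit$ws), type = "h", lwd = 2,
     xlab = "Feature index", ylab = "Feature weight",
     main = "Sparse k-means feature weights")

# Cross-tabulate cluster assignments against the true species.
table(cluster = km_fit$Cs, species = iris_label)

# PCA on iris2 (already centered and scaled), colored by species.
pca <- prcomp(iris2)
species_col <- c("red", "green3", "blue")  # one color per species level
plot(pca$x[, 1], pca$x[, 2],
     col = species_col[as.integer(iris_label)], pch = 19,
     xlab = "PC1", ylab = "PC2", main = "PCA of iris2")
legend("topright", legend = levels(iris_label),
       col = species_col, pch = 19)
```

Indexing `species_col` by `as.integer(iris_label)` keeps the point colors and the legend entries in the same order as the factor levels.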
Apply the sparse hierarchical clustering algorithm to the iris2 data to cluster the samples. Use complete linkage and fix the tuning parameter at \(wbounds = 1.9\). Draw the hierarchical tree structure and the feature selection plot, similar to https://caleb-huo.github.io/teaching/2018SPRING/lectures/sparseClustering2.html#(16). Also draw the hierarchical tree structure with each sample colored according to its species, and add an appropriate legend to the plot.
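A rough sketch of this task follows, again assuming the `sparcl` package. In `sparcl`, the hierarchical function `HierarchicalSparseCluster()` names its tuning parameter `wbound` (singular), and `ColorDendrogram()` draws colored leaves. The colors, titles, and branch length are illustrative assumptions. The data preparation is repeated so the sketch runs on its own.

```r
library(sparcl)  # assumption: sparse clustering implementation used here

# Rebuild iris2 and iris_label as in the preparation code.
n <- nrow(iris)
p2 <- 10
set.seed(32611)
iris_noise <- matrix(rnorm(n * p2), ncol = p2)
iris_label <- iris$Species
iris2 <- as.matrix(scale(cbind(iris[, 1:4], iris_noise)))

# Sparse hierarchical clustering with complete linkage and L1 bound 1.9.
hc_fit <- HierarchicalSparseCluster(x = iris2, wbound = 1.9,
                                    method = "complete", silent = TRUE)

# Hierarchical tree and feature selection plot, side by side.
op <- par(mfrow = c(1, 2))
plot(hc_fit$hc, labels = FALSE, xlab = "", sub = "",
     main = "Sparse hierarchical clustering")
plot(as.numeric(hc_fit$ws), type = "h", lwd = 2,
     xlab = "Feature index", ylab = "Feature weight",
     main = "Feature weights")
par(op)

# Tree with leaves colored by species. The branch length is in the same
# units as the tree height, so it is set relative to the tallest merge.
species_col <- c("red", "green3", "blue")  # one color per species level
ColorDendrogram(hc_fit$hc,
                y = species_col[as.integer(iris_label)],
                main = "Leaves colored by species",
                branchlength = 0.1 * max(hc_fit$hc$height))
legend("topright", legend = levels(iris_label), fill = species_col)
```

Passing color names (rather than numeric labels) to `y` keeps the leaf colors identical to the legend colors.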