00102892: Statistical Learning
Course Description
This is an introductory statistical machine learning course for graduate and upper-level undergraduate students in statistics, applied
mathematics, computer science, and other fields that involve learning from data. The course covers fundamental principles of machine learning
and major topics in supervised, unsupervised, and semi-supervised learning, including linear regression and classification, spline and kernel
smoothing, model selection and regularization, additive models, tree-based methods, support vector machines, clustering, principal component
analysis, nonnegative matrix factorization, and graphical models.
Syllabus
Lectures and Assignments
| Week | Date | Topics | References | Assignments | Notes and Further Reading |
| --- | --- | --- | --- | --- | --- |
| 1 | 9/9 | Introduction, no free lunch theorem | Zhou Chap. 1 | | |
| | 9/11 | Classical linear regression | ESL Secs. 2.2, 3.2 | | Seemingly unrelated regressions are an example where estimating the error covariance can improve efficiency; see . |
| 2 | 9/16 | Sparse linear regression | ESL Secs. 3.3, 3.4 | Homework 1, due 9/23 | provides a retrospective view of the Lasso and its variants; supplements the view by emphasizing nonconvex penalties and feature screening methods. |
| 3 | 9/23 | Theory for Lasso | Wainwright Sec. 7.5 | | See Wainwright Secs. 7.3 and 7.4 for theory via the restricted eigenvalue condition, and for comparisons of conditions. |
| | 9/25 | Algorithms for Lasso, linear discriminant analysis | ESL Secs. 3.8, 4.1–4.3 | | See for ADMM, for low-rank regression, and for sparse LDA. |
| 4 | 9/30 | National Day | | | |
| 5 | 10/7 | Logistic regression, separating hyperplanes | ESL Secs. 4.4, 4.5 | Homework 2, due 10/14 | |
| | 10/9 | Splines | ESL Secs. 5.1–5.6 | | For P-splines that combine the ideas of regression splines and smoothing splines, see . |
| 6 | 10/14 | Reproducing kernel Hilbert spaces, kernel smoothing | ESL Secs. 5.7, 5.8, 6.1 | | For a more formal introduction to reproducing kernel Hilbert spaces, see Wainwright Chap. 12. |
| 7 | 10/21 | Local polynomial regression, kernel density estimation | ESL Secs. 6.1–6.6, Tsybakov Sec. 1.2.1 | Homework 3, due 10/28 | For the construction of higher order kernels using Legendre polynomials, see Tsybakov Sec. 1.2.2. |
| | 10/23 | Naive Bayes, principles of model selection, AIC | ESL Secs. 6.6, 7.1–7.6 | | |
| 8 | 10/28 | BIC, bootstrap, generalized additive models | ESL Secs. 7.7–7.12, 9.1 | | The AIC–BIC dilemma was discussed by and . |
| 9 | 11/4 | Classification and regression trees, multivariate adaptive regression splines | ESL Secs. 9.2–9.5 | | A review of diversity indices was given by . |
| | 11/6 | Midterm exam | | | Mean = 49, median = 51, Q1 = 33, Q3 = 66, high score = 94 |
| 10 | 11/11 | Boosting | ESL Secs. 10.1–10.6, 10.9–10.12 | Homework 4, due 11/18 | is a book-length treatment of boosting. |
| 11 | 11/18 | Support vector machines | ESL Secs. 12.1–12.3 | | Multiclass SVMs were considered by . |
| | 11/20 | K-means clustering, principal component analysis | ESL Secs. 14.3, 14.5 | | Consistency of K-means clustering was established by . |
| 12 | 11/25 | Spectral clustering, nonnegative matrix factorization | ESL Secs. 14.5.3, 14.6 | | For consistency of spectral clustering and its application to community detection in social network models, see and . |
| 13 | 12/2 | Gaussian graphical models | ESL Secs. 17.1–17.3, Wainwright Chap. 11 | Homework 5, due 12/16 | The CLIME method was proposed by . |
| | 12/4 | Directed acyclic graphs | See your lecture notes | | Estimating high-dimensional, sparse DAGs was considered by and . |
| 14 | 12/9 | Ensemble learning, semi-supervised learning | ESL Chaps. 15, 16, Zhou Chap. 13 | | |
| 15 | 12/16 | Neural networks | Zhou Chap. 5, DL Chap. 6 | | See for a recent review; compare it with a much older one, . |
| 16 | 12/23 | Oral presentations | | | |
| | 12/25 | Oral presentations | | | |