Introduction to Statistical Learning

Course Description

Overview of supervised learning, with a focus on regression and classification methods. The syllabus includes: linear and polynomial regression, logistic regression, and linear discriminant analysis; cross-validation and the bootstrap; model selection and regularization methods (ridge and lasso); nonlinear models, splines, and generalized additive models; tree-based methods, random forests, and boosting; support vector machines; and some unsupervised learning: principal components and clustering (k-means and hierarchical). Computing is done in R, through tutorial sessions and homework assignments. This math-light course is offered remotely only, via video segments (MOOC style). TAs will host weekly remote office hours using an online platform such as Google Hangout or BlueJeans. There are four homework assignments, a midterm, and a final exam, all of which are administered remotely.

Course Details

  • Grading Basis: Letter Grade or Credit/No Credit
  • Intensive Studies: This course is offered as part of the Data Science Intensive. See the Intensive Studies page for more information on how to receive an official Document of Completion.

Prerequisites

Introductory courses in statistics or probability (e.g., STATS 60), linear algebra (e.g., MATH 51), and computer programming (e.g., CS 105).
