Concise Machine Learning

Jonathan Richard Shewchuk

May 4, 2017

Department of Electrical Engineering and Computer Sciences
University of California at Berkeley
Berkeley, California 94720

Abstract

This report contains lecture notes for UC Berkeley’s introductory class on Machine Learning. It covers many methods for classification and regression, and several methods for clustering and dimensionality reduction. It is concise for two reasons. First, it includes no word that cannot be written or spoken in a single semester’s lectures (with whiteboard lectures and almost no slides!). Second, its topics are limited to a small selection of particularly useful, popular algorithms.

Supported in part by the National Science Foundation under Award CCF-1423560, in part by the University of California Lab Fees Research Program, and in part by an Alfred P. Sloan Research Fellowship. The claims in this document are those of the author. They are not endorsed by the sponsors or the U.S. Government.
Keywords: machine learning, classification, regression, density estimation, dimensionality reduction, clustering, perceptrons, support vector machines (SVMs), Gaussian discriminant analysis, linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), logistic regression, decision trees, neural networks, convolutional neural networks (CNNs, ConvNets), nearest neighbor search, least-squares linear regression, polynomial regression, ridge regression, Lasso, maximum likelihood estimation (MLE), principal components analysis (PCA), singular value decomposition (SVD), latent factor analysis, latent semantic indexing, k-means clustering, hierarchical clustering, spectral graph clustering
Contents

1 Introduction
2 Linear Classifiers and Perceptrons
3 Perceptron Learning; Maximum Margin Classifiers
4 Soft-Margin Support Vector Machines; Features
5 Machine Learning Abstractions and Numerical Optimization
6 Decision Theory; Generative and Discriminative Models
7 Gaussian Discriminant Analysis, including QDA and LDA
8 Eigenvectors and the Anisotropic Multivariate Normal Distribution
9 Anisotropic Gaussians, Maximum Likelihood Estimation, QDA, and LDA
10 Regression, including Least-Squares Linear and Logistic Regression
11 More Regression; Newton’s Method; ROC Curves
12 Statistical Justifications; the Bias-Variance Decomposition
13 Shrinkage: Ridge Regression, Subset Selection, and Lasso
14 The Kernel Trick
15 Decision Trees
16 More Decision Trees, Ensemble Learning, and Random Forests
17 Neural Networks
18 Neurons; Variations on Neural Networks
19 Better Neural Network Training; Convolutional Neural Networks
20 Unsupervised Learning and Principal Components Analysis
21 The Singular Value Decomposition; Clustering
22 Spectral Graph Clustering
23 Learning Theory
24 Multiple Eigenvectors; Latent Factor Analysis; Nearest Neighbors
25 Faster Nearest Neighbors: Voronoi Diagrams and k-d Trees