

CSE176 Introduction to Machine Learning: Lecture notes
Miguel Á. Carreira-Perpiñán
EECS, University of California, Merced
November 28, 2016

These are notes for a one-semester undergraduate course on machine learning given by Prof. Miguel Á. Carreira-Perpiñán at the University of California, Merced. The notes are largely based on the book "Introduction to machine learning" by Ethem Alpaydın (MIT Press, 3rd ed., 2014), with some additions.

These notes may be used for educational, non-commercial purposes.

© 2015–2016 Miguel Á. Carreira-Perpiñán

1 Introduction

1.1 What is machine learning (ML)?

• Data is being produced and stored continuously ("big data"):
  – science: genomics, astronomy, materials science, particle accelerators...
  – sensor networks: weather measurements, traffic...
  – people: social networks, blogs, mobile phones, purchases, bank transactions...
  – etc.
• Data is not random; it contains structure that can be used to predict outcomes or to gain knowledge in some way. Ex: patterns of Amazon purchases can be used to recommend items.
• It is more difficult to design algorithms for such tasks (compared to, say, sorting an array or calculating a payroll). Such algorithms need data. Ex: construct a spam filter using a collection of email messages labelled as spam/not spam.
• Data mining: the application of ML methods to large databases.
• Examples of ML applications: fraud detection, medical diagnosis, speech or face recognition...
• ML is programming computers using data (past experience) to optimize a performance criterion.
• ML relies on:
  – Statistics: making inferences from sample data.
  – Numerical algorithms (linear algebra, optimization): optimize criteria, manipulate models.
  – Computer science: data structures and programs that solve an ML problem efficiently.
• A model:
  – is a compressed version of a database;
  – extracts knowledge from it;
  – does not have perfect performance but is a useful approximation to the data.

1.2 Examples of ML problems

• Supervised learning: labels provided.
  – Classification (pattern recognition):
    ∗ Face recognition. Difficult because of the complex variability in the data: pose and illumination in a face image, occlusions, glasses/beard/make-up/etc.
      [Figure: training examples and test images for face recognition.]
    ∗ Optical character recognition: different styles, slant...
    ∗ Medical diagnosis: often, variables are missing (tests are costly).
    ∗ Speech recognition, machine translation, biometrics...
    ∗ Credit scoring: classify customers into high- and low-risk, based on their income and savings, using data about past loans (whether they were paid or not).
      [Figure: customers plotted by income vs. savings, split into low-risk and high-risk regions by thresholds $\theta_1$ (income) and $\theta_2$ (savings), with the rule: if income > $\theta_1$ and savings > $\theta_2$ then low-risk else high-risk.]
  – Regression: the labels to be predicted are continuous:
    ∗ Predict the price of a car from its mileage.
    ∗ Navigating a car: predict the steering angle.
    ∗ Kinematics of a robot arm: predict the workspace location from the joint angles.
      [Figure: car price $y$ vs. mileage $x$, with a fitted line $y = wx + w_0$.]
• Unsupervised learning: no labels provided, only input data.
  – Learning associations:
    ∗ Basket analysis: let $p(Y|X)$ = "probability that a customer who buys product X also buys product Y", estimated from past purchases. If $p(Y|X)$ is large (say 0.7), associate "$X \to Y$". When someone buys X, recommend Y to them.
  – Clustering: group similar data points.
  – Density estimation: where are data points likely to lie?
  – Dimensionality reduction: the data lie near a low-dimensional manifold; find it.
  – Feature selection: keep only useful features.
  – Outlier/novelty detection.
• Semisupervised learning: labels provided for some points only.
• Reinforcement learning: find a sequence of actions (policy) that reaches a goal. There is no supervised output, only a delayed reward. Ex: playing chess or a computer game, a robot in a maze.
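The regression example above fits a line $y = wx + w_0$ to predict a car's price from its mileage. The notes do not say at this point how $w$ and $w_0$ are chosen; the sketch below assumes ordinary least squares (minimize the sum of squared errors), whose closed-form solution is $w = \sum_n (x_n - \bar{x})(y_n - \bar{y}) / \sum_n (x_n - \bar{x})^2$ and $w_0 = \bar{y} - w\bar{x}$. The function name `fit_line` and the toy data are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = w*x + w0 (assumed fitting criterion).

    Returns (w, w0) minimizing sum_n (y_n - (w*x_n + w0))**2.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    w = sxy / sxx              # slope
    w0 = y_mean - w * x_mean   # intercept: the line passes through the means
    return w, w0


# Hypothetical toy data: mileage (thousands of miles) vs. price (thousands of dollars).
mileage = [10, 20, 30, 40]
price = [20, 18, 16, 14]
w, w0 = fit_line(mileage, price)
print(f"price ≈ {w:.2f} * mileage + {w0:.2f}")
```

Because these toy points lie exactly on a line, the fit recovers $w = -0.2$ and $w_0 = 22$; with real, noisy data the line would only approximate them.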

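Basket analysis, as described above, can be sketched directly: estimate $p(Y|X)$ from past purchases and keep the rules $X \to Y$ whose estimate reaches the threshold (0.7, as in the text). This sketch assumes the estimate is the relative frequency (#baskets containing both X and Y) / (#baskets containing X); the function `association_rules` and the purchase data are hypothetical.

```python
from itertools import permutations


def association_rules(baskets, min_prob=0.7):
    """Return {(X, Y): p(Y|X)} for all product pairs with p(Y|X) >= min_prob.

    baskets: list of sets, one set of purchased products per past transaction.
    p(Y|X) is estimated as (#baskets with X and Y) / (#baskets with X).
    """
    products = sorted(set().union(*baskets))
    n_with = {x: sum(x in b for b in baskets) for x in products}
    rules = {}
    for x, y in permutations(products, 2):  # ordered pairs: X -> Y differs from Y -> X
        n_both = sum(x in b and y in b for b in baskets)
        p = n_both / n_with[x]  # n_with[x] >= 1 because x occurs in some basket
        if p >= min_prob:
            rules[(x, y)] = p
    return rules


# Hypothetical past purchases.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"butter", "jam"},
]
for (x, y), p in association_rules(baskets).items():
    print(f"{x} -> {y}: p = {p:.2f}")
```

On this data the rules found are jam → butter and milk → bread (both with estimated $p = 1$), while bread → butter has $p \approx 0.67$ and falls below the threshold. Note that the jam rule rests on a single basket, so such estimates are only trustworthy with enough past purchases.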
2 Supervised learning

2.1 Learning a class from examples: two-class problems

• We are given a training set of labeled examples (positive and negative) and want to learn a classifier that we can use to predict the labels of unseen examples, or to understand the data.
• Input representation: we need to decide what attributes (features) to use to describe the input patterns (examples, instances). This implies ignoring other attributes as irrelevant.
  [Figure, left: training set for a "family car", with axes $x_1$: Price and $x_2$: Engine power, and an instance $(x_1^t, x_2^t)$ marked. Right: the true class $C$ and the hypothesis class of rectangles $(p_1 \le \text{price} \le p_2)$ AND $(e_1 \le \text{engine power} \le e_2)$, where $p_1, p_2, e_1, e_2 \in \mathbb{R}$.]
• Training set: $\mathcal{X} = \{(\mathbf{x}_n, y_n)\}_{n=1}^{N}$, where $\mathbf{x}_n \in \mathbb{R}^D$ is the $n$th input vector and $y_n \in \{0, 1\}$ its class label.
• Hypothesis (model) class $\mathcal{H}$: the set of classifier functions we will use. Ideally, the true class distribution $C$ can be represented by a function in $\mathcal{H}$ (exactly, or with a small error).
• Having selected $\mathcal{H}$, learning the class reduces to finding an optimal $h \in \mathcal{H}$. We don't know the true class regions $C$, so we cannot measure the true error of $h$; instead we approximate it by the empirical error:
  $$E(h; \mathcal{X}) = \sum_{n=1}^{N} I(h(\mathbf{x}_n) \neq y_n) = \text{number of misclassified instances}$$
  where $I(\cdot)$ is 1 if its argument is true and 0 otherwise.
  There may be more than one optimal $h \in \mathcal{H}$. In that case, we achieve better generalization by maximizing the margin (the distance between the boundary of $h$ and the instances closest to it).
  [Figure, left: the hypothesis with the largest margin. Right: noise and a more complex hypothesis ($h_1$, $h_2$ in the $(x_1, x_2)$ plane).]
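The rectangle hypothesis class, the empirical error $E(h; \mathcal{X})$ and the margin can be made concrete in a few lines of Python. This is a sketch under stated assumptions: two features, a rectangle stored as $(p_1, p_2, e_1, e_2)$, closed rectangles (a point on an edge is classified positive), and the margin taken as the smallest Euclidean distance from any training instance to the rectangle's boundary. Choosing among a finite list of candidate rectangles is a simplification for illustration; the names `Rectangle`, `empirical_error`, `margin` and `select_max_margin` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Rectangle:
    """Axis-aligned rectangle hypothesis: h(x) = 1 iff p1 <= x1 <= p2 and e1 <= x2 <= e2."""
    p1: float
    p2: float
    e1: float
    e2: float

    def __call__(self, x):
        x1, x2 = x
        return int(self.p1 <= x1 <= self.p2 and self.e1 <= x2 <= self.e2)

    def distance_to_boundary(self, x):
        x1, x2 = x
        if self(x):
            # Inside (or on) the rectangle: distance to the nearest edge.
            return min(x1 - self.p1, self.p2 - x1, x2 - self.e1, self.e2 - x2)
        # Outside: Euclidean distance to the closest point of the rectangle.
        dx = max(self.p1 - x1, 0.0, x1 - self.p2)
        dy = max(self.e1 - x2, 0.0, x2 - self.e2)
        return math.hypot(dx, dy)


def empirical_error(h, X):
    """E(h; X): number of training instances misclassified by h."""
    return sum(h(x) != y for x, y in X)


def margin(h, X):
    """Distance between the boundary of h and the closest training instance."""
    return min(h.distance_to_boundary(x) for x, _ in X)


def select_max_margin(hypotheses, X):
    """Among the hypotheses with minimal empirical error, return one with the largest margin."""
    best = min(empirical_error(h, X) for h in hypotheses)
    optimal = [h for h in hypotheses if empirical_error(h, X) == best]
    return max(optimal, key=lambda h: margin(h, X))


# Hypothetical training set; both features assumed rescaled to a common 0-10 range.
X = [((3, 3), 1), ((4, 5), 1), ((5, 4), 1),
     ((1, 1), 0), ((8, 8), 0), ((4, 9), 0), ((9, 2), 0)]
S = Rectangle(3, 5, 3, 5)  # tightest rectangle around the positives
h = Rectangle(2, 6, 2, 7)  # a looser rectangle that also separates the classes
for name, r in [("S", S), ("h", h)]:
    print(name, empirical_error(r, X), margin(r, X))
print(select_max_margin([S, h], X) == h)
```

Here $S$ and $h$ both have zero empirical error, so both are optimal; but $S$ has margin 0 (the positives lie on its edges) while $h$ has margin 1, so `select_max_margin` picks $h$. Because the margin uses Euclidean distance, it depends on the scale of each feature (price in dollars and engine power in horsepower would weight the two axes very differently), which is why the toy data assume rescaled features.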
