Machine Learning

Intro

Machine learning is a core subbranch of AI. In general, ML is about learning to do better in the future based on what was experienced in the past. In computer science terms, we design algorithms that learn from data (i.e. find patterns) and act on what they learn (e.g. make predictions).

Course Goal

  • Identify whether ML is an appropriate solution for a problem

  • Determine what types of algorithms might be applicable

  • Understand how to apply these algorithms

Topics

Basic Supervised Learning

  • Decision Trees

  • KNN

  • Perceptron

Advanced Supervised Learning

  • Linear regression and gradient descent

  • Support Vector Machines

  • Probabilistic models

  • Neural Networks

  • Kernels

  • Ensemble Learning

Unsupervised Learning

  • K-means

  • PCA

  • Expectation Maximization

Notation

Problem Setting

  • Set of possible instances $X$

    • an instance $x \in X$ is a feature vector $x = [x_1, x_2, ..., x_D]$

  • Labels $Y$ ($\{0, 1\}$ if binary)

  • Unknown target function $f: X \rightarrow Y$

  • Set of hypotheses $H = \{h \mid h: X \rightarrow Y\}$

    • i.e. each $h$ could be a decision tree. But performing an exhaustive search to find the best $h$ is too expensive, so we use a heuristic approach and greedily pick the feature that maximizes accuracy (minimizes error); see the sketch below.
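A minimal sketch of that greedy idea, assuming a tiny made-up dataset (all feature values and labels below are invented): instead of searching all of $H$, try a one-feature threshold rule ("decision stump") for each feature and keep the one with the highest training accuracy.

```python
# Greedy feature selection sketch: pick the single feature/threshold rule
# ("decision stump") with the highest training accuracy, instead of searching all of H.
import numpy as np

# Made-up training data: 4 instances, 2 features each, binary labels.
X = np.array([[5.1, 170], [4.8, 160], [6.0, 180], [5.5, 150]])
y = np.array([1, 0, 1, 0])

best = None
for d in range(X.shape[1]):              # try each feature
    for t in np.unique(X[:, d]):         # try each observed value as a threshold
        pred = (X[:, d] >= t).astype(int)
        acc = (pred == y).mean()
        if best is None or acc > best[0]:
            best = (acc, d, t)

acc, d, t = best
print(f"best stump: feature {d}, threshold {t}, training accuracy {acc:.2f}")
```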

Input - Training examples of $f$

  • $\{(x^1, y^1), (x^2, y^2), ..., (x^N, y^N)\}$

Output

  • A hypothesis $h \in H$ that best approximates $f$

Loss Function

$loss(h(x_i), y_i) = \begin{cases} 0, & \text{if } h(x_i) = y_i \\ 1, & \text{otherwise} \end{cases}$
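As a quick sketch, this 0/1 loss is just an indicator of whether the prediction matches the label:

```python
# 0/1 loss: 0 if the hypothesis predicted the label correctly, 1 otherwise.
def zero_one_loss(h_x, y):
    return 0 if h_x == y else 1

print(zero_one_loss(1, 1))  # 0 (correct prediction)
print(zero_one_loss(1, 0))  # 1 (mistake)
```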

Data Distribution (Unknown)

  • Assume our training data are random samples from a probability distribution $D$ over $(x, y)$ pairs.

Expected Loss

  • $\epsilon$ is the expected loss of $h$ over $D$ w.r.t. $loss$ (we want to minimize it): $\epsilon = \mathop{\mathbb{E}}_{(x_i, y_i) \sim D}[loss(h(x_i), y_i)] = \sum\limits_{(x_i, y_i)} D(x_i, y_i)\, loss(h(x_i), y_i)$

  • But we don't know $D$ ʕº̫͡ºʔ

  • We can only compute the training error

    $\hat{\epsilon} = \frac{1}{N} \sum\limits_{i=1}^{N} loss(h(x^i), y^i)$
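A small sketch of computing the training error $\hat{\epsilon}$: average the 0/1 loss over the $N$ training examples (the hypothesis and data below are made up for illustration).

```python
# Training error: average 0/1 loss of a hypothesis h over the N training examples.
def training_error(h, examples):
    return sum(0 if h(x) == y else 1 for x, y in examples) / len(examples)

# Hypothetical hypothesis and training set, purely for illustration.
h = lambda x: 1 if x >= 0.5 else 0
train = [(0.9, 1), (0.2, 0), (0.6, 0), (0.1, 0)]
print(training_error(h, train))  # 0.25 -> one mistake out of four examples
```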

Learning Algorithms

As usual, we care about time and space efficiency. But we also care a great deal about the amount of data we need. There are 3 criteria for successful learning:

  1. enough data

  2. a rule that makes a low number of mistakes on the training data

  3. a rule that is as simple as possible

Very often there is a trade-off between 2 and 3.
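A hedged illustration of the trade-off between criteria 2 and 3, using polynomials of increasing degree as a stand-in for "rule complexity" (the data and model choice here are assumptions for illustration, not from the notes): more complex rules drive the training error down.

```python
# Trade-off sketch: higher-degree polynomials (more complex rules) achieve lower
# training error on the same made-up data. Squared loss is used here for convenience.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 10)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)

for degree in (1, 3, 7):
    coeffs = np.polyfit(x, y, degree)      # fit a polynomial of this degree
    pred = np.polyval(coeffs, x)
    mse = np.mean((pred - y) ** 2)         # training error under squared loss
    print(f"degree {degree}: training MSE = {mse:.4f}")
```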

Learning Models

  • Example (or instance) is the object being classified

  • An example is described by a set of attributes (aka features, variables, or dimensions)

  • Label (or class) is the category

  • Concept is the mapping from examples to labels: $c: X \rightarrow \{0, 1\}$

  • Concept class is a collection of concepts

Ex. A patient might be described by gender, age, weight, blood pressure, body temp, etc.

During training, the learning algorithm is supplied with labeled examples. During testing, only unlabeled examples are provided.

We often assume only two labels, 0 and 1. We also assume there is a mapping from examples to labels, $c: X \rightarrow \{0, 1\}$. $X$ is the space of all possible examples (aka domain or instance space).
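A small sketch of this terminology with the patient example above (all attribute names and values are invented): examples are feature vectors of attributes, training examples carry labels, test examples do not, and a concept $c$ maps examples to $\{0, 1\}$.

```python
# Terminology sketch: an example is a vector of attribute values; the label is its
# category; a concept c maps examples to {0, 1}. All data here is invented.
attributes = ["gender", "age", "weight_kg", "systolic_bp", "body_temp_c"]

# Labeled training examples: (feature vector, label) pairs given to the learner.
train = [
    ([0, 63, 80, 145, 37.1], 1),
    ([1, 35, 62, 118, 36.8], 0),
    ([0, 51, 95, 150, 38.0], 1),
]

# Unlabeled test examples: only feature vectors are provided at test time.
test = [[1, 47, 70, 130, 36.9]]

# One hand-written candidate concept c: X -> {0, 1} (flag high blood pressure).
def c(x):
    return 1 if x[3] >= 140 else 0

print([c(x) for x, _ in train])  # predictions on training examples: [1, 0, 1]
print([c(x) for x in test])      # prediction on the unlabeled test example: [0]
```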
