Machine Learning

Intro

Machine learning is a core subbranch of AI. In general, ML is about learning to do better in the future based on what was experienced in the past. In computer science terms, we design algorithms that learn from data (i.e. find patterns) and act on what they learn (e.g. make predictions).

Course Goal

  • Identify whether ML is an appropriate solution for a problem

  • Determine what types of algorithms might be applicable

  • Understand how to apply these algorithms

Topics

Basic Supervised Learning

  • Decision Trees

  • KNN

  • Perceptron

Advanced Supervised Learning

  • Linear regression and gradient descent

  • Support Vector Machines

  • Probabilistic models

  • Neural Networks

  • Kernels

  • Ensemble Learning

Unsupervised Learning

  • K-means

  • PCA

  • Expectation Maximization

Notation

Problem Setting

  • Set of possible instances $X$

    • an instance $x \in X$ is a feature vector $x = [x_1, x_2, ..., x_D]$

  • Labels $Y$ ($\{0, 1\}$ if binary)

  • Unknown target function $f: X \rightarrow Y$

  • Set of hypotheses $H = \{h \mid h: X \rightarrow Y\}$

    • i.e. each $h$ could be a decision tree. But performing an exhaustive search to find the best $h$ is too expensive, so we use a heuristic approach and greedily pick the feature that maximizes accuracy (minimizes error); see the sketch below.
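A minimal sketch of that greedy idea, assuming a tiny made-up dataset (all feature values and labels below are invented): instead of searching all of $H$, try a one-feature threshold rule ("decision stump") for each feature and keep the one with the highest training accuracy.

```python
# Greedy feature selection sketch: pick the single feature/threshold rule
# ("decision stump") with the highest training accuracy, instead of searching all of H.
import numpy as np

# Made-up training data: 4 instances, 2 features each, binary labels.
X = np.array([[5.1, 170], [4.8, 160], [6.0, 180], [5.5, 150]])
y = np.array([1, 0, 1, 0])

best = None
for d in range(X.shape[1]):              # try each feature
    for t in np.unique(X[:, d]):         # try each observed value as a threshold
        pred = (X[:, d] >= t).astype(int)
        acc = (pred == y).mean()
        if best is None or acc > best[0]:
            best = (acc, d, t)

acc, d, t = best
print(f"best stump: feature {d}, threshold {t}, training accuracy {acc:.2f}")
```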

Input - Training examples of $f$

  • $\{(x^1, y^1), (x^2, y^2), ..., (x^N, y^N)\}$

Output

  • A hypothesis $h \in H$ that best approximates $f$

Loss Function

$loss(h(x_i), y_i) = \begin{cases} 0, & \text{if } h(x_i) = y_i \\ 1, & \text{otherwise} \end{cases}$
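As a quick sketch, this 0/1 loss is just an indicator of whether the prediction matches the label:

```python
# 0/1 loss: 0 if the hypothesis predicted the label correctly, 1 otherwise.
def zero_one_loss(h_x, y):
    return 0 if h_x == y else 1

print(zero_one_loss(1, 1))  # 0 (correct prediction)
print(zero_one_loss(1, 0))  # 1 (mistake)
```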

Data Distribution (Unknown)

  • Assume our training data are random samples from a probability distribution $D$ over $(x, y)$ pairs.

Expected Loss

  • $\epsilon$ is the expected loss of $h$ over $D$ w.r.t. $loss$ (we want to minimize it): $\epsilon = \mathop{\mathbb{E}}_{(x_i, y_i) \sim D}[loss(h(x_i), y_i)] = \sum\limits_{(x_i, y_i)} D(x_i, y_i)\, loss(h(x_i), y_i)$

  • But we don't know $D$ ʕº̫͡ºʔ

  • We can only compute the training error

    $\hat{\epsilon} = \frac{1}{N} \sum\limits_{i=1}^{N} loss(h(x^i), y^i)$
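A small sketch of computing the training error $\hat{\epsilon}$: average the 0/1 loss over the $N$ training examples (the hypothesis and data below are made up for illustration).

```python
# Training error: average 0/1 loss of a hypothesis h over the N training examples.
def training_error(h, examples):
    return sum(0 if h(x) == y else 1 for x, y in examples) / len(examples)

# Hypothetical hypothesis and training set, purely for illustration.
h = lambda x: 1 if x >= 0.5 else 0
train = [(0.9, 1), (0.2, 0), (0.6, 0), (0.1, 0)]
print(training_error(h, train))  # 0.25 -> one mistake out of four examples
```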

Learning Algorithms

As usual, we care about time and space efficiency. But we also care a great deal about the amount of data we need. There are 3 criteria for successful learning:

  1. enough data

  2. a rule that makes a low number of mistakes on the training data

  3. a rule that is as simple as possible

Very often there is a trade-off between 2 and 3.
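A hedged illustration of the trade-off between criteria 2 and 3, using polynomials of increasing degree as a stand-in for "rule complexity" (the data and model choice here are assumptions for illustration, not from the notes): more complex rules drive the training error down.

```python
# Trade-off sketch: higher-degree polynomials (more complex rules) achieve lower
# training error on the same made-up data. Squared loss is used here for convenience.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 10)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)

for degree in (1, 3, 7):
    coeffs = np.polyfit(x, y, degree)      # fit a polynomial of this degree
    pred = np.polyval(coeffs, x)
    mse = np.mean((pred - y) ** 2)         # training error under squared loss
    print(f"degree {degree}: training MSE = {mse:.4f}")
```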

Learning Models

  • Example (or instance) is the object being classified

  • An example is described by a set of attributes (aka features, variables, or dimensions)

  • Label (or class) is the category

  • Concept is the mapping from examples to labels: $c: X \rightarrow \{0, 1\}$

  • Concept class is a collection of concepts

Ex. A patient might be described by gender, age, weight, blood pressure, body temp, etc.

During training, the learning algorithm is supplied with labeled examples. During testing, only unlabeled examples are provided.

We often assume only two labels, 0 and 1. We also assume there is a mapping from examples to labels, $c: X \rightarrow \{0, 1\}$. $X$ is the space of all possible examples (aka domain or instance space).
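A small sketch of this terminology with the patient example above (all attribute names and values are invented): examples are feature vectors of attributes, training examples carry labels, test examples do not, and a concept $c$ maps examples to $\{0, 1\}$.

```python
# Terminology sketch: an example is a vector of attribute values; the label is its
# category; a concept c maps examples to {0, 1}. All data here is invented.
attributes = ["gender", "age", "weight_kg", "systolic_bp", "body_temp_c"]

# Labeled training examples: (feature vector, label) pairs given to the learner.
train = [
    ([0, 63, 80, 145, 37.1], 1),
    ([1, 35, 62, 118, 36.8], 0),
    ([0, 51, 95, 150, 38.0], 1),
]

# Unlabeled test examples: only feature vectors are provided at test time.
test = [[1, 47, 70, 130, 36.9]]

# One hand-written candidate concept c: X -> {0, 1} (flag high blood pressure).
def c(x):
    return 1 if x[3] >= 140 else 0

print([c(x) for x, _ in train])  # predictions on training examples: [1, 0, 1]
print([c(x) for x in test])      # prediction on the unlabeled test example: [0]
```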
