Machine Learning
Intro
Machine learning is a core subfield of AI. Broadly, ML is about learning to do better in the future based on what was experienced in the past. In computer science, we design algorithms that learn from data (e.g. by finding patterns) and then act on what they learned (e.g. by making predictions).
Course Goal
Identify if ML is an appropriate solution for a problem
What types of algorithms might be applicable
How to apply these algorithms
Topics
Basic Supervised Learning
Decision Trees
KNN
Perceptron
Advanced Supervised Learning
Linear regression and gradient descent
Support Vector Machines
Probabilistic models
Neural Networks
Kernels
Ensemble Learning
Unsupervised Learning
K-means
PCA
Expectation Maximization
Notation
Problem Setting
Set of possible instances X
an instance x ∈ X is a feature vector
Labels Y (Y = {0, 1} if binary)
Unknown target function f: X → Y
Set of hypotheses H = {h | h: X → Y}
e.g. each h can be a decision tree. Exhaustively searching H for the best tree is too expensive, so we use a greedy heuristic: at each step, pick the feature that maximizes accuracy (equivalently, minimizes error).
Input - Training examples (x_1, y_1), ..., (x_n, y_n) of f
Output
A hypothesis h ∈ H that best approximates f
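The heuristic mentioned above (pick the single feature that minimizes training error instead of searching every possible tree) can be sketched roughly as follows. This is a minimal illustration, not a full decision-tree learner: features are assumed to be binary, and the toy data and the names `stump_error` and `best_feature` are made up for the example.

```python
from collections import Counter

def stump_error(examples, feature):
    """Training mistakes made by a one-level tree that splits on `feature`
    and predicts the majority label on each side of the split."""
    mistakes = 0
    for value in (0, 1):
        labels = [y for x, y in examples if x[feature] == value]
        if labels:
            majority_count = Counter(labels).most_common(1)[0][1]
            mistakes += len(labels) - majority_count
    return mistakes

def best_feature(examples):
    """Greedy choice: the feature whose split makes the fewest training mistakes."""
    n_features = len(examples[0][0])
    return min(range(n_features), key=lambda j: stump_error(examples, j))

# Toy training set (hypothetical): each example is (feature vector x, label y).
train = [
    ((1, 0, 1), 1),
    ((1, 1, 0), 1),
    ((0, 0, 1), 0),
    ((0, 1, 0), 0),
    ((1, 1, 1), 1),
    ((0, 0, 0), 1),
]

errors = [stump_error(train, j) for j in range(3)]
chosen = best_feature(train)
print(errors, chosen)  # [1, 2, 2] 0
```

A real tree learner would repeat this choice recursively on each side of the split, and usually scores splits with something like information gain rather than raw mistakes.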
Loss Function
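Data Distribution D (Unknown)
Assume our training data are random samples from a probability distribution D over (x, y) pairs.
Expected Loss
E_{(x,y)~D}[ℓ(y, h(x))] is the expected loss of a hypothesis h over D with respect to the loss function ℓ (we want to minimize it)
But we don't know D ʕº̫͡ºʔ
We can only compute the training error (1/n) Σ_{i=1..n} ℓ(y_i, h(x_i)) over the n training examples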
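To make the training error concrete, here is a small sketch using 0/1 loss. The names `zero_one_loss` and `training_error`, the threshold hypothesis, and the toy data are assumptions for illustration. The true distribution stays unknown, so the average below is only an estimate of the expected loss.

```python
def zero_one_loss(y_true, y_pred):
    """0/1 loss: 1 for a mistake, 0 for a correct prediction."""
    return 0 if y_true == y_pred else 1

def training_error(h, examples, loss=zero_one_loss):
    """Average loss of hypothesis h over the training examples."""
    return sum(loss(y, h(x)) for x, y in examples) / len(examples)

# Hypothetical hypothesis: predict 1 when the single feature exceeds 0.5.
def h(x):
    return 1 if x > 0.5 else 0

# Toy training set of (x, y) pairs; only (0.7, 0) is misclassified by h.
train = [(0.1, 0), (0.4, 0), (0.6, 1), (0.9, 1), (0.7, 0)]

err = training_error(h, train)
print(err)  # 0.2
```

Passing the loss as a parameter keeps the averaging code unchanged if a different loss (for example squared error) is swapped in.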
Learning Algorithms
As usual, we care about time and space efficiency. But we also care a great deal about the amount of data we need.
3 criteria for successful learning:
1. enough data
2. a rule that makes a low number of mistakes on the training data
3. a rule that is as simple as possible
Very often there is a trade-off between 2 and 3.
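One way to see the tension between few training mistakes and simplicity is to compare a very simple rule with one that memorizes the training set. The sketch below is an assumption-laden toy: the noisy data source, the 0.5 threshold, and all function names are invented for illustration.

```python
import random

random.seed(0)

def noisy_label(x, flip_prob=0.2):
    """Hypothetical data source: label is 1 when x > 0.5, flipped with probability flip_prob."""
    y = 1 if x > 0.5 else 0
    return 1 - y if random.random() < flip_prob else y

def sample(n):
    """Draw n labeled examples (x, y) with x uniform in [0, 1)."""
    xs = [random.random() for _ in range(n)]
    return [(x, noisy_label(x)) for x in xs]

train, test = sample(30), sample(1000)

def simple_rule(x):
    """Simple rule: a single threshold."""
    return 1 if x > 0.5 else 0

def memorizing_rule(x):
    """Complex rule: return the label of the nearest stored training example."""
    _, nearest_y = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest_y

def error_rate(rule, examples):
    """Fraction of examples the rule gets wrong."""
    return sum(rule(x) != y for x, y in examples) / len(examples)

for name, rule in [("simple", simple_rule), ("memorizing", memorizing_rule)]:
    print(name, error_rate(rule, train), error_rate(rule, test))
```

The memorizing rule makes no training mistakes by construction, since every training point is its own nearest neighbor. How it compares with the simple rule on fresh data depends on the noise level and the sample, which is why the printed numbers are not stated here.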
Learning Models
Example (or instance) is the object being classified
An example is described by a set of attributes (aka features, variables, or dimensions)
Label (or class) is the category
Concept is the mapping from examples to labels.
Concept class is a collection of concepts
Ex. A patient might be described by gender, age, weight, blood pressure, body temp, etc.
During training, the learning algorithm is supplied with labeled examples. During testing, only unlabeled examples are provided.
We often assume only two labels, 0 and 1. We also assume there is a mapping from examples to labels, c: X → {0, 1}, where X is the space of all possible examples (aka domain or instance space).
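The vocabulary above maps onto code fairly directly. The sketch below assumes a made-up `Patient` type with a few of the attributes from the patient example, and an invented fever concept. Neither comes from the referenced courses.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Patient:
    """An example, described by a few attributes (a hypothetical subset)."""
    age: int
    weight_kg: float
    body_temp_c: float

def concept(p: Patient) -> int:
    """A made-up concept: maps an example to a label in {0, 1} (1 means 'has a fever')."""
    return 1 if p.body_temp_c >= 38.0 else 0

patients = [
    Patient(34, 70.0, 36.8),
    Patient(61, 82.5, 38.4),
    Patient(25, 55.0, 39.1),
    Patient(47, 90.0, 37.0),
]

# Training: the learner is given labeled examples (example, label).
train = [(p, concept(p)) for p in patients[:3]]

# Testing: the learner sees only unlabeled examples; labels are held back for scoring.
test_examples = patients[3:]
hidden_test_labels = [concept(p) for p in test_examples]

print([y for _, y in train])  # [0, 1, 1]
print(hidden_test_labels)     # [0]
```

In practice the concept is unknown and the labels come from data. It is written out here only so the toy examples can be labeled.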
References
A Course in Machine Learning by Hal Daumé III
COS511 Theoretical Machine Learning by Rob Schapire