ML class overview

This is an overview in point form of the content in the ML class.

INTRODUCTION

Examples of machine learning

  • Database mining (Large datasets from growth of automation/web)
    • clickstream data
    • medical records
    • biology
    • engineering
  • Applications that can't be programmed by hand
    • autonomous helicopter
    • handwriting recognition
    • most of Natural Language Processing (NLP)
    • Computer Vision
  • Self-customising programs
    • Amazon
    • Netflix product recommendations
  • Understanding human learning (brain, real AI)

What is Machine Learning?

  • Definitions of Machine Learning
    • Arthur Samuel (1959). Machine Learning: Field of study that gives computers the ability to learn without being explicitly programmed.
    • Tom Mitchell (1998). Well-posed Learning Problem: A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.
  • There are several different types of ML algorithms. The two main types are:
    • Supervised learning
      • teach computer how to do something
    • Unsupervised learning
      • computer learns by itself
  • Other types of algorithms are:
    • Reinforcement learning
    • Recommender systems

Supervised Learning

  • Learning in which the "right answers" are given
    • Regression: predict continuous valued output (e.g. price)
    • Classification: discrete valued output (e.g. 0 or 1)

Unsupervised Learning

  • Learning in which the categories are unknown
    • Clustering: clusters (categories) are found in the data
    • Cocktail party problem: overlapping audio tracks are separated out
      • [W,s,v] = svd((repmat(sum(x.*x,1),size(x,1),1).*x)*x'); % the course's one-line Octave solution

LINEAR REGRESSION WITH ONE VARIABLE

Model Representation

  • e.g. housing prices: predict price from the size of the house in square feet
    • Supervised Learning
    • Regression
  • Dataset called training set
  • Notation:
    • m = number of training examples
    • x's = "input" variable / features
    • y's = "output" variable / "target" variable
    • (x,y) = one training example
    • (x(i),y(i)) = ith training example
  • Training Set -> Learning Algorithm -> h (hypothesis)
  • Size of house (x) -> h -> Estimated price (y)
    • h maps from x's to y's
  • How do we represent h?
    • hΘ(x) = h(x) = Θ0 + Θ1x
  • Linear regression with one variable (x)
    • Univariate linear regression (see the Octave sketch below)
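
A minimal Octave sketch of the hypothesis. The parameter values Θ0 = -40, Θ1 = 0.25 are the example used later in these notes; the house size passed in is an assumed value:

  % hypothesis for univariate linear regression
  theta0 = -40;                  % example intercept
  theta1 = 0.25;                 % example slope
  h = @(x) theta0 + theta1 * x;  % h maps x's (size) to y's (estimated price)
  h(2104)                        % prediction for one assumed house size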

Cost Function

  • Helps us figure out how to fit the best possible straight line to our data
  • hΘ(x) = Θ0 + Θ1x
  • Θi's: Parameters
  • How to choose parameters (Θi's)?
    • Choose Θ0, Θ1 so that hΘ(x) is close to y for our training examples (x,y)
    • Minimise for Θ0, Θ1
      • hΘ(x(i)) = Θ0 + Θ1x(i)
    • J(Θ0,Θ1) = (1/2m) Σ(i=1..m) (hΘ(x(i)) - y(i))²
    • J(Θ0,Θ1) is the Cost Function, also known in this case as the Squared Error Function (see the Octave sketch below)
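
A minimal Octave sketch of the squared-error cost (the function and variable names here are my own):

  function J = computeCost(X, y, theta)
    % X = [ones(m,1), x] is the m x 2 design matrix, y the m x 1 targets,
    % theta = [theta0; theta1] the parameter vector
    m = length(y);
    predictions = X * theta;                    % hΘ(x(i)) for all i at once
    J = sum((predictions - y) .^ 2) / (2 * m);  % squared error, averaged over 2m
  end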

Cost Function - Intuition I

  • Summary:
    • Hypothesis: hΘ(x) = Θ0 + Θ1x
    • Parameters: Θ0, Θ1
    • Cost Function: J(Θ0,Θ1) = (1/2m) Σ(i=1..m) (hΘ(x(i)) - y(i))²
    • Goal: minimise J(Θ0,Θ1) over Θ0, Θ1
  • Simplified:
    • hΘ(x) = Θ1x (i.e. set Θ0 = 0)
    • minimise J(Θ1) over Θ1
  • Can plot J(Θ1) against Θ1 for the simplified model in 2D (see the Octave sketch below)
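
A sketch of that 2D plot in Octave, using a toy training set of my own:

  x = [1; 2; 3];  y = [1; 2; 3];       % toy data; theta1 = 1 fits exactly
  theta1 = linspace(-0.5, 2.5, 100);   % range of theta1 values to evaluate
  J = arrayfun(@(t) sum((t * x - y) .^ 2) / (2 * length(y)), theta1);
  plot(theta1, J);  xlabel('theta_1');  ylabel('J(theta_1)');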

Cost Function - Intuition II

  • Can plot J(Θ0,Θ1) in 3D
  • Can plot with a Contour Map (Contour Plot); see the Octave sketch below
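
A sketch of both plots in Octave (the toy data is my own):

  x = [1; 2; 3];  y = [2; 3; 4];       % toy data; theta0 = 1, theta1 = 1 fit exactly
  t0 = linspace(-2, 4, 60);  t1 = linspace(-2, 4, 60);
  J = zeros(length(t0), length(t1));
  for i = 1:length(t0)
    for j = 1:length(t1)
      J(i, j) = sum((t0(i) + t1(j) * x - y) .^ 2) / (2 * length(y));
    end
  end
  surf(t0, t1, J');                                   % 3D surface of J(theta0,theta1)
  figure;  contour(t0, t1, J', logspace(-2, 2, 20));  % contour plot of the same J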

Gradient Descent

  • repeat until convergence {
    • Θj := Θj - α ∂/∂Θj J(Θ0,Θ1) (simultaneously for j = 0 and j = 1)
  • }
  • α = learning rate (see the toy Octave sketch below)
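
To make the update rule concrete, a toy Octave sketch minimising J(t) = t² (my own example; its derivative is 2t):

  alpha = 0.1;               % learning rate
  t = 5;                     % initial guess
  for iter = 1:50            % "repeat until convergence"
    t = t - alpha * 2 * t;   % t := t - alpha * dJ/dt
  end
  t                          % now close to the minimum at t = 0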

Gradient Descent Intuition

  • min over Θ1 of J(Θ1), via the update Θ1 := Θ1 - α (d/dΘ1)J(Θ1)
    • For Θ1 to the right of the local minimum: the derivative is positive, so the update decreases Θ1, moving it toward the local minimum
    • For Θ1 to the left of the local minimum: the derivative is negative, so the update increases Θ1, moving it toward the local minimum
  • If the learning rate α is too small, gradient descent takes many tiny steps and converges slowly
  • If the learning rate α is too large, gradient descent may overshoot the minimum, fail to converge, or even diverge (see the toy Octave sketch below)
  • When the partial derivative is zero the update leaves Θ1 unchanged: Θ1 has converged to a local minimum
  • As we approach a local minimum, gradient descent automatically takes smaller steps
    • So no need to decrease α over time
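
The same toy example (J(t) = t², derivative 2t) shows the effect of α; the three values are my own:

  for alpha = [0.01 0.1 1.1]   % too small, reasonable, too large
    t = 5;
    for iter = 1:20
      t = t - alpha * 2 * t;   % gradient descent step on J(t) = t^2
    end
    fprintf('alpha = %.2f -> t = %g\n', alpha, t);
  end
  % alpha = 0.01 converges slowly, 0.1 converges, 1.1 diverges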

Gradient Descent for Linear Regression

  • Gradient descent algorithm:
    • repeat until convergence {
      • Θj := Θj - α ∂/∂Θj J(Θ0,Θ1) (for j=0 and j=1)
    • }
  • Linear Regression Model:
    • hΘ(x) = Θ0 + Θ1x
    • J(Θ0,Θ1) = (1/2m) Σ(i=1..m) (hΘ(x(i)) - y(i))²
    • min over Θ0, Θ1 of J(Θ0,Θ1)
  • Partial derivatives:
    • j=0: ∂/∂Θ0 J(Θ0,Θ1) = (1/m) Σ(i=1..m) (hΘ(x(i)) - y(i))
    • j=1: ∂/∂Θ1 J(Θ0,Θ1) = (1/m) Σ(i=1..m) (hΘ(x(i)) - y(i))·x(i)
  • Gradient descent algorithm with the derivatives plugged in (see the Octave sketch below):
    • repeat until convergence {
      • temp0 := Θ0 - α (1/m) Σ(i=1..m) (hΘ(x(i)) - y(i))
      • temp1 := Θ1 - α (1/m) Σ(i=1..m) (hΘ(x(i)) - y(i))·x(i)
      • Θ0 := temp0, Θ1 := temp1 (simultaneous update)
    • }
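
Putting it together, a minimal Octave sketch of batch gradient descent for univariate linear regression (the toy data, α and the iteration count are my own assumptions):

  x = [2104; 1416; 1534; 852] / 1000;   % assumed house sizes, scaled for conditioning
  y = [460; 232; 315; 178];             % assumed prices
  m = length(y);
  theta0 = 0;  theta1 = 0;              % initial parameters
  alpha = 0.1;                          % learning rate
  for iter = 1:5000                     % "repeat until convergence"
    h = theta0 + theta1 * x;            % hΘ(x(i)) for all examples at once
    temp0 = theta0 - alpha * (1/m) * sum(h - y);
    temp1 = theta1 - alpha * (1/m) * sum((h - y) .* x);
    theta0 = temp0;  theta1 = temp1;    % simultaneous update
  end
  [theta0, theta1]                      % fitted parameters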

What's Next

  • Two extensions:
    • Solve min J(Θ0,Θ1) for Θ0, Θ1 exactly in closed form (the normal equation), without needing an iterative algorithm (gradient descent)
    • Learn with larger number of features
  • Linear Algebra topics:
    • What are matrices and vectors
    • Addition, subtraction and multiplication with matrices and vectors
    • Matrix inverse, transpose

LINEAR ALGEBRA REVIEW

Matrices and Vectors

  • Matrix: a rectangular array of numbers
  • Dimension of Matrix: number of rows by number of columns
    • e.g. A = [1402 191; 1371 821; 949 1437; 147 1448] = 4 rows and 2 columns = a 4x2 matrix
    • e.g. a 2x3 matrix = 2 rows and 3 columns
  • Matrix elements (see the Octave sketch after this list):
    • Aij = "i,j entry" in the ith row, jth column
      • A11 = 1402
      • A12 = 191
      • A32 = 1437
      • A41 = 147
      • A43 = undefined (error: A has only 2 columns)
  • Vector: an n x 1 matrix
      • e.g. y = [460; 232; 315; 178]: n = 4; a 4-dimensional vector
    • yi = ith element (e.g. y1 = 460)
    • 1-indexed vs 0-indexed: 1-indexed is assumed in this class
  • Notation (generally):
    • A,B,C,X = capital = matrix
    • a,b,x,y = lower case = vector or scalar
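
The same examples in Octave:

  A = [1402 191; 1371 821; 949 1437; 147 1448];  % the 4x2 example matrix above
  size(A)       % dimension: 4 rows, 2 columns
  A(1, 2)       % A12 = 191
  A(4, 1)       % A41 = 147
  y = [460; 232; 315; 178];  % a 4x1 matrix = 4-dimensional vector
  y(1)          % first element; Octave vectors are 1-indexed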

Addition and Scalar Multiplication

  • Matrix addition (element-wise):
      • 3x2 matrix + 3x2 matrix = 3x2 matrix
      • 3x2 matrix + 2x2 matrix = error (dimensions must match)
  • Scalar multiplication:
      • scalar x 3x2 matrix = 3x2 matrix = 3x2 matrix x scalar (same result either way)
  • Combination of operations (see the Octave sketch below):
      • e.g. combining scalar multiplication and addition of 3x1 matrices still yields a 3x1 matrix = 3-dimensional vector
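
In Octave (example matrices of my own):

  A = [1 0; 2 5; 3 1];       % 3x2
  B = [4 0.5; 2 5; 0 1];     % 3x2
  A + B                      % 3x2 + 3x2 = 3x2
  3 * A                      % scalar * matrix; same result as A * 3
  [1; 4; 2] + 3 * [0; 0; 5]  % combining operations: still a 3x1 matrix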

Matrix Vector Multiplication

    • Example: [1 3; 4 0; 2 1] * [1; 5] = [16; 4; 7]
      • 1·1 + 3·5 = 16
      • 4·1 + 0·5 = 4
      • 2·1 + 1·5 = 7
    • 3x2 matrix * 2x1 matrix = 3x1 matrix
  • prediction = DataMatrix * parameters
    • e.g. for hΘ(x) = -40 + 0.25x: stack a row [1, x(i)] per house into the DataMatrix and multiply by the parameter vector [-40; 0.25] to get all predictions at once (see the Octave sketch below)
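
The prediction example as an Octave sketch (the house sizes are assumed values):

  sizes = [2104; 1416; 1534; 852];      % assumed house sizes in square feet
  X = [ones(length(sizes), 1), sizes];  % DataMatrix: column of ones pairs with theta0
  theta = [-40; 0.25];                  % parameters of hΘ(x) = -40 + 0.25x
  predictions = X * theta               % hΘ applied to every house at once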