ML class overview
This is an overview in point form of the content in the [[ML class]].
= [[ML_class#INTRODUCTION|INTRODUCTION]] =
== [[ML_class#Welcome|Examples of machine learning]] ==
* Database mining (large datasets from the growth of automation/web)
** clickstream data
** medical records
** biology
** engineering
* Applications that can't be programmed by hand
** autonomous helicopter
** handwriting recognition
** most of Natural Language Processing (NLP)
** Computer Vision
* Self-customising programs
** Amazon
** Netflix product recommendations
* Understanding human learning (brain, real AI)
== [[ML_class#What_is_Machine_Learning.3F|What is Machine Learning?]] ==
* Definitions of Machine Learning
** Arthur Samuel (1959). Machine Learning: Field of study that gives computers the ability to learn without being explicitly programmed.
** Tom Mitchell (1998). Well-posed Learning Problem: A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.
* There are several different types of ML algorithms. The two main types are:
** Supervised learning
*** teach the computer how to do something
** Unsupervised learning
*** the computer learns by itself
* Other types of algorithms are:
** Reinforcement learning
** Recommender systems
== [[ML_class#Supervised_Learning|Supervised Learning]] ==
* Supervised Learning: learning in which the "right answers" are given
** '''Regression''': predict a continuous-valued output (e.g. price)
** '''Classification''': predict a discrete-valued output (e.g. 0 or 1)
== [[ML_class#Unsupervised_Learning|Unsupervised Learning]] ==
* Unsupervised Learning: learning in which the categories are unknown
** '''Clustering''': cluster patterns (categories) are found in the data
** '''Cocktail party problem''': overlapping audio tracks are separated out, in one line of Octave (annotated version below):
*** <code>[W,s,v] = svd((repmat(sum(x.*x,1),size(x,1),1).*x)*x');</code>
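The one-liner above is the actual Octave line shown in lecture; below is an annotated version (the comments and the assumed shape of <code>x</code> are mine, not from the class):
<pre>
% Cocktail-party source separation in one line (annotations are assumptions).
% Assumed input: x holds the mixed recordings, one row per microphone.
[W, s, v] = svd((repmat(sum(x .* x, 1), size(x, 1), 1) .* x) * x');
% svd() factors the weighted covariance matrix built from the recordings;
% W is then used to unmix the overlapping audio tracks.
</pre>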
= [[ML_class#LINEAR_REGRESSION_WITH_ONE_VARIABLE|LINEAR REGRESSION WITH ONE VARIABLE]] =
== [[ML_class#Model_Representation|Model Representation]] ==
* e.g. housing prices, price per square-foot
** Supervised Learning
** Regression
* Dataset called '''training set'''
* Notation:
{| class="wikitable"
| '''m''' || number of training examples
|-
| '''x''''s || "input" variable / features
|-
| '''y''''s || "output" variable / "target" variable
|-
| '''(x,y)''' || one training example
|-
| '''(x<sup>(i)</sup>,y<sup>(i)</sup>)''' || i<sup>th</sup> training example
|}
* Training Set -> Learning Algorithm -> <em>h</em> (hypothesis)
* Size of house (x) -> <em>h</em> -> Estimated price (y)
** <em>h</em> maps from '''x''''s to '''y''''s
* How do we represent <em>h</em>?
** <em>h</em><sub>Θ</sub>(x) = <em>h</em>(x) = Θ<sub>0</sub> + Θ<sub>1</sub>x (evaluated in the Octave sketch below)
* Linear regression with one variable (x)
** Univariate linear regression
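As a minimal sketch (mine, not from the lecture), the hypothesis can be evaluated for a whole training set at once; the parameter values and house sizes are the ones used in the later Matrix Vector Multiplication example:
<pre>
% Evaluate h_theta(x) = theta0 + theta1*x for every training example.
theta0 = -40;                   % example intercept
theta1 = 0.25;                  % example slope (price per square foot)
x = [2104; 1416; 1534; 852];    % sizes of houses
h = theta0 + theta1 .* x        % estimated prices: [486; 314; 343.5; 173]
</pre>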
== [[ML_class#Cost_Function|Cost Function]] ==
* Helps us figure out how to fit the best possible straight line to our data
* <em>h<sub>Θ</sub></em>(x) = <em>Θ</em><sub>0</sub> + <em>Θ</em><sub>1</sub>x
* <em>Θ<sub>i</sub></em>'s: Parameters
* How to choose the parameters (<em>Θ<sub>i</sub></em>'s)?
** Choose <em>Θ</em><sub>0</sub>, <em>Θ</em><sub>1</sub> so that <em>h<sub>Θ</sub></em>(x) is close to <em>y</em> for our training examples (<em>x</em>,<em>y</em>)
** Minimise <math>\frac{1}{2m}\sum_{i=1}^m(h_\theta(x^{(i)}) - y^{(i)})^2</math> over <em>Θ</em><sub>0</sub>, <em>Θ</em><sub>1</sub>
*** <em>h<sub>Θ</sub></em>(x<sup>(i)</sup>) = <em>Θ</em><sub>0</sub> + <em>Θ</em><sub>1</sub>x<sup>(i)</sup>
** J(<em>Θ</em><sub>0</sub>,<em>Θ</em><sub>1</sub>) = <math>\frac{1}{2m}\sum_{i=1}^m(h_\theta(x^{(i)}) - y^{(i)})^2</math>
** J(<em>Θ</em><sub>0</sub>,<em>Θ</em><sub>1</sub>) is the Cost Function, also known in this case as the [http://en.wikipedia.org/wiki/Mean_squared_error Squared Error Function]
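A short Octave sketch of the squared error function (the function and variable names are my own illustration):
<pre>
% J(theta0, theta1) = (1/(2m)) * sum over i of (h(x_i) - y_i)^2
function J = compute_cost(x, y, theta0, theta1)
  m = length(y);              % number of training examples
  h = theta0 + theta1 .* x;   % hypothesis evaluated on all examples
  J = sum((h - y) .^ 2) / (2 * m);
end
</pre>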
== [[ML_class#Cost_Function_-_Intuition_I|Cost Function - Intuition I]] ==
* Summary:
** '''Hypothesis''': <em>h<sub>Θ</sub></em>(x) = <em>Θ</em><sub>0</sub> + <em>Θ</em><sub>1</sub>x
** '''Parameters''': <em>Θ</em><sub>0</sub>, <em>Θ</em><sub>1</sub>
** '''Cost Function''': J(<em>Θ</em><sub>0</sub>,<em>Θ</em><sub>1</sub>) = <math>\frac{1}{2m}\sum_{i=1}^m(h_\theta(x^{(i)}) - y^{(i)})^2</math>
** '''Goal''': <math>\min_{\theta_0,\theta_1} J(\theta_0,\theta_1)</math>
* Simplified:
** <em>h<sub>Θ</sub></em>(x) = <em>Θ</em><sub>1</sub>x
** <math>\min_{\theta_1} J(\theta_1)</math>
* Can plot the simplified model in 2D (see the sketch below)
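A sketch of that 2D plot in Octave (my own toy data, chosen so the minimum sits at Θ<sub>1</sub> = 1):
<pre>
% Plot J(theta1) for the simplified hypothesis h(x) = theta1 * x.
x = [1; 2; 3];  y = [1; 2; 3];           % toy data fit exactly by theta1 = 1
theta1_vals = linspace(-0.5, 2.5, 101);
J_vals = arrayfun(@(t) sum((t .* x - y) .^ 2) / (2 * length(y)), theta1_vals);
plot(theta1_vals, J_vals);               % bowl-shaped curve
xlabel('theta_1');  ylabel('J(theta_1)');
</pre>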
== [[ML_class#Cost_Function_-_Intuition_II|Cost Function - Intuition II]] ==
* Can plot J(<em>Θ</em><sub>0</sub>,<em>Θ</em><sub>1</sub>) in 3D
* Can plot with a Contour Map (Contour Plot)
== [[ML_class#Gradient_Descent|Gradient Descent]] ==
* repeat until convergence { <math>\theta_j := \theta_j - \alpha\frac{\part}{\part\theta_j}J(\theta_0,\theta_1)</math> }
* α = learning rate
== [[ML_class#Gradient_Descent_Intuition|Gradient Descent Intuition]] ==
* <math>\min_{\theta_1} J(\theta_1)</math>
** For Θ<sub>1</sub> > local minimum: <math>\frac{d}{d\theta_1}J(\theta_1)</math> is positive, so Θ<sub>1</sub> moves toward the local minimum
** For Θ<sub>1</sub> < local minimum: <math>\frac{d}{d\theta_1}J(\theta_1)</math> is negative, so Θ<sub>1</sub> moves toward the local minimum
* If the learning rate α is too small, the algorithm takes a long time to run
* If the learning rate α is too large, the algorithm may fail to converge, or may even diverge
* At a local minimum the derivative is zero, so Θ<sub>1</sub> stops changing: it has converged
* As we approach a local minimum, gradient descent automatically takes smaller steps (see the numeric illustration below)
** So there is no need to decrease α over time
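A tiny numeric illustration of those last two points (mine, not from the lecture), using J(Θ<sub>1</sub>) = (Θ<sub>1</sub> - 3)<sup>2</sup> with a fixed α:
<pre>
% Gradient descent on J(t) = (t - 3)^2, whose derivative is 2*(t - 3).
alpha = 0.1;
t = 0;
for iter = 1:25
  step = alpha * 2 * (t - 3);   % shrinks as the slope shrinks near t = 3
  t = t - step;
end
disp(t)                         % close to the minimum at t = 3
</pre>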
== [[ML_class#Gradient_Descent_for_Linear_Regression|Gradient Descent for Linear Regression]] ==
* Gradient descent algorithm:
** repeat until convergence {
*** <math>\theta_j := \theta_j - \alpha\frac{\part}{\part\theta_j}J(\theta_0,\theta_1)</math>
*** for j=0 and j=1, updating Θ<sub>0</sub> and Θ<sub>1</sub> simultaneously
** }
* Linear Regression Model:
** <em>h<sub>Θ</sub></em>(x) = <em>Θ</em><sub>0</sub> + <em>Θ</em><sub>1</sub>x
** J(<em>Θ</em><sub>0</sub>,<em>Θ</em><sub>1</sub>) = <math>\frac{1}{2m}\sum_{i=1}^m(h_\theta(x^{(i)}) - y^{(i)})^2</math>
** <math>\min_{\theta_0,\theta_1} J(\theta_0,\theta_1)</math>
* Partial derivatives:
** j=0: <math>\frac{\part}{\part\theta_0}J(\theta_0,\theta_1)=\frac{1}{m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})</math>
** j=1: <math>\frac{\part}{\part\theta_1}J(\theta_0,\theta_1)=\frac{1}{m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)}) x^{(i)}</math>
* Gradient descent algorithm (derivatives substituted in):
** repeat until convergence {
*** <math>\theta_0 := \theta_0 -\alpha \frac{1}{m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})</math>
*** <math>\theta_1 := \theta_1 -\alpha \frac{1}{m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)}) x^{(i)}</math>
** }
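The two update rules translate almost line for line into Octave; this is a sketch under assumed toy data and settings, with the simultaneous update made explicit:
<pre>
% Batch gradient descent for univariate linear regression.
x = [1; 2; 3; 4];  y = [3; 5; 7; 9];   % toy data generated by y = 1 + 2x
alpha = 0.05;  num_iters = 2000;       % assumed learning rate and iterations
theta0 = 0;  theta1 = 0;
m = length(y);
for iter = 1:num_iters
  h = theta0 + theta1 .* x;                        % current predictions
  temp0 = theta0 - alpha * sum(h - y) / m;         % j = 0 update
  temp1 = theta1 - alpha * sum((h - y) .* x) / m;  % j = 1 update
  theta0 = temp0;  theta1 = temp1;                 % simultaneous update
end
printf('theta0 = %g, theta1 = %g\n', theta0, theta1)   % approaches 1 and 2
</pre>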
== [[ML_class#What.27s_Next|What's Next]] ==
* Two extensions:
** In <math>\min_{\theta_0,\theta_1} J(\theta_0,\theta_1)</math>, solve for <em>Θ</em><sub>0</sub>,<em>Θ</em><sub>1</sub> exactly, without needing an iterative algorithm (gradient descent)
** Learn with a larger number of features
* Linear Algebra topics:
** What matrices and vectors are
** Addition, subtraction and multiplication with matrices and vectors
** Matrix inverse and transpose
* Practice expressions:
** <math>\begin{bmatrix}1 & 2 \\0 & 1 \end{bmatrix}^T\begin{bmatrix}1 & 2 \\0 & 1 \end{bmatrix}</math>
** <math>\begin{bmatrix}2 & 0 \\0 & 1 \end{bmatrix}^{-1}\begin{bmatrix}2 \\1 \end{bmatrix}-3\begin{bmatrix}1 \\5 \end{bmatrix}</math>
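* Worked out (my arithmetic, for reference):
** <math>\begin{bmatrix}1 & 2 \\0 & 1 \end{bmatrix}^T\begin{bmatrix}1 & 2 \\0 & 1 \end{bmatrix} = \begin{bmatrix}1 & 0 \\2 & 1 \end{bmatrix}\begin{bmatrix}1 & 2 \\0 & 1 \end{bmatrix} = \begin{bmatrix}1 & 2 \\2 & 5 \end{bmatrix}</math>
** <math>\begin{bmatrix}2 & 0 \\0 & 1 \end{bmatrix}^{-1}\begin{bmatrix}2 \\1 \end{bmatrix}-3\begin{bmatrix}1 \\5 \end{bmatrix} = \begin{bmatrix}\frac{1}{2} & 0 \\0 & 1 \end{bmatrix}\begin{bmatrix}2 \\1 \end{bmatrix}-\begin{bmatrix}3 \\15 \end{bmatrix} = \begin{bmatrix}1 \\1 \end{bmatrix}-\begin{bmatrix}3 \\15 \end{bmatrix} = \begin{bmatrix}-2 \\-14 \end{bmatrix}</math>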
= [[ML_class#LINEAR_ALGEBRA_REVIEW|LINEAR ALGEBRA REVIEW]] =
== [[ML_class#Matrices_and_Vectors|Matrices and Vectors]] ==
* '''Matrix''': a rectangular array of numbers
* '''Dimension of Matrix''': number of rows by number of columns
** <math>\mathbb R^{4\times2}</math> = 4 rows and 2 columns
** <math>\mathbb R^{2\times3}</math> = 2 rows and 3 columns
* Matrix elements:
** <em>A<sub>ij</sub></em> = "<em>i,j</em> entry" in the <em>i<sup>th</sup></em> row, <em>j<sup>th</sup></em> column
** <math>A = \begin{bmatrix}1402 & 191 \\1371 & 821 \\949 & 1437 \\147 & 1448 \end{bmatrix}</math>
*** <em>A<sub>11</sub></em> = 1402
*** <em>A<sub>12</sub></em> = 191
*** <em>A<sub>32</sub></em> = 1437
*** <em>A<sub>41</sub></em> = 147
*** <em>A<sub>43</sub></em> = undefined (error: A has only 2 columns)
* '''Vector''': an n x 1 matrix
** <math>y = \begin{bmatrix}460 \\232 \\315 \\178 \end{bmatrix}</math>
*** <math>\mathbb R^4</math>: n = 4; a 4-dimensional vector
** <em>y<sub>i</sub></em> = <em>i<sup>th</sup></em> element
** 1-indexed vs 0-indexed (1-indexed is assumed in this class)
* Notation (generally):
** A,B,C,X = capital = matrix
** a,b,x,y = lower case = vector or scalar
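The same conventions carry over to Octave, which is also 1-indexed (a sketch using the matrix A above):
<pre>
% Define the 4x2 matrix A from above and index its elements.
A = [1402 191; 1371 821; 949 1437; 147 1448];
size(A)      % ans = [4 2]: 4 rows, 2 columns
A(3, 2)      % ans = 1437
A(4, 1)      % ans = 147
% A(4, 3)    % would be an "out of bound" error: A has only 2 columns
</pre>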
== [[ML_class#Addition_and_Scalar_Multiplication|Addition and Scalar Multiplication]] ==
* Matrix addition (element-wise; dimensions must match):
** <math>\begin{bmatrix}1 & 0 \\2 & 5 \\3 & 1\end{bmatrix} + \begin{bmatrix}4 & 0.5 \\2 & 5 \\0 & 1\end{bmatrix} = \begin{bmatrix}5 & 0.5 \\4 & 10 \\3 & 2\end{bmatrix}</math>
*** 3x2 matrix + 3x2 matrix = 3x2 matrix
** <math>\begin{bmatrix}1 & 0 \\2 & 5 \\3 & 1\end{bmatrix} + \begin{bmatrix}4 & 0.5 \\2 & 5\end{bmatrix} = error</math>
*** 3x2 matrix + 2x2 matrix = error
* Scalar multiplication:
** <math>3 \times \begin{bmatrix}1 & 0 \\2 & 5 \\3 & 1\end{bmatrix} = \begin{bmatrix}3 & 0 \\6 & 15 \\9 & 3\end{bmatrix} = \begin{bmatrix}1 & 0 \\2 & 5 \\3 & 1\end{bmatrix} \times 3</math>
*** scalar x 3x2 matrix = 3x2 matrix = 3x2 matrix x scalar
** <math>\begin{bmatrix}4 & 0 \\6 & 3\end{bmatrix} / 4 = \frac{1}{4}\begin{bmatrix}4 & 0 \\6 & 3\end{bmatrix} = \begin{bmatrix}1 & 0 \\\frac{3}{2} & \frac{3}{4}\end{bmatrix}</math>
* Combination of operands (see the Octave sketch below):
** <math>\begin{align}
3 \times \begin{bmatrix}1 \\4 \\2 \end{bmatrix} + \begin{bmatrix}0 \\0 \\5 \end{bmatrix} - \begin{bmatrix}3 \\0 \\2 \end{bmatrix} / 3
&= \begin{bmatrix}3 \\12 \\6 \end{bmatrix} + \begin{bmatrix}0 \\0 \\5 \end{bmatrix} - \begin{bmatrix}1 \\0 \\\frac{2}{3} \end{bmatrix} \\
&= \begin{bmatrix}2 \\12 \\10\frac{1}{3} \end{bmatrix}
\end{align}</math>
*** 3x1 matrix = 3-dimensional vector
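The same operations in Octave (a sketch reproducing the examples above):
<pre>
% Element-wise matrix addition, scalar multiplication, and a combination.
A = [1 0; 2 5; 3 1];
B = [4 0.5; 2 5; 0 1];
A + B                                           % 3x2 result, element-wise sums
3 * A                                           % same as A * 3
v = 3 * [1; 4; 2] + [0; 0; 5] - [3; 0; 2] / 3   % = [2; 12; 10.3333]
</pre>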
== [[ML_class#Matrix_Vector_Multiplication|Matrix Vector Multiplication]] ==
* <math>\begin{bmatrix}1 & 3 \\4 & 0 \\2 & 1\end{bmatrix} \begin{bmatrix}1 \\5 \end{bmatrix} = \begin{bmatrix}16 \\4 \\7 \end{bmatrix}</math>
** 1&times;1 + 3&times;5 = 16
** 4&times;1 + 0&times;5 = 4
** 2&times;1 + 1&times;5 = 7
** 3x2 matrix * 2x1 matrix = 3x1 matrix
* prediction = DataMatrix * parameters
** <em>h</em><sub>Θ</sub>(x) = -40 + 0.25x, i.e. parameter vector <math>\begin{bmatrix}-40 \\0.25 \end{bmatrix}</math>
** <math>\begin{bmatrix}1 & 2104 \\1 & 1416 \\1 & 1534 \\1 & 852 \end{bmatrix} \times \begin{bmatrix}-40 \\0.25 \end{bmatrix} = \begin{bmatrix}-40\times1 + 0.25 \times 2104 \\-40\times1 + 0.25 \times 1416 \\-40\times1 + 0.25 \times 1534 \\-40\times1 + 0.25 \times 852 \end{bmatrix}</math>
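The prediction example above, computed in one matrix-vector product in Octave (a sketch; the column of ones pairs with Θ<sub>0</sub>):
<pre>
% prediction = DataMatrix * parameters, for h(x) = -40 + 0.25*x.
X = [1 2104; 1 1416; 1 1534; 1 852];   % one row per house, ones for theta0
theta = [-40; 0.25];
predictions = X * theta                % = [486; 314; 343.5; 173]
</pre>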