ML class overview

  • ML class
  • INTRODUCTION
    • Examples of machine learning
      • Database mining (Large datasets from growth of automation/web)
        • clickstream data
        • medical records
        • biology
        • engineering
      • Applications that can't be programmed by hand
        • autonomous helicopter
        • handwriting recognition
        • most of Natural Language Processing (NLP)
        • Computer Vision
      • Self-customising programs
        • Amazon
        • Netflix product recommendations
      • Understanding human learning (brain, real AI)
    • What is Machine Learning?
      • Definitions of Machine Learning
        • Arthur Samuel (1959). Machine Learning: Field of study that gives computers the ability to learn without being explicitly programmed.
        • Tom Mitchell (1998). Well-posed Learning Problem: A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.
      • There are several different types of ML algorithms. The two main types are:
        • Supervised learning
          • teach computer how to do something
        • Unsupervised learning
          • computer learns by itself
      • Other types of algorithms are:
        • Reinforcement learning
        • Recommender systems
    • Supervised Learning
      • Supervised Learning in which the "right answers" are given
        • Regression: predict continuous valued output (e.g. price)
        • Classification: discrete valued output (e.g. 0 or 1)
    • Unsupervised Learning
      • Unsupervised Learning in which the categories are unknown
        • Clustering: cluster patterns (categories) are found in the data
        • Cocktail party problem: overlapping audio tracks are separated out
          • [W,s,v] = svd((repmat(sum(x.*x,1),size(x,1),1).*x)*x');  % Octave one-liner that separates the two mixed recordings via an SVD-based trick
  • LINEAR REGRESSION WITH ONE VARIABLE
    • Model Representation
      • e.g. housing prices, price per square-foot
        • Supervised Learning
        • Regression
      • Dataset called training set
      • Notation:
        • m = number of training examples
        • x's = "input" variable / features
        • y's = "output" variable / "target" variable
        • (x,y) = one training example
        • (x(i),y(i)) = the ith training example
      • Training Set -> Learning Algorithm -> h (hypothesis)
      • Size of house (x) -> h -> Estimated price (y)
        • h maps from x's to y's
      • How do we represent h?
        • hΘ(x) = Θ0 + Θ1x (written h(x) for short)
      • Linear regression with one variable (x)
        • Univariate linear regression
    • Cost Function
      • Helps us figure out how to fit the best possible straight line to our data
      • hΘ(x) = Θ0 + Θ1x
      • Θi's: Parameters
      • How to choose parameters (Θi's)?
        • Choose Θ0, Θ1 so that hΘ(x) is close to y for our training examples (x,y)
        • Minimise over Θ0, Θ1
          • hΘ(x(i)) = Θ0 + Θ1x(i)
        • J(Θ0,Θ1) = (1/(2m)) Σ(i=1 to m) (hΘ(x(i)) - y(i))^2
        • J(Θ0,Θ1) is the Cost Function, also known in this case as the Squared Error Function (see the Octave sketch below)
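
      A minimal Octave sketch of computing this cost (Octave is the course's language; the name computeCost and the column-vector layout of x and y are illustrative assumptions, not the course's code):

        % Squared error cost J(theta0, theta1) for univariate linear regression.
        % x, y: m-by-1 column vectors holding the training examples.
        function J = computeCost(x, y, theta0, theta1)
          m = length(y);                          % number of training examples
          h = theta0 + theta1 * x;                % h_theta(x(i)) for every example at once
          J = (1 / (2 * m)) * sum((h - y) .^ 2);  % the squared error cost
        end
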
    • Cost Function - Intuition I
      • Summary:
        • Hypothesis: hΘ(x) = Θ0 + Θ1x
        • Parameters: Θ0, Θ1
        • Cost Function: J(Θ0,Θ1) = (1/(2m)) Σ(i=1 to m) (hΘ(x(i)) - y(i))^2
        • Goal: minimise J(Θ0,Θ1) over Θ0, Θ1
      • Simplified:
        • hΘ(x) = Θ1x
        • minimise J(Θ1) over Θ1
      • Can plot the simplified model's cost J(Θ1) in 2D (see the Octave sketch below)
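
      A short Octave sketch of that 2D plot (the toy data set and the grid of Θ1 values are illustrative assumptions):

        % Plot the simplified cost J(theta1) over a range of slopes.
        x = [1; 2; 3];  y = [1; 2; 3];            % toy data, perfectly fit by theta1 = 1
        m = length(y);
        theta1_grid = -0.5:0.1:2.5;               % candidate values of theta1
        J = arrayfun(@(t) (1 / (2 * m)) * sum((t * x - y) .^ 2), theta1_grid);
        plot(theta1_grid, J);                     % bowl-shaped curve, minimum at theta1 = 1
        xlabel('theta_1'); ylabel('J(theta_1)');
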
    • Cost Function - Intuition II
      • Can plot J(Θ0,Θ1) in 3D
      • Can plot with a Contour Map (Contour Plot); an Octave sketch follows
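
      An Octave sketch of such a contour plot (the toy data set and grid ranges are illustrative assumptions):

        % Contour plot of J(theta0, theta1) for a toy data set where y = 1 + x.
        x = [1; 2; 3];  y = [2; 3; 4];
        m = length(y);
        [T0, T1] = meshgrid(-2:0.1:4, -1:0.1:3);  % grid of (theta0, theta1) pairs
        J = zeros(size(T0));
        for k = 1:numel(T0)
          h = T0(k) + T1(k) * x;                  % hypothesis for this (theta0, theta1)
          J(k) = (1 / (2 * m)) * sum((h - y) .^ 2);
        end
        contour(T0, T1, J, 30);                   % contours centre on the minimum near (1, 1)
        xlabel('theta_0'); ylabel('theta_1');
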
    • Gradient Descent
      • repeat until convergence { Θj := Θj - α · ∂/∂Θj J(Θ0,Θ1) ; for j=1 and j=0 }
      • α = learning rate
    • Gradient Descent Intuition
      • minimise J(Θ1) over Θ1
        • For Θ1 to the right of the local minimum: the derivative is positive, so the update decreases Θ1, moving it toward the minimum
        • For Θ1 to the left of the local minimum: the derivative is negative, so the update increases Θ1, moving it toward the minimum
      • If the learning rate α is too small, gradient descent takes tiny steps and converges slowly
      • If the learning rate α is too large, gradient descent can overshoot the minimum: it may fail to converge, or even diverge
      • When the partial derivative reaches zero, the update leaves Θ1 unchanged: it has converged to a local minimum
      • As we approach a local minimum, gradient descent automatically takes smaller steps
        • So no need to decrease α over time (the toy example below illustrates this)
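
      A toy Octave example of that behaviour, using the simple convex cost J(Θ1) = Θ1^2 (an assumption for illustration, not the course's cost), whose derivative is 2Θ1; with a fixed α the steps shrink on their own:

        theta1 = 4;  alpha = 0.3;                 % fixed learning rate throughout
        for iter = 1:10
          step = alpha * (2 * theta1);            % alpha * dJ/dtheta1 shrinks as theta1 -> 0
          theta1 = theta1 - step;
          printf('iter %2d: step %.4f, theta1 = %.4f\n', iter, step, theta1);
        end
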
    • Gradient Descent for Linear Regression
      • Gradient descent algorithm (a full Octave sketch follows this section):
        • repeat until convergence { Θj := Θj - α · ∂/∂Θj J(Θ0,Θ1) ; for j=1 and j=0 }
      • Linear Regression Model:
        • hΘ(x) = Θ0 + Θ1x
        • J(Θ0,Θ1) = (1/(2m)) Σ(i=1 to m) (hΘ(x(i)) - y(i))^2
        • minimise J(Θ0,Θ1) over Θ0, Θ1
      • j=0: ∂/∂Θ0 J(Θ0,Θ1) = (1/m) Σ(i=1 to m) (hΘ(x(i)) - y(i))
      • j=1: ∂/∂Θ1 J(Θ0,Θ1) = (1/m) Σ(i=1 to m) (hΘ(x(i)) - y(i)) · x(i)
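
      A compact Octave sketch putting the pieces together (the name gradientDescent, the inputs alpha and num_iters, and the zero initialisation are illustrative assumptions):

        % Batch gradient descent for h_theta(x) = theta0 + theta1 * x.
        % x, y: m-by-1 column vectors; alpha: learning rate; num_iters: iteration budget.
        function [theta0, theta1] = gradientDescent(x, y, alpha, num_iters)
          m = length(y);
          theta0 = 0;  theta1 = 0;
          for iter = 1:num_iters
            h = theta0 + theta1 * x;              % current predictions
            grad0 = (1 / m) * sum(h - y);         % the j=0 derivative above
            grad1 = (1 / m) * sum((h - y) .* x);  % the j=1 derivative above
            theta0 = theta0 - alpha * grad0;      % simultaneous update: both gradients
            theta1 = theta1 - alpha * grad1;      %   computed before either theta changes
          end
        end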