ML class overview

  • ML class
  • INTRODUCTION
    • Examples of machine learning
      • Database mining (Large datasets from growth of automation/web)
        • clickstream data
        • medical records
        • biology
        • engineering
      • Applications that can't be programmed by hand
        • autonomous helicopter
        • handwriting recognition
        • most of Natural Language Processing (NLP)
        • Computer Vision
      • Self-customising programs
        • Amazon
        • Netflix product recommendations
      • Understanding human learning (brain, real AI)
    • What is Machine Learning?
      • Definitions of Machine Learning
        • Arthur Samuel (1959). Machine Learning: Field of study that gives computers the ability to learn without being explicitly programmed.
        • Tom Mitchell (1998). Well-posed Learning Problem: A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.
      • There are several different types of ML algorithms. The two main types are:
        • Supervised learning
          • teach computer how to do something
        • Unsupervised learning
          • computer learns by itself
      • Other types of algorithms are:
        • Reinforcement learning
        • Recommender systems
    • Supervised Learning
      • Supervised Learning is learning in which the "right answers" are given
        • Regression: predict continuous valued output (e.g. price)
        • Classification: discrete valued output (e.g. 0 or 1)
    • Unsupervised Learning
      • Unsupervised Learning is learning in which the categories are not known in advance
        • Clustering: cluster patterns (categories) are found in the data
        • Cocktail party problem: overlapping audio tracks are separated out
          • [W,s,v] = svd((repmat(sum(x.*x,1),size(x,1),1).*x)*x');
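            • Octave one-liner shown in the lecture; assuming the matrix x holds the mixed microphone recordings, the svd call separates out the overlapping sources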
  • LINEAR REGRESSION WITH ONE VARIABLE
    • Model Representation
      • e.g. housing prices, price per square-foot
        • Supervised Learning
        • Regression
      • Dataset called training set
      • Notation:
        • m = number of training examples
        • x's = "input" variables / features
        • y's = "output" / "target" variables
        • (x, y) = one training example
        • (x(i), y(i)) = the ith training example
      • Training Set -> Learning Algorithm -> h (hypothesis)
      • Size of house (x) -> h -> Estimated price (y)
        • h maps from x's to y's
      • How do we represent h?
        • hΘ(x) = h(x) = Θ0 + Θ1x
      • Linear regression with one variable (x)
        • Univariate linear regression
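      • A minimal Octave sketch of the hypothesis hΘ(x) = Θ0 + Θ1x (the parameter values and house sizes are made-up examples, not from the lecture):
        theta0 = 50;             % intercept (example value)
        theta1 = 0.2;            % slope (example value)
        x = [2104; 1416; 1534];  % house sizes (features), one example per row
        h = theta0 + theta1*x;   % h_theta(x): one estimated price per example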
    • Cost Function
      • Helps us figure out how to fit the best possible straight line to our data
      • hΘ(x) = Θ0 + Θ1x
      • Θi's: Parameters
      • How to choose parameters (Θi's)?
        • Choose Θ0, Θ1 so that hΘ(x) is close to y for our training examples (x,y)
        • Minimise J(Θ0, Θ1) over Θ0, Θ1
          • where hΘ(x(i)) = Θ0 + Θ1x(i)
        • J(Θ0, Θ1) = (1/2m) Σi=1..m (hΘ(x(i)) − y(i))²
        • J(Θ0,Θ1) is the Cost Function, also known in this case as the Squared Error Function
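      • A minimal Octave sketch of the squared error cost (computeCost is a hypothetical helper name; assumes column vectors x and y holding the training examples):
        function J = computeCost(x, y, theta0, theta1)
          m = length(y);                    % number of training examples
          h = theta0 + theta1*x;            % hypothesis evaluated on all examples
          J = (1/(2*m)) * sum((h - y).^2);  % squared error cost J(theta0, theta1)
        end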
    • Cost Function - Intuition I
      • Summary:
        • Hypothesis: hΘ(x) = Θ0 + Θ1x
        • Parameters: Θ0, Θ1
        • Cost Function: J(Θ0, Θ1) = (1/2m) Σi=1..m (hΘ(x(i)) − y(i))²
        • Goal: minimise J(Θ0, Θ1) over Θ0, Θ1
      • Simplified:
        • hΘ(x) = Θ1x
        • minimise J(Θ1) over Θ1
      • Can plot simplified model in 2D
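      • e.g. a short Octave sketch of that 2D plot (assumes training vectors x, y and the computeCost sketch above):
        theta1_vals = linspace(-1, 3, 100);                            % range of theta1 to try
        J_vals = arrayfun(@(t) computeCost(x, y, 0, t), theta1_vals);  % J(theta1) with theta0 fixed at 0
        plot(theta1_vals, J_vals); xlabel('theta_1'); ylabel('J(theta_1)');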
    • Cost Function - Intuition II
      • Can plot J(Θ0,Θ1) in 3D
      • Can plot with Contour Map (Contour Plot)
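      • e.g. in Octave (assumes training vectors x, y and the computeCost sketch above):
        theta0_vals = linspace(-10, 10, 100);
        theta1_vals = linspace(-1, 4, 100);
        J_vals = zeros(length(theta0_vals), length(theta1_vals));
        for i = 1:length(theta0_vals)
          for j = 1:length(theta1_vals)
            J_vals(i, j) = computeCost(x, y, theta0_vals(i), theta1_vals(j));
          end
        end
        surf(theta0_vals, theta1_vals, J_vals');                 % 3D surface of J(theta0, theta1)
        figure; contour(theta0_vals, theta1_vals, J_vals', 30);  % contour plot of the same surface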
    • Gradient Descent
      • Repeat until convergence { Θj := Θj − α ∂/∂Θj J(Θ0, Θ1), simultaneously for j = 0 and j = 1 }
      • α = learning rate
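      • A minimal Octave sketch of the update loop for the squared error cost (assumes training vectors x, y; α and the iteration count are example values):
        alpha = 0.01;            % learning rate
        theta0 = 0; theta1 = 0;  % initial guesses
        m = length(y);
        for iter = 1:1500                                      % fixed iteration count stands in for "until convergence"
          h = theta0 + theta1*x;                               % current predictions
          temp0 = theta0 - alpha * (1/m) * sum(h - y);         % update using d/dtheta0 of J
          temp1 = theta1 - alpha * (1/m) * sum((h - y) .* x);  % update using d/dtheta1 of J
          theta0 = temp0; theta1 = temp1;                      % simultaneous update
        end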
    • Gradient Descent Intuition
      • minimise J(Θ1) over Θ1
        • For Θ1 > local minimum: d/dΘ1 J(Θ1) is positive, so the update moves Θ1 toward the local minimum
        • For Θ1 < local minimum: d/dΘ1 J(Θ1) is negative, so the update moves Θ1 toward the local minimum
      • If the learning rate α is too small, gradient descent takes many tiny steps and converges slowly
      • If the learning rate α is too large, gradient descent can overshoot the minimum and fail to converge, or even diverge
      • When the derivative is zero, the update leaves Θ1 unchanged, so Θ1 stays at the local minimum
      • As we approach a local minimum, gradient descent automatically takes smaller steps
        • So no need to decrease α over time
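      • e.g. a tiny Octave check of the shrinking steps, using the made-up example cost J(Θ1) = Θ1² with a fixed α:
        alpha = 0.1; theta1 = 5;    % fixed learning rate, arbitrary starting point
        for iter = 1:5
          step = alpha * 2*theta1;  % alpha times the derivative, dJ/dtheta1 = 2*theta1
          theta1 = theta1 - step;
          printf('step %d: step size %.3f, theta1 = %.3f\n', iter, step, theta1);
        end
        % the step size shrinks each iteration even though alpha stays constant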