ML class overview
- ML class
- INTRODUCTION
- Examples of machine learning
- Database mining (Large datasets from growth of automation/web)
- clickstream data
- medical records
- biology
- engineering
- Applications that can't be programmed by hand
- autonomous helicopter
- handwriting recognition
- most of Natural Language Processing (NLP)
- Computer Vision
- Self-customising programs
- Amazon
- Netflix product recommendations
- Understanding human learning (brain, real AI)
- What is Machine Learning?
- Definitions of Machine Learning
- Arthur Samuel (1959). Machine Learning: Field of study that gives computers the ability to learn without being explicitly programmed.
- Tom Mitchell (1998). Well-posed Learning Problem: A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.
- e.g. spam filtering: T = classifying emails as spam or not spam, P = fraction of emails classified correctly, E = watching the user label emails
- There are several different types of ML algorithms. The two main types are:
- Supervised learning
- teach computer how to do something
- Unsupervised learning
- computer learns by itself
- Other types of algorithms are:
- Reinforcement learning
- Recommender systems
- Supervised Learning
- Supervised Learning in which the "right answers" are given
- Regression: predict continuous valued output (e.g. price)
- Classification: discrete valued output (e.g. 0 or 1)
- Unsupervised Learning
- Unsupervised Learning in which the categories are unknown
- Clustering: groupings (categories) are found automatically in the data
- Cocktail party problem: overlapping audio tracks are separated out
- [W,s,v] = svd((repmat(sum(x.*x,1),size(x,1),1).*x)*x'); % the one-line Octave solution from the lecture; x holds the mixed audio signals
- LINEAR REGRESSION WITH ONE VARIABLE
- Model Representation
- e.g. housing prices, price per square-foot
- Supervised Learning
- Regression
- Dataset called training set
- Notation:
m              | number of training examples
x's            | "input" variable / features
y's            | "output" variable / "target" variable
(x, y)         | one training example
(x^(i), y^(i)) | the i-th training example
- Training Set -> Learning Algorithm -> h (hypothesis)
- Size of house (x) -> h -> Estimated price (y)
- h maps from x's to y's
- How do we represent h?
- h_Θ(x) = h(x) = Θ₀ + Θ₁x
- Linear regression with one variable (x)
- Univariate linear regression
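- Below, a minimal Octave sketch of evaluating the hypothesis h_Θ(x) = Θ₀ + Θ₁x on a few examples (the parameter values and house sizes are made up for illustration):

      theta = [50; 0.12];        % [theta0; theta1], arbitrary illustrative values
      x = [1000; 1500; 2000];    % house sizes in square feet
      X = [ones(size(x)) x];     % prepend a column of ones so X*theta includes theta0
      h = X * theta;             % h_theta(x^(i)) = theta0 + theta1*x^(i) for each example
      disp(h)                    % estimated prices: 170, 230, 290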
- Cost Function
- Helps us figure out how to fit the best possible straight line to our data
- h_Θ(x) = Θ₀ + Θ₁x
- Θᵢ's: parameters
- How to choose the parameters (Θᵢ's)?
- Choose Θ₀, Θ₁ so that h_Θ(x) is close to y for our training examples (x, y)
- Minimise over Θ₀, Θ₁
- h_Θ(x^(i)) = Θ₀ + Θ₁x^(i)
- J(Θ₀, Θ₁) = (1/2m) · Σ_{i=1}^{m} (h_Θ(x^(i)) − y^(i))²
- J(Θ₀, Θ₁) is the Cost Function, also known in this case as the Squared Error Function
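- A minimal Octave sketch of evaluating J on a toy training set (all data values are invented):

      X = [ones(3,1) [1; 2; 3]];   % m = 3 examples, with a bias column of ones
      y = [1; 2; 3];
      theta = [0; 1];              % candidate parameters [theta0; theta1]
      m = length(y);
      J = sum((X * theta - y) .^ 2) / (2 * m)   % squared-error cost; here J = 0, a perfect fit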
- Cost Function - Intuition I
- Summary:
- Hypothesis: h_Θ(x) = Θ₀ + Θ₁x
- Parameters: Θ₀, Θ₁
- Cost Function: J(Θ₀, Θ₁) = (1/2m) · Σ_{i=1}^{m} (h_Θ(x^(i)) − y^(i))²
- Goal: minimise J(Θ₀, Θ₁) over Θ₀ and Θ₁
- Simplified:
- h_Θ(x) = Θ₁x
- minimise J(Θ₁) over Θ₁
- Can plot simplified model in 2D
- Cost Function - Intuition II
- Can plot J(Θ₀, Θ₁) in 3D
- Can plot with Contour Map (Contour Plot)
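- As a sketch, both plots can be produced in Octave by evaluating J over a grid of parameter values (the toy data and grid ranges are arbitrary):

      X = [ones(3,1) [1; 2; 3]]; y = [1; 2; 3];
      t0 = linspace(-10, 10, 100); t1 = linspace(-1, 4, 100);
      J_vals = zeros(length(t0), length(t1));
      for i = 1:length(t0)
        for j = 1:length(t1)
          J_vals(i, j) = sum((X * [t0(i); t1(j)] - y) .^ 2) / (2 * length(y));
        end
      end
      surf(t0, t1, J_vals');                                 % bowl-shaped 3-D surface
      figure; contour(t0, t1, J_vals', logspace(-2, 3, 20)); % contour map of the same J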
- Gradient Descent
- Repeat until convergence { Θⱼ := Θⱼ − α · (∂/∂Θⱼ) J(Θ₀, Θ₁), updating j = 0 and j = 1 simultaneously }
- α = learning rate
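- A compact Octave sketch of this loop (the data, α, and iteration count are illustrative choices, not from the notes):

      X = [ones(3,1) [1; 2; 3]]; y = [2; 4; 6];   % toy data generated by y = 2x
      theta = zeros(2, 1);
      alpha = 0.1; num_iters = 1500; m = length(y);
      for iter = 1:num_iters
        grad = (X' * (X * theta - y)) / m;        % [dJ/dTheta0; dJ/dTheta1]
        theta = theta - alpha * grad;             % simultaneous update of both parameters
      end
      theta                                       % converges toward [0; 2]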
- Gradient Descent Intuition
- min over Θ₁ of J(Θ₁)
- For Θ₁ > the local minimum: (d/dΘ₁) J(Θ₁) is positive, so the update moves Θ₁ toward the local minimum
- For Θ₁ < the local minimum: (d/dΘ₁) J(Θ₁) is negative, so the update moves Θ₁ toward the local minimum
- If the learning rate α is too small, gradient descent takes many small steps and is slow to converge
- If the learning rate α is too large, gradient descent can overshoot the minimum, fail to converge, or even diverge (a numerical illustration follows at the end of this section)
- When the derivative (d/dΘ₁) J(Θ₁) is zero, the update leaves Θ₁ unchanged: Θ₁ has converged to a local minimum
- As we approach a local minimum, gradient descent automatically takes smaller steps, because the derivative term shrinks toward zero
- So there is no need to decrease α over time
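- A tiny numerical illustration of the learning-rate trade-off, using the one-parameter cost J(Θ₁) = Θ₁² (so dJ/dΘ₁ = 2Θ₁ and the minimum is at Θ₁ = 0; the α values are arbitrary):

      for alpha = [0.01 0.5 1.1]        % too small / about right / too large
        t = 1.0;                        % starting value of Theta1
        for iter = 1:20
          t = t - alpha * 2 * t;        % one gradient descent step
        end
        printf('alpha = %.2f -> Theta1 after 20 steps = %g\n', alpha, t);
      end

- With α = 0.01, Θ₁ creeps slowly toward 0; with α = 0.5 it lands exactly on the minimum; with α = 1.1 every step overshoots and |Θ₁| grows without bound, i.e. it diverges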