ML class overview

  • ML class
  • INTRODUCTION
    • Examples of machine learning
      • Database mining (Large datasets from growth of automation/web)
        • clickstream data
        • medical records
        • biology
        • engineering
      • Applications that can't be programmed by hand
        • autonomous helicopter
        • handwriting recognition
        • most of Natural Language Processing (NLP)
        • Computer Vision
      • Self-customising programs
        • Amazon
        • Netflix product recommendations
      • Understanding human learning (brain, real AI)
    • What is Machine Learning?
      • Definitions of Machine Learning
        • Arthur Samuel (1959). Machine Learning: Field of study that gives computers the ability to learn without being explicitly programmed.
        • Tom Mitchell (1998). Well-posed Learning Problem: A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.
      • There are several different types of ML algorithms. The two main types are:
        • Supervised learning
          • teach computer how to do something
        • Unsupervised learning
          • computer learns by itself
      • Other types of algorithms are:
        • Reinforcement learning
        • Recommender systems
    • Supervised Learning
      • Supervised learning is learning in which the "right answers" are given
        • Regression: predict continuous valued output (e.g. price)
        • Classification: discrete valued output (e.g. 0 or 1)
    • Unsupervised Learning
      • Unsupervised learning is learning in which the categories are unknown
        • Clustering: groupings (categories) are discovered in the data
        • Cocktail party problem: overlapping audio tracks are separated out
          • Octave one-liner from the lecture (x holds the mixed recordings; an SVD-based separation recovers the sources):
            [W,s,v] = svd((repmat(sum(x.*x,1),size(x,1),1).*x)*x');
  • LINEAR REGRESSION WITH ONE VARIABLE
    • Model Representation
      • e.g. housing prices: predict price from size in square feet
        • Supervised Learning
        • Regression
      • Dataset called training set
      • Notation:
        • m = number of training examples
        • x = "input" variable / features
        • y = "output" variable / "target" variable
        • (x, y) = one training example
        • (x(i), y(i)) = the ith training example
      • Training Set -> Learning Algorithm -> h (hypothesis)
      • Size of house (x) -> h -> Estimated price (y)
        • h maps from x's to y's
      • How do we represent h?
        • hΘ(x) = Θ0 + Θ1x (written h(x) for short)
      • Linear regression with one variable (x)
        • Univariate linear regression
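      • A minimal Octave sketch of the hypothesis (parameter values here are made up for illustration):
        theta = [50; 0.1];            % [Θ0; Θ1], illustrative values only
        x = 2104;                     % size of house in square feet
        h = theta(1) + theta(2) * x;  % hΘ(x) = Θ0 + Θ1x, the estimated price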
    • Cost Function
      • Helps us figure out how to fit the best possible straight line to our data
      • hΘ(x) = Θ0 + Θ1x
      • Θi's: Parameters
      • How to choose parameters (Θi's)?
        • Choose Θ0, Θ1 so that hΘ(x) is close to y for our training examples (x,y)
        • Minimise the squared error over Θ0, Θ1
          • hΘ(x(i)) = Θ0 + Θ1x(i)
        • J(Θ0,Θ1) = (1/2m) Σ(i=1..m) (hΘ(x(i)) − y(i))²
        • J(Θ0,Θ1) is the Cost Function, also known in this case as the Squared Error Function
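      • A minimal Octave sketch of this cost function (function and variable names are my own):
        function J = compute_cost(X, y, theta)
          % X: m x 2 matrix [ones(m,1), x]; y: m x 1 targets; theta = [Θ0; Θ1]
          m = length(y);
          predictions = X * theta;                         % hΘ(x(i)) for every example
          J = (1 / (2 * m)) * sum((predictions - y) .^ 2); % squared error cost
        end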
    • Cost Function - Intuition I
      • Summary:
        • Hypothesis: hΘ(x) = Θ0 + Θ1x
        • Parameters: Θ0, Θ1
        • Cost Function: J(Θ0,Θ1) = (1/2m) Σ(i=1..m) (hΘ(x(i)) − y(i))²
        • Goal: minimise J(Θ0,Θ1) over Θ0, Θ1
      • Simplified:
        • hΘ(x) = Θ1x (Θ0 set to 0)
        • minimise J(Θ1) over Θ1
      • Can plot simplified model in 2D
    • Cost Function - Intuition II
      • Can plot J(Θ0,Θ1) in 3D
      • Can plot with Contour Map (Contour Plot)
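      • A sketch of those plots in Octave, reusing compute_cost above (grid ranges are arbitrary; X, y are assumed training data):
        theta0_vals = linspace(-10, 10, 100);
        theta1_vals = linspace(-1, 4, 100);
        J_vals = zeros(length(theta0_vals), length(theta1_vals));
        for i = 1:length(theta0_vals)
          for j = 1:length(theta1_vals)
            J_vals(i, j) = compute_cost(X, y, [theta0_vals(i); theta1_vals(j)]);
          end
        end
        surf(theta0_vals, theta1_vals, J_vals');                                 % 3D surface of J(Θ0,Θ1)
        figure; contour(theta0_vals, theta1_vals, J_vals', logspace(-2, 3, 20)); % contour map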
    • Gradient Descent
      • repeat until convergence { Θj := Θj − α · ∂/∂Θj J(Θ0,Θ1), for j = 0 and j = 1 }
      • α = learning rate
      • Θ0 and Θ1 must be updated simultaneously
    • Gradient Descent Intuition
      • min J(Θ1) over Θ1
        • For Θ1 to the right of the local minimum, the derivative is positive, so the update moves Θ1 toward the minimum
        • For Θ1 to the left of the local minimum, the derivative is negative, so the update moves Θ1 toward the minimum
      • If learning rate α is too small algorithm takes a long time to run
      • If learning rate α is too large algorithm may not converge or may diverge
      • When the partial derivative is zero, the update leaves Θ1 unchanged: it has converged (to a local minimum)
      • As we approach a local minimum, gradient descent automatically takes smaller steps
        • So no need to decrease α over time
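      • A toy Octave demo of the learning-rate effect on J(Θ1) = Θ1² (my own example; dJ/dΘ1 = 2Θ1):
        theta1 = 5; alpha = 0.1;                                 % small α: converges
        for k = 1:20, theta1 = theta1 - alpha * 2 * theta1; end
        printf("alpha = 0.1 -> theta1 = %.4f\n", theta1);        % close to 0
        theta1 = 5; alpha = 1.1;                                 % α too large: overshoots and diverges
        for k = 1:20, theta1 = theta1 - alpha * 2 * theta1; end
        printf("alpha = 1.1 -> theta1 = %.4f\n", theta1);        % far from 0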
    • Gradient Descent for Linear Regression
      • Gradient descent algorithm:
        • repeat until convergence {
          • Θj := Θj − α · ∂/∂Θj J(Θ0,Θ1), simultaneously for j = 0 and j = 1
        • }
      • Linear Regression Model:
        • hΘ(x) = Θ0 + Θ1x
        • J(Θ0,Θ1) = (1/2m) Σ(i=1..m) (hΘ(x(i)) − y(i))²
        • min J(Θ0,Θ1) over Θ0, Θ1
      • Partial derivatives:
        • j=0: ∂/∂Θ0 J(Θ0,Θ1) = (1/m) Σ(i=1..m) (hΘ(x(i)) − y(i))
        • j=1: ∂/∂Θ1 J(Θ0,Θ1) = (1/m) Σ(i=1..m) (hΘ(x(i)) − y(i)) · x(i)
      • Gradient descent algorithm:
        • repeat until convergence {
          • Θ0 := Θ0 − α · (1/m) Σ(i=1..m) (hΘ(x(i)) − y(i))
          • Θ1 := Θ1 − α · (1/m) Σ(i=1..m) (hΘ(x(i)) − y(i)) · x(i)
        • }
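      • A minimal Octave sketch of these updates as batch gradient descent (names are my own; X, y, theta laid out as in the cost sketch above):
        function theta = gradient_descent(X, y, theta, alpha, num_iters)
          m = length(y);
          for iter = 1:num_iters
            errors = X * theta - y;          % hΘ(x(i)) − y(i) for all i
            grad = (1 / m) * (X' * errors);  % both partial derivatives at once
            theta = theta - alpha * grad;    % simultaneous update of Θ0 and Θ1
          end
        end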
    • What's Next
      • Two extensions:
        • Solve min J(Θ0,Θ1) for Θ0, Θ1 exactly (the normal equation), without needing an iterative algorithm (gradient descent)
        • Learn with larger number of features
      • Linear Algebra topics:
        • What are matrices and vectors
        • Addition, subtraction and multiplication with matrices and vectors
        • Matrix inverse, transpose
  • LINEAR ALGEBRA REVIEW
    • Matrices and Vectors
      • Definitions:
        • Matrix: a rectangular array of numbers
        • Dimension of Matrix: number of rows by number of columns
          • e.g. A = [1402 191; 1371 821; 949 1437; 147 1448] = 4 rows and 2 columns (a 4×2 matrix)
          • e.g. [1 2 3; 4 5 6] = 2 rows and 3 columns (a 2×3 matrix)
      • Matrix elements:
        • Aij = "i,j entry" in the ith row, jth column
          • A11 = 1402
          • A12 = 191
          • A32 = 1437
          • A41 = 147
          • A43 = undefined (error: a 4×2 matrix has no column 3)
      • Vector: an n x 1 matrix
          • e.g. an n = 4 (4 × 1) matrix is a 4-dimensional vector
        • yi = ith element
        • vectors may be 1-indexed or 0-indexed; this course assumes 1-indexed
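        • These definitions map directly onto Octave (A is the 4×2 example above; y is a made-up 4-dimensional vector):
          A = [1402 191; 1371 821; 949 1437; 147 1448];
          size(A)   % ans = 4 2 (rows by columns)
          A(3, 2)   % ans = 1437 (A32)
          y = [460; 232; 315; 178];
          y(1)      % ans = 460 (Octave vectors are 1-indexed)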