ML class overview


This is an overview in point form of the content in the ML class.

INTRODUCTION

Examples of machine learning

  • Database mining (Large datasets from growth of automation/web)
    • clickstream data
    • medical records
    • biology
    • engineering
  • Applications that can't be programmed by hand
    • autonomous helicopter
    • handwriting recognition
    • most of Natural Language Processing (NLP)
    • Computer Vision
  • Self-customising programs
    • Amazon
    • Netflix product recommendations
  • Understanding human learning (brain, real AI)

What is Machine Learning?

  • Definitions of Machine Learning
    • Arthur Samuel (1959). Machine Learning: Field of study that gives computers the ability to learn without being explicitly programmed.
    • Tom Mitchell (1998). Well-posed Learning Problem: A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.
  • There are several different types of ML algorithms. The two main types are:
    • Supervised learning
      • teach computer how to do something
    • Unsupervised learning
      • computer learns by itself
  • Other types of algorithms are:
    • Reinforcement learning
    • Recommender systems

Supervised Learning

  • Supervised Learning in which the "right answers" are given
    • Regression: predict continuous valued output (e.g. price)
    • Classification: discrete valued output (e.g. 0 or 1)

Unsupervised Learning

  • Unsupervised Learning in which the categories are unknown
    • Clustering: cluster patterns (categories) are found in the data
    • Cocktail party problem: overlapping audio tracks are separated out
      • [W,s,v] = svd((repmat(sum(x.*x,1),size(x,1),1).*x)*x');  % the one-line Octave program shown in lecture for separating the mixed audio sources

LINEAR REGRESSION WITH ONE VARIABLE

Model Representation

  • e.g. housing prices: price as a function of size in square feet
    • Supervised Learning
    • Regression
  • Dataset called training set
  • Notation:
    • m = number of training examples
    • x's = "input" variable / features
    • y's = "output" variable / "target" variable
    • (x,y) = one training example
    • (x(i),y(i)) = ith training example
  • Training Set -> Learning Algorithm -> h (hypothesis)
  • Size of house (x) -> h -> Estimated price (y)
    • h maps from x's to y's
  • How do we represent h?
    • hΘ(x) = h(x) = Θ0 + Θ1x
  • Linear regression with one variable (x)
    • Univariate linear regression
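
A minimal Octave sketch (Octave is the class's language) of evaluating the hypothesis hΘ(x) = Θ0 + Θ1x; the values of theta and x here are made up for illustration:

  theta = [50; 0.1];           % [theta0; theta1], assumed example values
  x = 2104;                    % input feature, e.g. size of a house
  h = theta(1) + theta(2) * x  % hTheta(x) = theta0 + theta1*x, the estimated y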

Cost Function

  • Helps us figure out how to fit the best possible straight line to our data
  • hΘ(x) = Θ0 + Θ1x
  • Θi's: Parameters
  • How to choose parameters (Θi's)?
    • Choose Θ0, Θ1 so that hΘ(x) is close to y for our training examples (x,y)
    • Minimise over Θ0, Θ1
      • hΘ(x(i)) = Θ0 + Θ1x(i)
    • J(Θ0,Θ1) = (1/(2m)) Σ(i=1 to m) (hΘ(x(i)) - y(i))²
    • J(Θ0,Θ1) is the Cost Function, also known in this case as the Squared Error Function; a sketch of computing it follows below
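
A minimal Octave sketch of computing J for the univariate model; the training data here is a tiny made-up set, not from the class:

  X = [1; 2; 3];                    % input feature, m = 3 examples (made-up data)
  y = [1; 2; 3];                    % target values
  m = length(y);
  theta0 = 0; theta1 = 1;           % candidate parameters
  h = theta0 + theta1 * X;          % hypothesis on all m examples at once
  J = (1/(2*m)) * sum((h - y).^2)   % squared error cost; 0 here, a perfect fit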

Cost Function - Intuition I

  • Summary:
    • Hypothesis: hΘ(x) = Θ0 + Θ1x
    • Parameters: Θ0, Θ1
    • Cost Function: J(Θ0,Θ1) = (1/(2m)) Σ(i=1 to m) (hΘ(x(i)) - y(i))²
    • Goal: minimise J(Θ0,Θ1) over Θ0, Θ1
  • Simplified:
    • hΘ(x) = Θ1x (i.e. fix Θ0 = 0)
    • minimise J(Θ1) over Θ1
  • Can plot simplified model in 2D

Cost Function - Intuition II

  • Can plot J(Θ0,Θ1) in 3D
  • Can plot with Contour Map (Contour Plot)
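
A sketch of such a contour plot in Octave, reusing the made-up training set from the sketch above; the grid ranges and contour levels are arbitrary choices:

  X = [1; 2; 3]; y = [1; 2; 3]; m = length(y);   % same made-up data as above
  t0 = linspace(-3, 3, 100);                     % grid of theta0 values
  t1 = linspace(-1, 3, 100);                     % grid of theta1 values
  J = zeros(length(t0), length(t1));
  for i = 1:length(t0)
    for j = 1:length(t1)
      J(i, j) = (1/(2*m)) * sum((t0(i) + t1(j)*X - y).^2);
    end
  end
  contour(t0, t1, J', logspace(-2, 2, 20))       % transpose so rows map to the y-axis
  xlabel('theta0'); ylabel('theta1')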

Gradient Descent

  • repeat until convergence {
    • Θj := Θj - α (∂/∂Θj) J(Θ0,Θ1)   (update j = 0 and j = 1 simultaneously)
  • }
  • α = learning rate
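
A sketch of the simultaneous update in Octave, on a stand-in cost J(Θ0,Θ1) = Θ0² + Θ1² (an assumed toy function, not the regression cost, with partial derivatives 2Θ0 and 2Θ1):

  alpha = 0.1;                           % learning rate
  theta0 = 3; theta1 = -2;               % arbitrary starting point
  for iter = 1:100                       % "repeat until convergence"
    temp0 = theta0 - alpha * 2*theta0;   % compute both updates first...
    temp1 = theta1 - alpha * 2*theta1;
    theta0 = temp0;                      % ...then assign simultaneously
    theta1 = temp1;
  end
  [theta0, theta1]                       % both near the minimum at (0, 0)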

Gradient Descent Intuition

  • Consider minimising J(Θ1) over Θ1:
    • For Θ1 > local minimum: the derivative is positive, so the update decreases Θ1, moving it toward the local minimum
    • For Θ1 < local minimum: the derivative is negative, so the update increases Θ1, moving it toward the local minimum
  • If the learning rate α is too small, the algorithm takes a long time to converge
  • If the learning rate α is too large, the algorithm may fail to converge, or may even diverge
  • When the derivative is zero (at a local minimum), the update leaves Θ1 unchanged: Θ1 has converged
  • As we approach a local minimum, gradient descent automatically takes smaller steps
    • So no need to decrease α over time
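
A small Octave demonstration of both failure modes, again on the assumed toy cost J(Θ1) = Θ1² (derivative 2Θ1):

  theta_small = 1; theta_large = 1;
  for iter = 1:50
    theta_small = theta_small - 0.01 * 2*theta_small;  % tiny steps: still far from 0
    theta_large = theta_large - 1.1  * 2*theta_large;  % overshoots and diverges
  end
  [theta_small, theta_large]   % roughly 0.36 versus roughly 9100 and growing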

Gradient Descent for Linear Regression

  • Gradient descent algorithm:
    • repeat until convergence {
      • Θj := Θj - α (∂/∂Θj) J(Θ0,Θ1)   (for j=0 and j=1, updated simultaneously)
    • }
  • Linear Regression Model:
    • hΘ(x) = Θ0 + Θ1x
    • J(Θ0,Θ1) = (1/(2m)) Σ(i=1 to m) (hΘ(x(i)) - y(i))²
    • minimise J(Θ0,Θ1) over Θ0, Θ1
  • Partial derivatives:
    • j=0: (∂/∂Θ0) J(Θ0,Θ1) = (1/m) Σ(i=1 to m) (hΘ(x(i)) - y(i))
    • j=1: (∂/∂Θ1) J(Θ0,Θ1) = (1/m) Σ(i=1 to m) (hΘ(x(i)) - y(i))·x(i)
  • Gradient descent algorithm (derivatives substituted in; see the sketch below):
    • repeat until convergence {
      • Θ0 := Θ0 - α (1/m) Σ(i=1 to m) (hΘ(x(i)) - y(i))
      • Θ1 := Θ1 - α (1/m) Σ(i=1 to m) (hΘ(x(i)) - y(i))·x(i)
      • (update Θ0 and Θ1 simultaneously)
    • }
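
Putting it together, a minimal Octave sketch of gradient descent for univariate linear regression; the training set is made up so that y = 2x exactly, so Θ should approach (0, 2):

  X = [1; 2; 3; 4]; y = [2; 4; 6; 8];   % made-up data with y = 2x
  m = length(y);
  theta0 = 0; theta1 = 0;               % start at (0, 0)
  alpha = 0.05;                         % learning rate (assumed value)
  for iter = 1:2000                     % "repeat until convergence"
    h = theta0 + theta1 * X;            % current predictions
    grad0 = (1/m) * sum(h - y);         % partial derivative, j = 0
    grad1 = (1/m) * sum((h - y) .* X);  % partial derivative, j = 1
    theta0 = theta0 - alpha * grad0;    % simultaneous update: both gradients
    theta1 = theta1 - alpha * grad1;    %   were computed before either change
  end
  [theta0, theta1]                      % converges close to [0, 2]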

What's Next

  • Two extensions:
    • Solve min J(Θ0,Θ1) for Θ0, Θ1 exactly, without needing an iterative algorithm (gradient descent)
    • Learn with a larger number of features
  • Linear Algebra topics:
    • What are matrices and vectors
    • Addition, subtraction and multiplication with matrices and vectors
    • Matrix inverse, transpose

LINEAR ALGEBRA REVIEW

Matrices and Vectors

  • Matrix: a rectangular array of numbers
  • Dimension of Matrix: number of rows by number of columns
    • e.g. a 4 x 2 matrix has 4 rows and 2 columns
    • e.g. a 2 x 3 matrix has 2 rows and 3 columns
  • Matrix elements (for a 4 x 2 example matrix A):
    • Aij = "i,j entry" in the ith row, jth column
      • A11 = 1402
      • A12 = 191
      • A32 = 1437
      • A41 = 147
      • A43 = undefined (error: A has no 3rd column)
  • Vector: an n x 1 matrix
    • e.g. n = 4: a 4 x 1 matrix is a 4-dimensional vector
    • yi = ith element
    • vectors may be 1-indexed or 0-indexed (1-indexed assumed here)
  • Notation (generally):
    • A,B,C,X = capital = matrix
    • a,b,x,y = lower case = vector or scalar
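
In Octave these conventions look as follows; the entries of A other than the four quoted above, and the vector y, are made-up values:

  A = [1402 191; 1949 1287; 3776 1437; 147 1448];  % 4 x 2 matrix; only the four
                                                   %   quoted entries are from the notes
  size(A)      % ans = 4 2 (4 rows, 2 columns)
  A(1, 2)      % ans = 191, i.e. A12: row 1, column 2 (1-indexed)
  % A(4, 3) would raise an out-of-bound error: A has no 3rd column
  y = [460; 232; 315; 178];   % a 4 x 1 matrix = 4-dimensional vector (made-up values)
  y(1)         % ans = 460, the 1st element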

Addition and Scalar Multiplication

  • Matrix addition:
      • 3x2 matrix + 3x2 matrix = 3x2 matrix
      • 3x2 matrix + 2x2 matrix = error
  • Scalar multiplication:
      • scalar x 3x2 matrix = 3x2 matrix = 3x2 matrix x scalar
  • Combination of operands:
      • an expression mixing these operations on 3-dimensional vectors still evaluates to a 3x1 matrix = 3-dimensional vector
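
A quick Octave illustration of these rules, with made-up matrices:

  A = [1 0; 2 5; 3 1];    % 3 x 2 (made-up values)
  B = [4 0.5; 2 5; 0 1];  % 3 x 2
  A + B                   % element-wise sum, still 3 x 2
  3 * A                   % scalar multiplication, still 3 x 2; equals A * 3
  % A + [1 2; 3 4] would be an error: 3x2 + 2x2 dimensions do not match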

Matrix Vector Multiplication

  • e.g. [1 3; 4 0; 2 1] * [1; 5] = [16; 4; 7]:
    • 1·1 + 3·5 = 16
    • 4·1 + 0·5 = 4
    • 2·1 + 1·5 = 7
    • 3x2 matrix * 2x1 vector = 3x1 vector
  • prediction = DataMatrix x parameters (see the sketch below)
    • with hΘ(x) = -40 + 0.25x: DataMatrix pairs a column of 1s with the column of house sizes, parameters = [-40; 0.25], and each row of the product is one predicted price
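
A closing Octave sketch of prediction as a matrix-vector multiplication; the house sizes are assumed example values, while the hypothesis hΘ(x) = -40 + 0.25x is the one from the notes:

  sizes = [2104; 1416; 1534];           % assumed example house sizes
  DataMatrix = [ones(3, 1) sizes];      % prepend a column of 1s for theta0
  parameters = [-40; 0.25];             % [theta0; theta1]
  prediction = DataMatrix * parameters  % each row: -40 + 0.25 * size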