ML class overview

This is an overview in point form of the content in the ML class.

INTRODUCTION

Examples of machine learning

  • Database mining (Large datasets from growth of automation/web)
    • clickstream data
    • medical records
    • biology
    • engineering
  • Applications that can't be programmed by hand
    • autonomous helicopter
    • handwriting recognition
    • most of Natural Language Processing (NLP)
    • Computer Vision
  • Self-customising programs
    • Amazon
    • Netflix product recommendations
  • Understanding human learning (brain, real AI)

What is Machine Learning?

  • Definitions of Machine Learning
    • Arthur Samuel (1959). Machine Learning: Field of study that gives computers the ability to learn without being explicitly programmed.
    • Tom Mitchell (1998). Well-posed Learning Problem: A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.
  • There are several different types of ML algorithms. The two main types are:
    • Supervised learning
      • teach computer how to do something
    • Unsupervised learning
      • computer learns by itself
  • Other types of algorithms are:
    • Reinforcement learning
    • Recommender systems

Supervised Learning

  • Learning in which the "right answers" are given
    • Regression: predict continuous valued output (e.g. price)
    • Classification: discrete valued output (e.g. 0 or 1)

Unsupervised Learning

  • Learning in which the categories are unknown
    • Clustering: clusters (categories) are found in the data
    • Cocktail party problem: overlapping audio tracks are separated out
      • [W,s,v] = svd((repmat(sum(x.*x,1),size(x,1),1).*x)*x'); % the course's one-line Octave solution

LINEAR REGRESSION WITH ONE VARIABLE

Model Representation

  • e.g. housing prices: predict price from the size of the house in square feet
    • Supervised Learning
    • Regression
  • Dataset called training set
  • Notation:
    • m = number of training examples
    • x's = "input" variable / features
    • y's = "output" variable / "target" variable
    • (x,y) = one training example
    • (x(i),y(i)) = ith training example
  • Training Set -> Learning Algorithm -> h (hypothesis)
  • Size of house (x) -> h -> Estimated price (y)
    • h maps from x's to y's
  • How do we represent h?
    • hΘ(x) = h(x) = Θ0 + Θ1x
  • Linear regression with one variable (x)
    • Univariate linear regression (see the Octave sketch below)
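
A minimal Octave sketch of the hypothesis. The parameter values Θ0 = -40, Θ1 = 0.25 are the example used later in these notes; the house size passed in is an assumed value:

  % hypothesis for univariate linear regression
  theta0 = -40;                  % example intercept
  theta1 = 0.25;                 % example slope
  h = @(x) theta0 + theta1 * x;  % h maps x's (size) to y's (estimated price)
  h(2104)                        % prediction for one assumed house size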

Cost Function

  • Helps us figure out how to fit the best possible straight line to our data
  • hΘ(x) = Θ0 + Θ1x
  • Θi's: Parameters
  • How to choose parameters (Θi's)?
    • Choose Θ0, Θ1 so that hΘ(x) is close to y for our training examples (x,y)
    • Minimise for Θ0, Θ1
      • hΘ(x(i)) = Θ0 + Θ1x(i)
    • J(Θ0,Θ1) = (1/2m) Σ(i=1..m) (hΘ(x(i)) - y(i))²
    • J(Θ0,Θ1) is the Cost Function, also known in this case as the Squared Error Function (see the Octave sketch below)
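
A minimal Octave sketch of the squared-error cost (the function and variable names here are my own):

  function J = computeCost(X, y, theta)
    % X = [ones(m,1), x] is the m x 2 design matrix, y the m x 1 targets,
    % theta = [theta0; theta1] the parameter vector
    m = length(y);
    predictions = X * theta;                    % hΘ(x(i)) for all i at once
    J = sum((predictions - y) .^ 2) / (2 * m);  % squared error, averaged over 2m
  end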

Cost Function - Intuition I

  • Summary:
    • Hypothesis: hΘ(x) = Θ0 + Θ1x
    • Parameters: Θ0, Θ1
    • Cost Function: J(Θ0,Θ1) = (1/2m) Σ(i=1..m) (hΘ(x(i)) - y(i))²
    • Goal: minimise J(Θ0,Θ1) over Θ0, Θ1
  • Simplified:
    • hΘ(x) = Θ1x (i.e. set Θ0 = 0)
    • minimise J(Θ1) over Θ1
  • Can plot J(Θ1) against Θ1 for the simplified model in 2D (see the Octave sketch below)
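
A sketch of that 2D plot in Octave, using a toy training set of my own:

  x = [1; 2; 3];  y = [1; 2; 3];       % toy data; theta1 = 1 fits exactly
  theta1 = linspace(-0.5, 2.5, 100);   % range of theta1 values to evaluate
  J = arrayfun(@(t) sum((t * x - y) .^ 2) / (2 * length(y)), theta1);
  plot(theta1, J);  xlabel('theta_1');  ylabel('J(theta_1)');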

Cost Function - Intuition II

  • Can plot J(Θ0,Θ1) in 3D
  • Can plot with a Contour Map (Contour Plot); see the Octave sketch below
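
A sketch of both plots in Octave (the toy data is my own):

  x = [1; 2; 3];  y = [2; 3; 4];       % toy data; theta0 = 1, theta1 = 1 fit exactly
  t0 = linspace(-2, 4, 60);  t1 = linspace(-2, 4, 60);
  J = zeros(length(t0), length(t1));
  for i = 1:length(t0)
    for j = 1:length(t1)
      J(i, j) = sum((t0(i) + t1(j) * x - y) .^ 2) / (2 * length(y));
    end
  end
  surf(t0, t1, J');                                   % 3D surface of J(theta0,theta1)
  figure;  contour(t0, t1, J', logspace(-2, 2, 20));  % contour plot of the same J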

Gradient Descent

  • repeat until convergence {
    • Θj := Θj - α ∂/∂Θj J(Θ0,Θ1) (simultaneously for j = 0 and j = 1)
  • }
  • α = learning rate (see the toy Octave sketch below)
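
To make the update rule concrete, a toy Octave sketch minimising J(t) = t² (my own example; its derivative is 2t):

  alpha = 0.1;               % learning rate
  t = 5;                     % initial guess
  for iter = 1:50            % "repeat until convergence"
    t = t - alpha * 2 * t;   % t := t - alpha * dJ/dt
  end
  t                          % now close to the minimum at t = 0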

Gradient Descent Intuition

  • min over Θ1 of J(Θ1), via the update Θ1 := Θ1 - α (d/dΘ1)J(Θ1)
    • For Θ1 to the right of the local minimum: the derivative is positive, so the update decreases Θ1, moving it toward the local minimum
    • For Θ1 to the left of the local minimum: the derivative is negative, so the update increases Θ1, moving it toward the local minimum
  • If the learning rate α is too small, gradient descent takes many tiny steps and converges slowly
  • If the learning rate α is too large, gradient descent may overshoot the minimum, fail to converge, or even diverge (see the toy Octave sketch below)
  • When the partial derivative is zero the update leaves Θ1 unchanged: Θ1 has converged to a local minimum
  • As we approach a local minimum, gradient descent automatically takes smaller steps
    • So no need to decrease α over time
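
The same toy example (J(t) = t², derivative 2t) shows the effect of α; the three values are my own:

  for alpha = [0.01 0.1 1.1]   % too small, reasonable, too large
    t = 5;
    for iter = 1:20
      t = t - alpha * 2 * t;   % gradient descent step on J(t) = t^2
    end
    fprintf('alpha = %.2f -> t = %g\n', alpha, t);
  end
  % alpha = 0.01 converges slowly, 0.1 converges, 1.1 diverges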

Gradient Descent for Linear Regression

  • Gradient descent algorithm:
    • repeat until convergence {
      • Θj := Θj - α ∂/∂Θj J(Θ0,Θ1) (for j=0 and j=1)
    • }
  • Linear Regression Model:
    • hΘ(x) = Θ0 + Θ1x
    • J(Θ0,Θ1) = (1/2m) Σ(i=1..m) (hΘ(x(i)) - y(i))²
    • min over Θ0, Θ1 of J(Θ0,Θ1)
  • Partial derivatives:
    • j=0: ∂/∂Θ0 J(Θ0,Θ1) = (1/m) Σ(i=1..m) (hΘ(x(i)) - y(i))
    • j=1: ∂/∂Θ1 J(Θ0,Θ1) = (1/m) Σ(i=1..m) (hΘ(x(i)) - y(i))·x(i)
  • Gradient descent algorithm with the derivatives plugged in (see the Octave sketch below):
    • repeat until convergence {
      • temp0 := Θ0 - α (1/m) Σ(i=1..m) (hΘ(x(i)) - y(i))
      • temp1 := Θ1 - α (1/m) Σ(i=1..m) (hΘ(x(i)) - y(i))·x(i)
      • Θ0 := temp0, Θ1 := temp1 (simultaneous update)
    • }
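
Putting it together, a minimal Octave sketch of batch gradient descent for univariate linear regression (the toy data, α and the iteration count are my own assumptions):

  x = [2104; 1416; 1534; 852] / 1000;   % assumed house sizes, scaled for conditioning
  y = [460; 232; 315; 178];             % assumed prices
  m = length(y);
  theta0 = 0;  theta1 = 0;              % initial parameters
  alpha = 0.1;                          % learning rate
  for iter = 1:5000                     % "repeat until convergence"
    h = theta0 + theta1 * x;            % hΘ(x(i)) for all examples at once
    temp0 = theta0 - alpha * (1/m) * sum(h - y);
    temp1 = theta1 - alpha * (1/m) * sum((h - y) .* x);
    theta0 = temp0;  theta1 = temp1;    % simultaneous update
  end
  [theta0, theta1]                      % fitted parameters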

What's Next

  • Two extensions:
    • Solve min J(Θ0,Θ1) for Θ0, Θ1 exactly in closed form (the normal equation), without needing an iterative algorithm (gradient descent)
    • Learn with larger number of features
  • Linear Algebra topics:
    • What are matrices and vectors
    • Addition, subtraction and multiplication with matrices and vectors
    • Matrix inverse, transpose

LINEAR ALGEBRA REVIEW

Matrices and Vectors

  • Matrix: a rectangular array of numbers
  • Dimension of Matrix: number of rows by number of columns
    • e.g. A = [1402 191; 1371 821; 949 1437; 147 1448] = 4 rows and 2 columns = a 4x2 matrix
    • e.g. a 2x3 matrix = 2 rows and 3 columns
  • Matrix elements (see the Octave sketch after this list):
    • Aij = "i,j entry" in the ith row, jth column
      • A11 = 1402
      • A12 = 191
      • A32 = 1437
      • A41 = 147
      • A43 = undefined (error: A has only 2 columns)
  • Vector: an n x 1 matrix
      • e.g. y = [460; 232; 315; 178]: n = 4; a 4-dimensional vector
    • yi = ith element (e.g. y1 = 460)
    • 1-indexed vs 0-indexed: 1-indexed is assumed in this class
  • Notation (generally):
    • A,B,C,X = capital = matrix
    • a,b,x,y = lower case = vector or scalar
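
The same examples in Octave:

  A = [1402 191; 1371 821; 949 1437; 147 1448];  % the 4x2 example matrix above
  size(A)       % dimension: 4 rows, 2 columns
  A(1, 2)       % A12 = 191
  A(4, 1)       % A41 = 147
  y = [460; 232; 315; 178];  % a 4x1 matrix = 4-dimensional vector
  y(1)          % first element; Octave vectors are 1-indexed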

Addition and Scalar Multiplication

  • Matrix addition (element-wise):
      • 3x2 matrix + 3x2 matrix = 3x2 matrix
      • 3x2 matrix + 2x2 matrix = error (dimensions must match)
  • Scalar multiplication:
      • scalar x 3x2 matrix = 3x2 matrix = 3x2 matrix x scalar (same result either way)
  • Combination of operations (see the Octave sketch below):
      • e.g. combining scalar multiplication and addition of 3x1 matrices still yields a 3x1 matrix = 3-dimensional vector
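
In Octave (example matrices of my own):

  A = [1 0; 2 5; 3 1];       % 3x2
  B = [4 0.5; 2 5; 0 1];     % 3x2
  A + B                      % 3x2 + 3x2 = 3x2
  3 * A                      % scalar * matrix; same result as A * 3
  [1; 4; 2] + 3 * [0; 0; 5]  % combining operations: still a 3x1 matrix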

Matrix Vector Multiplication

    • Example: [1 3; 4 0; 2 1] * [1; 5] = [16; 4; 7]
      • 1·1 + 3·5 = 16
      • 4·1 + 0·5 = 4
      • 2·1 + 1·5 = 7
    • 3x2 matrix * 2x1 matrix = 3x1 matrix
  • prediction = DataMatrix * parameters
    • e.g. for hΘ(x) = -40 + 0.25x: stack a row [1, x(i)] per house into the DataMatrix and multiply by the parameter vector [-40; 0.25] to get all predictions at once (see the Octave sketch below)
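
The prediction example as an Octave sketch (the house sizes are assumed values):

  sizes = [2104; 1416; 1534; 852];      % assumed house sizes in square feet
  X = [ones(length(sizes), 1), sizes];  % DataMatrix: column of ones pairs with theta0
  theta = [-40; 0.25];                  % parameters of hΘ(x) = -40 + 0.25x
  predictions = X * theta               % hΘ applied to every house at once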