AI class unit 3


These are my notes for Week 2 of the AI class.

Probability in AI

Introduction

{{#ev:youtubehd|-8DyY8_IuA0}}

So the next units will be concerned with probabilities, particularly with structured probabilities using Bayes Networks. This is some of the most involved material in this class. And since this is a Stanford-level class, you'll find that some of the quizzes are really hard. So as you go through the material, I hope the difficulty of the quizzes won't discourage you; it'll really entice you to take a piece of paper and a pen and work them out.

Let me give you a flavour of a Bayes Network using an example. Suppose you find in the morning that your car won't start. Well, there are many causes why your car might not start. One is that your battery is flat. Even for a flat battery there are multiple causes: one is that it's just plain dead, and one is that the battery is OK but it's not charging. The reason a battery might not charge is that the alternator may be broken or the fan belt may be broken. If you look at this influence diagram, also called a Bayes Network, you'll find that there are many different ways to explain that the car won't start, and a natural question you might have is: can we diagnose the problem?

One diagnostic tool is a battery meter, which may increase or decrease your belief that the battery is the cause of your car failure. You might also know your battery age -- older batteries tend to go dead more often. Then there are many other ways to look at reasons why the car won't start. You might inspect the lights, the oil light, the gas gauge. You might even dip into the engine to see what the oil level is, with a dipstick. And all of those relate to alternative reasons why your car might not be starting, like no oil, or no gas, the fuel line might be blocked, or the starter might be broken. All of these can influence your measurements, like the oil light or the gas gauge, in different ways. For example, a flat battery might have an effect on the lights, on the oil light, and on the gas gauge; but it won't really have an effect on the oil level you measure with the dipstick. That is affected by the actual oil level, which also affects the oil light. Gas will affect the gas gauge, and of course without gas the car doesn't start.

So this is a complicated structure that really describes one way to understand how a car doesn't start. A car is a complex system, it has lots of variables you can't really measure immediately, and it has sensors which allow you to understand a little bit about the state of the car.

What the Bayes Network does is really assist you in reasoning from observable variables, like the fact that the car won't start and the value of the dipstick, to hidden causes, like whether the fan belt is broken or the battery is dead. What we have here is a Bayes Network. A Bayes Network is composed of nodes; these nodes correspond to events that you might or might not know, and they're typically called random variables. These nodes are linked by arcs, and an arc suggests that the child of the arc is influenced by its parent -- not in a deterministic way, but in a probabilistic way. Which means, for example, that an older battery has a higher chance of being dead, but it's not the case that every old battery is dead.

There's a total of 16 variables in this Bayes Network. What the graph structure and associated probabilities specify is a huge probability distribution in the space of all of these 16 variables. If they were binary, which we'll assume throughout this unit, they can take 2^16 different value combinations, which is a lot.

The Bayes Network, as we'll find out, is a compact representation of this very, very large joint probability distribution over all of these variables. Further, once we've specified the Bayes Network, we can observe, for example, that the car won't start; we can observe things like the oil light, the lights, and the battery, and then compute probabilities of hypotheses like the alternator is broken, the fan belt is broken, or the battery is dead. So in this class we're going to talk about how to construct these Bayes Networks, what their semantics are, and how to reason with them to find out about variables we can't observe, like whether the fan belt is broken or not.

That's an overview. Throughout this unit we are going to assume that every event is discrete -- in fact, binary. We'll start with some consideration of basic probability, then work our way into some simpler Bayes Networks. We'll talk about concepts like conditional independence, then define Bayes Networks more generally, move into concepts like d-separation, and start doing parameter counts. Later on, Peter will tell you about inference in Bayes Networks, so we won't be doing that in this class.

I can't overemphasise how important this class is. Bayes Networks are used extensively in almost all fields of smart computer systems: in diagnostics, for prediction, for machine learning, and in fields like finance, inside Google, and in robotics.

Bayes Networks are also the building blocks of more advanced AI techniques such as particle filters, Hidden Markov Models, MDPs and POMDPs, Kalman filters, and many others. These are words that don't sound familiar quite yet but as you go through the class I can promise you that you will get to know what they mean.

So let's start now with the very very basics.

Probability/Coin Flip

{{#ev:youtubehd|EdONkI3RNKg}}

So let's talk about probabilities. Probabilities are the cornerstone of artificial intelligence. They are used to express uncertainty, and the management of uncertainty is really key to many, many things in AI, such as machine learning, Bayes Network inference, filtering, robotics, computer vision, and so on.

So we're going to start with some very basic questions and we're going to work our way out from there.

Here's a coin, and the coin can come up heads or tails. My question is the following: suppose the probability of heads is 0.5; what's the probability of it coming up tails?

{{#ev:youtubehd|orhhEZGH_Es}}

So the right answer is a half, or 0.5, and the reason is that the coin can only come up heads or tails. We know it has to be one or the other, therefore the total probability of both outcomes is one. So half the probability is assigned to heads, and the other half is assigned to tails.

{{#ev:youtubehd|Ee9g6dhDL9A}}

Let me ask the next quiz. Suppose the probability of heads is a quarter, 0.25. What is the probability of tails?

{{#ev:youtubehd|84KcxfggKRg}}

And the answer is three quarters -- it's a loaded coin. The reason is, well, each of them has a certain probability, and the total of those is one. A quarter is claimed by heads; therefore, 3/4 remains for tails.
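Written as a formula, both quizzes so far use the complement rule; here it is for the loaded coin:

<math>P(T) = 1 - P(H) = 1 - 0.25 = 0.75</math>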

{{#ev:youtubehd|koOpSPz-voY}}

Here's another quiz. What's the probability that the coin comes up heads, heads, heads -- three times in a row -- assuming that each one of those flips has a probability of a half of coming up heads, and that these coin flips are independent?

{{#ev:youtubehd|7pZQS5inJXs}}

The answer is 0.125. Each head has a probability of a half. We can multiply those probabilities because they are independent events. And that gives us 1/8 or 0.125.
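Written out, independence is what justifies multiplying the three probabilities:

<math>P(H,H,H) = P(H)\,P(H)\,P(H) = \left(\tfrac{1}{2}\right)^3 = \tfrac{1}{8} = 0.125</math>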

{{#ev:youtubehd|KatS5xl7vn8}}

Now let's flip the coin four times, and let's call Xi the result of the i-th coin flip. So each Xi is going to be drawn from heads (H) or tails (T). What's the probability that all four of those flips give us the same result, no matter what it is, assuming that the flips are independent and each one has an identical probability of a half of coming up heads?

{{#ev:youtubehd|g_M3o3QXBjo}}

And the answer is, well, there are 2 ways that we can achieve this. One is all heads and one is all tails. You already know that 4 heads in a row has probability 1/16, and 4 tails in a row is also 1/16. These two outcomes are mutually exclusive -- they can't both happen -- so the probability of either one occurring is 1/16 plus 1/16, which is 1/8, or 0.125.
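Written out, with the two mutually exclusive outcomes added together:

<math>P(\text{all same}) = P(HHHH) + P(TTTT) = \tfrac{1}{16} + \tfrac{1}{16} = \tfrac{1}{8} = 0.125</math>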

{{#ev:youtubehd|hdQER9u46yU}}

So here's another one. What's the probability that within the set of X1, X2, X3, and X4 there are at least three heads?

{{#ev:youtubehd|FEqiaraw3GE}}

And the solution is: let's look at the different sequences in which heads occurs at least 3 times. It could be head, head, head, head, in which heads comes up 4 times. It could be head, head, head, tail, and so on, all the way to tail, head, head, head. There are 1, 2, 3, 4, 5 of those outcomes. Each of them has probability 1/16, so it's 5 times 1/16, which is 5/16, or 0.3125.
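Since there are only 16 equally likely sequences, this count is easy to verify by brute force. Here is a minimal Python sketch (my own check, not part of the lecture; the variable names are mine):

<syntaxhighlight lang="python">
from itertools import product

# All 2^4 = 16 equally likely sequences of four fair coin flips.
sequences = list(product("HT", repeat=4))

# Keep the sequences containing at least three heads.
favourable = [s for s in sequences if s.count("H") >= 3]

print(len(favourable), "of", len(sequences))             # 5 of 16
print("probability:", len(favourable) / len(sequences))  # 0.3125
</syntaxhighlight>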

Probability Summary

{{#ev:youtubehd|Xblzy61pBDQ}}

So we just learned a number of things. One is about complementary probability: if an event has a certain probability P, then the complementary event has probability 1-P. We also learned about independence: if 2 random variables, X and Y, are independent, which we write as X ⊥ Y, then the probability of the joint values that the 2 variables assume is the product of the marginals. So rather than asking the question, "What is the probability for any combination that these 2 coins, or maybe 5 coins, could have taken?" we can now look at each coin individually, look at its probability, and just multiply them together.
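In symbols, the two lessons are:

<math>P(\lnot A) = 1 - P(A)</math>

<math>X \perp Y \;\Rightarrow\; P(X, Y) = P(X)\,P(Y)</math>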

Dependence

{{#ev:youtubehd|uy0sL0DGV7o}}

So let me ask you about dependence. Suppose we flip 2 coins. Our first coin is a fair coin, and we're going to denote the outcome of the i-th flip by Xi. So the chance of X1 coming up heads is a half. But now we branch into picking a coin based on the first outcome. If the first outcome was heads, you pick a coin whose probability of coming up heads is 0.9. The way I word this is by conditional probability: the probability of the second coin flip coming up heads, provided that, or given that, X1, the first coin flip, was heads, is 0.9. The first coin flip might also come up tails, in which case I pick a very different coin: one which with 0.8 probability will give me tails again, conditioned on the first coin flip coming up tails. So my question for you is, what's the probability of the second coin flip coming up heads?

{{#ev:youtubehd|kpdV5I5WHW8}}

The answer is 0.55. The way to compute this is by the theorem of total probability. There are two ways to reach the outcome X2 equals heads: via the branch where X1 was heads, and via the branch where X1 was tails. Let me just write both of them down.

So first of all, there's the probability of X2 equals heads given that X1 was heads, times the probability that X1 was heads. Now I have to add the complementary event: suppose X1 came up tails. Then I ask the probability that X2 comes up heads given that X1 was tails, times the probability that X1 was tails.

Plugging in the numbers gives us the following. The first term is 0.9 times a half. For the second term, the probability of tails following tails is 0.8, so my heads probability becomes 1 minus 0.8, which is 0.2, again times a half. Adding this together gives me 0.45 plus 0.1, which is exactly 0.55.
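Collecting the whole computation into one formula:

<math>P(X_2 = H) = P(X_2 = H \mid X_1 = H)\,P(X_1 = H) + P(X_2 = H \mid X_1 = T)\,P(X_1 = T) = 0.9 \cdot 0.5 + 0.2 \cdot 0.5 = 0.55</math>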

What We Learned

{{#ev:youtubehd|9fxuibvkZ9g}}

So, we actually just learned some interesting lessons. The probability of any random variable Y can be written as the probability of Y given that some other random variable X assumes value i, times the probability of X equals i, summed over all possible outcomes i for the random variable X. This is called the theorem of total probability.
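In formula form:

<math>P(Y) = \sum_i P(Y \mid X = i)\,P(X = i)</math>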

The second thing we learned has to do with negation of probabilities. We found that the probability of not X given Y is 1 minus the probability of X given Y. Now, you might be tempted to ask, "What about the probability of X given not Y? Is this the same as 1 minus the probability of X given Y?" And the answer is absolutely no. That's not the case. If you condition on something that has a certain probability value, you can take the event you're looking at and negate it, but you can never negate your conditioning variable and assume these values add up to 1.
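In symbols:

<math>P(\lnot X \mid Y) = 1 - P(X \mid Y), \qquad\text{but in general}\quad P(X \mid \lnot Y) \ne 1 - P(X \mid Y)</math>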

Weather

{{#ev:youtubehd|RRYo6jVL6ao}}

We assume there are sometimes sunny days and sometimes rainy days, and on day 1, which we're going to call D1, the probability of sunny is 0.9. Then let's assume that a sunny day follows a sunny day with 0.8 chance. A rainy day follows a sunny day with, well, what probability?

{{#ev:youtubehd|GqCNDJhZQnc}}

Well, the correct answer is 0.2, which is the complement of the 0.8 chance of a sunny day following a sunny day.

{{#ev:youtubehd|ASgU5Ekoz-A}}

A sunny day follows a rainy day with 0.6 chance, and a rainy day follows a rainy day with what probability? Please give me your number.

{{#ev:youtubehd|KgEX10LtY8Y}}

{{#ev:youtubehd|aEVUaEK84UQ}}

So, what are the chances that D2 is sunny? And suppose the same dynamics apply from D2 to D3 -- just replace the D2s in the transition probabilities with D3s, so that the transition probabilities from one day to the next remain the same. Tell me, what's the probability that D3 is sunny?
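The notes stop before the answer videos, but both quizzes follow from the total probability rule above. Here is a minimal Python sketch of the computation (my own check, not from the lecture; note that rain follows rain with probability 1 - 0.6 = 0.4, by the complement rule):

<syntaxhighlight lang="python">
# Transition model from the lecture:
#   P(D1 = sunny) = 0.9
#   P(sunny | previous day sunny) = 0.8
#   P(sunny | previous day rainy) = 0.6
p_sunny = 0.9        # P(D1 = sunny)
p_s_given_s = 0.8    # P(sunny | sunny)
p_s_given_r = 0.6    # P(sunny | rainy)

# Total probability, applied once per day:
#   P(D_next = sunny) = P(s|s) P(D = sunny) + P(s|r) P(D = rainy)
for day in (2, 3):
    p_sunny = p_s_given_s * p_sunny + p_s_given_r * (1 - p_sunny)
    print(f"P(D{day} = sunny) = {p_sunny:.4f}")
# P(D2 = sunny) = 0.7800
# P(D3 = sunny) = 0.7560
</syntaxhighlight>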