AI class unit 9

These are my notes for unit 9 of the AI class.

Planning under Uncertainty

Introduction

{{#ev:youtubehd|DgH6NaJHfVQ}}

So today is an exciting day. We'll talk about planning under uncertainty, and it really puts together some of the material we've talked about in past classes. We talked about planning, but not under uncertainty, and you've had many, many classes on uncertainty, and now it gets to the point where we can make decisions under uncertainty.

This is really important for my own research field of robotics, where the world is full of uncertainty, and the types of techniques I'll tell you about today will really make it possible to drive robots in actual physical worlds and find good plans for these robots to execute.

Planning Under Uncertainty MDP

{{#ev:youtubehd|9D35JSWSJAg}}

Planning under uncertainty. In this class so far we talked a good deal about planning. We talked about uncertainty and probabilities, and we also talked about learning, but all three items were discussed separately. We never brought planning and uncertainty together, uncertainty and learning, or planning and learning.

So the class today will fuse planning and uncertainty using techniques known as Markov Decision Processes or MDPs, and Partially Observable Markov Decision Processes or POMDPs. We also have a class coming up on reinforcement learning which combines all three of these aspects: planning, uncertainty, and machine learning.

You might remember in the very first class we distinguished very different characteristics of agent tasks, and here are some of those. We distinguished deterministic from stochastic environments, and we also talked about fully observable versus partially observable.

In the area of planning so far all of our algorithms fall into this field over here, like A*, depth first, breadth first and so on. The MDP algorithms which I will talk about first fall into the intersection of fully observable yet stochastic.

And just to remind us what the difference was: stochastic is an environment where the outcome of an action is somewhat random, whereas an environment that's deterministic is one where the outcome of an action is predictable and always the same.
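
A minimal sketch of that distinction in code (my own illustration, not from the lecture; the state and action names are placeholders): a deterministic step function always returns the same successor state, while a stochastic one samples the successor from fixed outcome probabilities.

<syntaxhighlight lang="python">
import random

def deterministic_step(state, action):
    # Deterministic: the same (state, action) pair always yields the same next state.
    transitions = {("S1", "A1"): "S2"}
    return transitions[(state, action)]

def stochastic_step(state, action):
    # Stochastic: the same (state, action) pair yields a next state drawn at random
    # according to fixed outcome probabilities.
    outcomes = {("S1", "A1"): [("S2", 0.5), ("S3", 0.5)]}
    next_states, probs = zip(*outcomes[(state, action)])
    return random.choices(next_states, weights=probs)[0]
</syntaxhighlight>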

An environment is fully observable if you can see the state of the environment, which means you can make all decisions based on the momentary sensor input, whereas if you need memory, it's partially observable.
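
To make that concrete, here is a rough sketch (my own, with made-up decision rules): a fully observable agent can act as a plain function of the current percept, while a partially observable agent has to carry some memory and base its decision on that as well.

<syntaxhighlight lang="python">
def act_fully_observable(observation):
    # Fully observable: the current percept is the state, so the decision can be
    # a direct function of the momentary sensor input.
    return "A1" if observation == "S1" else "A2"   # placeholder decision rule

class PartiallyObservableAgent:
    # Partially observable: the agent keeps memory of past observations and
    # decides based on that internal state, not on the last percept alone.
    def __init__(self):
        self.memory = []

    def act(self, observation):
        self.memory.append(observation)
        return "A1" if len(self.memory) % 2 else "A2"   # placeholder decision rule
</syntaxhighlight>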

Planning in the partially observable case is called POMDP, and towards the end of this class, I will briefly talk about POMDPs but not in any depth. So most of this class focuses on Markov decision processes as opposed to partially observable Markov decision processes.

So what is a Markov decision process? One way you can specify a Markov decision process is by a graph. Suppose you have states S<sub>1</sub>, S<sub>2</sub>, and S<sub>3</sub>, and you have actions A<sub>1</sub> and A<sub>2</sub>. A state transition graph like this is a finite state machine, and it becomes Markov if the outcomes of actions are somewhat random. So, for example, action A<sub>1</sub> over here might lead with 50% probability to state S<sub>2</sub> but with the other 50% probability to state S<sub>3</sub>.
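
One way to write such a graph down is as a table of transition probabilities. The sketch below is mine, not from the lecture: only the 50/50 outcome of A<sub>1</sub> in state S<sub>1</sub> is given in the transcript, and the remaining entries are invented placeholders just to complete the structure.

<syntaxhighlight lang="python">
# Transition model for the example graph: T[state][action] maps each possible
# next state to its probability.
T = {
    "S1": {
        "A1": {"S2": 0.5, "S3": 0.5},  # the 50/50 outcome from the transcript
        "A2": {"S1": 1.0},             # placeholder
    },
    "S2": {"A1": {"S3": 1.0}, "A2": {"S1": 1.0}},  # placeholders
    "S3": {"A1": {"S3": 1.0}, "A2": {"S2": 1.0}},  # placeholders
}

# Sanity check: for every state and action, the outcome probabilities sum to 1.
for state, actions in T.items():
    for action, outcomes in actions.items():
        assert abs(sum(outcomes.values()) - 1.0) < 1e-9
</syntaxhighlight>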

So put differently, a Markov decision process consists of states, actions, and a state transition matrix, often written in the following form, which is just about the same as a conditional state transition probability that a state S' is the correct posterior state after executing action A in state S.
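
The "following form" refers to notation written on the slide; presumably it is the standard transition notation, in which the transition matrix entry is exactly that conditional probability:

T(S, A, S') = P(S' | A, S)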

And the missing thing is the objective for the Markov decision process. What do we want to achieve? For that we often define a reward function, and for the sake of this lecture, I will attach rewards just to states. So each state will have a function R attached that tells me how good the state is. So for example it might be worth $10 to be in the state over here, $0 to be in the state over here, and $100 to be in a state over here. So the planning problem is now the problem of assigning an action to each possible state, so that we maximise our total reward.
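
As a final sketch (mine, not from the lecture; the transcript doesn't say which dollar value belongs to which state, so that assignment is a guess): the reward function is just a number per state, and a plan, usually called a policy, assigns one action to each possible state. The planning problem is to find the policy whose execution collects the most total reward.

<syntaxhighlight lang="python">
# Reward attached to each state; the $10 / $0 / $100 values are from the example,
# but which state gets which value is an assumption.
R = {"S1": 10, "S2": 0, "S3": 100}

# A policy (the "plan") assigns one action to every possible state.
# This particular assignment is arbitrary; the point is its shape.
policy = {"S1": "A1", "S2": "A2", "S3": "A1"}

# One-step illustration of "total reward" using the 50/50 example: taking A1
# in S1 leads to S2 or S3 with probability 0.5 each, so the expected reward of
# the next state visited under this policy is:
expected_next_reward = 0.5 * R["S2"] + 0.5 * R["S3"]   # = 50.0
</syntaxhighlight>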