AI class unit 9

These are my notes for unit 9 of the AI class.

Planning under Uncertainty

Introduction

So today is an exciting day. We'll talk about planning under uncertainty, and it really puts together some of the material we've talked about in past classes. We talked about planning, but not under uncertainty, and you've had many, many classes on uncertainty, and now it gets to the point where we can make decisions under uncertainty.

This is really important for my own research field, robotics, where the world is full of uncertainty, and the type of techniques I'll tell you about today really make it possible to drive robots in the actual physical world and find good plans for these robots to execute.

Planning Under Uncertainty MDP

Planning under uncertainty. In this class so far we have talked a good deal about planning. We talked about uncertainty and probabilities, and we also talked about learning, but all three items were discussed separately. We never brought planning and uncertainty together, uncertainty and learning, or planning and learning.

So the class today will fuse planning and uncertainty using techniques known as Markov Decision Processes or MDPs, and Partially Observable Markov Decision Processes or POMDPs. We also have a class coming up on reinforcement learning which combines all three of these aspects: planning, uncertainty, and machine learning.

You might remember that in the very first class we distinguished several different characteristics of agent tasks, and here are some of those. We distinguished deterministic from stochastic environments, and we also talked about fully observable versus partially observable environments.

In the area of planning, all of our algorithms so far, like A*, depth-first, and breadth-first search, fall into the deterministic, fully observable field. The MDP algorithms, which I will talk about first, fall into the intersection of fully observable yet stochastic.

And just to remind us what the difference was: stochastic is an environment where the outcome of an action is somewhat random, whereas a deterministic environment is one where the outcome of an action is predictable and always the same.

An environment is fully observable if you can see the state of the environment, which means you can make all decisions based on the momentary sensor input. Whereas if you need memory, it's partially observable.

The planning framework for the partially observable case is called a POMDP, and towards the end of this class I will briefly talk about POMDPs, but not in any depth. So most of this class focuses on Markov decision processes as opposed to partially observable Markov decision processes.

So what is a Markov decision process? One way you can specify a Markov decision process is by a graph. Suppose you have states S1, S2, and S3, and you have actions A1 and A2. A state transition graph like this is a finite state machine, and it becomes Markov if the outcomes of actions are somewhat random. So, for example, action A1 taken over here might with 50% probability lead to state S2 but with another 50% probability lead to state S3.
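To make the graph description concrete, here is a minimal sketch of such a transition model written as a lookup table. Only the 50/50 outcome of A1 out of S1 comes from the example above; every other entry is a placeholder assumption of mine, just to make the structure runnable.

 # Minimal sketch of the toy MDP above as a transition table.
 # Keys are (state, action) pairs; values map successor state -> probability.
 # Only ("S1", "A1") comes from the lecture example; the rest are assumed.
 T = {
     ("S1", "A1"): {"S2": 0.5, "S3": 0.5},
     ("S1", "A2"): {"S1": 1.0},   # assumed
     ("S2", "A1"): {"S3": 1.0},   # assumed
     ("S2", "A2"): {"S1": 1.0},   # assumed
     ("S3", "A1"): {"S3": 1.0},   # assumed
     ("S3", "A2"): {"S2": 1.0},   # assumed
 }
 
 def successors(state, action):
     """Return the probability distribution over next states."""
     return T[(state, action)]
 
 print(successors("S1", "A1"))   # {'S2': 0.5, 'S3': 0.5}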

So, put differently, a Markov decision process consists of states, actions, and a state transition matrix, often written as T(s, a, s'), which is just about the same as the conditional state transition probability P(s' | a, s) that state s' is the correct posterior state after executing action a in state s.

And the missing thing is the objective for the Markov decision process. What do we want to achieve? For that we often define a reward function, and for the sake of this lecture I will attach rewards just to states. So each state will have a function R attached that tells me how good the state is. So, for example, it might be worth $10 to be in the state over here, $0 to be in the state over here, and $100 to be in the state over here. So the planning problem is now the problem of assigning an action to each possible state, a so-called policy, so that we maximise our total reward.
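As a preview of how such a planning problem can actually be solved (the later Value Iteration sections cover this properly), here is a rough sketch. The discount factor gamma, the convergence threshold, and which state receives the $10, $0, and $100 reward are all illustrative assumptions of mine, not values from the notes.

 # Rough sketch of value iteration for an MDP with rewards attached to states:
 #   V(s) = R(s) + gamma * max_a sum_s' P(s'|s,a) * V(s')
 # gamma, eps, and the reward assignment below are illustrative assumptions.
 def value_iteration(states, actions, T, R, gamma=0.9, eps=1e-6):
     V = {s: 0.0 for s in states}
     while True:
         delta = 0.0
         for s in states:
             best = max(sum(p * V[s2] for s2, p in T[(s, a)].items())
                        for a in actions)
             new_v = R[s] + gamma * best
             delta = max(delta, abs(new_v - V[s]))
             V[s] = new_v
         if delta < eps:
             return V
 
 # The plan itself is a policy: in each state, pick the action whose
 # expected successor value is highest.
 def greedy_policy(states, actions, T, V):
     return {s: max(actions,
                    key=lambda a: sum(p * V[s2] for s2, p in T[(s, a)].items()))
             for s in states}
 
 # Same toy transition table as in the sketch above, plus the lecture's
 # rewards (which state gets which reward is assumed).
 T = {("S1", "A1"): {"S2": 0.5, "S3": 0.5}, ("S1", "A2"): {"S1": 1.0},
      ("S2", "A1"): {"S3": 1.0},            ("S2", "A2"): {"S1": 1.0},
      ("S3", "A1"): {"S3": 1.0},            ("S3", "A2"): {"S2": 1.0}}
 R = {"S1": 10.0, "S2": 0.0, "S3": 100.0}
 V = value_iteration(["S1", "S2", "S3"], ["A1", "A2"], T, R)
 print(greedy_policy(["S1", "S2", "S3"], ["A1", "A2"], T, V))
 # -> every state picks A1, i.e. heads toward the $100 state S3

The key point is that the output is not a fixed action sequence but a mapping from every state to an action, which is exactly what a stochastic environment requires.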

Robot Tour Guide Examples

Before diving into too much detail, let me explain to you why MDPs really matter. What you see here is a robotic tour guide that the University of Bonn, with my assistance, deployed in the Deutsches Museum in Bonn, and the objective of this robot was to navigate the museum and guide visitors, mostly kids, from exhibit to exhibit.

This is a challenging planning problem because as the robot moves, it can't really predict its action outcomes due to the randomness of the environment, the carpet, and the wheels of the robot. The robot is not able to follow its own commands very well, and it has to take this into consideration during the planning process, so that when it finds itself in a location it didn't expect, it knows what to do.

In the second video here, you see a successor robot that was deployed in the Smithsonian's National Museum of American History in the late 1990s, where it guided many, many thousands of kids through the entrance hall of the museum. Once again, this is a challenging planning problem. As you can see, people are often in the way of the robot, and the robot has to take detours. This one is particularly difficult because there were obstacles that were invisible to the robot, like a downward staircase. There is also a challenging localisation problem, trying to find out where you are, but that's for a later class.

In the video here, you see a robot being deployed in a nursing home with the objective to assist elderly people by guiding them around, bringing them to appointments, reminding them to take their medication, and interacting with them. This robot was active and in use for many, many years, and, again, it's a very challenging planning problem to navigate through this elderly home.

And the final robot I'm showing you here was built with my colleague Will Whittaker at Carnegie Mellon University. The objective here was to explore abandoned mines. Pennsylvania, West Virginia, and other states are heavily mined. There are many abandoned old coal mines, and for many of these mines it's unknown what the conditions are and where exactly they are. They're not really accessible to humans; they tend to have roof falls and very low oxygen levels. So we built a robot that went inside and built maps of those mines.

What all these problems have in common is that they pose really challenging planning problems. The environments are stochastic, that is, the outcomes of actions are uncertain, and the robot has to be able to react to all kinds of situations, even ones it didn't plan for.

MDP Grid World

Problems with Conventional Planning 1

Branching Factor Question

Note

For this problem (and only this problem) assume actions are stochastic in a way that is different from the one described in 4. MDP Grid World. Instead of an action North possibly going East or West, an action North will possibly go northeast or northwest (i.e. to the diagonal squares).

Likewise for the other directions, e.g. an action West will possibly go West, northwest, or southwest (i.e. to the diagonals).
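As a sketch of this modified action model, the possible successor squares of each action (which is what determines the branching factor) could be enumerated like this. The 80/10/10 split is my assumption, carried over from the standard grid world, since this note does not state probabilities, and walls are ignored here.

 # Sketch of the modified stochastic model from this note: an action succeeds
 # in its own direction or slips to the two adjacent diagonal squares.
 # The 0.8 / 0.1 / 0.1 split is an assumption; the note gives no probabilities.
 MOVES = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1)}
 DIAGONAL_SLIPS = {
     "N": [(-1, 1), (-1, -1)],   # northeast, northwest
     "S": [(1, 1), (1, -1)],     # southeast, southwest
     "E": [(-1, 1), (1, 1)],     # northeast, southeast
     "W": [(-1, -1), (1, -1)],   # northwest, southwest
 }
 
 def outcomes(cell, action):
     """Possible successor cells and probabilities, ignoring walls."""
     r, c = cell
     dr, dc = MOVES[action]
     result = [((r + dr, c + dc), 0.8)]
     for sr, sc in DIAGONAL_SLIPS[action]:
         result.append(((r + sr, c + sc), 0.1))
     return result
 
 print(outcomes((2, 2), "N"))   # three possible squares per action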

Answer

Problems with Conventional Planning 2

Policy Question 1

Note

Stochastic actions are as in 4. MDP Grid World. An action North moves North with 80% chance, otherwise East with 10% chance or West with 10% chance. Likewise for the other directions.
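A minimal sketch of this action model follows. The 4x3 layout, the blocked cell, and the convention that bumping into a wall or the grid edge leaves the robot where it is are assumptions of mine; only the 80/10/10 split is given in the note.

 # Sketch of the standard stochastic grid-world action model: 80% in the
 # intended direction, 10% to each perpendicular side. The grid layout,
 # blocked cell, and wall behaviour below are illustrative assumptions.
 ROWS, COLS = 3, 4
 BLOCKED = {(1, 1)}
 MOVES = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1)}
 SIDEWAYS = {"N": ("E", "W"), "S": ("E", "W"), "E": ("N", "S"), "W": ("N", "S")}
 
 def step(cell, direction):
     """Move one square, staying put if that square is off-grid or blocked."""
     r, c = cell
     dr, dc = MOVES[direction]
     nr, nc = r + dr, c + dc
     if 0 <= nr < ROWS and 0 <= nc < COLS and (nr, nc) not in BLOCKED:
         return (nr, nc)
     return cell
 
 def transition(cell, action):
     """Distribution over successor cells for a stochastic grid-world action."""
     dist = {}
     side_a, side_b = SIDEWAYS[action]
     for direction, p in [(action, 0.8), (side_a, 0.1), (side_b, 0.1)]:
         nxt = step(cell, direction)
         dist[nxt] = dist.get(nxt, 0.0) + p
     return dist
 
 print(transition((2, 0), "N"))   # {(1, 0): 0.8, (2, 1): 0.1, (2, 0): 0.1}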

Answer

Policy Question 2

Note

Stochastic actions are as in 4. MDP Grid World. An action North moves North with 80% chance, otherwise East with 10% chance or West with 10% chance. Likewise for the other directions.

Answer

Policy Question 3

Note

Stochastic actions are as in 4. MDP Grid World. An action North moves North with 80% chance, otherwise East with 10% chance or West with 10% chance. Likewise for the other directions.

Answer

Policy Question 4

Note

Stochastic actions are as in 4. MDP Grid World. An action North moves North with 80% chance, otherwise East with 10% chance or West with 10% chance. Likewise for the other directions.

Answer

MDP and Costs

Value Iteration 1

Value Iteration 2

Value Iteration 3

Deterministic Question 1

Answer

Deterministic Question 2

Answer

Deterministic Question 3

Answer

Stochastic Question 1

Answer

Stochastic Question 2

Answer

Value Iterations and Policy 1

Value Iterations and Policy 2

MDP Conclusion

Partial Observability Introduction

POMDP vs MDP

POMDP

Planning Under Uncertainty Conclusion