AI class unit 8


These are my notes for unit 8 of the AI class.

Planning

Introduction

{{#ev:youtubehd|dqeEg5V_IPM}}

Hi, and welcome back. This unit is about planning. We defined AI to be the study and process of finding appropriate actions for an agent. So in some sense planning is really the core of all of AI. The technique we have looked at so far is problem-solving: search over a state space using techniques like A*.

Given a state space and a problem description, we can find a solution, a path to the goal. Those approaches are great for a variety of environments, but they only work when the environment is deterministic and fully observable.

In this unit, we will see how to relax those constraints.

Problem Solving vs Planning

{{#ev:youtubehd|gZza8lZr1Oc}}

You remember how problem solving worked. We have a state space like this, and we're given a start state and a goal to reach, and then we'd search for a path to find that goal, and maybe we find this path. Now the way a problem-solving agent works is that first it does all the work of figuring out the path to the goal just by thinking, and then it starts to execute that path, to drive or walk, however you want to get there, from the start state to the end state.

But think about what would happen if you did that in real life. If you did all your planning ahead of time, you had the complete plan, and then, without interacting with the world, without sensing it at all, you started to execute that path. Well, this has in fact been studied. People have gone out and blindfolded walkers, put them in a field, and told them to walk in a straight line, and the results are not pretty. Here are the GPS tracks to prove it.

So we take a hiker, we put him at a start location, say here, and we blindfold him so that he can't see anything on the horizon, but can just see enough of his feet that he won't stumble over something, and we tell him to execute the plan of going forward: put one foot in front of the other and walk forward in a straight line. These are the typical paths we see. They start out going straight for a while, but then go into loop-de-loops and don't end up on a straight path at all. These ones over here, starting in this location, are even more convoluted. They go straight for a little bit and then go in very tight loops.

So people are incapable of walking a straight line without any feedback from the environment. Now, this yellow path here did much better, and why was that? Well, it's because the other paths were walked on overcast days, and so there was no input to make sense of, whereas this path was walked on a very sunny day, and so even though the hiker couldn't see farther than a few feet in front of him, he could see shadows and say, "As long as I keep the shadows pointing in the right direction, I can go in a relatively straight line."

So the moral is we need some feedback from the environment. We can't just plan ahead and come up with a whole plan. We've got to interleave planning and executing.

Planning vs Execution

{{#ev:youtubehd|BmqedPZZA4A}}

Now why do we have to interleave planning and execution? Mostly because of properties of the environment that make it difficult to deal with. The most important one is if the environment is stochastic. That is, if we don't know for sure what an action is going to do.

If we know what everything is going to do, we can plan it all out right from the start, but if we don't, we have to be able to deal with contingencies: say I tried to move forward and the wheels slipped and I went someplace else, or the brakes might skid, or, if we're walking, our feet don't go 100% straight. Or consider the problem of traffic lights. If the traffic light is red, then the result of the action of going forward through the intersection is bound to be different than if the traffic light is green.

Another difficulty we have to deal with is multi-agent environments. If there are other cars and people that can get in our way, we have to plan about what they're going to do, and we have to react when they do something unexpected, and we can only know that at execution time, not at planning time.

The other big problem is with partial observability. Suppose we've come up with a plan to go from A to S to F to B. That plan looks like it will work, but we know that at S, the road to F is sometimes closed, and there will be a sign there telling us whether it's closed or not, but when we start off we can't read that sign. So that's partial observability.

Another way to look at it is when we start off we don't know what state we're in. We know we're in A, but we don't know if we're in A in the state where the road is closed or if we're in A in the state where the road is open, and it's not until we get to S that we discover what state we're actually in, and then we know if we can continue along that route or if we have to take a detour south.

Now, in addition to these properties of the environment, we can also have difficulty because of lack of knowledge on our own part. If our model of the world is unknown, if, for example, we have a map or GPS software that's inaccurate or incomplete, then we won't be able to execute a straight-line plan.

And similarly, we often have to deal with the case where plans are hierarchical. A plan like this is at a very high level: we can't really execute the action of going from A to S when we're in a car. All the actions that we can actually execute are things like turning the steering wheel a little bit to the right or pressing on the pedal a little bit more. Those are the low-level steps of the plan, but they aren't sketched out in detail when we start; we only have the high-level parts of the plan, and it's during execution that we schedule the rest of the low-level parts of the plan.

Now most of these difficulties can be addressed by changing our point of view. Instead of planning in the space of world states, we plan in the space of belief states. To understand that let's look at a state.

Vacuum Cleaner Example

{{#ev:youtubehd|lb1bEUa9WXg}}

Here's a state space diagram for a simple problem. It involves a room with two locations: the left we call A, and the right we call B. In that environment there's a vacuum cleaner, and there may or may not be dirt in either of the two locations, so that gives us eight total states: dirt is here or not, here or not, and the vacuum cleaner is here or here. So that's two times two times two, which is eight possible states.

I've drawn here the state space diagram with all the transitions for the three possible actions. The actions are moving right, so we'd go from this state to this state; moving left, so we'd go from this state to this state; and sucking up dirt, so we'd go from this state to this state, for example.

In this state space diagram, if we have a fully deterministic, fully observable world, it's easy to plan. Say we start in this state and we want to end up in a goal state where both sides are clean. We can execute the suck-dirt action and get here, then move right, and then suck dirt again, and now we end up in a goal state where everything is clean.
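
To make this concrete, here is a minimal sketch of the two-square vacuum world in Python. The state encoding, the names, and the result() function are my own illustration of the model described above, not code from the class.

<syntaxhighlight lang="python">
from collections import namedtuple

# State = (vacuum location, dirt in A?, dirt in B?); loc is "A" or "B".
State = namedtuple("State", ["loc", "dirt_a", "dirt_b"])

ALL_STATES = [State(loc, da, db)
              for loc in ("A", "B")
              for da in (True, False)
              for db in (True, False)]  # 2 x 2 x 2 = 8 states

def result(state, action):
    """Deterministic transition model for Left, Right, and Suck."""
    if action == "Left":
        return State("A", state.dirt_a, state.dirt_b)
    if action == "Right":
        return State("B", state.dirt_a, state.dirt_b)
    if action == "Suck":  # clean the square the vacuum is currently in
        if state.loc == "A":
            return State("A", False, state.dirt_b)
        return State("B", state.dirt_a, False)
    raise ValueError(action)

# With a deterministic, fully observable world, a fixed plan works:
# from "in A, both squares dirty", Suck, Right, Suck reaches the all-clean goal.
s = State("A", True, True)
for a in ("Suck", "Right", "Suck"):
    s = result(s, a)
print(s)  # State(loc='B', dirt_a=False, dirt_b=False)
</syntaxhighlight>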

Now suppose our robot vacuum cleaner's sensors break down, and so the robot can no longer perceive either which location it's in or whether there's any dirt. So we now have an unobservable or sensor-less world rather than a fully observable one, and how does the agent then represent the state of the world? Well it could be in any one of these eight states, and so all we can do to represent the current state is draw a big circle or box around everything, and say, "I know I'm somewhere inside here."

Now that doesn't seem like it helps very much. What good is it to know that we don't really know anything at all? But the point is that we can search in the space of belief states rather than in the space of actual world states. So we believe that we're in one of these eight states, and now when we execute an action, we're going to get to another belief state. Let's take a look at how that works.
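
As a sketch of what that means, a belief state can be represented simply as a set of ordinary world states, and a deterministic action maps a belief state to the set of resulting states. This reuses the hypothetical State, ALL_STATES, and result() from the sketch above.

<syntaxhighlight lang="python">
def predict(belief, action):
    """Apply a deterministic action to every state in a belief state."""
    return frozenset(result(s, action) for s in belief)

initial_belief = frozenset(ALL_STATES)  # total ignorance: any of the 8 states
print(len(initial_belief))  # 8
</syntaxhighlight>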

Sensorless Vacuum Cleaner Problem

{{#ev:youtubehd|hyjwEymVfL4}}

This is the belief state space for the sensor-less vacuum problem. So we started off here. We drew the circle around this belief state. So we don't know anything about where we are, but the amazing thing is, if we execute actions we can gain knowledge about the world even without sensing.

So let's say we move right, then we'll know we're in the right-hand location. Either we were in the left, and we moved right, and arrived there, or we were in the right to begin with, and we bumped against the wall and stayed there. So now we end up in this state. We now know more about the world. We're down to four possibilities rather than eight, even though we haven't observed anything.

And now note something interesting, that in the real world, the operations of going left and going right are inverses of each other, but in the belief state world going right and going left are not inverses. If we go right, and then we go left, we don't end up back where we were in a state of total uncertainty, rather going left takes us over here where we still know we're in one of four states rather than in one of eight states.
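
Using the predict() sketch above, we can check both claims: moving right shrinks the belief state from eight states to four, and moving left afterwards does not bring the uncertainty back.

<syntaxhighlight lang="python">
after_right = predict(initial_belief, "Right")
print(len(after_right))      # 4: the location is now known to be B
after_left = predict(after_right, "Left")
print(len(after_left))       # 4, not 8: Left does not undo Right in belief space
</syntaxhighlight>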

Note that it's possible to form a plan that reaches a goal without ever observing the world. Plans like that are called conformant plans. For example, if the goal is to be in a clean location, all we have to do is suck. So we go from one of these eight states to one of these four states, and in every one of those four we're in a clean location. We don't know which of the four we're in, but we know we've achieved the goal.
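
As a quick check of that one-step conformant plan, again reusing the earlier sketch:

<syntaxhighlight lang="python">
def current_square_clean(s):
    """True if the square the vacuum is in has no dirt."""
    return not (s.dirt_a if s.loc == "A" else s.dirt_b)

after_suck = predict(initial_belief, "Suck")
print(len(after_suck))                                     # 4
print(all(current_square_clean(s) for s in after_suck))    # True: goal achieved
</syntaxhighlight>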

It's also possible to arrive at a completely known state. For example, if we start here, we go left and suck up the dirt there, then go right and suck up the dirt there, and now we're down to a belief state consisting of one single state; that is, we know exactly where we are.
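
Reusing the same sketch, the sequence left, suck, right, suck collapses the belief state to a single fully known state, even if we start from total uncertainty (the lecture starts from a particular node in its diagram, but the same idea holds):

<syntaxhighlight lang="python">
b = initial_belief
for a in ("Left", "Suck", "Right", "Suck"):
    b = predict(b, a)
print(b)  # frozenset({State(loc='B', dirt_a=False, dirt_b=False)})
</syntaxhighlight>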

Here's a question for you: how do I get from the belief state where I know my current square is clean but know nothing else, to the belief state where I know that I'm in the right-hand location and that that location is clean? What I want you to do is click on the sequence of actions (left, right, or suck) that will take you from that start to that goal.