AI class unit 8


These are my notes for unit 8 of the AI class.

Planning

Introduction

{{#ev:youtubehd|dqeEg5V_IPM}}

Hi, and welcome back. This unit is about planning. We defined AI to be the study and process of finding appropriate actions for an agent, so in some sense planning is really the core of all of AI. The technique we have looked at so far is problem-solving search over a state space, using algorithms like A*.

Given a state space and a problem description, we can find a solution, a path to the goal. Those approaches are great for a variety of environments, but they only work when the environment is deterministic and fully observable.

In this unit, we will see how to relax those constraints.

Problem Solving vs Planning

{{#ev:youtubehd|gZza8lZr1Oc}}

You remember how problem solving worked. We have a state space like this, we're given a start state and a goal to reach, and then we search for a path to that goal, and maybe we find this path. Now, the way a problem-solving agent works is that first it does all the work of figuring out the path to the goal just by thinking, and then it starts to execute that path -- to drive or walk, however you want to get there -- from the start state to the end state.

But think about what would happen if you did that in real life. If you did all your planning ahead of time, you had the complete plan, and then, without interacting with the world, without sensing it at all, you started to execute that path. Well, this has in fact been studied. People have gone out and blindfolded walkers, put them in a field, and told them to walk in a straight line, and the results are not pretty. Here are the GPS tracks to prove it.

So we take a hiker, we put him at a start location, say here, and we blindfold him so that he can't see anything on the horizon. He has just enough vision to see his feet so that he won't stumble over something, and we tell him to execute the plan of going forward: put one foot in front of the other and walk forward in a straight line. These are the typical paths we see. They start out going straight for a while, but then go into loop-de-loops and don't end up on a straight path at all. These ones over here, starting at this location, are even more convoluted: they go straight for a little bit and then go into very tight loops.

So people are incapable of walking in a straight line without any feedback from the environment. Now, this yellow path here did much better, and why is that? Well, it's because the other paths were recorded on overcast days, so there was no input from the sun to go by, whereas this path was recorded on a very sunny day. Even though the hiker couldn't see farther than a few feet in front of him, he could see shadows and say, "As long as I keep the shadows pointing in the right direction, I can go in a relatively straight line."

So the moral is we need some feedback from the environment. We can't just plan ahead and come up with a whole plan. We've got to interleave planning and executing.

Planning vs Execution

{{#ev:youtubehd|BmqedPZZA4A}}

Now why do we have to interleave planning and execution? Mostly because of properties of the environment that make it difficult to deal with. The most important one is if the environment is stochastic. That is, if we don't know for sure what an action is going to do.

If we know what everything is going to do, we can plan it all out right from the start, but if we don't, we have to be able to deal with contingencies: say I try to move forward and the wheels slip and I end up someplace else, or the brakes skid, or, if we're walking, our feet don't go 100% straight. Or consider the problem of traffic lights. If the traffic light is red, then the result of the action of going forward through the intersection is bound to be different than if the traffic light is green.

Another difficulty we have to deal with is multi-agent environments. If there are other cars and people that can get in our way, we have to plan about what they're going to do, and we have to react when they do something unexpected, and we can only know that at execution time, not at planning time.

The other big problem is with partial observability. Suppose we've come up with a plan to go from A to S to F to B. That plan looks like it will work, but we know that at S, the road to F is sometimes closed, and there will be a sign there telling us whether it's closed or not, but when we start off we can't read that sign. So that's partial observability.

Another way to look at it is when we start off we don't know what state we're in. We know we're in A, but we don't know if we're in A in the state where the road is closed or if we're in A in the state where the road is open, and it's not until we get to S that we discover what state we're actually in, and then we know if we can continue along that route or if we have to take a detour south.

Now, in addition to these properties of the environment, we can also have difficulty because of lack of knowledge on our own part. If our model of the world is unknown -- for example, if we have a map or GPS software that's inaccurate or incomplete -- then we won't be able to execute a straight-line plan.

And similarly, often we want to deal with a case where the plans have to be hierarchical. And certainly a plan like this is at a very high level. We can't really execute the action of going from A to S when we're in a car. All the actions that we can actually execute are things like turn the steering wheel a little bit to the right, press on the pedal a little bit more. So those are the low-level steps of the plan, but those aren't sketched out in detail when we start, when we only have the high-level parts of the plan, and then it's during execution that we schedule the rest of the low-level parts of the plan.

Now most of these difficulties can be addressed by changing our point of view. Instead of planning in the space of world states, we plan in the space of belief states. To understand that let's look at a state.

Vacuum Cleaner Example

{{#ev:youtubehd|lb1bEUa9WXg}}

Here's a state space diagram for a simple problem. It involves a room with two locations: the left we call A, and the right we call B. In that environment there's a vacuum cleaner, and there may or may not be dirt in each of the two locations, so that gives us eight total states: dirt is here or not, here or not, and the vacuum cleaner is here or here, and two times two times two is eight possible states. I've drawn here the state space diagram with all the transitions for the three possible actions. The actions are moving right, so we'd go from this state to this state; moving left, so we'd go from this state to this state; and sucking up dirt, so we'd go from this state to this state, for example. In this state space diagram, if we have a fully deterministic, fully observable world, it's easy to plan. Say we start in this state and we want to end up in a goal state where both sides are clean: we can execute the suck-dirt action to get here, then move right, then suck dirt again, and now we end up in a goal state where everything is clean.
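
To make this concrete for myself, here's a small Python sketch of the two-location vacuum world. The state encoding (a (location, dirt_in_A, dirt_in_B) tuple) and names like result and goal_test are my own, not from the lecture:

<pre>
# The two-location vacuum world: a state is (location, dirt_in_A, dirt_in_B),
# giving 2 * 2 * 2 = 8 possible states.
ACTIONS = ('Left', 'Right', 'Suck')

ALL_STATES = {(loc, dirt_a, dirt_b)
              for loc in ('A', 'B')
              for dirt_a in (True, False)
              for dirt_b in (True, False)}

def result(state, action):
    """Deterministic transition model: the state reached by taking action in state."""
    loc, dirt_a, dirt_b = state
    if action == 'Left':
        return ('A', dirt_a, dirt_b)   # moving left from A just bumps the wall
    if action == 'Right':
        return ('B', dirt_a, dirt_b)
    if action == 'Suck':               # sucking cleans the current location
        return ('A', False, dirt_b) if loc == 'A' else ('B', dirt_a, False)
    raise ValueError(action)

def goal_test(state):
    """Goal: both locations are clean."""
    return not state[1] and not state[2]

# Fully observable, deterministic case: start in A with dirt in both squares
# and execute the plan Suck, Right, Suck.
state = ('A', True, True)
for action in ('Suck', 'Right', 'Suck'):
    state = result(state, action)
print(state, goal_test(state))   # ('B', False, False) True
</pre>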

Now suppose our robot vacuum cleaner's sensors break down, and so the robot can no longer perceive either which location it's in or whether there's any dirt. So we now have an unobservable or sensor-less world rather than a fully observable one, and how does the agent then represent the state of the world? Well it could be in any one of these eight states, and so all we can do to represent the current state is draw a big circle or box around everything, and say, "I know I'm somewhere inside here."

Now, that doesn't seem like it helps very much. What good is it to know that we don't really know anything at all? But the point is that we can search in the space of belief states rather than in the space of actual world states. So we believe that we're in one of these eight states, and when we execute an action, we're going to get to another belief state. Let's take a look at how that works.

Sensorless Vacuum Cleaner Problem

{{#ev:youtubehd|hyjwEymVfL4}}

This is the belief state space for the sensor-less vacuum problem. So we started off here. We drew the circle around this belief state. So we don't know anything about where we are, but the amazing thing is, if we execute actions we can gain knowledge about the world even without sensing.

So let's say we move right, then we'll know we're in the right-hand location. Either we were in the left, and we moved right, and arrived there, or we were in the right to begin with, and we bumped against the wall and stayed there. So now we end up in this state. We now know more about the world. We're down to four possibilities rather than eight, even though we haven't observed anything.

And now note something interesting, that in the real world, the operations of going left and going right are inverses of each other, but in the belief state world going right and going left are not inverses. If we go right, and then we go left, we don't end up back where we were in a state of total uncertainty, rather going left takes us over here where we still know we're in one of four states rather than in one of eight states.

Note that it's possible to form a plan that reaches a goal without ever observing the world. Plans like that are called conformant plans. For example, if the goal is to be in a clean location, all we have to do is suck. So we go from one of these eight states to one of these four states, and in every one of those four we're in a clean location. We don't know which of the four we're in, but we know we've achieved the goal.

It's also possible to arrive at a completely known state. For example, if we start here, we go left; we suck up the dirt there. We go right and suck up the dirt, now we're down to a belief state consisting of one single state that is we know exactly where we are.
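
Here's how I'd sketch the sensorless version in code, reusing result and ALL_STATES from the sketch above. A belief state is just a Python set of possible world states:

<pre>
def predict(belief, action):
    """Belief-state successor under a deterministic action (no sensing at all)."""
    return {result(s, action) for s in belief}

b0 = set(ALL_STATES)              # total uncertainty: all 8 states
b1 = predict(b0, 'Right')         # now we know the vacuum is in B: 4 states
b2 = predict(b1, 'Left')          # NOT back to 8 -- still 4 states, vacuum in A
print(len(b0), len(b1), len(b2))  # 8 4 4

# A conformant plan that ends in a single, fully known state without sensing:
b = set(ALL_STATES)
for action in ('Left', 'Suck', 'Right', 'Suck'):
    b = predict(b, action)
print(b)                          # {('B', False, False)}
</pre>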

Here's a question for you: how do I get from the state where I know my current square is clean, but know nothing else, to the belief state where I know that I'm in the right-hand location and that that location is clean? What I want you to do is click on the sequence of actions -- left, right, or suck -- that will take you from that start to that goal.

{{#ev:youtubehd|Bxd-j9s82Z8}}

And the answer is that the state of knowing that your current square is clean corresponds to this state. This belief state with four possible world states, and if I then execute the right action, followed by the suck action, then I end up in this belief state, and that satisfies the goal. I know I'm in the right-hand-side location and I know that location is clean.
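
Checking the quiz answer with the same sketch (again, the tuple encoding is mine, not the lecture's):

<pre>
# "My current square is clean, but I know nothing else" as a belief state:
b = {('A', False, True), ('A', False, False),
     ('B', True, False), ('B', False, False)}
for action in ('Right', 'Suck'):
    b = predict(b, action)
print(b)   # every remaining state has the vacuum in B, with B clean
</pre>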

Partially Observable Vacuum Cleaner Example

{{#ev:youtubehd|Bxd-j9s82Z8}}

We've been considering sensor-less planning in a deterministic world. Now I want to turn our attention to partially observable planning but still in a deterministic world.

Suppose we have what's called local sensing: our vacuum can see what location it is in and whether there's dirt in that current location, but it can't see anything about whether there's dirt in the other location.

So here's a partial diagram -- part of the belief state space for that world -- and I want to show how the belief state unfolds as two things happen. First, we take an action: we start in this belief state and take the action of going right, and in this case we go from two world states in our belief state to two new ones. But then, after we do an action, we do an observation -- that's the act-percept cycle -- and once we get the observation we can split our belief state. If we observe that we're in location B and it's dirty, then we know we're in this belief state here, which happens to contain exactly one world state, and if we observe that it's clean, then we know we're in this belief state, which also contains exactly one world state.

Now, what does the act-observe cycle do to the sizes of the belief states? Well, in a deterministic world each of the individual world states within a belief state maps into exactly one other world state -- that's what we mean by deterministic -- and so the size of the belief state will either stay the same, or it might decrease if two of the world states happen to map to the same successor.

On the other hand, the observation works in kind of the opposite way. When we observe the world, we take the current belief state and partition it up into pieces. Observations alone can't introduce a new world state into the belief state; all they can do is say, "Some of them go here, and some of them go there." Now, it may be that for some observation all the world states go into one bin, so we make an observation and don't learn anything new, but at least the observation can't make us more confused than we were before the observation.
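
In code, a percept under local sensing and the update/partition it induces might look like this (continuing my sketch; percept, update, and partition are my own names):

<pre>
def percept(state):
    """Local sensing: we see our location and whether it is dirty, nothing more."""
    loc, dirt_a, dirt_b = state
    return (loc, dirt_a if loc == 'A' else dirt_b)

def update(belief, observation):
    """Keep only the world states consistent with what we observed."""
    return {s for s in belief if percept(s) == observation}

def partition(belief):
    """Split a belief state into one piece per possible percept."""
    pieces = {}
    for s in belief:
        pieces.setdefault(percept(s), set()).add(s)
    return pieces

# After moving right from the two-state belief in the diagram:
b = {('B', False, True), ('B', False, False)}
print(update(b, ('B', True)))   # {('B', False, True)}: we saw dirt here
print(partition(b))             # every piece is a subset of b -- observations
                                # never make the belief state larger
</pre>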

Stochastic Environment Problem

{{#ev:youtubehd|Rgn1RW0fcec}}

Now let's move on to stochastic environments. Let's consider a robot that has slippery wheels so that sometimes when you make a movement -- a left or a right action -- the wheels slip and you stay in the same location. And sometimes they work and you arrive where you expected to go. And let's assume that the suck action always works perfectly.

Now we get a belief state space that looks something like this. Notice that an action will often result in a belief state that's larger than it was before -- that is, the action increases uncertainty, because we don't know what its result is going to be. So here, for each of the individual world states belonging to a belief state, we have multiple possible outcomes of the action -- that's what stochastic means -- and so we end up with a larger belief state.

But in terms of the observation, the same thing holds as in the deterministic world. The observation partitions the belief state into smaller belief states. So in a stochastic partially observable environment, the actions tend to increase uncertainty, and the observations tend to bring that uncertainty back down.
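
Here's the slippery-wheels version of the prediction step in my sketch (results is my name for the set-valued transition model; it's an assumption about how the slipping works, matching the lecture's description):

<pre>
def results(state, action):
    """Stochastic transition model: the set of states the action might lead to.
    Left/Right sometimes slip and leave us in place; Suck always works."""
    if action in ('Left', 'Right'):
        return {result(state, action), state}
    return {result(state, action)}

def predict_stochastic(belief, action):
    """Union of the possible outcomes of every member of the belief state."""
    return {s2 for s in belief for s2 in results(s, action)}

b = {('A', False, True)}             # we know exactly which state we're in...
b = predict_stochastic(b, 'Right')   # ...but one move makes us uncertain again
print(b)   # {('A', False, True), ('B', False, True)} -- the belief state grew
</pre>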

Now, how would we do planning in this type of environment? Well I haven't told you yet, so you won't know the answer for sure, but I want you to try to figure it out anyway, even if you might get the answer wrong. So imagine I had the whole belief state from which I've diagrammed just a little bit here and I wanted to know how to get from this belief state to one in which all squares are clean.

So I'm going to give you some possible plans, and I want you to tell me whether you think each of these plans will always work, or maybe sometimes work depending on how the stochasticity works out. And so here are the possible plans.

Remember, I'm starting here, and I want to know how to get to a belief state in which all the squares are clean. One possibility is suck, right, suck; one is right, suck, left, suck; one is suck, right, right, suck; and the other is suck, right, suck, right, suck. Some of these actions might take you out of this little belief state here, but just use what you know from the previous definition of the state space, the results of each of those actions, and the fact that the right and left actions are nondeterministic, and tell me which of these you think will always achieve the goal or will maybe achieve the goal.

And then I want you to also answer for the fill-in-the-blank plan -- that is, is there some plan, some ideal plan, which always or maybe achieves the goal?

{{#ev:youtubehd|uBBfv13Ky4I}}

And the answer is that any plan that would work in the deterministic world might work in the stochastic world if everything works out okay, and all of these plans meet that criterion. But no finite plan is guaranteed to always work, because a successful plan has to include at least one move action, and if we try a move action a finite number of times, each of those times the wheels might slip and we won't move, so we can never be guaranteed to achieve the goal with a finite sequence of actions. Now, what about an infinite sequence of actions? Well, we can't represent that in the language we have so far, where a plan is a linear sequence, but we can introduce a new notion of plans in which we do allow infinite sequences.

Infinite Sequences

{{#ev:youtubehd|h3_8OLOhtG4}}

In this new notation, instead of writing plans as a linear sequence of, say, suck, move right, and suck; I'm going to write them as a tree structure. So we start off in this belief state here, which we'll diagram like this. And then we do a suck action. We end up in a new state. And then we do a right action, and now we have to observe the world, and if we observe that we're still in state A, we loop back to this part of the plan. And if we observe that we're in B, we go on and then execute the suck action. And now we're at the end of the plan.

So, we see that there's a choice point here, which we indicate with this sort of tie to say we're following a straight line, but now we can branch. There's a conditional, and we can either loop back or continue on, so this finite representation can stand for an unbounded, infinite sequence of actions. And we could write it in a more linear notation as: S, while we observe A do R, then S.

Now, what can we say about this plan? Does this plan achieve the goal? Well, what we can say is that if the stochasticity is independent, that is, if sometimes it works and sometimes it doesn't, then with probability 1 in the limit, this plan will, in fact, achieve the goal, but we can't state any bounded number of steps under which it's guaranteed to achieve the goal. We can only say that it's guaranteed at infinity.
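
Here's how I'd simulate executing that looping plan [S, while A: R, S] against a slippery world. The slippery_result function and the 0.3 slip probability are just my assumptions for the simulation, not numbers from the class:

<pre>
import random

def slippery_result(state, action, slip_prob=0.3):
    """One sampled outcome: moves sometimes slip and leave us where we were."""
    if action in ('Left', 'Right') and random.random() < slip_prob:
        return state
    return result(state, action)

# Execute: Suck, then keep trying Right while we still observe location A, then Suck.
state = ('A', True, True)
state = slippery_result(state, 'Suck')
while state[0] == 'A':               # the observation here is just our location
    state = slippery_result(state, 'Right')
state = slippery_result(state, 'Suck')
print(state)   # ('B', False, False): clean everywhere, but the number of steps
               # taken is unbounded -- termination only has probability 1
</pre>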

Finding a Successful Plan

{{#ev:youtubehd|ffGFIoOhN3U}}

Now, I've told you what a successful plan looks like, but I haven't told you how to find one. The process of finding it can be done through search just as we did in problem solving.

So remember in problem solving we start off in a state, and it's a single state, not a belief state. And then we start searching a tree, and we have a big triangle of possible states that we search through, and then we find one path that gets us all the way to a goal state. And we pick from this big tree a single path.

So, with belief states and with branching plan structures, we do the same sort of process, only the tree is a little bit more complicated. Here we show one of these trees, and it has different kinds of branches. For example, we start off here, and one possibility is that the first action is going right, while another possibility is that the first action is a suck; those are choices in the search space. But the tree also has branches that are part of the plan itself. This branch here is actually part of the plan, as we saw before -- it's not a branch in the search space, it's a branch in the plan. So what we do is search through this tree. We try right as a first action, and we keep expanding nodes until we find a portion of the tree -- just as this path was a portion of the problem-solving search tree -- that constitutes a successful plan according to the criterion of reaching the goal.
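
Finally, here's a rough sketch of that search in code, building on the helpers above. It's essentially AND-OR search over the slippery vacuum world: OR nodes are the agent's choice of action, AND nodes must handle every possible outcome, and an outcome that repeats a state already on the path is labelled 'LOOP', meaning "go back and retry from that earlier point in the plan." This simplified loop rule is enough for this tiny world, but it is not the fully general algorithm:

<pre>
def plan_search(state, path=()):
    """Return a branching plan [action, {outcome: subplan}], [] at the goal,
    'LOOP' for a retry point, or None if no plan exists from this state."""
    if goal_test(state):
        return []
    if state in path:
        return 'LOOP'
    for action in ACTIONS:                              # OR: pick one action
        subplans = {s2: plan_search(s2, path + (state,))
                    for s2 in results(state, action)}   # AND: handle every outcome
        if all(p is not None for p in subplans.values()) \
           and any(p != 'LOOP' for p in subplans.values()):
            return [action, subplans]                   # some branch makes progress
    return None

print(plan_search(('A', True, True)))
# e.g. ['Right', {('A', True, True): 'LOOP',
#                 ('B', True, True): ['Suck', ...]}]
# i.e. try Right until it works, suck, try Left until it works, suck.
</pre>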