

Typically the agent does NOT "know" the full search space.
Here, "know" means "has available an explicit representation of".
Instead the agent knows the initial state, and it knows operators. An operator is a function which "expands" a node.
"Expanding" a node means computing the node that the agent could move to using the operator.
With this available knowledge, the general search algorithm that the agent can use is:

Imagine using a map to plan your trip from Arad to Bucharest. Start with your finger on Arad.

Arad is not a goal node so expand it to find all its successors: Sibiu, Timisoara, and Zerind.
Now make a choice of one of these to investigate next, say Sibiu.
Sibiu is not a goal node so expand it to get its successors: Arad, Fagaras, Oradea, and Rimnicu Vilcea.
Now make a choice of any one of the current leaf nodes to expand next, i.e. one of Arad, Fagaras, Oradea, Rimnicu Vilcea, Timisoara, and Zerind.
And so on, until the node you choose is a goal node.

The general search algorithm explores a search tree, NOT the search space directly.
Notice that expanding the node Sibiu gives Arad, which has already been visited.
In general the search algorithm does not "know" this.
In some domains it is easy to test whether multiple nodes correspond to the same state. In other domains it is impossible.
Consider the 8-queens problem. If we define states to be "any arrangement of 8 queens on the board", then it becomes very difficult to determine if we have already visited a state during search because there are 64^8 = 2^48 possible states.
We could try to keep track of the states we have already visited (say, with a hash table), but this would require up to 2^48 entries!
But this problem can be handled in another way--by making the operators avoid states that have already been visited. For example, we could design an operator for the 8-queens problem that always places a queen in the leftmost unoccupied column. This ensures that no state is ever revisited, without any risk of missing a possible solution. (Why?)
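The leftmost-unoccupied-column operator can be sketched in a few lines. This is only an illustration under assumptions not stated in the text: a state is taken to be a tuple of row indices, one per filled column, and the name expand is hypothetical.

```python
N = 8  # board size for the 8-queens problem

def expand(state):
    """Place one queen in the leftmost unoccupied column.

    Assumed representation: state[c] is the row of the queen in column c,
    so len(state) columns are already filled, left to right.
    """
    column = len(state)
    if column == N:
        return []  # board is full, nothing left to place
    # One successor per row of the next column; no attack checking here.
    return [state + (row,) for row in range(N)]

# All states two operator applications away from the empty board.
depth_two = [s2 for s1 in expand(()) for s2 in expand(s1)]
```

In this sketch the 64 states in depth_two are all distinct, which is the property the text asks the reader to explain.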
What is called a "representation" in AI is often called a data structure elsewhere in computer science.
We will represent a node as a list (technically a tuple) with the following components:
The depth of a node is the number of operator applications (edges) on the path from the root to this node, so the root has depth 0.
The fringe of the tree is the set of leaf nodes of the tree. A leaf node is a node that is NOT the parent of any other node.
The search algorithm needs an efficient representation of the fringe. We shall use a queue, i.e. a special list where you always remove elements from the front, and where new elements are inserted at a position (front, end, or somewhere in between) chosen by the queuing function.
The path cost of a node is the cost of the path from the root to this node. This is normally the sum of the costs of the operators used along the path to the node.
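One possible way to hold these quantities in a tuple is sketched below. The field names (state, parent, operator, depth, path_cost) and the helper functions are assumptions made for illustration, and the step cost of 140 is an arbitrary example value.

```python
from collections import namedtuple

# Hypothetical node layout; the field names are assumptions of this sketch.
Node = namedtuple("Node", ["state", "parent", "operator", "depth", "path_cost"])

def make_root(state):
    # The root has no parent; it is counted as depth 0 with zero path cost.
    return Node(state, None, None, 0, 0)

def make_child(parent, operator, state, step_cost):
    # Depth grows by one per operator; path cost is the sum of step costs.
    return Node(state, parent, operator,
                parent.depth + 1, parent.path_cost + step_cost)

def path_to(node):
    # Follow parent links back to the root, then reverse.
    states = []
    while node is not None:
        states.append(node.state)
        node = node.parent
    return states[::-1]

root = make_root("Arad")
child = make_child(root, "drive", "Sibiu", 140)
```

Storing the parent link in each node is what lets the solution path be recovered once a goal node is removed from the queue.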

The variable nodes is the queue.
The well-formulated search problem includes the following functions:
The key point is that we can change the behavior of the search algorithm by changing the internals of the queuing function.
Different algorithms use different "insert" functions.
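As a rough sketch of this idea, the loop below keeps the queue in a variable called nodes and takes the "insert" function as a parameter. The function names, the use of paths as queue entries, and the small tree TREE are all assumptions made for illustration.

```python
from collections import deque

def enqueue_at_end(nodes, new_nodes):
    nodes.extend(new_nodes)                 # new nodes wait behind older ones

def enqueue_at_front(nodes, new_nodes):
    nodes.extendleft(reversed(new_nodes))   # new nodes go first, order kept

def general_search(start, is_goal, successors, insert):
    nodes = deque([[start]])   # each queue entry is a path from the root
    order = []                 # states in the order they leave the queue
    while nodes:
        path = nodes.popleft()             # always remove from the front
        state = path[-1]
        order.append(state)
        if is_goal(state):
            return path, order
        insert(nodes, [path + [s] for s in successors(state)])
    return None, order

# Hypothetical tree: A has children B and C, and so on.
TREE = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F", "G"],
        "D": [], "E": [], "F": [], "G": []}

bfs_path, bfs_order = general_search(
    "A", lambda s: s == "F", lambda s: TREE[s], enqueue_at_end)
dfs_path, dfs_order = general_search(
    "A", lambda s: s == "F", lambda s: TREE[s], enqueue_at_front)
```

Both calls find the same path here, but the order in which nodes leave the queue differs, and only the insert function was changed. Like the algorithm in the text, this sketch searches a tree and makes no attempt to detect repeated states.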
We can evaluate algorithms according to:
If a search strategy has no idea of the path cost or search cost from the current node to the goal, it is referred to as an uninformed or blind strategy.
A strategy which uses such knowledge is called informed or heuristic search.
Blind search is less effective than heuristic search, but it forms the basis of many heuristic search techniques.
BFS is the general search algorithm where the "insert" function is "enqueue-at-end". This means that newly generated nodes are added to the fringe at the end, so they are expanded last.
BFS first considers all paths of length 1, then all paths of length 2, and so on. This is why it is called "breadth-first".
The picture below shows intuitively how BFS works.

Consider a search tree where expanding a node always gives exactly b children, where the "branching factor" b >= 2.
So there is 1 root node
b nodes at depth 1
b*b nodes at depth 2
b*b*b nodes at depth 3 and so on.
The number of nodes at depth d or less is N = 1 + b + b^2 + ... + b^d.
Somewhat surprisingly, N = O(b^d).
Intuitively, almost all the nodes are at the deepest level, and the number of shallower nodes is negligible.
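The bound can be checked with the usual geometric-series formula. Using the same b and d as above, and the assumption b >= 2:

```latex
N \;=\; \sum_{i=0}^{d} b^{i}
  \;=\; \frac{b^{d+1}-1}{b-1}
  \;<\; \frac{b}{b-1}\, b^{d}
  \;\le\; 2\, b^{d}
  \qquad (b \ge 2).
```

So the whole tree down to depth d holds fewer than twice as many nodes as its deepest level alone, which is why N = O(b^d).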
For a problem with branching factor b where the first solution is at depth d, the time complexity of BFS is O(b^d).
The space complexity of BFS is also O(b^d).
The big problem with breadth-first search is that it uses about as much space as it uses time. Any real computer will run out of space before it runs out of time, at this rate.
DFS is the general search algorithm where the "insert" function is "enqueue-at-front". This means that newly generated nodes are added to the fringe at the beginning, so they are expanded immediately.
DFS goes down a path until it reaches a node that has no children. Then DFS "backtracks" and expands a sibling of the node that had no children. If this node has no siblings, then DFS looks for a sibling of the grandparent, and so on.
See the picture below for an illustration of DFS.

No matter how deep the current node is, DFS will always go deeper if it has a child.
The major weakness of DFS is that it will fail to terminate if there is an infinite path "to the left of" the path to the first solution.
In other words, for many problems DFS is not complete: a solution exists but DFS cannot find it.
The major advantage of DFS is that it only uses O(bm) space if the branching factor is b and the maximum depth is m.
(Explanation: there are m nodes on the longest path, and for each of these b-1 siblings must be stored.)
Iterative deepening is a very simple, very good, but counter-intuitive idea that was not discovered until the mid 1970s. Then it was invented by many people simultaneously.
The idea is to do depth-limited DFS repeatedly, with an increasing depth limit, until a solution is found.
Intuitively, this is a dubious idea because each repetition of depth-limited DFS will repeat uselessly all the work done by previous repetitions.
But this repetition is not significant, because with a branching factor b > 1 the deepest level dominates the count:
# nodes at depth k > (b-1) * (# nodes at depth k-1 or less)
Iterative deepening simulates BFS with linear space complexity.
For a problem with branching factor b where the first solution is at depth d, the time complexity of iterative deepening is O(b^d), and its space complexity is O(bd).
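A minimal sketch of iterative deepening follows. The function names, the max_limit safety cap, and the example tree (an infinite binary tree over the integers where node n has children 2n and 2n+1) are assumptions chosen for illustration.

```python
def depth_limited(state, is_goal, successors, limit, path=None):
    """Depth-first search that never goes more than `limit` steps down."""
    path = [state] if path is None else path
    if is_goal(state):
        return path
    if limit == 0:
        return None  # cut off: do not expand below the depth limit
    for child in successors(state):
        found = depth_limited(child, is_goal, successors,
                              limit - 1, path + [child])
        if found is not None:
            return found
    return None

def iterative_deepening(start, is_goal, successors, max_limit=50):
    # Repeat depth-limited DFS with limits 0, 1, 2, ... until success.
    for limit in range(max_limit + 1):
        found = depth_limited(start, is_goal, successors, limit)
        if found is not None:
            return found, limit
    return None, None

# Hypothetical infinite tree: node n has children 2n and 2n + 1.
result = iterative_deepening(1, lambda n: n == 11,
                             lambda n: [2 * n, 2 * n + 1])
```

On this tree plain DFS would follow 1, 2, 4, 8, ... forever, an infinite path to the left of the solution, while the depth limit forces each pass to terminate. Only the current path is held in memory at any time.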
"Knowledge is power." How can we use knowledge to improve our search algorithm?
One common type of knowledge is a scoring function that estimates how good a node is as the next node to expand.
Note the word "estimates". If the scoring function were perfect, then we wouldn't need any search at all.
The word "heuristic" means trial-and-error, exploratory, unguaranteed, rule of thumb.
The heuristic best-first search algorithm is

The function "eval" scores nodes according to how promising they are for further search (lower is better).
The QUEUING-FN places new nodes into the queue according to their scores.
This type of queue is called a "priority queue" by algorithm designers.
The idea is that the next node to be taken from the queue and expanded will be the one with the highest priority, i.e. the lowest score.
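A priority queue of this kind can be sketched with Python's heapq module. The helper names insert and remove_front, and the scores given to the nodes A, B and C, are made up for illustration.

```python
import heapq
from itertools import count

_tie = count()  # tie-breaker so equal scores come out in insertion order

def insert(queue, score, node):
    # heapq keeps the entry with the smallest tuple at the front.
    heapq.heappush(queue, (score, next(_tie), node))

def remove_front(queue):
    score, _, node = heapq.heappop(queue)
    return node

queue = []
insert(queue, 5, "A")
insert(queue, 2, "B")
insert(queue, 9, "C")
popped = [remove_front(queue) for _ in range(3)]
```

Nodes leave the queue in order of increasing score regardless of the order in which they were inserted, so B (score 2) is expanded first.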
A heuristic scoring function, h, is defined as follows:
h(n) = estimated cost of the cheapest path from state(n) to a goal
Greedy search is best-first search with h as its "eval" function.
For each particular problem, the designer must choose an appropriate h function.
For example, for the Romanian travel problem let h(n) be the straight-line ("as the crow flies") distance from state(n) to Bucharest.

Greedy search is called greedy because it always tries to make the biggest jump possible towards the goal, without "thinking" further ahead about what will happen from that point.
See the figure below for an example of how greedy search does not find the optimal path.

In the worst case, the time and space complexity of greedy search are exponential, as for breadth-first search.
In the "average" case having a good heuristic function h should make greedy search much better.
Proving that greedy search is better is very difficult mathematically. It is common in AI that a heuristic idea is good but proving that it is good is very difficult--precisely because heuristics are not guaranteed.
A* search is best-first search whose "eval" function adds the path cost so far to an admissible heuristic estimate of the remaining cost.
An admissible heuristic is an optimistic heuristic--one that never overestimates the remaining cost to the goal.
The idea is that if we want to find the optimal path to the goal, i.e. the shortest or cheapest path, then the next node to explore should be the one that looks like it will yield the OVERALL cheapest path.
The best estimate of the total cost of the path through a candidate node is
f(n) = g(n) + h(n)
where h is the same estimator function of remaining distance as above and
g(n) = cost of path from the initial state to state(n).
Note that the names f, g, h, and n are standard: you should remember to use this notation.
Study the example in the figure below carefully.
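The difference between greedy search and A* can also be tried out in code. The sketch below is not taken from these notes: the road and straight-line distances are assumed values for a fragment of the Romanian map, as commonly reproduced in textbooks, and the names ROADS, H and best_first_search are hypothetical.

```python
import heapq
from itertools import count

# Assumed road distances for a fragment of the map (not given in the text).
ROADS = {
    "Arad": {"Sibiu": 140, "Timisoara": 118, "Zerind": 75},
    "Sibiu": {"Arad": 140, "Fagaras": 99, "Oradea": 151, "Rimnicu Vilcea": 80},
    "Fagaras": {"Sibiu": 99, "Bucharest": 211},
    "Rimnicu Vilcea": {"Sibiu": 80, "Pitesti": 97},
    "Pitesti": {"Rimnicu Vilcea": 97, "Bucharest": 101},
    "Timisoara": {"Arad": 118},
    "Zerind": {"Arad": 75, "Oradea": 71},
    "Oradea": {"Zerind": 71, "Sibiu": 151},
    "Bucharest": {},
}
# Assumed straight-line distances to Bucharest, used as h.
H = {"Arad": 366, "Sibiu": 253, "Timisoara": 329, "Zerind": 374,
     "Fagaras": 176, "Oradea": 380, "Rimnicu Vilcea": 193,
     "Pitesti": 100, "Bucharest": 0}

def best_first_search(start, goal, roads, eval_fn):
    """Tree search with a priority queue; eval_fn(state, g) gives the score."""
    tie = count()  # tie-breaker for equal scores
    queue = [(eval_fn(start, 0), next(tie), 0, start, [start])]
    while queue:
        _, _, g, state, path = heapq.heappop(queue)  # lowest score first
        if state == goal:
            return path, g
        for nxt, step in roads[state].items():
            g2 = g + step  # g: cost of the path from the initial state
            heapq.heappush(
                queue, (eval_fn(nxt, g2), next(tie), g2, nxt, path + [nxt]))
    return None, None

# Greedy search scores by h alone; A* scores by f = g + h.
greedy = best_first_search("Arad", "Bucharest", ROADS, lambda s, g: H[s])
a_star = best_first_search("Arad", "Bucharest", ROADS, lambda s, g: g + H[s])
```

With these assumed distances, greedy search goes through Fagaras for a total cost of 450, while A* finds the route through Rimnicu Vilcea and Pitesti with cost 418. Note that this sketch does not detect repeated states, so greedy search is not guaranteed to terminate from every starting city.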

The importance of the A* algorithm is that it can be proven to be:
The practical usefulness of A* depends entirely on having a good heuristic function.
Consider the 8 puzzle and two heuristic functions:
h1 = number of misplaced tiles
h2 = sum of the (horizontal plus vertical) distances of the tiles from their goal positions
Note that h1 and h2 never overestimate the number of moves needed to reach a solution: they are both admissible.
Note also that h1(n) <= h2(n) for all states of the 8 puzzle.
Hence h2 is a better estimator than h1: both are lower bounds on the true cost, so the larger one is closer to it.
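The two heuristics are easy to compute. In the sketch below a board is assumed to be a tuple of 9 numbers read row by row, with 0 for the blank; the goal layout GOAL, the sample board, and the reading of "distance" as horizontal plus vertical (Manhattan) distance are all assumptions.

```python
GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)  # assumed goal layout, 0 is the blank

def h1(state, goal=GOAL):
    # Count tiles (not the blank) that are not on their goal square.
    return sum(1 for s, g in zip(state, goal) if s != 0 and s != g)

def h2(state, goal=GOAL):
    # Sum over tiles of row distance plus column distance to the goal square.
    total = 0
    for idx, tile in enumerate(state):
        if tile == 0:
            continue  # the blank is not a tile
        goal_idx = goal.index(tile)
        total += abs(idx // 3 - goal_idx // 3) + abs(idx % 3 - goal_idx % 3)
    return total

sample = (8, 1, 3,
          4, 0, 2,
          7, 6, 5)
scores = (h1(sample), h2(sample))
```

For this sample board h1 gives 5 and h2 gives 10, consistent with h1(n) <= h2(n): every misplaced tile contributes exactly 1 to h1 and at least 1 to h2.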
To invent an admissible heuristic, take the original problem and relax some of the demands so that it is possible to solve the easier problem without search.
Use the cost of the solution to the relaxed problem as the underestimate of the actual solution cost.