In this article we will discuss: 1. Algorithm for Hill Climbing 2. Difficulties of Hill Climbing 3. Determination of a Heuristic Function 4. Best-First Search 5. Best-First Algorithm for Best-First Search 6. Finding the Best Solution – A* Search.

Algorithm for Hill Climbing:

Begin:

1. Identify the possible starting states and measure the distance (f) of each from the goal node; push them into a stack so that the state with the smallest f is on top;

2. Repeat

Pop the stack to get the stack-top element;

If the stack-top element is the goal, announce it and exit;

Else push its children into the stack so that the child with the smallest f value is on top;

Until the stack is empty;

End
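
Rendered in Python, the procedure looks roughly as follows. This is a minimal sketch, assuming the caller supplies a heuristic f (distance to the goal), a successor generator and a goal test; all of these names are illustrative rather than fixed conventions:

def hill_climb(start_states, f, successors, is_goal):
    # Push the starting states so that the state with the smallest f is on top.
    stack = sorted(start_states, key=f, reverse=True)
    while stack:
        state = stack.pop()                      # stack-top element
        if is_goal(state):
            return state                         # announce the goal and exit
        # Push the children so that the child closest to the goal is on top.
        stack.extend(sorted(successors(state), key=f, reverse=True))
    return None                                  # stack is empty: no goal found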

It is simply a loop which continually moves in the direction of increasing value, that is, uphill. It terminates when it reaches a “peak” where no neighbour has a higher value. The algorithm does not maintain a search tree, so the current node data structure need only record the state and its objective function value. Hill climbing does not look ahead beyond the immediate neighbours of the current state. This resembles trying to find the top of Mount Everest in a thick fog while suffering from amnesia.

To illustrate hill climbing, we will use the 8-queens problem. Local search algorithms typically use a complete-state formulation, where each state has 8 queens on the board, one per column. The successor function returns all possible states generated by moving a single queen to another square in the same column (so each state has 8*7 = 56 successors).

The heuristic cost function h is the number of pairs of queens that are attacking each other, either directly or indirectly; the global minimum of this function is zero, which occurs only at perfect solutions. Hill climbing algorithms typically choose randomly among the set of best successors, if there is more than one.
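A steepest-ascent version for the 8-queens problem can be sketched as below; the state is a tuple giving the row of the queen in each column, h counts attacking pairs, and ties among equally good successors are broken arbitrarily (the representation and all names are my own illustration):

import random
from itertools import combinations

def h(state):
    # Number of pairs of queens attacking each other (same row or same diagonal).
    return sum(1 for (c1, r1), (c2, r2) in combinations(enumerate(state), 2)
               if r1 == r2 or abs(r1 - r2) == abs(c1 - c2))

def successors(state):
    # Move a single queen to another square in the same column: 8 * 7 = 56 moves.
    for col in range(8):
        for row in range(8):
            if row != state[col]:
                yield state[:col] + (row,) + state[col + 1:]

def steepest_ascent(state):
    while True:
        best = min(successors(state), key=h)
        if h(best) >= h(state):
            return state                 # no successor improves h: a (possibly local) optimum
        state = best

state = steepest_ascent(tuple(random.randrange(8) for _ in range(8)))
print(state, "attacking pairs:", h(state))       # h == 0 only at a solution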

Hill climbing is sometimes called greedy local search because it grabs a good neighbour state without thinking ahead about where to go next. Although greed is considered one of the seven deadly sins in the Indian system of ethical life, it turns out that greedy algorithms often perform quite well. Hill climbing often makes very rapid progress towards a solution because it is usually quite easy to improve a bad state.

Difficulties of Hill Climbing:

i. Local Maxima:

A local maximum is a peak which is higher than each of its neighbouring states but lower than the global maximum. Local maxima are very difficult for greedy algorithms to escape: once there, no single move improves the state.

ii. Plateaux:

A plateau is an area of the state-space landscape where the evaluation function is flat. It can be a flat local maximum, from which no uphill exit exists, or a shoulder, from which it is possible to make progress. A hill climbing search might be unable to find its way off the plateau.

iii. Ridges:

A ridge is a special kind of local maximum. It is an area of the search space which is higher than the surrounding areas and which itself has a slope. But the orientation of the high region, compared with the set of available moves and the directions in which they move, makes it impossible to traverse the ridge by a single move. To overcome this, apply two or more rules before performing the test; this corresponds to moving in several directions at once.

In each case, the algorithm reaches a point at which no progress is being made. Starting from a randomly generated 8-queens state, steepest-ascent hill climbing gets stuck 86% of the time, solving only 14% of problem instances. It works quickly, taking just 4 steps on average when it succeeds and 3 when it gets stuck, which is not bad for a state space with 8^8 ≈ 17 million states.

The algorithm halts if it reaches a plateau where the best successor has the same value as the current state. Is it advisable to allow a sideways move in the hope that the plateau is really a shoulder? The answer is usually yes, but we must take care. If we always allow sideways moves when there are no uphill moves, an infinite loop will occur whenever the algorithm reaches a flat local maximum which is not a shoulder.

One common solution is to put a limit on the number of consecutive sideways moves allowed. For example, we could allow up to, say, 100 consecutive sideways moves in the 8-queens problem. This raises the percentage of problem instances solved by hill climbing from 14% to 94%. Success comes at a cost: the algorithm averages roughly 21 steps for each successful instance and 64 for each failure.

Many variants of hill climbing have been invented. Stochastic hill climbing chooses at random from among the uphill moves; the probability of selection can vary with the steepness of the uphill move. This usually converges more slowly than steepest ascent, but in some landscapes it finds better solutions. First-choice hill climbing implements stochastic hill climbing by generating successors randomly until one is generated which is better than the current state. This is a good strategy when a state has many successors, as sketched below.
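
First-choice hill climbing can be written as follows, reusing the conventions of the 8-queens example above; the cap on failed samples is my own addition to guarantee termination:

def first_choice(state, h, random_successor, max_tries=200):
    while True:
        for _ in range(max_tries):
            candidate = random_successor(state)
            if h(candidate) < h(state):          # accept the first improvement
                state = candidate
                break
        else:
            return state                         # no improvement found: stop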

The hill climbing algorithms described so far are incomplete: they often fail to find a goal when one exists because they can get stuck on local maxima. Random-restart hill climbing adopts the well-known adage, “if at first you don’t succeed, try, try again”. It conducts a series of hill climbing searches from randomly generated initial states, stopping when a goal is found. It is complete with probability approaching 1, for the trivial reason that it will eventually generate a goal state as the initial state.

If each hill climbing search has a probability p of success, then the expected number of restarts required is 1/p. For 8-queens instances with no sideways moves allowed, p ≈ 0.14, so we need roughly 7 iterations to find a goal (6 failures and 1 success).

The expected number of steps is the cost of one successful iteration plus (1 - p)/p times the cost of failure, or roughly 22 steps. When we allow sideways moves, 1/0.94 ≈ 1.06 iterations are needed on average, and (1 × 21) + (0.06/0.94) × 64 ≈ 25 steps. For 8-queens, then, random-restart hill climbing is very effective indeed. Even for three million queens, the approach can find solutions in under a minute.
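
Random-restart hill climbing is then a thin wrapper around any of the climbers above; a minimal sketch, with random_state, climb and is_solution as hypothetical helpers supplied by the caller:

def random_restart(random_state, climb, is_solution, max_restarts=1000):
    for _ in range(max_restarts):        # expected number of restarts is 1/p
        result = climb(random_state())
        if is_solution(result):
            return result
    return None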

The success of hill climbing depends very much on the shape of the state-space landscape: if there are few local maxima and plateaux, random-restart hill climbing will find a good solution very quickly. NP-hard problems typically have an exponential number of local maxima to get stuck on. Despite this, a reasonably good local maximum can often be found after a small number of restarts.

The difficulties faced in hill climbing search can be explained with the help of an interesting analogy of a maze, shown in Fig. 4.7(a); the corresponding search tree is given in Fig. 4.7(b).

The figure shows the search tree for finding a way through a maze. This is a state problem, as we are not interested in the shortest path but only in reaching the goal (state).

The start is marked with a bullet and the exit (goal state) is marked g; the rest of the letters mark the choice points in the maze. The figures in brackets (Fig. 4.7(b)) show the heuristic evaluation function value for each node. Here the evaluation function chosen is the distance measured from the node to the goal.

We may note the following points about the maze search tree:

1. Misleading Distances:

At node a it appears at first that b is the most promising direction. Alas! it leads to a dead end.

2. Local Minima:

From node b, nowhere looks any better; whatever path we take appears (in terms of the heuristic) to take us farther from the goal. Hence b is called a local minimum. A simple search might stop at b and never reach goal g, which is the global minimum.

3. Plateau:

The value of the heuristic evaluation function does not change between c and d; there is no sense of progress. In more complex problems there may be whole areas of the search space with no change of heuristic. When this happens, the heuristic ceases to give any guidance about a possible direct path.

In order to progress towards the goal we may have to get temporarily farther away from it. In short, such a problem is difficult to solve; such problems do occur in real scenarios and must be faced with efficient search algorithms.

Now we will show how a heuristic evaluation function is calculated and how its proper choice can lead to a good solution of a problem.

Determination of a Heuristic Function during Hill Climbing – An Example:

Consider a block-world problem in which identical blocks (A to H) are given (Fig. 4.8). They are arranged in the initial state and need to be rearranged as in the goal state.

The following two operators are given:

R1:

Pick up one block and put it on the table.

R2:

Pick up one block and put it on another.

Let the heuristic function be defined in the following way:

(a) Add one point for every block which is resting on the thing it is supposed to be resting on. Subtract one point for every block which is sitting on the wrong thing.

Correspondingly, the initial state has a score of 4 and the goal state has a score of 8. The most natural move is to move block A onto the table. Putting A on the table, starting from the initial state as in Fig. 4.9, gives a score of 6. This move is allowed, and this stage produces three states (Fig. 4.10) with scores (a) 4, (b) 4 and (c) 4.

Hill climbing will stop because all these states have the same score, which is lower than that of the current state (the intermediate state of Fig. 4.9). The process has reached a local maximum (not the global maximum). The problem is that by purely local examination of support structures (taking a block as the unit), the current state appears to be better than any of its successors because more blocks rest on the correct objects.

To solve this problem it is necessary to disassemble a good local structure (the stack from B to H), howsoever good it may be, because it is wrong in the global context. Hill climbing does not look far enough ahead. This fault is inherent in the statement of the heuristic function, so let us change it.

(b) Now define the heuristic function globally taking the whole structure of blocks as a single unit.

For each block which has the correct support structure i.e., if the complete structure below it is exactly as it should be, add one point for every block in the support structure. For each block which has an incorrect support structure, subtract one point for every block in the existing support structure.

Using this function, the goal state has the score = 28.

The initial state has the score = -28.

The three states produced from this now have scores:

a = -28, b = -16, c = -15

Steepest-ascent hill climbing will choose move (c), which has the maximum score and correctly leads us towards the solution, as the code sketch below illustrates.
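
Both heuristics are easy to state in code once a representation is fixed. In the sketch below, which is my own illustration, a state is a list of stacks, each listed bottom-to-top; goal_on maps each block to the block (or the table) it should rest on, and goal_below maps each block to the exact tuple of blocks that should lie beneath it in the goal:

def local_h(state, goal_on):
    # Heuristic (a): +1 per block on its correct support, -1 otherwise.
    score = 0
    for stack in state:
        for i, block in enumerate(stack):
            on = stack[i - 1] if i > 0 else 'TABLE'
            score += 1 if on == goal_on[block] else -1
    return score

def global_h(state, goal_below):
    # Heuristic (b): +1 per block in a fully correct support structure,
    # -1 per block in an incorrect one.
    score = 0
    for stack in state:
        for i, block in enumerate(stack):
            support = tuple(stack[:i])           # structure below this block
            score += len(support) if support == goal_below[block] else -len(support)
    return score

With eight blocks stacked in the correct order, global_h yields 0 + 1 + … + 7 = 28, matching the goal score given above.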

The new heuristic function points to the two aspects:

1. Incorrect structures are bad and should not be selected.

2. Correct structures are good and should be built up.

So the same hill-climbing procedure which failed with the earlier heuristic function now works perfectly well. But alas, such a perfect heuristic function is difficult to construct, as the example selected is of a mathematical nature.

This difficulty can be illustrated with the help of an example. Suppose you, as a chief executive, have gone to a new city to attend a conference of chief executives of IT companies in the region.

The perfect heuristic function would need knowledge of the exact streets and dead ends, which in the case of a strange city is not always available. And even if perfect knowledge is available in principle, say by keeping information about the venue of the conference in your information file, it may not be computationally tractable to use.

Thus, hill climbing can be very inefficient in a large, rough problem space. But this method, when combined with other methods, can profitably lead near to the solution.

Best-First Search:

This search procedure is an evaluation-function variant of breadth-first search. The heuristic function used is an indicator of how far the node is from the goal node. Goal nodes have an evaluation function value of zero.

Best-first search is explained using the search graph given in Fig. 4.11; the principle has already been explained in Table 4.2.

First the start node S is expanded. It has three children A, B and C with heuristic function values 3, 6 and 5 respectively. These values approximately indicate how far they are from the goal node. The child with minimum value namely A is chosen. The children of A are generated.

They are D and E with values 9 and 8. The search process now has four nodes to consider, i.e., node D with value 9, node E with value 8, node B with value 6 and node C with value 5. Of them, node C has the minimal value, and it is expanded to give node H with value 7. At this point, the nodes available for search are (D: 9), (E: 8), (B: 6) and (H: 7). Of these, B is minimal and hence B is expanded to give (F: 12) and (G: 14).

At this juncture, the nodes available for search are (D: 9), (E: 8), (H: 7), (F: 12) and (G: 14), out of which (H: 7) is minimal; it is expanded to give (I: 5) and (J: 6). The nodes now available for expansion are (D: 9), (E: 8), (F: 12), (G: 14), (I: 5) and (J: 6). Of these, the node with the minimal value is (I: 5), which is expanded to give the goal node. The various steps are shown in the table (the queue is not followed as strictly as was done in Table 4.2).

As we can see, best-first search “jumps all around” the search graph to identify the node with the minimal evaluation function value.

There is only a minor variation between hill climbing and best-first search. In the former, we sort only the children of the node just generated, whereas in the latter we have to sort the entire list to identify the next node to be expanded.

Best-First Algorithm for Best-First Search:

1. Put the initial node on a list OPEN

2. If (OPEN is empty) or (OPEN = GOAL) terminate search

3. Remove the best node from OPEN. Call this node a

4. If (a = GOAL) terminate search with success

5. Else if node a has successors, generate all of them. Find out how far they are from the goal node. Sort all the children generated so far by the remaining distance from the goal.

6. Name this list as CLOSED

7. Replace OPEN with CLOSED

8. Go to Step 2
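
With a priority queue in place of an explicitly re-sorted list, the same procedure can be sketched in Python as follows (heapq keeps the node with the smallest heuristic value at the front; all names are illustrative):

import heapq
from itertools import count

def best_first(start, h, successors, is_goal):
    tie = count()                                # breaks ties between equal h values
    open_list = [(h(start), next(tie), start)]
    while open_list:
        _, _, node = heapq.heappop(open_list)    # remove the best node from OPEN
        if is_goal(node):
            return node                          # terminate with success
        for child in successors(node):
            heapq.heappush(open_list, (h(child), next(tie), child))
    return None                                  # OPEN is empty: failure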

The paths found by best-first search are likely to give solutions faster than hill climbing because best-first search expands the node which ‘seems’ closest to the goal. However, there is no guarantee of this, since ‘seems’ does not mean certainty.

Best-first search resembles depth-first search in the way it prefers to follow a single path all the way to the goal, but it will back up when it hits a dead end. It suffers from the same defects as depth-first search: it is not optimal, and it is incomplete (because it can go along an infinite path and never return to try other possibilities).

The worst-case time and space complexity is O(b^d), where d is the maximum depth of the search space and b the branching factor. With a good heuristic function, however, the complexity can be reduced substantially. The amount of reduction, however, depends on the particular problem and the quality of the heuristic.

Finding the Best Solution in Best-First Search – A* Search:

Hill climbing and best-first searches, with the help of a good heuristic, find a solution faster than exhaustive search methods. But they cannot tell whether the solution they have obtained is the best one.

For example, the hill climbing algorithm gets to the suboptimal solution l and best-first search finds the optimal solution h of the search tree (Fig. 4.2), but this is not always the case. Now suppose that the heuristic function had been so chosen that d had the value 4 instead of 2. Then instead of h, best-first search would have found e, which is suboptimal, without affecting the goal reached through hill climbing. If e were a dead end, no solution whatsoever would be found.

Fig. 4.2. First Few Steps of Breadth First Search on the Tree.

A* Search Algorithm:

This type of heuristic search makes use of the fact that most problem spaces provide some information which distinguishes among states in terms of their likelihood of leading to a goal. This information is called a heuristic evaluation function. In other words, the goal of a heuristic search is to reduce the number of nodes searched in seeking a goal.

The most widely used form of best-first search is called A*, which is pronounced “A-star”. It is a heuristic search method used to minimize the search cost in a given problem. It aims to find the least-cost path from a given initial node to a specific goal node. It is an extended form of the best-first search algorithm.

The best-first search algorithm, too, tries to find a solution which minimizes the total cost of the search pathway. The difference, however, is that A* also takes into account the cost accumulated from the start, and not simply the local cost from the previously generated node. Best-first search finds a goal state in any predetermined problem space.

However, it cannot guarantee that it will choose the shortest path to the goal. For instance, if there are two options to choose from, one of which is a long way from the initial point but has a slightly shorter estimate of distance to the goal, and another which is very close to the initial state but has a slightly longer estimate of distance to the goal, best-first search will always choose to expand next the state with the shorter estimate. The A* algorithm fixes this particular drawback of best-first search.

In short, the A* algorithm searches all possible routes from a starting point until it finds the shortest path or cheapest cost to a goal. Terms like shortest path and cheapest cost here refer to a general notion; some other term may apply depending on the problem.

For instance, in a map problem the cost is replaced by the term distance. Thus, A* may reduce the need to search all the possible pathways in a search space, and so result in a faster solution. A* evaluates nodes by combining g(n) and h(n).

In the standard terminology used when talking about A*:

f(n) = g(n) + h(n)

The purpose of this equation is to obtain the lowest f score in a given problem, where n is the current node on the path to the final node.

Here f(n) is the total search cost, g(n) is the actual lowest cost (shortest distance travelled) of the path from the initial start point to node n, and h(n) is the estimated cost (distance) of the cheapest path from node n to a goal node. This last term is also called the heuristic function or heuristic estimate. f(n) is sometimes called the fitness number for that node.

At each step, the node with the lowest f value is chosen for expansion, until the goal node is chosen and reached for expansion. Whenever the heuristic function satisfies certain conditions, A* search is both complete and optimal.

Pseudo-Code A*:
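
The pseudo-code is not reproduced here; the following Python sketch conveys its usual shape, assuming a successor function that yields (child, step_cost) pairs and a heuristic h, with all names illustrative rather than prescribed:

import heapq
from itertools import count

def a_star(start, h, successors, is_goal):
    tie = count()                                # tie-breaker for equal f values
    open_list = [(h(start), next(tie), 0, start, [start])]   # (f, tie, g, node, path)
    best_g = {start: 0}
    while open_list:
        f, _, g, node, path = heapq.heappop(open_list)
        if g > best_g.get(node, float('inf')):
            continue                             # a cheaper path to node was found later
        if is_goal(node):
            return path, g                       # least-cost path and its cost
        for child, step in successors(node):
            g2 = g + step                        # g(n'): exact cost from the start
            if g2 < best_g.get(child, float('inf')):
                best_g[child] = g2
                heapq.heappush(open_list,
                               (g2 + h(child), next(tie), g2, child, path + [child]))
    return None                                  # no solution exists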

Practical Application of A* (How A* Procedure Works):

A* is the most popular choice for path finding, because it is fairly flexible and can be used in a wide range of contexts, such as games (the 8-puzzle) and path finders.

To illustrate A* search, consider Fig. 4.12 again, with the same evaluation function values as in Fig. 4.11. Now associated with each node are three numbers: the evaluation function value, the cost-function value and the fitness number.

The fitness number is the total of the evaluation function value and the cost-function value.

For example, for node K the fitness number is 21, which is obtained as follows:

(Evaluation function value of K) + (cost of the path from start node S to node K) = 1 + (cost from S to C + cost from C to H + cost from H to I + cost from I to K) = 1 + (6 + 5 + 7 + 2) = 21.

While best-first search uses the evaluation function value only for expanding the best node, A* uses the fitness number for its computation.

Thus, if we are trying to find the cheapest solution, a reasonable strategy is to try first the node with the lowest value of g(n) + h(n). It turns out that this strategy is quite reasonable provided that the heuristic function h(n) satisfies certain conditions, enumerated below.

Next, we consider some important properties of heuristic search algorithms which are used to evaluate their performance:

Characteristics of A* Search Algorithm:

1. Admissibility Condition:

An algorithm is admissible if it is guaranteed to return an optimal solution whenever one exists.

2. Completeness or Convergence Condition:

An algorithm is complete if it always terminates with a solution when one exists.

A search strategy is convergent if it guarantees finding a path, a solution graph, or the desired information whenever they exist. If there is a solution, A* will always find it. The convergence properties of the A* search algorithm are satisfied for any network, finite or infinite, with a non-negative cost function: for such a network, A* terminates with a solution whenever one exists, and hence it is convergent.

The number of acyclic paths is finite. A previously examined node is revisited only if the search finds a smaller cost than the previous one. Since the cost function is non-negative, an edge can be examined only a finite number of times. Thus, A* is convergent.

3. Dominance Property:

Of two admissible algorithms A1 (with heuristic estimate h’1) and A2 (with heuristic estimate h’2), A1 is said to dominate A2, and to be more informed than A2, if h’1 > h’2 for every non-goal node.

4. Optimality Property:

Solution quality is measured by the path cost function and an optimal solution has the lowest path cost among all solutions.

A very interesting observation about this algorithm is that it is admissible. That is, for any node n on such a path, h’(n) is always less than or equal to h(n). This is possible only when the evaluation function never overestimates the distance of the node to the goal. Although the admissibility condition requires h’ to be a lower bound on h, it is to be expected that the more closely h’ approaches h, the better is the performance of the algorithm.

If h’ were identically equal to h, an optimal solution path would be found without ever expanding a node off the path (assuming, of course, that only one optimal solution exists). If h’ is identically zero, A* reduces to the blind uniform-cost algorithm (or breadth-first search).

Admissible heuristics are by nature optimistic, because they think the cost of solving the problem is less than it actually is. Since g(n) is the exact cost to reach n, we have as an immediate consequence that f(n) never overestimates the true cost of a solution through n.

The example shown in Fig. 4.8 illustrates the A* algorithm using a best-first search tree. A search graph can also be explored, to avoid duplicate paths. An algorithm to do this operates by searching a directed graph in which each node represents a point in the problem space. Each node carries a description of how promising it is, a parent link which points back to the best node from which it came, and a list of the nodes which were generated from it.

Each node in A* search has the following characteristics:

1. Each node represents a state in the state space.

2. An indication of the promise of the node.

3. A parent link that points back to the best node from which it was reached.

4. A list of the nodes that were generated from it.

The parent link will make it possible to recover the path to the goal once the goal is found. The list of successors will make it possible, if a better path is found to an already existing node, to propagate the improvement down to its successors. This type of graph is called an OR graph, since each of its branches represents an alternative problem-solving path.

An OR graph finds a single path. Such a structure represents the fact that we will know how to get from a node to the goal state if we can discover how to get from that node to a goal state along any one of the branches leaving it.

Problem with A* Search Algorithm:

According to Pearl and Korf (1987), the main shortcoming of A*, and of any best-first search, is its memory requirement. Because the entire open list must be saved, A* is space-limited in practice and is no more practical than breadth-first search. For large search spaces, A* will run out of memory. Better algorithms exist which take cognizance of this fact. One such algorithm is the Iterative Deepening A* (IDA*) algorithm.

The search technique of depth-first iterative deepening can be used along with heuristic estimating functions. Here we make use of a cost cut-off instead of a depth cut-off to obtain an algorithm which increments the cost cut-off in a step-by-step style. This algorithm, IDA*, uses an admissible heuristic as used in A*, and hence the name Iterative Deepening A*.

IDA* deploys depth-first iterative deepening search to keep the space requirement to a minimum and also uses a cost cut-off strategy. It works iteratively: at each iteration it performs a depth-first search, cutting off a node n as soon as its estimated cost f(n) exceeds a specified threshold.

The threshold is initialized to the f-estimate of the initial state. After each iteration, the threshold used for the next iteration is set to the minimum estimated cost among all the values which exceeded the current threshold.

Iterative Deepening A* Algorithm:

The iterative deepening search algorithm searches for the goal node in a depth-first manner at limited depth. In each pass the depth is increased by one level to test for the presence of the goal node at that level. The A* algorithm, on the other hand, in each pass selects the least-cost (f) node for expansion.

The iterative deepening A* (or IDA*) algorithm presented below attempts to combine the features of the iterative deepening and A* algorithms. Here, the heuristic measure is used to set the cost cut-off, rather than merely to order the selection of nodes for expansion.

The algorithm is formally presented below:

Algorithm IDA*:

Begin:

1. Initialize the current depth cut-off c = 1;

2. Push the set of starting nodes into a stack; initialize the cut-off for the next iteration

c’ = ∞;

3. While the stack is not empty do

Begin:

Pop stack and get the topmost element n;

If n is the goal, Then report success and

return n with the path from the starting node

Else Do:

Begin:

For each child n’ of n

If f(n’) < c Then push n’ into the stack

Else assign c’ := min (c’, f(n’));

End For;

End;

End While;

4. If the stack is empty and c’ = ∞ Then stop and exit;

5. If the stack is empty and c’ ≠ ∞ Then assign c := c’ and return to step 2;

End.
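
A compact Python transcription of this algorithm is given below. It assumes positive step costs and the same (child, step_cost) successor convention as the A* sketch, and, following the description above the algorithm, it initializes the cut-off to the f-estimate of the start state rather than to 1:

def ida_star(start, h, successors, is_goal):
    c = h(start)                                 # current cost cut-off
    while True:
        c_next = float('inf')                    # cut-off for the next iteration (c')
        stack = [(start, 0, [start])]            # (node, g, path)
        while stack:
            node, g, path = stack.pop()
            if is_goal(node):
                return path, g                   # success within the current cut-off
            for child, step in successors(node):
                f = g + step + h(child)
                if f <= c:
                    stack.append((child, g + step, path + [child]))
                else:
                    c_next = min(c_next, f)      # smallest f exceeding the cut-off
        if c_next == float('inf'):
            return None                          # stack empty and c' = infinity: stop
        c = c_next                               # raise the cut-off and iterate again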

The above algorithm considers two cut-off levels. If the stack contains nodes whose children all have f values lower than the cut-off value c, then these children are pushed onto the stack to satisfy the depth-first criterion of iterative deepening algorithms. However, when this fails, i.e., the f value of one or more children n’ of n exceeds the cut-off level c, then c’ is set to min (c’, f(n’)).

The algorithm terminates when either:

(i) The goal is identified (successful termination) or

(ii) The stack is empty and the cut-off value c’ = ∞

The main advantage of IDA* over A* lies in the memory requirement. A* requires an exponential amount of memory because it imposes no cut-off and must store every generated node. IDA*, on the other hand, pushes a child n’ of a node n only when its f(n’) value is less than the cut-off value c. Thus, it saves a considerable amount of memory.

Another important point to note is that IDA* expands essentially the same nodes as A* and finds an optimal solution when the heuristic function used is admissible.