Difference between revisions of "Graph search"

From PEGWiki
Jump to: navigation, search
m
(Components in graphs)
Line 64: Line 64:
  
 
===Components in graphs===
 
===Components in graphs===
 +
====Connected components in undirected graphs====
 
The problems of finding connected components in undirected graphs is solved using graph search; we start the search at some vertex, and after running the search, a vertex has its ''visited'' entry true if and only if it is in the same connected component as the starting vertex. This can be used to assign ''all'' vertices connected component IDs, so that two vertices are in the same component (and reachable from each other). (The proof that one pass of graph search will visit all vertices reachable from the starting vertex can be easily executed by induction on the unweighted distance from the starting vertex.) In fact, we can remove the ''visited'' array because we will be able to tell if a vertex has been visited yet by examining its ID number; if it has not been visited yet, then and ID number has not been assigned, and ''vice versa''. Pseudocode:
 
The problems of finding connected components in undirected graphs is solved using graph search; we start the search at some vertex, and after running the search, a vertex has its ''visited'' entry true if and only if it is in the same connected component as the starting vertex. This can be used to assign ''all'' vertices connected component IDs, so that two vertices are in the same component (and reachable from each other). (The proof that one pass of graph search will visit all vertices reachable from the starting vertex can be easily executed by induction on the unweighted distance from the starting vertex.) In fact, we can remove the ''visited'' array because we will be able to tell if a vertex has been visited yet by examining its ID number; if it has not been visited yet, then and ID number has not been assigned, and ''vice versa''. Pseudocode:
 
<pre>
 
<pre>
Line 85: Line 86:
 
We iterate through the vertices one at a time. If, for a given vertex, a component ID has not yet been assigned (it is currently -1), then we assign it the first unused non-negative integer and we do a graph search starting from that vertex, assigning this ID number to all vertices reachable from it. After doing the search, we increment the total number of components found so that the next component will be assigned a higher ID number. But if, when we reach a vertex, a component ID has already been assigned, then we skip that vertex.
 
We iterate through the vertices one at a time. If, for a given vertex, a component ID has not yet been assigned (it is currently -1), then we assign it the first unused non-negative integer and we do a graph search starting from that vertex, assigning this ID number to all vertices reachable from it. After doing the search, we increment the total number of components found so that the next component will be assigned a higher ID number. But if, when we reach a vertex, a component ID has already been assigned, then we skip that vertex.
  
 +
=====Flood fill=====
 +
The special case known as '''flood fill''' is often used as an example application of graph search. The task is to implement an algorithm that will accomplish something like the fill tool in a simple raster graphics editor (such as ''Paint'', which will be familiar to users of Microsoft Windows): given a two-dimensional array of numbers (with each number presumably representing a colour), one location in the array (a pixel), and a new number (a new colour), the objective is to fill the entire contiguous single-valued (monochromatic) area containing the given location with the new number.
 +
 +
In this case, the graph is defined implicitly, and there is no need to explicitly construct it. Each element of the array corresponds to a vertex, which can be identified simply by its row and column indices. Most vertices have degree four (with the exception of those on the boundary); the neighbours' indices can be determined by adding one to or subtracting one from either the row or the column.
 +
 +
The article on [[depth-first search]] contains code for the typical recursive DFS implementation of flood fill.
 +
 +
The flood fill algorithm can be used to count the number of "blobs" in a grid, too; every time we encounter a nonempty square, we flood fill it to make it empty, and so remove the entire blob. The total number of times we have to do this is the total number of blobs, since each blob will only be removed once.
 +
 +
====Biconnected components====
 
The more complex problem of finding [[biconnected component]]s (or finding [[bridge]]s and [[articulation points]]) in an undirected graph can be solved with a depth-first search algorithm.
 
The more complex problem of finding [[biconnected component]]s (or finding [[bridge]]s and [[articulation points]]) in an undirected graph can be solved with a depth-first search algorithm.
  
 +
====Directed graphs====
 
For directed graphs, the equivalent of connected components are [[strongly connected component]]s: again, maximal subsets of the vertices such that inside each component any vertex is reachable from any other. The three best-known algorithms for computing these, [[Tarjan's strongly connected components algorithm|Tarjan's algorithm]], [[Gabow's algorithm]], and [[Kosaraju's algorithm]], all use depth-first searches. The equivalents of articulation points in directed graphs &mdash; [[Dominator tree|dominators]] &mdash; are computed using depth-first search too.
 
For directed graphs, the equivalent of connected components are [[strongly connected component]]s: again, maximal subsets of the vertices such that inside each component any vertex is reachable from any other. The three best-known algorithms for computing these, [[Tarjan's strongly connected components algorithm|Tarjan's algorithm]], [[Gabow's algorithm]], and [[Kosaraju's algorithm]], all use depth-first searches. The equivalents of articulation points in directed graphs &mdash; [[Dominator tree|dominators]] &mdash; are computed using depth-first search too.
  

Revision as of 00:02, 3 March 2012

In graph theory, graph search or graph traversal refers to the technique of "visiting" each vertex of the graph in turn, executing some code whenever a vertex is "visited". By varying the order in which vertices are visited and the nature of the action performed when a vertex is visited, many interesting problems in graph theory can be solved efficiently. Conversely, many graph-theoretic algorithms can be classified as graph search algorithms (see the Applications below).

Basic principles of graph search

Although theoretically one can visit the vertices of a graph in any order, in all useful graph search algorithms the choice of which vertex to visit next is taken from the immediate neighbours of vertices that have already been visited. (Obviously, the first vertex visited does not follow this rule.) This means that usually the algorithm must be invoked separately on each connected component of the graph, because it is impossible to visit a vertex if the algorithm starts in a different component. The set of all vertices which have not yet been visited but may be visited next (because they are immediate neighbours of already-visited vertices) is called the fringe.

Pseudocode for generic graph search algorithm

input G
for each connected component C ∈ G
     for each v ∈ V(C)
          let visited[v] = false
     let u = some vertex ∈ V(C)
     let F = {u}
     while F ≠ {}
          u = some vertex ∈ F
          remove u from F
          if visited[u] = false
               visit u
               visited[u] = true
               for each v ∈ V(C) such that (u,v) ∈ E(G) and visited[v] = false
                    add v to F

Here is a line-by-line analysis of the code:

  • We run the graph search exactly once for each connected component in the graph (for each connected component C ∈ G).
  • Initially, no vertices in C (v ∈ V(C)) have been visited yet, so we set each vertex's entry in the visited array to false (let visited[v] = false).
  • The algorithm starts by choosing some vertex in the component (let u = some vertex ∈ V(C)) and placing it in the fringe F (let F = {u}).
  • From then on, the next vertex we visit will always be a vertex in the fringe, so we know we can stop when the fringe becomes empty (while F ≠ {}).
  • When the fringe is not empty, we select any vertex in F (u = some vertex ∈ F) and remove it from F. If u has already been visited, we have nothing more to do for u. Otherwise (if visited[u] = false), we "visit" u, whatever "visit" happens to mean in the particular algorithm we are using. We record that u has been visited (visited[u] = true).
  • After visiting u, we add any neighbours of u (for each v ∈ V(C) such that (u,v) ∈ E(G)) to the fringe F that have not been visited already (and visited[v] = false). Then the loop starts over again, meaning that these newly added vertices are eventually visited.

As it stands, this code has a serious flaw: how do we know what the connected components of the graph are? As a matter of fact, breadth-first search and depth-first search are usually used to find the connected components, but they are themselves graph search! The solution is as follows: if a connected component has already been examined, then all vertices in that component have been visited, or, contrapositively, if a vertex has not been visited, then its component has not been visited either. This allows us to rewrite the code as

input G
for each u ∈ V(G)
     let visited[u] = false
for each u ∈ V(G)
     if visited[u]=false
          let F = {u}
          while F ≠ {}
               v = some vertex ∈ F
               remove v from F
               if visited[v] = false
                    visit v
                    visited[v] = true
                    for each w ∈ V(G) such that (v,w) ∈ E(G) and visited[w] = false
                         add w to F

Notice that F is not necessarily a real set in the mathematical sense; F might contain multiple copies of a vertex. The nature of F is dependent upon what type of graph search algorithm is being used. Also, an actual implementation of graph search need not actually look like this. For example, depth-first search is often implemented recursively. However, all graph search algorithms can be made to fit this form.

Types of graph search algorithms

Many details have been left out of the generic code above. For example, we are free to choose the initial vertex u in any way we want, to iterate over the connected components in any order, and to iterate over the neighbours of u in any order. However, there are two places where variation is most important: the instruction visit u can mean practically anything, leading to endless variety in graph algorithms, and selecting the next vertex from F can be done in various ways. When you learn a new algorithm, try to identify these choices and understand how they are crucial to the correctness of the algorithm. There are three common paradigms for the selection of the next vertex: breadth-first search, depth-first search, and priority-first search.

Breadth-first search

In breadth-first search, the next vertex selected from F is always the oldest remaining vertex in F (the one that was placed into F first.) That is, F is a FIFO (first in, first out) queue.

Depth-first search

In depth-first search, the next vertex selected from F is always the newest remaining vertex (the one most recently added to F). That is, F is a LIFO (last-in, first out) stack.

Priority-first search

In priority-first search, each vertex is assigned a priority, and whenever a vertex is to be selected from F, the one with the highest priority is selected. This is the basis of Dijkstra's algorithm, A*, and Prim's algorithm (see Applications).

Applications of graph search

Graph search has the following uses, inter alia:

Components in graphs

Connected components in undirected graphs

The problems of finding connected components in undirected graphs is solved using graph search; we start the search at some vertex, and after running the search, a vertex has its visited entry true if and only if it is in the same connected component as the starting vertex. This can be used to assign all vertices connected component IDs, so that two vertices are in the same component (and reachable from each other). (The proof that one pass of graph search will visit all vertices reachable from the starting vertex can be easily executed by induction on the unweighted distance from the starting vertex.) In fact, we can remove the visited array because we will be able to tell if a vertex has been visited yet by examining its ID number; if it has not been visited yet, then and ID number has not been assigned, and vice versa. Pseudocode:

input G
let cnt = 0
for each u ∈ V(G)
     let id[u] = -1
for each u ∈ V(G)
     if id[u]=-1
          let F = {u}
          while F ≠ {}
               v = some vertex ∈ F
               remove v from F
               if id[v] = -1
                    visit v
                    id[v] = cnt
                    for each w ∈ V(G) such that (v,w) ∈ E(G) and id[w] = -1
                         add w to F
          cnt = cnt + 1

We iterate through the vertices one at a time. If, for a given vertex, a component ID has not yet been assigned (it is currently -1), then we assign it the first unused non-negative integer and we do a graph search starting from that vertex, assigning this ID number to all vertices reachable from it. After doing the search, we increment the total number of components found so that the next component will be assigned a higher ID number. But if, when we reach a vertex, a component ID has already been assigned, then we skip that vertex.

Flood fill

The special case known as flood fill is often used as an example application of graph search. The task is to implement an algorithm that will accomplish something like the fill tool in a simple raster graphics editor (such as Paint, which will be familiar to users of Microsoft Windows): given a two-dimensional array of numbers (with each number presumably representing a colour), one location in the array (a pixel), and a new number (a new colour), the objective is to fill the entire contiguous single-valued (monochromatic) area containing the given location with the new number.

In this case, the graph is defined implicitly, and there is no need to explicitly construct it. Each element of the array corresponds to a vertex, which can be identified simply by its row and column indices. Most vertices have degree four (with the exception of those on the boundary); the neighbours' indices can be determined by adding one to or subtracting one from either the row or the column.

The article on depth-first search contains code for the typical recursive DFS implementation of flood fill.

The flood fill algorithm can be used to count the number of "blobs" in a grid, too; every time we encounter a nonempty square, we flood fill it to make it empty, and so remove the entire blob. The total number of times we have to do this is the total number of blobs, since each blob will only be removed once.

Biconnected components

The more complex problem of finding biconnected components (or finding bridges and articulation points) in an undirected graph can be solved with a depth-first search algorithm.

Directed graphs

For directed graphs, the equivalent of connected components are strongly connected components: again, maximal subsets of the vertices such that inside each component any vertex is reachable from any other. The three best-known algorithms for computing these, Tarjan's algorithm, Gabow's algorithm, and Kosaraju's algorithm, all use depth-first searches. The equivalents of articulation points in directed graphs — dominators — are computed using depth-first search too.

Paths and spanning trees

One of the simplest uses of graph search is finding a path from one vertex s to another vertex t. For example, we might want to find a path out of a maze. The code for such an application might look like this:

input G,s,t
for each u ∈ V(G)
     let visited[u] = false
let F = {(s,s)}
while F ≠ {}
     (u,v) = some pair ∈ F
     remove (u,v) from F
     if visited[v] = false
          pred[v] = u
          visited[v] = true
          for each w ∈ V(G) such that (v,w) ∈ E(G) and visited[w] = false
               add (v,w) to F

Here, instead of simply pushing a vertex's neighbour onto the fringe F, we push a pair containing the original vertex and the neighbour. That way, when this pair is removed from F, we know both which vertex we wish to visit and where we "came from" (the predecessor). We record the predecessor in the array visited[]. After this algorithm has terminated, if t is in the same connected component as s, then at some point t will have been visited, and at that time pred[t] will have been set to some vertex which has t as its neighbour. That vertex, in turn, must have been visited, so its predecessor was recorded also, and we can ultimately trace back the path to s. (In this implementation, s is its own predecessor; this is not necessary however.) In other words, in the general case, s, pred[pred[pred[...t...]]], ..., pred[pred[pred[t]]], pred[pred[t]], pred[t], t will be a path from s to t after the algorithm has terminated.

As a matter of fact, this algorithm finds paths from s to all other vertices reachable from s. We can replace t by any other vertex and use the pred[] array to find a path from s to this vertex. Since every vertex except for s has one unique parent (predecessor), the vertices form a tree, with an edge between two vertices if and only if one is the other's predecessor when this algorithm is run. The root of the tree is the only vertex lacking a parent, or s. The algorithm is said to produce a spanning tree of the connected component containing s: a subgraph of that connected component which is also a tree.

Breadth-first search will find the shortest paths from one vertex to all others reachable from it in an unweighted graph; Dijkstra's algorithm and A* will do the same in a weighted graph. Prim's algorithm will find a minimum spanning tree in an undirected graph, a spanning tree whose total weight (the sum of the weights of all its edges) is minimal. Finding a path from one vertex to another is also essential to the augmenting path algorithms, including Dinić's algorithm and those using the Ford-Fulkerson method, for computing a maximum flow.

Topological sort

The problem of topological sorting can be easily solved using graph search (it does not matter if it is breadth-first, depth-first, or anything else). For reference, the problem is to find, in a directed acyclic graph, an ordering of the vertices such that if an edge (u,v) exists then u precedes v in the ordering. To solve this using graph search, we first add all sources to the fringe, because they can be placed first in the ordering, and subsequently we visit each vertex only after all vertices that must precede it have already been visited, adding it to the end of the ordering as it is visited.