Topological sort

From PEGWiki
Revision as of 18:46, 18 March 2011 by Brian (Talk | contribs) (Created page with "The '''topological sorting problem''' is the problem of determining a '''topological ordering''' of a digraph's vertices, that is, a list of the vertices such that an edge never ...")

(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to: navigation, search

The topological sorting problem is the problem of determining a topological ordering of a digraph's vertices, that is, a list of the vertices such that an edge never exists from a vertex later in the list to a vertex earlier in the list.

Motivation

Topological sorting is perhaps best understood as the problem of dependency resolution. Examples are given here:

  • To earn a degree, a student has to complete certain required courses. However, these cannot be taken in just any order; some of these courses are prerequisites for others. For example, the Introduction to Algorithms in Bioinformatics course may depend on the Introduction to Algorithms course, which may depend on the Introduction to Data Structures course, which may in turn depend on the Introduction to Programming course. This is a dependency relationship; taking later courses depends on taking prerequisite earlier courses. We can construct a graph in which each vertex is a course and a directed edge exists from u to v if and only if the course corresponding to u is a prerequisite for the course corresponding to v. Topologically sorting this graph will give a possible order in which the courses may be taken (although it cannot, of course, be used to solve the more serious problem of scheduling conflicts).
  • Large software applications are usually heavily modularized, where, generally, each module is a chunk of code of manageable size containing a simple group of related functions stored in a single file or a small number of files. These modules may depend on simpler modules, which may in turn depend on simpler modules, and so on down. For example, the compiler gcc may depend on the arithmetic library gmp, which may depend on the macro language m4, which may depend on the C standard library libc. We cannot compile a module until all its dependencies have already been compiled. If each module is a vertex and an edge exists from one vertex to another if and only if the latter's module depends on the former's, then topologically sorting this graph gives a valid order in which we may compile all the modules. (The program /usr/bin/tsort was provided on many UNIX-based systems exactly for this purpose; it has been superseded by make, but it still exists on some modern systems.)
  • In some video games, each level consists of exploring a particular part of the world map. There is no preset order on how to explore the map, but some areas may not be explored right away because they depend on "earlier" areas having been explored (and presumably "beaten"). We let each area of the map be a vertex and let there be an edge from one vertex to another if the former's area must be explored before the latter's. A topological ordering of this graph corresponds to a valid order in which the map may be explored.

Existence of a solution

We first note that being a DAG is necessary for a digraph to have a topological ordering. This is not hard to see; if the graph contains a cycle, v_1 v_2 ... v_n v_1, then v_2 must occur after v_1 in the topological ordering, v_3 must occur after v_2, and so on, but then v_1 must occur after v_1, leading to the absurd conclusion that v_1 must occur after itself in the list. Intuitively, if a cyclic dependency exists among a set of tasks, none of those tasks can ever be executed.

The next question is whether this condition is also sufficient. That is, is it true that a digraph may be topologically ordered if and only if it is a DAG? The answer is yes.

Theorem: Every DAG may be topologically ordered.

Proof: By induction on the number of vertices. Clearly, a DAG with one vertex may be topologically ordered; the list in this case consists only of the vertex itself. Now, suppose that any DAG with n vertices may be ordered, and now consider any DAG with n+1 vertices. Recall that every DAG has at least one source. We let this be the first entry in our list. The subgraph left behind, which has n vertices, is still acyclic; obviously cycles cannot be introduced by removing edges. By the inductive hypothesis, we can topologically order this subgraph. Then, we append the topological ordering of this subgraph to the end of our current list, to obtain a list of vertices of the original n+1-vertex graph with the source in the first position. Now, given any two vertices u and v on the list, where, without loss of generality, u precedes v, either u is the first entry (the source), in which case there is obviously no v-u edge, or u and v are both not the first entry, in which case they must be properly ordered as they belong to the topologically ordered tail of the list. Therefore this new list of n+1 vertices is a topological ordering of the n+1-vertex graph. _\blacksquare

Uniqueness

In general, the topological ordering is not unique; there may be several different orders in which the graph may be traversed that satisfy the topological ordering property. It is, however, unique if and only if the partial order represented by the DAG is in fact a total order; that is, when the DAG has a Hamiltonian path, or, equivalently, the transitive closure of the DAG is the complete graph.