'''Kruskal's algorithm''' is a general-purpose algorithm for the [[minimum spanning tree]] problem, based on the [[disjoint sets data structure]]. The existence of very simple algorithms to maintain disjoint sets in almost [[Asymptotic analysis|constant time]] gives rise to simple implementations of Kruskal's algorithm whose running times are close to linear, usually outperforming [[Prim's algorithm]] in sparse graphs.
 
==Theory of the algorithm==
 
Kruskal's may be characterized as a [[greedy algorithm]], which builds the MST one edge at a time. As befits an MST algorithm, the greedy strategy is to continually add the remaining edge of lowest weight. Unlike Prim's, however, Kruskal's adds edges without regard to the connectivity of the partially built MST. Throughout the following sections we shall assume that a spanning tree exists, that is, that the input graph is connected. (If you find these sections too difficult, skip them.)
 
===Lemma===
 
<p>Suppose that a spanning tree <math>T</math> of some graph <math>G</math> is given. Then the addition of any edge of <math>G</math> not in <math>E(T)</math> to <math>E(T)</math>, followed by the removal of any edge from the resulting cycle, yields a spanning tree of <math>G</math>.</p>
 
<p>''Proof'': Before the operation, the number of vertices of <math>T</math> is one more than the number of edges. After the operation, this is again true. As the addition of the new edge generates exactly one simple cycle, there are no longer any cycles after an edge on this cycle is removed. So the new <math>T</math> has a vertex count which exceeds its edge count by one and contains no cycles; it must therefore be a tree. Since its vertex set is still <math>V(G)</math>, it is a spanning tree of <math>G</math>.</p>
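
<p>For example, if <math>G</math> is a triangle on the vertices <math>a, b, c</math> and <math>T</math> consists of the edges <math>ab</math> and <math>bc</math>, then adding <math>ca</math> creates the cycle <math>a-b-c-a</math>, and removing any one of that cycle's three edges (say <math>ab</math>) leaves another spanning tree of <math>G</math>.</p>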
 
===The algorithm===
 
<p>We are now ready to present the algorithm. We begin with no knowledge of the edges of the MST, and add them one by one until the MST is complete. To do so we consider edges in increasing order of weight. When considering an edge, if adding it would create a cycle, we skip it; otherwise we add it. Once <math>V-1</math> edges have been added, where <math>V</math> is the number of vertices, we have constructed an MST. Kruskal's is both <i>correct</i> (Theorem 2) and <i>complete</i> (Theorem 1).</p>
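
<p>For example, consider the graph on the vertices <math>A, B, C, D</math> with edges <math>AB</math>, <math>BC</math>, <math>AC</math>, <math>CD</math> of weights 1, 2, 3, and 4, respectively. Kruskal's adds <math>AB</math>, then <math>BC</math>, skips <math>AC</math> (which would close the cycle <math>A-B-C-A</math>), and finally adds <math>CD</math>. Now <math>V-1=3</math> edges have been added, and the MST, of total weight 7, is complete.</p>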
 
====Remark====
 
If adding an edge to the partially built MST generates a cycle, it will also do so if the edge is added to the partially built MST later on in the algorithm, since we only add edges and never remove them. The contrapositive is also true: if adding an edge does not generate a cycle, it wouldn't have generated a cycle if it were added earlier, either.
 
====Theorem 1====
 
<p>Kruskal's algorithm will never fail to find a spanning tree in a connected graph.</p>
 
<p>''Proof'': By contradiction. Assume that all edges have been considered and the partially built tree is still not complete. Then, since the graph is connected, there exists some edge that connects two vertices in different connected components of the partially built tree. But this edge must have been considered at some point and discarded, as adding it would have created a cycle. This is a contradiction: adding it ''now'' would not create a cycle, since its endpoints lie in different components, and so, ''per'' the Remark above, adding it earlier could not have created a cycle either.</p>
 
====Theorem 2====
 
<p>Kruskal's algorithm will never produce a non-minimal spanning tree.</p>
 
<p>''Proof'': Let <math>T</math> be the spanning tree produced by Kruskal's, and among all minimum spanning trees of the graph, let <math>T^*</math> be one sharing as many edges as possible with <math>T</math>. Assume, for the sake of contradiction, that <math>T \neq T^*</math>, and let <math>e</math> be the first edge added by Kruskal's that is not in <math>E(T^*)</math>. Adding <math>e</math> to <math>T^*</math> creates a cycle, and this cycle must contain some edge <math>f \notin E(T)</math>, as otherwise <math>T</math> would contain a cycle. Now, <math>f</math> cannot weigh less than <math>e</math>: if it did, Kruskal's would have considered <math>f</math> before <math>e</math>, and since the edges added up to that point all belong to <math>E(T^*)</math>, as does <math>f</math>, adding <math>f</math> could not have created a cycle, so <math>f</math> would have been added, contradicting <math>f \notin E(T)</math>. By the Lemma, exchanging <math>f</math> for <math>e</math> in <math>T^*</math> yields a spanning tree; its weight does not exceed that of <math>T^*</math>, so it too is minimal, yet it shares one more edge with <math>T</math> than <math>T^*</math> does. This contradicts the choice of <math>T^*</math>; hence <math>T = T^*</math>, and <math>T</math> is minimal.</p>
  
==Implementation==
 
We have so far glossed over a crucial detail: we must have a means of efficiently deciding whether an edge can be added without generating a cycle, and of adding that edge if it can. To do so, we note that a cycle is created if and only if the two endpoints of the edge are in the same connected component. We could, of course, answer this query through any [[graph search]] algorithm such as [[Depth-first search|DFS]] or [[Breadth-first search|BFS]]. However, as the partially built tree has up to <math>V-1</math> edges, each such search could take <math>\mathcal{O}(V)</math> time, making the algorithm <math>\mathcal{O}(EV)</math> overall, which is undesirable. Instead, we notice that a data structure [[Disjoint sets data structure|already exists]] which can efficiently identify the component containing a vertex and add new edges (joining together components). Assuming that we know how to implement it, then, here is Kruskal's algorithm:
 
<pre>
input graph G
let T be a graph with the same vertices as G and no edges
sort the edges of G in increasing order of weight
for each edge (u,v), in sorted order
     if find(u) ≠ find(v)
          add (u,v) to E(T)
          union(u,v)
</pre>
 
It would make sense to stop the loop after <math>V-1</math> edges have been added (instead of processing every edge, which is often unnecessary). Nevertheless, this does not affect the asymptotic complexity (see Analysis below).
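
For concreteness, here is one way the above might be realized in C++. This is a sketch rather than a canonical implementation: the input format (first <math>V</math> and <math>E</math>, then one <code>u v w</code> triple per edge, with vertices numbered from 0) and the identifiers are assumed for illustration. The disjoint set structure uses path compression and union by rank, which together give the almost-constant amortized cost per operation cited in the Analysis section.

<pre>
#include <algorithm>
#include <cstdio>
#include <vector>

// Disjoint set (union-find) structure with path compression and
// union by rank; each operation runs in almost-constant amortized time.
struct DisjointSets {
    std::vector<int> parent, rank_;
    DisjointSets(int n) : parent(n), rank_(n, 0) {
        for (int i = 0; i < n; i++)
            parent[i] = i;                 // each vertex starts in its own set
    }
    int find(int x) {                      // representative of x's component
        return parent[x] == x ? x : parent[x] = find(parent[x]);
    }
    bool unite(int x, int y) {             // merge; returns false if already joined
        x = find(x);
        y = find(y);
        if (x == y) return false;
        if (rank_[x] < rank_[y]) std::swap(x, y);
        parent[y] = x;
        if (rank_[x] == rank_[y]) rank_[x]++;
        return true;
    }
};

struct Edge { int u, v, w; };

bool byWeight(const Edge& a, const Edge& b) { return a.w < b.w; }

int main() {
    int V, E;
    std::scanf("%d %d", &V, &E);
    std::vector<Edge> edges(E);
    for (int i = 0; i < E; i++)
        std::scanf("%d %d %d", &edges[i].u, &edges[i].v, &edges[i].w);

    std::sort(edges.begin(), edges.end(), byWeight);  // increasing weight

    DisjointSets ds(V);
    long long total = 0;                   // weight of the MST built so far
    int added = 0;
    for (int i = 0; i < E; i++) {
        // unite() succeeds exactly when the edge joins two components,
        // i.e. when adding it creates no cycle
        if (ds.unite(edges[i].u, edges[i].v)) {
            total += edges[i].w;
            if (++added == V - 1) break;   // MST complete; stop early
        }
    }
    std::printf("%lld\n", total);          // report the total MST weight
    return 0;
}
</pre>

Here <code>unite</code> serves as both the cycle test and the merge, so each edge costs only a constant number of union-find operations; the loop also breaks once <math>V-1</math> edges have been added, as suggested above. To output the tree itself rather than its weight, one would simply record each edge accepted by <code>unite</code>.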
 
==Analysis==
 
The usual implementation of Kruskal's sorts the edges by weight first, which takes <math>\mathcal{O}(E\lg E)</math> time. Following this, <math>\mathcal{O}(E)</math> union-find operations are performed. As each can be done in almost-constant time (see [[Disjoint set data structure|the article itself]]), this step requires approximately <math>\mathcal{O}(E)</math> time to complete. The cost of the sort then dominates the running time. As typical implementations of [[quicksort]], usually used here, are faster than those of [[heapsort]], which may be said to operate implicitly in [[Prim's algorithm]], a well-coded Kruskal's will outperform Prim's in sparse graphs. In dense graphs, where <math>E</math> approaches <math>V^2</math>, Kruskal's running time approaches <math>\mathcal{O}(V^2 \lg V)</math>, and Prim's non-heap implementation, taking <math>\mathcal{O}(E+V^2)</math> time, is likely to outperform it. In-place quicksort requires only logarithmic stack space in expectation, and the disjoint set data structure takes linear extra space, so Kruskal's has a linear memory complexity.
 
==References==
 
* Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. ''Introduction to Algorithms'', Second Edition. MIT Press and McGraw-Hill, 2001. ISBN 0-262-03293-7. Section 23.2: The algorithms of Kruskal and Prim, pp.567&ndash;574.
 
[[Category:Algorithms]]
 
[[Category:Graph theory]]
 