Editing Segment tree

Jump to: navigation, search

Warning: You are not logged in. Your IP address will be publicly visible if you make any edits. If you log in or create an account, your edits will be attributed to your username, along with other benefits.

The edit can be undone. Please check the comparison below to verify that this is what you want to do, and then save the changes below to finish undoing the edit.
Latest revision Your text
Line 1: Line 1:
The '''segment tree''' is a highly versatile [[data structure]], based upon the [[Divide and conquer|divide-and-conquer]] paradigm, which can be thought of as a tree of intervals of an underlying array, constructed so that queries on ranges of the array as well as modifications to the array's elements may be efficiently performed.
+
The '''segment tree''' is a highly versatile data structure, based upon the [[Divide and conquer|divide-and-conquer]] paradigm, which can be thought of as a tree of intervals of an underlying array, constructed so that queries on ranges of the array as well as modifications to the array's elements may be efficiently performed.
  
 
==Motivation==
 
==Motivation==
Line 21: Line 21:
 
==Structure==
 
==Structure==
 
[[File:Segtree_92631507.png|200px|thumb|right|This segment tree.]]
 
[[File:Segtree_92631507.png|200px|thumb|right|This segment tree.]]
Suppose that we use the function defined above to evaluate <math>f(1,N)</math>, where <math>N</math> is the number of elements in the array. When <math>N</math> is large, this recursive call has two "children", one of which is the recursive call <math>f\left(1,\left\lfloor\frac{N+1}{2}\right\rfloor\right)</math>, and the other one of which is <math>f\left(\left\lfloor\frac{N+1}{2}\right\rfloor+1,N\right)</math>. Each of these children will then have two children of its own, and so on, down until the base case is reached. If we represent these recursive calls with a tree structure, the call <math>f(1,N)</math> would be the root, it would have two children, each child would have two more children, and so on; the base cases would be the leaves of the tree. We are now ready to specify the structure of the segment tree:
+
Suppose that we use the function defined above to evaluate <math>f(1,N)</math>, where <math>N</math> is the number of elements in the array. When <math>N</math> is large, this recursive call has two "children", one of which is the recursive call <math>f(1,\lfloor\frac{N+1}{2}\rfloor)</math>, and the other one of which is <math>f(\lfloor\frac{N+1}{2}\rfloor+1,N)</math>. Each of these children will then have two children of its own, and so on, down until the base case is reached. If we represent these recursive calls with a tree structure, the call <math>f(1,N)</math> would be the root, it would have two children, each child would have two more children, and so on; the base cases would be the leaves of the tree. We are now ready to specify the structure of the segment tree:
 
* it is a binary tree which represents some underlying array;
 
* it is a binary tree which represents some underlying array;
 
* each node is associated with some interval of the array and contains the value(s) of one or more functions of the elements in that interval;
 
* each node is associated with some interval of the array and contains the value(s) of one or more functions of the elements in that interval;
Line 100: Line 100:
 
           return query_rec(1,1,N,begin,end)
 
           return query_rec(1,1,N,begin,end)
 
</pre>
 
</pre>
==Variations==
 
The segment tree can be adapted to retrieve not only the minimum element in a range but also various other functions. Here are some examples taken from otherwise difficult contest problems:
 
===Maximum===
 
Analogously to the minimum: all <code>min</code> operations replaced by <code>max</code>.
 
===Sum or product===
 
Each non-leaf node contains the sum of the values at its children. For example, in the pseudocode above, all <code>min</code> operations would be replaced by addition operations. However, in the case of sums, the segment tree is superseded by the [[binary indexed tree]], which is smaller (in terms of space), faster, and simpler to code. Product can be implemented in the same way, with multiplication in place of the addition.
 
===Maximum/minimum prefix/suffix sum===
 
<p>A ''prefix'' of a range consists of the first <math>k</math> elements in that range (<math>k</math> may be zero); a ''suffix'' is analogously defined for the last <math>k</math> elements. The ''maximal sum prefix'' of a range is the prefix with maximal sum (where the empty range has sum zero). That maximal sum is known as the ''maximum prefix sum'' (''minimum prefix sum'' defined analogously). We would like to be able to query maximum prefix sum in an interval efficiently. For example, the maximum prefix sum in <math>[1,-2,3,-4]</math> is 2, the sum of the prefix <math>[1,-2,3]</math>; no other prefix has a higher sum.</p>
 
 
<p>To solve this problem with a segment tree, we let each node store two functions of its associated interval: the maximum prefix sum and the total sum (of all elements in the interval). The total sum at a non-leaf node is the sum of the total sums at the two children. To find the max prefix sum at a non-leaf node, we notice that the maximal sum prefix ends either within the left child interval or within the right; in the former case, we simply take the maximal prefix sum from the left child, whereas in the latter we add the total sum from the left child and the maximal prefix sum from the right. The maximum of these two values then gives our answer. The query operation is similar, but with potentially more than two adjacent intervals (in which the optimal location for the end of the maximal sum prefix might lie in any one of them).</p>
 
===Maximum/minimum subvector sum===
 
This problem asks for the maximal sum of any sub-interval of a given interval. That is, it is similar to the max prefix sum problem explained in the previous section, but the beginning of the sub-interval is not anchored to the beginning of the given interval. (The maximal subvector sum in <math>[1,-2,3,-4]</math> is 3, the sum of the sub-interval <math>[3]</math>.) Each node now has to store four pieces of information about its associated interval: the max prefix sum, the max suffix sum, the max subvector sum, and the total sum. Details are left as an exercise for the reader.
 
==="Stabbing query"===
 
<p>In this problem, the tree contains a dynamic set of intervals (that is, we may add or remove intervals) over a static set of endpoints (that is, we can only add intervals both of whose endpoints lie in a pre-determined set) lying on the one-dimensional number line. A query is the following question: ''how many intervals in the set contain the following point?''</p>
 
<p>To solve this problem, we first sort the endpoint set by coordinate (plus and minus infinity may be considered endpoints too, to simplify implementation details). The interval delimited by each pair of adjacent endpoints is assigned one leaf node in the segment tree, and a non-leaf node is associated with the union of the intervals at the children, as usual. When we add an interval to the tree, we increment the values (initially zero) stored at each of the nodes that would normally be selected if querying that interval (marked in yellow in the illustration); when we remove that interval, we decrement those values. To answer a query, we first locate (by [[binary search]]) the leaf node whose interval contains the query point, and then we add the counts stored at all nodes on the path from that leaf to the root. (In some sense, then, the natures of the ''query'' and ''update'' operations are exchanged.)</p>
 
<p>Retrieving the intervals themselves is not difficult; the count field in each node is replaced by a list of intervals stored at that node; an update consists of adding/removing intervals to/from lists, rather than incrementing/decrementing counts, and the query consists of outputting the contents of all lists encountered on the way from the leaf to the root, rather than the sum of the counters.</p>
 
 
==Extension to two or more dimensions==
 
<p>The segment tree data structure is not restricted to solving problems on one-dimensional arrays; in principle, it may be used on arrays of any dimension (time and space considerations, though, become more and more pressing in higher dimensions). Intervals are replaced by boxes. Hence, in a two-dimensional array, we might query the minimum element in a ''box'', the Cartesian product of two intervals (for example, the set of vertices with row coordinate between 1 and 3, inclusive, and column coordinate between 2 and 5, inclusive).</p>
 
<p>For the two-dimensional segment tree, we start with a one-dimensional segment tree in which each leaf represents a row of the array and other nodes represent contiguous collections of rows. Now, each node is itself a one-dimensional segment tree, in which a leaf node represents the intersection of a single column with the row set, and other nodes represent larger boxes, still contained within the row set. For example, a segment tree on a 4&times;4 array would have nodes for the row sets <math>[1,4], [1,2], [3,4], [1,1], [2,2], [3,3], [4,4]</math>, exactly as for a one-dimensional segment tree. The node for <math>[3,4]</math>, for example, would itself be a tree on the boxes <math>[3,4]\times[1,4], [3,4]\times[1,2], [3,4]\times[3,4], [3,4]\times[1,1], [3,4]\times[2,2], [3,4]\times[3,3], [3,4]\times[4,4]</math>.</p>
 
<p>The application of higher-dimensional segment trees to geometrical problems is more complex and is discussed in a separate article.</p>
 
 
==Lazy propagation==
 
<p>For certain types of segment trees, ''range updates'' are possible. Consider, for example, a variation in the range minimum query problem in which we can not only update individual elements but also request that every element in a given range be set to the same specified value. This would be known as a ''range update''. ''Lazy propagation'' is a technique that allows range updates to be carried out with the same asymptotic time complexity, <math>\mathcal{O}(\lg N)</math>, as individual updates.</p>
 
<p>The technique works as follows: each node contains an additional ''lazy'' field, which will be used for temporary storage. When this field is not being used, its value will be set to <math>+\infty</math>. When updating a range, we select the same set of nodes that we would select if querying that range, and update their lazy fields to the desired value (that is, we set them to the new value if the new value is lower), except for leaf nodes, whose values are immediately set to the new value. When a query or update operation requires us to access a proper descendant of any node whose ''lazy'' field is set (''i.e.'' not <math>+\infty</math>), we "push" the lazy field onto the two children: that is, we update the children's lazy fields according to the parent's and reset the parent's to <math>+\infty</math>. If, however, we access the node without accessing any of its proper descendants, and the lazy field is set, we simply use the value of the lazy field as the minimum for that node's associated interval.</p>
 
 
[[Category:Data structures]]
 

Please note that all contributions to PEGWiki are considered to be released under the Attribution 3.0 Unported (see PEGWiki:Copyrights for details). If you do not want your writing to be edited mercilessly and redistributed at will, then do not submit it here.
You are also promising us that you wrote this yourself, or copied it from a public domain or similar free resource. Do not submit copyrighted work without permission!

Cancel | Editing help (opens in new window)