The term '''range minimum query (RMQ)''' comprises all variations of the problem of finding the smallest element (or the position of the smallest element) in a contiguous subsequence of a list of items taken from a [[totally ordered set]] (usually numbers). This is one of the most extensively studied problems in computer science, and many algorithms are known, each of which is appropriate for a specific variation.
  
A single range minimum query consists of a pair of indices <math>i < j</math> into an array <math>A</math>; the answer to this query is some <math>k \in [i,j)</math> (see [[half-open interval]]) such that <math>A_k \leq A_m</math> for all <math>m \in [i,j)</math>. In isolation, such a query is answered simply by scanning through the given range and selecting the minimum element, which can take up to linear time. The problem is interesting because we often wish to answer a large number of queries; if, for example, 500000 queries are to be performed on a single array of 10000000 elements, then using this naive approach on each query individually is probably too slow.
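
To make this concrete, a single query can be answered with a direct scan, as in the following sketch (the C++ and the function name are merely illustrative):

<syntaxhighlight lang="cpp">
#include <cstddef>
#include <vector>

// Return the index of a minimum element of A in the half-open range [i, j).
// Assumes 0 <= i < j <= A.size(); runs in O(j - i) time.
std::size_t naive_rmq(const std::vector<int>& A, std::size_t i, std::size_t j) {
    std::size_t k = i;
    for (std::size_t m = i + 1; m < j; ++m)
        if (A[m] < A[k]) k = m;
    return k;
}
</syntaxhighlight>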
+
A single range minimum query is a set of indices <math>i < j</math> into an array <math>A</math>; the answer to this query is some <math>k \in [i,j]</math> such that <math>A_k \leq A_m</math> for all <math>m \in [i,j]</math>. In isolation, this query is answered simply by scanning through the range given and selecting the minimum element, which can take up to linear time. The problem is interesting because we often desire to answer a large number of queries, so that if, for example, 500000 queries are to be performed on a single array of 10000000 elements, then using this naive approach on each query individually is probably too slow.
  
 
==Static==
 
 
===Sliding===
 
 
This problem can be solved in linear time in the special case in which the intervals are guaranteed to be given in such an order that they are successive elements of a [[sliding window]]; that is, each interval given in input neither starts earlier nor ends later than the previous one. This is the [[sliding range minimum query]] problem; an algorithm is given in that article.
 
 
===Naive precomputation===
 
This approach involves precomputing the minimum of every possible range in the given array and storing all the results in a table; this table will use up <math>\Theta(n^2)</math> space but queries will be answerable in constant time. This table can also be computed in <math>\Theta(n^2)</math> time by noting that the minimum of a range <math>A_{i..j-1}</math> occurs either in the last element, <math>A_{j-1}</math>, or in the rest of it, <math>A_{i..j-2}</math>, so that given the minimum in range <math>[i,j-1)</math> we can compute that of <math>[i,j)</math> in constant time.
 
 
Formally, we use the recurrence
 
:<math>\operatorname{RMQ}(i,j) = \begin{cases} A_i & \text{if }i = j-1 \\ \min(\operatorname{RMQ}(i,j-1), A_{j-1}) & \text{otherwise} \end{cases}</math>
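
As a sketch (with illustrative names), the table <code>rmq[i][j]</code> below stores the minimum of the half-open range <math>[i,j)</math> and is filled in using the recurrence above:

<syntaxhighlight lang="cpp">
#include <algorithm>
#include <cstddef>
#include <vector>

// Precompute the minimum of every half-open range [i, j) of A using the
// recurrence above.  rmq[i][j] = min(A[i..j-1]); Theta(n^2) time and space.
std::vector<std::vector<int>> precompute_all(const std::vector<int>& A) {
    std::size_t n = A.size();
    std::vector<std::vector<int>> rmq(n, std::vector<int>(n + 1));
    for (std::size_t i = 0; i < n; ++i) {
        rmq[i][i + 1] = A[i];                      // base case: a single element
        for (std::size_t j = i + 2; j <= n; ++j)   // extend the range by one element
            rmq[i][j] = std::min(rmq[i][j - 1], A[j - 1]);
    }
    return rmq;
}
</syntaxhighlight>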
 
  
 
===Division into blocks===
 
 
In all other cases, we must consider other solutions. A simple solution for this and other related problems involves splitting the array into equally sized blocks (say, <math>m</math> elements each) and precomputing the minimum in each block. This precomputation will take <math>\Theta(n)</math> time, since it takes <math>\Theta(m)</math> time to find the minimum in each block, and there are <math>\Theta(n/m)</math> blocks.
 
  
After this, when we are given some query <math>[a,b)</math>, we note that this can be written as the union of the intervals <math>[a,c_0), [c_0, c_1), [c_1, c_2), ..., [c_{k-1}, c_k), [c_k, b)</math>, where all the intervals except for the first and last are individual blocks. If we can find the minimum in each of these subintervals, the smallest of those values will be the minimum in <math>[a,b)</math>. But because all the intervals in the middle (note that there may be zero of these if <math>a</math> and <math>b</math> are in the same block) are blocks, their minima can simply be looked up in constant time.
  
Observe that the intermediate intervals are <math>O(n/m)</math> in number (because there are only about <math>n/m</math> blocks in total). Furthermore, if we pick <math>c_0</math> at the nearest available block boundary, and likewise with <math>c_k</math>, then the intervals <math>[a,c_0)</math> and <math>[c_k,b)</math> have size <math>O(m)</math> (since they do not cross block boundaries). By taking the minimum of the precomputed minima of the intermediate blocks, together with the elements in <math>[a,c_0)</math> and <math>[c_k,b)</math>, the answer is obtained in <math>O(m + n/m)</math> time. If we choose a block size of <math>m \approx \sqrt{n}</math>, we obtain <math>O(\sqrt{n})</math> overall.
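
The following is a minimal sketch of this square-root decomposition with <math>m \approx \sqrt{n}</math> (the struct and member names are illustrative only):

<syntaxhighlight lang="cpp">
#include <algorithm>
#include <climits>
#include <cmath>
#include <cstddef>
#include <vector>

// Sketch of the block decomposition with block size m ~ sqrt(n).
struct BlockRMQ {
    std::vector<int> A, blockMin;   // blockMin[b] = minimum of block b
    std::size_t m;                  // block size

    explicit BlockRMQ(const std::vector<int>& a) : A(a) {
        m = static_cast<std::size_t>(std::sqrt(static_cast<double>(A.size())));
        if (m == 0) m = 1;
        blockMin.assign((A.size() + m - 1) / m, INT_MAX);
        for (std::size_t i = 0; i < A.size(); ++i)
            blockMin[i / m] = std::min(blockMin[i / m], A[i]);
    }

    // Minimum of the half-open range [a, b), with 0 <= a < b <= A.size().
    int query(std::size_t a, std::size_t b) const {
        int best = INT_MAX;
        while (a < b && a % m != 0)                   // left partial block
            best = std::min(best, A[a++]);
        while (a + m <= b) {                          // whole blocks in the middle
            best = std::min(best, blockMin[a / m]);
            a += m;
        }
        while (a < b)                                 // right partial block
            best = std::min(best, A[a++]);
        return best;
    }
};
</syntaxhighlight>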
  
 
===Segment tree===
 
  
 
===Sparse table===
 
(The name is due to <ref>"Range Minimum Query and Lowest Common Ancestor". (n.d.). Retrieved from http://community.topcoder.com/tc?module=Static&d1=tutorials&d2=lowestCommonAncestor#Range_Minimum_Query_%28RMQ%29</ref>.) At the expense of space and preprocessing time, we can even answer queries in <math>O(1)</math> using [[dynamic programming]]. Define <math>M_{i,k}</math> to be the minimum of the elements <math>A_i, A_{i+1}, ..., A_{i+2^k-1}</math> (or as many of those elements as actually exist); that is, the elements in an interval of size <math>2^k</math> starting from <math>i</math>. Then, we see that <math>M_{i,0} = A_i</math> for each <math>i</math>, and <math>M_{i,k+1} = \min(M_{i,k}, M_{i+2^k,k})</math>; that is, the minimum in an interval of size <math>2^{k+1}</math> is the smaller of the minima of the two halves of which it is composed, of size <math>2^k</math>. Thus, each entry of <math>M</math> can be computed in constant time, and in total <math>M</math> has about <math>n\cdot \log n</math> entries (since values of <math>k</math> for which <math>2^k > n</math> are not useful). Then, given the query <math>[a,b)</math>, simply find <math>k</math> such that <math>[a,a+2^k)</math> and <math>[b-2^k,b)</math> overlap but are contained within <math>[a,b)</math>; then we already know the minima in each of these two sub-intervals, and since they cover the query interval, the smaller of the two is the overall minimum. It's not too hard to see that the desired <math>k</math> is <math>\lfloor\log(b-a)\rfloor</math>; and then the answer is <math>\min(M_{a,k}, M_{b-2^k,k})</math>.
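
A sketch of this sparse table follows (names are illustrative; the table is indexed as <code>M[k][i]</code> rather than <math>M_{i,k}</math>, and queries are on half-open ranges <math>[a,b)</math>):

<syntaxhighlight lang="cpp">
#include <algorithm>
#include <vector>

// Sketch of the sparse table: M[k][i] is the minimum of the 2^k elements
// starting at index i (only stored when the whole interval fits in the array).
struct SparseTable {
    std::vector<std::vector<int>> M;
    std::vector<int> lg;   // lg[d] = floor(log2(d))

    explicit SparseTable(const std::vector<int>& A) {
        int n = static_cast<int>(A.size());
        lg.assign(n + 1, 0);
        for (int d = 2; d <= n; ++d) lg[d] = lg[d / 2] + 1;
        int K = lg[n] + 1;
        M.assign(K, std::vector<int>(n));
        M[0] = A;
        for (int k = 0; k + 1 < K; ++k)
            for (int i = 0; i + (1 << (k + 1)) <= n; ++i)
                M[k + 1][i] = std::min(M[k][i], M[k][i + (1 << k)]);
    }

    // Minimum of the half-open range [a, b), with a < b; answered in O(1).
    int query(int a, int b) const {
        int k = lg[b - a];   // largest k with 2^k <= b - a
        return std::min(M[k][a], M[k][b - (1 << k)]);
    }
};
</syntaxhighlight>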
  
 
===Cartesian trees===
 
 
We can get the best of both worlds&mdash;that is, constant query time and linear preprocessing time and space&mdash;but the algorithm is somewhat more involved. It combines the block-based approach, the sparse table approach, and the use of [[Cartesian tree]]s.
 
The key fact is that the minimum of <math>A_{i..j}</math> is the label of the lowest common ancestor (LCA), in the Cartesian tree of <math>A</math>, of the nodes <math>u</math> and <math>v</math> that correspond to positions <math>i</math> and <math>j</math>, respectively.
 
''Proof'': An inorder traversal of the Cartesian tree gives the original array. Thus, consider the segment <math>S</math> of the inorder traversal beginning when <math>u</math> is visited and ending when <math>v</math> is visited (<math>u</math> and <math>v</math> are as above); this must be equivalent to the segment of the array <math>A_{i..j}</math>. Now, if the LCA of <math>u</math> and <math>v</math> is <math>u</math>, then <math>v</math> is in its right subtree; but <math>S</math> must then contain only <math>u</math> and elements from <math>u</math>'s right subtree, since all nodes in <math>u</math>'s right subtree (including <math>v</math>) are visited immediately after <math>u</math> itself; so that all nodes in <math>S</math> are descendants of <math>u</math>. Likewise, if <math>v</math> is the LCA, then <math>S</math> consists entirely of descendants of <math>v</math>. If the LCA is neither <math>u</math> nor <math>v</math>, then <math>u</math> must occur in its left subtree and <math>v</math> in its right subtree (because if they both occurred in the same subtree, then the root of that subtree would be a lower common ancestor, a contradiction). But all elements in the left subtree of the LCA, including <math>u</math>, are visited immediately before the LCA, and all elements in the right subtree, including <math>v</math>, are visited immediately after the LCA, and hence, again, all nodes in <math>S</math> are descendants of the LCA. Now, the labels on the nodes in <math>S</math> correspond to the elements <math>A_i, ..., A_j</math>, and all nodes in <math>S</math> have labels greater than or equal to the label of the LCA, since Cartesian trees are min-heap-ordered; so it follows that the label of the LCA is the minimum element in the range. <math>_{\blacksquare}</math>
 
  
Cartesian trees may be constructed in linear time and space, so we are within our <math>O(n)</math> preprocessing bound so far; we just need to solve the LCA problem on the Cartesian tree. We will do this, curiously enough, by reducing it back to RMQ (with linear preprocessing time) using the technique described in the [[Lowest common ancestor]] article. However, the array derived from the LCA-to-RMQ reduction (which we'll call <math>B</math>) has the property that any two adjacent elements differ by +1 or -1. We now focus on how to solve this restricted form of RMQ with linear preprocessing time and constant query time.
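
As an illustration of the claim that a Cartesian tree can be built in linear time, here is a sketch of the usual stack-based construction (the names are illustrative, and only parent links are returned, with -1 marking the root):

<syntaxhighlight lang="cpp">
#include <vector>

// Stack-based construction of a min-heap-ordered Cartesian tree in O(n):
// each index is pushed and popped at most once.  parent[i] is the parent of
// node i, or -1 for the root.
std::vector<int> build_cartesian_tree(const std::vector<int>& A) {
    int n = static_cast<int>(A.size());
    std::vector<int> parent(n, -1), st;
    for (int i = 0; i < n; ++i) {
        int last = -1;
        while (!st.empty() && A[st.back()] > A[i]) {   // pop larger elements
            last = st.back();
            st.pop_back();
        }
        if (last != -1) parent[last] = i;              // last popped node becomes i's left child
        if (!st.empty()) parent[i] = st.back();        // i becomes the right child of the stack top
        st.push_back(i);
    }
    return parent;
}
</syntaxhighlight>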
 
First, divide <math>B</math> into blocks of size roughly <math>m = \frac{\log n}{2}</math>. Find the minimum element in each block and construct an array <math>C</math> such that <math>C_0</math> is the minimum in the leftmost block, <math>C_1</math> in the second-leftmost block, and so on. Construct a sparse table from array <math>C</math>. Now, if we are given a range consisting of any number of consecutive blocks in <math>B</math>, then the minimum of this range is given by the minimum of the corresponding minima in <math>C</math>&mdash;and, since we have constructed the sparse table of <math>C</math>, this query can be answered in constant time. Array <math>C</math> has size <math>n/m = \frac{n}{(\log n)/2} = \frac{2n}{\log n}</math>, so its sparse table is computed using time and space <math>O\left(\frac{n}{m} \log \frac{n}{m}\right) \subseteq O\left(\frac{n}{m} \log {2n}\right) = O\left(\frac{2n}{\log n} \log{2n}\right) = O(n)</math>.
 
Now consider an individual block of <math>B</math>. If we fix the first element of this block, then there are <math>2^{m-1} = O(2^m) = O(\sqrt{n})</math> possible combinations of values for the entire block, since with each following element we have a choice between two alternatives (it is either greater or less than the previous element by 1). Say that two blocks have the same ''kind'' if their corresponding elements differ by a constant (''e.g.'', [2,1,2,3,4] and [0,-1,0,1,2] are of the same kind). Hence there are only <math>O(\sqrt{n})</math> different kinds of blocks; and, furthermore, all blocks of a given kind have their minimum in the same position, regardless of the actual values of the elements. Therefore, for each kind of block, we naively precompute all possible range minimum queries for a block of that kind. Each block has size <math>O\left(\frac{\log n}{2}\right)</math>, so this precomputation uses <math>O\left(\frac{\log^2 n}{4}\right)</math> space and time per kind; and since there are only <math>O(\sqrt{n})</math> kinds of blocks, the whole precomputation stage uses <math>O\left(\sqrt{n}\frac{\log^2 n}{4}\right) \subseteq O(n)</math> space and time. Finally, for each block in <math>B</math>, compute its kind, and store the result in another auxiliary array <math>D</math>. (This will use linear time and <math>O(n/m)</math> space.)
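
One way to identify the kind of a block is to encode its sequence of +1/-1 steps as a bitmask, as in the following sketch (names are illustrative, and the final block, which may be shorter than <math>m</math>, is assumed to be handled separately):

<syntaxhighlight lang="cpp">
#include <vector>

// Encode the "kind" of a +1/-1 block of length m starting at B[start]:
// bit t-1 is set exactly when the step from position t-1 to t is +1.
// There are at most 2^(m-1) distinct kinds.
int block_kind(const std::vector<int>& B, int start, int m) {
    int kind = 0;
    for (int t = 1; t < m; ++t)
        if (B[start + t] > B[start + t - 1])
            kind |= 1 << (t - 1);
    return kind;
}

// For one representative block of a kind, precompute the offset of the
// minimum for every in-block range [i, j] (O(m^2) per kind).  All blocks of
// the same kind share this table, since their minima sit at the same offsets.
std::vector<int> in_block_minima(const std::vector<int>& B, int start, int m) {
    std::vector<int> table(m * m);
    for (int i = 0; i < m; ++i) {
        int best = i;
        for (int j = i; j < m; ++j) {
            if (B[start + j] < B[start + best]) best = j;
            table[i * m + j] = best;   // offset of the minimum of in-block range [i, j]
        }
    }
    return table;
}
</syntaxhighlight>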
 
Now, we use the block-based approach to answering a query. We divide the given interval into at most three subintervals. The first extends from the initial element to the last element of its block; the second spans zero or more complete blocks; and the third begins at the first element of its block and ends at the final element of the given interval. The minimum element lies in one of these three subintervals. To find the minimum in the first subinterval, we look up the kind of the block containing it (using array <math>D</math>), then look up the precomputed position of the minimum in the desired range for that kind of block, and finally look up the value at that position in <math>B</math>. We do the same for the third subinterval. For the second subinterval, we use the sparse table lookup discussed two paragraphs above. The query is answered in constant time overall.
 
==Dynamic==
The block-based solution handles the dynamic case as well; we must simply remember, whenever we update an element, to recompute the minimum element in the block it is in. This gives <math>O(\sqrt{n})</math> time per update, and, assuming a uniform random distribution of updates, the expected update time is <math>O(1)</math>. This is because if we decrease an element, we need only check whether the new value is less than the current minimum (constant time), whereas if we increase an element, we only need to recompute the minimum if the element updated was the minimum before (which takes <math>O(\sqrt{n})</math> time but has a probability of occurring of only <math>O(1/m) = O(1/\sqrt{n})</math>). Unfortunately, the query still has average-case time <math>O(\sqrt{n})</math>.
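
A sketch of such an update (the array and parameter names are illustrative; <code>blockMin</code> and the block size <code>m</code> are as in the block decomposition above):

<syntaxhighlight lang="cpp">
#include <algorithm>
#include <climits>
#include <cstddef>
#include <vector>

// Point update for the block decomposition: set A[i] = v and keep blockMin
// consistent.  A full rescan of a block costs O(m), but it is only needed
// when the old block minimum may have been destroyed.
void update(std::vector<int>& A, std::vector<int>& blockMin, std::size_t m,
            std::size_t i, int v) {
    std::size_t b = i / m;
    int old = A[i];
    A[i] = v;
    if (v <= blockMin[b]) {
        blockMin[b] = v;                    // the new value is a (possibly new) minimum: O(1)
    } else if (old == blockMin[b]) {
        std::size_t lo = b * m, hi = std::min(A.size(), lo + m);
        blockMin[b] = INT_MAX;              // the old minimum increased; rescan the block
        for (std::size_t k = lo; k < hi; ++k)
            blockMin[b] = std::min(blockMin[b], A[k]);
    }
    // Otherwise neither the old nor the new value affects the block minimum.
}
</syntaxhighlight>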
 
The [[segment tree]] can be computed in linear time and allows both queries and updates to be answered in <math>O(\log n)</math> time. It also allows, with some cleverness, entire ranges to be updated at once (efficiently). Analysis of the average case is left as an exercise to the reader.
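
A minimal sketch of such a segment tree, supporting point updates and range minimum queries, is given below (names are illustrative; lazy propagation for range updates is omitted):

<syntaxhighlight lang="cpp">
#include <algorithm>
#include <climits>
#include <cstddef>
#include <vector>

// Iterative segment tree for range minimum with point updates.
// Building takes O(n); each update and query takes O(log n).
struct SegTree {
    std::size_t n;
    std::vector<int> t;   // internal nodes in t[1..n-1], leaves in t[n..2n-1]

    explicit SegTree(const std::vector<int>& A) : n(A.size()), t(2 * A.size(), INT_MAX) {
        std::copy(A.begin(), A.end(), t.begin() + n);
        for (std::size_t i = n; i-- > 1; )          // fill internal nodes bottom-up
            t[i] = std::min(t[2 * i], t[2 * i + 1]);
    }

    void update(std::size_t i, int v) {             // set A[i] = v
        for (t[i += n] = v; i > 1; i /= 2)
            t[i / 2] = std::min(t[i], t[i ^ 1]);
    }

    int query(std::size_t a, std::size_t b) const { // minimum of the half-open range [a, b)
        int best = INT_MAX;
        for (a += n, b += n; a < b; a /= 2, b /= 2) {
            if (a & 1) best = std::min(best, t[a++]);
            if (b & 1) best = std::min(best, t[--b]);
        }
        return best;
    }
};
</syntaxhighlight>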
 
We can also use any balanced binary search tree (or a dictionary data structure such as a skip list) and augment it to support range minimum queries, with <math>O(\log n)</math> time per update (insertion or deletion) as well as per query.
  
 
==References==
 
 
Much of the information in this article was drawn from a single source:
 
 
<references/>
 
 
[[Category:Pages needing code]]
 
