Editing Convex hull trick

The '''convex hull trick''' is a technique (perhaps best classified as a data structure) used to determine ''efficiently'', after preprocessing, which member of a set of linear functions in one variable attains an extremal value for a given value of the independent variable. It has little to do with [[convex hull]] algorithms.

==History==
The convex hull trick is perhaps best known in algorithm competitions from being required to obtain full marks in several USACO problems, such as [http://tjsct.wikidot.com/usaco-mar08-gold MAR08 "acquire"], which began to be featured in national olympiads after its debut in the IOI '02 task {{problem|ioi0221|Batch Scheduling}}, which itself credits the technique to a 1995 paper (see references). Very few online sources mention it, and almost none describe it. It has been suggested (van den Hooff, 2010) that this is because the technique is "obvious" to anybody who has learned the [[sweep line algorithm]] for the [[line segment intersection problem]].

==The problem==
<p>Suppose that a large set of linear functions in the form <math>y = m_i x + b_i</math> is given along with a large number of queries. Each query consists of a value of <math>x</math> and asks for the minimum value of <math>y</math> that can be obtained if we select one of the linear functions and evaluate it at <math>x</math>. For example, suppose our functions are <math>y=4</math>, <math>y=4/3+2/3\,x</math>, <math>y=12-3x</math>, and <math>y=3-1/2\,x</math> and we receive the query <math>x=1</math>. We have to identify which of these functions assumes the lowest <math>y</math>-value for <math>x=1</math>, or what that value is. (It is the function <math>y=4/3+2/3\,x</math>, assuming a value of 2.)</p>

The variation in which we seek the maximal line, not the minimal one, is a trivial modification and we will focus our attention on finding the minimal line.

<p>[[File:Convex_hull_trick1.png|200px|thumb|right|Graphical representation of the lines in this example]]
When the lines are graphed, this is easy to see: we want to determine, at the <math>x</math>-coordinate 1 (shown by the red vertical line), which line is "lowest" ( has the lowest <math>y</math>-coordinate). Here it is the heavy dashed line, <math>y=4/3+2/3\,x</math>.</p>

==Naive algorithm==
For each of the <math>Q</math> queries, of course, we may simply evaluate every one of the linear functions, and determine which one has the least value for the given <math>x</math>-value. If <math>M</math> lines are given along with <math>Q</math> queries, the complexity of this solution is <math>\mathcal{O}(MQ)</math>. The "trick" enables us to speed up the time for this computation to <math>\mathcal{O}((Q+M)\lg M)</math>, a significant improvement.

==Another example: APIO 2010 Commando==
You are given:
* A sequence of <math>N</math> '''positive''' integers (<math>1 \le N \le 10^6</math>).
* The integer coefficients of a quadratic function <math>f(x) = ax^2 + bx + c</math>, where <math>a < 0</math>.
The objective is to partition the sequence into contiguous subsequences such that the sum of <math>f</math> taken over all subsequences is maximized, where the value of a subsequence is the sum of its elements.

An <math>O(N^2)</math> dynamic programming approach is not hard to see.
We define:
# <code>sum(i,j) = x[i] + x[i+1] + ... + x[j]</code>
# <code>adjust(i,j) = a*sum(i,j)^2 + b*sum(i,j) + c</code>
Then let
:<math>\operatorname{dp}(n) = \max_{0 \le k < n} [\operatorname{dp}(k) + \operatorname{adjust}(k+1, n)]</math>.
Pseudocode:
<pre>
dp[0] &larr; 0
for n &isin; [1..N]
    for k &isin; [0..n-1]
        dp[n] &larr; max(dp[n], dp[k] + adjust(k + 1 , n))
</pre>
Now let's play around with the function "adjust".
 
Denote <math>\operatorname{sum}(1,x)</math> by <math>\sigma(x)</math>. Then, for some value of <math>k</math>, we can write
:<math>
\begin{array}{rl}
\operatorname{dp}(n) &= \operatorname{dp}(k) + a(\sigma(n)-\sigma(k))^2 + b(\sigma(n)-\sigma(k)) + c \\
&= \operatorname{dp}(k)+  a(\sigma(n)^2 - 2\sigma(n)\sigma(k) + \sigma(k)^2) + b(\sigma(n) - \sigma(k)) + c \\
&= (a\sigma(n)^2 + b\sigma(n) + c) + \operatorname{dp}(k) - 2a\sigma(n)\sigma(k) + a\sigma(k)^2 - b\sigma(k)
\end{array}
</math>
Now, let
# <math>z = \sigma(n)</math>
# <math>m = -2a\sigma(k)</math>
# <math>p = \operatorname{dp}(k) + a\sigma(k)^2 - b\sigma(k)</math>
Then, we see that <math>mz + p</math> is the quantity we aim to maximize by our choice of <math>k</math>. Adding <math>a\sigma(n)^2 + b\sigma(n) + c</math> (which is independent of <math>k</math>) to the maximum gives the correct value of <math>\operatorname{dp}(n)</math>. Also, <math>z</math> is independent of <math>k</math>, whereas <math>m</math> and <math>p</math> are independent of <math>n</math>, as required.
 
Unlike in task "acquire", we are interested in building the "upper envelope". We notice that the slope of the "maximal" line increases as <math>x</math> increases. Since the problem statement indicates <math>a < 0</math>, the slope of each line is positive. 

Suppose <math>k_2 > k_1</math>. For <math>k_1</math> we have slope <math>m_1 = -2a \sigma(k_1)</math>. For <math>k_2</math> we have slope <math>m_2 = -2a \sigma(k_2)</math>. But <math>\sigma(k_2) > \sigma(k_1)</math>, 
and since the given sequence is positive, so <math>m_2 > m_1</math>. We conclude that lines are added to our data structure in increasing order of slope.
 
Since <math>\sigma(n) > \sigma(n-1)</math>, query values are given in increasing order and a [[pointer walk]] suffices (it is not necessary to use binary search.)

==Fully dynamic variant==
<p>The convex hull trick is easy to implement when all insertions are given before all queries (offline version) or when each new line inserted has a lower slope than any line currently in the envelope. Indeed, by using a deque, we can easily allow insertion of lines with higher slope than any other line as well. However, in some applications, we might have no guarantee of either condition holding. That is, each new line to be added may have any slope whatsoever, and the insertions may be interspersed with queries, so that sorting the lines by slope ahead of time is impossible, and scanning through an array to find the lines to be removed could take linear time per insertion</p>

<p>It turns out, however, that it is possible to support arbitrary insertions in amortized logarithmic time. To do this, we store the lines in an ordered dynamic set (such as C++'s <code>std::set</code>). Each line possesses the attributes of slope and y-intercept, the former being the key, as well as an extra <math>left</math> field, the minimum <math>x</math>-coordinate at which this line is the lowest in the set. To insert a new line, we merely insert it into its correct position in the set, and then all the lines to be removed, if any, are contiguous with it. The procedure is then largely the same as for the case in which we always inserted lines of minimal slope: if the line to be added is <math>l_3</math>, the line to the left is <math>l_2</math>, and the line to the left of that is <math>l_1</math>, then we check if the <math>l_1</math>-<math>l_3</math> intersection is to the left of the <math>l_1</math>-<math>l_2</math> intersection; if so, <math>l_2</math> is discarded and we repeat; similarly, if lines <math>l_4</math> and <math>l_5</math> are on the right, then <math>l_4</math> can be removed if the <math>l_3</math>-<math>l_5</math> intersection is to the left of the <math>l_3</math>-<math>l_4</math> intersection, and this too is performed repeatedly until no more lines are to be discarded. We compute the new <math>left</math> values (for <math>l_3</math>, it is the <math>l_2</math>-<math>l_3</math> intersection, and for <math>l_4</math>, it is the <math>l_3</math>-<math>l_4</math> intersection).</p>

<p>To handle queries, we keep ''another'' set, storing the same data but this time ordered by the <math>left</math> value. When we insert or remove lines from that set (or update, in the case of <math>l_4</math> above), we use the <math>left</math> value from the element in that set to associate it with an element in this one. Whenever a query is made, therefore, all we have to do is to find the greatest <math>left</math> value in this set that is less than the query value; the corresponding line is the optimal one .</p>

==References==
* Brucker, P. (1995). Efficient algorithms for some path partitioning problems. ''Discrete Applied Mathematics'', 62(1-3), 77-85.
* Christiano, Paul. (2007). Land acquisition. Retrieved from an archived copy of the competition problem set at [http://tjsct.wikidot.com/usaco-mar08-gold http://tjsct.wikidot.com/usaco-mar08-gold]
* Peng, Richard. (2008). USACO MAR08 problem 'acquire' analysis. Retrieved from [http://ace.delos.com/TESTDATA/MAR08.acquire.htm http://ace.delos.com/TESTDATA/MAR08.acquire.htm]
* van den Hooff, Jelle. (2010). Personal communication.
* Wang, Hanson. (2010). Personal communication.

==Code==
* [[Convex hull trick/acquire.cpp|C++ solution for "acquire"]]
* [[Convex hull trick/commando.cpp|C++ solution for "commando"]]

==External links==
* [http://www.apio.olympiad.org/2010/ APIO 2010]

[[Category:Algorithms]]
[[Category:Data structures]]
@@ Line 14: / Line 14: @@
 ==Naive algorithm==
 For each of the <math>Q</math> queries, of course, we may simply evaluate every one of the linear functions, and determine which one has the least value for the given <math>x</math>-value. If <math>M</math> lines are given along with <math>Q</math> queries, the complexity of this solution is <math>\mathcal{O}(MQ)</math>. The "trick" enables us to speed up the time for this computation to <math>\mathcal{O}((Q+M)\lg M)</math>, a significant improvement.
-==The technique==
-<p>Consider the diagram above. Notice that the line <math>y=4</math> will ''never'' be the lowest one, regardless of the <math>x</math>-value. Of the remaining three lines, ''each one is the minimum in a single contiguous interval'' (possibly having plus or minus infinity as one bound). That is, the heavy dotted line is the best line at all <math>x</math>-values left of its intersection with the heavy solid line; the heavy solid line is the best line between that intersection and its intersection with the light solid line; and the light solid line is the best line at all <math>x</math>-values greater than that. Notice also that, as <math>x</math> increases, the slope of the minimal line decreases: 2/3, -1/2, -3. Indeed, it is not difficult to see that this is ''always'' true.</p>
-<p>Thus, if we remove "irrelevant" lines such as <math>y=4</math> in this example (the lines which will never give the minimum <math>y</math>-coordinate, regardless of the query value) and sort the remaining lines by slope, we obtain a collection of <math>N</math> intervals (where <math>N</math> is the number of lines remaining), in each of which one of the lines is the minimal one. If we can determine the endpoints of these intervals, it becomes a simple matter to use [[binary search]] to answer each query.</p>
-===Etymology===
-The term ''convex hull'' is sometimes misused to mean ''upper/lower envelope''. If we consider the "optimal" segment of each of the three lines (ignoring <math>y=4</math>), what we see is the ''lower envelope'' of the lines: nothing more or less than the set of points obtained by choosing the lowest point for every possible <math>x</math>-coordinate. (The lower envelope is highlighted in green in the diagram above.) Notice that the set bounded above by the lower envelope is convex. We could imagine the lower envelope being the upper convex hull of some set of points, and thus the name ''convex hull trick'' arises. (Notice that the problem we are trying to solve can again be reformulated as finding the intersection of a given vertical line with the lower envelope.)
-===Adding a line===
-<p>As we have seen, if the set of relevant lines has already been determined and sorted, it becomes trivial to answer any query in <math>\mathcal{O}(\lg N)</math> time ''via'' binary search. Thus, if we can add lines one at a time to our data structure, recalculating this information quickly with each addition, we have a workable algorithm: start with no lines at all (or one, or two, depending on implementation details) and add lines one by one until all lines have been added and our data structure is complete.</p>
-<p>Suppose that we are able to process all of the lines before needing to answer any of the queries. Then, we can sort them in descending order by slope beforehand, and merely add them one by one. When adding a new line, some lines may have to be removed because they are no longer relevant. If we imagine the lines to lie on a [[stack]], in which the most recently added line is at the top, as we add each new line, we consider if the line on the top of the stack is relevant anymore; if it still is, we push our new line and proceed. If it is not, we pop it off and repeat this procedure until either the top line is not worthy of being discarded or there is only one line left (the one on the bottom, which can never be removed).</p>
-<p>How, then, can we determine if the line should be popped from the stack? Suppose <math>l_1</math>, <math>l_2</math>, and <math>l_3</math> are the second line from the top, the line at the top, and the line to be added, respectively. Then, <math>l_2</math> becomes irrelevant if and only if the intersection point of <math>l_1</math> and <math>l_3</math> is to the left of the intersection of <math>l_1</math> and <math>l_2</math>. (This makes sense because it means that the interval in which <math>l_3</math> is minimal subsumes that in which <math>l_2</math> was previously.) We have assumed for the sake of simplicity that no three lines are concurrent.</p>
-===Analysis===
-<p>Clearly, the space required is <math>\mathcal{O}(M)</math>: we need only store the sorted list of lines, each of which is defined by two real numbers.</p>
-<p>The time required to sort all of the lines by slope is <math>\mathcal{O}(M \lg M)</math>. When iterating through them, adding them to the envelope one by one, we notice that every line is pushed onto our "stack" exactly once and that each line can be popped at most once. For this reason, the time required overall is <math>\mathcal{O}(M)</math> for this step; although each individual line, when being added, can theoretically take up to linear time if almost all of the already-added lines must now be removed, the total time is limited by the total number of times a line can be removed. The cost of sorting dominates, and the construction time is <math>\mathcal{O}(M \lg M).</math></p>
-==Case study: USACO MAR08 acquire==
-===Problem===
-<p>Up to 50000 rectangles of possibly differing dimensions are given and it is desired to acquire all of them. To acquire a subset of these rectangles incurs a cost that is proportional to neither the size of the subset nor to its total area but rather the product of the width of the widest rectangle in the subset times the height of the tallest rectangle in the subset. We wish to cleverly partition the rectangles into groups so that the total cost is minimized; what is the total cost if we do so? Rectangles may not be rotated; that is, we may not interchange the length and width of a rectangle.</p>
-===Observation 1: Irrelevant rectangles===
-Suppose that both of rectangle A's dimensions equal or exceed the corresponding dimensions of rectangle B. Then, we may remove rectangle B from the input because its presence cannot affect the answer, because we can merely compute an optimal solution without it and then insert it into whichever subset contains rectangle A without changing the cost. Rectangle B, then, is irrelevant. We first sort the rectangles in ascending order of height and then sweep through them in linear time to remove irrelevant rectangles. What remains is a list of rectangles in which height is monotonically increasing and width is monotonically decreasing. (Otherwise, a contradiction would exist to our assumption that all irrelevant rectangles have been removed.)
-===Observation 2: Contiguity===
-In the sorted list of remaining rectangles, ''each subset to be acquired is contiguous''. Thus, for example, if there are four rectangles, numbered 1, 2, 3, 4 according to their order in the sorted list, it is possible for the optimal partition to be <math>\{\{1,2\},\{3,4\}\}</math> but not <math>\{\{1,4\},\{2,3\}\}</math>; in the latter, <math>\{2,3\}</math> is contiguous but <math>\{1,4\}</math> is not. This is true because, in any subset, if rectangle A is the tallest and rectangle B is the widest, then all rectangles between them in the sorted list may be added to this subset without any additional cost, and therefore we will always assume that this occurs, making each subset contiguous.
-===DP solution===
-The remaining problem then is how to divide up the list of rectangles into contiguous subsets while minimizing the total cost. This problem admits a <math>\Theta(N^2)</math> solution by [[dynamic programming]], the pseudocode for which is shown below: ''Note that it is assumed that the list of rectangles comes "cooked"; that is, irrelevant rectangles have been removed and the remaining rectangles sorted.''
-<pre>
-input N
-for i ∈ [1..N]
-     input rect[i].h
-     input rect[i].w
-let cost[0] = 0
-for i ∈ [1..N]
-     let cost[i] = ∞
-     for j ∈ [0..i-1]
-         cost[i] = min(cost[i],cost[j]+rect[i].h*rect[j+1].w)
-print cost[N]
-</pre>
-In the above solution, <code>cost[k]</code> stores the minimum possible total cost for acquiring the first <code>k</code> rectangles. Obviously, <code>cost[0]=0</code>. To compute <code>cost[i]</code> when <code>i</code> is not zero, we notice that it is equal to the total cost of acquiring all previous subsets plus the total cost of acquiring the subset containing rectangle number <code>i</code>; the latter may be readily calculated if the size of the latter subset is known, because it is merely the width of the first times the height of the last (rectangle number <code>i</code>). We wish to minimize this, hence <code>cost[i] = min(cost[i],cost[j]+rect[i].h*rect[j+1].w).</code> (Notice that <code>j</code> is the last rectangle of the previous subset, looping over all possible choices.)
-Unfortunately, <math>\Theta(N^2)</math> is too slow when <math>N=50000</math>, so a better solution is needed.
-===Observation 3: Convexity===
-Let <math>m_j=\mathrm{rect}[j+1].w</math>, <math>b_j=\mathrm{cost}[j]</math>, and <math>x=\mathrm{rect}[i].h</math>. Then, it is clear that the inner loop in the above DP solution is actually trying to minimize the function <math>y=m_j x + b_j</math> by choosing <math>j</math> appropriately. That is, it is trying to solve exactly the problem discussed in this article. Thus, assuming we have implemented the lower envelope data structure discussed in this article, the improved code looks as follows:
-<pre>
-input  N
-for i ∈ [1..N]
-     input rect[i].h
-     input rect[i].w
-let E = empty lower envelope structure
-let cost[0] = 0
-add the line y=mx+b to E, where m=rect[1].w and b=cost[0] //b is zero
-for i ∈ [1..N]
-     cost[i] = E.query(rect[i].h)
-     if i<N
-          E.add(m=rect[i+1].w,b=cost[i])
-print cost[N]
-</pre>
-Notice that the lines are already being given in descending order of slope, so that each line is added "at the right"; this is because we already sorted them by width. The query step can be performed in logarithmic time, as discussed, and the addition step in amortized constant time, giving a <math>\Theta(N\lg N)</math> solution. We can modify our data structure slightly to take advantage of the fact that query values are non-decreasing (that is, no query occurs further left than its predecessor, so that no line chosen has a greater slope than the previous one chosen), and replace the binary search with a [[pointer walk]], reducing query time to amortized constant as well and giving a <math>\Theta(N)</math> solution for the DP step. The overall complexity, however, is still <math>O(N\lg N)</math>, due to the sorting step.
 ==Another example: APIO 2010 Commando==