Editing Prefix sum array and difference array

Jump to: navigation, search

Warning: You are not logged in. Your IP address will be publicly visible if you make any edits. If you log in or create an account, your edits will be attributed to your username, along with other benefits.

The edit can be undone. Please check the comparison below to verify that this is what you want to do, and then save the changes below to finish undoing the edit.
Latest revision Your text
Line 67: Line 67:
 
To solve this, we will first transform array <math>S</math> into its prefix sum array <math>P(0,S)</math>. Notice that the sum of each contiguous subsequence <math>S_i + S_{i+1} + S_{i+2} + ... + S_{j-1}</math> corresponds to the difference of two elements of <math>P</math>, that is, <math>P_j - P_i</math>. So what we want to find is the number of pairs <math>(i,j)</math> with <math>P_j - P_i = 47</math> and <math>i < j</math>. (Note that if <math>i > j</math>, we will instead get a subsequence with sum -47.)
 
To solve this, we will first transform array <math>S</math> into its prefix sum array <math>P(0,S)</math>. Notice that the sum of each contiguous subsequence <math>S_i + S_{i+1} + S_{i+2} + ... + S_{j-1}</math> corresponds to the difference of two elements of <math>P</math>, that is, <math>P_j - P_i</math>. So what we want to find is the number of pairs <math>(i,j)</math> with <math>P_j - P_i = 47</math> and <math>i < j</math>. (Note that if <math>i > j</math>, we will instead get a subsequence with sum -47.)
  
However, this is quite easy to do. We sweep through <math>P</math> from left to right, keeping a [[map]] of all elements of <math>P</math> we've seen so far, along with their frequencies; and for each element <math>P_j</math> we count the number of times <math>P_j - 47</math> has appeared so far, by looking up that value in our map; this tells us how many contiguous subsequences ending at <math>S_{j-1}</math> have sum 47. And finally, adding the number of contiguous subsequences with sum 47 ending at each entry of <math>S</math> gives the total number of such subsequences in the array. Total time taken is <math>O(N)</math>, if we use a [[hash table]] implementation of the map.
+
However, this is quite easy to do. We sweep through <math>P</math> from left to right, keeping a [[map]] of all elements of <math>P</math> we've seen so far, along with their frequencies; and for each element <math>P_j</math> we count the number of times <math>P_j - 48</math> has appeared so far, by looking up that value in our map; this tells us how many contiguous subsequences ending at <math>S_{j-1}</math> have sum 47. And finally, adding the number of contiguous subsequences with sum 47 ending at each entry of <math>S</math> gives the total number of such subsequences in the array. Total time taken is <math>O(N)</math>, if we use a [[hash table]] implementation of the map.
  
 
==Use of difference array==
 
==Use of difference array==
The difference array is used to keep track of an array when ranges of said array can be updated all at once. If we have array <math>A</math> and add an increment <math>k</math> to elements <math>A_i, A_{i+1}, ..., A_{j-1}</math>, then notice that <math>D_0, D_1, ..., D_{i-2}</math> are not affected; that <math>D_{i-1} = A_i - A_{i-1}</math> is increased by <math>k</math>; that <math>D_i, D_{i+1}, ..., D_{j-2}</math> are not affected; that <math>D_{j-1} = A_j - A_{j-1}</math> is decreased by <math>k</math>; and that <math>D_j, D_{j+1}, ...</math> are unaffected. Thus, if we are required to update many ranges of an array in this manner, we should keep track of <math>D</math> rather than <math>A</math> itself, and then integrate at the end to reconstruct <math>A</math>.
+
The difference array is used to keep track of an array when ranges of said array can be updated all at once. If we have array <math>A</math> and add an increment <math>k</math> to elements <math>A_i, A_{i+1}, ..., A_{j-1}</math>, then notice that <math>D_0, D_1, ..., D_{i-2}</math> are not affected; that <math>D_{i-1} = A_i - A{i-1}</math> is increased by <math>k</math>; that <math>D_i, D_{i+1}, ..., D_{j-2}</math> are not affected; that <math>D_{j-1} = A_j - A_{j-1}</math> is decreased by <math>k</math>; and that <math>D_j, D_{j+1}, ...</math> are unaffected. Thus, if we are required to update many ranges of an array in this manner, we should keep track of <math>D</math> rather than <math>A</math> itself, and then integrate at the end to reconstruct <math>A</math>.
  
 
===Example: Wireless (CCC)===
 
===Example: Wireless (CCC)===
Line 110: Line 110:
 
After the prefix sum array has been computed, we can use it to add together any rectangle in <math>A</math>, that is, all the elements with their indices in <math>[x_1,x_2), [y_1,y_2)</math>. To do so, we first observe that <math>P_{x_2,y_2} - P_{x_1,y_2}</math> gives the sum of all the elements with indices in the box <math>[x_1, x_2) \times [0, y_2)</math>. Then we similarly observe that <math>P_{x_2,y_1} - P_{x_1,y_1}</math> corresponds to the box <math>[x_1, x_2) \times [0, y_1)</math>. Subtracting gives the box <math>[x_1, x_2) \times [y_1, y_2)</math>. This gives a final formula of <math>P_{x_2,y_2} - P_{x_1,y_2} - P_{x_2,y_1} + P_{x_1,y_1}</math>.
 
After the prefix sum array has been computed, we can use it to add together any rectangle in <math>A</math>, that is, all the elements with their indices in <math>[x_1,x_2), [y_1,y_2)</math>. To do so, we first observe that <math>P_{x_2,y_2} - P_{x_1,y_2}</math> gives the sum of all the elements with indices in the box <math>[x_1, x_2) \times [0, y_2)</math>. Then we similarly observe that <math>P_{x_2,y_1} - P_{x_1,y_1}</math> corresponds to the box <math>[x_1, x_2) \times [0, y_1)</math>. Subtracting gives the box <math>[x_1, x_2) \times [y_1, y_2)</math>. This gives a final formula of <math>P_{x_2,y_2} - P_{x_1,y_2} - P_{x_2,y_1} + P_{x_1,y_1}</math>.
  
In the <math>n</math>-dimensional case, to sum the elements of <math>A</math> with indices in the box <math>[x_{1,0}, x_{1,1}) \times [x_{2,0}, x_{2,1}) \times ... \times [x_{n,0}, x_{n,1})</math>, we will use the formula <math>\sum_{k_1=0}^1 \sum_{k_2=0}^1 ... \sum_{k_n=0}^1 (-1)^{n - k_1 - k_2 - ... - k_n} P_{x_{1,k_1}, x_{2,k_2}, ..., x_{n,k_n}}</math>. We will not state the proof, instead noting that it is a form of the inclusion&ndash;exclusion principle.
+
In the <math>n</math>-dimensional case, to sum the elements of <math>A</math> with indices in the box <math>[x_{1,0}, x_{1,1}) \times [x_{2,0}, x_{2,1}) \times ... \times [x_{n,0}, x_{n,1})</math>, we will use the formula <math>(-1)^n \sum_{k_1=0}^1 \sum_{k_2=0}^1 ... \sum_{k_n=0}^1 P_{x_{1,k_1}, x_{2,k_2}, ..., x_{n,k_n}}</math>. We will not state the proof, instead noting that it is a form of the inclusion&ndash;exclusion principle.
  
 
====Example: Diamonds (BOI)====
 
====Example: Diamonds (BOI)====
Line 119: Line 119:
  
 
We will simply define the difference array <math>D</math> to be the array whose prefix sum array is <math>A</math>. This means in particular that <math>D_{i,j}</math> is the sum of elements of <math>D</math> with indices in the box <math>[i,i+1) \times [j,j+1)</math> (this box contains only the single element <math>D_{i,j}</math>). Using the prefix sum array <math>A</math>, we obtain <math>D_{i,j} = A_{i+1,j+1} - A_{i,j+1} - A_{i+1,j} + A_{i,j}</math>. In general:
 
We will simply define the difference array <math>D</math> to be the array whose prefix sum array is <math>A</math>. This means in particular that <math>D_{i,j}</math> is the sum of elements of <math>D</math> with indices in the box <math>[i,i+1) \times [j,j+1)</math> (this box contains only the single element <math>D_{i,j}</math>). Using the prefix sum array <math>A</math>, we obtain <math>D_{i,j} = A_{i+1,j+1} - A_{i,j+1} - A_{i+1,j} + A_{i,j}</math>. In general:
:<math>D_{i_1, i_2, ..., i_n} = \sum_{k_1=0}^1 \sum_{k_2=0}^1 ... \sum_{k_n=0}^1 (-1)^{n - k_1 - k_2 - ... - k_n} A_{i_1+k_1,i_2+k_2,...,i_n+k_n}</math>.
+
:<math>D_{i_1, i_2, ..., i_n} = (-1)^n \sum_{k_1=0}^1 \sum_{k_2=0}^1 ... \sum_{k_n=0}^1 A_{i_1+k_1,i_2+k_2,...,i_n+k_n}</math>.
  
 
Should we actually need to ''compute'' the difference array, on the other hand, the easiest way to do so is by reversing the computation of the prefix sum array:
 
Should we actually need to ''compute'' the difference array, on the other hand, the easiest way to do so is by reversing the computation of the prefix sum array:
Line 142: Line 142:
 
Now we consider what happens when all elements of <math>A</math> with coordinates in the box given by <math>[r_1, r_2) \times [c_1, c_2)</math> are incremented by <math>k</math>. If we take the difference array of each column of <math>A</math> now, as in the function <code>D</code> defined above, we see that for each column, we will have to add <math>k</math> to entry <math>r_1-1</math> and subtract it from <math>r_2-1</math> (as in the one-dimensional case). Now if we take the difference array of each row of what we've just obtained, we notice that in row number <math>r_1-1</math>, we've added <math>k</math> to every element in columns in <math>[c_1,c_2)</math> in the previous step, and in row number <math>r_2-1</math>, we've subtracted <math>k</math> to every element in the same column range, so in the end the effect is to add <math>k</math> to elements <math>D_{r_1-1,c_1-1}</math> and <math>D_{r_2-1,c_2-1}</math>, and to subtract <math>k</math> from elements <math>D_{r_2-1,c_1-1}</math> and <math>D_{r_1-1,c_2-1}</math>.
 
Now we consider what happens when all elements of <math>A</math> with coordinates in the box given by <math>[r_1, r_2) \times [c_1, c_2)</math> are incremented by <math>k</math>. If we take the difference array of each column of <math>A</math> now, as in the function <code>D</code> defined above, we see that for each column, we will have to add <math>k</math> to entry <math>r_1-1</math> and subtract it from <math>r_2-1</math> (as in the one-dimensional case). Now if we take the difference array of each row of what we've just obtained, we notice that in row number <math>r_1-1</math>, we've added <math>k</math> to every element in columns in <math>[c_1,c_2)</math> in the previous step, and in row number <math>r_2-1</math>, we've subtracted <math>k</math> to every element in the same column range, so in the end the effect is to add <math>k</math> to elements <math>D_{r_1-1,c_1-1}</math> and <math>D_{r_2-1,c_2-1}</math>, and to subtract <math>k</math> from elements <math>D_{r_2-1,c_1-1}</math> and <math>D_{r_1-1,c_2-1}</math>.
  
In the general case, when adding <math>k</math> to all elements of <math>A</math> with indices in the box <math>[x_{1,0}, x_{1,1}) \times [x_{2,0}, x_{2,1}) \times ... \times [x_{n,0}, x_{n,1})</math>, a total of <math>2^n</math> elements of <math>D</math> need to be updated. In particular, element <math>D_{x_{1,j_1}-1, x_{2,j_2}-1, ..., x_{n,j_n}-1}</math> (where each of the <math>j</math>'s can be either 0 or 1, giving <math>2^n</math> possibilities in total) is incremented by <math>(-1)^{j_1+j_2+...+j_n}k</math>. That is, if we consider an <math>n</math>-dimensional array to be an <math>n</math>-dimensional hypercube of numbers, then the elements to be updated lie on the corners of an <math>n</math>-dimensional hypercube; we 2-color the vertices black and white (meaning two adjacent vertices always have opposite colours), with the lowest corner (corresponding to indices <math>[x_{1,0}-1, x_{2,0}-1, ..., x_{n,0}-1]</math>) white; and each white vertex receiving <math>+k</math> and each black vertex <math>-k</math>. One can attempt to visualize the effect this has on the prefix sum array in three dimensions, and become convinced that it makes sense in <math>n</math> dimensions. Each element in the prefix sum array <math>A_{i_1, i_2, ..., i_n}</math> is the sum of all the elements in some box of the difference array with its lowest corner at the origin, <math>[0, i_1) \times [0, i_2) \times ... \times [0, i_n)</math>. If the highest corner actually lies within the hypercube, that is, <math>x_{1,0} \leq i_1 < x_{1,1}, x_{2,0} \leq i_2 < x_{2,1}, ..., x_{n,0} \leq i_n < x_{n,1}</math>, then this box is only going to contain the low corner <math>D_{x_{1,0}-1, x_{2,0}-1, ..., x_{n,0}-1}</math>, which has increased by <math>k</math>; thus, this entry in <math>A</math> has increased by <math>k</math> as well, and this is true of all elements that lie within the hypercube, <math>[x_{1,0},x_{1,1}) \times [x_{2,0},x_{2,1}) \times ... \times [x_{n,0},x_{n,1})</math>. If any of the <math>i</math>'s are less than the lower bound of the corresponding <math>x</math>, then our box doesn't hit any of the vertices of the hypercube at all, so all these elements are unaffected; and if instead any one of them goes over the upper bound, then our box passes in through one hyperface and out through another, which means that corresponding vertices on the low and high face will either both be hit or both not be hit, and each pair cancels itself out, giving again no change outside the hypercube.
+
In the general case, when adding <math>k</math> to all elements of <math>A</math> with indices in the box <math>[x_{1,0}, x_{1,1}) \times [x_{2,0}, x_{2,1}) \times ... \times [x_{n,0}, x_{n,1})</math>, a total of <math>2^n</math> elements of <math>D</math> need to be updated. In particular, element <math>D_{x_{1,j_1}-1, x_{2,j_2}-1, ..., x_{n,j_n}-1}</math> (where each of the <math>j</math>'s can be either 0 or 1, giving <math>2^n</math> possibilities in total) is incremented by <math>(-1)^{j_1+j_2+...+j_n}k</math>. That is, if we consider an <math>n</math>-dimensional array to be an <math>n</math>-dimensional hypercube of numbers, then the elements to be updated lie on the corners of an <math>n</math>-dimensional hypercube; we 2-color the vertices black and white (meaning two adjacent vertices always have opposite colours), with the lowest corner (corresponding to indices <math>[x_{1,0}-1, x_{2,0}-1, ..., x_{n,0}-1]</math>) white; and each white vertex receiving <math>+k</math> and each black vertex <math>-k</math>. One can attempt to visualize the effect this has on the prefix sum array in three dimensions, and become convinced that it makes sense in <math>n</math> dimensions. Each element in the prefix sum array <math>A_{i_1, i_2, ..., i_n}</math> is the sum of all the elements in some box of the difference array with its lowest corner at the origin, <math>[0, i_1) \times [0, i_2) \times ... \times [0, i_n)</math>. If the highest corner actually lies within the hypercube, that is, <math>x_{1,0} \leq i_1 < x_{1,1}, x_{2,0} \leq i_2 < x_{2,1}, ..., x_{n,0} \leq i_n < x_{n,1}</math>, then this box is only going to contain the low corner <math>D_{x_{1,0}-1, x_{2,0}-1, ..., x_{n,0}-1}</math>, which has increased by <math>k</math>; thus, this entry in <math>A</math> has increased by <math>k</math> as well, and this is true of all elements that lie within the hypercube, <math>[x_{1,0},x_{1,1}) \times [x_{2,0},x_{2,1}) \times ... \times [x_{n,0},x_{n,1}]</math>. If any of the <math>i</math>'s are less than the lower bound of the corresponding <math>x</math>, then our box doesn't hit any of the vertices of the hypercube at all, so all these elements are unaffected; and if instead any one of them goes over the upper bound, then our box passes in through one hyperface and out through another, which means that corresponding vertices on the low and high face will either both be hit or both not be hit, and each pair cancels itself out, giving again no change outside the hypercube.
  
 
====Example: The Cake is a Dessert====
 
====Example: The Cake is a Dessert====
{{Problem|cake|The Cake is a Dessert}} from the Waterloo&ndash;Woburn 2011 Mock CCC is a straightforward application of the difference array in two dimensions.
+
{{Problem|cake|The Cake is a Dessert}} from the Woburn&ndash;Waterloo 2011 Mock CCC is a straightforward application of the difference array in two dimensions.
 
+
==A word on the dynamic case==
+
The dynamic version of the problem that the prefix sum array is intended to solve requires us to be able to carry out an intermixed sequence of operations, where each operation either changes an element of <math>A</math> or asks us to determine the sum <math>A_0 + A_1 + ... + A_{i-1}</math> for some <math>i</math>. That is, we need to be able to compute entries of the prefix sum array of an array that is changing (dynamic). If we can do this, then we can do range sum queries easily on a dynamic array, simply by taking the difference of prefix sums as in the static case. This is a bit trickier to do, but is most easily and efficiently accomplished using the [[binary indexed tree]] data structure, which is specifically designed to solve this problem and can be updated in <math>O(\log(m_1) \log(m_2) ... \log(m_n))</math> time, where <math>n</math> is the number of dimensions and each <math>m</math> is a dimension. Each prefix sum query also takes that amount of time, but we need to perform <math>2^n</math> of these to find a box sum using the formula given previously, so the running time of the query is asymptotically <math>2^n</math> times greater than that of the update. In the most commonly encountered one-dimensional case, both query and update are simply <math>O(\log m)</math>.
+
 
+
The dynamic version of the problem that the difference array is intended to solve is analogous: we want to be able to increment entire ranges of an array, but at any point we also want to be able to determine any element of the array. Here, we simply use a binary indexed tree for the difference array, and compute prefix sums whenever we want to access an element. Here, it is the update that is slower by a factor of <math>2^n</math> than the query (because we have to update <math>2^n</math> entries at a time, but only query one.)
+
 
+
On the other hand, if we combine the two, that is, we wish to be able to increment ranges of some array <math>A</math> as well as query prefix sums of <math>A</math> in an intermixed sequence, matters become more complicated. Notice that in the static case we would just track the difference array of <math>A</math> and then integrate twice at the end. In the dynamic case, however, we need to use a [[segment tree]] with lazy propagation. This will give us <math>O(2^n \log(m_1) \log(m_2) ... \log(m_n))</math> time on both the query and the update, but is somewhat more complex and slower than the binary indexed tree on the [[invisible constant factor]].
+
 
+
 
[[Category:Pages needing diagrams]]
 
[[Category:Pages needing diagrams]]

Please note that all contributions to PEGWiki are considered to be released under the Attribution 3.0 Unported (see PEGWiki:Copyrights for details). If you do not want your writing to be edited mercilessly and redistributed at will, then do not submit it here.
You are also promising us that you wrote this yourself, or copied it from a public domain or similar free resource. Do not submit copyrighted work without permission!

Cancel | Editing help (opens in new window)