# Prefix sum array and difference array

## Revision as of 04:52, 16 December 2011

Given an array of numbers, we can construct a new array by replacing each element by the difference between itself and the previous element, except for the first element, which we simply ignore. This is called the **difference array**, because it contains the first differences of the original array. For example, the difference array of A = [9, 2, 6, 3, 1, 5, 0, 7] is D = [2 - 9, 6 - 2, 3 - 6, 1 - 3, 5 - 1, 0 - 5, 7 - 0], or [-7, 4, -3, -2, 4, -5, 7]. Evidently, the difference array can be computed in linear time from the original array.
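
The construction described above can be sketched in a few lines of Python (the function name is ours, not from the original):

```python
def difference_array(a):
    """Return the first differences of a; one element shorter than a."""
    return [a[i] - a[i - 1] for i in range(1, len(a))]

A = [9, 2, 6, 3, 1, 5, 0, 7]
print(difference_array(A))  # [-7, 4, -3, -2, 4, -5, 7]
```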

We can run this process in reverse: that is, given a difference array and the initial element of the original array, we can reconstruct the original array. To do this, we simply scan from left to right, and accumulate as we go along, giving [9, 9 + (-7), 9 + (-7) + 4, 9 + (-7) + 4 + (-3), ...]. The reader should perform this calculation and confirm that this is indeed equal to A = [9, 2, 6, 3, 1, 5, 0, 7]. This reverse process can also be done in linear time, as each element in A is the sum of the corresponding element in D and the *previous* element in A. We say that A is a **prefix sum array** of D. However, it is not the *only* prefix sum array of D, because knowing D alone does not give us enough information to reconstruct A; we also needed to know the initial element of A. If we assumed that this were -3, for example, instead of 9, we would obtain A' = [-3, -10, -6, -9, -11, -7, -12, -5]. D is the difference array of both A and A'. Both A and A' are prefix sum arrays of D.
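
The reverse process can be sketched as follows (again, the function name is ours); running it with initial elements 9 and -3 reproduces the two prefix sum arrays A and A' above:

```python
def prefix_sum_array(d, initial):
    """Reconstruct an array from its difference array d and the initial element."""
    out = [initial]
    for x in d:
        out.append(out[-1] + x)  # each element = previous element + difference
    return out

D = [-7, 4, -3, -2, 4, -5, 7]
print(prefix_sum_array(D, 9))   # [9, 2, 6, 3, 1, 5, 0, 7]
print(prefix_sum_array(D, -3))  # [-3, -10, -6, -9, -11, -7, -12, -5]
```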

## Analogy with calculus

These two processes—computing the difference array, and computing a prefix sum array—are the discrete equivalents of differentiation and integration in calculus, which operate on continuous domains:

- Differentiation and integration are reverses. Computing the difference array and computing a prefix sum array are reverses.
- A function can only have one derivative, but an infinite number of antiderivatives. An array has only one difference array, but an infinite number of prefix sum arrays.
- However, if we know the value of a function over an interval of the real line and we are given some real number, we can find one unique antiderivative of this function which attains this real value at the left end of the interval. Likewise, if we are given an array, and we are told the initial element of one of its prefix sum arrays, we can reconstruct the entire prefix sum array.
- When we take the difference array, we shorten the array by one element, since we destroy information about what the initial element was. On the other hand, when we take a prefix sum array, we lengthen the array by one element, since we introduce information about what the initial element is to be. Likewise, when we take the derivative of a function f defined on a closed interval [a, b], we obtain its derivative f' on the open interval (a, b), and we destroy the information of what f(a) was; conversely, if we are given f' on the open interval (a, b), and the value of f(a) is also told to us, we obtain its antiderivative f on the closed interval [a, b].
- All possible prefix sum arrays differ only by an offset from each other, in the sense that one of them can be obtained by adding a constant to each entry of the other. For example, the two prefix sum arrays A and A' shown for D above differ by 12 at all positions. Likewise, all possible antiderivatives of a function differ from each other only by a constant (the difference between their constants of integration).

Because of these similarities, we will speak simply of *differentiating* and *integrating* arrays. An array can be differentiated multiple times, but eventually it will shrink to length 0. An array can be integrated any number of times. Differentiating k times and integrating k times are still reverse processes.

## Use of prefix sum array

The Fundamental Theorem of Calculus also has an analogue, which is why the prefix sum array is so useful. If we assume both the original array and its prefix sum array are zero-indexed (or both one-indexed), for example, A = [9, 2, 6, 3, 1, 5, 0, 7] where A_0 = 9, and P = [0, 9, 11, 17, 20, 21, 26, 26, 33] (a possible prefix sum array of A, where we have arbitrarily chosen 0 as the initial element), where P_0 = 0, then we can add up any range of elements in A by taking the difference of *two* elements in P. In particular, Σ_{[i,j)} A, the notation we will use for A_i + A_{i+1} + ... + A_{j-1}, is equal to P_j - P_i. (Note the use of the half-open interval.) For example, A_2 + A_3 + A_4 + A_5 = 6 + 3 + 1 + 5 = 15 = 26 - 11 = P_6 - P_2. This property holds no matter what we choose as the initial element of P, because it will always cancel out of the calculation, just as we can use any antiderivative of a function to compute an integral (which is like summing a function across an entire interval) by taking the difference between the antiderivative's values at the endpoints, because the constant of integration will always cancel out. This makes the prefix sum array a useful item to precompute; once we have computed it, which only takes linear time, we can sum any range of the original array in *constant time*, simply by taking the difference of two values in the prefix sum array.
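
The precompute-then-query pattern can be sketched like this (function names are ours):

```python
def prefix_sums(a, initial=0):
    """P[j] = initial + a[0] + ... + a[j-1]; len(P) == len(a) + 1. O(n) once."""
    p = [initial]
    for x in a:
        p.append(p[-1] + x)
    return p

def range_sum(p, i, j):
    """Sum of a[i], ..., a[j-1] (half-open interval) in O(1) per query."""
    return p[j] - p[i]

A = [9, 2, 6, 3, 1, 5, 0, 7]
P = prefix_sums(A)         # [0, 9, 11, 17, 20, 21, 26, 26, 33]
print(range_sum(P, 2, 6))  # 15 == 6 + 3 + 1 + 5
```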

### Example: Partial Sums (SPOJ)

Computing the prefix sum array is rarely the most difficult part of a problem. Instead, the prefix sum array is kept on hand because the algorithm to solve the problem makes frequent reference to range sums.

For example, consider the problem Partial Sums (SPOJ PARTSUM). Here we are given an array a of up to 10^5 integers, a modulus P and a remainder K. The problem asks us to determine the minimal S ≥ K such that there exists a nonempty slice of a whose sum is congruent to S (mod P).

To solve this, we will first transform array a into its prefix sum array p. Then, the problem reduces to finding the minimal S ≥ K such that there exist i, j with p_j - p_i ≡ S (mod P), since the difference between a pair of elements of p is the sum of a range in a. (We do not allow i = j, because then the corresponding range from a would be empty.)

This can be solved by iterating through the prefix sum array from left to right, trying out every possible value of j. We assume that all values p_0, p_1, ..., p_{j-1} have already been added to an ordered dynamic set. If we intend to find p_j - p_i = K, then we must have p_i = p_j - K. If we intend to find p_j - p_i = K + 1, then we must have p_i = p_j - K - 1; and so on, up to p_i = p_j - P + 1; so we want to identify the first element in the sequence [p_j - K, p_j - K - 1, p_j - K - 2, ..., p_j - P + 1] which is congruent to something already in the set. (We can do this using set operations; the details are left as an exercise to the reader.) Doing so will allow us to compute the optimal S_j ≥ K, the smallest modular range sum at least K that ends at p_j (*i.e.*, ends at a_{j-1}). After we've done this, we add p_j to our set, and move on to the next iteration. Finally, min(S_1, ..., S_N) is the answer. Overall time is O(N log N).
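
One possible concrete sketch in Python, with a sorted list maintained via `bisect` standing in for the ordered dynamic set (and filling in the set operations left as an exercise above): working modulo P, the first element of the descending sequence that is present in the set is either the largest stored residue not exceeding (p_j - K) mod P, or, wrapping around, the largest stored residue overall, so it suffices to check those two candidates and keep the ones yielding S ≥ K. The function name and this candidate-pruning detail are our own choices, not from the original.

```python
import bisect

def partsum(a, P, K):
    """Minimal S with K <= S <= P - 1 such that some nonempty slice of a
    has sum congruent to S (mod P); None if no such S exists."""
    best = None
    seen = [0]          # sorted residues of prefix sums added so far (p_0 = 0)
    r = 0               # residue of the current prefix sum p_j
    for x in a:
        r = (r + x) % P
        t = (r - K) % P                      # residue giving exactly S = K
        idx = bisect.bisect_right(seen, t)   # predecessor query: largest e <= t
        candidates = ([seen[idx - 1]] if idx else []) + [seen[-1]]
        for e in candidates:
            s = (r - e) % P                  # range sum residue for this pair
            if s >= K and (best is None or s < best):
                best = s
        bisect.insort(seen, r)               # p_j now available as a future p_i
    return best

print(partsum([9, 2, 6, 3, 1, 5, 0, 7], 7, 3))  # 3 (e.g., 9 + 2 + 6 = 17 ≡ 3 mod 7)
```

Each of the N iterations performs O(log N) work in the bisection queries (insertion into a Python list is O(N), so a balanced-tree set would be needed for a strict O(N log N) bound, but the structure of the algorithm is the same).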