All nearest smaller values

In a list of numbers A, the nearest smaller value (NSV) to an entry A_i is the last entry preceding it with a smaller value, if any; that is, the entry A_j with greatest j < i such that A_j < A_i. The all nearest smaller values (ANSV) problem is that of computing the NSV for each number in the list.

Algorithm

It is obvious that, upon seeing the list for the first time, we cannot guarantee being able to find the NSV to a given entry in less than linear time, because it could be any of the preceding entries in the list, and thus, in the worst case, we must examine all of them. If we do a linear scan on each element, we get a trivial O(n^2) time solution to ANSV.

However, it turns out that there is a simple linear time (O(n)) solution to ANSV, which is clearly asymptotically optimal, since the size of the output is Θ(n) anyway. To understand this algorithm, we make a series of observations and refinements:

  1. If k < j < i, then it is only possible for A_k to be a NSV for A_i if A_k < A_j; otherwise A_j will always be both smaller and "nearer".
  2. Thus, if we scan the list from left to right, and compute the NSV for each element as we visit it, and we encounter A_j such that A_j ≤ A_k for some A_k already seen, then we can pretend that A_k doesn't exist anymore.
  3. Given this fact, the only entries we've seen so far that we need to keep track of are the ones that are less than all following entries seen so far.
  4. This means that the list of "retained" entries is monotonically increasing.
  5. Once we get to A_i, we can compute its NSV by looking backward through the list of retained previous entries, selecting the first one we find that is less than A_i. Also, all the retained entries skipped along the way (the ones that are greater than or equal to A_i) can now be discarded.

These five points contain the essence of the algorithm. We can use a stack to store the list of previously seen values that are being retained, because we only ever add to or remove from the end of it. As an implementation detail, in step 5 it is useful to have a sentinel at the bottom of the stack, so that we don't try to pop from an empty stack (when A_i has no preceding smaller value).

Pseudocode:

input array A[1..n]
let A[0] ← -∞
let S be an empty stack
push(S,0)
for i ∈ [1..n]
    while A[top(S)] ≥ A[i]
        pop(S)
    let NSV[i] ← top(S)
    push(S,i)

After this code has completed, NSV[i] will contain the index of the NSV of A_i (not the NSV itself); a value of 0 means that A_i has no preceding smaller value, since only the sentinel was left on the stack.
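
For concreteness, here is a minimal C++ sketch of the same algorithm (not part of the original article; the function name is my own). It uses the same 1-based indexing and sentinel as the pseudocode, with nsv[i] = 0 meaning that A_i has no preceding smaller value:

#include <limits>
#include <vector>

// A[1..n] holds the input; A[0] is overwritten with a -infinity sentinel.
// Returns nsv[1..n], where nsv[i] is the index of the nearest smaller value
// to the left of A[i], or 0 if there is none.
std::vector<int> all_nearest_smaller_values(std::vector<long long> A) {
    int n = (int)A.size() - 1;
    A[0] = std::numeric_limits<long long>::min();    // sentinel smaller than everything
    std::vector<int> nsv(n + 1), stk;                // stk holds indices of retained entries
    stk.push_back(0);                                // start with only the sentinel
    for (int i = 1; i <= n; ++i) {
        while (A[stk.back()] >= A[i])                // discard entries that can no longer be an NSV
            stk.pop_back();
        nsv[i] = stk.back();                         // nearest retained entry smaller than A[i]
        stk.push_back(i);                            // A[i] may be the NSV of later entries
    }
    return nsv;
}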

Applications

A theoretically important application of ANSV is the construction of the Cartesian tree for a given list of numbers: a node's parent in a Cartesian tree is either its NSV or its NSV in the reversed array (that is, the nearest smaller value on the right rather than on the left), whichever one is larger. (A proof is given in that article.)
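
A sketch of this construction for a min-Cartesian tree (my own illustration, assuming distinct values and reusing all_nearest_smaller_values from the sketch above; an index of 0 again means "no such neighbour"):

#include <algorithm>
#include <vector>

// Returns parent[i] for a min-Cartesian tree of A[1..n]; the root's parent is 0.
std::vector<int> cartesian_tree_parents(const std::vector<long long>& A) {
    int n = (int)A.size() - 1;
    std::vector<int> left = all_nearest_smaller_values(A);   // nearest smaller to the left

    std::vector<long long> R(A);                              // reversed copy to get right-side NSVs
    std::reverse(R.begin() + 1, R.end());
    std::vector<int> rev = all_nearest_smaller_values(R);
    std::vector<int> right(n + 1);
    for (int i = 1; i <= n; ++i)                              // map reversed indices back to the original
        right[i] = rev[n + 1 - i] ? n + 1 - rev[n + 1 - i] : 0;

    std::vector<int> parent(n + 1);
    for (int i = 1; i <= n; ++i) {
        int l = left[i], r = right[i];
        if (l == 0 && r == 0)      parent[i] = 0;             // the minimum: root of the tree
        else if (l == 0)           parent[i] = r;
        else if (r == 0)           parent[i] = l;
        else                       parent[i] = (A[l] > A[r]) ? l : r;  // larger of the two NSVs
    }
    return parent;
}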

Running ANSV twice, once on an array and once on its reverse, can be used to find, for each element A_i, the largest range [j,k] containing i such that A_m ≥ A_i for each m ∈ [j,k]. Geometrically, if the array is taken to represent a histogram in which A_i is the height of the ith bar, then this range gives the widest rectangle of height A_i that touches the top of the ith bar while staying contained within the histogram. This can then trivially be used to find the largest rectangle contained entirely within the histogram.
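
As a sketch of this application (my own, not from the article; 0-based indexing for brevity), the two ANSV passes and the resulting rectangle widths look like this:

#include <algorithm>
#include <vector>

// Largest rectangle contained in a histogram with bar heights h[0..n-1].
// left[i] / right[i] are the indices of the nearest strictly smaller bars on
// either side of bar i (-1 and n when there is none), found with the same
// stack scan as above, run once in each direction.
long long largest_rectangle_in_histogram(const std::vector<long long>& h) {
    int n = (int)h.size();
    std::vector<int> left(n), right(n), stk;
    for (int i = 0; i < n; ++i) {                        // nearest smaller to the left
        while (!stk.empty() && h[stk.back()] >= h[i]) stk.pop_back();
        left[i] = stk.empty() ? -1 : stk.back();
        stk.push_back(i);
    }
    stk.clear();
    for (int i = n - 1; i >= 0; --i) {                   // nearest smaller to the right
        while (!stk.empty() && h[stk.back()] >= h[i]) stk.pop_back();
        right[i] = stk.empty() ? n : stk.back();
        stk.push_back(i);
    }
    long long best = 0;
    for (int i = 0; i < n; ++i)                          // bar i spans columns left[i]+1 .. right[i]-1
        best = std::max(best, h[i] * (right[i] - left[i] - 1));
    return best;
}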

This gives an easy linear-time solution (linear in the number of grid squares) to the problem of finding the largest empty axis-aligned rectangle in a rectangular grid in which each square is either completely full or completely empty: sweep the rows from top to bottom, maintaining for each column how far up one can go from the current row before encountering an obstacle, and at each row find the largest rectangle contained within the corresponding histogram.
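
A sketch of that sweep (again my own illustration, reusing largest_rectangle_in_histogram from above; grid[r][c] is true when the square is full):

#include <algorithm>
#include <vector>

// Largest empty axis-aligned rectangle in a grid of full/empty squares.
long long largest_empty_rectangle(const std::vector<std::vector<bool>>& grid) {
    if (grid.empty()) return 0;
    int rows = (int)grid.size(), cols = (int)grid[0].size();
    std::vector<long long> height(cols, 0);   // empty squares in each column ending at the current row
    long long best = 0;
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; ++c)
            height[c] = grid[r][c] ? 0 : height[c] + 1;
        best = std::max(best, largest_rectangle_in_histogram(height));
    }
    return best;
}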
