Cartesian tree

A sequence and its corresponding Cartesian tree

Given a sequence of numbers (or any totally ordered objects), there exists a binary min-heap whose inorder traversal is that sequence. This is known as the Cartesian tree for that sequence.

Construction

To see what such a tree would look like, imagine starting with the example sequence shown on the right. Because 1 is the smallest element in the sequence, it must occur at the root of the Cartesian tree (which is min-heap-ordered). Now, in an inorder traversal of the Cartesian tree, all the elements in the root's left subtree will be visited before the root, and all the elements in the right subtree after; so we can readily conclude that the elements 9, 3, and 7 occur in the left subtree, whereas 8, 12, 10, 20, 15, 18, and 5 occur in the right subtree.

To actually construct the left subtree, we apply this reasoning recursively; the left subtree of the root is itself min-heap-ordered, so its root must be the element 3, as it is the smallest of 9, 3, and 7. Likewise, the root of the right subtree must be 5, which is the smallest of all the elements to the right of the 1; and so on.

It is clear from this discussion that there always exists at least one valid Cartesian tree for any given sequence.

Uniqueness

Based on the technique given above to construct a Cartesian tree, and the underlying reasoning, we see that the Cartesian tree of a sequence of distinct numbers is always unique. If there are duplicates in the sequence, there may be multiple possible Cartesian trees; for example, the sequence [5, 5] has two possible Cartesian trees (one in which the root has a left child, and one in which it has a right child). On the other hand, [5, 2, 5] has only one possible Cartesian tree, even though it has duplicate elements. The precise criterion is that the Cartesian tree is unique if and only if every segment of the sequence has a unique minimum element.

We can guarantee uniqueness even with duplicates by insisting that if an element appears multiple times in a sequence then the occurrence furthest to the left is always considered to be the smallest, or by finding any other way to break ties.

Naive method

The naive method described above may take up to $O(N^2)$ time, since it requires scanning the sequence to find its minimum, then scanning the range to the left of the minimum, and scanning the range to the right, and so on down; this algorithm then has the same performance dynamics as quicksort (average $O(N \log N)$ ).

Left-to-right

The Cartesian tree may be constructed on one pass over the sequence. We proceed from left to right and add each value we encounter as a new node to the tree; after having processed the first $i$ elements of the sequence, we will have a valid Cartesian tree for these initial elements. After we have processed the entire sequence ( $i = N$ ), the Cartesian tree will be complete.

Thus, we begin by constructing a singleton tree containing only the first element $a_1$ in the sequence; this is a valid Cartesian tree for the segment of the sequence containing only the first element. Then, we repeatedly examine the next element $a_{i+1}$ . At each step, we keep track of $v_i$ , that is, the node that has label $a_i$ (the last value inserted):

If $a_{i+1} > a_i$ , then we just insert a new node $v_{i+1}$ as the right child of $v_i$ , and label this new node with $a_{i+1}$ .
Otherwise, we consider node $P(v_i)$ , the parent of $v_i$ . If the label of $P(v_i)$ is less than or equal to $a_{i+1}$ , we stop. Otherwise, we look to its parent, $P(P(v_i))$ , and see whether its label is less than or equal to $a_{i+1}$ and so on. If we stop at node $u$ , with right child $w$ , then we insert the new node $v_{i+1}$ so that it is the right child of $u$ , and make $w$ the left child of $v_{i+1}$ .
If we trace all the way up to the root of the tree so far, and still find no node with label less than or equal to $a_{i+1}$ , then $a_{i+1}$ is the smallest element seen so far; so make the new node $v_{i+1}$ the root of the tree, and make the old root $v_{i+1}$ 's left child.

Pseudocode

input sequence A with length N indexed from 1
// label the root node 1, with left child = right child = 0 (nonexistent)
root ← 1
left[1] ← 0
right[1] ← 0
// the root's label is the first item in the sequence
label[1] ← A[1]
for i ∈ [2..N]
    last ← i-1               // node of A[i-1]
    // create new node with label A[i]
    label[i] ← A[i]
    right[i] ← 0
    while label[last] > A[i] and last ≠ root
        last ← parent[last]
    if label[last] > A[i]    // A[i] is the smallest element yet; make it new root
        parent[root] ← i
        left[i] ← root
        root ← i
    else if right[last] = 0       // just insert it
        right[last] ← i
        parent[i] ← last
        left[i] ← 0
    else                     // reconfigure links
        parent[right[last]] ← i
        left[i] ← right[last]
        right[last] ← i
        parent[i] ← last

Proof of correctness

We define our invariant is as follows: After the loop in the above algorithm has run $m$ times, we will have a valid Cartesian tree for the first $m+1$ elements of A.

This is true just before the loop runs for the first time ( $m = 0$ ), because we will have a singleton tree that contains just the first element of $A$ , and clearly this is a valid Cartesian tree for just that first element.

Now assume the loop has completed $m$ iterations, but has not yet begun iteration $m+1$ . Then, we have a valid Cartesian tree of the first $m+1$ elements of $A$ , and we wish to insert $A_{m+2}$ . We first observe the following:

The last node inserted, which we will call $v$ , has no right child. This is because $v$ must come last in the inorder traversal of the tree so far, as that tree is a valid Cartesian tree of all the elements added so far; but if $v$ has a right child, then $\mathrm{right}[v]$ will follow $v$ in the inorder traversal.
$v$ is the right child of $\mathrm{parent}[v]$ , if the latter exists. $\mathrm{parent}[v]$ is furthermore the right child of $\mathrm{parent}[\mathrm{parent}[v]]$ , and so on all the way up to the root. If this were not the case, then $v$ would lie in the left subtree of some node $u$ , and therefore $u$ would follow $v$ in the inorder traversal, a contradiction.

Then consider the three cases covered in the code:

If $A_{m+2}$ is less than the label of the tree's root, then after we make it the new root (with the old root as its left child), $A_{m+2}$ will occur at the very end of the inorder traversal, since everything in the left subtree will be visited before it; but the order in which the previously inserted elements are visited will be unchanged. Also, since $A_{m+2}$ is the smallest element in the tree, and it goes at the root, the tree maintains its min-heap-ordering property. Hence the invariant is preserved.
If $A_{m+2} \geq A_{m+1}$ , then after we have inserted a new node as the right child of $v$ , this node will follow $v$ in the inorder traversal, but the order in which all other elements are visited will be unchanged, so the effect is to append the new node to the end of the inorder traversal of the old tree. The new tree will also maintain its min=heap-ordering, since the new node inserted is greater than or equal to its parent. Therefore, the invariant is preserved.
If we find some proper ancestor $u$ of the most recently inserted node, whose label is less than or equal to $A_{m+2}$ , but whose right child's label is greater than $A_{m+2}$ , then $u$ is visited immediately before all the elements in its right subtree, and those elements occur at the very end of the inorder traversal (since $u$ is descended from the root of the tree along only right edges); furthermore all the elements in $u$ 's right subtree are greater than or equal to the label of $\mathrm{right}[u]$ . Now then, when we insert a new node as $u$ 's right child, and make the original right child of $u$ the left child of the new root, we see that all the elements formerly in $u$ 's right subtree remain in $u$ 's right subtree, that they are visited after $u$ in the inorder traversal and all consecutively, and that the new node is visited at the very end; we conclude that the inorder traversal of the new tree is just the inorder traversal of the old tree with the new node visited at the very end. Also, since $A_{m+2}$ is greater than or equal to $u$ 's label, the min-heap-ordering is preserved at $u$ ; and since it is less than the label of $u$ 's original right child, the min-heap-ordering is preserved here as well; so finally we conclude that the invariant is preserved. $_{\blacksquare}$

Analysis of running time

Observe that the loop in the algorithm given above performs at most a constant number of operations per iteration, plus some variable number of operations possibly spent ascending the tree looking for a place to insert the new element. Now define the active point of the tree to be the node pointed to by the last variable. Every time we ascend the tree by one node, we decrease the active point's depth by one. Then, either Case 1, Case 2, or Case 3 happens. In Case 1, the active point's depth is left unchanged, as it goes from being the old tree's root to being the new tree's root. In Cases 2 and 3, the active point's depth increases by one, since the new node inserted is one level deeper than its parent (the previous active point), and this new node becomes the new active point. So in total the depth of the active node cannot increase more than $N-1$ times, once for each iteration of the loop. This means that it cannot decrease more than $N-1$ times in total, either, because it starts at 0 and must finish as a non-negative integer. Hence the total number of times we ascend the tree is $O(N)$ across all iterations of the loop, and all the other loop operations take $O(N)$ time total, so the construction is completed in $O(N)$ time. (This analysis also shows that each of the iterations of the loop uses amortized constant time.)

Using all nearest smaller values

Another construction method relies on finding all nearest smaller values of the sequence, which can be done in linear time. That is, for each element $A_i$ in the sequence, we find the maximum $j < i$ such that $A_j \leq A_i$ ; this is the nearest smaller value on the left (LNSV) . We also find the minimum $k > i$ such that $A_k < A_i$ , and call this the nearest smaller value on the right (RNSV).

Theorem: Consider the unique Cartesian tree of $A$ obtained by considering the first of equal elements to be the smallest. (In such a tree, if node with label $x$ has a left child with label $l$ , then $l < x$ , and if it has a right child with label $r$ , then $r \leq x$ .) Then, given an element $A_i$ of the sequence, excepting that it is the root of the tree, it will be either the LNSV's right child or the RNSV's left child, as determined by the following rules:

If the LNSV and RNSV both exist, the larger of the two is the parent of $A_i$ in the tree. In case of a tie, the RNSV wins.
If only the LSNV exists, it is $A_i$ 's parent. If only the RNSV exists, it is $A_i$ 's parent.
If neither the LNSV nor the RNSV exists, $A_i$ is the root.

Proof: Suppose that node labelled $A_i$ is the right child of its parent. Then its parent must be labelled $A_j$ where $j < i$ , because $A_i$ occurs after $A_j$ in the inorder traversal. Also, $A_i \geq A_j$ . Now suppose there exists $k$ such that $j < k < i$ and $A_i \geq A_k$ (that is, $A_j$ is not the LNSV to $A_i$ ). Then, $A_k$ must occur after $A_j$ in the inorder traversal, but before $A_i$ . This is only possible if $A_k$ occurs in the left subtree of the node labelled $A_i$ . However, this implies that $A_k > A_i$ , a contradiction. A similar analysis shows that if $A_i$ 's parent occurs after $A_i$ in the sequence $A$ , it must be the RNSV of $A_i$ . Now, pursuant to the three cases in the statement of the Theorem:

Since the LNSV and RNSV exist, node labelled $A_i$ is not the root. Therefore this node has a parent. The foregoing analysis shows that if node labelled $A_i$ is the right child of its parent, then the parent is labelled with the LSNV, and if it is the left child, the parent is labelled with the RNSV. Suppose the former is true and $A_i$ 's parent is its LNSV. Then the RNSV follows $A_i$ and $A_j$ in the inorder traversal. However, it cannot be in the right subtree of node labelled $A_i$ , since its value is strictly less than $A_i$ . It follows that it cannot be anywhere in the right subtree of $A_j$ either (or anywhere in the subtree rooted at $A_j$ ). Now let $u$ be the lowest common ancestor of the nodes labelled with $A_j$ and the RNSV. If $u$ is the node labelled with the RNSV, then node labelled $A_j$ is in the RNSV's left subtree, which means $A_j$ (the LNSV) is strictly greater than the RNSV, as desired. If not, then since the RNSV occurs after $A_j$ in the inorder traversal, it must be that the RNSV is in $u$ 's right subtree and node labelled $A_i$ is in $u$ 's left subtree. But then $u$ occurs after $A_i$ in the inorder traversal and before the RNSV, and $u$ 's label is strictly less than $A_i$ , which contradicts the "nearestness" of the supposed RNSV (so this case can never occur at all). A similar analysis shows that if node labelled $A_i$ is the left child of its parent, then the parent is labelled with the RNSV and the RNSV is greater than or equal to the LNSV, as desired.
If only the LSNV exists, then node labelled $A_i$ still cannot be the root, but all the elements following $A_i$ are greater than or equal to it, so node labelled $A_i$ cannot be the left child of any node. Therefore it is the right child of its parent, and therefore its parent is the LNSV. Likewise, if only the RNSV exists, we see that node labelled $A_i$ is the left child of its parent, and thus the parent must be the RNSV.
If neither the LSNV nor the RNSV exists, then $A_i$ is the leftmost occurrence of the minimum element in the sequence, and hence must be at the root of the tree. $_{\blacksquare}$

Using the preceding Theorem, after computing all nearest smaller values on both sides of each element (which requires linear time), we can determine each node's parent in constant time, so we obtain overall linear time construction.

Applications

A range minimum query on a sequence is equivalent to a lowest common ancestor query on the sequence's Cartesian tree. Hence, RMQ may be reduced to LCA using the sequence's Cartesian tree.
The treap, a balanced binary search tree structure, is a Cartesian tree of (key,priority) pairs; it is heap-ordered according to the priority values, and an inorder traversal gives the keys in sorted order.
The suffix tree of a string may be constructed from the suffix array and the longest common prefix array. The first step is to compute the Cartesian tree of the longest common prefix array. Details in the Suffix tree article.

Cartesian tree

Contents

Construction

Uniqueness

Naive method

Left-to-right

Pseudocode

Proof of correctness

Analysis of running time

Using all nearest smaller values

Applications

Navigation menu

Personal tools

Namespaces

Variants

Views

More

Search

Navigation

Tools