Size Balanced Tree

From PEGWiki

Revision as of 07:50, 20 August 2014

A size balanced tree (SBT) is a self-balancing binary search tree first published by Chinese student Qifeng Chen in 2007. The tree is rebalanced by examining the sizes of each node's subtrees. Its abbreviation has given rise to several nicknames among Chinese informatics competitors, including "sha bi" tree (Chinese: 傻屄树; pinyin: shǎ bī shù; literally "dumb cunt tree") and "super BT", which is a homophone of the Chinese term for snot (Chinese: 鼻涕; pinyin: bítì), suggesting that it is messy to implement. Contrary to what its nicknames suggest, this data structure can be very useful and is also known to be easy to implement. The only extra piece of information stored in each node is the size of its subtree (rather than fields that serve no other purpose, such as randomized weights in treaps or colours in red–black trees), which makes it very convenient to implement the select-by-rank and get-rank operations in dynamic order statistics problems. It supports standard binary search tree operations such as insertion, deletion, and searching in O(log n) time. According to Chen's paper, "this is the fastest known advanced binary search tree to date."
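
To illustrate why the stored sizes make order statistics convenient, here is a minimal Python sketch of select-by-rank and get-rank. The names Node, size, select, and rank are hypothetical and chosen for this example; ranks are assumed to be 1-indexed and keys distinct.

```python
class Node:
    """Hypothetical node: a key, two children, and a cached subtree size."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right
        self.size = 1 + size(left) + size(right)


def size(node):
    """Size of a possibly missing subtree; a nonexistent node counts as 0."""
    return node.size if node is not None else 0


def select(t, k):
    """Return the k-th smallest key (1-indexed) in the subtree rooted at t."""
    while t is not None:
        left = size(t.left)
        if k == left + 1:
            return t.key
        if k <= left:
            t = t.left           # the answer lies in the left subtree
        else:
            k -= left + 1        # skip the left subtree and t itself
            t = t.right
    raise IndexError("rank out of range")


def rank(t, key):
    """Return 1 plus the number of keys strictly smaller than key."""
    r = 1
    while t is not None:
        if key <= t.key:
            t = t.left
        else:
            r += size(t.left) + 1
            t = t.right
    return r
```

Both functions walk a single root-to-leaf path, so on a balanced tree they take O(log n) time.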

Properties

The size balanced tree examines each node's size (i.e. the number of nodes in the subtree rooted at that node) to determine when rotations should be performed. Each node T in the tree satisfies the following two properties:

  1. size(T.left) ≥ size(T.right.left), size(T.right.right)
  2. size(T.right) ≥ size(T.left.left), size(T.left.right)

In other words, each child node of T is not smaller in size than the child nodes of its sibling. Clearly, we should consider the sizes of nonexistent children and siblings to be 0.

Consider the following example where T is the node in question, L, R are its child nodes, and A, B, C, D are subtrees which also satisfy the above SBT properties on their own.

       T
      / \
     /   \
    L     R
   / \   / \
  A   B C   D

Then, the node T must satisfy:

  • size(L) ≥ size(C), size(D)
  • size(R) ≥ size(A), size(B)
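
The two properties can be checked mechanically. The following Python sketch is only an illustration; Node, size, child, satisfies_sbt, and is_sbt are hypothetical names, and missing nodes are treated as having size 0, as described above.

```python
class Node:
    """Hypothetical node: a key, two children, and a cached subtree size."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right
        self.size = 1 + size(left) + size(right)


def size(node):
    """Size of a possibly missing subtree; a nonexistent node counts as 0."""
    return node.size if node is not None else 0


def child(node, side):
    """Return node.left or node.right, tolerating a missing node."""
    return getattr(node, side) if node is not None else None


def satisfies_sbt(t):
    """Check properties 1 and 2 at the node t only (not recursively)."""
    l, r = child(t, "left"), child(t, "right")
    prop1 = size(l) >= max(size(child(r, "left")), size(child(r, "right")))
    prop2 = size(r) >= max(size(child(l, "left")), size(child(l, "right")))
    return prop1 and prop2


def is_sbt(t):
    """Check both properties at every node of the subtree rooted at t."""
    if t is None:
        return True
    return satisfies_sbt(t) and is_sbt(t.left) and is_sbt(t.right)
```

For example, a three-node chain built as Node(1, None, Node(2, None, Node(3))) fails property 1 at its root, because the missing left child has size 0 while T.right.right has size 1.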

Rotations

The rotations of SBTs are analogous to those in other self-balancing binary search trees.

        Q         Right Rotation          P
       / \       --------------->        / \
      P   C                             A   Q
     / \         <---------------          / \
    A   B         Left Rotation           B   C

Left Rotation

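left-rotate(t):                                  //t is passed by reference
    k ← t.right                                  //k becomes the new root
    t.right ← k.left
    k.left ← t
    k.size ← t.size                              //k now heads the same nodes t did
    t.size ← t.left.size + t.right.size + 1      //recompute from t's new children
    t ← k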

Right Rotation

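right-rotate(t):                                 //t is passed by reference
    k ← t.left                                   //k becomes the new root
    t.left ← k.right
    k.right ← t
    k.size ← t.size                              //k now heads the same nodes t did
    t.size ← t.left.size + t.right.size + 1      //recompute from t's new children
    t ← k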
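
In a language without pass-by-reference, one common way to express the final assignment t ← k is to return the new subtree root and let the caller reattach it. The Python sketch below takes that approach; Node, size, left_rotate, and right_rotate are hypothetical names for this illustration.

```python
class Node:
    """Hypothetical node: a key, two children, and a cached subtree size."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right
        self.size = 1 + size(left) + size(right)


def size(node):
    """Size of a possibly missing subtree; a nonexistent node counts as 0."""
    return node.size if node is not None else 0


def left_rotate(t):
    """Lift t's right child above t and return it as the new subtree root."""
    k = t.right
    t.right = k.left
    k.left = t
    k.size = t.size                               # k heads the same set of nodes
    t.size = size(t.left) + size(t.right) + 1     # t lost k's right subtree
    return k


def right_rotate(t):
    """Lift t's left child above t and return it as the new subtree root."""
    k = t.left
    t.left = k.right
    k.right = t
    k.size = t.size
    t.size = size(t.left) + size(t.right) + 1
    return k
```

A caller would write, for instance, root = right_rotate(root) or t.right = right_rotate(t.right). Applying right_rotate and then left_rotate to the same subtree restores the original shape and sizes.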

Maintenance

After insertions and deletions, the new sizes of subtrees may violate the two properties above. Thus, we define a procedure maintain(T) to rebalance the SBT rooted at the node T. It should be called with the precondition that T's children are already SBTs themselves. Since properties 1 and 2 are symmetrical, we will only discuss violations of property 1 in detail.

There are 4 cases to consider when rebalancing.

  • Case 1: size(T.left) < size(T.right.left)
Perhaps after inserting a value into T.right, the scenario below (figure 1) may occur, leading to size(L) < size(C).
To fix this, we first perform a right-rotate on T.right (figure 2) and then a left-rotate on T (figure 3).
    Fig. 1:                Fig. 2:                   Fig. 3:    
  insert(R,v)          right-rotate(R)            left-rotate(T)

       T                      T                         C       
      / \                    / \                       / \      
     /   \                  /   \                     /   \     
    L     R                L     C                   T     R    
   / \   / \              / \   / \                 / \   / \   
  A   B C   D            A   B E   R               L   E F   D  
       / \                        / \             / \           
      E   F                      F   D           A   B          
After these operations, the properties of the entire tree in figure 3 become unpredictable. Luckily, the subtrees A, B, D, E, F, and L are still SBTs. Thus, we can recursively call maintain on the subtrees rooted at R and T to repair them.
Now that all of the subtrees are SBTs, we still have to make sure that the root node C satisfies the SBT properties. So, we call maintain one last time on the root node C.
  • Case 2: size(T.left) < size(T.right.right)
Perhaps after inserting a value into T.right, the scenario below (figure 4) may occur, leading to size(L) < size(D). This is similar to case 1, except that E and F lie below D rather than below C, so we can omit them from the diagram.
To fix this, we perform a left-rotate on the root node T, obtaining the structure in figure 5.
    Fig. 4:                Fig. 5:   
  insert(R,v)           left-rotate(T)
       T                       R           
      / \                     / \          
     /   \                   /   \         
    L     R                 T     D        
   / \   / \               / \             
  A   B C   D             L   C            
                         / \               
                        A   B              
After this, the tree rooted at R may still not be an SBT, because size(C) < size(A) or size(C) < size(B) may hold. So, we call maintain on T.
Now that R's subtrees are SBTs, which is the precondition of maintain, we may call maintain on R itself.
  • Case 3: size(T.right) < size(T.left.right)
Symmetrical to case 1.
  • Case 4: size(T.right) < size(T.left.left)
Symmetrical to case 2.
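
As a concrete check of case 1, the Python sketch below rebuilds the tree of figure 1 and applies the two rotations of figures 2 and 3. It is only an illustration: the node labels stand in for keys, the subtrees A, B, D, and F are assumed to be single nodes, E is given one extra child so that size(L) < size(C) actually holds, and Node, size, left_rotate, and right_rotate are hypothetical names.

```python
class Node:
    """Hypothetical node: a label, two children, and a cached subtree size."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right
        self.size = 1 + size(left) + size(right)


def size(node):
    """Size of a possibly missing subtree; a nonexistent node counts as 0."""
    return node.size if node is not None else 0


def left_rotate(t):
    """Lift t's right child above t and return the new subtree root."""
    k = t.right
    t.right = k.left
    k.left = t
    k.size = t.size
    t.size = size(t.left) + size(t.right) + 1
    return k


def right_rotate(t):
    """Lift t's left child above t and return the new subtree root."""
    k = t.left
    t.left = k.right
    k.right = t
    k.size = t.size
    t.size = size(t.left) + size(t.right) + 1
    return k


# Figure 1: size(L) = 3 and size(C) = 4, so property 1 is violated at T.
L = Node("L", Node("A"), Node("B"))
C = Node("C", Node("E", Node("E1")), Node("F"))
R = Node("R", C, Node("D"))
T = Node("T", L, R)

T.right = right_rotate(T.right)   # figure 2: C is now T's right child
root = left_rotate(T)             # figure 3: C is now the root
```

After the two rotations, root is the node labelled C, with T (children L and E) on its left and R (children F and D) on its right, matching figure 3.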


With the casework taken care of, implementing maintain is straightforward.

def maintain(t):
    //precondition: t.left and t.right are already SBTs
    //the size of a nonexistent node is taken to be 0

    if t.left.size < t.right.left.size:         //case 1
        right-rotate(t.right)
        left-rotate(t)
        maintain(t.left)
        maintain(t.right)
        maintain(t)

    else if t.left.size < t.right.right.size:   //case 2
        left-rotate(t)
        maintain(t.left)
        maintain(t)

    else if t.right.size < t.left.right.size:   //case 3 (mirror of case 1)
        left-rotate(t.left)
        right-rotate(t)
        maintain(t.left)
        maintain(t.right)
        maintain(t)

    else if t.right.size < t.left.left.size:    //case 4 (mirror of case 2)
        right-rotate(t)
        maintain(t.right)
        maintain(t)


This pseudocode is somewhat slow and redundant: the two SBT properties are usually already satisfied, so most of the comparisons are wasted. As an optimization, add an extra boolean flag to the maintain parameters indicating which pair of cases to examine. If the flag is TRUE, we examine cases 1 and 2; otherwise, we examine cases 3 and 4. Doing so eliminates many unnecessary comparisons.

def maintain(t, flag):

    if flag:                                       //examine cases 1 and 2
        if t.left.size < t.right.left.size:        //case 1
            right-rotate(t.right)
            left-rotate(t)
        else if t.left.size < t.right.right.size:  //case 2
            left-rotate(t)
        else:
            return                                 //nothing to fix
    else:                                          //examine cases 3 and 4
        if t.right.size < t.left.right.size:       //case 3
            left-rotate(t.left)
            right-rotate(t)
        else if t.right.size < t.left.left.size:   //case 4
            right-rotate(t)
        else:
            return                                 //nothing to fix

    //reached only if a rotation was performed
    maintain(t.left, FALSE)     //maintain the left subtree
    maintain(t.right, TRUE)     //maintain the right subtree
    maintain(t, TRUE)           //maintain the whole tree
    maintain(t, FALSE)          //maintain the whole tree

The proof that maintain(t.left, TRUE) and maintain(t.right, FALSE) are unnecessary can be found in section 6 of Chen's paper. Furthermore, maintain runs in O(1) amortized time (so you do not have to worry about it not terminating).
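
For readers who want something runnable, here is one possible Python rendering of the flagged maintain, offered as a sketch rather than a reference implementation. Rotations return the new subtree root instead of assigning by reference, and missing nodes are handled through a size helper. The insert function is an assumption added for this example, not part of the article: it is an ordinary binary search tree insertion that increments sizes along the path and then calls maintain with the flag set according to which side received the new key. All names are hypothetical.

```python
class Node:
    """Hypothetical node: a key, two children, and a cached subtree size."""
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.size = 1


def size(node):
    """Size of a possibly missing subtree; a nonexistent node counts as 0."""
    return node.size if node is not None else 0


def left_rotate(t):
    """Lift t's right child above t and return the new subtree root."""
    k = t.right
    t.right = k.left
    k.left = t
    k.size = t.size
    t.size = size(t.left) + size(t.right) + 1
    return k


def right_rotate(t):
    """Lift t's left child above t and return the new subtree root."""
    k = t.left
    t.left = k.right
    k.right = t
    k.size = t.size
    t.size = size(t.left) + size(t.right) + 1
    return k


def maintain(t, flag):
    """Rebalance the subtree rooted at t and return its (possibly new) root."""
    if t is None:
        return None
    l, r = t.left, t.right
    if flag:
        if r is not None and size(l) < size(r.left):      # case 1
            t.right = right_rotate(r)
            t = left_rotate(t)
        elif r is not None and size(l) < size(r.right):   # case 2
            t = left_rotate(t)
        else:
            return t                                      # nothing to fix
    else:
        if l is not None and size(r) < size(l.right):     # case 3
            t.left = left_rotate(l)
            t = right_rotate(t)
        elif l is not None and size(r) < size(l.left):    # case 4
            t = right_rotate(t)
        else:
            return t                                      # nothing to fix
    # Reached only if a rotation was performed.
    t.left = maintain(t.left, False)
    t.right = maintain(t.right, True)
    t = maintain(t, True)
    return maintain(t, False)


def insert(t, key):
    """Assumed BST insertion: returns the root of the updated subtree."""
    if t is None:
        return Node(key)
    t.size += 1
    if key < t.key:
        t.left = insert(t.left, key)
    else:
        t.right = insert(t.right, key)
    # The right subtree grew if key >= t.key, so cases 1 and 2 apply there.
    return maintain(t, key >= t.key)
```

A tree can then be built with root = None followed by root = insert(root, key) for each key. Inserting keys in sorted order, which would degenerate a plain binary search tree into a chain, is a simple way to see the rebalancing at work.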