Wednesday, October 10, 2012

BST: Binary Search Tree

Definition: A binary tree is... umm... well, lets ditch formal definition (refer wikipedia for that). A binary tree represents a data-structure as this image:

The elements are called nodes. Each node can have at most two children. The top element is called root-node or just root. A binary search tree (BST) is one that has elements in its left subtree less than itself; and elements in its right subtree is higher than it[1]. If you remember quick sort earlier, any node in a BST is basically a pivot. Expected height[2] of a randomly build binary search tree with n elemements is lg(n) where lg is log2.

Most operations in a BST is simple and recursive. I will list the algorithms here.
Node
  value
  left
  right
  parent
  
insert(root, node)

  // use two pointer, current to the place where new node will be inserted, 
  // and parent holds it's parent reference
  val = node.val
  current = root
  parent = root
  
  // move in tree following BST rule, searching  the place where this node
  // should be. This loop ends at a leaf, giving the future parent to the node
  while current != NIL
    if val < parent.value 
      current = parent.left
    else
      current = parent.right
    
    parent = current
  
  // root was null, no tree was pre-exiting
  if parent = NIL
    node.parent 
    
  // plug in the node to correct side of the parent
  if val < parent.value
    parent.left = node
  else
    parent.right = node
    
  //assign the parent
  node.parent = parent
  
  return root
  
//iterative
search(root, value)
  current = root
  found = false
  result = NIL    
  
  while found = false and current != NIL
    if current.value = value
      result = current
      found = true
    else if value < current.value
      current = current.left
    else
      current = current.right
      
  return result

//recursive
search(root, value)
  if root = NIL or root.value = value
    return root
    
  if value < root.value
    return search(root.left, value)
  else
    return search(root.right, value)

//min is left most node, right? Think about it.
findMin(root)
  if root = NIL
    error("Nil BST?")
  min = root
  while min.left != NIL
    min = min.left
    
  return min
  
//max, same as min but in right subtree
findMax(root)
  if root = NIL
    error("Nil BST?")
  max = root
  while max.right != NIL
    max = max.right
    
  return max
  
//in order traversal, prints sorted node values
inOrderTraversal(root)
  if root = NIL
    return
  
  inOrderTraversal(root.left)
  print root.value
  inOrderTraversal(root.right)
So, these were the easier ones. With a little effort one can understand these. I will now discuss, some of the harder ones.

Successor and Predecessor: A successor of a node is a node which comes the next if we sort the tree in increasing order. A predecessor of a node is the node that comes immediately before it, if we sort the tree in increasing order. That is it. In a tree containing values 1 3 5 6 7 9, the successor and predecessor of 5 are 6 and 3, respectively.

To  find successor of a node we need to find immediate larger node. So, here is the algorithm:
  1. The right subtree of the node will contain the values larger than the node. (right?) If we find the lowest node in right subtree we will have the immediate larger node, and we get our successor.
  2. What if the node does not have a right subtree? We will have to go up in the tree somewhere. See the image below.
    1. If the node is left child, the parent will the immediate next.
    2. If the node is right child, we need to keep moving up until we get a node whose left child is one of the node's parents.

Once you get the idea behind the successor, predecessor is easy.
  1. Predecessor is highest in left subtree.
  2. If there exists no left subtree, then:
    1. If the node is right-child, immediate parent is the predecessor.
    2. If the node is left child move up till you get a node whose right child is one of the node's parents.

Deletion: You must be wondering why I am covering delete so late? That's because it's so damn hard without understanding the concepts above. So, here is the algorithm to delete a node:
  1. If the node had no child, well just knock it off.
  2. If it has one child, yank the node and connect the child to the node's parent.
  3. If it has two children, then we need to get a replacement for this node. What would be a good replacement? A good replacement is the one that on replacement, maintains the BST property. (At this moment I want you to pause, and think if you have to replace this node who could be the best candidate?) You can choose either the successor or the predecessor. The common practice is too use successor (further discussion assumes we replaced with successor). And then delete the successor from the subtree.

Now, if you read successor code properly, you will observe that if the successor chosen  from right subtree, the successor will have at most one child. So, it is NOT going to be a recursive as in "delete a node with two children, then delete the successor with two children, and so on". NO. Just place the successor's value in node. Then clip the successor using either step 1 or step 2.
successor(node)
  //if exists a right subtree
  if node.right != NIL
    return findMin(node.right)
    
  parent = node.parent
    
  // if it is left child this loop wont executeoutput here
  // else move up until a we get first right parent
  while parent != NIL and parent.right = node
    node = parent
    parent = parent.parent
    
  return parent
  
predecessor(node)
  //if exists a left subtree
  if node.left != NIL
    return findMax(node.left)
    
  parent = node.parent
  
  // if it is left child, this loop wouldn't execute
  // else move up till a left parent arrives
  while parent != NIL and parent.left = node
    node = parent
    parent = parent.parent
    
  return parent
  
delete(node)
  // leaf node
  if node.left = NIL and node.right = NIL
    if node = node.parent.left
      node.parent.left = NIL
    else
      node.parent.right = NIL
      
  // half node
  if node.left = NIL
    node.right.parent = node.parent
    if node.parent.left = node
      node.parent.left = node.right
    else
      node.parent.right = node.right
  else if node.right = NIL
    node.left.parent = node.parent
    if node.parent.left = node
      node.parent.left = node.left
    else
      node.parent.right = node.left
      
  // complete node
  successor = successor(node)
  node.value = successor.value
  delete(successor) 
Footnotes:
[1] The basic idea is to have elements less than the parent on one side and greater than the parent, on the other. And, whatever criteria you choose, it must apply to all the nodes. The most common is to have smaller in left, but if your rebel inner self does not want to conform to the standard go ahead, make a tree with higher in left.

[2] Refer Introduction to Algorithms - CLRS, Chapter 12 Binary Search Trees: 12.4 Randomly built binary search trees

[3] View/download Java code from GitHub, You can see sample output here.

1 comment: