## Wednesday, October 10, 2012

### BST: Binary Search Tree

Definition: A binary tree is... umm... well, lets ditch formal definition (refer wikipedia for that). A binary tree represents a data-structure as this image:

The elements are called nodes. Each node can have at most two children. The top element is called root-node or just root. A binary search tree (BST) is one that has elements in its left subtree less than itself; and elements in its right subtree is higher than it. If you remember quick sort earlier, any node in a BST is basically a pivot. Expected height of a randomly build binary search tree with `n` elemements is `lg(n)` where lg is log2.

Most operations in a BST is simple and recursive. I will list the algorithms here.
```Node
value
left
right
parent

insert(root, node)

// use two pointer, current to the place where new node will be inserted,
// and parent holds it's parent reference
val = node.val
current = root
parent = root

// move in tree following BST rule, searching  the place where this node
// should be. This loop ends at a leaf, giving the future parent to the node
while current != NIL
if val < parent.value
current = parent.left
else
current = parent.right

parent = current

// root was null, no tree was pre-exiting
if parent = NIL
node.parent

// plug in the node to correct side of the parent
if val < parent.value
parent.left = node
else
parent.right = node

//assign the parent
node.parent = parent

return root

//iterative
search(root, value)
current = root
found = false
result = NIL

while found = false and current != NIL
if current.value = value
result = current
found = true
else if value < current.value
current = current.left
else
current = current.right

return result

//recursive
search(root, value)
if root = NIL or root.value = value
return root

if value < root.value
return search(root.left, value)
else
return search(root.right, value)

//min is left most node, right? Think about it.
findMin(root)
if root = NIL
error("Nil BST?")
min = root
while min.left != NIL
min = min.left

return min

//max, same as min but in right subtree
findMax(root)
if root = NIL
error("Nil BST?")
max = root
while max.right != NIL
max = max.right

return max

//in order traversal, prints sorted node values
inOrderTraversal(root)
if root = NIL
return

inOrderTraversal(root.left)
print root.value
inOrderTraversal(root.right)
```
So, these were the easier ones. With a little effort one can understand these. I will now discuss, some of the harder ones.

Successor and Predecessor: A successor of a node is a node which comes the next if we sort the tree in increasing order. A predecessor of a node is the node that comes immediately before it, if we sort the tree in increasing order. That is it. In a tree containing values `1 3 5 6 7 9`, the successor and predecessor of `5` are `6` and `3`, respectively.

To  find successor of a node we need to find immediate larger node. So, here is the algorithm:
1. The right subtree of the node will contain the values larger than the node. (right?) If we find the lowest node in right subtree we will have the immediate larger node, and we get our successor.
2. What if the node does not have a right subtree? We will have to go up in the tree somewhere. See the image below.
1. If the node is left child, the parent will the immediate next.
2. If the node is right child, we need to keep moving up until we get a node whose left child is one of the node's parents.

Once you get the idea behind the successor, predecessor is easy.
1. Predecessor is highest in left subtree.
2. If there exists no left subtree, then:
1. If the node is right-child, immediate parent is the predecessor.
2. If the node is left child move up till you get a node whose right child is one of the node's parents.

Deletion: You must be wondering why I am covering `delete` so late? That's because it's so damn hard without understanding the concepts above. So, here is the algorithm to delete a node:
1. If the node had no child, well just knock it off.
2. If it has one child, yank the node and connect the child to the node's parent.
3. If it has two children, then we need to get a replacement for this node. What would be a good replacement? A good replacement is the one that on replacement, maintains the BST property. (At this moment I want you to pause, and think if you have to replace this node who could be the best candidate?) You can choose either the successor or the predecessor. The common practice is too use successor (further discussion assumes we replaced with successor). And then delete the successor from the subtree.

Now, if you read successor code properly, you will observe that if the successor chosen  from right subtree, the successor will have at most one child. So, it is NOT going to be a recursive as in "delete a node with two children, then delete the successor with two children, and so on". NO. Just place the successor's value in node. Then clip the successor using either step `1` or step `2`.
```successor(node)
//if exists a right subtree
if node.right != NIL
return findMin(node.right)

parent = node.parent

// if it is left child this loop wont executeoutput here
// else move up until a we get first right parent
while parent != NIL and parent.right = node
node = parent
parent = parent.parent

return parent

predecessor(node)
//if exists a left subtree
if node.left != NIL
return findMax(node.left)

parent = node.parent

// if it is left child, this loop wouldn't execute
// else move up till a left parent arrives
while parent != NIL and parent.left = node
node = parent
parent = parent.parent

return parent

delete(node)
// leaf node
if node.left = NIL and node.right = NIL
if node = node.parent.left
node.parent.left = NIL
else
node.parent.right = NIL

// half node
if node.left = NIL
node.right.parent = node.parent
if node.parent.left = node
node.parent.left = node.right
else
node.parent.right = node.right
else if node.right = NIL
node.left.parent = node.parent
if node.parent.left = node
node.parent.left = node.left
else
node.parent.right = node.left

// complete node
successor = successor(node)
node.value = successor.value
delete(successor) ```
Footnotes:
 The basic idea is to have elements less than the parent on one side and greater than the parent, on the other. And, whatever criteria you choose, it must apply to all the nodes. The most common is to have smaller in left, but if your rebel inner self does not want to conform to the standard go ahead, make a tree with higher in left.

 Refer Introduction to Algorithms - CLRS, Chapter 12 Binary Search Trees: 12.4 Randomly built binary search trees

 View/download Java code from GitHub, You can see sample output here.

#### 1 comment:

1. 