Tuesday, October 2, 2012

Heap and Heap Sort

Heap: A heap is a nearly complete binary tree, that means tree is filled at all levels except probably the last one. Interesting thing about heaps are with a little set of rules on how to determine children of a node or parent of a node -- they can be easily stored in a array. There are two types of heaps max-heap, and min-heap. Any node of a max-heap is larger than its children. Thus the largest element stays on the top. A min-heap is one whose nodes are smaller than their children. A min-heap has the lowest value at the top. In all discussions here, I will assume heap implies max heap.

So, with the knowledge that (a) max heap has larger value to parent nodes, (b) it's a nearly complete binary tree. We can visualize a tree, that look like the image above.

Heap Representation: If we index nodes from top to bottom, from left to right, we would come with numbers (the blue digits in the image) that can, perhaps, work as index in an array. The only problem is how one would know what are children of a node; or what are the parents of nodes?

Look closely, you'd see the following relationship:
  1. Left child of node with index i, has index 2*i 
  2. Right child of node with index i, has index 2*i + 1 
  3. Parent of any node with index i, has index floor(i/2)
Well, now we can just forget the tree and all the baggage (pointer maintenance, recursive traversal, and extra space) associated with a tree. We'd use array to represent heap, with the following ground rules
Assume heap is repsented by array A[1..n]
LeftChild(i) = 2*i
RightChild(i) = 2*i + 1
Parent(i) = floor(i/2)
Creating a Heap: Building heap is two step process, append the new element to the array; and then check if all of it's parents (parents, grandparents, great-grandparents,...) satisfy the heap property. If not, fix it. Here is the process
  1. Append the element to the array
  2. Check if it's parent satisfy the heap-property (that each of it's children is smaller than the parent), if not swap them. Now check, if the newly swapped location satify the heap-property. Continue until you either get a parent that is OK with it or reach to top. This process is called Heap-Increase-Key or Bubble Up or Percolate Up.
Here is basic algorithm for a max-heap

  if i <= 1
    return NIL
  return floor(i/2)
  if 2*i > heapSize
    return NIL
  return 2*i
  if 2*i + 1 > heapSize
    return NIL
  return 2*i + 1

  parent = parent(i)
  while parent != NIL and A[parent] < A[i]
    swap(parent, i)
    i = parent
    parent = parent(i)
  if heapSize >= N
    return error("Not enough space")
  heapSize = heapSize + 1
  A[heapSize] = key
Complexity: parent, left, right are simple O(1) operations. Bubble up is a order O(lg(n)) operation. Because, in worst case, we have T(n) = T(floor(n/2)) + O(1) while n >= 2. O(1) for swap operation.

Heapify an Array: Heapify means toss and turn the array to make sure all the nodes statisfy heap-property. If you have an array, A[1..N], we need to make sure all the parents have children that are smaller than it. We should start from heapifying the last parent to root. So, who is the last parent? It's the parent of last child -- A[floor(N/2)]. We will visit all parents starting from floor(N/2) to 1, and make sure they satisfy heap property. The process is similar to bubbleUp, except here we are ajusting parents fixing the tree under it. This process of moving node to a lower position is called, to Heapify or to Bubble Down or to Percolate Down.

bubbleDown(A, i)
  maxIndex = i
  left = left(i)
  right = right(i)
  if left != NIL and A[left] > A[maxIndex]
    maxIndex = left
  if right != NIL and A[right] > A[maxIndex]
    maxIndex = right
  if maxIndex != i
    swap(i, maxIndex)
    bubbleDown(A, maxIndex)
heapify(A, N)
  heapSize = N
  for i = floor(N/2) to 1
    bubbleDown(A, i)

Complexity: Bubble down is similar to bubble up. In worst case, when  you will always hit the recursion block till leaf nodes. Also, you will have exactly 2*n/3 elements in worst case, where bottom level of the tree is exactly half occupied. So, we have T(n) <= T(2*n/3). This leads to order O(lg(n)).

Heap Sort: Now that we know how to convert an array into a heap, sorting is very simple.
  1. Heapify the array, making it max-heap. This will make sure largest element at index 1.
  2. Swap the first element with the element at heapSize,  decrease the heapSize by 1. Effectively, sending the top element at right location and outside the heap.
  3. Fix the heap by bubbling down the element swapped at 1.
  4. Repeat 2 and 3 until only one element remains.
  heapify(A, A.length)
  for i = A.length to 2
    swap(1, i)
    heapSize = heapSize - 1
    bubbleDown(A, 1)
Complexity: Heapsort is n*log(n). You see, heapify is O(log(n)) and so is bubbleDown. We are doing heapify once and bubbleDown (n-1) times. This makes total complexity of this agorithm O(n*log(n)).

[1] Introduction to Algorithms - CLRS, chapter 6 - Heap Sort
[2] The Algorithm Design Manual - Skiena, chapter 4.3 - Heap
[3] Download Java code from gitHub

1 comment: