Heap

Heap is a widely adopted data structure in various computing applications such as priority queues, heap sort[[1]](https://en.wikipedia.org/wiki/Heap_(data_structure)) and so on.

Inherited from general tree structure, a heap is either a max-heap or a min-heap. Normally, the heap is referred as binary heap, theoretical tree-like, which could be stored in either a static or dynamic structure such as a static array, tree nodes.

It is mostly favored to use a static array structure given that INSERTION operations of new entries always happen in a row from left-side of the tree to the right-side of the tree, which means the heap structure is nearly a complete binary tree fully filled except the leaf level. Figuratively,

New elements will be promptly appended to the end of the array and entry removal only happens at the root. And these operations will incur the heap maintenance steps to ensure its status as a max-heap or min-heap

Note: There are certain operations to take such as PARENT(i) = i/2, which is the index of node i's parent; LEFT(i) = 2i, which is the index of node i's left child; RIGHT(i) = 2i + 1, which is the index of node i's right child.

Max Heap and Min Heap

In a max heap, the key of a node is larger than or equal to the keys of its children. The largest element is stored in root. Specifically, given an array A for entry storage:

A[PARENT(i)]A[i]

Similarly, a min heap will have keys of nodes are smaller than their children. The smallest entry is stored in root. Then,

A[PARENT(i)]A[i]

In a Heap Sort algorithm, the max heap is chosen while the min heap is generally common in building a priority queue.

Note: only max heap data structure is used in further discussions.

Heap Property Maintenance

A MAX-HEAPIFY is a critical process for heap property maintenance. Given an array A and an index i, assuming LEFT(i) and RIGHT(i) are both max heap. Then, MAX-HEAPIFY is called upon if A[i] smaller than its children in order to adjust the position of A[i] in the total heap to maintain the overall max heap property.

It is worth noted that MAX-HEAPIFY operation should only be performed where a single heap property violation happens. In a top-down fashion,



MAX_HEAPIFY(A, i)
  l = left(i)
  r = right(i)
  if l ⩽ heap-size(A) and A[l] > A[i]
    largest = l
  else
    largest = i
  if r ⩽ heap-size(A) and A[r] > A[largest]
    largest = r
  if largest ≠ i
    swap A[i] and A[largest]
    MAX_HEAPIFY(A, largest)

Note: recursive calls happen because after swapping the current index i entry with left or right child, the corresponding right or left sub-tree could have a new heap property violation.

The operations before each call of MAXHEAPIFY take constant time, and the total number of calls on MAX_HEAPIFY is bounded by Ο(_h), wherein h is the height of the heap. Therefore, the time complexity of MAX_HEAPIFY operation is Ο(log(n)) for a n-entry heap.

Build a Heap

Given an unordered inputs A stored in a static array structure, build a max heap from it involves iterative calls to MAX_HEAPIFY operation. Specifically,

BUILD_MAX_HEAP(A)
  for i = length(A)/2 to 1
    MAX_HEAPIFY(A, i)

wherein all leaves of the heap are between the index length(A)/2 + 1 to length(A). The overall process is a bottom-up fashion and generate the max heap regardless of the number of heap property violations.

Algorithm Analysis

At the bottom level of the heap, there are 2h nodes with each cost none for the heapify operation; at the level above the bottom, there are 2h-1 nodes with each cost most 1 swapping for the heapify operation, and so on. Figuratively,

Then, at the level j, there are 2h-j nodes with each cost most j swappings for the heapify operation. Counting them up,

By infinite geometric series, the sum of j/2j converges to 2; thus,

Τ(n) ⩽ 2h+1 = n + 1 = Ο(n)

Obviously, the operation must access each of the inputs during heap building and a more tighter bound will be Θ(n).

Additional References

Last updated