Shell Sort

shell sort is an in-place, not stable comparison sort to improve insertion sort; it is not to be confused with Unix shell sort which is a command line utility for sorting.

This is a generalization of insertion sort which enables distant entries to exchange with each other, compensating the bad performance of insertion sort when exchanging entries far apart. Specifically, insertion sort is adopted first to sort on a widely arranged entries, then the less widely arranged entries until the full entries in the last run.

The following steps summarize the shell sort:

  1. Pick an initial value for gap sequence h.

  2. Divide the inputs into sub-sequences, each with equal gap h.

  3. Sort the sub-sequences using insertion sort.

  4. Decrease current gpa value h by 1 and repeat the process 2, 3 and 4 until the full inputs are sorted.

Figuratively, suppose to sort a sequence: {35, 33, 42, 10, 14, 19, 27, 44} and the initial gap sequence is 4; the first run will sort sub-sequences {35, 14}, {33, 19}, {42, 27}, {10, 44} and yield {14, 35}, {19, 33}, {27, 42}, {10, 44}. In the next run if h decreases by 1, the sub-sequences will be {35, 10, 27}, {33, 14, 44}, {42, 19}

When the gap sequence number is reduced down to 1, the shell sort becomes a total insertion sort in one run.

Pseudocode

SHELL_SORT(array, gaps)
  for gap in gaps
    for i := gap to 1
      do INSERTION_SORT on entries with gap interval starting from index (gap - i + 1)

Gap Sequences

The running time of shell sort heavily depends on the gap sequences it uses. Every gap sequence that contains 1 outputs the correct results but choosing a different gap sequence would yield a totally distinct shell sort.

Two properties should be known about shell sort before choosing a proper gap sequence:

  1. any two numbers within a gap sequence should be prime with each other. (consider a bad gap sequence {1, 3, 5, 6, ...}, a 6-sorted array has common entries when performing 3-sort and render it useless).

  2. given two numbers in the gap sequence g and h, a g-sorted array after h-sorted remains h-sorted and vice versa.

Finding a proper gap sequence remains an open problem in academia and there are prevalent choices such as Knuth sequence: {1, 4, 13, 40, 121, ...}. It is worth noted that initial gap sequence number evaluates to no more than the length of the inputs and it would perform nothing sorting if not otherwise.

Formal Def:

define a incremental sequence 𝒮 against a predefined set of variables 𝒳 and locations of sequence number k, then

< 𝒮(𝒳, 1), 𝒮(𝒳, 2), ... 𝒮(𝒳, k) > would be a gap sequence wherein 𝒮(𝒳, k) ⩽ n (length of inputs), 𝒮(𝒳, 1) = 1

Knuth Sequence

Invented by a great computer scientist Donald Knuth, this sequence demonstrates a characteristic of arithmetic simplicity, specifically, 𝒮 = (3k-1)/2; it produces a sequence {1, 4, 13, 40, 121, ...} that satisfies the above shell sort properties.

Sedgewick Sequence

In 1986, [Prof. Robert Sedgewick](https://en.wikipedia.org/wiki/Robert_Sedgewick_(computer_scientist)) invented a gap sequence for shell sort {1, 8, 23, 77, 281, ...} that has a optimal computational complexity Ο(n4/3); Formally, 𝒮 = 4k + 3 ⋅ 2k-1 + 1, prefixed with 1. It is still the best choice for the gap sequence generator.

Additional Resources

  1. Analysis of Shell Sort and Related Algorithms, R. Sedgewick, Princeton U. http://www.cs.princeton.edu/~rs/shell/paperF.pdf

Last updated