Directory of Algorithms And Data Structures

By Steven Fletcher
2009/11/14
2009/03/24

This document provides a directory of what algorithms and data structures to use in specific circumstances. In most cases, the algorithms themselves aren't detailed or are only detailed enough to act as a reminder of what they are.

This document isn't complete and may not be entirely accurate. It's just a catalogue to help people figure out what algorithms and data structures they should use in different situations.

Worst Case Time Analysis

The following table supplies worst case time analysis for different operations on different data structures. N represents the number of elements in the data structure. O(A) means that the operation uses amortized constant time. Some algorithms and structures will have additional notation that is listed as they are described.

Easy Operations

Some operations, such as isEmpty() and size(), can be made O(1) for any kind of data structure. Even if they aren't naturally O(1), a simple counter can be added to the structure and updated by add-like and remove-like methods.

Algorithms and Data Structures

Common Algorithms

Graph Search Algorithms

In time complexities, V represents the number of vertices, and E represents the number of edges. A "minimum path" is a path between two points that has the minimum size of any path between those points.

Lists - Sorted

Structure Add Get Iterate Remove Search
Array (Sorted) O(N) O(1) O(N) O(N) or O(1) at end O(lg N)
Linked List (Sorted) O(1) * N/A O(N) O(1) * O(N)
Skip List O(1) * N/A O(N) O(1) * O(lg N) **

* The starred operations of Linked List are only O(1) when they are done during iteration. If you must iterate over the list to find the data to update, then the operations will actually take the amount of time required to search the structure.

** Assuming that there's lg N layers in the skip list, and the Nth layer skips 2N - 1 elements.

Add - Adds another element to the end of the structure. Or possibly to the beginning if the structure behaves in that matter. Some structures don't have "ends", so this operation doesn't apply.

Get - Gets an element anywhere in the data structure. This operation is really only intended for random access data structures. Changing an element within the data structure (i.e. "putting" an element into the structure) often takes the same amount of time when it's an available operation.

Iterate - Iterates over all the elements of the data structure.

Remove - Removes an element anywhere in the structure. Some structures have better performance when the element is removed from the end of the structure.

Search - Finds an element within the data structure.

With some creativity, many of these operations can be extended to work for multiple operations at once and run in O(N) or even less.

Array (Sorted) - A sorted array has slower inserts than an unsorted array, but it is searchable much faster. The reason insertions are O(N) rather than O(lg N) is that the elements after the insertion point must be shoved forward. Thus, inserting elements one at a time is O(N2), but sorting the array all at once could be as fast as O(N lg N). Use a sorted array when you need to be able to search quickly.

Linked List (Sorted) - A sorted linked list really isn't any faster than an unsorted linked list. The problem is that there's no random access to get at elements within the linked list. Modified linked lists may offer better performance in these situations.

Skip List - A skip list is a special kind of sorted linked list. The list actually contains multiple linked lists. The bottom layer is just a plain linked list. Each higher layer is a linked list that skips some of the elements, allowing faster searching. Lists with fewer layers may not achieve O(lg N) performance.

Lists - Unsorted

Structure Add Get Insert Iterate Remove Search
Array (Bag) O(A) O(1) O(A) O(N) O(1) O(N)
Array (Unsorted) O(A) O(1) O(A) O(N) O(N) or O(1) at end O(N)
Linked List (Unsorted) O(1) N/A O(1) ** O(N) O(1) ** O(N)

The operations are the same as those for ordered lists except that the following operation is added:

Insert - Inserts an element into the data structure at a specific position, maintaining the ordering of the other elements.

(Insertion isn't included for sorted lists because any element added must be put in the correct position. Hence, you shouldn't insert elements at specific positions.)

Array (Bag) - A bag is an array that doesn't maintain any kind of ordering. This allows it remove to operate in O(1) because it need only swap the last element with the element being removed and then decrement the size of the bag.

Array (Unsorted) - An unsorted array has fast get and put operations but requires a sequential search of the array for anything else.

Linked List (Unsorted) - A linked list contains a series of nodes, each of which links to the next node. A doubly-linked linked list links each node both to the next node and to the previous node. It is the same sort of thing but more complicated.

Modification

Push/Pop Data Structures

These data structures have 3 operations:

Structure Type Peek Pop Push
Deque FIFO/LIFO O(1) O(1) O(1)
Priority Queue Sorted O(1) O(1) O(lg N)
Queue FIFO O(1) O(1) O(1)
Random Queue Random Queue O(1) O(1) O(1)
Stack LIFO O(1) O(1) O(1)

Deque - A deque can push and pop elements onto either the end or the beginning. (It will need to have methods for both.) It can be backed by a doubly-linked linked list or by a dynamic array (see queue description for dynamic array details.) Despite its operations all being amortized constant time, it will most likely experience worse performance than a normal queue or stack.

Priority Queues - A priority queue sorts elements pushed onto it by "priority". The client can only pop the highest priority item off the queue. The times listed assume that the priority queue is backed by a binary search tree, but it could be backed by a sorted array instead.

Queue - The first element pushed on will be the first element popped off. It is possible to implement pop in amortized constant time even when the queue is backed by an array. (Store counters for both the first and last elements. Whenever an element is popped, increment the first element counter. When either counter passes the array's size, it wraps around to 0. The queue is empty when the first and last counters matched and full when pushing one more element would make the first and last counters match.)

Random Queue - The elements are pushed on in any order, and popping elements pops a random element anywhere within the queue. Random Queues should be backed by Bags because they have O(1) remove methods. Bags are acceptable because Random Queues need not be ordered in any particular way.

Stack - The last element pushed on will be the first element popped off. There is no good reason to have a stack backed by anything other than a dynamic array.

Sorting

Algorithm Worst Case Average Case Stable Quitable Notes
Bubble Sort O(N2) O(N2) No Slowest
Heap Sort O(N lg N) O(N lg N) No No
Insertion Sort O(N2) O(N2) No Good for nearly sorted arrays
Merge Sort O(N lg N) O(N lg N) Yes No Requires an array of size N
Quick Sort O(N2) O(N lg N) No No Best; bad for already sorted arrays
Radix Sort O(Nw) O(N) to O(Nlg N) No No Requires a number of "buckets"
Selection Sort O(N2) O(N2) Yes Good when exchanges are expensive
Shell Sort Varies Varies No See below

A "stable" sort maintains the order of elements with equal keys. This is nice in certain situations. For instance, if you have a list of students sorted alphabetically by name and then sort them by year of graduation, within each year of graduation, the students will still be sorted alphabetically by name.

A "quitable" sort is one that guarantees that the first so many elements are sorted correctly at any given step in the algorithm. This is useful when you only have a limited period of time during which to sort a large array.

Radix sort is O(Nw), where w is the average length of the words being sorted. A "word" is a sequence of "characters", where each character is whatever length you want it to be. The character length is the number of buckets in the radix sort. Radix sort works very well for data that is already in this form. In particular, it works well for Strings, since the String.compareTo method already does practically the same thing anyways. Any other sorting method is doing a miniature radix sort every time it compares two string anyways.

Shell Sort's performance depends upon what increment sequence is used. In general, you're better off using Quick Sort, which you can most likely find already implemented for you.

Linked lists can be sorted in essentially the same way as arrays, but the implementations are a bit more challenging. Since sorting isn't a situation in which having linked lists is helpful anyways, sorting linked lists is unusual.

Tables

These structures support "getting" an element from anywhere in the structure and "putting" an element into the structure. There's no search operation for these structures because they typically do any searching required during get. There's no way to access the elements of these structures by any kind of index (as would be done with an array) unless it is added in addition to the regular functionality.

K represents the length of the key being looked up in the trie.

Structure Get Iterate Put Remove
Bitset O(1) O(N) O(1) O(1)
BST (Balanced) O(lg N) O(N) O(lg N) O(lg N)
BST (Unbalanced) O(N) * O(N) O(N) * O(N) *
Hash Map O(1) ** O(N) O(1) ** O(1) **
Trie O(K) O(KN) O(K) O(K)

* The average case behavior of most unbalanced Binary Search Tree operations is O(lg N) (specifically 1:39 lg N) even though the worst case behavior is pretty bad.

** Hash Map operations only occur in constant time if certain criteria are satisfied. Basically, the table has to be sparse with few collisions. This requires that the hash code be chosen well and that the hash codes are used properly. It also requires that the hash map dynamically increases its number of hashes if it gets too full.

Binary Search Tree (Balanced) - A binary search tree is a tree where each node has a maximum of 2 children. The left subtree contains only keys less than or equal to the root's key, and the right subtree contains only keys greater than or equal to the roots key. BSTs have good average case behavior for everything. In a balanced tree, the left and right subtrees of any node have heights that differ by at most 1. This invariant must be maintained (using one of several different methods for this).

Binary Search Tree (Unbalanced) - This is the same as a balanced binary search tree except not balanced. The behavior is conceivably much worse because the tree might be organized in an unfortunate way.

Bitset - A bitset is an array (typically of ints) where each bit is used like a boolean. This is the fastest implementation for situations where all you really need is a boolean array, not a regular collection.

Hash Map - A hash map uses keys that map to values. Each key is converted to a code before it's stored or used.

Trie - A trie is a map that uses Strings (or byte arrays) as keys. It is similar to a deterministic finite state machine in structure. Performance depends upon the length of the keys, not the number of elements in the trie.

Weird Stuff

This is stuff that you most likely will never have a use for. It is included only for completeness.

References

  1. Section 5.3, Algorithms in Java.
  2. Program 5-13, Algorithms in Java.
  3. Programs 4.5-4.6, Algorithms in Java.
  4. http://en.wikipedia.org/wiki/A*_search_algorithm
  5. http://en.wikipedia.org/wiki/Breadth-first_search
  6. http://en.wikipedia.org/wiki/Depth-first_search
  7. http://en.wikipedia.org/wiki/Best-first_search
  8. http://en.wikipedia.org/wiki/Uniform-cost_search
  9. Program 3.10, Algorithms in Java.
  10. Programs 1.3-1.4, Algorithms in Java.

Copyright (C) 2009 Steven Fletcher. All rights reserved.