This is the first part on a quick reference/revision on data structures and algorithms. My main resources are 3 books:
- Cracking the Coding interview1
- Elements of Programming Interviews in Java (2)
- Algorithms in a Nutshell
Implementing details will be in Java. Disclaimer: my main goal is interview preparation and not a thorough intro on CS fundamentals, so expect some simplifications or omissions.
First things first: terminology
- Time complexity
- Space complexity: memory needed at any point in time. Unlike time complexity this is not the total/cumulative used by the algorithm.
- We care for auxiliary space complexity. This means to ignore input storage (otherwise we would start with O(n) and upwards 2) SO question
- Allocations (e.g. of an array of size n) count against the complexity.
- Recursion depth matters because variables stay in the stack. Specifically the space used is equal to "the deepest level of recursion" multiplied by "the size of the stack frame for each level". 2 remarks:
- The above is not true in languages with tail call optimization (like ES6) because they don't assign a new stack on tail recursion.
- In the fibonacci algorithm (
return fibonacci(i-1) + fibonaci(i-2);) the first part is evaluated completely before the second, therefore we never get deeper than n levels of recursion.
- Cases can be worst, average or best: usually we refer to average or worst case.
- Analysis can be asymptotic or amortized.
- Complexity f(n) is usually expressed in terms of big O-notation (upper bound - takes at most f(n) time/space). Other notations are lower bound Ω, tight-bound Θ (both lower and upper bound - takes at least and at most f(n) time/space) or little-o (stricter than big-O) but they won't be used in an interview setting.
- For reference, from smaller to bigger - O(1): constant, O(logn): logarithmic, O(sqrt(n)) or generally n^d where 0<d<1: less common, appears in primality check, O(n): linear, O(nlogn): quasilinear, O(n^2): quadratic, O(n^3): cubic, O(n^k): polynomial, O(2^n): exponential, O(n!): factorial
- Amortized: is the "average" cost over a number of operations because worst case can be too pessimistic, for more see amortized analysis wiki. An example in data structures is the cost of resizing an array list: since it happens every n insertions, it's amortized cost is O(1) among all insertions.
Basic data structures
Mostly from 2. The basic operations are access (knowing the index), lookup/search, insertion and deletion.
- Insertion and deletion complexities are the same (afaik).
- All of the data structures need O(n) space (duuuh), besides skip list?
|name||access||search||insert/delete||Java||custom impl., notes|
|strings||same as array||String, StringBuilder||immutable (in most?) languages|
|list||O(n)||O(n)||O(1)||LinkedList (is doubly linked!)||array or dynamic allocation|
|stack||array or list, depends on implementation||ArrayDeque or LinkedList (not Stack interface)||array or dynamic, LIFO, think of a stack of plates|
|queue||same as stack||ArrayDeque or LinkedList||FIFO, array or dynamic|
|tree||depends on array or list implementation||single or double list to keep reference to parent|
|heap||?||O(1) for max||O(logn)||PriorityQueue||pointer or 1-dimensional array representation|
|BST||?||O(h=logn) for balanced trees||O(h)||TreeSet or TreeMap|
|hash table||-||O(1)||O(1)||HashSet or HashMap|
More data types: Abstract ADT
- bag: unordered set, can contain duplicates
- set: unordered, no duplicates
For related Java classes and implementation tips see here.
How to choose an appropriate structure
- Do you have the algorithm in mind already?
- Look at the special properties of each structure e.g. use stack if you need to process elements in LIFO order from the input or a set to remove duplicates.
- Look at what operations will we do the most e.g.
- Insertions/deletions at start or middle: a list would be better than an array because an array would need to be shifted. If they are on the end they don't affect that much.
- Search/lookups? Hashmap if we need to associate it with a value, hashset otherwise
- Random access (e.g. in a map)? Array
If you need to create a custom implementation:
- Initial capacity? What happens on resizing?
- Permits nulls or duplicates?
- Should it be thread safe?
In real life
- The call stack aka program stack is really implemented as a stack: the last method to enter executes first. Heap like a linked list?