Data structures: Tree special

A special section on trees/heaps/graphs because these are the data structures I have used the least professionally. Find the introduction on data structures here.

From graphs..

Graphs can be implemented in three ways says Google/get that job at Google. These are objects and pointers, adjacency list and adjacency matrix, very nicely explained here. In this Leetcode question adjacency list is used backed up by a 2D array but it could have been a Hashmap<Vertex, List<Vertex>, a List<List<Integer>> (ok same as a 2D array) etc. Note that adjacency list as is is not appropriate for weighted graphs.

Method Suitable for
Pointers easy to recurse, (un)weighted
Matrix dense graphs, (un)directed, (un)weighted
List sparse graphs, unweighted

For reference, there are other ways too, e.g. edge list (it is what the name suggests: a list of edges, the 3rd way listed here, incidence matrix? or this pointer-hell but you are not likely to encounter those.

Analysis

Analysis is done based on edges and vertices (instead of just n).

Source

Operation Add/delete edge Are i-j connected Find path i-j Find successors of i
Matrix O(1) O(1) O(n^2) O(n)
List of vertices to vertices logn logn O(V+E) for DFS O(k): k being the list of successors

Pointers: easy to recurse, can support weights. Space analysis: for sparse graphs

TODO complete with Nutshell and wiki

..to trees

Trees are special cases of graphs: an undirected graph that is connected but has no cycles 2. Terminology on trees is not universally agreed upon. Here I follow 2's names.

We are mostly interested in binary trees: formally, a binary tree is either empty or a root with a left and right binary tree. Less formally, each node has at most two children (a special case of k-ary tree for k=2).

  • Depth of root is 0. Height of the tree is the maximum depth of any node. A level consists of nodes on the same depth. A leaf is a node with no children (or descendants).
  • Every node has a unique parent (except the root). The path to a node from the root is unique as well.
  • Full binary tree: every node beside the leaves has 2 children.
  • Perfect: a full tree in which all leaves are at the same depth
  • Complete: every level except possibly the last is completely filled (so every node beside the leaves has 2 children=full tree??) and all nodes are as far left as possible.
  • Balanced: a tree which for every node, the height of the left and right subtrees differ at most by 1. A perfect or complete tree is balanced but not vice versa. The height of a balanced binary tree is logn (which is also the minimum height a tree can take)

Contains reference to parent or not?

..to special trees

  • Heap: a complete binary tree that fulfils the heap property: every node is >= then its children. Due to the complete-ness, it can be implemented as an array (left child if it exists is 2n and right 2n+1).
  • Binary search tree: a binary tree (not complete!) where every node is greater than or equal to the left subtree (not just child) and the less than or equal to the right subtree.
  • B-tree: a BST with more than 2 children
  • Red-black and AVL: self-balanced BSTs
    • Balancing a tree is probably off limits for an interview1 but checking for balance is a legit problem.
  • Trie: a n-ary tree that stores characters
    • Suffix tree is a compressed trie
      • Check solution of 18.8 on how to construct one.
      • Video on how to use one.

Traversals

  • BFS: uses a queue, iteratively or recursively (to print per layer)
    • Good for optimization problems (find shortest path) or you are ready to quit if you go far from the root.
    • Dijkstra is like BFS for a weighted graph.
      • In fact it can be implemented with a priority queue BFS.
  • DFS: 3 types depending by when the root/current node is processed
    • In-order: left tree - node - right tree
    • Pre-order: node - left - right, this is what people refer to usually when they say DFS.
    • Post-order: left - right - node
    • Recursive implementations are the easiest

Graphs in real Life

  • Garbage collection algorithms (mark 'n' sweep) do a graph traversal on the mark part.
  • Deadlock detection see 18.4 in 2

Just for reference

  • Dijkstra and Bellman-Ford are single source shortest path
  • Floyd-Warshall is to find shortest paths for all pairs of vertices.
  • Prim's and Kruskal's are for constructing a MST (minimum spanning tree)
    • The difference is that with a shortest path algorithm you might end up with something that is not a tree (e.g. has cycles)

Groups of problems

Easy:

  • You are given a tree or graph structure: check balance (4.1 in 1), connectivity of nodes (4.2 in 1), check BST property (4.5 in 1), sum (leetcode 112 or Ctci 4.9),
  • Construct a tree from data structure e.g. 4.3 in 1

Hard:

  • Model a problem as a tree/graph e.g. the map in 18.10 1 is an adjacency list (Vertices are strings, edges are the key-value association (with direction))

Resources

  • Cracking the Coding interview1 and the accompanying repo
  • Elements of Programming Interviews in Java2 and the repo
  • Algorithms in a Nutshell