data structure
Lookup the complexity here.
linear
- array
- linked list
- n integers in range [1, n] can be interpreted as linked list
- skip list
- In linked list, search will take O(n)
- Skip list choose to build indexes list that skip nodes in the list.
- When doing the search, we will search the most sparse list and proceed to the denser list.
- We improve by effectively reducing search size n.
- The layer of indexes is randomly decided when inserting a node.
- disjoint-set / union-find / merge-find
- flatten the tree by setting intermediate node’ parent to actual parent during find
- Floyd’s Algorithm: find a loop in linked list
- hare is two times faster than tortoise
- 2 * (F + a) = F + a + n * C
- F + a = n * C
- After they meet again:
- (F + a) + (C - a) = F
- hare is two times faster than tortoise
stack / queue
- stack
- LIFO structure
- queue
- FIFO structure
- priority queue / stack
hash table
tree
- binary search tree
- if the data is sorted at insertion, a binary tree might be heavily imbalanced.
- b / b + tree
- basically the same as binary tree except that each node can have multiple keys and child nodes
- maintain continuous data structure inside one node helps performance
- red-black tree
- node being either red or black
- root is always black
- no adjacent red node
- every path from a node to any its descendant null node has the same number of black nodes
- splay tree
- avl
- kd tree
- segment tree
- each node keep track of a interval and the result on this interval
O(n)
to create such tree
- when searching for a interval, we will:
- compare the interval with both child branch
- follow the branch when it partially matches
- each node keep track of a interval and the result on this interval
- zkw segment tree
- trie
- save storage space by sharing prefix
- huffman table
- save storage space by entropy
- asymmetric number system
- heap property, priority queue
- a tree with parent node larger or equal to child node (max heap)
- easier to maintain when only relative order is required
- insert at child node and push up
- remove at root node, place leaf node at root and push down
- std::make_heap, push_heap, pop_heap, sort_heap, is_heap
subarray sum
- keep the array and calculate each time
O(1)
update,O(n)
query
- save cumulative sum
- we can find subarray sum by subtracting two cumulative sum
O(n)
update,O(1)
query
- segment tree
O(logn)
for both update and query
algorithm
basic ideas
- divide and conquer
- prove that recursion work at step 0 and step n + 1
- greedy
- prove optimal by proving that any other choice is equivalent or worse
- dynamic programming
- design a objective that is recurring
- solve the result by gradually solving the objective
- sliding window: maintain invariant between range
- usually done with two pointers
- useful for finding stuff in range
sort
- merged sort -> divide and conquer
- quick sort
- choose a pivot and partition based on pivot -> median of three
- recursively apply the steps to smaller sub-array
- need O(log(n)) due to recursive call -> tail call
- heap sort
- partition into a heap area and sorted area.
- heap property is maintained and root is moved to sorted area.
- insertion sort
- sort by inserting new value into a proper position in the sorted part.
- bucket sort
- sort by putting values into buckets
- sort bucket and read
graph theory
- adjacency matrix
- bfs / dfs
- shortest distance
- Dijkstra’s
- take the vertex with minimal distance and update its neighbors
- normal implementation $ O(V^2) $
- with adjacency list $ O(E \log{V}) $
- Bellman-Ford: works for graph with negative cycle $ O(\abs{V} \abs(E)) $
- repeat distance update for every edges for a total of V - 1 times i.e. let the weight propagate to all vertex
- check negative weight cycle by update once more, see if any path can be improved
- Floyd / SPFA
- Dijkstra’s
- minimal spanning tree
- a sub-graph that connects all the vertices together
- Kruskal -> pick V - 1 edges with minimal weights that does not form a cycle
- Prim
- basically Dijkstra’s method and keep tree structure
- union-find: group nodes given some criteria
- find: find node’s parent
- often involves flatten the tree by setting intermediate’s parent directly to root
- union: combine two tree with different parent
- find: find node’s parent
max flow
- model the problem as a graph with one source and one sink with the wanted value being the number of connection to src or sink
- construct the graph under problem constraint
- the value is thus known to be max
- min-cut / max-flow
string matching
- regex
- KMP / Boyer-Moore
- Rabin-Karp
side note
- sign bit can be used as a free marker
b tree
Properties of a B-tree of order m:
- Every node has at most m children
- Every node except the root node contain at least m/2 children.
- Root node contains at least two nodes if it is not a leaf node.
- A non-leaf node with k children contains k - 1 keys
- All leaves are at the same level.
(Note that the leaf node definition is different from Knuth’s definition. But follows other trees’ definition.)
At insertion, if we run out of space at the child node, we will split the child node and add the separator to parent. (possibly recursively)
At deletion, we might need to rotate element from siblings. Or merged with a sibling.
b+ tree
For B + tree, the separator is the first value in the right child node. Which leads to the following advantage:
- Because separator is presented in the children, only leaf nodes need to hold the data corresponding to the separator.
- Thus, more keys can be fit on a page of memory. Helps with search speed.
- The leaf node can be linked together to help with range query.
- ie tree branches are interconnected
tail call optimization
Normally, at function end, the stack frame needs to be popped. But if we have a function call at the end of stack, we can modified the frame directly for that function. Thus, reducing the amount of stack used.
tree stored in array
left = 2 * i + 1;
right = 2 * i + 2;
parent = (i - 1) / 2; -> use integer division
bit count
- Brian Kernighan’s algorithm
- if we loop on
n & (n - 1)
, we unset the right most bit in each iteration
- if we loop on
todo
Topological sorting Morris Manacher / suffix array Shell Sort token Bucket