operation            ArrayList               LinkedList
add(x)               O(1) amortized          O(1) (provided a tail pointer)
get(i)               O(1) random access      O(n)
removeFirst()        O(n) shift to left      O(1)
  (= remove(0))
removeLast()         O(1) (numElements--)    O(1) (doubly linked list)
  (= remove(size()-1))
add(i,x):            O(n)                    O(n)
  1. find i          O(1)                    O(n)
  2. add             O(n)                    O(1)

--------------------------------------------------------------------------------------------------
Implementing a priority queue

1. using an unsorted array/array list
   add(x)  --> add x to the end of the array            O(1) amortized time
   peek()  --> scan the array to find the min           O(n) time
   poll()  --> find the min, then remove it             O(n) + O(n) = O(n)

2. unsorted array, maintaining the index of the min
   add(x)  --> add x to the end of the array            O(1) amortized time
               if x < min, set min = index(x)           O(1) time
   peek()  --> return array[min]                        O(1) time
   poll()  --> find the min                             O(1) time
               remove(min)                              O(n) time
               find the next min                        O(n) time

3. sorted array (descending order)
   add(x)  --> add x into the sorted order
               (run the inner loop of insertion sort)   O(n) time
   peek()  --> return array[size-1]                     O(1) time
   poll()  --> size--;                                  O(1) time

What we want, ideally, is adding and polling to take O(log n) time. Use a binary minimum heap.

A binary tree is a structure where every node has at most 2 children and there is a unique path from any node to the root.
children = the nodes underneath a node
parent = the node above
root = the topmost node
A complete binary tree is a binary tree where every row is completely filled (from left to right) except possibly the last row.
The binary minimum heap property says that for every node x in the tree, x <= its children (equivalently, x >= its parent).
A binary min heap is a complete binary tree that obeys the binary min heap property.

ex (stored in an array, written as value(index)):
3(0) 5(1) 7(2) 6(3) 8(4) 7(5) 8(6) 6(7) 10(8) 11(9) 14(10) 11(11) 8(12) 8(13) 8(14)

peek(): O(1) time
    return the root
add(x): O(h), where h = height of the tree
    put x at the end of the array (leftmost open spot in the tree)
    while x is not at the root and x < parent(x)
        swap(x, parent(x))
        x = parent(x)
poll(): O(h)
    swap the root with the last element
    delete the last element
    find the smallest of root, left(root), and right(root)   // provided left(root) and right(root) exist
    if smallest = root then stop
    else swap(root, smallest) and repeat this process with smallest as the new root

left(x) = 2x + 1
right(x) = 2x + 2
parent(x) = floor((x-1)/2)

The running times for poll() and add() are O(h). But what is h in terms of n? For a full tree with h levels,
n = 1 + 2 + 4 + 8 + ... + 2^(h-1)
Multiply by the common ratio (in this case, 2):
2n = 2 + 4 + 8 + ... + 2^(h-1) + 2^h
2n - n = n = 2^h - 1  -->  n + 1 = 2^h  -->  h = log_2(n+1)

------------------------------------------------
Heaps for sorting

heapsort(arr, n)
1. take arr and make it into a max heap using the build-heap algorithm
   while n > 1 do
2.     swap arr[0] with arr[n-1]    // put the max at the end
3.     n--                          // ignore the max from now on
4.     fix the heap (sift the new root down)

ex: 5 12 3 8 1 20 5 4
1. turn the array into a max heap:
   20 12 5 8 1 3 5 4
2. repeatedly pop the heap; the final array is sorted:
   1 3 4 5 5 8 12 20
total time O(n log n)

Sorting algorithms: heapsort, merge sort, bubble sort, selection sort, insertion sort, quicksort

bubble sort
3 5 1 8 5 4 12 20
3 1 5 5 4 8 12 20
1 3 5 4 5 8 12 20
1 3 4 5 5 8 12 20
1 3 4 5 5 8 12 20

bubble sort obeys the following invariant: after p passes, the last p numbers are in sorted order.
best case: already sorted, O(n)
worst case: reverse sorted,
(n-1) + (n-2) + (n-3) + ... + 1 = 1 + 2 + 3 + ... + (n-1) = n(n-1)/2 = O(n^2)
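The notes trace bubble sort but don't include code for it; here is a minimal Java sketch consistent with the trace and invariant above (the method name and the early-exit swapped flag are my additions, not from the notes).

void bubbleSort(int[] arr, int n) {
    for(int pass = 1; pass < n; pass++) {
        boolean swapped = false;               // early exit gives the O(n) best case
        for(int i = 0; i < n - pass; i++) {    // last (pass-1) slots are already in place
            if(arr[i] > arr[i+1]) {
                int tmp = arr[i];
                arr[i] = arr[i+1];
                arr[i+1] = tmp;
                swapped = true;
            }
        }
        if(!swapped) break;                    // already sorted
    }
}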
------------------------------------------------------
insertion sort

   sorted       |    ???
0 _________ i-1 | i _________

 5 || 12 3 8 1 20 5 4
 5 12 || 3 8 1 20 5 4
 3 5 12 || 8 1 20 5 4
 3 5 8 12 || 1 20 5 4
 1 3 5 8 12 20 || 5 4
 1 3 5 5 8 12 20 || 4
 1 3 4 5 5 8 12 20

void insertionSort(int[] arr, int n) {
    for(int i = 1; i < n; i++) {
        int key = arr[i];            // next element to insert into the sorted prefix
        int loc = i - 1;
        while(loc >= 0 && arr[loc] > key) {
            arr[loc+1] = arr[loc];   // shift larger elements to the right
            loc--;
        }
        arr[loc+1] = key;
    }
}

best case: already sorted, O(n), because the while loop never runs. Also O(n) whenever every number is a constant distance from where it belongs.
worst case: reverse sorted, O(n^2)

----------------------------------------------------
merging two sorted runs:
2 6 7 10    1 3 9 12
1 2 3 6 7 9 10 12

void merge(int[] arr, int start, int mid, int end) {
    // arr[start..mid-1] is sorted and so is arr[mid..end-1]
    int left = start;
    int right = mid;
    int i = 0;
    int[] temp = new int[end - start];
    while(left < mid && right < end) {
        if(arr[left] <= arr[right]) temp[i++] = arr[left++];
        else temp[i++] = arr[right++];
    }
    while(left < mid) temp[i++] = arr[left++];     // copy whatever remains
    while(right < end) temp[i++] = arr[right++];
    for(i = 0; i < temp.length; i++) arr[start + i] = temp[i];
}

Merge sort recursively sorts arr[start..mid-1] and arr[mid..end-1], then calls merge. There are log_2 n levels of recursion (assume n = 2^k --> k = log_2 n), and each level does O(n) total merging work:
total = n(1) + n log_2 n = O(n log n)

-----------------------------------------------------------------------
Quicksort

Partitioning: Given an array A[start..end], how can we prepare the array so that, when we are done, we only need to sort the two sides?
Goal: Using some element inside the array called the pivot, move all elements <= pivot to the left and all elements >= pivot to the right. Once this is accomplished, all we need to do is sort the left and right parts.

One possible method: keep the pivot P at arr[start] and maintain three regions,

  P |  <= pivot  |  >= pivot  |  ??? (unprocessed)
start            b            i                   end

where b (the boundary) marks the end of the <= pivot region and i is the next element to examine.

int partition(int[] arr, int start, int end) {
    int r = start + random.nextInt(end - start);   // random pivot index in [start, end)
    swap(arr, start, r);
    int pivot = arr[start];
    int boundary = start;
    for(int i = start + 1; i < end; i++) {
        if(arr[i] <= pivot) {
            boundary++;
            swap(arr, boundary, i);
        }
    }
    swap(arr, start, boundary);   // put the pivot between the two regions
    return boundary;
}

------------------------------------------------------------------
ex: does an array A contain a duplicate element?

Set<Integer> set = new HashSet<>();
for(int x : A) set.add(x);
return A.length != set.size();

or, with an early exit:

for(int x : A) {
    if(set.contains(x)) return true;
    else set.add(x);
}
return false;

ex: do two elements of A sum to 23?    A = 5 12 3 8 1 20 5 4

for(int x : A) {
    if(set.contains(23 - x)) return true;
    else set.add(x);
}
return false;

------------------------------------------------------------------
A set is a data structure that models a mathematical set, i.e. a collection where all we are interested in is whether some element is in the collection or not. It's useful in programming since it allows us to remember things we've already seen, like above.

Useful operations:
1. S.contains(x) : is x an element of S?
2. S.add(x) : S = S U {x}
3. S.remove(x) : S = S \ {x}
4. S.addAll(T) : S = S U T
5. S.removeAll(T) : S = S \ T
6. S.retainAll(T) : S = S intersect T

Implementations:
1. TreeSet: the data are stored in a tree using a natural ordering (or a Comparator ordering)
   Pros:
   1. Iterating through the keys in a TreeSet yields a sorted order of the keys.
   2. Because of the underlying structure, we can query the set about max/min keys and the predecessor/successor of a given key in O(log n) time.
   Cons:
   1. Keys must have an ordering.
   2. Most operations take O(log n) worst-case time, which is generally slower than O(1) amortized expected time.
2. HashSet: the data are stored in a hash table based on prehash and hash function mappings.
   Pros:
   1. Very fast (O(1) amortized time in expectation) insertions/deletions/searches.
   2. Can store any key imaginable (since all Objects in Java have hash codes = prehash).
   Cons:
   1. Iteration produces haphazard results.
   2. Cannot efficiently answer predecessor/successor queries.
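A small runnable sketch contrasting the two implementations (the class name SetDemo is mine; the values reuse the example array from earlier in the notes): iterating a TreeSet visits keys in sorted order, while a HashSet makes no ordering promise.

import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

public class SetDemo {
    public static void main(String[] args) {
        Set<Integer> tree = new TreeSet<>();
        Set<Integer> hash = new HashSet<>();
        for(int x : new int[]{5, 12, 3, 8, 1, 20, 5, 4}) {   // duplicate 5 is stored only once
            tree.add(x);
            hash.add(x);
        }
        System.out.println(tree);   // sorted order: [1, 3, 4, 5, 8, 12, 20]
        System.out.println(hash);   // some haphazard order
    }
}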
------------------------------------------------------------------------
Maps
A map is a data structure that allows us to associate one type of data (called the key) with another (called the value). Searches are optimized based on the key. Structurally, maps are identical to sets, but store pairs of elements (organized by the key) instead of just keys like a set.
Implementations in Java are TreeMap and HashMap, and they are subject to the same pros and cons as TreeSets and HashSets.

Important methods:
M.containsKey(k) : is k a key in M?
M.get(k) : returns the value associated with k if M.containsKey(k), and null otherwise.
    (Recall: if you want to use get() to check membership of k in the key set of M, you cannot rely on auto-unboxing, since the result can be null.)
M.put(k, v) : adds the pair (k, v) to the map.
M.remove(k) : deletes the pair associated with k in the map.
M.keySet() : returns a Set over all keys in M. Return type is Set<K>. This allows iteration over M.
M.entrySet() : returns a Set over all pairs in M. Return type is Set<Map.Entry<K,V>>.

---------------------------------------------------------
TreeSets are implemented using binary search trees.
A binary search tree is a binary tree in which every node obeys the binary search tree property: for all nodes n, if x is a node in n's left subtree, then x.key <= n.key, and if x is a node in n's right subtree, then x.key >= n.key.

ex (written level by level):
20
18 25
15 19 23 29
17 22 24 26 35
27
(17 is the right child of 15; 27 is the right child of 26)

preorder:  20 18 15 17 19 25 23 22 24 29 26 27 35
inorder:   15 17 18 19 20 22 23 24 25 26 27 29 35
postorder: 17 15 19 18 22 24 23 27 26 35 29 25 20

class TreeNode {
    int key;
    TreeNode left;
    TreeNode right;
}

public static TreeNode search(TreeNode root, int key) {
    if(root == null) return null;
    if(root.key == key) return root;
    if(key < root.key) return search(root.left, key);
    else return search(root.right, key);
}

public static TreeNode insert(TreeNode root, int target) {
    if(root == null) {
        root = new TreeNode();
        root.key = target;
        root.left = null;
        root.right = null;
    } else if(target < root.key) {
        root.left = insert(root.left, target);
    } else {
        root.right = insert(root.right, target);
    }
    return root;
}

preorder = root first
public static void printNLR(TreeNode root) {
    if(root == null) return;
    System.out.println(root.key);
    printNLR(root.left);
    printNLR(root.right);
}

inorder = look at the root in the middle
public static void printLNR(TreeNode root) {
    if(root == null) return;
    printLNR(root.left);
    System.out.println(root.key);
    printLNR(root.right);
}

postorder = look at the root at the end
public static void printLRN(TreeNode root) {
    if(root == null) return;
    printLRN(root.left);
    printLRN(root.right);
    System.out.println(root.key);
}

public static int sum(TreeNode root) {
    if(root == null) return 0;
    return root.key + sum(root.left) + sum(root.right);
}

// toString() body for a collection (produces e.g. "1 2 3 4 5 6 "):
StringBuilder sb = new StringBuilder();
for(E e : this) sb.append(e + " ");
return sb.toString();

// does the tree rooted at root contain the given value?
if(root == null) return false;
if(value == root.value) return true;
return containsValue(root.left, value) || containsValue(root.right, value);

// does the stack contain target? (restore the stack before returning)
boolean ans = false;
ArrayStack<Integer> saved = new ArrayStack<>();
while(!stack.isEmpty()) {
    if(stack.peek() == target) {
        ans = true;
        break;
    } else {
        saved.push(stack.pop());
    }
}
while(!saved.isEmpty()) stack.push(saved.pop());
return ans;

---------------------------------------------------------------------------
Hash tables

By way of introduction, suppose that I wanted to implement a set, and I knew that the keys could only be in the set {0, ..., 100}. How could I do this really efficiently?
Ans: a direct access table. A direct access table is an array T[0..n] of boolean values such that T[i] = true if i is in the set and T[i] = false if i is not.
constructor(): set all T[i] to false
add(x): T[x] = true
contains(x): return T[x]
remove(x): T[x] = false
Is this generally realistic? For small n, sure. For large n, definitely not.
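A minimal Java sketch of the direct-access-table idea for keys in {0, ..., 100}; the class name DirectAccessSet is hypothetical, not from the notes.

public class DirectAccessSet {
    private final boolean[] t = new boolean[101];   // t[i] == true iff i is in the set

    public void add(int x)         { t[x] = true;  }
    public void remove(int x)      { t[x] = false; }
    public boolean contains(int x) { return t[x];  }
}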
How can we leverage this idea? Create a smaller table. To make our keys fit into the table, compute key % table size. This is called the division method.

ex (table size 20): add(14), add(63), add(150), add(75), add(10)

 0 | Hello
 1 |
 2 |
 3 | 63
 4 |
 5 |
 6 |
 7 |
 8 |
 9 |
10 | 150 -> 10
11 |
12 |
13 |
14 | 14
15 | 75
16 |
17 |
18 |
19 |

But there is a major problem: collisions! (e.g. 150 and 10 both map to slot 10)
Resolutions: chaining, probing

public class OurClass {
    private char ch;

    OurClass(char ch) {
        this.ch = ch;
    }

    public int hashCode() {
        return (ch - 'A') + 10;
    }

    public boolean equals(Object o) {
        if(!(o instanceof OurClass)) return false;
        OurClass rhs = (OurClass) o;
        return ch == rhs.ch;
    }
}

OurClass o1 = new OurClass('A');
OurClass o2 = new OurClass('Z');

---------------------------------------------
An undirected graph G is an ordered pair (V, E), written G = (V, E), where
V = a set of vertices (nodes)
E = a set of edges (arcs), where an edge is a set of 2 vertices.

For the undirected graph:
V = {A, B, C, D, E}
E = {{A,B}, {A,C}, {B,C}, {B,D}, {B,E}, {C,E}, {D,E}}

A directed graph G is an ordered pair (V, E), where
V = a set of vertices (nodes)
E = a set of edges (arcs), where an edge is an ordered pair of vertices.

For the directed graph:
V = {A, B, C, D, E, F}
E = {(A,B), (A,D), (B,C), (C,D), (F,A), (E,F), (F,E)}

Abuse of notation: In reality, in an undirected graph edges are unordered pairs, i.e. doubleton sets, so the correct notation is as above. However, many textbooks and academic papers use the parenthesis notation for both, even though it is not technically correct. Many such examples abound (e.g. O(1), f(x,y), etc.).

Definition: We say that a vertex v is adjacent to a vertex u if there is an edge from u to v. I.e. if (u,v) is an edge, we say that v is adjacent to u.

Definition: The adjacency matrix of a graph G = (V,E) is a 2D array Adj[|V|, |V|] where Adj[i,j] = 1 if (i,j) is an edge and 0 if not.

undirected graph:
   A B C D E
A  0 1 1 0 0
B  1 0 1 1 1
C  1 1 0 0 1
D  0 1 0 0 1
E  0 1 1 1 0

directed graph:
   A B C D E F
A  0 1 0 1 0 0
B  0 0 1 0 0 0
C  0 0 0 1 0 0
D  0 0 0 0 0 0
E  0 0 0 0 0 1
F  1 0 0 0 1 0

adjacency lists:
undirected:
A = B -> C
B = A -> C -> D -> E
C = A -> B -> E
D = B -> E
E = C -> B -> D

directed:
A = B -> D
B = C
C = D
D = null
E = F
F = A -> E
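As a minimal sketch of how the directed-graph adjacency lists above could be stored in Java (a Map from each vertex to its list of neighbors; the class and variable names are mine, not from the notes):

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AdjListDemo {
    public static void main(String[] args) {
        // edges of the directed graph from the notes
        char[][] edges = {{'A','B'}, {'A','D'}, {'B','C'}, {'C','D'}, {'F','A'}, {'E','F'}, {'F','E'}};
        Map<Character, List<Character>> adj = new HashMap<>();
        for(char v = 'A'; v <= 'F'; v++) adj.put(v, new ArrayList<>());
        for(char[] e : edges) adj.get(e[0]).add(e[1]);   // for an undirected graph, also add e[0] to adj.get(e[1])
        System.out.println(adj);   // e.g. {A=[B, D], B=[C], C=[D], D=[], E=[F], F=[A, E]} (HashMap order not guaranteed)
    }
}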