Tweaks to union-find

hollance · hollance · commit d9313352c34b · 2016-02-13T15:41:48.000+01:00
diff --git a/README.markdown b/README.markdown
@@ -43,7 +43,7 @@ If you're new to algorithms and data structures, here are a few good ones to sta
 - [Select Minimum / Maximum](Select Minimum Maximum). Find the minimum/maximum value in an array.
 - [k-th Largest Element](Kth Largest Element/). Find the *k*th largest element in an array, such as the median.
 - [Selection Sampling](Selection Sampling/). Randomly choose a bunch of items from a collection.
-- Union-Find
+- [Union-Find](Union-Find/). Keeps track of disjoint sets and lets you quickly merge them.
 
 ### String Search
 
diff --git a/Union-Find/README.markdown b/Union-Find/README.markdown
@@ -1,59 +1,88 @@
-# Union-Find data structure
+# Union-Find
 
-Union-Find data structure (also known as disjoint-set data structure) is data structure that can keep track of a set of elements partitioned into a number of disjoint (non-overlapping) subsets. It supports three basic operations:
-  1. Find(**A**): Determine which subset an element **A** is in
-  2. Union(**A**, **B**): Join two subsets that contain **A** and **B** into a single subset
-  3. AddSet(**A**): Add a new subset containing just that element **A**
+Union-Find is a data structure that can keep track of a set of elements partitioned into a number of disjoint (non-overlapping) subsets. It is also known as a disjoint-set data structure.
 
-The most common application of this data structure is keeping track of the connected components of an undirected graph. It is also used for implementing efficient version of Kruskal's algorithm to find the minimum spanning tree of a graph.
+What do we mean by this? For example, the Union-Find data structure could be keeping track of the following sets:
+
+	[ a, b, f, k ]
+	[ e ]
+	[ g, d, c ]
+	[ i, j ]
+
+These sets are disjoint because they have no members in common. 
+
+Union-Find supports three basic operations:
+
+1. **Find(A)**: Determine which subset an element **A** is in. For example, `find(d)` would return the subset `[ g, d, c ]`.
+
+2. **Union(A, B)**: Join two subsets that contain **A** and **B** into a single subset. For example, `union(d, j)` would combine `[ g, d, c ]` and `[ i, j ]` into the larger set `[ g, d, c, i, j ]`.
+
+3. **AddSet(A)**: Add a new subset containing just the element **A**. For example, `addSet(h)` would add a new set `[ h ]`.
+
+The most common application of this data structure is keeping track of the connected components of an undirected [graph](../Graph/). It is also used for implementing an efficient version of Kruskal's algorithm to find the minimum spanning tree of a graph.
 
 ## Implementation
 
 Union-Find can be implemented in many ways but we'll look at the most efficient.
 
-Every Union-Find data structure is just value of type `UnionFind`
-
 ```swift
 public struct UnionFind<T: Hashable> {
-  private var index = [T:Int]()
+  private var index = [T: Int]()
   private var parent = [Int]()
   private var size = [Int]()
 }
 ```
 
-Our Union-Find data structure is actually a forest where each subset represented by a [tree](../Tree/). For our purposes we only need to keep parent of each node. To do this we use array `parent` where `parent[i]` is index of parent of node with number **i**. In a that forest, the unique number of each subset is the index of value of root of that subset's tree.
+Our Union-Find data structure is actually a forest where each subset is represented by a [tree](../Tree/).
+
+For our purposes we only need to keep track of the parent of each tree node, not the node's children. To do this we use the array `parent` so that `parent[i]` is the index of node `i`'s parent.
 
-So let's look at the implementation of basic operations:
+Example: If `parent` looks like this,
 
-### Add set
+	parent [ 1, 1, 1, 0, 2, 0, 6, 6, 6 ]
+	     i   0  1  2  3  4  5  6  7  8
+	
+then the tree structure looks like:
+	
+	      1              6
+	    /   \           / \
+	  0       2        7   8
+	 / \     /
+	3   5   4
+
+There are two trees in this forest, each of which corresponds to one set of elements. (Note: due to the limitations of ASCII art the trees are shown here as binary trees but that is not necessarily the case.)
+
+We give each subset a unique number to identify it. That number is the index of the root node of that subset's tree. In the example, node `1` is the root of the first tree and `6` is the root of the second tree.
+
+Note that the `parent[]` of a root node points to itself. So `parent[1] = 1` and `parent[6] = 6`. That's how we can tell something is a root node.
+
+So in this example we have two subsets, the first with the label `1` and the second with the label `6`. The **Find** operation actually returns the set's label, not its contents.
+
+## Add set
+
+Let's look at the implementation of these basic operations, starting with adding a new set.
 
 ```swift
 public mutating func addSetWith(element: T) {
   index[element] = parent.count  // 1
-  parent.append(parent.count)  //2
-  size.append(1)  // 3
+  parent.append(parent.count)    // 2
+  size.append(1)                 // 3
 }
 ```
 
-1. We save index of new element in `index` dictionary because we need `parent` array only containing values in range 0..<parent.count.
+When you add a new element, this actually adds a new subset containing just that element.
 
-2. Then we add that index to `parent` array. It's pointing itself because the tree that represent new set containing only one node which obviously is a root of that tree.
+1. We save the index of the new element in the `index` dictionary. That lets us look up the element quickly later on.
 
-3. `size[i]` is a count of nodes in tree which root is node with number `i` We'll be using that in Union method.
+2. Then we add that index to the `parent` array to build a new tree for this set. Here, `parent[i]` is pointing to itself because the tree that represents the new set contains only one node, which of course is the root of that tree.
 
+3. `size[i]` is the count of nodes in the tree whose root is at index `i`. For the new set this is 1 because it only contains the one element. We'll be using the `size` array in the Union operation.
 
-### Find
+## Find
 
-```swift
-private mutating func setByIndex(index: Int) -> Int {
-  if parent[index] == index {  // 1
-    return index
-  } else {
-    parent[index] = setByIndex(parent[index])  // 2
-    return parent[index]  // 3
-  }
-}
+Often we want to determine whether we already have a set that contains a given element. That's what the **Find** operation does. In our `UnionFind` data structure it is called `setOf()`:
 
+```swift
 public mutating func setOf(element: T) -> Int? {
   if let indexOfElement = index[element] {
     return setByIndex(indexOfElement)
@@ -63,36 +92,66 @@ public mutating func setOf(element: T) -> Int? {
 }
 ```
 
-`setOf(element: T)` is a helper method to get index corresponding to `element` and if it exists we return value of actual method `setByIndex(index: Int)`
+This looks up the element's index in the `index` dictionary and then uses a helper method to find the set that this element belongs to:
+
+```swift
+private mutating func setByIndex(index: Int) -> Int {
+  if parent[index] == index {  // 1
+    return index
+  } else {
+    parent[index] = setByIndex(parent[index])  // 2
+    return parent[index]       // 3
+  }
+}
+```
+
+Because we're dealing with a tree structure, this is a recursive method.
 
-1. First, we check if current index represent a node that is root. That means we find number that represent the set of element we search for.
+Recall that each set is represented by a tree and that the index of the root node serves as the number that identifies the set. We're going to find the root node of the tree that the element we're searching for belongs to, and return its index.
 
-2. Otherwise we recursively call our method on parent of current node. And then we do **very important thing**: we cache index of root node, so when we call this method again it will executed faster because of cached indexes. Without that optimization method's complexity is **O(n)** but now in combination with the size optimization (I'll cover that in Union section) it is almost **O(1)**.
+1. First, we check if the given index represents a root node (i.e. a node whose `parent` points back to the node itself). If so, we're done. 
 
-3. We return our cached root as result.
+2. Otherwise we recursively call this method on the parent of the current node. And then we do a **very important thing**: we overwrite the parent of the current node with the index of the root node, in effect reconnecting the node directly to the root of the tree. The next time we call this method, it will execute faster because the path to the root of the tree is now much shorter. Without any optimization this method's worst case is **O(n)**, but path compression combined with the size optimization (covered in the Union section) makes it almost **O(1)**, amortized over many calls.
 
-Here's illustration of what I mean
+3. We return the index of the root node as the result.
 
-Before first call `setOf(4)`:
+Here's an illustration of what I mean. Let's say the tree looks like this:
 
 ![BeforeFind](Images/BeforeFind.png)
 
-After:
+We call `setOf(4)`. To find the root node we have to first go to node `2` and then to node `7`. (The indexes of the elements are marked in red.)
+
+During the call to `setOf(4)`, the tree is reorganized to look like this:
 
 ![AfterFind](Images/AfterFind.png)
 
-Indexes of elements are marked in red.
+Now if we need to call `setOf(4)` again, we no longer have to go through node `2` to get to the root. So as you use the Union-Find data structure, it optimizes itself. Pretty cool!
 
+There is also a helper method to check that two elements are in the same set:
 
-### Union
+```swift
+public mutating func inSameSet(firstElement: T, and secondElement: T) -> Bool {
+  if let firstSet = setOf(firstElement), secondSet = setOf(secondElement) {
+    return firstSet == secondSet
+  } else {
+    return false
+  }
+}
+```
+
+Since this calls `setOf()` it also optimizes the tree.
+
+## Union
+
+The final operation is **Union**, which combines two sets into one larger set.
 
 ```swift
 public mutating func unionSetsContaining(firstElement: T, and secondElement: T) {
   if let firstSet = setOf(firstElement), secondSet = setOf(secondElement) {  // 1
-    if firstSet != secondSet {  // 2
+    if firstSet != secondSet {               // 2
       if size[firstSet] < size[secondSet] {  // 3
-        parent[firstSet] = secondSet  // 4
-        size[secondSet] += size[firstSet]  // 5
+        parent[firstSet] = secondSet         // 4
+        size[secondSet] += size[firstSet]    // 5
       } else {
         parent[secondSet] = firstSet
         size[firstSet] += size[secondSet]
@@ -102,50 +161,34 @@ public mutating func unionSetsContaining(firstElement: T, and secondElement: T)
 }
 ```
 
-1. We find sets of each element.
+Here is how it works:
 
-2. Check that sets are not equal because if they are it makes no sense to union them.
+1. We find the sets that each element belongs to. Remember that this gives us two integers: the indices of the root nodes in the `parent` array.
 
-3. This is where our size optimization comes in. We want to keep trees as small as possible so we always attach the smaller tree to the root of the larger tree. To determine small tree we compare trees by its sizes.
+2. Check that the sets are not equal because if they are it makes no sense to union them.
 
-4. Here we attach the smaller tree to the root of the larger tree.
-
-5. We keep sizes of trees in actual states so we update size of larger tree.
+3. This is where the size optimization comes in. We want to keep the trees as shallow as possible so we always attach the smaller tree to the root of the larger tree. To determine which is the smaller tree we compare trees by their sizes.
 
-Union with optimizations also takes almost **O(1)** time.
+4. Here we attach the smaller tree to the root of the larger tree.
 
-As always, some illustrations for better understanding
+5. Update the size of the larger tree because it just had a bunch of nodes added to it.
 
-Before calling `unionSetsContaining(4, and: 3)`:
+An illustration may help to better understand this. Let's say we have these two sets, each with its own tree:
 
 ![BeforeUnion](Images/BeforeUnion.png)
 
-After:
+Now we call `unionSetsContaining(4, and: 3)`. The smaller tree is attached to the larger one:
 
 ![AfterUnion](Images/AfterUnion.png)
 
-Note that during union caching optimization was performed because of calling `setOf` in the beginning of method.
+Note that, because we call `setOf()` in the beginning of the method, the larger tree was also optimized in the process -- node `3` now hangs directly off the root.
 
+With both optimizations in place, Union also takes almost **O(1)** time.
 
-
-There is also helper method to just check that two elements is in the same set:
-
-```swift
-public mutating func inSameSet(firstElement: T, and secondElement: T) -> Bool {
-  if let firstSet = setOf(firstElement), secondSet = setOf(secondElement) {
-    return firstSet == secondSet
-  } else {
-    return false
-  }
-}
-```
-
+## See also
 
 See the playground for more examples of how to use this handy data structure.
 
-
-## See also
-
-[Union-Find at wikipedia](https://en.wikipedia.org/wiki/Disjoint-set_data_structure)
+[Union-Find at Wikipedia](https://en.wikipedia.org/wiki/Disjoint-set_data_structure)
 
 *Written for Swift Algorithm Club by [Artur Antonov](https://github.com/goingreen)*
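
The Swift in this diff predates Swift 3 (for example `if let a = x, b = y` and unlabeled first parameters), so it will not compile unchanged on a current toolchain. What follows is a minimal sketch of the same three operations in current Swift syntax, not the committed code. The underscore argument labels and the duplicate-element guard in `addSetWith` are assumptions added for illustration; the storage layout, path compression, and union by size follow the README.

```swift
// Sketch of the README's UnionFind in current Swift syntax.
// Assumptions (not in the commit): `_` argument labels, the public init,
// and the guard that ignores an element that was already added.
public struct UnionFind<T: Hashable> {
  private var index = [T: Int]()  // element -> its position in `parent`
  private var parent = [Int]()    // parent[i] == i marks a root node
  private var size = [Int]()      // size[root] == number of nodes in that tree

  public init() {}

  // AddSet: create a one-node tree whose root is the new element.
  public mutating func addSetWith(_ element: T) {
    guard index[element] == nil else { return }  // assumption: skip duplicates
    index[element] = parent.count
    parent.append(parent.count)
    size.append(1)
  }

  // Walk up to the root, then point this node straight at it
  // (path compression).
  private mutating func setByIndex(_ i: Int) -> Int {
    if parent[i] == i { return i }
    parent[i] = setByIndex(parent[i])
    return parent[i]
  }

  // Find: returns the set's label (index of the root), or nil if unknown.
  public mutating func setOf(_ element: T) -> Int? {
    guard let i = index[element] else { return nil }
    return setByIndex(i)
  }

  // Union: attach the smaller tree to the root of the larger one.
  public mutating func unionSetsContaining(_ first: T, and second: T) {
    guard let a = setOf(first), let b = setOf(second), a != b else { return }
    if size[a] < size[b] {
      parent[a] = b
      size[b] += size[a]
    } else {
      parent[b] = a
      size[a] += size[b]
    }
  }

  public mutating func inSameSet(_ first: T, and second: T) -> Bool {
    guard let a = setOf(first), let b = setOf(second) else { return false }
    return a == b
  }
}

var dsu = UnionFind<String>()
for name in ["a", "b", "c", "d"] { dsu.addSetWith(name) }
dsu.unionSetsContaining("a", and: "b")
dsu.unionSetsContaining("c", and: "d")
print(dsu.inSameSet("a", and: "b"))  // true
print(dsu.inSameSet("a", and: "c"))  // false
```

Every query method is `mutating` because even a lookup may rewrite `parent` during path compression, which is why the value has to be declared with `var`.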
diff --git a/Union-Find/UnionFind.playground/Contents.swift b/Union-Find/UnionFind.playground/Contents.swift
@@ -1,12 +1,10 @@
 //: Playground - noun: a place where people can play
 
 public struct UnionFind<T: Hashable> {
-  
-  private var index = [T:Int]()
+  private var index = [T: Int]()
   private var parent = [Int]()
   private var size = [Int]()
   
-  
   public mutating func addSetWith(element: T) {
     index[element] = parent.count
     parent.append(parent.count)
@@ -54,6 +52,8 @@ public struct UnionFind<T: Hashable> {
 }
 
 
+
+
 var dsu = UnionFind<Int>()
 
 for i in 1...10 {
diff --git a/Union-Find/UnionFind.playground/contents.xcplayground b/Union-Find/UnionFind.playground/contents.xcplayground
diff --git a/Union-Find/UnionFind.playground/timeline.xctimeline b/Union-Find/UnionFind.playground/timeline.xctimeline
diff --git a/Union-Find/UnionFind.swift b/Union-Find/UnionFind.swift
@@ -7,14 +7,11 @@
     union sets is almost O(1)
 */
 
-
 public struct UnionFind<T: Hashable> {
-  
-  private var index = [T:Int]()
+  private var index = [T: Int]()
   private var parent = [Int]()
   private var size = [Int]()
   
-  
   public mutating func addSetWith(element: T) {
     index[element] = parent.count
     parent.append(parent.count)
@@ -51,7 +48,7 @@ public struct UnionFind<T: Hashable> {
       }
     }
   }
-  
+
   public mutating func inSameSet(firstElement: T, and secondElement: T) -> Bool {
     if let firstSet = setOf(firstElement), secondSet = setOf(secondElement) {
       return firstSet == secondSet
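
To make the `parent` array example from the README concrete, here is a small sketch that runs Find directly on that array. `findRoot` is a hypothetical free function written for this illustration; it mirrors the logic of `setByIndex` but is not part of the commit.

```swift
// The `parent` array from the README example; the roots are 1 and 6.
var parent = [1, 1, 1, 0, 2, 0, 6, 6, 6]

// Hypothetical helper: follow parent links up to the root and compress
// the path on the way back, as `setByIndex` does in the README.
func findRoot(_ i: Int, in parent: inout [Int]) -> Int {
  let p = parent[i]
  if p == i { return i }                 // a root points to itself
  let root = findRoot(p, in: &parent)
  parent[i] = root                       // reconnect directly to the root
  return root
}

print(findRoot(3, in: &parent))  // 1, reached via 3 -> 0 -> 1
print(parent[3])                 // 1, node 3 now hangs directly off the root
print(findRoot(8, in: &parent))  // 6

// Counting distinct roots gives the number of trees (sets) in the forest.
var roots = Set<Int>()
for i in parent.indices { roots.insert(findRoot(i, in: &parent)) }
print(roots.count)               // 2
```

After the loop every node points directly at its root, which is the self-optimizing behavior the Find section describes.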