Commit d931335

Tweaks to union-find
1 parent 65f66e7 commit d931335

6 files changed

+116
-76
lines changed

README.markdown

Lines changed: 1 addition & 1 deletion
@@ -43,7 +43,7 @@ If you're new to algorithms and data structures, here are a few good ones to sta
4343
- [Select Minimum / Maximum](Select Minimum Maximum). Find the minimum/maximum value in an array.
4444
- [k-th Largest Element](Kth Largest Element/). Find the *k*th largest element in an array, such as the median.
4545
- [Selection Sampling](Selection Sampling/). Randomly choose a bunch of items from a collection.
46-
- Union-Find
46+
- [Union-Find](Union-Find/). Keeps track of disjoint sets and lets you quickly merge them.
4747

4848
### String Search
4949

Union-Find/README.markdown

Lines changed: 110 additions & 67 deletions
@@ -1,59 +1,88 @@
1-
# Union-Find data structure
1+
# Union-Find
22

3-
Union-Find data structure (also known as disjoint-set data structure) is data structure that can keep track of a set of elements partitioned into a number of disjoint (non-overlapping) subsets. It supports three basic operations:
4-
1. Find(**A**): Determine which subset an element **A** is in
5-
2. Union(**A**, **B**): Join two subsets that contain **A** and **B** into a single subset
6-
3. AddSet(**A**): Add a new subset containing just that element **A**
3+
Union-Find is a data structure that can keep track of a set of elements partitioned into a number of disjoint (non-overlapping) subsets. It is also known as a disjoint-set data structure.
74

8-
The most common application of this data structure is keeping track of the connected components of an undirected graph. It is also used for implementing efficient version of Kruskal's algorithm to find the minimum spanning tree of a graph.
5+
What do we mean by this? For example, the Union-Find data structure could be keeping track of the following sets:
6+
7+
[ a, b, f, k ]
8+
[ e ]
9+
[ g, d, c ]
10+
[ i, j ]
11+
12+
These sets are disjoint because they have no members in common.
13+
14+
Union-Find supports three basic operations:
15+
16+
1. **Find(A)**: Determine which subset an element **A** is in. For example, `find(d)` would return the subset `[ g, d, c ]`.
17+
18+
2. **Union(A, B)**: Join two subsets that contain **A** and **B** into a single subset. For example, `union(d, j)` would combine `[ g, d, c ]` and `[ i, j ]` into the larger set `[ g, d, c, i, j ]`.
19+
20+
3. **AddSet(A)**: Add a new subset containing just that element **A**. For example, `addSet(h)` would add a new set `[ h ]`.
21+
22+
The most common application of this data structure is keeping track of the connected components of an undirected [graph](../Graph/). It is also used for implementing an efficient version of Kruskal's algorithm to find the minimum spanning tree of a graph.
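As a rough illustration of the connected-components use case, the sketch below counts the components of a small graph. It is written in current Swift syntax and does not use the `UnionFind` struct from this article: the edge list, the `vertexParent` array, and the `findComponent` helper are hypothetical names chosen for the example, and the optimizations discussed later are left out for brevity.

```swift
// Hypothetical undirected graph with 6 vertices (0...5), given as an edge list.
let edges = [(0, 1), (1, 2), (3, 4)]

// Every vertex starts out as the root of its own one-node tree.
var vertexParent = Array(0..<6)

// Follow parent links until reaching a node that is its own parent (a root).
func findComponent(_ vertex: Int, in parent: [Int]) -> Int {
    var current = vertex
    while parent[current] != current {
        current = parent[current]
    }
    return current
}

// Each edge joins the two components that its endpoints belong to.
for (a, b) in edges {
    let rootA = findComponent(a, in: vertexParent)
    let rootB = findComponent(b, in: vertexParent)
    if rootA != rootB {
        vertexParent[rootA] = rootB
    }
}

// The number of roots left over is the number of connected components.
let componentCount = vertexParent.indices.filter { vertexParent[$0] == $0 }.count
print(componentCount) // 3
```

The three components here are `{0, 1, 2}`, `{3, 4}` and `{5}`.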
923

1024
## Implementation
1125

1226
Union-Find can be implemented in many ways but we'll look at the most efficient.
1327

14-
Every Union-Find data structure is just value of type `UnionFind`
15-
1628
```swift
1729
public struct UnionFind<T: Hashable> {
18-
private var index = [T:Int]()
30+
private var index = [T: Int]()
1931
private var parent = [Int]()
2032
private var size = [Int]()
2133
}
2234
```
2335

24-
Our Union-Find data structure is actually a forest where each subset represented by a [tree](../Tree/). For our purposes we only need to keep parent of each node. To do this we use array `parent` where `parent[i]` is index of parent of node with number **i**. In a that forest, the unique number of each subset is the index of value of root of that subset's tree.
36+
Our Union-Find data structure is actually a forest where each subset is represented by a [tree](../Tree/).
37+
38+
For our purposes we only need to keep track of the parent of each tree node, not the node's children. To do this we use the array `parent` so that `parent[i]` is the index of node `i`'s parent.
2539

26-
So let's look at the implementation of basic operations:
40+
Example: If `parent` looks like this,
2741

28-
### Add set
42+
    parent [ 1, 1, 1, 0, 2, 0, 6, 6, 6 ]
43+
         i   0  1  2  3  4  5  6  7  8
44+
45+
then the tree structure looks like:
46+
47+
          1              6
48+
        /   \           / \
49+
      0       2        7   8
50+
     / \     /
51+
    3   5   4
52+
53+
There are two trees in this forest, each of which corresponds to one set of elements. (Note: due to the limitations of ASCII art the trees are shown here as binary trees but that is not necessarily the case.)
54+
55+
We give each subset a unique number to identify it. That number is the index of the root node of that subset's tree. In the example, node `1` is the root of the first tree and `6` is the root of the second tree.
56+
57+
Note that the `parent[]` of a root node points to itself. So `parent[1] = 1` and `parent[6] = 6`. That's how we can tell something is a root node.
58+
59+
So in this example we have two subsets, the first with the label `1` and the second with the label `6`. The **Find** operation actually returns the set's label, not its contents.
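To make the labelling concrete, here is a small sketch that computes the label of every node in the example `parent` array by walking up to its root. The `rootLabel` helper is a hypothetical name used only for this illustration; it is not part of the `UnionFind` struct.

```swift
// The example `parent` array from the text above.
let exampleParent = [1, 1, 1, 0, 2, 0, 6, 6, 6]

// Hypothetical helper: follow parent links until reaching a node
// whose parent is itself, which is how a root is recognized.
func rootLabel(of node: Int, in parent: [Int]) -> Int {
    var current = node
    while parent[current] != current {
        current = parent[current]
    }
    return current
}

// Label of the set that each node 0...8 belongs to.
let labels = exampleParent.indices.map { rootLabel(of: $0, in: exampleParent) }
print(labels) // [1, 1, 1, 1, 1, 1, 6, 6, 6]
```

Nodes `0` through `5` all resolve to label `1`, and nodes `6` through `8` resolve to label `6`, matching the two trees shown above.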
60+
61+
## Add set
62+
63+
Let's look at the implementation of these basic operations, starting with adding a new set.
2964

3065
```swift
3166
public mutating func addSetWith(element: T) {
3267
index[element] = parent.count // 1
33-
parent.append(parent.count) //2
34-
size.append(1) // 3
68+
parent.append(parent.count) // 2
69+
size.append(1) // 3
3570
}
3671
```
3772

38-
1. We save index of new element in `index` dictionary because we need `parent` array only containing values in range 0..<parent.count.
73+
When you add a new element, this actually adds a new subset containing just that element.
3974

40-
2. Then we add that index to `parent` array. It's pointing itself because the tree that represent new set containing only one node which obviously is a root of that tree.
75+
1. We save the index of the new element in the `index` dictionary. That lets us look up the element quickly later on.
4176

42-
3. `size[i]` is a count of nodes in tree which root is node with number `i` We'll be using that in Union method.
77+
2. Then we add that index to the `parent` array to build a new tree for this set. Here, `parent[i]` is pointing to itself because the tree that represents the new set contains only one node, which of course is the root of that tree.
4378

79+
3. `size[i]` is the count of nodes in the tree whose root is at index `i`. For the new set this is 1 because it only contains the one element. We'll be using the `size` array in the Union operation.
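The three steps can be traced with a simplified stand-in for the struct. In this sketch the type name `AddSetSketch` is hypothetical, the properties are left non-private only so that their contents can be printed, and the method uses current Swift syntax rather than the exact signature shown above.

```swift
// Simplified stand-in for the article's UnionFind, reduced to adding sets.
struct AddSetSketch<T: Hashable> {
    var index = [T: Int]()
    var parent = [Int]()
    var size = [Int]()

    mutating func addSetWith(_ element: T) {
        index[element] = parent.count  // 1: remember where the element lives
        parent.append(parent.count)    // 2: the new node is its own parent, so it is a root
        size.append(1)                 // 3: the new tree has exactly one node
    }
}

var sketch = AddSetSketch<String>()
for name in ["a", "b", "c"] {
    sketch.addSetWith(name)
}

print(sketch.parent)       // [0, 1, 2]
print(sketch.size)         // [1, 1, 1]
print(sketch.index["b"]!)  // 1
```

After three insertions every node is its own root, so there are three separate one-element sets.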
4480

45-
### Find
81+
## Find
4682

47-
```swift
48-
private mutating func setByIndex(index: Int) -> Int {
49-
if parent[index] == index { // 1
50-
return index
51-
} else {
52-
parent[index] = setByIndex(parent[index]) // 2
53-
return parent[index] // 3
54-
}
55-
}
83+
Often we want to determine whether we already have a set that contains a given element. That's what the **Find** operation does. In our `UnionFind` data structure it is called `setOf()`:
5684

85+
```swift
5786
public mutating func setOf(element: T) -> Int? {
5887
if let indexOfElement = index[element] {
5988
return setByIndex(indexOfElement)
@@ -63,36 +92,66 @@ public mutating func setOf(element: T) -> Int? {
6392
}
6493
```
6594

66-
`setOf(element: T)` is a helper method to get index corresponding to `element` and if it exists we return value of actual method `setByIndex(index: Int)`
95+
This looks up the element's index in the `index` dictionary and then uses a helper method to find the set that this element belongs to:
96+
97+
```swift
98+
private mutating func setByIndex(index: Int) -> Int {
99+
if parent[index] == index { // 1
100+
return index
101+
} else {
102+
parent[index] = setByIndex(parent[index]) // 2
103+
return parent[index] // 3
104+
}
105+
}
106+
```
107+
108+
Because we're dealing with a tree structure, this is a recursive method.
67109

68-
1. First, we check if current index represent a node that is root. That means we find number that represent the set of element we search for.
110+
Recall that each set is represented by a tree and that the index of the root node serves as the number that identifies the set. We're going to find the root node of the tree that the element we're searching for belongs to, and return its index.
69111

70-
2. Otherwise we recursively call our method on parent of current node. And then we do **very important thing**: we cache index of root node, so when we call this method again it will executed faster because of cached indexes. Without that optimization method's complexity is **O(n)** but now in combination with the size optimization (I'll cover that in Union section) it is almost **O(1)**.
112+
1. First, we check if the given index represents a root node (i.e. a node whose `parent` points back to the node itself). If so, we're done.
71113

72-
3. We return our cached root as result.
114+
2. Otherwise we recursively call this method on the parent of the current node. And then we do a **very important thing**: we overwrite the parent of the current node with the index of the root node, in effect reconnecting the node directly to the root of the tree. The next time we call this method, it will execute faster because the path to the root of the tree is now much shorter. Without that optimization, this method's worst-case complexity is **O(n)**, but in combination with the size optimization (covered in the Union section) it becomes almost **O(1)** amortized over many calls.
73115

74-
Here's illustration of what I mean
116+
3. We return the index of the root node as the result.
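The effect of step 2 is easiest to see on a bare array. The sketch below applies the same logic as `setByIndex` to a hypothetical four-node chain; `findRoot` and `chain` are illustrative names, and the function takes the array as an `inout` parameter so that it can run outside the struct.

```swift
// Hypothetical chain: node 0 is the root, 1 points to 0, 2 points to 1, 3 points to 2.
var chain = [0, 0, 1, 2]

// Same idea as `setByIndex`, written as a free function over an `inout` array.
func findRoot(_ index: Int, in parent: inout [Int]) -> Int {
    if parent[index] == index {
        return index                    // 1: this node is a root
    }
    let root = findRoot(parent[index], in: &parent)
    parent[index] = root                // 2: reconnect this node directly to the root
    return root                         // 3: report the root's index
}

let chainRoot = findRoot(3, in: &chain)
print(chainRoot) // 0
print(chain)     // [0, 0, 0, 0]
```

A single call starting at node `3` flattens the whole chain, so every later lookup on these nodes reaches the root in one step.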
75117

76-
Before first call `setOf(4)`:
118+
Here's an illustration of what I mean. Let's say the tree looks like this:
77119

78120
![BeforeFind](Images/BeforeFind.png)
79121

80-
After:
122+
We call `setOf(4)`. To find the root node we have to first go to node `2` and then to node `7`. (The indexes of the elements are marked in red.)
123+
124+
During the call to `setOf(4)`, the tree is reorganized to look like this:
81125

82126
![AfterFind](Images/AfterFind.png)
83127

84-
Indexes of elements are marked in red.
128+
Now if we need to call `setOf(4)` again, we no longer have to go through node `2` to get to the root. So as you use the Union-Find data structure, it optimizes itself. Pretty cool!
85129

130+
There is also a helper method to check whether two elements are in the same set:
86131

87-
### Union
132+
```swift
133+
public mutating func inSameSet(firstElement: T, and secondElement: T) -> Bool {
134+
if let firstSet = setOf(firstElement), secondSet = setOf(secondElement) {
135+
return firstSet == secondSet
136+
} else {
137+
return false
138+
}
139+
}
140+
```
141+
142+
Since this calls `setOf()` it also optimizes the tree.
143+
144+
## Union
145+
146+
The final operation is **Union**, which combines two sets into one larger set.
88147

89148
```swift
90149
public mutating func unionSetsContaining(firstElement: T, and secondElement: T) {
91150
if let firstSet = setOf(firstElement), secondSet = setOf(secondElement) { // 1
92-
if firstSet != secondSet { // 2
151+
if firstSet != secondSet { // 2
93152
if size[firstSet] < size[secondSet] { // 3
94-
parent[firstSet] = secondSet // 4
95-
size[secondSet] += size[firstSet] // 5
153+
parent[firstSet] = secondSet // 4
154+
size[secondSet] += size[firstSet] // 5
96155
} else {
97156
parent[secondSet] = firstSet
98157
size[firstSet] += size[secondSet]
@@ -102,50 +161,34 @@ public mutating func unionSetsContaining(firstElement: T, and secondElement: T)
102161
}
103162
```
104163

105-
1. We find sets of each element.
164+
Here is how it works:
106165

107-
2. Check that sets are not equal because if they are it makes no sense to union them.
166+
1. We find the sets that each element belongs to. Remember that this gives us two integers: the indices of the root nodes in the `parent` array.
108167

109-
3. This is where our size optimization comes in. We want to keep trees as small as possible so we always attach the smaller tree to the root of the larger tree. To determine small tree we compare trees by its sizes.
168+
2. Check that the sets are not equal because if they are it makes no sense to union them.
110169

111-
4. Here we attach the smaller tree to the root of the larger tree.
112-
113-
5. We keep sizes of trees in actual states so we update size of larger tree.
170+
3. This is where the size optimization comes in. We want to keep the trees as shallow as possible so we always attach the smaller tree to the root of the larger tree. To determine which is the smaller tree we compare trees by their sizes.
114171

115-
Union with optimizations also takes almost **O(1)** time.
172+
4. Here we attach the smaller tree to the root of the larger tree.
116173

117-
As always, some illustrations for better understanding
174+
5. Update the size of the larger tree because it just had a bunch of nodes added to it.
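The steps above can be traced on plain arrays. This sketch assumes two hypothetical trees and uses illustrative names (`unionSets`, `plainRoot`, `forestParent`, `forestSize`); to keep it short, the root lookup here does not rewrite parent links the way `setOf()` does.

```swift
// Two hypothetical trees: nodes 0, 1, 2 under root 0, and nodes 3, 4 under root 3.
var forestParent = [0, 0, 0, 3, 3]
var forestSize = [3, 1, 1, 2, 1]  // only the entries at root indices are meaningful

// Plain root lookup without path compression, for brevity.
func plainRoot(of node: Int, in parent: [Int]) -> Int {
    var current = node
    while parent[current] != current {
        current = parent[current]
    }
    return current
}

func unionSets(_ first: Int, _ second: Int, parent: inout [Int], size: inout [Int]) {
    let firstSet = plainRoot(of: first, in: parent)    // 1: find both roots
    let secondSet = plainRoot(of: second, in: parent)
    guard firstSet != secondSet else { return }        // 2: already the same set
    if size[firstSet] < size[secondSet] {              // 3: which tree is smaller?
        parent[firstSet] = secondSet                   // 4: hang the smaller tree off the larger root
        size[secondSet] += size[firstSet]              // 5: the larger tree grew
    } else {
        parent[secondSet] = firstSet
        size[firstSet] += size[secondSet]
    }
}

unionSets(4, 1, parent: &forestParent, size: &forestSize)
print(forestParent)   // [0, 0, 0, 0, 3]
print(forestSize[0])  // 5
```

The two-node tree rooted at `3` is attached beneath root `0`, whose recorded size grows from 3 to 5.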
118175

119-
Before calling `unionSetsContaining(4, and: 3)`:
176+
An illustration may help to better understand this. Let's say we have these two sets, each with its own tree:
120177

121178
![BeforeUnion](Images/BeforeUnion.png)
122179

123-
After:
180+
Now we call `unionSetsContaining(4, and: 3)`. The smaller tree is attached to the larger one:
124181

125182
![AfterUnion](Images/AfterUnion.png)
126183

127-
Note that during union caching optimization was performed because of calling `setOf` in the beginning of method.
184+
Note that, because we call `setOf()` at the beginning of the method, the larger tree was also optimized in the process -- node `3` now hangs directly off the root.
128185

186+
Union with optimizations also takes almost **O(1)** time.
129187

130-
131-
There is also helper method to just check that two elements is in the same set:
132-
133-
```swift
134-
public mutating func inSameSet(firstElement: T, and secondElement: T) -> Bool {
135-
if let firstSet = setOf(firstElement), secondSet = setOf(secondElement) {
136-
return firstSet == secondSet
137-
} else {
138-
return false
139-
}
140-
}
141-
```
142-
188+
## See also
143189

144190
See the playground for more examples of how to use this handy data structure.
145191

146-
147-
## See also
148-
149-
[Union-Find at wikipedia](https://en.wikipedia.org/wiki/Disjoint-set_data_structure)
192+
[Union-Find at Wikipedia](https://en.wikipedia.org/wiki/Disjoint-set_data_structure)
150193

151194
*Written for Swift Algorithm Club by [Artur Antonov](https://github.com/goingreen)*

Union-Find/UnioinFind.playground/Contents.swift renamed to Union-Find/UnionFind.playground/Contents.swift

Lines changed: 3 additions & 3 deletions
@@ -1,12 +1,10 @@
11
//: Playground - noun: a place where people can play
22

33
public struct UnionFind<T: Hashable> {
4-
5-
private var index = [T:Int]()
4+
private var index = [T: Int]()
65
private var parent = [Int]()
76
private var size = [Int]()
87

9-
108
public mutating func addSetWith(element: T) {
119
index[element] = parent.count
1210
parent.append(parent.count)
@@ -54,6 +52,8 @@ public struct UnionFind<T: Hashable> {
5452
}
5553

5654

55+
56+
5757
var dsu = UnionFind<Int>()
5858

5959
for i in 1...10 {

Union-Find/UnionFind.swift

Lines changed: 2 additions & 5 deletions
@@ -7,14 +7,11 @@
77
union sets is almost O(1)
88
*/
99

10-
1110
public struct UnionFind<T: Hashable> {
12-
13-
private var index = [T:Int]()
11+
private var index = [T: Int]()
1412
private var parent = [Int]()
1513
private var size = [Int]()
1614

17-
1815
public mutating func addSetWith(element: T) {
1916
index[element] = parent.count
2017
parent.append(parent.count)
@@ -51,7 +48,7 @@ public struct UnionFind<T: Hashable> {
5148
}
5249
}
5350
}
54-
51+
5552
public mutating func inSameSet(firstElement: T, and secondElement: T) -> Bool {
5653
if let firstSet = setOf(firstElement), secondSet = setOf(secondElement) {
5754
return firstSet == secondSet
