Union-Find data structure (also known as disjoint-set data structure) is data structure that can keep track of a set of elements partitioned into a number of disjoint (non-overlapping) subsets. It supports three basic operations:
4
-
1. Find(**A**): Determine which subset an element **A** is in
5
-
2. Union(**A**, **B**): Join two subsets that contain **A** and **B** into a single subset
6
-
3. AddSet(**A**): Add a new subset containing just that element **A**
3
+
Union-Find is a data structure that can keep track of a set of elements partitioned into a number of disjoint (non-overlapping) subsets. It is also known as disjoint-set data structure.
7
4
8
-
The most common application of this data structure is keeping track of the connected components of an undirected graph. It is also used for implementing efficient version of Kruskal's algorithm to find the minimum spanning tree of a graph.
5
+
What do we mean by this? For example, the Union-Find data structure could be keeping track of the following sets:
6
+
7
+
[ a, b, f, k ]
8
+
[ e ]
9
+
[ g, d, c ]
10
+
[ i, j ]
11
+
12
+
These sets are disjoint because they have no members in common.
13
+
14
+
Union-Find supports three basic operations:
15
+
16
+
1. **Find(A)**: Determine which subset an element **A** is in. For example, `find(d)` would return the subset `[ g, d, c ]`.
17
+
18
+
2. **Union(A, B)**: Join two subsets that contain **A** and **B** into a single subset. For example, `union(d, j)` would combine `[ g, d, c ]` and `[ i, j ]` into the larger set `[ g, d, c, i, j ]`.
19
+
20
+
3. **AddSet(A)**: Add a new subset containing just that element **A**. For example, `addSet(h)` would add a new set `[ h ]`.
21
+
22
+
The most common application of this data structure is keeping track of the connected components of an undirected [graph](../Graph/). It is also used for implementing an efficient version of Kruskal's algorithm to find the minimum spanning tree of a graph.
9
23
10
24
## Implementation
11
25
12
26
Union-Find can be implemented in many ways but we'll look at the most efficient.
13
27
14
-
Every Union-Find data structure is just value of type `UnionFind`
15
-
16
28
```swift
17
29
public struct UnionFind<T: Hashable> {
18
-
private var index = [T: Int]()
30
+
private var index = [T: Int]()
19
31
private var parent = [Int]()
20
32
private var size = [Int]()
21
33
}
22
34
```
23
35
24
-
Our Union-Find data structure is actually a forest where each subset represented by a [tree](../Tree/). For our purposes we only need to keep parent of each node. To do this we use array `parent` where `parent[i]` is index of parent of node with number **i**. In a that forest, the unique number of each subset is the index of value of root of that subset's tree.
36
+
Our Union-Find data structure is actually a forest where each subset is represented by a [tree](../Tree/).
37
+
38
+
For our purposes we only need to keep track of the parent of each tree node, not the node's children. To do this we use the array `parent` so that `parent[i]` is the index of node `i`'s parent.
25
39
26
-
So let's look at the implementation of basic operations:
40
+
Example: If `parent` looks like this,
27
41
28
-
### Add set
42
+
parent [ 1, 1, 1, 0, 2, 0, 6, 6, 6 ]
43
+
i 0 1 2 3 4 5 6 7 8
44
+
45
+
then the tree structure looks like:
46
+
47
+
1 6
48
+
/ \ / \
49
+
0 2 7 8
50
+
/ \ /
51
+
3 5 4
52
+
53
+
There are two trees in this forest, each of which corresponds to one set of elements. (Note: due to the limitations of ASCII art the trees are shown here as binary trees but that is not necessarily the case.)
54
+
55
+
We give each subset a unique number to identify it. That number is the index of the root node of that subset's tree. In the example, node `1` is the root of the first tree and `6` is the root of the second tree.
56
+
57
+
Note that the `parent[]` of a root node points to itself. So `parent[1] = 1` and `parent[6] = 6`. That's how we can tell something is a root node.
58
+
59
+
So in this example we have two subsets, the first with the label `1` and the second with the label `6`. The **Find** operation actually returns the set's label, not its contents.
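To make the labelling concrete, here is a minimal sketch that walks the example `parent` array by hand. The function name `rootIndex(of:in:)` and the `members` dictionary are assumptions made for this illustration and are not part of the `UnionFind` type; the sketch uses current Swift syntax and does no optimization.

```swift
// Hypothetical helper, not part of the article's UnionFind type:
// follows parent links until it reaches a node that is its own parent.
func rootIndex(of node: Int, in parent: [Int]) -> Int {
    var current = node
    while parent[current] != current {
        current = parent[current]
    }
    return current
}

// The example array from the text above.
let parent = [1, 1, 1, 0, 2, 0, 6, 6, 6]

// Group every node index under the label (root index) of its set.
var members = [Int: [Int]]()
for node in parent.indices {
    members[rootIndex(of: node, in: parent), default: []].append(node)
}

print(rootIndex(of: 4, in: parent))  // 1, reached via 4 -> 2 -> 1
print(members[1] ?? [])              // [0, 1, 2, 3, 4, 5]
print(members[6] ?? [])              // [6, 7, 8]
```

The two dictionary keys, `1` and `6`, are exactly the labels of the two subsets.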
60
+
61
+
## Add set
62
+
63
+
Let's look at the implementation of these basic operations, starting with adding a new set.
29
64
30
65
```swift
31
66
public mutating func addSetWith(element: T) {
32
67
index[element] = parent.count  // 1
33
-
parent.append(parent.count) //2
34
-
size.append(1) // 3
68
+
parent.append(parent.count)  // 2
69
+
size.append(1) // 3
35
70
}
36
71
```
37
72
38
-
1. We save index of new element in `index` dictionary because we need `parent` array only containing values in range 0..<parent.count.
73
+
When you add a new element, this actually adds a new subset containing just that element.
39
74
40
-
2. Then we add that index to `parent` array. It's pointing itself because the tree that represent new set containing only one node which obviously is a root of that tree.
75
+
1. We save the index of the new element in the `index` dictionary. That lets us look up the element quickly later on.
41
76
42
-
3.`size[i]` is a count of nodes in tree which root is node with number `i` We'll be using that in Union method.
77
+
2. Then we add that index to the `parent` array to build a new tree for this set. Here, `parent[i]` is pointing to itself because the tree that represents the new set contains only one node, which of course is the root of that tree.
43
78
79
+
3. `size[i]` is the count of nodes in the tree whose root is at index `i`. For the new set this is 1 because it only contains the one element. We'll be using the `size` array in the Union operation.
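The three numbered steps can be watched in action with a small sketch. `SetStore` is a hypothetical stand-in for `UnionFind`, reduced to the parts that adding a set touches, with the arrays made readable from outside so their state can be printed. It is written in current Swift syntax, so the method label differs slightly from `addSetWith(element:)`.

```swift
// Hypothetical stand-in for the article's UnionFind, for illustration only.
struct SetStore<T: Hashable> {
    private(set) var index = [T: Int]()
    private(set) var parent = [Int]()
    private(set) var size = [Int]()

    mutating func addSet(with element: T) {
        index[element] = parent.count  // 1: remember where the element lives
        parent.append(parent.count)    // 2: the new node is its own parent (a root)
        size.append(1)                 // 3: its tree contains exactly one node
    }
}

var store = SetStore<String>()
for name in ["a", "b", "c"] {
    store.addSet(with: name)
}

print(store.parent)            // [0, 1, 2]
print(store.size)              // [1, 1, 1]
print(store.index["c"] ?? -1)  // 2
```

Every node is its own parent at this point, so there are three one-element sets. Note that adding the same element twice would overwrite its `index` entry and leave an unreachable node behind; a guard against duplicates is one way to avoid that.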
44
80
45
-
###Find
81
+
## Find
46
82
47
-
```swift
48
-
private mutating func setByIndex(index: Int) -> Int {
49
-
if parent[index] == index { // 1
50
-
return index
51
-
} else {
52
-
parent[index] = setByIndex(parent[index]) // 2
53
-
return parent[index] // 3
54
-
}
55
-
}
83
+
Often we want to determine whether we already have a set that contains a given element. That's what the **Find** operation does. In our `UnionFind` data structure it is called `setOf()`:
`setOf(element: T)` is a helper method to get index corresponding to `element` and if it exists we return value of actual method `setByIndex(index: Int)`
95
+
This looks up the element's index in the `index` dictionary and then uses a helper method to find the set that this element belongs to:
96
+
97
+
```swift
98
+
private mutating func setByIndex(index: Int) -> Int {
99
+
if parent[index] == index { // 1
100
+
return index
101
+
} else {
102
+
parent[index] = setByIndex(parent[index]) // 2
103
+
return parent[index] // 3
104
+
}
105
+
}
106
+
```
107
+
108
+
Because we're dealing with a tree structure, this is a recursive method.
67
109
68
-
1. First, we check if current index represent a node that is root. That means we find number that represent the setof element we search for.
110
+
Recall that each set is represented by a tree and that the index of the root node serves as the number that identifies the set. We're going to find the root node of the tree that the element we're searching for belongs to, and return its index.
69
111
70
-
2. Otherwise we recursively call our method on parent of current node. And then we do **very important thing**: we cache index of root node, so when we call this method again it will executed faster because of cached indexes. Without that optimization method's complexity is **O(n)** but now in combination with the size optimization (I'll cover that in Union section) it is almost **O(1)**.
112
+
1. First, we check if the given index represents a root node (i.e. a node whose `parent` points back to the node itself). If so, we're done.
71
113
72
-
3. We return our cached root as result.
114
+
2. Otherwise we recursively call this method on the parent of the current node. And then we do a **very important thing**: we overwrite the parent of the current node with the index of the root node, in effect reconnecting the node directly to the root of the tree. The next time we call this method, it will execute faster because the path to the root of the tree is now much shorter. Without that optimization, this method's worst-case complexity is **O(n)**, but in combination with the size optimization (covered in the Union section) it is almost **O(1)** amortized.
73
115
74
-
Here's illustration of what I mean
116
+
3. We return the index of the root node as the result.
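The effect of step 2 is easiest to see on a deliberately bad tree. The sketch below applies the same recursive idea to a plain array; the free function `findRoot` and the `chain` array are assumptions made for this illustration, written in current Swift syntax, and are not the article's `setByIndex`.

```swift
// Hypothetical free function mirroring the recursive helper described above.
func findRoot(_ node: Int, _ parent: inout [Int]) -> Int {
    // 1: a node that is its own parent is the root.
    if parent[node] == node {
        return node
    }
    // 2: find the root through the parent, then hook this node directly onto it.
    let next = parent[node]
    let root = findRoot(next, &parent)
    parent[node] = root
    // 3: return the root's index.
    return root
}

// A worst-case chain: 3 -> 2 -> 1 -> 0, where 0 is the root.
var chain = [0, 0, 1, 2]
let root = findRoot(3, &chain)

print(root)   // 0
print(chain)  // [0, 0, 0, 0]
```

After a single lookup every node on the path points straight at the root, so a second lookup of node `3` takes one step instead of three.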
75
117
76
-
Before first call `setOf(4)`:
118
+
Here's an illustration of what I mean. Let's say the tree looks like this:
77
119
78
120

79
121
80
-
After:
122
+
We call `setOf(4)`. To find the root node we have to first go to node `2` and then to node `7`. (The indexes of the elements are marked in red.)
123
+
124
+
During the call to `setOf(4)`, the tree is reorganized to look like this:
81
125
82
126

83
127
84
-
Indexes of elements are marked in red.
128
+
Now if we need to call `setOf(4)` again, we no longer have to go through node `2` to get to the root. So as you use the Union-Find data structure, it optimizes itself. Pretty cool!
85
129
130
+
There is also a helper method to check that two elements are in the same set:
@@ -102,50 +161,34 @@ public mutating func unionSetsContaining(firstElement: T, and secondElement: T)
102
161
}
103
162
```
104
163
105
-
1. We find sets of each element.
164
+
Here is how it works:
106
165
107
-
2. Check that sets are not equal because if they are it makes no sense to union them.
166
+
1. We find the sets that each element belongs to. Remember that this gives us two integers: the indices of the root nodes in the `parent` array.
108
167
109
-
3. This is where our size optimization comes in. We want to keep trees as small as possible so we always attach the smaller tree to the root of the larger tree. To determine small tree we compare trees by its sizes.
168
+
2. Check that the sets are not equal because if they are it makes no sense to union them.
110
169
111
-
4. Here we attach the smaller tree to the root of the larger tree.
112
-
113
-
5. We keep sizes of trees in actual states so we update size of larger tree.
170
+
3. This is where the size optimization comes in. We want to keep the trees as shallow as possible so we always attach the smaller tree to the root of the larger tree. To determine which is the smaller tree we compare trees by their sizes.
114
171
115
-
Union with optimizations also takes almost **O(1)** time.
172
+
4. Here we attach the smaller tree to the root of the larger tree.
116
173
117
-
As always, some illustrations for better understanding
174
+
5. Update the size of the larger tree because it just had a bunch of nodes added to it.
118
175
119
-
Before calling `unionSetsContaining(4, and: 3)`:
176
+
An illustration may help to better understand this. Let's say we have these two sets, each with its own tree:
120
177
121
178

122
179
123
-
After:
180
+
Now we call `unionSetsContaining(4, and: 3)`. The smaller tree is attached to the larger one:
124
181
125
182

126
183
127
-
Note that during union caching optimization was performed because of calling`setOf` in the beginning of method.
184
+
Note that, because we call `setOf()` in the beginning of the method, the larger tree was also optimized in the process -- node `3` now hangs directly off the root.
128
185
186
+
With both optimizations in place, Union also takes almost **O(1)** amortized time.
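Putting the pieces together, here is a compact sketch of all the operations, including a same-set check. It is not the exact code from this repository: the type name `DisjointSets`, the method names, and the duplicate guard in `addSet(with:)` are assumptions made for this example, and it uses current Swift syntax.

```swift
// Hypothetical, self-contained sketch of the operations described above.
struct DisjointSets<T: Hashable> {
    private var index = [T: Int]()
    private var parent = [Int]()
    private var size = [Int]()

    mutating func addSet(with element: T) {
        guard index[element] == nil else { return }  // assumption: ignore duplicates
        index[element] = parent.count
        parent.append(parent.count)
        size.append(1)
    }

    // Find: returns the label (root index) of the element's set, or nil if unknown.
    mutating func setOf(_ element: T) -> Int? {
        guard let i = index[element] else { return nil }
        return root(of: i)
    }

    private mutating func root(of i: Int) -> Int {
        if parent[i] == i { return i }
        let next = parent[i]
        let r = root(of: next)
        parent[i] = r  // reconnect the node directly to the root
        return r
    }

    mutating func union(_ a: T, _ b: T) {
        // 1 and 2: find both labels and stop if they are missing or equal.
        guard let rootA = setOf(a), let rootB = setOf(b), rootA != rootB else { return }
        // 3: decide which tree is smaller by node count.
        let (small, large) = size[rootA] < size[rootB] ? (rootA, rootB) : (rootB, rootA)
        parent[small] = large        // 4: attach the smaller tree to the larger root
        size[large] += size[small]   // 5: the larger tree gained those nodes
    }

    mutating func inSameSet(_ a: T, _ b: T) -> Bool {
        guard let rootA = setOf(a), let rootB = setOf(b) else { return false }
        return rootA == rootB
    }
}

// Rebuild the example sets [a, b, f, k], [e], [g, d, c], [i, j].
var sets = DisjointSets<String>()
for name in ["a", "b", "f", "k", "e", "g", "d", "c", "i", "j"] {
    sets.addSet(with: name)
}
sets.union("a", "b"); sets.union("b", "f"); sets.union("f", "k")
sets.union("g", "d"); sets.union("d", "c")
sets.union("i", "j")

print(sets.inSameSet("d", "j"))  // false
sets.union("d", "j")
print(sets.inSameSet("g", "i"))  // true
print(sets.inSameSet("e", "a"))  // false
```

The lookups are `mutating` because every Find may shorten paths in the `parent` array, which is why `sets` has to be declared with `var`.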
129
187
130
-
131
-
There is also helper method to just check that two elements is in the same set: