
Commit b4b1dfc

Add count occurrences algorithm
1 parent bdc4b88 commit b4b1dfc


6 files changed: +234 -2 lines changed

Lines changed: 67 additions & 0 deletions
@@ -0,0 +1,67 @@
//: Playground - noun: a place where people can play

func countOccurrencesOfKey(key: Int, inArray a: [Int]) -> Int {
  func leftBoundary() -> Int {
    var low = 0
    var high = a.count
    while low < high {
      let midIndex = low + (high - low)/2
      if a[midIndex] < key {
        low = midIndex + 1
      } else {
        high = midIndex
      }
    }
    return low
  }

  func rightBoundary() -> Int {
    var low = 0
    var high = a.count
    while low < high {
      let midIndex = low + (high - low)/2
      if a[midIndex] > key {
        high = midIndex
      } else {
        low = midIndex + 1
      }
    }
    return low
  }

  return rightBoundary() - leftBoundary()
}


// Simple test

let a = [ 0, 1, 1, 3, 3, 3, 3, 6, 8, 10, 11, 11 ]
countOccurrencesOfKey(3, inArray: a)


// Test with arrays of random size and contents (see debug output)

import Foundation

func createArray() -> [Int] {
  var a = [Int]()
  for i in 0...5 {
    if i != 2 {  // don't include the number 2
      let count = Int(arc4random_uniform(UInt32(6))) + 1
      for _ in 0..<count {
        a.append(i)
      }
    }
  }
  return a.sort(<)
}

for _ in 0..<10 {
  let a = createArray()
  print(a)

  // Note: we also test -1 and 6 to check the edge cases.
  for k in -1...6 {
    print("\t\(k): \(countOccurrencesOfKey(k, inArray: a))")
  }
}
Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<playground version='5.0' target-platform='osx'>
    <timeline fileName='timeline.xctimeline'/>
</playground>
Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
<?xml version="1.0" encoding="UTF-8"?>
<Timeline
   version = "3.0">
   <TimelineItems>
   </TimelineItems>
</Timeline>
Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
/*
  Counts the number of times a value appears in an array in O(lg n) time.
  The array must be sorted from low to high.
*/
func countOccurrencesOfKey(key: Int, inArray a: [Int]) -> Int {
  func leftBoundary() -> Int {
    var low = 0
    var high = a.count
    while low < high {
      let midIndex = low + (high - low)/2
      if a[midIndex] < key {
        low = midIndex + 1
      } else {
        high = midIndex
      }
    }
    return low
  }

  func rightBoundary() -> Int {
    var low = 0
    var high = a.count
    while low < high {
      let midIndex = low + (high - low)/2
      if a[midIndex] > key {
        high = midIndex
      } else {
        low = midIndex + 1
      }
    }
    return low
  }

  return rightBoundary() - leftBoundary()
}

Count Occurrences/README.markdown

Lines changed: 120 additions & 0 deletions
@@ -0,0 +1,120 @@
# Count Occurrences

Goal: Count how often a certain value appears in an array.

The obvious way to do this is with a [linear search](../Linear Search/) from the beginning of the array until the end, keeping count of how often you come across the value. This is an **O(n)** algorithm.
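
For comparison, the linear approach described above could be sketched roughly like this. The function name `countOccurrencesLinear` and its parameter labels are made up for this illustration and are not part of the commit; the sketch is written for current Swift, so its call syntax differs from the older Swift used elsewhere on this page.

```swift
// Linear-scan baseline: looks at every element once, so it runs in O(n).
// Works on unsorted arrays too. The name is hypothetical (illustration only).
func countOccurrencesLinear(of key: Int, in a: [Int]) -> Int {
  var count = 0
  for value in a where value == key {
    count += 1
  }
  return count
}

let unsortedSample = [ 3, 11, 0, 3, 8, 1, 3, 6, 1, 3, 10, 11 ]
print(countOccurrencesLinear(of: 3, in: unsortedSample))   // prints 4
```

The upside of this version is that it needs no sorted input; the downside is that the work grows in proportion to the array size.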

However, if the array is sorted you can do it much faster, in **O(log n)** time, by using a modification of [binary search](../Binary Search/).

Let's say we have the following array:

    [ 0, 1, 1, 3, 3, 3, 3, 6, 8, 10, 11, 11 ]

If we want to know how often the value `3` occurs, we can do a binary search for `3`. That could give us any of these four indices:

    [ 0, 1, 1, 3, 3, 3, 3, 6, 8, 10, 11, 11 ]
               *  *  *  *

But that still doesn't tell you how many other `3`s there are. To find those other `3`s, you'd still have to do a linear search to the left and a linear search to the right. That will be fast enough in most cases, but in the worst case -- when the array consists of nothing but `3`s -- it still takes **O(n)** time.
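
To make the worst case concrete, here is a rough sketch of that "binary search, then scan outwards" idea. `countByExpanding` is a hypothetical helper written for this illustration in current Swift; it is not code from the commit.

```swift
// Find any index holding `key` with a plain binary search, then walk
// outwards in both directions. Hypothetical helper, for illustration only.
func countByExpanding(key: Int, in a: [Int]) -> Int {
  var low = 0
  var high = a.count
  var found: Int? = nil
  while low < high {
    let mid = low + (high - low) / 2
    if a[mid] == key {
      found = mid
      break
    } else if a[mid] < key {
      low = mid + 1
    } else {
      high = mid
    }
  }
  guard let hit = found else { return 0 }

  // These two linear walks are what make the worst case O(n).
  var left = hit
  while left > 0 && a[left - 1] == key { left -= 1 }
  var right = hit
  while right < a.count - 1 && a[right + 1] == key { right += 1 }
  return right - left + 1
}

print(countByExpanding(key: 3, in: [ 0, 1, 1, 3, 3, 3, 3, 6, 8, 10, 11, 11 ]))  // prints 4
print(countByExpanding(key: 3, in: [Int](repeating: 3, count: 1000)))           // prints 1000
```

In the second call the binary search succeeds on its first probe, but the two walks still visit nearly every element of the array.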

The trick is to use two binary searches, one to find where the `3`s start (the left boundary), and one to find where they end (the right boundary).

In code this looks as follows:

```swift
func countOccurrencesOfKey(key: Int, inArray a: [Int]) -> Int {
  func leftBoundary() -> Int {
    var low = 0
    var high = a.count
    while low < high {
      let midIndex = low + (high - low)/2
      if a[midIndex] < key {
        low = midIndex + 1
      } else {
        high = midIndex
      }
    }
    return low
  }

  func rightBoundary() -> Int {
    var low = 0
    var high = a.count
    while low < high {
      let midIndex = low + (high - low)/2
      if a[midIndex] > key {
        high = midIndex
      } else {
        low = midIndex + 1
      }
    }
    return low
  }

  return rightBoundary() - leftBoundary()
}
```

Notice that the helper functions `leftBoundary()` and `rightBoundary()` are very similar to the binary search algorithm. The big difference is that they don't stop when they find the search key, but keep going.

To test this algorithm, copy the code to a playground and then do:

```swift
let a = [ 0, 1, 1, 3, 3, 3, 3, 6, 8, 10, 11, 11 ]

countOccurrencesOfKey(3, inArray: a)  // returns 4
```

Remember: If you use your own array, make sure it is sorted first!

Let's walk through the example. The array is:

    [ 0, 1, 1, 3, 3, 3, 3, 6, 8, 10, 11, 11 ]

To find the left boundary, we start with `low = 0` and `high = 12`. The first mid index is `6`:

    [ 0, 1, 1, 3, 3, 3, 3, 6, 8, 10, 11, 11 ]
                        *

With a regular binary search you'd be done now, but here we're not just checking whether the value `3` occurs -- instead, we want to find where it occurs *first*.

Since this algorithm follows the same principle as binary search, we now ignore the right half of the array and calculate the new mid index:

    [ 0, 1, 1, 3, 3, 3 | x, x, x, x, x, x ]
               *

Again, we've landed on a `3`, and it's the very first one. But the algorithm doesn't know that, so we split the array again:

    [ 0, 1, 1 | x, x, x | x, x, x, x, x, x ]
         *

Still not done. Split again, but this time use the right half:

    [ x, x | 1 | x, x, x | x, x, x, x, x, x ]
             *

The array cannot be split up any further, which means we've found the left boundary, at index 3.
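
One way to check this walk-through is to record every mid index the left-boundary search probes. The sketch below uses the same loop as `leftBoundary()`, wrapped in a hypothetical function `tracedLeftBoundary` that is not part of the commit and is written for current Swift.

```swift
// Same loop as leftBoundary(), instrumented to record each probed index.
// `tracedLeftBoundary` is a made-up name for this illustration.
func tracedLeftBoundary(of key: Int, in a: [Int]) -> (boundary: Int, mids: [Int]) {
  var low = 0
  var high = a.count
  var mids = [Int]()
  while low < high {
    let midIndex = low + (high - low) / 2
    mids.append(midIndex)
    if a[midIndex] < key {
      low = midIndex + 1    // everything up to midIndex is too small
    } else {
      high = midIndex       // a[midIndex] >= key, so the boundary is at or before it
    }
  }
  return (low, mids)
}

let leftSample = [ 0, 1, 1, 3, 3, 3, 3, 6, 8, 10, 11, 11 ]
let leftTrace = tracedLeftBoundary(of: 3, in: leftSample)
print(leftTrace.mids)      // prints [6, 3, 1, 2]
print(leftTrace.boundary)  // prints 3
```

The recorded indices 6, 3, 1 and 2 correspond to the four asterisks in the diagrams above.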

Now let's start over and try to find the right boundary. This is very similar, so I'll just show you the different steps:

    [ 0, 1, 1, 3, 3, 3, 3, 6, 8, 10, 11, 11 ]
                        *

    [ x, x, x, x, x, x, x | 6, 8, 10, 11, 11 ]
                                  *

    [ x, x, x, x, x, x, x | 6, 8 | x, x, x ]
                               *

    [ x, x, x, x, x, x, x | 6 | x | x, x, x ]
                            *

The right boundary is at index 7. The difference between the two boundaries is 7 - 3 = 4, so the number `3` occurs four times in this array.
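
The right-boundary steps can be traced the same way. `tracedRightBoundary` is a hypothetical wrapper around the `rightBoundary()` loop, written for current Swift and not part of the commit.

```swift
// Same loop as rightBoundary(), instrumented to record each probed index.
// `tracedRightBoundary` is a made-up name for this illustration.
func tracedRightBoundary(of key: Int, in a: [Int]) -> (boundary: Int, mids: [Int]) {
  var low = 0
  var high = a.count
  var mids = [Int]()
  while low < high {
    let midIndex = low + (high - low) / 2
    mids.append(midIndex)
    if a[midIndex] > key {
      high = midIndex       // a[midIndex] is already past the run of keys
    } else {
      low = midIndex + 1    // a[midIndex] <= key, so the boundary lies further right
    }
  }
  return (low, mids)
}

let rightSample = [ 0, 1, 1, 3, 3, 3, 3, 6, 8, 10, 11, 11 ]
let rightTrace = tracedRightBoundary(of: 3, in: rightSample)
print(rightTrace.mids)      // prints [6, 9, 8, 7]
print(rightTrace.boundary)  // prints 7
```

Note that the right boundary is the index just past the last `3`, which is why subtracting the left boundary gives the count directly.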

Each binary search took 4 steps, so in total this algorithm took 8 steps. Not a big gain on an array of only 12 items, but the bigger the array, the more efficient this algorithm becomes. For a sorted array with 1,000,000 items, it only takes about 2 x 20 = 40 steps to count the number of occurrences for any particular value, because log2(1,000,000) is roughly 20.
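
The step counts can be checked by counting loop iterations. The following is only a sketch in current Swift: `countWithSteps` and its closure-based `boundary` helper are assumptions made for this illustration, equivalent in behaviour to the two boundary functions but not the commit's code.

```swift
// Counts occurrences and also reports how many loop iterations both
// boundary searches needed in total. Hypothetical helper for illustration.
func countWithSteps(key: Int, in a: [Int]) -> (count: Int, steps: Int) {
  var steps = 0

  // Returns the first index whose element satisfies `isPastBoundary`,
  // or a.count if no element does.
  func boundary(_ isPastBoundary: (Int) -> Bool) -> Int {
    var low = 0
    var high = a.count
    while low < high {
      steps += 1
      let mid = low + (high - low) / 2
      if isPastBoundary(a[mid]) {
        high = mid
      } else {
        low = mid + 1
      }
    }
    return low
  }

  let left = boundary { $0 >= key }   // same test as leftBoundary()
  let right = boundary { $0 > key }   // same test as rightBoundary()
  return (right - left, steps)
}

let smallResult = countWithSteps(key: 3, in: [ 0, 1, 1, 3, 3, 3, 3, 6, 8, 10, 11, 11 ])
print(smallResult.count, smallResult.steps)   // prints "4 8"

let bigResult = countWithSteps(key: 500_000, in: Array(0..<1_000_000))
print(bigResult.count, bigResult.steps)       // count is 1, steps is at most 40
```

For the large array the exact number of iterations depends on the key, but each search halves the remaining range, so neither needs more than 20 iterations.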

By the way, if the value you're looking for is not in the array, then `rightBoundary()` and `leftBoundary()` return the same value and so the difference between them is 0.
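
A quick way to see this is to ask for values that are smaller than, between, and larger than the array's contents. The sketch below restates the algorithm in current Swift under the assumed name `countOccurrences(of:in:)`; the labels differ from the commit's `countOccurrencesOfKey(_:inArray:)` only so that the snippet compiles with a recent toolchain.

```swift
// Restatement of the two-boundary algorithm for current Swift.
// The name and labels are assumptions made for this illustration.
func countOccurrences(of key: Int, in a: [Int]) -> Int {
  func leftBoundary() -> Int {
    var low = 0
    var high = a.count
    while low < high {
      let midIndex = low + (high - low) / 2
      if a[midIndex] < key {
        low = midIndex + 1
      } else {
        high = midIndex
      }
    }
    return low
  }

  func rightBoundary() -> Int {
    var low = 0
    var high = a.count
    while low < high {
      let midIndex = low + (high - low) / 2
      if a[midIndex] > key {
        high = midIndex
      } else {
        low = midIndex + 1
      }
    }
    return low
  }

  return rightBoundary() - leftBoundary()
}

let sortedSample = [ 0, 1, 1, 3, 3, 3, 3, 6, 8, 10, 11, 11 ]
for missingKey in [-1, 2, 12] {
  // Both boundaries land on the same index (0, 3 and 12 respectively).
  print(missingKey, countOccurrences(of: missingKey, in: sortedSample))   // count is 0 each time
}
```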

This is an example of how you can modify the basic binary search to solve other algorithmic problems as well. Of course, it does require that the array is sorted.

*Written by Matthijs Hollemans*

README.markdown

Lines changed: 2 additions & 2 deletions
@@ -28,9 +28,9 @@ If you're new to algorithms and data structures, here are a few good ones to sta
 
 ### Searching
 
-- [Linear Search](Linear Search/)
+- [Linear Search](Linear Search/). Find an element in an array.
 - [Binary Search](Binary Search/). Quickly find elements in a sorted array.
-- Count Occurrences
+- [Count Occurrences](Count Occurrences/). Count how often a value appears in an array.
 - Select Minimum / Maximum
 - Select k-th Largest Element
 - Selection Sampling
