| 1 | +# Count Occurrences |
| 2 | + |
| 3 | +Goal: Count how often a certain value appears in an array. |
| 4 | + |
| 5 | +The obvious way to do this is with a [linear search](../Linear Search/) from the beginning of the array until the end, keeping count of how often you come across the value. This is an **O(n)** algorithm. |
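As a baseline, the linear approach can be sketched in a few lines. The name `linearCount` is made up for this illustration; note that this version does not need the array to be sorted.

```swift
// Hypothetical helper for illustration: a plain O(n) scan.
func linearCount(_ key: Int, in a: [Int]) -> Int {
  var count = 0
  for value in a where value == key {
    count += 1  // found another occurrence
  }
  return count
}

print(linearCount(3, in: [ 3, 1, 3, 2, 3 ]))  // prints 3
```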
| 6 | + |
| 7 | +However, if the array is sorted you can do it much faster, in **O(log n)** time, by using a modification of [binary search](../Binary Search/). |
| 8 | + |
| 9 | +Let's say we have the following array: |
| 10 | + |
| 11 | + [ 0, 1, 1, 3, 3, 3, 3, 6, 8, 10, 11, 11 ] |
| 12 | + |
| 13 | +If we want to know how often the value `3` occurs, we can do a binary search for `3`. That could give us any of these four indices: |
| 14 | + |
| 15 | + [ 0, 1, 1, 3, 3, 3, 3, 6, 8, 10, 11, 11 ] |
| 16 | +            *  *  *  * |
| 17 | + |
| 18 | +But that still doesn't tell you how many other `3`s there are. To find those other `3`s, you'd still have to do a linear search to the left and a linear search to the right. That will be fast enough in most cases, but in the worst case -- when the array consists of nothing but `3`s -- it still takes **O(n)** time. |
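To make that worst case concrete, here is a rough sketch of the "binary search, then scan outward" idea. The function name `countByExpanding` is an assumption made for this example; it is not part of the algorithm this article ends up with.

```swift
// Hypothetical helper: one binary search, then linear scans left and right.
// Still O(n) in the worst case, e.g. when every element equals the key.
func countByExpanding(_ key: Int, in a: [Int]) -> Int {
  var low = 0
  var high = a.count
  var found: Int? = nil

  // Ordinary binary search: stop at the first match we happen to hit.
  while low < high {
    let mid = low + (high - low) / 2
    if a[mid] == key {
      found = mid
      break
    } else if a[mid] < key {
      low = mid + 1
    } else {
      high = mid
    }
  }
  guard let hit = found else { return 0 }

  // Walk outward from the hit while the neighbors still match.
  var left = hit
  while left > 0 && a[left - 1] == key { left -= 1 }
  var right = hit
  while right < a.count - 1 && a[right + 1] == key { right += 1 }
  return right - left + 1
}

print(countByExpanding(3, in: [ 0, 1, 1, 3, 3, 3, 3, 6, 8, 10, 11, 11 ]))  // prints 4
```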
| 19 | + |
| 20 | +The trick is to use two binary searches, one to find where the `3`s start (the left boundary), and one to find where they end (the right boundary). |
| 21 | + |
| 22 | +In code this looks as follows: |
| 23 | + |
| 24 | +```swift |
| 25 | +func countOccurrencesOfKey(_ key: Int, inArray a: [Int]) -> Int { |
| 26 | +  func leftBoundary() -> Int {  // index of the first element >= key |
| 27 | +    var low = 0 |
| 28 | +    var high = a.count |
| 29 | +    while low < high { |
| 30 | +      let midIndex = low + (high - low)/2 |
| 31 | +      if a[midIndex] < key { |
| 32 | +        low = midIndex + 1 |
| 33 | +      } else { |
| 34 | +        high = midIndex |
| 35 | +      } |
| 36 | +    } |
| 37 | +    return low |
| 38 | +  } |
| 39 | + |
| 40 | +  func rightBoundary() -> Int {  // index of the first element > key |
| 41 | +    var low = 0 |
| 42 | +    var high = a.count |
| 43 | +    while low < high { |
| 44 | +      let midIndex = low + (high - low)/2 |
| 45 | +      if a[midIndex] > key { |
| 46 | +        high = midIndex |
| 47 | +      } else { |
| 48 | +        low = midIndex + 1 |
| 49 | +      } |
| 50 | +    } |
| 51 | +    return low |
| 52 | +  } |
| 53 | + |
| 54 | +  return rightBoundary() - leftBoundary()  // size of the half-open range |
| 55 | +} |
| 56 | +``` |
| 57 | + |
| 58 | +Notice that the helper functions `leftBoundary()` and `rightBoundary()` are very similar to the binary search algorithm. The big difference is that they don't stop when they find the search key; they keep narrowing the range until `low` and `high` meet. |
| 59 | + |
| 60 | +To test this algorithm, copy the code to a playground and then do: |
| 61 | + |
| 62 | +```swift |
| 63 | +let a = [ 0, 1, 1, 3, 3, 3, 3, 6, 8, 10, 11, 11 ] |
| 64 | + |
| 65 | +countOccurrencesOfKey(3, inArray: a) // returns 4 |
| 66 | +``` |
| 67 | + |
| 68 | +Remember: If you use your own array, make sure it is sorted first! |
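If you are not sure whether your data is sorted, a quick check like the one below can help. This is only a sketch: `isSorted` is a name invented for this example. Keep in mind that the check itself is **O(n)** and sorting is **O(n log n)**, so this only pays off if you count values in the same array many times.

```swift
// Hypothetical helper: true when every element is <= the one after it.
func isSorted(_ a: [Int]) -> Bool {
  return zip(a, a.dropFirst()).allSatisfy { pair in pair.0 <= pair.1 }
}

let unsorted = [ 3, 11, 0, 3, 8, 1 ]

// Sort only when needed, then use the result for counting.
let ready = isSorted(unsorted) ? unsorted : unsorted.sorted()
print(ready)  // prints [0, 1, 3, 3, 8, 11]
```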
| 69 | + |
| 70 | +Let's walk through the example. The array is: |
| 71 | + |
| 72 | + [ 0, 1, 1, 3, 3, 3, 3, 6, 8, 10, 11, 11 ] |
| 73 | + |
| 74 | +To find the left boundary, we start with `low = 0` and `high = 12`. The first mid index is `6`: |
| 75 | + |
| 76 | + [ 0, 1, 1, 3, 3, 3, 3, 6, 8, 10, 11, 11 ] |
| 77 | +                     * |
| 78 | + |
| 79 | +With a regular binary search you'd be done now, but here we're not just checking whether the value `3` occurs -- we want to find where it occurs *first*. |
| 80 | + |
| 81 | +Since this algorithm follows the same principle as binary search, we now ignore the right half of the array and calculate the new mid index: |
| 82 | + |
| 83 | + [ 0, 1, 1, 3, 3, 3 | x, x, x, x, x, x ] |
| 84 | +            * |
| 85 | + |
| 86 | +Again, we've landed on a `3`, and it's the very first one. But the algorithm doesn't know that, so we split the array again: |
| 87 | + |
| 88 | + [ 0, 1, 1 | x, x, x | x, x, x, x, x, x ] |
| 89 | +      * |
| 90 | + |
| 91 | +Still not done. Split again, but this time use the right half: |
| 92 | + |
| 93 | + [ x, x | 1 | x, x, x | x, x, x, x, x, x ] |
| 94 | +          * |
| 95 | + |
| 96 | +The array cannot be split up any further, which means we've found the left boundary, at index 3. |
| 97 | + |
| 98 | +Now let's start over and try to find the right boundary. This is very similar, so I'll just show you the different steps: |
| 99 | + |
| 100 | + [ 0, 1, 1, 3, 3, 3, 3, 6, 8, 10, 11, 11 ] |
| 101 | +                     * |
| 102 | + |
| 103 | + [ x, x, x, x, x, x, x | 6, 8, 10, 11, 11 ] |
| 104 | +                               * |
| 105 | + |
| 106 | + [ x, x, x, x, x, x, x | 6, 8, | x, x, x ] |
| 107 | +                            * |
| 108 | + |
| 109 | + [ x, x, x, x, x, x, x | 6 | x | x, x, x ] |
| 110 | +                         * |
| 111 | + |
| 112 | +The right boundary is at index 7, one position past the last `3`. The difference between the two boundaries is 7 - 3 = 4, so the number `3` occurs four times in this array. |
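If you want to follow the walkthrough in a playground, a tracing variant along these lines records every mid index that gets visited. Both `tracedBoundary` and its `findLeft` flag are hypothetical names used only for this sketch; the search logic mirrors the two helper functions shown earlier.

```swift
// Hypothetical tracing helper: returns the boundary plus the visited mid indices.
func tracedBoundary(_ key: Int, in a: [Int], findLeft: Bool) -> (index: Int, mids: [Int]) {
  var low = 0
  var high = a.count
  var mids: [Int] = []
  while low < high {
    let mid = low + (high - low) / 2
    mids.append(mid)
    // Left boundary: skip elements < key. Right boundary: skip elements <= key.
    let goRight = findLeft ? (a[mid] < key) : (a[mid] <= key)
    if goRight {
      low = mid + 1
    } else {
      high = mid
    }
  }
  return (low, mids)
}

let example = [ 0, 1, 1, 3, 3, 3, 3, 6, 8, 10, 11, 11 ]
let left = tracedBoundary(3, in: example, findLeft: true)
let right = tracedBoundary(3, in: example, findLeft: false)
print(left.mids, left.index)    // prints [6, 3, 1, 2] 3
print(right.mids, right.index)  // prints [6, 9, 8, 7] 7
```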
| 113 | + |
| 114 | +Each binary search took 4 steps, so in total this algorithm took 8 steps. Not a big gain on an array of only 12 items, but the bigger the array, the bigger the advantage over a linear scan. For a sorted array with 1,000,000 items, it takes at most 2 × 20 = 40 steps to count the number of occurrences for any particular value. |
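Where do the 4 and the 20 come from? Each step cuts the remaining range at least in half, so the number of steps is bounded by how often you can halve the array size before nothing is left. The helper below, with the made-up name `maxSteps`, is a small sketch of that upper bound for a single boundary search.

```swift
// Hypothetical helper: upper bound on loop iterations for one boundary search.
// Each iteration leaves at most half of the current range (rounded down).
func maxSteps(forCount n: Int) -> Int {
  var size = n
  var steps = 0
  while size > 0 {
    size /= 2
    steps += 1
  }
  return steps
}

print(maxSteps(forCount: 12))             // prints 4
print(2 * maxSteps(forCount: 1_000_000))  // prints 40
```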
| 115 | + |
| 116 | +By the way, if the value you're looking for is not in the array, then `rightBoundary()` and `leftBoundary()` return the same value and so the difference between them is 0. |
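The not-found case is easy to try out. The sketch below is a compact variation on the same idea, not the code from this article: it folds both searches into one nested helper and works for any `Comparable` type. The names `countOccurrences(of:in:)` and `boundary` are assumptions made for this example.

```swift
// A generic variation, for illustration only.
func countOccurrences<T: Comparable>(of key: T, in a: [T]) -> Int {
  // Returns the first index whose element does not satisfy `isBefore`.
  func boundary(_ isBefore: (T) -> Bool) -> Int {
    var low = 0
    var high = a.count
    while low < high {
      let mid = low + (high - low) / 2
      if isBefore(a[mid]) {
        low = mid + 1
      } else {
        high = mid
      }
    }
    return low
  }

  let left = boundary { $0 < key }    // first element >= key
  let right = boundary { $0 <= key }  // first element > key
  return right - left
}

let values = [ 0, 1, 1, 3, 3, 3, 3, 6, 8, 10, 11, 11 ]
print(countOccurrences(of: 3, in: values))  // prints 4
print(countOccurrences(of: 4, in: values))  // prints 0
```

For the missing value `4`, both boundaries land on index 7, so the difference is 0.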
| 117 | + |
| 118 | +This is an example of how you can modify the basic binary search to solve other algorithmic problems as well. Of course, it does require that the array is sorted. |
| 119 | + |
| 120 | +*Written by Matthijs Hollemans* |