Commit b580a4a

Merge pull request kodecocodes#263 from billbarbour/master
Rabin-Karp string search algorithm
2 parents c20d822 + 663f902 commit b580a4a

6 files changed

+301
-1
lines changed


README.markdown

Lines changed: 1 addition & 1 deletion
@@ -57,7 +57,7 @@ If you're new to algorithms and data structures, here are a few good ones to sta
 - [Brute-Force String Search](Brute-Force String Search/). A naive method.
 - [Boyer-Moore](Boyer-Moore/). A fast method to search for substrings. It skips ahead based on a look-up table, to avoid looking at every character in the text.
 - Knuth-Morris-Pratt
-- Rabin-Karp
+- [Rabin-Karp](Rabin-Karp/) Faster search by using hashing.
 - [Longest Common Subsequence](Longest Common Subsequence/). Find the longest sequence of characters that appear in the same order in both strings.
 - [Z-Algorithm](Z-Algorithm/). Finds all instances of a pattern in a String, and returns the indexes of where the pattern starts within the String.

Rabin-Karp/README.markdown

Lines changed: 79 additions & 0 deletions
@@ -0,0 +1,79 @@
# Rabin-Karp string search algorithm

The Rabin-Karp string search algorithm is used to search text for a pattern.

A practical application of the algorithm is detecting plagiarism. Given source material, the algorithm can rapidly search through a paper for instances of sentences from the source material, ignoring details such as case and punctuation. Because of the abundance of the sought strings, single-string searching algorithms are impractical.

## Example

Given the text "The big dog jumped over the fox" and the search pattern "ump", the search returns 13.
It starts by hashing "ump", then hashing "The". If the hashes don't match, it slides the window one character
at a time (e.g. to "he ") and updates the hash by removing the contribution of the dropped "T" and adding the new character.
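
The single slide from "The" to "he " can be checked by hand. The snippet below is a minimal sketch, not the playground's code: it assumes plain `Int` arithmetic instead of `Double`, and `windowHash` and `multiplier` are hypothetical names chosen for illustration (the constant 69069 is the `hashMultiplier` value used by the implementation).

```swift
// Illustrative only: an integer version of the polynomial hash for a 3-character window.
let multiplier = 69069  // same value as Constants.hashMultiplier

// hash(c0, c1, c2) = c0 * multiplier^2 + c1 * multiplier + c2
func windowHash(_ s: String) -> Int {
    var total = 0
    for scalar in s.unicodeScalars {
        total = total * multiplier + Int(scalar.value)
    }
    return total
}

let hashThe = windowHash("The")

// Slide from "The" to "he ": drop "T", add " ".
let dropped = Int(("T" as Unicode.Scalar).value)  // 84
let added = Int((" " as Unicode.Scalar).value)    // 32

// Subtract the dropped character's weight, shift by one position, add the new character.
let rolled = (hashThe - dropped * multiplier * multiplier) * multiplier + added

// The rolled value equals a hash of "he " computed from scratch,
// and it differs from the hash of "ump", so the window keeps sliding.
assert(rolled == windowHash("he "))
assert(rolled != windowHash("ump"))
```

The update costs a constant number of operations regardless of the window length, which is where the speed-up over rehashing every window comes from.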

## Algorithm

The Rabin-Karp algorithm uses a sliding window the size of the search pattern. It starts by hashing the search pattern, then
hashing the first x characters of the text string, where x is the length of the search pattern. It then slides the window one character over and uses
the previous hash value to calculate the new hash faster. Only when it finds a hash that matches the hash of the search pattern does it compare
the two strings to see if they are the same (this prevents a hash collision from producing a false positive).
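
The steps above can be sketched end to end as follows. This is an illustrative variant rather than the code from this commit: it swaps the `Double` hash for an integer hash taken modulo a prime, which keeps every value exact (a `Double` can lose precision once the powers of the multiplier grow past 2^53). The function name `rabinKarpSearch` and the constants `base` and `modulus` are assumptions made for the example, and the returned index counts Unicode scalars.

```swift
// A sketch of Rabin-Karp with an integer rolling hash taken modulo a prime.
// `base` and `modulus` are illustrative choices, not values from the original code.
func rabinKarpSearch(text: String, pattern: String) -> Int {
    let t = text.unicodeScalars.map { Int($0.value) }
    let p = pattern.unicodeScalars.map { Int($0.value) }
    let m = p.count
    guard m > 0, t.count >= m else { return -1 }

    let base = 256
    let modulus = 1_000_000_007

    // highPower = base^(m-1) mod modulus: the weight of the window's leading character.
    var highPower = 1
    for _ in 1..<m { highPower = highPower * base % modulus }

    // Hash the pattern and the first window of the text.
    var patternHash = 0
    var windowHash = 0
    for i in 0..<m {
        patternHash = (patternHash * base + p[i]) % modulus
        windowHash = (windowHash * base + t[i]) % modulus
    }

    for start in 0...(t.count - m) {
        // Compare characters only when the hashes agree, to rule out a collision.
        if windowHash == patternHash && t[start..<(start + m)].elementsEqual(p) {
            return start
        }
        if start < t.count - m {
            // Remove the leading character's contribution, shift, then add the new one.
            let withoutLead = (windowHash - t[start] * highPower % modulus + modulus) % modulus
            windowHash = (withoutLead * base + t[start + m]) % modulus
        }
    }
    return -1
}
```

With this sketch, `rabinKarpSearch(text: "The big dog jumped over the fox", pattern: "ump")` evaluates to 13, and a pattern that does not occur yields -1.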

## The code

The main search method is shown next. More implementation details are in rabin-karp.swift.

```swift
public func search(text: String, pattern: String) -> Int {
    // convert to array of ints
    let patternArray = pattern.characters.flatMap { $0.asInt }
    let textArray = text.characters.flatMap { $0.asInt }

    if patternArray.isEmpty || textArray.count < patternArray.count {
        return -1
    }

    let patternHash = hash(array: patternArray)
    var endIdx = patternArray.count - 1
    let firstChars = Array(textArray[0...endIdx])
    let firstHash = hash(array: firstChars)

    if patternHash == firstHash {
        // Verify this was not a hash collision
        if firstChars == patternArray {
            return 0
        }
    }

    var prevHash = firstHash
    // Now slide the window across the text; stride yields nothing when text and pattern have equal length
    for idx in stride(from: 1, through: textArray.count - patternArray.count, by: 1) {
        endIdx = idx + (patternArray.count - 1)
        let window = Array(textArray[idx...endIdx])
        let windowHash = nextHash(prevHash: prevHash, dropped: textArray[idx - 1], added: textArray[endIdx], patternSize: patternArray.count - 1)

        if windowHash == patternHash {
            if patternArray == window {
                return idx
            }
        }

        prevHash = windowHash
    }

    return -1
}
```

This code can be tested in a playground using the following:

```swift
search(text: "The big dog jumped", pattern: "ump")
```

This returns 13, since "ump" starts at index 13 of the zero-based string.

## Additional Resources

[Rabin-Karp Wikipedia](https://en.wikipedia.org/wiki/Rabin%E2%80%93Karp_algorithm)


*Written by [Bill Barbour](https://github.com/brbatwork)*
Lines changed: 96 additions & 0 deletions
@@ -0,0 +1,96 @@
//: Taking our rabin-karp algorithm for a walk

import UIKit

struct Constants {
    static let hashMultiplier = 69069
}

precedencegroup PowerPrecedence { higherThan: MultiplicationPrecedence }
infix operator ** : PowerPrecedence
func ** (radix: Int, power: Int) -> Int {
    return Int(pow(Double(radix), Double(power)))
}
func ** (radix: Double, power: Int) -> Double {
    return pow(radix, Double(power))
}

extension Character {
    var asInt: Int {
        let s = String(self).unicodeScalars
        return Int(s[s.startIndex].value)
    }
}

// Find first position of pattern in the text using Rabin Karp algorithm
public func search(text: String, pattern: String) -> Int {
    // convert to array of ints
    let patternArray = pattern.characters.flatMap { $0.asInt }
    let textArray = text.characters.flatMap { $0.asInt }

    if patternArray.isEmpty || textArray.count < patternArray.count {
        return -1
    }

    let patternHash = hash(array: patternArray)
    var endIdx = patternArray.count - 1
    let firstChars = Array(textArray[0...endIdx])
    let firstHash = hash(array: firstChars)

    if patternHash == firstHash {
        // Verify this was not a hash collision
        if firstChars == patternArray {
            return 0
        }
    }

    var prevHash = firstHash
    // Now slide the window across the text; stride yields nothing when text and pattern have equal length
    for idx in stride(from: 1, through: textArray.count - patternArray.count, by: 1) {
        endIdx = idx + (patternArray.count - 1)
        let window = Array(textArray[idx...endIdx])
        let windowHash = nextHash(
            prevHash: prevHash,
            dropped: textArray[idx - 1],
            added: textArray[endIdx],
            patternSize: patternArray.count - 1
        )

        if windowHash == patternHash {
            if patternArray == window {
                return idx
            }
        }

        prevHash = windowHash
    }

    return -1
}

public func hash(array: Array<Int>) -> Double {
    var total: Double = 0
    var exponent = array.count - 1
    for i in array {
        total += Double(i) * (Double(Constants.hashMultiplier) ** exponent)
        exponent -= 1
    }

    return total
}

public func nextHash(prevHash: Double, dropped: Int, added: Int, patternSize: Int) -> Double {
    let oldHash = prevHash - (Double(dropped) *
        (Double(Constants.hashMultiplier) ** patternSize))
    return Double(Constants.hashMultiplier) * oldHash + Double(added)
}

// TESTS
assert(search(text:"The big dog jumped over the fox",
              pattern:"ump") == 13, "Invalid index returned")

assert(search(text:"The big dog jumped over the fox",
              pattern:"missed") == -1, "Invalid index returned")

assert(search(text:"The big dog jumped over the fox",
              pattern:"T") == 0, "Invalid index returned")
Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<playground version='5.0' target-platform='ios'>
    <timeline fileName='timeline.xctimeline'/>
</playground>

Rabin-Karp/Rabin-Karp.playground/playground.xcworkspace/contents.xcworkspacedata

Lines changed: 7 additions & 0 deletions
Some generated files are not rendered by default.

Rabin-Karp/rabin-karp.swift

Lines changed: 114 additions & 0 deletions
@@ -0,0 +1,114 @@
// The MIT License (MIT)

// Copyright (c) 2016 Bill Barbour (brbatwork[at]gmail.com)

// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
// in the Software without restriction, including without limitation the rights
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the Software is
// furnished to do so, subject to the following conditions:

// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.

// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.

struct Constants {
    static let hashMultiplier = 69069
}

precedencegroup PowerPrecedence { higherThan: MultiplicationPrecedence }
infix operator ** : PowerPrecedence
func ** (radix: Int, power: Int) -> Int {
    return Int(pow(Double(radix), Double(power)))
}
func ** (radix: Double, power: Int) -> Double {
    return pow(radix, Double(power))
}

extension Character {
    var asInt: Int {
        let s = String(self).unicodeScalars
        return Int(s[s.startIndex].value)
    }
}

// Find first position of pattern in the text using Rabin Karp algorithm
public func search(text: String, pattern: String) -> Int {
    // convert to array of ints
    let patternArray = pattern.characters.flatMap { $0.asInt }
    let textArray = text.characters.flatMap { $0.asInt }

    if patternArray.isEmpty || textArray.count < patternArray.count {
        return -1
    }

    let patternHash = hash(array: patternArray)
    var endIdx = patternArray.count - 1
    let firstChars = Array(textArray[0...endIdx])
    let firstHash = hash(array: firstChars)

    if patternHash == firstHash {
        // Verify this was not a hash collision
        if firstChars == patternArray {
            return 0
        }
    }

    var prevHash = firstHash
    // Now slide the window across the text; stride yields nothing when text and pattern have equal length
    for idx in stride(from: 1, through: textArray.count - patternArray.count, by: 1) {
        endIdx = idx + (patternArray.count - 1)
        let window = Array(textArray[idx...endIdx])
        let windowHash = nextHash(
            prevHash: prevHash,
            dropped: textArray[idx - 1],
            added: textArray[endIdx],
            patternSize: patternArray.count - 1
        )

        if windowHash == patternHash {
            if patternArray == window {
                return idx
            }
        }

        prevHash = windowHash
    }

    return -1
}

public func hash(array: Array<Int>) -> Double {
    var total: Double = 0
    var exponent = array.count - 1
    for i in array {
        total += Double(i) * (Double(Constants.hashMultiplier) ** exponent)
        exponent -= 1
    }

    return total
}

public func nextHash(prevHash: Double, dropped: Int, added: Int, patternSize: Int) -> Double {
    let oldHash = prevHash - (Double(dropped) *
        (Double(Constants.hashMultiplier) ** patternSize))
    return Double(Constants.hashMultiplier) * oldHash + Double(added)
}

// TESTS
assert(search(text:"The big dog jumped over the fox",
              pattern:"ump") == 13, "Invalid index returned")

assert(search(text:"The big dog jumped over the fox",
              pattern:"missed") == -1, "Invalid index returned")

assert(search(text:"The big dog jumped over the fox",
              pattern:"T") == 0, "Invalid index returned")
