Commit 5b136b8

Chris Pilcher committed
Merge pull request kodecocodes#106 from herrlui/master
Minimum edit distance for strings
2 parents 4dc952d + f3da019 commit 5b136b8

2 files changed: +99 -0 lines changed
Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
extension String {

    public func minimumEditDistance(other: String) -> Int {
        let m = self.characters.count
        let n = other.characters.count
        var matrix = [[Int]](count: m+1, repeatedValue: [Int](count: n+1, repeatedValue: 0))


        // initialize matrix (ranges start at 0 so that empty strings do not form an invalid range)
        for index in 0...m {
            // the distance of any prefix of the first string to an empty second string
            matrix[index][0] = index
        }
        for index in 0...n {
            // the distance of any prefix of the second string to an empty first string
            matrix[0][index] = index
        }

        // compute Levenshtein distance
        for (i, selfChar) in self.characters.enumerate() {
            for (j, otherChar) in other.characters.enumerate() {
                if otherChar == selfChar {
                    // substitution of equal symbols with cost 0
                    matrix[i+1][j+1] = matrix[i][j]
                } else {
                    // minimum over substitution, insertion, and deletion (in that order), each with cost 1 added to the already computed cost
                    matrix[i+1][j+1] = min(matrix[i][j]+1, matrix[i+1][j]+1, matrix[i][j+1]+1)
                }

            }
        }
        return matrix[m][n]
    }
}

Minimum Edit Distance/README.markdown

Lines changed: 65 additions & 0 deletions
@@ -0,0 +1,65 @@
# Minimum Edit Distance

The minimum edit distance measures the similarity of two strings *w* and *u* by counting the cost of the operations needed to transform *w* into *u* (or vice versa).

### Algorithm using Levenshtein distance

A common distance measure is the *Levenshtein distance*, which allows the following three transformation operations:

* **Insertion** (*ε→x*) of a single symbol *x* with **cost 1**,
* **Deletion** (*x→ε*) of a single symbol *x* with **cost 1**, and
* **Substitution** (*x→y*) of a single symbol *x* by a symbol *y* with **cost 1** if *x≠y* and with **cost 0** otherwise.

When a string is transformed by a sequence of operations, the costs of the individual operations are added up; the minimum edit distance is the smallest total cost over all such sequences. For example, the string *Door* can be transformed into the string *Dolls* by the operations *o→l*, *r→l*, *ε→s*, and no cheaper sequence exists, so the minimum edit distance is 3.
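
The example can be replayed step by step. The snippet below is only an illustration written in current Swift syntax (the variable names `symbols` and `cost` are made up here); it applies the three operations to *Door* and adds cost 1 for each:

```swift
// Apply the three edits from the example one at a time, adding cost 1 for each.
var symbols = Array("Door")
var cost = 0

symbols[2] = "l"        // substitution o→l gives "Dolr"
cost += 1
symbols[3] = "l"        // substitution r→l gives "Doll"
cost += 1
symbols.append("s")     // insertion ε→s gives "Dolls"
cost += 1

print(String(symbols), cost)  // prints "Dolls 3"
```

This only shows that one sequence of cost 3 exists; that no cheaper sequence exists is what the algorithm described next establishes.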

To avoid exponential time complexity, the minimum edit distance of two strings is usually computed using *dynamic programming*. For strings *w* and *u* of length *m* and *n*, respectively, the algorithm uses a matrix with *m*+1 rows and *n*+1 columns:

```swift
var matrix = [[Int]](count: m+1, repeatedValue: [Int](count: n+1, repeatedValue: 0))
```

The matrix is filled using the already computed minimum edit distances of prefixes of *w* and *u*: the cell in row *i* and column *j* holds the distance between the first *i* symbols of *w* and the first *j* symbols of *u*. In a first step the matrix is initialized by filling the first row and the first column as follows:

```swift
// initialize matrix (ranges start at 0 so that empty strings do not form an invalid range)
for index in 0...m {
    // the distance of any prefix of the first string to an empty second string
    matrix[index][0] = index
}
for index in 0...n {
    // the distance of any prefix of the second string to an empty first string
    matrix[0][index] = index
}
```
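
To make the initialization concrete, here is a small sketch in current Swift syntax that builds and prints the initialized matrix for the lengths of *Door* and *Dolls*. The hard-coded values of `m` and `n` are an assumption made for this illustration only:

```swift
// Lengths of "Door" and "Dolls" (hard-coded for this illustration).
let m = 4, n = 5
var matrix = [[Int]](repeating: [Int](repeating: 0, count: n + 1), count: m + 1)

// First column: distance of each prefix of the first string to the empty string.
for i in 0...m { matrix[i][0] = i }
// First row: distance of each prefix of the second string to the empty string.
for j in 0...n { matrix[0][j] = j }

for row in matrix { print(row) }
// [0, 1, 2, 3, 4, 5]
// [1, 0, 0, 0, 0, 0]
// [2, 0, 0, 0, 0, 0]
// [3, 0, 0, 0, 0, 0]
// [4, 0, 0, 0, 0, 0]
```

The zeros in the interior are placeholders that the next step overwrites.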
Then each remaining cell is set to the minimum of the costs of insertion, deletion, and substitution, each added to the already computed cost in the corresponding neighboring cell. In this way the matrix is filled iteratively:

```swift
// compute Levenshtein distance
for (i, selfChar) in self.characters.enumerate() {
    for (j, otherChar) in other.characters.enumerate() {
        if otherChar == selfChar {
            // substitution of equal symbols with cost 0
            matrix[i+1][j+1] = matrix[i][j]
        } else {
            // minimum over substitution, insertion, and deletion (in that order),
            // each with cost 1 added to the already computed cost in the corresponding cell
            matrix[i+1][j+1] = min(matrix[i][j]+1, matrix[i+1][j]+1, matrix[i][j+1]+1)
        }

    }
}
```
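
The update rule in this loop can also be written as a recurrence. The notation is introduced here for illustration only: *d(i, j)* stands for the value stored in `matrix[i][j]`, and *w_i* and *u_j* denote the *i*-th symbol of *w* and the *j*-th symbol of *u*, counted from 1.

$$
d(i, 0) = i, \qquad d(0, j) = j,
$$

$$
d(i, j) =
\begin{cases}
d(i-1, j-1) & \text{if } w_i = u_j, \\
1 + \min\bigl\{\, d(i-1, j-1),\; d(i, j-1),\; d(i-1, j) \,\bigr\} & \text{otherwise.}
\end{cases}
$$

The three arguments of the minimum correspond to substitution, insertion, and deletion, in that order.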

After the matrix has been filled, the minimum edit distance can be read from the bottom-right cell and is returned:

```swift
return matrix[m][n]
```

This algorithm has a time complexity of Θ(*mn*). Because the whole matrix is stored, it also needs Θ(*mn*) space.
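
For readers on a recent toolchain, the steps described in this section can be assembled as follows. This is a sketch in current Swift syntax, not the code from this commit: it assumes Swift 5, where `characters`, `enumerate()` and `Array(count:repeatedValue:)` are no longer available under those names. It calls `Swift.min` explicitly because `String` is itself a collection with its own `min` methods, and its ranges start at 0 so that empty strings do not form an invalid range.

```swift
extension String {
    /// Levenshtein distance between `self` and `other`.
    /// Illustrative sketch in current Swift syntax.
    func minimumEditDistance(other: String) -> Int {
        let selfChars = Array(self)
        let otherChars = Array(other)
        let m = selfChars.count
        let n = otherChars.count

        // matrix[i][j]: distance between the first i symbols of self
        // and the first j symbols of other
        var matrix = [[Int]](repeating: [Int](repeating: 0, count: n + 1), count: m + 1)

        // first column and first row: distance to the empty string
        for i in 0...m { matrix[i][0] = i }
        for j in 0...n { matrix[0][j] = j }

        for (i, selfChar) in selfChars.enumerated() {
            for (j, otherChar) in otherChars.enumerated() {
                if selfChar == otherChar {
                    // equal symbols: substitution with cost 0
                    matrix[i + 1][j + 1] = matrix[i][j]
                } else {
                    // substitution, insertion, deletion: cost 1 each
                    matrix[i + 1][j + 1] = Swift.min(matrix[i][j], matrix[i + 1][j], matrix[i][j + 1]) + 1
                }
            }
        }
        return matrix[m][n]
    }
}

print("Door".minimumEditDistance(other: "Dolls"))  // prints 3
print("".minimumEditDistance(other: "abc"))        // prints 3
```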

### Other distance measures

**todo**

*Written for Swift Algorithm Club by Luisa Herrmann*
