Commit f3da019

Author: Luisa Herrmann
Message: added README.markdown and some comments in the code
1 parent d74c1aa commit f3da019

2 files changed: 72 additions, 0 deletions

Minimum Edit Distance/MinimumEditDistance.swift

Lines changed: 7 additions & 0 deletions
@@ -5,18 +5,25 @@ extension String {
         let n = other.characters.count
         var matrix = [[Int]](count: m+1, repeatedValue: [Int](count: n+1, repeatedValue: 0))
 
+
+        // initialize matrix
         for index in 1...m {
+            // the distance of any first string to an empty second string
             matrix[index][0]=index
         }
         for index in 1...n {
+            // the distance of any second string to an empty first string
             matrix[0][index]=index
         }
 
+        // compute Levenshtein distance
         for (i, selfChar) in self.characters.enumerate() {
             for (j, otherChar) in other.characters.enumerate() {
                 if otherChar == selfChar {
+                    // substitution of equal symbols with cost 0
                     matrix[i+1][j+1] = matrix[i][j]
                 } else {
+                    // minimum of the cost of insertion, deletion, or substitution added to the already computed costs in the corresponding cells
                     matrix[i+1][j+1] = min(matrix[i][j]+1, matrix[i+1][j]+1, matrix[i][j+1]+1)
                 }

Minimum Edit Distance/README.markdown

Lines changed: 65 additions & 0 deletions
@@ -0,0 +1,65 @@
# Minimum Edit Distance

The minimum edit distance measures the similarity of two strings *w* and *u* by counting the cost of the operations needed to transform *w* into *u* (or vice versa).

### Algorithm using Levenshtein distance

A common distance measure is the *Levenshtein distance*, which allows the following three transformation operations:

* **Insertion** (*ε→x*) of a single symbol *x* with **cost 1**,
* **Deletion** (*x→ε*) of a single symbol *x* with **cost 1**, and
* **Substitution** (*x→y*) of a single symbol *x* by a single symbol *y* with **cost 1** if *x≠y* and with **cost 0** otherwise.

When a string is transformed by a sequence of operations, the costs of the individual operations are added up; the minimum edit distance is the smallest total cost over all sequences that perform the transformation. For example, the string *Door* can be transformed into the string *Dolls* by the operations *o→l*, *r→l*, *ε→s*, and no cheaper sequence exists, which results in a minimum edit distance of 3.

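The cost model above can be sketched in a few lines of current Swift. The names `EditOperation`, `cost`, and `totalCost(of:)` are hypothetical and are not part of this commit. The sketch only adds up the cost of one given sequence of operations, so its result is an upper bound on the minimum edit distance; for the *Door*/*Dolls* sequence the bound happens to be tight.

```swift
// Hypothetical model of the three Levenshtein operations (illustration only).
enum EditOperation {
    case insertion(Character)                // ε→x
    case deletion(Character)                 // x→ε
    case substitution(Character, Character)  // x→y

    /// Cost 1 for every operation except a substitution of equal symbols.
    var cost: Int {
        switch self {
        case .insertion, .deletion:
            return 1
        case .substitution(let x, let y):
            return x == y ? 0 : 1
        }
    }
}

/// Adds up the costs of a sequence of operations.
func totalCost(of operations: [EditOperation]) -> Int {
    return operations.reduce(0) { $0 + $1.cost }
}

// The sequence o→l, r→l, ε→s from the Door/Dolls example.
let doorToDolls: [EditOperation] = [
    .substitution("o", "l"),
    .substitution("r", "l"),
    .insertion("s"),
]
print(totalCost(of: doorToDolls))
```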
To avoid exponential time complexity, the minimum edit distance of two strings is usually computed using *dynamic programming*. For strings *w* and *u* of length *m* and *n*, respectively, a matrix with *m*+1 rows and *n*+1 columns

```swift
var matrix = [[Int]](count: m+1, repeatedValue: [Int](count: n+1, repeatedValue: 0))
```

is filled so that the cell in row *i* and column *j* holds the minimum edit distance between the first *i* symbols of *w* and the first *j* symbols of *u*. Each cell is computed from the already computed distances of shorter prefixes. In a first step, the matrix is initialized by filling the first row and the first column as follows:

```swift
// initialize matrix
for index in 1...m {
    // the distance of any prefix of the first string to an empty second string
    matrix[index][0]=index
}
for index in 1...n {
    // the distance of any prefix of the second string to an empty first string
    matrix[0][index]=index
}
```

Then each remaining cell is set to the minimum of the costs of insertion, deletion, and substitution, each added to the already computed cost in the corresponding neighboring cell. In this way the matrix is filled iteratively:

```swift
// compute Levenshtein distance
for (i, selfChar) in self.characters.enumerate() {
    for (j, otherChar) in other.characters.enumerate() {
        if otherChar == selfChar {
            // substitution of equal symbols with cost 0
            matrix[i+1][j+1] = matrix[i][j]
        } else {
            // minimum of the cost of insertion, deletion, or substitution added
            // to the already computed costs in the corresponding cells
            matrix[i+1][j+1] = min(matrix[i][j]+1, matrix[i+1][j]+1, matrix[i][j+1]+1)
        }
    }
}
```
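
To see how the cells build on each other, it can help to look at the whole matrix rather than only the final cell. The following is a minimal sketch in current Swift syntax (the commit itself uses Swift 2 APIs); `levenshteinMatrix` is a hypothetical helper that is not part of this commit. It starts the initialization ranges at 0, which is an assumption made here so that the ranges stay valid for empty strings.

```swift
// Hypothetical helper (illustration only): returns the whole matrix
// instead of only the bottom-right cell.
func levenshteinMatrix(_ w: String, _ u: String) -> [[Int]] {
    let a = Array(w)
    let b = Array(u)
    var matrix = [[Int]](repeating: [Int](repeating: 0, count: b.count + 1),
                         count: a.count + 1)

    // First column and first row: distance of each prefix to the empty string.
    for i in 0...a.count { matrix[i][0] = i }
    for j in 0...b.count { matrix[0][j] = j }

    for (i, x) in a.enumerated() {
        for (j, y) in b.enumerated() {
            if x == y {
                // substitution of equal symbols with cost 0
                matrix[i + 1][j + 1] = matrix[i][j]
            } else {
                // substitution, insertion, or deletion, each with cost 1
                matrix[i + 1][j + 1] = min(matrix[i][j] + 1,
                                           matrix[i + 1][j] + 1,
                                           matrix[i][j + 1] + 1)
            }
        }
    }
    return matrix
}

// Print one matrix row per line for the Door/Dolls example.
for row in levenshteinMatrix("Door", "Dolls") {
    print(row.map { String($0) }.joined(separator: " "))
}
```

Row *i* of the printed matrix corresponds to the first *i* symbols of *Door*, and column *j* to the first *j* symbols of *Dolls*.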

Once the matrix is filled, the minimum edit distance is found in the bottom-right cell and is returned.

```swift
return matrix[m][n]
```

This algorithm has a time complexity of Θ(*mn*).

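Putting the steps together, the algorithm might look as follows in current Swift. This is a sketch under stated assumptions and not the code from this commit: the method name `minimumEditDistanceSketch(to:)` is hypothetical, the strings are copied into arrays for index access, and the initialization ranges start at 0 because a closed range such as `1...m` traps at runtime in Swift when `m` is 0.

```swift
extension String {
    /// Levenshtein distance between `self` and `other`
    /// (hypothetical name, illustration only).
    func minimumEditDistanceSketch(to other: String) -> Int {
        let w = Array(self)
        let u = Array(other)
        let m = w.count
        let n = u.count

        // matrix[i][j] holds the distance between the first i symbols
        // of w and the first j symbols of u.
        var matrix = [[Int]](repeating: [Int](repeating: 0, count: n + 1),
                             count: m + 1)

        // Starting at 0 keeps both ranges valid for empty strings.
        for i in 0...m { matrix[i][0] = i }
        for j in 0...n { matrix[0][j] = j }

        for i in 0..<m {
            for j in 0..<n {
                if w[i] == u[j] {
                    // substitution of equal symbols with cost 0
                    matrix[i + 1][j + 1] = matrix[i][j]
                } else {
                    // `Swift.min` names the global function explicitly, because
                    // inside a String extension `min` also refers to the
                    // collection's own `min()` method.
                    matrix[i + 1][j + 1] = Swift.min(matrix[i][j] + 1,
                                                     matrix[i + 1][j] + 1,
                                                     matrix[i][j + 1] + 1)
                }
            }
        }
        return matrix[m][n]
    }
}

print("Door".minimumEditDistanceSketch(to: "Dolls"))
```

The two nested loops visit each of the *mn* inner cells exactly once and do constant work per cell, which is where the Θ(*mn*) bound comes from.
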
### Other distance measures

**todo**

*Written for Swift Algorithm Club by Luisa Herrmann*