Run-Length Encoding/README.markdown
Lines changed: 62 additions & 59 deletions
@@ -3,14 +3,14 @@
 RLE is probably the simplest way to do compression. Let's say you have data that looks like this:
 
 aaaaabbbcdeeeeeeef...
-
+
 then RLE encodes it as follows:
 
 5a3b1c1d7e1f...
 
 Instead of repeating bytes, you first write how often that byte occurs and then the byte's actual value. So `5a` means `aaaaa`. If the data has a lot of "byte runs", that is lots of repeating bytes, then RLE can save quite a bit of space. It works quite well on images.
 
-There are many different ways you can implement RLE. Here's an extension of `NSData` that does a version of RLE inspired by the old [PCX image file format](https://en.wikipedia.org/wiki/PCX).
+There are many different ways you can implement RLE. Here's an extension of `Data` that does a version of RLE inspired by the old [PCX image file format](https://en.wikipedia.org/wiki/PCX).
 
 The rules are these:
 
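The count-then-value scheme from the hunk above (`aaaaabbbcdeeeeeeef...` becoming `5a3b1c1d7e1f...`) can be sketched on strings as follows. This is a minimal illustration that assumes text input; `naiveRLE` is a hypothetical helper, not part of the README's `Data` extension, which uses a PCX-style byte format instead.

```swift
import Foundation

/// Textbook run-length encoding of a string: each run becomes "<count><character>".
/// Hypothetical helper for illustration only.
func naiveRLE(_ input: String) -> String {
    var output = ""
    var current: Character? = nil   // character of the run being counted
    var runLength = 0

    for ch in input {
        if ch == current {
            runLength += 1
        } else {
            // A new character ends the previous run (if any).
            if let c = current { output += "\(runLength)\(c)" }
            current = ch
            runLength = 1
        }
    }
    // Flush the final run.
    if let c = current { output += "\(runLength)\(c)" }
    return output
}

print(naiveRLE("aaaaabbbcdeeeeeeef"))  // prints "5a3b1c1d7e1f"
```

This textual form only works if the input itself contains no digits, since a digit in the data would be indistinguishable from a count.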
@@ -20,44 +20,42 @@ The rules are these:
 
 - A single byte in the range 192 - 255 is represented by two bytes: first the byte 192 (meaning a run of 1 byte), followed by the actual value.
 
-Here is the compression code. It returns a new `NSData` object containing the run-length encoded bytes:
+Here is the compression code. It returns a new `Data` object containing the run-length encoded bytes:
 
 ```swift
-extension NSData {
-  public func compressRLE() -> NSData {
-    let data = NSMutableData()
-    if length > 0 {
-      var ptr = UnsafePointer<UInt8>(bytes)
-      let end = ptr + length
-
-      while ptr < end { // 1
-        var count = 0
-        var byte = ptr.memory
-        var next = byte
-
-        while next == byte && ptr < end && count < 64 { // 2
-          ptr = ptr.advancedBy(1)
-          next = ptr.memory
-          count += 1
-        }
-
-        if count > 1 || byte >= 192 { // 3
-          var size = 191 + UInt8(count)
-          data.appendBytes(&size, length: 1)
-          data.appendBytes(&byte, length: 1)
-        } else { // 4
-          data.appendBytes(&byte, length: 1)
+extension Data {
+  public func compressRLE() -> Data {
+    var data = Data()
+    self.withUnsafeBytes { (uPtr: UnsafePointer<UInt8>) in
+      var ptr = uPtr
+      let end = ptr + count
+      while ptr < end { //1
+        var count = 0
+        var byte = ptr.pointee
+        var next = byte
+
+        while next == byte && ptr < end && count < 64 { //2
+          ptr = ptr.advanced(by: 1)
+          next = ptr.pointee
+          count += 1
+        }
+
+        if count > 1 || byte >= 192 { // 3
+          var size = 191 + UInt8(count)
+          data.append(&size, count: 1)
+          data.append(&byte, count: 1)
+        } else { // 4
+          data.append(&byte, count: 1)
+        }
+      }
     }
-      }
+    return data
   }
-    return data
-  }
-}
 ```
 
 How it works:
 
-1. We use an `UnsafePointer` to step through the bytes of the original `NSData` object.
+1. We use an `UnsafePointer` to step through the bytes of the original `Data` object.
 
 2. At this point we've read the current byte value into the `byte` variable. If the next byte is the same, then we keep reading until we find a byte value that is different, or we reach the end of the data. We also stop if the run is 64 bytes because that's the maximum we can encode.
 
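As shown in this hunk, the `+` side of the compression code has three rough edges worth checking before copying it. It closes `compressRLE()` but never closes the `extension Data {` block. Its inner loop reads `ptr.pointee` once more after `ptr` has reached `end`. And the typed `withUnsafeBytes` overload it calls is deprecated as of Swift 5. What follows is a sketch of the same encoding rules as the code in this hunk (a run of 2 to 64 equal bytes, or a single byte of 192 or more, becomes `191 + length` followed by the value; any other byte is copied as-is), using plain `Data` indexing instead of pointers. The method name `compressRLESketch` is hypothetical, chosen so it does not clash with the README's `compressRLE()`.

```swift
import Foundation

extension Data {
    /// PCX-inspired RLE following the rules used by `compressRLE()`.
    /// Hypothetical name; a sketch, not the README's implementation.
    func compressRLESketch() -> Data {
        var output = Data()
        var i = startIndex
        while i < endIndex {
            let byte = self[i]
            var runLength = 1
            var j = index(after: i)
            // Extend the run while the next byte matches. The bounds check comes
            // first, so we never read past the end, and runs are capped at 64.
            while j < endIndex && self[j] == byte && runLength < 64 {
                runLength += 1
                j = index(after: j)
            }
            if runLength > 1 || byte >= 192 {
                output.append(191 + UInt8(runLength))  // 192...255 encodes 1...64
                output.append(byte)
            } else {
                output.append(byte)                    // literal byte below 192
            }
            i = j
        }
        return output
    }
}
```

With this sketch, `Data("aaaaabbbcdeeeeeeef".utf8).compressRLESketch()` yields the nine bytes `c4 61 c2 62 63 64 c6 65 66`, the same sequence the README decodes by hand.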
@@ -69,11 +67,11 @@ You can test it like this in a playground:
 
 ```swift
 let originalString = "aaaaabbbcdeeeeeeef"
-let utf8 = originalString.dataUsingEncoding(NSUTF8StringEncoding)!
+let utf8 = originalString.data(using: String.Encoding.utf8)!
 let compressed = utf8.compressRLE()
 ```
 
-The compressed `NSData` object should be `<c461c262 6364c665 66>`. Let's decode that by hand to see what has happened:
+The compressed `Data` object should be `<c461c262 6364c665 66>`. Let's decode that by hand to see what has happened:
 
 c4 This is 196 in decimal. It means the next byte appears 5 times.
 61 The data byte "a".
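The `<c461c262 6364c665 66>` notation is how `NSData` describes itself; in recent Swift versions, printing a `Data` value shows only its size (for example `9 bytes`). To compare output against the hand-decoded table in a playground, a small hex formatter helps. `hexString` is a hypothetical helper, and the bytes used here are the ones listed in the README, not output captured from running `compressRLE()`.

```swift
import Foundation

/// Formats bytes as two-digit lowercase hex pairs separated by spaces.
/// Hypothetical helper for inspecting output in a playground.
func hexString(_ data: Data) -> String {
    return data.map { String(format: "%02x", $0) }.joined(separator: " ")
}

// The encoded bytes from the walkthrough, written out by hand.
let expected = Data([0xc4, 0x61, 0xc2, 0x62, 0x63, 0x64, 0xc6, 0x65, 0x66])
print(expected)                 // prints "9 bytes"
print(hexString(expected))      // prints "c4 61 c2 62 63 64 c6 65 66"

// 0xc4 is 196; subtracting 191 gives the run length.
print(Int(expected[0]) - 191)   // prints "5"
```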
@@ -90,34 +88,38 @@ So that's 9 bytes encoded versus 18 original. That's a savings of 50%. Of course
 Here is the decompression code:
 
 ```swift
-public func decompressRLE() -> NSData {
-  let data = NSMutableData()
-  if length > 0 {
-    var ptr = UnsafePointer<UInt8>(bytes)
-    let end = ptr + length
-
-    while ptr < end {
-      var byte = ptr.memory // 1
-      ptr = ptr.advancedBy(1)
-
-      if byte < 192 { // 2
-        data.appendBytes(&byte, length: 1)
-
-      } else if ptr < end { // 3
-        var value = ptr.memory
-        ptr = ptr.advancedBy(1)
-
-        for _ in 0 ..< byte - 191 {
-          data.appendBytes(&value, length: 1)
-        }
+public func decompressRLE() -> Data {
+  var data = Data()
+  self.withUnsafeBytes { (uPtr: UnsafePointer<UInt8>) in
+    var ptr = uPtr
+    let end = ptr + count
+
+    while ptr < end {
+      // Read the next byte. This is either a single value less than 192,
+      // or the start of a byte run.
+      var byte = ptr.pointee // 1
+      ptr = ptr.advanced(by: 1)
+
+      if byte < 192 { // 2
+        data.append(&byte, count: 1)
+      } else if ptr < end { // 3
+        // Read the actual data value.
+        var value = ptr.pointee
+        ptr = ptr.advanced(by: 1)
+
+        // And write it out repeatedly.
+        for _ in 0 ..< byte - 191 {
+          data.append(&value, count: 1)
+        }
+      }
+    }
   }
-    }
+  return data
 }
-  return data
-}
+
 ```
 
-1. Again this uses an `UnsafePointer` to read the `NSData`. Here we read the next byte; this is either a single value less than 192, or the start of a byte run.
+1. Again this uses an `UnsafePointer` to read the `Data`. Here we read the next byte; this is either a single value less than 192, or the start of a byte run.
 
 2. If it's a single value, then it's just a matter of copying it to the output.
 
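The decoding loop in this hunk can likewise be written without unsafe pointers. This is a sketch under the same rules as the README's version: a byte below 192 is a literal, a byte `b` of 192 or more means "repeat the next byte `b - 191` times", and a control byte at the very end with nothing after it is silently dropped. `decompressRLESketch` is a hypothetical name.

```swift
import Foundation

extension Data {
    /// Decodes the PCX-inspired RLE format used in this README.
    /// Hypothetical name; a sketch, not the README's implementation.
    func decompressRLESketch() -> Data {
        var output = Data()
        var i = startIndex
        while i < endIndex {
            let byte = self[i]
            i = index(after: i)
            if byte < 192 {
                output.append(byte)                  // single literal byte
            } else if i < endIndex {
                let value = self[i]                  // the byte to repeat
                i = index(after: i)
                output.append(contentsOf: repeatElement(value, count: Int(byte) - 191))
            }
            // A trailing control byte with no value after it is ignored.
        }
        return output
    }
}

let encoded = Data([0xc4, 0x61, 0xc2, 0x62, 0x63, 0x64, 0xc6, 0x65, 0x66])
let decoded = String(decoding: encoded.decompressRLESketch(), as: UTF8.self)
print(decoded)  // prints "aaaaabbbcdeeeeeeef"
```

Applied to the nine bytes decoded by hand earlier in the README, it restores the original 18-character string.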
@@ -134,6 +136,7 @@ And now `originalString == restoredString` must be true!
 
 Footnote: The original PCX implementation is slightly different. There, a byte value of 192 (0xC0) means that the following byte will be repeated 0 times. This also limits the maximum run size to 63 bytes. Because it makes no sense to store bytes that don't occur, in my implementation 192 means the next byte appears once, and the maximum run length is 64 bytes.
 
-This was probably a trade-off when they designed the PCX format way back when. If you look at it in binary, the upper two bits indicate whether a byte is compressed. (If both bits are set then the byte value is 192 or more.) To get the run length you can simply do `byte & 0x3F`, giving you a value in the range 0 to 63.
+This was probably a trade-off when they designed the PCX format way back when. If you look at it in binary, the upper two bits indicate whether a byte is compressed. (If both bits are set then the byte value is 192 or more.) To get the run length you can simply do `byte & 0x3F`, giving you a value in the range 0 to 63.
 
 *Written for Swift Algorithm Club by Matthijs Hollemans*
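The difference the footnote describes comes down to how one control byte is turned into a run length. A minimal comparison, assuming a single control byte: `pcxRunLength` applies the footnote's `byte & 0x3F` rule (0 to 63), while `readmeRunLength` applies this README's `byte - 191` rule (1 to 64). Both function names are hypothetical.

```swift
/// Original PCX rule from the footnote: if the top two bits are set,
/// the low six bits are the repeat count (0...63); otherwise it is a literal.
/// Hypothetical helper.
func pcxRunLength(_ byte: UInt8) -> Int? {
    guard (byte & 0xC0) == 0xC0 else { return nil }  // not a control byte
    return Int(byte & 0x3F)
}

/// This README's variant: 192 means a run of 1, 255 a run of 64.
/// Hypothetical helper.
func readmeRunLength(_ byte: UInt8) -> Int? {
    guard byte >= 192 else { return nil }            // not a control byte
    return Int(byte) - 191
}

let sample: UInt8 = 0xC4
print(pcxRunLength(sample)!, readmeRunLength(sample)!)  // prints "4 5"
```

Both rules treat exactly the bytes 192 to 255 as control bytes; they differ only by one in the count, which is why the README's variant gains one extra byte of maximum run length.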