Commit e819bba

Merge pull request kodecocodes#626 from blainerothrock/genetic-algorithm
[New Algorithm] Genetic Algorithm
2 parents c71414b + a1e2e9d commit e819bba

5 files changed

+595
-0
lines changed

Genetic/README.markdown

Lines changed: 312 additions & 0 deletions
@@ -0,0 +1,312 @@
# Genetic Algorithm

## What is it?

A genetic algorithm (GA) is a process inspired by natural selection for finding high-quality solutions, most commonly to optimization problems. GAs rely on three bio-inspired processes: selection (fitness), crossover and mutation. To understand more, let's walk through these processes in terms of biology:

### Selection
>**Selection**, in biology, the preferential survival and reproduction or preferential elimination of individuals with certain genotypes (genetic compositions), by means of natural or artificial controlling factors.

In other words, survival of the fittest. Organisms that survive in their environment tend to reproduce more. With GAs we define a fitness function that ranks individuals and gives the fitter ones a better chance of reproducing.

### Crossover
>**Chromosomal crossover** (or crossing over) is the exchange of genetic material between homologous chromosomes that results in recombinant chromosomes during sexual reproduction [Wikipedia](https://en.wikipedia.org/wiki/Chromosomal_crossover)

Simply put, reproduction. Each generation is a mixed representation of the previous one, with offspring taking DNA from both parents. GAs do this by mating individuals chosen at random, but weighted by fitness, to create each new generation.

### Mutation
>**Mutation**, an alteration in the genetic material (the genome) of a cell of a living organism or of a virus that is more or less permanent and that can be transmitted to the cell’s or the virus’s descendants. [Britannica](https://www.britannica.com/science/mutation-genetics)

Mutation is the randomization that allows organisms to change over time. In GAs we build a randomization process that mutates offspring in a population in order to introduce fitness variance.

### Resources:
* [Genetic Algorithms in Search, Optimization, and Machine Learning](https://www.amazon.com/Genetic-Algorithms-Optimization-Machine-Learning/dp/0201157675/ref=sr_1_sc_1?ie=UTF8&qid=1520628364&sr=8-1-spell&keywords=Genetic+Algortithms+in+search)
* [Wikipedia](https://en.wikipedia.org/wiki/Genetic_algorithm)
* [My Original Gist](https://gist.github.com/blainerothrock/efda6e12fe10792c99c990f8ff3daeba)

## The Code

### Problem
For this quick and dirty example, we are going to produce an optimized string using a simple genetic algorithm. More specifically, we will take a randomly generated origin string of a fixed length and evolve it toward a target string of our choosing.

We will be creating a bio-inspired world where the absolute existence is the string `Hello, World`. Nothing in this universe is better and it's our goal to get as close to it as possible to ensure survival.

### Define the Universe

Before we dive into the core processes we need to set up our "universe". First let's define a lexicon, the set of everything that exists in our universe.

```swift
let lex: [UInt8] = " !\"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~".unicodeArray
```

To make things easier, we are actually going to work in [Unicode values](https://en.wikipedia.org/wiki/List_of_Unicode_characters), so let's define a String extension to help with that.

```swift
extension String {
    // The string's UTF-8 bytes; for the ASCII characters in our lexicon these equal the Unicode code points
    var unicodeArray: [UInt8] {
        return [UInt8](self.utf8)
    }
}
```
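To see what this representation looks like, here is a small check. It repeats the `unicodeArray` extension only so the snippet runs on its own; the sample string `"Hi!"` is just an illustration.

```swift
import Foundation

extension String {
    var unicodeArray: [UInt8] {
        return [UInt8](self.utf8)
    }
}

let bytes = "Hi!".unicodeArray
print(bytes) // [72, 105, 33]

// Converting back, the same way the main loop later prints the fittest individual
let text = String(bytes: bytes, encoding: .utf8)!
print(text) // Hi!
```

Every character in the lexicon is plain ASCII, so each one maps to exactly one `UInt8`, which is what lets us treat a string as a fixed-length array of "nucleotides".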

Now, let's define a few global variables for the universe:
* `OPTIMAL`: This is the end goal and what we will be using to rate fitness. In the real world this will not exist
* `DNA_SIZE`: The length of each string in our population. Organisms need to be similar
* `POP_SIZE`: Size of each generation
* `GENERATIONS`: Max number of generations; the script will stop when it reaches 5000 if the optimal value is not found
* `MUTATION_CHANCE`: The chance that a random nucleotide will mutate (`1/MUTATION_CHANCE`)

```swift
let OPTIMAL: [UInt8] = "Hello, World".unicodeArray
let DNA_SIZE = OPTIMAL.count
let POP_SIZE = 50
let GENERATIONS = 5000
let MUTATION_CHANCE = 100
```

### Population Zero

Before selection, crossover and mutation, we need a population to start with. Now that we have the universe defined we can write that function:

```swift
func randomPopulation(from lexicon: [UInt8], populationSize: Int, dnaSize: Int) -> [[UInt8]] {
    guard lexicon.count > 1 else { return [] }
    var pop = [[UInt8]]()

    (0..<populationSize).forEach { _ in
        var dna = [UInt8]()
        (0..<dnaSize).forEach { _ in
            let char = lexicon.randomElement()! // guaranteed to be non-nil by initial guard statement
            dna.append(char)
        }
        pop.append(dna)
    }
    return pop
}
```
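As a quick sanity check, the sketch below builds a tiny population. The three-letter lexicon and the small sizes are assumptions chosen only to keep the output readable, and the function is rewritten with `map` so the snippet stays short; it is meant to behave the same as the version above.

```swift
import Foundation

func randomPopulation(from lexicon: [UInt8], populationSize: Int, dnaSize: Int) -> [[UInt8]] {
    guard lexicon.count > 1 else { return [] }
    return (0..<populationSize).map { _ in
        (0..<dnaSize).map { _ in lexicon.randomElement()! }
    }
}

// Hypothetical miniature universe: 4 individuals, 5 nucleotides each
let tinyLexicon = [UInt8]("abc".utf8)
let tinyPopulation = randomPopulation(from: tinyLexicon, populationSize: 4, dnaSize: 5)

for dna in tinyPopulation {
    print(String(bytes: dna, encoding: .utf8)!) // a random 5-letter mix of a, b and c
}
```

Each run prints different strings, which is the point: population zero carries no information about the target yet.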

### Selection

There are two parts to the selection process. The first is calculating the fitness, which assigns a rating to an individual. We do this by simply calculating how close the individual is to the optimal string using unicode values:

```swift
func calculateFitness(dna: [UInt8], optimal: [UInt8]) -> Int {
    // -1 signals mismatched lengths; it is not a real score
    guard dna.count == optimal.count else { return -1 }
    var fitness = 0
    for index in dna.indices {
        fitness += abs(Int(dna[index]) - Int(optimal[index]))
    }
    return fitness
}
```

The above will assign a fitness value to an individual. The perfect solution, "Hello, World", will have a fitness of 0. "Gello, World" will have a fitness of 1 since it is one unicode value off from the optimal (`H->G`).
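To make the metric concrete, here is a small worked example. It re-declares a compact version of `calculateFitness` so it runs standalone; the test strings other than "Gello, World" are illustrative picks.

```swift
func calculateFitness(dna: [UInt8], optimal: [UInt8]) -> Int {
    guard dna.count == optimal.count else { return -1 }
    var fitness = 0
    for (gene, goal) in zip(dna, optimal) {
        fitness += abs(Int(gene) - Int(goal))
    }
    return fitness
}

let target = [UInt8]("Hello, World".utf8)

let perfect = calculateFitness(dna: [UInt8]("Hello, World".utf8), optimal: target)
let oneOff = calculateFitness(dna: [UInt8]("Gello, World".utf8), optimal: target)
let nearMiss = calculateFitness(dna: [UInt8]("Hello, Wormd".utf8), optimal: target)
let twoGenes = calculateFitness(dna: [UInt8]("Jello, Xorld".utf8), optimal: target)

print(perfect)  // 0
print(oneOff)   // 1  (G is 71, H is 72)
print(nearMiss) // 1  (m is 109, l is 108)
print(twoGenes) // 3  (J is 2 above H, X is 1 above W)
```

Note that the score measures distance in character codes, not the number of wrong characters, so a single badly wrong character can cost more than several near misses.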

This example is very simple, but it'll work for our purposes. In a real-world problem, the optimal solution is unknown or impossible to reach. [Here](https://iccl.inf.tu-dresden.de/w/images/b/b7/GA_for_TSP.pdf) is a paper about optimizing a solution for the famous [traveling salesman problem](https://en.wikipedia.org/wiki/Travelling_salesman_problem) using a GA. In that problem, finding a provably optimal route is intractable for large inputs, but you can rate an individual solution by distance traveled. The optimal fitness here is an impossible 0. The closer the solution is to 0, the better chance for survival. In our example we will reach our goal, a fitness of 0.

The second part to selection is weighted choice, also called roulette wheel selection. This defines how individuals are selected for the reproduction process out of the current population. Just because you are the best choice for natural selection doesn't mean the environment will select you. The individual could fall off a cliff, get dysentery or be unable to reproduce.

Let's take a second and ask why on this one. Why would you not always want to select the most fit from a population? It's hard to see from this simple example, but let's think about dog breeding, because breeders remove this process and hand select dogs for the next generation. As a result you get improved desired characteristics, but the individuals will also continue to carry genetic disorders that come along with those traits. A certain "branch" of evolution may beat out the current fittest solution at a later time. This may be ok depending on the problem, but to keep this educational we will go with the bio-inspired way.

With all that, here is our weighted choice function:

```swift
func weightedChoice(items: [(dna: [UInt8], weight: Double)]) -> (dna: [UInt8], weight: Double) {

    let total = items.reduce(0) { $0 + $1.weight }
    var n = Double.random(in: 0..<total)

    for item in items {
        if n < item.weight {
            return item
        }
        n = n - item.weight
    }
    // only reachable through floating-point rounding; fall back to the last item
    return items[items.count - 1]
}
```

The above function takes a list of individuals with their calculated weights, then selects one at random, biased by weight. `Double.random(in:)` draws a uniform value between 0 and the total weight, and the loop walks the list subtracting each weight until the value lands inside an individual's share of the wheel. It's not perfect (it assumes a non-empty list with positive weights), but it's enough for our example.
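To get a feel for how the wheel behaves, the sketch below spins it 10,000 times over three made-up single-byte individuals and counts the picks. The individuals, their weights and the sample count are assumptions for illustration; `weightedChoice` is repeated in compact form so the snippet runs on its own.

```swift
func weightedChoice(items: [(dna: [UInt8], weight: Double)]) -> (dna: [UInt8], weight: Double) {
    let total = items.reduce(0.0) { $0 + $1.weight }
    var n = Double.random(in: 0..<total)
    for item in items {
        if n < item.weight { return item }
        n -= item.weight
    }
    return items[items.count - 1]
}

// Hypothetical individuals with fitness 1, 2 and 10, weighted by 1/fitness
let wheel: [(dna: [UInt8], weight: Double)] = [
    (dna: [65], weight: 1.0),   // "A", fitness 1
    (dna: [66], weight: 0.5),   // "B", fitness 2
    (dna: [67], weight: 0.1)    // "C", fitness 10
]

var counts: [UInt8: Int] = [:]
for _ in 0..<10_000 {
    let pick = weightedChoice(items: wheel)
    counts[pick.dna[0], default: 0] += 1
}

// Expected shares are weight / total: 1.0/1.6 = 62.5%, 0.5/1.6 = 31.25%, 0.1/1.6 = 6.25%
for (byte, count) in counts.sorted(by: { $0.key < $1.key }) {
    print("\(Character(UnicodeScalar(byte))): \(count)")
}
```

The weakest individual still gets picked now and then, which is exactly the "fall off a cliff" randomness described above working in reverse: unfit individuals occasionally get lucky.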

### Mutation

The all-powerful mutation, the thing that introduces otherwise non-existent fitness variance. It can either hurt or improve an individual's fitness, but over time it will cause evolution towards more fit populations. Imagine if our initial random population was missing the character `H`; in that case we need to rely on mutation to introduce that character into the population in order to achieve the optimal solution.

```swift
func mutate(lexicon: [UInt8], dna: [UInt8], mutationChance: Int) -> [UInt8] {
    var outputDna = dna
    (0..<dna.count).forEach { i in
        // each nucleotide mutates with probability 1/mutationChance
        let rand = Int.random(in: 0..<mutationChance)
        if rand == 0 {
            outputDna[i] = lexicon.randomElement()!
        }
    }

    return outputDna
}
```

Takes a mutation chance and an individual and returns that individual with mutations, if any.

This allows a population to explore all the possibilities of its building blocks and randomly stumble on a better solution. If there is too much mutation, the evolution process will get nowhere. If there is too little, the populations will become too similar and never be able to branch out of a defect to meet their changing environment.
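To see how strongly `mutationChance` controls the amount of change, the sketch below mutates the same 12-nucleotide individual many times at three different settings and averages how many nucleotides actually changed. The lowercase lexicon, the all-`a` individual and the trial count are assumptions for illustration, and `mutate` is restated compactly so the snippet is self-contained.

```swift
func mutate(lexicon: [UInt8], dna: [UInt8], mutationChance: Int) -> [UInt8] {
    var outputDna = dna
    // Each nucleotide is redrawn from the lexicon with probability 1/mutationChance
    for i in dna.indices where Int.random(in: 0..<mutationChance) == 0 {
        outputDna[i] = lexicon.randomElement()!
    }
    return outputDna
}

let lexicon = [UInt8]("abcdefghijklmnopqrstuvwxyz".utf8)
let original = [UInt8]("aaaaaaaaaaaa".utf8) // 12 nucleotides, the same length as "Hello, World"
let trials = 10_000

var averages: [Int: Double] = [:]
for chance in [1, 10, 100] {
    var changed = 0
    for _ in 0..<trials {
        let mutated = mutate(lexicon: lexicon, dna: original, mutationChance: chance)
        changed += zip(original, mutated).filter { $0 != $1 }.count
    }
    averages[chance] = Double(changed) / Double(trials)
    print("1/\(chance): about \(averages[chance]!) changed nucleotides per individual")
}
```

With a chance of `1` nearly every nucleotide is replaced, so offspring keep almost nothing from their parents; with `100` roughly one individual in nine changes at all. The averages come out slightly below `12/mutationChance` because a redraw can land on the character that was already there.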

### Crossover

Crossover, the sexy part of a GA, is how offspring are created from 2 selected individuals in the current population. This is done by splitting the parents into 2 parts, then combining 1 part from each parent to create the offspring. To promote diversity, we randomly select an index at which to split the parents.

```swift
func crossover(dna1: [UInt8], dna2: [UInt8], dnaSize: Int) -> [UInt8] {
    let pos = Int.random(in: 0..<dnaSize)

    let dna1Index1 = dna1.index(dna1.startIndex, offsetBy: pos)
    let dna2Index1 = dna2.index(dna2.startIndex, offsetBy: pos)

    return [UInt8](dna1.prefix(upTo: dna1Index1) + dna2.suffix(from: dna2Index1))
}
```

The above is used to generate a completely new generation based on the current generation.
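Because the split point is random, it can be hard to see what `crossover` produces. The sketch below uses a hypothetical helper, `splice`, that performs the same prefix-plus-suffix join but takes the split point as a parameter, and prints the child for every possible position. The parent strings are made up for illustration.

```swift
import Foundation

// Hypothetical helper: the same join as `crossover`, with the split point passed in
func splice(dna1: [UInt8], dna2: [UInt8], at pos: Int) -> [UInt8] {
    return [UInt8](dna1.prefix(upTo: pos) + dna2.suffix(from: pos))
}

let parent1 = [UInt8]("AAAAAA".utf8)
let parent2 = [UInt8]("bbbbbb".utf8)

for pos in 0..<parent1.count {
    let child = splice(dna1: parent1, dna2: parent2, at: pos)
    print("\(pos): \(String(bytes: child, encoding: .utf8)!)")
}
// 0: bbbbbb
// 1: Abbbbb
// 2: AAbbbb
// 3: AAAbbb
// 4: AAAAbb
// 5: AAAAAb
```

One consequence of drawing `pos` from `0..<dnaSize`: position 0 yields a straight copy of the second parent, while the first parent is never copied whole.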
168+
169+
## Putting it all together -- Running the Genetic Algorithm
170+
171+
We now have all the functions we need to kick off the algorithm. Let's start from the beginning, first we need a random population to serve as a starting point. We will also initialize a fittest variable to hold the fittest individual, we will initialize it with the first individual of our random population.
172+
173+
```swift
174+
var population:[[UInt8]] = randomPopulation(from: lex, populationSize: POP_SIZE, dnaSize: DNA_SIZE)
175+
var fittest = population[0]
176+
```
177+
178+
Now for the meat, the remainder of the code will take place in the generation loop, running once for every generation:
179+
180+
```swift
181+
for generation in 0...GENERATIONS {
182+
// run
183+
}
184+
```

Now, for each individual in the population, we need to calculate its fitness and weighted value. Since 0 is the best value we will use `1/fitness` to represent the weight. Note this is not a percent, but just how much more likely the value is to be selected over others. If the highest number was the most fit, the weight calculation would be `fitness/totalFitness`, which would be a percent.

```swift
var weightedPopulation = [(dna: [UInt8], weight: Double)]()

for individual in population {
    let fitnessValue = calculateFitness(dna: individual, optimal: OPTIMAL)
    // a perfect individual (fitness 0) gets weight 1.0 to avoid dividing by zero
    let pair = (dna: individual, weight: fitnessValue == 0 ? 1.0 : 1.0 / Double(fitnessValue))
    weightedPopulation.append(pair)
}
```

From here we can start to build the next generation.

```swift
var nextGeneration = [[UInt8]]()
```

The below loop is where we pull everything together. We loop `POP_SIZE` times, selecting 2 individuals by weighted choice, crossing over their values to produce an offspring, then finally subjecting the new individual to mutation. Once completed, we have a completely new generation based on the last one, which replaces the current population.

```swift
(0..<POP_SIZE).forEach { _ in
    let ind1 = weightedChoice(items: weightedPopulation)
    let ind2 = weightedChoice(items: weightedPopulation)

    let offspring = crossover(dna1: ind1.dna, dna2: ind2.dna, dnaSize: DNA_SIZE)

    // append to the population and mutate
    nextGeneration.append(mutate(lexicon: lex, dna: offspring, mutationChance: MUTATION_CHANCE))
}

// the new generation becomes the current population
population = nextGeneration
```
217+
218+
The final piece to the main loop is to select the fittest individual of a population:
219+
220+
```swift
221+
fittest = population[0]
222+
var minFitness = calculateFitness(dna: fittest, optimal: OPTIMAL)
223+
224+
population.forEach { indv in
225+
let indvFitness = calculateFitness(dna: indv, optimal: OPTIMAL)
226+
if indvFitness < minFitness {
227+
fittest = indv
228+
minFitness = indvFitness
229+
}
230+
}
231+
if minFitness == 0 { break; }
232+
print("\(generation): \(String(bytes: fittest, encoding: .utf8)!)")
233+
```
234+
235+
Since we know the fittest string, I've added a `break` to kill the program if we find it. At the end of a loop add a print statement for the fittest string:
236+
237+
```swift
238+
print("fittest string: \(String(bytes: fittest, encoding: .utf8)!)")
239+
```
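Since the pieces of the main loop are spread over several snippets, here is one way they might be assembled into a single script. This is a condensed sketch rather than the tutorial's exact code: the helper signatures are shortened (they read the globals directly), and it assumes the new generation replaces `population` at the end of each pass.

```swift
import Foundation

let lex = [UInt8](" !\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~".utf8)
let OPTIMAL = [UInt8]("Hello, World".utf8)
let DNA_SIZE = OPTIMAL.count
let POP_SIZE = 50
let GENERATIONS = 5000
let MUTATION_CHANCE = 100

// Condensed stand-ins for the functions defined earlier (signatures simplified for brevity)
func fitness(_ dna: [UInt8]) -> Int {
    var total = 0
    for (gene, goal) in zip(dna, OPTIMAL) { total += abs(Int(gene) - Int(goal)) }
    return total
}

func weightedChoice(_ items: [(dna: [UInt8], weight: Double)]) -> [UInt8] {
    let total = items.reduce(0.0) { $0 + $1.weight }
    var n = Double.random(in: 0..<total)
    for item in items {
        if n < item.weight { return item.dna }
        n -= item.weight
    }
    return items[items.count - 1].dna
}

func crossover(_ dna1: [UInt8], _ dna2: [UInt8]) -> [UInt8] {
    let pos = Int.random(in: 0..<DNA_SIZE)
    return [UInt8](dna1.prefix(upTo: pos) + dna2.suffix(from: pos))
}

func mutate(_ dna: [UInt8]) -> [UInt8] {
    return dna.map { Int.random(in: 0..<MUTATION_CHANCE) == 0 ? lex.randomElement()! : $0 }
}

var population: [[UInt8]] = (0..<POP_SIZE).map { _ in
    (0..<DNA_SIZE).map { _ in lex.randomElement()! }
}
var fittest = population[0]

for generation in 0..<GENERATIONS {
    // 1. Weigh every individual by 1/fitness
    var weightedPopulation = [(dna: [UInt8], weight: Double)]()
    for individual in population {
        let value = fitness(individual)
        weightedPopulation.append((dna: individual, weight: value == 0 ? 1.0 : 1.0 / Double(value)))
    }

    // 2. Breed a full replacement generation: select, cross over, mutate
    var nextGeneration = [[UInt8]]()
    for _ in 0..<POP_SIZE {
        let offspring = crossover(weightedChoice(weightedPopulation), weightedChoice(weightedPopulation))
        nextGeneration.append(mutate(offspring))
    }
    population = nextGeneration

    // 3. Track the fittest individual and stop early on a perfect match
    fittest = population.min(by: { fitness($0) < fitness($1) })!
    if fitness(fittest) == 0 { break }
    print("\(generation): \(String(bytes: fittest, encoding: .utf8)!)")
}

print("fittest string: \(String(bytes: fittest, encoding: .utf8)!)")
```

The three numbered comments mirror the order of the snippets above, which makes it easier to swap in a different selection, crossover or mutation strategy later.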

Now we can run the program! Playgrounds are a nice place to develop, but they will run this program **very slowly**. I highly suggest running it in Terminal: `swift gen.swift`. When running you should see something like this, and it should not take too long to get `Hello, World`:

```text
0: RXclh F HDko
1: DkyssjgElk];
2: TiM4u) DrKvZ
3: Dkysu) DrKvZ
4: -kysu) DrKvZ
5: Tlwsu) DrKvZ
6: Tlwsu) Drd}k
7: Tlwsu) Drd}k
8: Tlwsu) Drd}k
9: Tlwsu) Drd}k
10: G^csu) |zd}k
11: G^csu) |zdko
12: G^csu) |zdko
13: Dkysu) Drd}k
14: G^wsu) `rd}k
15: Dkysu) `rdko
16: Dkysu) `rdko
17: Glwsu) `rdko
18: TXysu) `rdkc
19: U^wsu) `rdko
20: G^wsu) `rdko
21: Glysu) `rdko
22: G^ysu) `rdko
23: G^ysu) `ryko
24: G^wsu) `rdko
25: G^wsu) `rdko
26: G^wsu) `rdko
...
1408: Hello, Wormd
1409: Hello, Wormd
1410: Hello, Wormd
1411: Hello, Wormd
1412: Hello, Wormd
1413: Hello, Wormd
1414: Hello, Wormd
1415: Hello, Wormd
1416: Hello, Wormd
1417: Hello, Wormd
1418: Hello, Wormd
1419: Hello, Wormd
1420: Hello, Wormd
1421: Hello, Wormd
1422: Hello, Wormd
1423: Hello, Wormd
1424: Hello, Wormd
1425: Hello, Wormd
1426: Hello, Wormd
1427: Hello, Wormd
1428: Hello, Wormd
1429: Hello, Wormd
1430: Hello, Wormd
1431: Hello, Wormd
1432: Hello, Wormd
1433: Hello, Wormd
1434: Hello, Wormd
1435: Hello, Wormd
fittest string: Hello, World
```

How long it takes will vary since this is based on randomization, but it should almost always finish in under 5000 generations. Woo!

## Now What?

We did it, we have a running simple genetic algorithm. Take some time and play around with the global variables `POP_SIZE`, `OPTIMAL`, `MUTATION_CHANCE` and `GENERATIONS`. Just make sure to only add characters that are in the lexicon, or update the lexicon.

For an example let's try something much longer: `Ray Wenderlich's Swift Algorithm Club Rocks`. Plug that string into `OPTIMAL` and change `GENERATIONS` to `10000`. You'll be able to see that we are getting somewhere, but you most likely will not reach the optimal string in 10,000 generations. Since we have a longer string, let's change `MUTATION_CHANCE` to `200` (making each nucleotide half as likely to mutate). You may not get there, but you should get a lot closer than before. With a longer string, too much mutation can make it hard for fit strings to survive. Now try either upping `POP_SIZE` or increasing `GENERATIONS`. Either way you should eventually get the value, but there will be a "sweet spot" for a string of a given size.

Please submit any kind of update to this tutorial or add more examples!
