Commit ca6f0f0

readme updates
1 parent 66f75d2 commit ca6f0f0

1 file changed: +38 -41 lines changed

Genetic/README.markdown

Lines changed: 38 additions & 41 deletions
````diff
@@ -2,35 +2,34 @@ individual# Genetic Algorthim

 ## What is it?

-A genetic algorithm (GA) is process inspired by natural selection to find high quality solutions. Most commonly used for optimization. GAs rely on the bio-inspired processes of natural selection, more specifically the process of selection (fitness), mutation and crossover. To understand more, let's walk through these process in terms of biology:
+A genetic algorithm (GA) is a process, inspired by natural selection, for finding high-quality solutions. GAs are most commonly used for optimization. They rely on the bio-inspired processes of natural selection, more specifically the processes of selection (fitness), crossover and mutation. To understand more, let's walk through these processes in terms of biology:

 ### Selection
 >**Selection**, in biology, the preferential survival and reproduction or preferential elimination of individuals with certain genotypes (genetic compositions), by means of natural or artificial controlling factors. [Britannica](britannica)

-In other words, survival of the fittest. Organism that survive in their environment tend to reproduce more. With GAs we generate a fitness model that will rank offspring and give them a better chance for reproduction.
-
-### Mutation
->**Mutation**, an alteration in the genetic material (the genome) of a cell of a living organism or of a virus that is more or less permanent and that can be transmitted to the cell’s or the virus’s descendants. [Britannica](https://www.britannica.com/science/mutation-genetics)
-
-The randomization that allows for organisms to change over time. In GAs we build a randomization process that will mutate offspring in a populate in order to randomly introduce fitness variance.
+In other words, survival of the fittest. Organisms that survive in their environment tend to reproduce more. With GAs we generate a fitness model that will rank individuals and give the fittest a better chance for reproduction.

 ### Crossover
 >**Chromosomal crossover** (or crossing over) is the exchange of genetic material between homologous chromosomes that results in recombinant chromosomes during sexual reproduction [Wikipedia](https://en.wikipedia.org/wiki/Chromosomal_crossover)

-Simply reproduction. A generation will a mixed representation of the previous generation, with offspring taking data (DNA) from both parents. GAs do this by randomly, but weightily, mating offspring to create new generations.
+Simply reproduction. A generation will be a mixed representation of the previous generation, with offspring taking DNA from both parents. GAs do this by randomly, but with weighted odds, mating individuals to create new generations.
+
+### Mutation
+>**Mutation**, an alteration in the genetic material (the genome) of a cell of a living organism or of a virus that is more or less permanent and that can be transmitted to the cell’s or the virus’s descendants. [Britannica](https://www.britannica.com/science/mutation-genetics)
+
+This is the randomization that allows organisms to change over time. In GAs we build a randomization process that will mutate offspring in a population in order to introduce fitness variance.

 ### Resources:
 * [Genetic Algorithms in Search Optimization, and Machine Learning](https://www.amazon.com/Genetic-Algorithms-Optimization-Machine-Learning/dp/0201157675/ref=sr_1_sc_1?ie=UTF8&qid=1520628364&sr=8-1-spell&keywords=Genetic+Algortithms+in+search)
 * [Wikipedia](https://en.wikipedia.org/wiki/Genetic_algorithm)
 * [My Original Gist](https://gist.github.com/blainerothrock/efda6e12fe10792c99c990f8ff3daeba)

-
 ## The Code

 ### Problem
-For this quick and dirty example, we are going to obtain a optimize string using a simple genetic algorithm. More specifically we are trying to take a randomly generated origin string of a fixed length and evolve it into the most optimized string of our choosing.
+For this quick and dirty example, we are going to produce an optimized string using a simple genetic algorithm. More specifically, we are trying to take a randomly generated origin string of a fixed length and evolve it into the optimal string of our choosing.

-We will be creating a bio-inspired world where the absolute existence is string `Hello, World!`. Nothing in this universe is better and it's our goal to get as close to it as possible.
+We will be creating a bio-inspired world where the absolute existence is the string `Hello, World`. Nothing in this universe is better, and it's our goal to get as close to it as possible to ensure survival.

 ### Define the Universe

````
````diff
@@ -40,36 +39,32 @@ Before we dive into the core processes we need to set up our "universe". First l
 let lex: [UInt8] = " !\"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~".asciiArray
 ```

-To make things easier, we are actually going to work in ASCII values, so let's define a String extension to help with that.
+To make things easier, we are actually going to work in [Unicode values](https://en.wikipedia.org/wiki/List_of_Unicode_characters) (stored as UTF-8 bytes), so let's define a String extension to help with that.

 ```swift
 extension String {
-    var asciiArray: [UInt8] {
+    var unicodeArray: [UInt8] {
         return [UInt8](self.utf8)
     }
 }
 ```

 Now, let's define a few global variables for the universe:
+* `OPTIMAL`: This is the end goal and what we will be using to rate fitness. In a real-world problem, this will not exist.
+* `DNA_SIZE`: The length of each string in our population. All organisms need to be the same size.
+* `POP_SIZE`: The size of each generation.
+* `MAX_GENERATIONS`: The maximum number of generations. The script will stop after 5000 generations if the optimal value is not found.
+* `MUTATION_CHANCE`: The chance that any given nucleotide will mutate (`1/MUTATION_CHANCE`).

 ```swift
-// This is the end goal and what we will be using to rate fitness. In the real world this will not exist
-let OPTIMAL:[UInt8] = "Hello, World".asciiArray
-
-// The length of the string in our population. Organisms need to be similar
+let OPTIMAL:[UInt8] = "Hello, World".unicodeArray
 let DNA_SIZE = OPTIMAL.count
-
-// size of each generation
 let POP_SIZE = 50
-
-// max number of generations, script will stop when it reach 5000 if the optimal value is not found
-let GENERATIONS = 5000
-
-// The chance in which a random nucleotide can mutate (1/n)
+let MAX_GENERATIONS = 5000
 let MUTATION_CHANCE = 100
 ```

-The last piece we need for set up is a function to give us a random ASCII value from our lexicon:
+The last piece we need for setup is a function to give us a random Unicode value from our lexicon:

 ```swift
 func randomChar(from lexicon: [UInt8]) -> UInt8 {
````
````diff
@@ -78,10 +73,12 @@ let MUTATION_CHANCE = 100
     return lexicon[rand]
 }
 ```
+
+**Note**: This example uses only `arc4random_uniform` for randomness. It would be fun to play around with some of the [randomization in GameplayKit](https://developer.apple.com/library/content/documentation/General/Conceptual/GameplayKit_Guide/RandomSources.html).

 ### Population Zero

-Before selecting, mutating and reproduction, we need population to start with. Now that we have the universe defined we can write that function:
+Before selection, crossover and mutation, we need a population to start with. Now that we have the universe defined, we can write that function:

 ```swift
 func randomPopulation(from lexicon: [UInt8], populationSize: Int, dnaSize: Int) -> [[UInt8]] {
````
````diff
@@ -90,9 +87,9 @@ let MUTATION_CHANCE = 100

     var pop = [[UInt8]]()

-    for _ in 0..<populationSize {
+    (0..<populationSize).forEach { _ in
         var dna = [UInt8]()
-        for _ in 0..<dnaSize {
+        (0..<dnaSize).forEach { _ in
             let char = randomChar(from: lexicon)
             dna.append(char)
         }
````
````diff
@@ -104,27 +101,27 @@ let MUTATION_CHANCE = 100

 ### Selection

-There are two parts to the selection process, the first is calculating the fitness, which will assign a rating to a individual. We do this by simply calculating how close the individual is to the optimal string using ASCII values:
+There are two parts to the selection process. The first is calculating the fitness, which assigns a rating to an individual. We do this by simply calculating how close the individual is to the optimal string using Unicode values:

 ```swift
 func calculateFitness(dna:[UInt8], optimal:[UInt8]) -> Int {
     var fitness = 0
-    for c in 0...dna.count-1 {
+    (0...dna.count-1).forEach { c in
         fitness += abs(Int(dna[c]) - Int(optimal[c]))
     }
     return fitness
 }
 ```

-The above will produce a fitness value to an individual. The perfect solution, "Hello, World" will have a fitness of 0. "Gello, World" will have a fitness of 1 since it is one ASCII value off from the optimal.
+The above will assign a fitness value to an individual. The perfect solution, "Hello, World", will have a fitness of 0. "Gello, World" will have a fitness of 1 since it is one Unicode value off from the optimal (`H->G`).

-This example is very, but it'll work for our example. In a real world problem, the optimal solution is unknown or impossible. [Here](https://iccl.inf.tu-dresden.de/w/images/b/b7/GA_for_TSP.pdf) is a paper about optimizing a solution for the famous [traveling salesman problem](https://en.wikipedia.org/wiki/Travelling_salesman_problem) using GA. In this example the problem is unsolvable by modern computers, but you can rate a individual solution by distance traveled. The optimal fitness here is an impossible 0. The closer the solution is to 0, the better chance for survival.
+This example is very simple, but it works for our purposes. In a real-world problem, the optimal solution is unknown or unattainable. [Here](https://iccl.inf.tu-dresden.de/w/images/b/b7/GA_for_TSP.pdf) is a paper about optimizing a solution for the famous [traveling salesman problem](https://en.wikipedia.org/wiki/Travelling_salesman_problem) using a GA. For large instances that problem is impractical to solve exactly, even on modern computers, but you can rate an individual solution by the distance traveled. The optimal fitness here is an impossible 0. The closer the solution is to 0, the better its chance of survival. In our example, we will reach our goal, a fitness of 0.

-The second part to selection is weighted choice, also called roulette wheel selection. This defines how individuals are selected for the reproduction process out of the current population. Just because you are the best choice for natural selection doesn't mean the environment will select you. The individual could fall off a cliff, get dysentery or not be able to reproduce.
+The second part of selection is weighted choice, also called roulette wheel selection. This defines how individuals are selected for the reproduction process out of the current population. Just because you are the best choice for natural selection doesn't mean the environment will select you. The individual could fall off a cliff, get dysentery or be unable to reproduce.

-Let's take a second and ask why on this one. Why would you not always want to select the most fit from a population? It's hard to see from this simple example, but let's think about dog breeding, because breeders remove this process and hand select dogs for the next generation. As a result you get improved desired characteristics, but the individuals will also continue to carry genetic disorders that come along with those traits. This is essentially leading the evolution down a linear path. A certain "branch" of evolution may beat out the current fittest solution at a later time.
+Let's take a second and ask why on this one. Why would you not always want to select the most fit from a population? It's hard to see from this simple example, but think about dog breeding: breeders remove this process and hand-select dogs for the next generation. As a result, you get the desired characteristics, but the individuals also continue to carry the genetic disorders that come along with those traits. Always selecting the current fittest also discards other "branches" of evolution, and one of those branches may beat out the current fittest solution at a later time. This may be OK depending on the problem, but to keep this educational we will go with the bio-inspired way.

-ok, back to code. Here is our weighted choice function:
+With all that, here is our weighted choice function:

 ```swift
 func weightedChoice(items:[(item:[UInt8], weight:Double)]) -> (item:[UInt8], weight:Double) {
````
````diff
@@ -147,13 +144,13 @@ The above function takes a list of individuals with their calculated fitness. Th

 ## Mutation

-The all powerful mutation. The great randomization that turns bacteria into humans, just add time. So powerful yet so simple:
+The all-powerful mutation: the thing that introduces otherwise nonexistent fitness variance. It can either hurt or improve an individual's fitness, but over time, combined with selection, it drives evolution towards fitter populations. Imagine if our initial random population were missing the character `H`. In that case, we would need to rely on mutation to introduce that character into the population in order to achieve the optimal solution.

 ```swift
 func mutate(lexicon: [UInt8], dna:[UInt8], mutationChance:Int) -> [UInt8] {
     var outputDna = dna

-    for i in 0..<dna.count {
+    (0..<dna.count).forEach { i in
         let rand = Int(arc4random_uniform(UInt32(mutationChance)))
         if rand == 1 {
             outputDna[i] = randomChar(from: lexicon)
````
````diff
@@ -202,7 +199,7 @@ for generation in 0...GENERATIONS {
 }
 ```

-Now, for each individual in the population, we need to calculate its fitness and weighted value. Since 0 is the best value we will use `1/fitness` to represent the weighted value. Note this is not a percent, but just how much more likely the value is to be selected over others. If the highest number was the most fit, the weight calculation would be `fitness/totalFitness`, which would be a percent.
+Now, for each individual in the population, we need to calculate its fitness and weight. Since 0 is the best value, we will use `1/fitness` to represent the weight. Note that this is not a percentage, just a measure of how much more likely an individual is to be selected than the others. If the highest number were the most fit, the weight calculation would be `fitness/totalFitness`, which would be a percentage.

 ```swift
 var weightedPopulation = [(item:[UInt8], weight:Double)]()
````
````diff
@@ -251,7 +248,7 @@ if minFitness == 0 { break; }
 print("\(generation): \(String(bytes: fittest, encoding: .utf8)!)")
 ```

-Since we know the fittest string, I've added a `break` to kill the program if we find it. At the end of a loop at a print statement for the fittest string:
+Since we know the fittest string, I've added a `break` to exit the loop if we find it. After the loop, add a print statement for the fittest string:

 ```swift
 print("fittest string: \(String(bytes: fittest, encoding: .utf8)!)")
````
````diff
@@ -324,8 +321,8 @@ How long it takes will vary since this is based on randomization, but it should

 ## Now What?

-We did it, we have a running simple genetic algorithm. Take some time a play around with the global variables, `POP_SIZE`, `OPTIMAL`, `MUTATION_CHANCE`, `GENERATIONS`. Just make sure to only add characters that are in the lexicon, but go ahead and update too!
+We did it: we have a simple genetic algorithm running. Take some time to play around with the global variables `POP_SIZE`, `OPTIMAL`, `MUTATION_CHANCE` and `MAX_GENERATIONS`. Just make sure to only add characters that are in the lexicon, or update the lexicon.

-For an example let's try something much longer: `Ray Wenderlich's Swift Algorithm Club Rocks`. Plug that string into `OPTIMAL` and change `GENERATIONS` to `10000`. You'll be able to see that the we are getting somewhere, but you most likely will not reach the optimal string in 10,000 generations. Since we have a larger string let's raise our mutation chance to `200` (1/2 as likely to mutate). You may not get there, but you should get a lot closer than before. With a longer string, too much mutate can make it hard for fit strings to survive. Now try either upping `POP_SIZE` or increase `GENERATIONS`. Either way you should eventually get the value, but there will be a "sweet spot" for an individual of a certain size.
+For example, let's try something much longer: `Ray Wenderlich's Swift Algorithm Club Rocks`. Plug that string into `OPTIMAL` and change `MAX_GENERATIONS` to `10000`. You'll be able to see that we are getting somewhere, but you most likely will not reach the optimal string in 10,000 generations. Since we have a longer string, let's raise `MUTATION_CHANCE` to `200` (each nucleotide is half as likely to mutate). You may not get there, but you should get a lot closer than before. With a longer string, too much mutation can make it hard for fit strings to survive. Now try either raising `POP_SIZE` or increasing `MAX_GENERATIONS`. Either way, you should eventually reach the optimal string, but there will be a "sweet spot" for a string of a certain size.

 Please submit any kind of update to this tutorial or add more examples!
````
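The commit switches the wording from ASCII to Unicode values, while `unicodeArray` still returns the string's UTF-8 bytes (`self.utf8`). The two agree for every character in the ASCII-only `lex` lexicon but diverge outside ASCII. The short sketch below (variable names are illustrative, not from the README) compares `utf8` with `unicodeScalars`. Note also that the `let lex` context line in the second hunk still appears to call `.asciiArray`, so after this rename that call site would need `.unicodeArray` as well.

```swift
// `utf8` yields UTF-8 code units (bytes); `unicodeScalars` yields code points.
// They match for ASCII text, which is all the README's lexicon contains.
let ascii = "Hi!"
let asciiBytes = [UInt8](ascii.utf8)                      // [72, 105, 33]
let asciiScalars = ascii.unicodeScalars.map { $0.value }  // [72, 105, 33]

// Outside ASCII they differ: U+00E9 ("é") is one code point but two UTF-8 bytes.
let accented = "\u{E9}"
let accentedBytes = [UInt8](accented.utf8)                      // [195, 169]
let accentedScalars = accented.unicodeScalars.map { $0.value }  // [233]

print(asciiBytes, asciiScalars, accentedBytes, accentedScalars)
```

This is why the byte arithmetic in `calculateFitness` works for the lexicon as written; putting non-ASCII characters into `OPTIMAL` would make `DNA_SIZE` count bytes rather than characters.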
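The note in the third hunk suggests experimenting with other sources of randomness. GameplayKit is Apple-only and `arc4random_uniform` comes from the C library on Apple platforms, so one portable option is the Swift standard library's `RandomNumberGenerator` protocol (Swift 4.2 and later). This is a sketch under assumptions, not the README's code: `SplitMix64` is a small, widely used seedable generator written out by hand, and `randomChar(from:using:)` is a hypothetical generic variant of the README's `randomChar(from:)`. Seeding makes runs reproducible, which helps when comparing settings such as `MUTATION_CHANCE`.

```swift
// SplitMix64: a tiny seedable PRNG (not cryptographically secure).
// Same seed -> same sequence, so GA runs can be replayed exactly.
struct SplitMix64: RandomNumberGenerator {
    private var state: UInt64

    init(seed: UInt64) {
        state = seed
    }

    mutating func next() -> UInt64 {
        state &+= 0x9E37_79B9_7F4A_7C15
        var z = state
        z = (z ^ (z >> 30)) &* 0xBF58_476D_1CE4_E5B9
        z = (z ^ (z >> 27)) &* 0x94D0_49BB_1331_11EB
        return z ^ (z >> 31)
    }
}

// Hypothetical variant of `randomChar(from:)` that draws from any generator
// instead of calling arc4random_uniform directly.
func randomChar<G: RandomNumberGenerator>(from lexicon: [UInt8], using generator: inout G) -> UInt8 {
    // randomElement(using:) only returns nil for an empty collection.
    guard let char = lexicon.randomElement(using: &generator) else {
        preconditionFailure("lexicon must not be empty")
    }
    return char
}

let lexicon = [UInt8]("abcdefghijklmnopqrstuvwxyz".utf8)
var rng = SplitMix64(seed: 2018)
let sample = (0..<5).map { _ in randomChar(from: lexicon, using: &rng) }
print(String(decoding: sample, as: UTF8.self))
```

Passing `SystemRandomNumberGenerator()` instead of `SplitMix64` gives the same nondeterministic behavior as the original helper.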
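The commit rewrites the nested `for` loops in `randomPopulation` as `forEach`. Because each loop only builds an array, `map` can express the same thing without the mutable `pop` and `dna` accumulators. This is a sketch of an alternative, not the README's code; it uses the standard library's `randomElement()` in place of the README's `randomChar(from:)` so that it compiles on its own.

```swift
// Builds `populationSize` individuals, each made of `dnaSize` characters
// drawn uniformly at random from `lexicon`.
func randomPopulation(from lexicon: [UInt8], populationSize: Int, dnaSize: Int) -> [[UInt8]] {
    precondition(!lexicon.isEmpty, "lexicon must not be empty")
    return (0..<populationSize).map { _ in
        (0..<dnaSize).map { _ in lexicon.randomElement()! }
    }
}

let lexicon = [UInt8]("ab".utf8)
let population = randomPopulation(from: lexicon, populationSize: 50, dnaSize: 12)
```

The behavior matches the loop version: same number of individuals, same DNA length, same uniform draw per character.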
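The fitness function can be checked against the worked example in the Selection text: "Hello, World" scores 0 and "Gello, World" scores 1. The sketch below is a variant of `calculateFitness(dna:optimal:)` that walks both arrays with `zip`. Unlike `0...dna.count-1`, which traps at runtime for an empty `dna` because the range's upper bound would fall below its lower bound, it simply returns 0 for empty input. The `unicodeArray` extension is repeated so the snippet compiles on its own.

```swift
extension String {
    // UTF-8 bytes of the string, as in the README's `unicodeArray`.
    var unicodeArray: [UInt8] {
        return [UInt8](self.utf8)
    }
}

// Sum of absolute differences between byte values at each position.
// 0 means a perfect match; larger numbers mean a less fit individual.
func calculateFitness(dna: [UInt8], optimal: [UInt8]) -> Int {
    precondition(dna.count == optimal.count, "dna and optimal must have the same length")
    return zip(dna, optimal).reduce(0) { total, pair in
        total + abs(Int(pair.0) - Int(pair.1))
    }
}

let optimal = "Hello, World".unicodeArray
let perfect = calculateFitness(dna: optimal, optimal: optimal)                     // 0
let oneOff = calculateFitness(dna: "Gello, World".unicodeArray, optimal: optimal)  // 1 ("H" is 72, "G" is 71)
```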
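The body of `weightedChoice(items:)` is not part of this diff. As a hedged illustration of the roulette wheel selection described in the Selection section, here is one common way to implement fitness-proportionate selection with the same signature: pick a random point in `[0, totalWeight)` and walk the cumulative weights until that point is covered. The README's actual implementation may differ.

```swift
// Roulette wheel selection: returns each item with probability
// weight / (sum of all weights). Weights must be non-negative, not all zero.
func weightedChoice(items: [(item: [UInt8], weight: Double)]) -> (item: [UInt8], weight: Double) {
    precondition(items.allSatisfy { $0.weight >= 0 }, "weights must be non-negative")
    let total = items.reduce(0.0) { $0 + $1.weight }
    precondition(total > 0, "at least one weight must be positive")

    // Spin the wheel: choose a point on [0, total) and find the slice containing it.
    var remaining = Double.random(in: 0..<total)
    for entry in items {
        if remaining < entry.weight {
            return entry
        }
        remaining -= entry.weight
    }
    // Floating-point rounding can leave a tiny remainder; fall back to the
    // last item that owns a non-empty slice of the wheel.
    return items.last(where: { $0.weight > 0 })!
}
```

Because selection is proportional rather than "always take the best", a weaker individual is still picked now and then, which is the point the dog breeding paragraph makes.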
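In the `mutate` function shown in the Mutation hunk, `arc4random_uniform(UInt32(mutationChance))` returns a value in `0..<mutationChance`, so `rand == 1` fires with probability `1/mutationChance` as the globals list says, except when `mutationChance` is 1: the only possible value is then 0 and nothing ever mutates. The sketch below is a hypothetical portable variant using `Int.random(in:)` and `randomElement()`; comparing against 0 keeps the `1/mutationChance` rate for every positive value, including 1.

```swift
// Returns a copy of `dna` in which each character is independently replaced,
// with probability 1/mutationChance, by a random character from `lexicon`.
// The replacement can, by chance, equal the original character.
func mutate(lexicon: [UInt8], dna: [UInt8], mutationChance: Int) -> [UInt8] {
    precondition(mutationChance > 0, "mutationChance must be positive")
    precondition(!lexicon.isEmpty, "lexicon must not be empty")
    return dna.map { gene in
        Int.random(in: 0..<mutationChance) == 0 ? lexicon.randomElement()! : gene
    }
}

let lexicon = [UInt8]("z".utf8)
let dna = [UInt8]("Hello".utf8)
let alwaysMutated = mutate(lexicon: lexicon, dna: dna, mutationChance: 1)  // "zzzzz"
```

With the README's defaults (`DNA_SIZE` of 12 for "Hello, World" and `MUTATION_CHANCE` of 100), each call triggers about 12 / 100 = 0.12 mutations per individual on average.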
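Crossover is described in the biology section, but its code does not appear in this diff. One common way to implement it for fixed-length strings is single-point crossover: the child takes the first parent's DNA up to a cut point and the second parent's DNA after it. The functions below are a hypothetical sketch of that technique, not the README's implementation; the explicit `at:` overload exists so the behavior can be checked deterministically.

```swift
// Single-point crossover: the child gets parentA's genes before `point`
// and parentB's genes from `point` on. Both parents must be the same length.
func crossover(_ parentA: [UInt8], _ parentB: [UInt8], at point: Int) -> [UInt8] {
    precondition(parentA.count == parentB.count, "parents must have the same DNA size")
    precondition((0...parentA.count).contains(point), "crossover point out of range")
    return Array(parentA[..<point]) + Array(parentB[point...])
}

// Picks the cut point uniformly at random. A point of 0 or count
// yields a copy of one parent.
func crossover(_ parentA: [UInt8], _ parentB: [UInt8]) -> [UInt8] {
    return crossover(parentA, parentB, at: Int.random(in: 0...parentA.count))
}

let parentA = [UInt8]("AAAA".utf8)
let parentB = [UInt8]("BBBB".utf8)
let child = crossover(parentA, parentB, at: 1)  // "ABBB"
```

When both parents come from weighted selection, the child mixes DNA from two relatively fit individuals, which is the "mixed representation of the previous generation" the Crossover section describes.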
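The weighting paragraph uses `1/fitness`, but the perfect individual has fitness 0, and in Swift `1.0 / 0.0` evaluates to infinity, which would swamp a roulette wheel. The sketch below swaps in `1/(fitness + 1)` (an assumption, not the README's formula), which still maps lower fitness to a larger weight and stays finite. `makeWeightedPopulation(_:optimal:)` and `selectionProbabilities(_:)` are hypothetical helpers; the second divides by the total to show that the raw weights are relative, not percentages, as the text notes.

```swift
// Pairs each individual with a selection weight. Lower fitness (closer to the
// optimal string) gives a larger weight. Uses 1/(fitness + 1) instead of
// 1/fitness so a perfect individual (fitness 0) gets weight 1.0, not infinity.
func makeWeightedPopulation(_ population: [[UInt8]], optimal: [UInt8]) -> [(item: [UInt8], weight: Double)] {
    return population.map { dna -> (item: [UInt8], weight: Double) in
        let fitness = zip(dna, optimal).reduce(0) { total, pair in
            total + abs(Int(pair.0) - Int(pair.1))
        }
        return (item: dna, weight: 1.0 / Double(fitness + 1))
    }
}

// Normalizes the relative weights into probabilities that sum to 1.
func selectionProbabilities(_ weighted: [(item: [UInt8], weight: Double)]) -> [Double] {
    let total = weighted.reduce(0.0) { $0 + $1.weight }
    return weighted.map { $0.weight / total }
}

let optimal = [UInt8]("ab".utf8)
let population = [[UInt8]("ab".utf8), [UInt8]("ac".utf8), [UInt8]("ad".utf8)]  // fitness 0, 1, 2
let weighted = makeWeightedPopulation(population, optimal: optimal)  // weights 1, 1/2, 1/3
let probabilities = selectionProbabilities(weighted)                 // 6/11, 3/11, 2/11
```

Any finite weight that decreases as fitness grows would serve here; `+ 1` is simply the smallest change to `1/fitness` that avoids dividing by zero.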
