Genetic algorithms are a form of evolutionary computation, a branch of artificial intelligence that focuses on evolving effective or optimal solutions to difficult problems, based on the biological theory of evolution.
Genetic algorithms are, at their core, a search/optimisation technique. They are a way of finding maximum/minimum solutions to problems and, can be effective when there is no algorithmic solution to the problem. An example here would be the ‘Traveling Salesman’ problem.
Genetic algorithms work by taking an initial population of potential solutions (referred to as individuals), selecting a subset of the population that has the highest fitness then using that subset to generate a second generation. From the second generation again a subset with the highest fitness is selected and used to generate a third generation. This repeats until either the ‘fittest’ individual is considered a good enough solution, or until a certain number of generations have passed.
There are advantage to using genetic algorithms to solve problems over more traditional methods like hill climbing.
A genetic algorithm will almost always find an optimal solution, given enough time. The main downside is that they may take a lot of time to find that optimal solution.
There are two main critical parts in setting up a genetic algorithm for a problem.
Most of the design work when using genetic algorithms goes into those two problems.
Encoding is the process of taking all the values that make up a potential solution and turning them into a form that the genetic algorithm can operate on.
The selection of an encoding is of utmost importance to the effectiveness of the entire process and a poor representation can make the entire problem much harder than it should. Unfortunately there has been little academic work done on the process of designing representations.
Often for genetic algorithms, the end result of the encoding will be a binary string. There are other variations of evolutionary computation that use other representations, from the arrays of real numbers used by evolutionary strategies to the code trees used by genetic programming.
Depending on the problem, the fitness function can be trivial to write or near-impossible. The design of the fitness function is completely based on the problem that is being solved.
There are two important considerations for a fitness function.
If the fitness of an individual is assessed twice, it must come to the same value1. If the fitness function could return different values for the same individual, then it is of no use in determining the fittest individuals in the population and hence the genetic algorithm will not be able to identify the best solution to the problem.
The fitness of each individual is assessed at least once in each generation. The calculation of the fitness function is usually the most time consuming part of the entire process and the longer the fitness function takes to run, the longer the entire process is run
(1) There are cases where the fitness of an individual may depend on external factors which change over time. Hence a fitness function may give different values for one individual if calculated at different times. Genetic algorithms in a changing environment are a little beyond the scope of this entry.
In order to create a new generation, the fittest individuals from the previous generation are taken and used to generate the next generation. There are two main operators that are used to generate a generation from the previous one. Crossover and mutation.
Crossover involved taking two individuals, splitting each one’s encoded string and swapping parts to generate two new individuals.
Say we had two individuals with the following encoded strings (spaces added for clarity)0000 0001 1111 11100101 1010 1100 0011and we chose the splitting point for the crossover after the 4th bit, the resulting strings after the crossover will be0000 1010 1100 00110101 0001 1111 1110
In genetic algorithms crossover is the primary operator used. What I described here was a single crossover. There are a number of other variations that can be used.
Mutation is an operator applied to a single individual. It’s usually applied after crossover has generated new individuals. Mutation involves flipping a single bit somewhere in the encoded string.
Let’s take the two individuals that were generates by the crossover earlier and apply a random mutation to each0000 1010 1100 00110101 0001 1111 1110After0000 1010 1101 00110101 0001 1011 1110
In genetic algorithms mutation is very seldom applied and only a small percentage of individuals in a generation will be affected by the mutation operator.
As a quick example let’s manually evolve a simple function to see how the whole thing works.
Let’s say I have an array of 4 numbers (call it num) between 0 and 15. I want to know what values give me the best value for the following.
I know, that’s simple enough that we could work out the optimal solution just by eye. Not the point. This is enough to do a quick and effective demo with.
I’m going to encode that by simply converting the numbers in the array to binary and concatenating the binary representations of the 4 numbers (spaces just added for clarity). The fitness function is already defined. I’m going to start with an initial population of eight individuals.
1111 1111 1100 1110 – fitness = (15-15-12+14) = 20101 1010 1100 0011 – fitness = (9-10-12+3) = –101011 0111 0011 1111 – fitness = (11-7-3+15) = 161111 1001 1010 0011 – fitness = (15-9-10+3) = -11010 1010 1010 1010 – fitness = (10-10-10+10) = 01000 0010 0111 0110 – fitness = (8-2-7+6) = 50000 0001 1111 1110 – fitness = (0-1-15+14) = -21010 0101 0010 0101 – fitness = (10-5-2+5) = 8
From this I’m going to take the 4 individuals with the highest fitness, use crossover operations (with the crossover point exactly in the middle) between them until I have 8 individuals for the 2nd generation and then apply a single bit mutation to one of the individuals (detailed steps left as an exercise for the reader)
1111 1111 0011 1111 – fitness = (15-15-3+15) = 121011 0111 1100 1110 – fitness = (11-7-12+14) = 61111 1011 0010 0101 – fitness = (15-11-2+5) = 71010 0101 1100 1110 – fitness = (10-5-7+6) = 41011 0111 0111 0110 – fitness = (11-7-7+6) = 31000 0010 0011 1111 – fitness = (8-2-3+15) = 181010 0101 0111 0110 – fitness = (10-5-7+6) = 41000 0010 0010 0101 – fitness = (8-2-2+5) = 9
We can already see an improvement. The average and maximum fitness is much higher than for the first generation. I’ll do one more generation in this example, again taking the 4 fittest individuals, crossing over to generate 8 new individuals and then applying a single bit mutation to two individuals. This time however, the crossover point will between the 4th and 5th bit.
1000 1111 0011 1111 – fitness = (8-15-3+15) = 51111 0010 0011 1111 – fitness = (15-2-3+15) = 251000 1011 0010 0101 – fitness = (8-11-2+5) = 01111 0010 0011 1111 – fitness = (15-2-3+15) = 251111 0010 1010 0101 – fitness = (15-2-10+5) = 81000 0111 0011 1111 – fitness = (8-7-3+15) = 131111 0010 0010 0101 – fitness = (15-2-2+5) = 161000 1011 0010 0101 – fitness = (8-11-2+5) = 0
I think that’s enough for this example. We’re getting fairly close to the best possible solution (15,0,0,15), close enough to see how this works. The population size was very low, that’s why there are duplicates appearing in the results of the crossover. With a larger search space there would be a lot more diversity.
I hope that anyone still reading found this brief diversion into the realms of AI interesting.
The opinions expressed herein are my own personal opinions and do not represent
my employer's view in any way.