Genetic Algorithm and Genetic Programming

Artificial intelligence (AI) is a very broad subject within computer science. My own specialisation centres on evolutionary computation. In particular, I use genetic algorithms and genetic programming to optimise trading rule entries and exits.

Recently, I optimised a trading rule that I had been developing in a spreadsheet. Whenever I design a trading rule I always visualise the data in a spreadsheet using charts. The next thing I do is create a hypothesis and test whether it generates a trading edge. The hypothesis can then be tweaked until an edge is found, or the hypothesis (and its variants) are refuted and I start again.

I saw that my hypothesis could make a prediction about the future state of the market but I wasn't sure how to go about profiting from the strategy. I knew which data triggered an entry but I was unsure when to exit the trade. Exiting too soon might give up some profit, while staying in the trade too long might mean jumping on the profit-taking bandwagon with all the manual traders when it was too late.

A genetic algorithm (GA) was used to optimise a suitable stop loss and profit taking exit strategy. In a GA, possible numerical solutions are encoded as numbers called chromosomes. These chromosomes are grouped together into inhabitants, and a population of random inhabitants is created. Each inhabitant is evaluated against historical data to generate a fitness value. Once fitness has been evaluated, the population is paired off for reproduction, with fitter inhabitants more likely to be selected.

During reproduction, chromosomes are shared between the parents to create children that carry their parents' traits but with small differences, as in nature. In a GA the parent population is replaced by the child population and the new population is evaluated for fitness. This process continues until an inhabitant of the desired fitness is found.
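To make that loop concrete, here is a minimal sketch in Python. It is only illustrative: the toy fitness function, the mutation step and the population size are placeholder choices, and a real implementation would evaluate each inhabitant against historical trading data instead.

import random

# Illustrative GA loop: each inhabitant is a list of numeric chromosomes.
# The fitness function is a toy placeholder so the sketch runs on its own.
def fitness(inhabitant):
    target = [12, 20, 40]                          # arbitrary target, for illustration only
    return -sum(abs(gene - t) for gene, t in zip(inhabitant, target))

def random_inhabitant():
    return [random.randint(1, 100) for _ in range(3)]

def select(population, scores):
    # Tournament selection: the fitter of two randomly chosen inhabitants wins.
    a, b = random.sample(range(len(population)), 2)
    return population[a] if scores[a] > scores[b] else population[b]

def crossover(parent1, parent2):
    point = random.randint(1, len(parent1) - 1)    # single-point crossover
    return parent1[:point] + parent2[point:], parent2[:point] + parent1[point:]

def mutate(inhabitant, rate=0.05):
    return [gene + random.randint(-3, 3) if random.random() < rate else gene
            for gene in inhabitant]

population = [random_inhabitant() for _ in range(1000)]
for generation in range(50):
    scores = [fitness(i) for i in population]
    children = []
    while len(children) < len(population):
        child1, child2 = crossover(select(population, scores), select(population, scores))
        children.extend([mutate(child1), mutate(child2)])
    population = children                          # the children replace the parents

best = max(population, key=fitness)
print("best inhabitant:", best, "fitness:", fitness(best))

Tournament selection is used here for brevity; any scheme that favours fitter inhabitants will do.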

Example

I might encode an inhabitant with three chromosomes representing an entry, a stop loss and a take profit. The entry tells the rule when to back or lay, the stop loss tells the rule when to get out of a losing trade and the take profit tells the rule when to close the trade for a profit. An initial population (typically thousands of inhabitants) is created, with each chromosome given a random value as a possible solution to the problem.

An example inhabitant would be

Inhabitant = 12, 20, 40 (enter when price rises 12%, stop at 20% loss, take at 40% profit)

Each inhabitant is evaluated against a set of data and given a fitness as a function of the return on the trades executed. Inhabitants are then selected in pairs as parents for reproduction; the fitter an inhabitant is, the more likely it is to be chosen.

Parent1 = 12, 20, 40
Parent2 = 12, 25, 35

Parents then undergo "crossover", whereby children are created by sharing chromosomes between the parents. A random point is chosen at which the crossover between the parents will take place. In this example we shall say that position three is chosen.

Child1 = 12, 20, 35
Child2 = 12, 25, 40

As you can see, two children have been created that are different from their parents but are built from their parents' traits. These children may or may not be superior to their parents at generating a return from their trades. Such is the case in nature: evolution is directed by the ability to survive in an environment, but in a random way.
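For completeness, here is a small Python sketch of selection and the single-point crossover above. The fitness values attached to the example inhabitants are invented purely to show the mechanism; roulette-wheel selection is one common way of making fitter inhabitants more likely to be chosen.

import random

# Roulette-wheel selection: the probability of being picked is proportional
# to fitness. The fitness values here are invented for illustration.
population = [[12, 20, 40], [12, 25, 35], [15, 10, 60], [8, 30, 20]]
fitnesses = [0.10, 0.08, 0.02, 0.01]

def roulette_select(population, fitnesses):
    return random.choices(population, weights=fitnesses, k=1)[0]

def single_point_crossover(parent1, parent2, point):
    # Swap everything from the crossover point onwards.
    return parent1[:point] + parent2[point:], parent2[:point] + parent1[point:]

parent1, parent2 = [12, 20, 40], [12, 25, 35]
# Crossing over at the third chromosome (point=2 in zero-based slicing).
child1, child2 = single_point_crossover(parent1, parent2, point=2)
print(child1, child2)   # [12, 20, 35] and [12, 25, 40], as in the example above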

The old population is replaced by the children and tested as before. As many generations are evolved as the operator desires. However, care must be taken not to over-fit inhabitants to the data, otherwise they will have no predictive power. A set of data is left out of the training data and used to test a chosen solution for generality. If, say, the best candidate solution scores a 10% yield in training but only 1% on the test data, then the candidate is probably over-fitted to the training data. It would therefore be wise to run the training phase again but for fewer generations, to get a lower yielding solution that might stand up to evaluation by the test data.
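The generality check can be sketched very simply. Here, backtest is a stand-in for whatever evaluation function is used; the 0.5 threshold is arbitrary, and the point is only that a large drop from training yield to test yield is the warning sign.

def looks_over_fitted(inhabitant, backtest, training_races, test_races):
    # backtest(inhabitant, races) returns the percentage yield over those races;
    # it is a placeholder for the user's own evaluation function.
    train_yield = backtest(inhabitant, training_races)
    test_yield = backtest(inhabitant, test_races)
    # e.g. 10% in training but only 1% on held-out data points to over-fitting.
    return test_yield < 0.5 * train_yield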

In my implementation, each inhabitant had two chromosomes, one representing a percentage stop loss value for closing a losing trade and the other a percentage take profit value for taking a profit. The entry was known, so it did not have to be optimised. The fitness function was the return on a £2 bet after trading each of the races in my training data. The fittest inhabitant would be the one with the most return after being evaluated against all the training races.
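A fitness function of that kind might be sketched as below. The sketch assumes each race has been reduced to the sequence of percentage profit/loss values that the open trade passes through before the off; that data layout is an assumption made for illustration, not a description of my actual code.

STAKE = 2.0   # the £2 bet used in the fitness function

def trade_return(pnl_path, stop_loss_pct, take_profit_pct):
    # Walk the trade's percentage P&L through time and exit at the stop,
    # at the profit target, or on the last value (closed just before the off).
    for pnl in pnl_path:
        if pnl <= -stop_loss_pct:
            return -stop_loss_pct / 100.0 * STAKE
        if pnl >= take_profit_pct:
            return take_profit_pct / 100.0 * STAKE
    return pnl_path[-1] / 100.0 * STAKE

def fitness(inhabitant, races):
    # inhabitant = [stop_loss_pct, take_profit_pct]; races is a list of P&L paths.
    stop_loss_pct, take_profit_pct = inhabitant
    return sum(trade_return(path, stop_loss_pct, take_profit_pct) for path in races)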

The results were interesting in that the genetic algorithm recommended a stop when the loss was 30% or more. That is something I would never have imagined as a manual trader, but I checked the result and it was correct. Too often, when I was a manual trader, I would panic at any kind of loss. Eventually, I trained myself to close a trade only at a 10% loss. Now, by working in tandem with a machine intelligence, I can combine my human skill at spotting patterns in data with a machine that optimises the trading rule to extract the most profit from it.

When I first worked in evolutionary computation the two books that I referred to most were David Goldberg's Genetic Algorithms in Search, Optimization and Machine Learning and John Koza's Genetic Programming: On the Programming of Computers by Means of Natural Selection. Both books are available secondhand for a reasonable price.

The two books deal with slightly different aspects of evolutionary computation. You can think of a GA as a tool for optimising a known formula or equation. In my stop/profit example above, I knew when to enter a trade; I just didn't know how much of a loss to accept or how much profit to take. The GA returned just two numbers: a percentage stop value (e.g. 30%, the amount I could lose before closing a losing position) and a percentage profit taking value (e.g. 80%, the profit yielded from the trade before taking the profit and closing the position).

If neither the stop nor the profit taking value is hit then the trade is left open until just before the off, when the position is closed out for a value between -30% and +80% of the value of the trade. The machine intelligence had optimised these values to give the best overall profit margin across all the trades in the training data. In this case the trades as a whole would have yielded 10% on average.

In the case of genetic programming (GP), nothing is known about the form of the solution, so a complete program tree is evolved to maximise a fitness function. The program might contain logical statements, for example "If lay_price > 30_minute_moving_average then lay". Each program element ('If', 'lay_price', '>', '30_minute_moving_average', 'then', 'lay') is a separate chromosome. Some programs might be nonsensical (e.g. "if if then if then") but that is okay, as their poor fitness means they will not be selected for reproduction. The fitter programs will pass their traits on to future generations.
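As a toy illustration, sketched in Python rather than the representation used in the paper, such a rule tree could be encoded as nested tuples and evaluated recursively:

# The example rule "If lay_price > 30_minute_moving_average then lay" as a
# program tree. Every node is a chromosome that crossover can exchange with
# a subtree from another program.
rule = ("if", (">", "lay_price", "30_minute_moving_average"), "lay", "no_action")

def evaluate(node, market):
    if isinstance(node, str):
        # Terminals are either market variables or action names.
        return market.get(node, node)
    operator = node[0]
    if operator == ">":
        return evaluate(node[1], market) > evaluate(node[2], market)
    if operator == "if":
        condition, if_true, if_false = node[1], node[2], node[3]
        return evaluate(if_true, market) if evaluate(condition, market) else evaluate(if_false, market)

market = {"lay_price": 34.0, "30_minute_moving_average": 30.0}
print(evaluate(rule, market))   # prints "lay"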

This is essentially the basis of a paper I wrote in 1998 called EDDIE Beats The Bookies, so I won't explain GP further and will just refer you to the paper. There is enough detail in the paper for an experienced AI programmer to replicate a basic GP environment for evolving decision trees. The trees in the paper were evolved to select the winner of a horse race in a very rudimentary manner, so I would take the results with a pinch of salt. The aim of the research was to show how intelligence could emerge from data, guided by the payoffs from the success or failure of each tree.

I have a personal preference for evolutionary computation. For me, neural networks are black boxes and you can never be sure what they are up to. A GP solution has a program tree that can be easily read, so you can see the logic behind it. That is not to say that neural networks have no place in AI, as can be seen by the recent victory of a machine against a human Go player.

Further Reading

Genetic Algorithms in Search, Optimization and Machine Learning. All you need to know about GA. Simple to understand, and the ideas can be ported to any programming language. The book primarily uses Pascal to demonstrate how to code a GA, but the logic is easily ported to any language. I managed to port the code to C++ without difficulty.

Genetic Programming: On the Programming of Computers by Means of Natural Selection is a weighty tome containing everything you need to know about genetic programming. It includes many example applications, such as classification and strategy evolution.

For those who have some experience of evolutionary computation and would like a more trading-specific book, I would recommend Biologically Inspired Algorithms for Financial Modelling. The book covers evolutionary computation, neural networks and hybrid systems, with examples of trading rule discovery and optimisation.
 

6 comments:

  1. Just out of curiosity, could you test placing random backs and lays on random horses, say half an hour before the off? Then perform that test with no stop and limit, and then the same test but with the 30:80 settings.

    The untested theory of some technical financial traders is that perfect discipline in itself is a strategy that can make returns, or at least break even, purely because of the inefficiency of the other market participants. A test was done, I believe, flipping a coin each morning before the DAX futures opened to determine a buy or sell, with a volatility-calculated stop and limit. It made money for a period of time, which sadly in no way constituted a sufficient sample size, but the concept interests me all the same.

    Regards,

    Matthew.

    1. I am currently adding extra functionality to a manual analysis engine for testing basic hypotheses, from which training matrices can be generated for my GA optimisation program.

      "Time To Off" is one of the variables that I added this week. I can void chromosomes or constrict their value so I will be able to test your hypothesis in the near future.

      That said, intuition tells me that for a random back/lay with no stop or limit the results will be normally distributed around a 0% return, minus commission.

      However, I am finding some skewed return distributions when adding a few technicals.

      I am looking for general volatility rules that I can add to my portfolio. They will have low capacity but the idea is to have many of them.

  2. In this circumstance, where there is a limited number of values for the chromosomes, is it ever worth just performing a brute force attack on the numbers, trying all combinations? Does the GA always come up with the correct fit, given sufficient generations?

    1. Good comment.

      As I mention in other posts on this site, the preference is for a brute force approach. My GA Optimiser calculates the possible number of combinations and gives the user the choice of optimisation or brute force.

      A GA does not always give the best fit for a solution. A population of solutions can discover a local maximum that is easier to hill-climb than happening upon the traits of the optimal solution.

      I am researching a variety of classifiers and optimisers for publication. My own preference is for brute force given any time constraint. For that purpose I am also looking at parallel processing.

      Even VB.NET can do parallel For Next loops to make the most of multi-core processors. F#, a functional language within Microsoft's Visual Studio with good support for parallelism, can also be used for this purpose.

      Another methodology I mention above is Genetic Programming where not even the formula to be optimised is known. This greatly increases the search space so the genetic approach may work here. But, as with GA, the optimal solution is not always discovered.

      Something that is worthy of another article is that with a brute force approach you can see what the values either side of the best fit do to a trade. If the hill climbed is very steep then the returns either side of the solution will be much lower. Not so with a fat hill, where neighbouring values give less return, but not drastically so.

      This means that you can choose a less optimal fit that is more robust when your bot does not exactly hit its stops and limits. I shall write that up when I get a chance.
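      A rough sketch of the idea in Python, with backtest_yield standing in for whatever evaluation function is used:

      from itertools import product

      # Exhaustive search over stop/limit pairs, then a look at the best pair's
      # neighbours to judge whether it sits on a fat hill or a narrow spike.
      # backtest_yield(stop, limit) is a placeholder evaluation function.
      def brute_force(backtest_yield, stops=range(5, 55, 5), limits=range(10, 110, 10)):
          results = {(s, l): backtest_yield(s, l) for s, l in product(stops, limits)}
          best = max(results, key=results.get)
          neighbours = [results[(best[0] + ds, best[1] + dl)]
                        for ds in (-5, 0, 5) for dl in (-10, 0, 10)
                        if (ds, dl) != (0, 0) and (best[0] + ds, best[1] + dl) in results]
          return best, results[best], neighbours   # low neighbour yields = narrow spike

      Each (stop, limit) evaluation is independent of the others, which is why the grid parallelises so easily.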

  3. I guess one of the dangers with a brute force attack is that you will come across an overly back-fitted solution, where the stars align to give you a fantastic result on your dataset that provides no predictive power due to some spurious correlation.

    So the presence of a `fat hill` rather than a spike would provide extra confidence that an optimal solution has been found.

    I've also wondered whether it is important to come up with a rational explanation for the numbers you are seeing; as you mentioned in your post, the -30% stop loss was a surprise and unintuitive. Is it possible to come up with a hypothesis about the market sentiment around these magic numbers?

    I guess ultimately, an explanation doesn't matter, as the numbers don't lie. Unintuitive solutions may provide an edge because the crowd are less likely to follow them.

    1. Your assumptions about fat and narrow hills are correct. I will verify them in my next article on the subject.

      There is an easy explanation for the case of the 30/80 rule that I developed. It was a rule specifically for identifying short-priced drifters.

      A few of these drifters can go from favourite or second favourite to fourth or fifth favourite with double figure odds. They are few, but they account for some very big wins, hence the 80% limit on profit taking.

      It is because these big drifters skew the limit that the 30% stop is possible. The big drifters can soak up a lot of losers and still yield a profit.

      I can add other chromosomes to the rule, which can filter out more losers and reduce the 30% stop but then I would lose a lot of generalisation in the rule. It's a case of Occam's Razor.
