Speed Ratings for Racehorses

Although I always say that I am not a fundamental trader and have no interest in horse racing performance data, I do like messing around with maths and statistics. Speed ratings are one area of horse racing that has always interested me.

As Bob Wilkins points out in his excellent although somewhat pointless book (I'll say why in a moment), Bioenergetics and Racehorse Ratings, comparing one horse's performance against another is an onerous task.

Using human athletic performance as a starting point, Wilkins showed that humans are easily compared because athletics tracks are flat and standard in dimensions. Unfortunately, horses are hard to compare because racecourses come in all manner of shapes and sizes, with climbs and dips and different surfaces, and are open to the weather, not to mention the hurdles and fences in National Hunt racing.

Also, racecourses are constructed of grass, sand/dirt and synthetic materials all of which are permeable to rain which adds to the variety of surfaces. Humans run on hard tracks with soft shoes whereas horses run on tracks of varying degrees of softness with hard shoes. Weather affects a horse's performance more so than a human's performance.

Wilkins attempts to calculate the speed of a racehorse from a guesstimate of the size of the horse, from which he derives its power. However, Wilkins then runs into the age-old problem of determining how much slower or faster than expected the horse performed. After going to all the trouble of laying down a rationale based on bioenergetics, Wilkins provides ratings which are no better than Nick Mordin's speed figures in Mordin on Time, which uses a more traditional approach to speed figures.

Mordin makes no consideration of the horse other than weight carried. He works out his own standard times for each course and then makes adjustments for weight carried and the track conditions. The horse itself is treated as a variable that cannot be surmised. However, one would assume that the heavier horse can take more weight and eventually you will slow a lighter framed horse given enough lead.
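As a rough illustration of that traditional approach, here is a minimal Python sketch of a rating built from a standard time with adjustments for going and weight carried. It is not Mordin's actual method or his figures. The names (Run, speed_rating) and every constant are hypothetical, chosen only to show the shape of the calculation.

```python
from dataclasses import dataclass


@dataclass
class Run:
    time_s: float             # recorded finishing time in seconds
    standard_time_s: float    # standard time for this course and distance
    going_allowance_s: float  # seconds the going added to times that day (negative if fast)
    weight_lb: float          # weight carried in pounds


# Hypothetical constants, for illustration only (not taken from any book).
BASE_WEIGHT_LB = 126.0    # reference weight that ratings are normalised to
POINTS_PER_SECOND = 10.0  # rating points per second faster or slower than standard
POINTS_PER_LB = 0.5       # rating points credited per pound above the base weight
BASE_RATING = 100.0       # rating for a run exactly on standard at the base weight


def speed_rating(run: Run) -> float:
    # Remove the effect of the going so runs on different days are comparable.
    adjusted_time = run.time_s - run.going_allowance_s
    # Positive when the horse beat the standard time.
    seconds_faster = run.standard_time_s - adjusted_time
    # Credit a horse for carrying more than the base weight, debit it for less.
    weight_credit = (run.weight_lb - BASE_WEIGHT_LB) * POINTS_PER_LB
    return BASE_RATING + seconds_faster * POINTS_PER_SECOND + weight_credit


example = Run(time_s=61.0, standard_time_s=60.0, going_allowance_s=1.5, weight_lb=130.0)
rating = speed_rating(example)
```

Note that the horse itself appears nowhere in the calculation, only the clock, the going and the lead on its back.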

If you have seen William Benter's presentation at the ICCM2004 Conference and read his paper Computer Based Horse Race Handicapping Systems: A Report in Efficiency of Racetrack Betting Markets then you will note that Benter's work revolves around determining finishing times, which is another way of saying speed rating. There are some hints in the video and also, I believe, some red herrings. His distribution curves are Gaussian (normal) but I don't think for one moment that a racehorse's performances are normally distributed.

When you compare more than one horse (preferably many) then their performances are normally distributed as in the following chart. The majority of horses will be clustered around the average with the greats tending towards the right tail of the distribution and the donkeys towards the left end.

However, knowing the normal distribution of all finishing times for horses isn't going to help you with determining a speed rating for a single horse. A two-year-old horse will have had only a few performances with which to judge its speed and the horse has yet to fully develop. A five-year-old horse is fully developed but it is probably running slower than it did as a three-year-old. When you analyse all horses together you are getting an average of every possible age, ability and condition. Again, that is not going to help you to determine the probable future speed of a horse in its next race.

Another probability distribution that may be of use is the beta distribution, an example of which can be seen in the next chart. This distribution is an interesting one as it uses two shape parameters (alpha and beta) that yield many different kinds of distribution curves. The one below is the one that interests me most with regard to speed ratings.

What this distribution is saying is that it is easier for a horse to underperform in comparison to its average ability but a lot harder to better it. You can't apply a normal distribution to a single horse otherwise you would be stating that a horse has an equal probability of outperforming or underperforming with respect to its average performance. Benter's video shows a normal distribution (at around the 28 minute mark), which is why I think it is just a hurried presentation slide and not an actual finishing time probability curve. (Note that the above beta distribution curve is for speed ratings and not finishing times.)

There are many reasons why a horse will underperform. Some examples are: the horse has an undiagnosed illness, the course conditions do not suit the horse, or the jockey is under trainer's orders to pace the horse or to go slow. For me, the beta distribution best describes the probability of a horse performing to its known ability. The majority of the time it will perform close to its average ability and the rest of the time it is more likely to underperform than overperform.
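To put a number on that asymmetry, here is a minimal Python sketch using only the standard library. It assumes speed ratings have been scaled to lie between 0 and 1, and it uses shape parameters (alpha = 8, beta = 2) that I have picked purely for illustration; they are not fitted to any real horse. It compares the chance of running well below the mean rating with the chance of running the same margin above it.

```python
import math


def beta_pdf(x: float, alpha: float, beta: float) -> float:
    """Density of the beta distribution at x, for 0 < x < 1."""
    log_norm = math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta)
    return math.exp(log_norm + (alpha - 1) * math.log(x) + (beta - 1) * math.log(1 - x))


def beta_prob(lo: float, hi: float, alpha: float, beta: float, steps: int = 2000) -> float:
    """P(lo < X < hi), integrated numerically with the midpoint rule."""
    width = (hi - lo) / steps
    total = sum(beta_pdf(lo + (i + 0.5) * width, alpha, beta) for i in range(steps))
    return total * width


# Assumed shape parameters: alpha > beta gives a long tail on the low side.
ALPHA, BETA = 8.0, 2.0
mean_rating = ALPHA / (ALPHA + BETA)  # mean of a beta distribution
margin = 0.15                         # hypothetical margin on the 0..1 rating scale

p_under = beta_prob(0.0, mean_rating - margin, ALPHA, BETA)  # runs well below average
p_over = beta_prob(mean_rating + margin, 1.0, ALPHA, BETA)   # runs well above average

print(f"P(under by more than {margin}): {p_under:.3f}")
print(f"P(over by more than {margin}):  {p_over:.3f}")
```

With these assumed parameters the first probability comes out larger than the second, and if you widen the margin to 0.2 the chance of overperforming drops to zero because the rating scale has a ceiling. A normal distribution would give the same probability on both sides.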

You can play around with the beta distribution using a spreadsheet, as in the following image. In cell C5 you can see the cumulative beta distribution calculated using two values, Alpha and Beta, which you can change to see how they affect the shape of the curve. The calculation takes the value in the B column, applies alpha and beta and provides a cumulative beta distribution. To create the chart you will have to perform a delta on the cumulative probabilities (i.e. subtract the previous cumulative value from the current one), which will give you a series of deltas in column D. Plot the deltas as a line chart to give a distribution similar to the one above.
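If you would rather not use a spreadsheet, the same steps can be sketched in Python with only the standard library. The grid of 0.05 steps and the values of alpha and beta are my own assumptions for illustration; the three lists stand in for columns B, C and D. The cumulative distribution is approximated by numerical integration rather than by a spreadsheet function.

```python
import math


def beta_pdf(x: float, alpha: float, beta: float) -> float:
    """Density of the beta distribution at x, for 0 < x < 1."""
    log_norm = math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta)
    return math.exp(log_norm + (alpha - 1) * math.log(x) + (beta - 1) * math.log(1 - x))


def beta_cdf(x: float, alpha: float, beta: float, steps: int = 1000) -> float:
    """Cumulative beta distribution P(X <= x), via the midpoint rule."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    width = x / steps
    return sum(beta_pdf((i + 0.5) * width, alpha, beta) for i in range(steps)) * width


ALPHA, BETA = 8.0, 2.0  # assumed values; change them to reshape the curve

# Column B: evenly spaced values from 0 to 1.
xs = [i / 20 for i in range(21)]
# Column C: cumulative beta distribution at each value.
cumulative = [beta_cdf(x, ALPHA, BETA) for x in xs]
# Column D: each cumulative value minus the previous one.
deltas = [cumulative[i] - cumulative[i - 1] for i in range(1, len(cumulative))]

# Crude text chart of the deltas in place of a line chart.
for x, d in zip(xs[1:], deltas):
    print(f"{x:4.2f} {'#' * round(d * 200)}")
```

The deltas sum to one because they are just the cumulative curve cut into slices, so each delta is the probability of a rating landing in that slice.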

How do you determine the beta distribution for a racehorse? A good question, and one that I do not have an exact answer for at this moment in time. You will probably have to perform something similar to a Kernel Density Estimation (other methods are available) on the discrete data to give a distribution. My idea is to create a range of standard horses depending on age and class rather than standard times for each course (as Mordin does) depending on age and ability. Two-year-old horses aren't going to have much previous form and their time distribution is going to be quite narrow and probably dependent on foaling month as much as anything else.
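One of those other methods is a method-of-moments fit, which is simpler than a Kernel Density Estimation and is what the sketch below uses instead. It matches alpha and beta to the mean and variance of a horse's past ratings. The ratings are made-up numbers on an assumed 0 to 1 scale, and fit_beta_moments is a name I have invented for this example. Treat it as a starting point, not an answer to the question.

```python
from statistics import mean, variance


def fit_beta_moments(ratings: list[float]) -> tuple[float, float]:
    """Estimate (alpha, beta) from the sample mean and variance of ratings in (0, 1)."""
    m = mean(ratings)
    v = variance(ratings)  # sample variance, needs at least two ratings
    # A beta distribution requires 0 < mean < 1 and 0 < variance < mean * (1 - mean).
    if not 0.0 < m < 1.0 or not 0.0 < v < m * (1.0 - m):
        raise ValueError("ratings cannot be matched by a beta distribution")
    common = m * (1.0 - m) / v - 1.0
    return m * common, (1.0 - m) * common


# Hypothetical past ratings for one horse: mostly near its best, two poor runs.
ratings = [0.82, 0.85, 0.60, 0.88, 0.79, 0.84, 0.45, 0.86]
alpha_hat, beta_hat = fit_beta_moments(ratings)

print(f"alpha = {alpha_hat:.2f}, beta = {beta_hat:.2f}")
```

For this made-up horse alpha comes out larger than beta, which gives the long tail on the underperforming side. With only a handful of runs, as for a two-year-old, the estimates will be very unstable, which is one argument for pooling horses into standard horses by age and class.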

What I would like to achieve is a set of figures that have no subjective element in them at all. Is that possible? I don't know. Weather, trainer's orders and a horse's free will all conspire to make the task very difficult indeed. For now this project is on my long list of things to do. I shall return to it at some point in the future when paid work is thin on the ground.

See Also

I recommend that you read Nick Mordin's Mordin on Time, which contains a wealth of data on speed ratings derived from finishing distances. From information on lengths per second and this book you will be able to create your own speed rating for any horse.

Bioenergetics and Racehorse Ratings is an alternative approach to creating speed ratings. Using the measurement of human performance as a template, the book attempts to do the same with horses. Measuring the speed of humans is considerably easier because they run on standard tracks. Horses are far harder to evaluate because they run at varying distances and on tracks of which no two are alike. This book shows how to develop a model for the performance of racehorses and from there speed ratings are generated.