Continuous versus Discrete Distributions

I always have a problem with continuous distributions. Not least because I am self-taught in maths. Calculus is something I can easily forget overnight, requiring me to relearn it the following day.

Also, I am a firm believer in the digital-universe idea: that at the quantum level everything is discrete.

Any machine learning professor will say, "Optimisation in financial markets uses continuous variables and distributions." I disagree. Look at Betfair with its discrete odds and discrete wagering amounts. You can't bet £10.00001 on a 999.99 long shot.

Your bet is bounded from zero to whatever the maximum capacity of your chosen market is, in increments of 1p. There are just 320 possible odds values. Betting on Betfair or buying n shares at 57p each is a discrete process.
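To make the point concrete, here is a sketch that enumerates a Betfair-style odds ladder. The tick bands below are assumptions based on the published price increments; the live ladder may differ over time, so the count printed is illustrative rather than authoritative.

```python
# Sketch: enumerate a Betfair-style odds ladder from assumed tick bands.
# Each band is (first price, last price, tick size); these are assumptions,
# not a guaranteed match for the live exchange.

def odds_ladder():
    """Return every quotable price, lowest to highest."""
    bands = [
        (1.01, 2.00, 0.01),
        (2.02, 3.00, 0.02),
        (3.05, 4.00, 0.05),
        (4.10, 6.00, 0.10),
        (6.20, 10.0, 0.20),
        (10.5, 20.0, 0.50),
        (21.0, 30.0, 1.00),
        (32.0, 50.0, 2.00),
        (55.0, 100.0, 5.00),
        (110.0, 1000.0, 10.0),
    ]
    prices = []
    for lo, hi, tick in bands:
        n = round((hi - lo) / tick) + 1
        prices.extend(round(lo + i * tick, 2) for i in range(n))
    return prices

ladder = odds_ladder()
print(len(ladder), ladder[0], ladder[-1])  # a few hundred prices, 1.01 up to 1000.0
```

However the bands are tuned, the result is the same: a finite, enumerable list of prices, not a continuum.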

Horse races are measured to one hundredth of a second. Why calculate with all the precision available in the double data type when a horse is either going to run in 71.45 or 71.46 seconds but not 71.452849105836 seconds?

Your horse isn't going to run any faster than is bio-mechanically possible, and its time certainly can't fall below zero seconds, so why the need for continuous distributions?

All of this discreteness makes the search space that much smaller, that much easier to search, and that much easier to calculate over with a discrete classifier rather than with something continuous and more complex.

Currently, I am exercising my mind with speed ratings for race horses. I want to avoid two things: subjective decision-making and overly complicated probability distributions.

The subjective decision-making is for another post, so let's talk probability distributions. Much of the natural world (of which horse racing is a part) involves normal distributions (or rather the human-perceived analogue of whatever the quantum universe throws up).

I would expect the running times for the set of all horses that have run a given distance on a particular course to be normally distributed. However, I would not expect any single horse to have a normally distributed set of running performances.

Why? Because it is easier for a horse to run slower than its career best time than to run quicker. Of course, that depends on the age and ability of the horse. If it is a two-year-old then its best performances are yet to come. If it is a five-year-old then its best performances are in the past. However, at any one time the distribution of performances is going to be skewed, one way or another.

A horse will have periods of poor fitness or bad luck so I would assume some sort of skewed distribution. What that distribution is remains to be discovered but it won't be normal. It will be one of those other distributions like a Beta or a PERT or something else.
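The skew argument can be simulated. Below, a horse's time is modelled as a biomechanical best time plus a Beta-distributed slowdown (fitness, luck, going); the 71-second floor, the 4-second maximum slowdown, and the Beta(2, 5) shape are all illustrative assumptions, not fitted values.

```python
# Simulate run times as best time + Beta(2, 5) slowdown and measure skewness.
# All parameter values here are assumptions for illustration.
import random

random.seed(7)
BEST_TIME = 71.0      # assumed biomechanical floor, in seconds
MAX_SLOWDOWN = 4.0    # assumed worst plausible slowdown, in seconds

times = [BEST_TIME + MAX_SLOWDOWN * random.betavariate(2, 5)
         for _ in range(10_000)]

mean = sum(times) / len(times)
var = sum((t - mean) ** 2 for t in times) / len(times)
skew = sum((t - mean) ** 3 for t in times) / len(times) / var ** 1.5

print(f"mean {mean:.2f}s, sample skewness {skew:.2f}")
```

Because the slowdown piles up near zero with a long right tail, the sample skewness comes out positive: slow runs outnumber career-best runs, exactly the asymmetry described above.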

Why bother looking for it? Why not run a Kernel Density Estimator or some other density estimator? Well, that's for me to work out over the coming months. And, while I'm at it, I won't be second-guessing with any subjective nonsense.
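A KDE sidesteps the "find the right named distribution" step entirely. Here is a minimal Gaussian kernel density estimator in pure stdlib Python; the bandwidth rule (Silverman's rule of thumb) and the sample times are illustrative.

```python
# Minimal Gaussian kernel density estimator. The run times below are
# hypothetical, and Silverman's rule is just one bandwidth choice.
import math
import statistics

def kde(sample, bandwidth=None):
    """Return a density function estimated from `sample` with Gaussian kernels."""
    n = len(sample)
    if bandwidth is None:
        # Silverman's rule of thumb
        bandwidth = 1.06 * statistics.stdev(sample) * n ** -0.2
    def density(x):
        return sum(
            math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in sample
        ) / (n * bandwidth * math.sqrt(2 * math.pi))
    return density

run_times = [71.45, 71.52, 71.60, 71.48, 72.10, 71.90, 73.05]  # hypothetical
f = kde(run_times)
print(f(71.5), f(75.0))  # density near the cluster dwarfs density far from it
```

No normality assumption, no shape parameters to fit: the data alone dictates the estimated density.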

Every book I have read on speed figures involves leaving out some figures because "they don't look right". Who am I to know if a horse has had a bad day or an exceptional day? There has to be a better objective method or a system that is more accepting of outliers.
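One objective alternative to throwing out figures that "don't look right" is to flag outliers with the median absolute deviation (MAD), which is itself robust to the outliers it is measuring. The times and the 3-MAD cutoff below are illustrative assumptions.

```python
# Flag outlying run times with a modified z-score on the MAD, instead of
# discarding figures by eye. Sample times and cutoff are illustrative.
import statistics

def mad_outliers(times, cutoff=3.0):
    """Return (kept, flagged) using a modified z-score on the MAD."""
    med = statistics.median(times)
    mad = statistics.median(abs(t - med) for t in times)
    # 0.6745 scales the MAD to match the standard deviation under normality
    flagged = [t for t in times
               if mad and 0.6745 * abs(t - med) / mad > cutoff]
    kept = [t for t in times if t not in flagged]
    return kept, flagged

times = [71.45, 71.52, 71.60, 71.48, 71.90, 78.20]  # 78.20s: a very bad day?
kept, flagged = mad_outliers(times)
print(flagged)  # the 78.20 run is flagged by the rule, not by eye
```

The decision is now a reproducible rule rather than a judgement call, and the cutoff can be tuned to be as accepting of outliers as desired.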


Thinking about it some more, throughout a horse's career the set of performances could well be normally distributed.

Early on and late on in a horse's career the performances may be skewed, but taken together the skews might cancel each other out.

The problem is that a single horse never runs enough races for its performance distribution to be statistically meaningful on its own.