Continuous versus Discrete Distributions

I always have a problem with continuous distributions, not least because I am self-taught in maths. Calculus is something I can easily forget overnight, requiring me to relearn it the following day.

Also, I am a firm believer in the digital universe: at the quantum level, everything is discrete.

Any machine learning professor will say, "Optimisation in financial markets uses continuous variables and distributions." I disagree. Look at Betfair with its discrete odds and discrete wagering amounts. You can't bet £10.00001 on a 999.99 long shot.

Your bet is bounded from zero to whatever the maximum capacity of your chosen market is, in increments of 1p. There are just 320 possible odds values. Betting on Betfair or buying n shares at 57p each is a discrete process.
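As a sketch of just how discrete this grid is, the snippet below snaps a stake to whole pence and a requested price to the nearest valid tick. The ladder increments here are an assumption based on Betfair's publicly documented price ranges; verify them against the exchange before relying on the exact tick set.

```python
# Sketch of Betfair's discrete betting grid. The (low, high, increment)
# ranges below are an assumption from public documentation, not gospel.
LADDER = [
    (1.01, 2.0, 0.01), (2.0, 3.0, 0.02), (3.0, 4.0, 0.05),
    (4.0, 6.0, 0.1), (6.0, 10.0, 0.2), (10.0, 20.0, 0.5),
    (20.0, 30.0, 1.0), (30.0, 50.0, 2.0), (50.0, 100.0, 5.0),
    (100.0, 1000.0, 10.0),
]

def snap_stake(pounds: float) -> float:
    """Round a stake down to the nearest whole penny."""
    return int(pounds * 100) / 100

def snap_price(price: float) -> float:
    """Snap a requested price to the nearest valid tick on the ladder."""
    ticks = []
    for low, high, step in LADDER:
        t = low
        while t < high - 1e-9:       # stop before the range's upper bound
            ticks.append(round(t, 2))
            t += step
    ticks.append(1000.0)             # the ladder's top price
    return min(ticks, key=lambda t: abs(t - price))
```

So `snap_stake(10.00001)` collapses to a plain £10.00, and any fantasy price gets pulled onto the grid: the search space really is a short, finite list.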

Horse races are measured to one hundredth of a second. Why calculate with all the precision available in the double data type when a horse is either going to run in 71.45 or 71.46 seconds but not 71.452849105836 seconds?

Your horse isn't going to run any faster than is bio-mechanically possible, and its time certainly can't be negative, so why the need for continuous distributions?

All of this discreteness makes the search space that much smaller, that much easier to search, and that much easier to calculate over with a discrete classifier rather than something continuous and that much more complex.

Currently, I am exercising my mind with speed ratings for race horses. I want to avoid two things: subjective decision-making and overly complicated probability distributions.

The subjective decision-making is for another post, so let's talk probability distributions. Much of the natural world (of which horse racing is a part) involves normal distributions (or rather, the human-perceived analogue of whatever the quantum universe throws up).

I would expect the running times for the set of all horses that have run a given distance on a particular course to be normally distributed. However, I would not expect any single horse to have a normally distributed set of running performances.

Why? Because it is easier for a horse to run slower than its career-best time than to run quicker. Of course, that depends on the age and ability of the horse. If it is a two-year-old then its best performances are yet to come; if it is a five-year-old then its best performances are in the past. However, at any one time the distribution of performances is going to be skewed, one way or another.

A horse will have periods of poor fitness or bad luck, so I would assume some sort of skewed distribution. What that distribution is remains to be discovered, but it won't be normal. It will be one of those other distributions, like a Beta or a PERT or something else.
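A quick way to see the skew argument is to simulate it. This is only a sketch: it assumes a performance is a biomechanical best time plus a one-sided slowdown, and the gamma distribution for that slowdown is chosen purely for illustration, not as a claim about real horses.

```python
import random
import statistics

random.seed(42)

BEST_TIME = 70.0  # hypothetical biomechanical best for the trip, in seconds

# A horse can run slower than its best but never faster, so add a
# one-sided "slowdown" term (gamma-distributed, purely for illustration).
times = [round(BEST_TIME + random.gammavariate(2.0, 0.5), 2)
         for _ in range(1000)]

mean = statistics.mean(times)
median = statistics.median(times)
# In a right-skewed distribution the mean sits above the median:
# the long slow tail drags the mean up while the median stays put.
print(mean > median)
```

No simulated time falls below the best time, and the mean exceeds the median — exactly the asymmetry a normal distribution can't capture.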

Why bother looking for it? Why not run a Kernel Density Estimator or some other density estimator? Well, that's for me to work out over the coming months. And, while I'm at it, I won't be second-guessing with any subjective nonsense.

Every book I have read on speed figures involves leaving out some figures because "they don't look right". Who am I to know if a horse has had a bad day or an exceptional day? There has to be a better objective method or a system that is more accepting of outliers.
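One objective alternative to discarding figures that "don't look right" is a fixed statistical rule applied uniformly, such as a median/MAD filter. A sketch — the 3.5 cutoff is a common convention for modified z-scores, not a recommendation, and the figures are made up:

```python
import statistics

def mad_outliers(values, threshold=3.5):
    """Flag values whose modified z-score (median/MAD based) exceeds threshold."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread at all, so nothing can be an outlier
    # 0.6745 scales the MAD to match the standard deviation on normal data.
    return [v for v in values if abs(0.6745 * (v - med) / mad) > threshold]

figures = [92, 95, 94, 91, 96, 93, 60]  # hypothetical speed figures
print(mad_outliers(figures))
```

Because it uses medians rather than means, the rule itself is barely influenced by the very outliers it flags, and no human has to decide whether the horse "had a bad day".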

Addendum

Thinking about it some more, throughout a horse's career the set of performances could well be normally distributed.

Early on and late on in a horse's career, performances may be skewed, but taken together the skews might cancel each other out.

The problem is that a single horse never has enough performances to be statistically significant on its own.

2 comments:

  1. Nice article on speed.

Just a thought, but perhaps there is a way to look at the individual skew of a horse's speed performances versus career starts / age.

If you look at all performances then the distribution is going to be normal as you suspect (or close enough).
And rightly so; on individual performances there just aren't enough runs to work out the correct distribution. However, you could break it down into discrete groups based on career runs and/or age. Then at least you would be one step closer.

    I'm sure there would be other factors such as runs from spell and distance to consider in order to overcome the problem of fitness.

    Anyway just a thought.

  2. Thanks for the comment.

What you propose is on the back burner for me, as it is fundamental analysis.

    However, when I get a chance then I will be looking at what you suggest.

My thought is to model "the ideal horse" accounting for a complete career, then, for any real horse, determine where in its development it is and what its potential is supposed to be, given its age, weight, previous runs, days since last race etc.

    If we assume the ideal horse is rated 100 then lesser horses will be < 100 and better horses > 100.

    Maybe I won't uncover anything new and am just repeating Bayer/Mordin but by a different route. Still, it will be an interesting exercise.

As I have said elsewhere, I am more of a technical trader and am loath to go down the fundamental route, if I believe the prices to be as efficient as need be.
