Probability distribution of horses

In summary, the horse race has six contestants, and each has an expected time over 5 furlongs that is accurate to about plus or minus 0.5 seconds. Because of that uncertainty the outcome of the race is unknown. The probability of each horse winning can be approximated by a Monte Carlo simulation.
  • #1
cosmiccase
A horse race is going to take place with six runners.
The race is over 5 furlongs (about 1000 meters), and for each of the six contestants the probable time at this distance is known:

horse 1: 57.00 sec
horse 2: 57.20 sec
horse 3: 57.35 sec
horse 4: 57.80 sec
horse 5: 58.10 sec
horse 6: 59.50 sec

But, as is always the case in horse races, these times are uncertain, so the outcome is unknown.
In fact each of the above times is accurate to plus or minus 0.50 seconds, i.e. for horse "1" there is a Gaussian distribution with mean 57.00 and standard deviation 0.5, for horse "2" there is a Gaussian with mean 57.20 and standard deviation 0.5, and so on.

What is the probability for each horse to win the race?

There is an easy (but a little slow) answer that can be derived by Monte Carlo simulation using random numbers, but it's not what I'm asking for.
Does anyone know a functional approximation for the winner's pdf?
 
  • #2
Relabel horses A ... F.

Define t(min\A) = min{t(B), ..., t(F)}. t(min\A) is a random variable whose distribution can be obtained from the distributions of t(B) ... t(F).

"A" wins if t(A) < t(min\A) or 0 < t(min\A) - t(A). Let d(A) = t(min\A) - t(A). The distribution of d(A) can be obtained from those of t(min\A) and t(A). Let Fd(A) be the CDF of d(A). The probability that A would win is 1 - Fd(A)(0).
 
  • #3
What is the distribution of t(min\A) then?

Simulation gives the following results for the numbers in my example:

horse 1: 0.47
horse 2: 0.29
horse 3: 0.19
horse 4: 0.04
horse 5: 0.01
horse 6: 0.00

(plus-minus 0.01)
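For anyone who wants to repeat the simulation, here is a minimal sketch of how it could be done. It assumes the six finishing times are independent Gaussians (the thread implies this but never states it), and the function name, race count and seed are illustrative choices, not the original poster's code.

```python
import random

# Mean times in seconds from the first post; common standard deviation 0.5 s.
MEANS = [57.00, 57.20, 57.35, 57.80, 58.10, 59.50]
SIGMA = 0.5


def simulate_win_probabilities(means, sigma, n_races=200_000, seed=1):
    """Estimate win probabilities by drawing independent Gaussian times.

    Independence between horses is an assumption of this sketch.
    """
    rng = random.Random(seed)
    wins = [0] * len(means)
    for _ in range(n_races):
        times = [rng.gauss(m, sigma) for m in means]
        # The winner is the horse with the smallest finishing time.
        wins[times.index(min(times))] += 1
    return [w / n_races for w in wins]


probs = simulate_win_probabilities(MEANS, SIGMA)
for i, p in enumerate(probs, start=1):
    print(f"horse {i}: {p:.3f}")
```

With a couple of hundred thousand races the sampling error on each estimate is a few thousandths, so the output can be compared directly with the table above.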
 
  • #4
Let [itex]\mathbb S[/itex] = {A, ..., F}.

Prob{t(min\A) < x} = Prob{min{t(B), ..., t(F)} < x} = Prob{not all {t(B), ..., t(F)} > x} = 1 - Prob{all {t(B), ..., t(F)} > x} = [tex]1 - \prod_{k \in (\mathbb S\backslash A)}\left[1 - \Phi_k(x)\right][/tex] where [itex]\Phi_k[/itex] is the Gaussian CDF for t(k).
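The product formula above can be evaluated directly. The sketch below assumes horses A to F correspond to horses 1 to 6 of the first post and that their times are independent; `cdf_min_excluding` is a hypothetical helper name chosen for illustration.

```python
from statistics import NormalDist

# Assumed relabelling: A..F are horses 1..6 from the first post.
MEANS = {"A": 57.00, "B": 57.20, "C": 57.35, "D": 57.80, "E": 58.10, "F": 59.50}
SIGMA = 0.5


def cdf_min_excluding(horse, x, means=MEANS, sigma=SIGMA):
    """Return Prob{fastest time among the other horses < x}.

    Implements 1 - prod_k [1 - Phi_k(x)] over every horse k except `horse`.
    """
    all_slower = 1.0  # probability that every other horse finishes after x
    for name, mu in means.items():
        if name != horse:
            all_slower *= 1.0 - NormalDist(mu, sigma).cdf(x)
    return 1.0 - all_slower


# Chance that at least one of B..F has finished by A's mean time.
print(cdf_min_excluding("A", 57.0))
```

The result is a proper CDF in x: it rises from 0 to 1 as x sweeps across the range of plausible finishing times.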
 
  • #5
Just integrate using numerical methods and you will get:
0.4702
0.2858
0.1888
0.0427
0.0125
1.8568*1e-6
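One possible way to carry out such an integration is sketched below, and it is not necessarily the method used for the figures above. It assumes independent Gaussian times, integrates the density of horse i times the probability that every other horse is slower, and uses a plain trapezoidal rule; the function name, step count and integration window are illustrative choices.

```python
from statistics import NormalDist

MEANS = [57.00, 57.20, 57.35, 57.80, 58.10, 59.50]
SIGMA = 0.5


def win_probability(i, means, sigma, steps=4000, half_width=8.0):
    """Trapezoidal estimate of the integral of f_i(t) * prod_{k != i} [1 - Phi_k(t)].

    The window is the mean of horse i plus or minus `half_width` standard
    deviations; the density of horse i is negligible outside it.
    """
    dists = [NormalDist(m, sigma) for m in means]
    lo = means[i] - half_width * sigma
    hi = means[i] + half_width * sigma
    h = (hi - lo) / steps
    total = 0.0
    for n in range(steps + 1):
        t = lo + n * h
        value = dists[i].pdf(t)
        for k, d in enumerate(dists):
            if k != i:
                value *= 1.0 - d.cdf(t)  # horse k finishes after time t
        weight = 0.5 if n in (0, steps) else 1.0  # trapezoid end points
        total += weight * value
    return total * h


probs = [win_probability(i, MEANS, SIGMA) for i in range(len(MEANS))]
for i, p in enumerate(probs, start=1):
    print(f"horse {i}: {p:.4g}")
```

A useful sanity check is that the six values add up to 1, since exactly one horse wins when ties have probability zero.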
 
  • #6
Can you write this down as a product of integrals?
Is there a functional approximation when the means are Mi and the stds Si?
 
  • #7
cosmiccase said:
Can you write this down as a product of integrals?
Is there a functional approximation when the means are Mi and the stds Si?
For the answer I posted, you certainly can write it as a product of integrals because each [itex]\Phi_k(x)[/itex] is an integral. There may be a functional approximation but I don't know what it would look like. If you simulate it you should be able to fit some polynomial function using regression analysis.
 
  • #8
integrals

EnumaElish imports some good looking math symbols.
Can one get those from the font menu?

Anyway, is it

P(A) = integral from 0 to infinity of {erf(t,Mb,Sb) x erf(t,Mc,Sc) ... x erf(t,Mf,Sf)}?
 
  • #9
I myself like 'em symbols; aren't they cool? All one has to do is to click on a symbol or formula and read the TeX instructions. That's how I started using them in the first place.

You had asked for the distribution of t(min\A). It is Prob{t(min\A) < x} = [tex]1 - \prod_{k \in (\mathbb S\backslash A)}\left[1 - \Phi_k(x)\right][/tex] and you have to substitute the CDF formula or integral for each [itex]\Phi_k(x)[/itex] for [itex]k \in \mathbb S\backslash A[/itex] = {B, C, D, E, F}.
 
  • #10
[tex]\mathrm{PROB}(A) = \int dt \, f_a(t) \prod \left(1 - \mathrm{erf}(\mu, \sigma, t)\right)[/tex]

That doesn't look great but I think this is it.
The integral is from 0 to infinity.
fa is the Gaussian of "horse A".
erf are the cdfs of the Gaussians of B-C-D-E-F.

Similarly for the others.
It's a hell of an integral though.
How do you tackle it using numerical methods?
I'm surprised there is no link to it, none that I can find anyway.
This problem could be encountered in component failure statistics also, couldn't it?
If you have n components of unequal age or durability, you probably want to know which ones need more frequent attention and how much.
 
  • #11
You are saying that Prob(A wins) = Prob(Others do not cross the finish line before time t) weighted by A's frequency and integrated over t.

But, where is the probability expression that "A crosses the finish line at or before time t"?

Non-substantive observations:
1. Isn't it customary to write the dt after the integrand?
2. Have you considered a distribution defined over t > 0 only, such as the "F" distribution?
3. It can be tackled in Mathematica or similar symbolic-numeric software.
 
  • #12
I'm following your reasoning.
"A" can cross the finish line at any time from 0 to infinity.
Practically, with a mean of 57.00 secs it can be anything from 55.00 seconds (a record performance) to infinity (goes lame and stops!). In our ideal model, where lameness and jockey accidents do not exist, for all intents and purposes the pdfs expire 2-3 seconds on either side of the mean.

The runners B-C-D-E-F are finishing at times <t in the integral and are factored out.
Didn't I do it right?

Re. my use of tex, you seem to do better, as you also have a description underneath your product symbol.

What is the "F" distribution?

The true nature of this problem is that I have the mean values but I don't know what the sigmas might be (to any tolerable degree of accuracy).
So I want to fit them experimentally, if I have an approximate formula first.
 
  • #13
I guess your formula is right.

See F-distribution, http://planetmath.org/encyclopedia/FDistribution.html .
 
  • #15
cosmiccase said:
I'm not sure the formula is right.
If there are three random variables, are you sure that:

f1.(1-erf2).(1-erf3) + f2.(1-erf1).(1-erf3) + f3.(1-erf1).(1-erf2)

integrates to 1?
Nope, I am not sure. Neither am I sure that this is what your previous formula comes to. If I were you I would try to prove it one way or the other.
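One way to settle the question is sketched below. It assumes the three times are independent with differentiable CDFs, and it writes F_i for what the thread calls "erf_i" (the CDF of horse i) and f_i for its density; this notation is introduced here and is not from the posts above. The sum in question is minus the derivative of the product of the survival functions, so the integral telescopes:

```latex
G(t) = \prod_{i=1}^{3}\left[1 - F_i(t)\right],
\qquad
G'(t) = -\sum_{i=1}^{3} f_i(t) \prod_{j \neq i}\left[1 - F_j(t)\right]

\int_{-\infty}^{\infty} \sum_{i=1}^{3} f_i(t) \prod_{j \neq i}\left[1 - F_j(t)\right] dt
  = -\Big[ G(t) \Big]_{-\infty}^{\infty}
  = G(-\infty) - G(\infty)
  = 1 - 0
  = 1
```

If the lower limit is taken as 0 rather than minus infinity, the result is G(0) instead of 1, which for Gaussians with means near 57 and standard deviation 0.5 is indistinguishable from 1. The same argument works for any number of horses.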
 

FAQ: Probability distribution of horses

What is a probability distribution of horses?

A probability distribution of horses is a mathematical representation of the likelihood of different outcomes for a horse-related event or scenario. It shows the range of possible outcomes and the probability of each outcome occurring.

How is a probability distribution of horses calculated?

A probability distribution of horses is calculated by analyzing data on horse-related events, such as race results or breeding patterns. This data is used to determine the frequency of different outcomes and assign probabilities to each outcome.

What factors affect the probability distribution of horses?

There are several factors that can affect the probability distribution of horses, including the horse's breed, age, training, and health. Other factors such as track conditions, jockey ability, and race distance can also impact the distribution.

Why is understanding the probability distribution of horses important?

Understanding the probability distribution of horses can help horse owners, breeders, and bettors make more informed decisions. By knowing the likelihood of certain outcomes, they can better strategize and manage risks associated with horse-related activities.

Can the probability distribution of horses change over time?

Yes, the probability distribution of horses can change over time. Factors such as training, health, and racing conditions can all impact the likelihood of certain outcomes. Additionally, new data and information can also shift the probabilities in a different direction.
