Combining probabilistic forecasts

cosmicminer · May 8, 2015

I'm reviving this thread:

https://www.physicsforums.com/threads/unorthodox-probability-theory.471171/

(it says not open for further replies).

The last post:

Each of the two forecasters, weatherman and Indian, follows his own set of methods.
We can't tell to what extent those methods are the same.
The ultimate case of independence is this:
Weatherman goes to sleep every night and is visited by the good fairy Bruchilde. Bruchilde knows what is going to happen, but she tells the truth in the dream only with probability p, using a random number generator on her laptop. Weatherman, who is really Bruchilde's spokesperson, tells us his view when he wakes up, and naturally he is right a fraction p of the time.
Indian similarly goes to sleep every night and is visited by the good fairy Matilde, who also knows what is going to happen and reveals the truth to the Indian with probability q, using another laptop.

In a situation such as this, if by the end of the proceedings the event A=Rain has occurred N times, both forecasters will have predicted it correctly about N*p*q times and both will have missed it about N*(1-p)*(1-q) times (give or take the random fluctuation).

But on any given day there are only three types of contest going on:

The AA v. BB, the AB v. BA or the BA v. AB.

So for the case of prediction AA v. prediction BB, it's PROB(A) = f(p,q) = p*q / (p*q + (1-p)*(1-q)).
While for the other cases the symmetric formulas apply.
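
A minimal sketch of the two-fairy formula, including the symmetric cases, assuming the two forecasters err independently and that A and not-A are equally likely beforehand (the formula as written carries no prior term). The name `combine_two` and the boolean encoding of the forecasts are illustrative assumptions, not something from the thread.

```python
def combine_two(p, q, first_says_a, second_says_a):
    """Probability of event A given two independent yes/no forecasts.

    p, q: measured hit rates of the first and second forecaster.
    first_says_a, second_says_a: True if that forecaster predicts A.
    Assumes independent errors and no prior preference between A and not-A.
    """
    # Likelihood of the two statements if A is what actually happens.
    like_a = (p if first_says_a else 1 - p) * (q if second_says_a else 1 - q)
    # Likelihood of the same two statements if A does not happen.
    like_not_a = (1 - p if first_says_a else p) * (1 - q if second_says_a else q)
    return like_a / (like_a + like_not_a)
```

With the hypothetical values p = 0.9 and q = 0.85, agreement on A gives 0.765 / 0.78, and a split forecast (first says A, second says B) gives 0.135 / 0.22.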

In the real world now where there are no fairies, the real problem has to be attacked.
This in my opinion ought to be done as follows:

Suppose it is true that p > q (weatherman somewhat superior).
Then we write q' = 0.5 + L*(q-0.5) = q(L)
L is a number between 0 and 1. For L = 0, q' = 0.5, while for L = 1, q' = q.

Then f(p,q) becomes f(p,q') = p*(0.5+L*(q-0.5)) / (p*(0.5+L*(q-0.5)) + (1-p)*(0.5-L*(q-0.5)))

If L = 0 then f = p (meaning the Indian doesn't count). If L = 1 then both count.

We write down the daily forecasts and the daily outcomes.
We measure the p and q values.
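We let L = 1 and, for every record, we compute Y = log(f assigned to the outcome that afterwards turned out to be correct), and we add up the quantities Y. Then we compute I = exp(sum of the Y's / number of records), which is the geometric mean of the probabilities given to the actual outcomes.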
Then we repeat with L = 0.99, 0.98 ... down to L = 0.
The value of L that makes I maximum is our best estimator.
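
The scan over L can be sketched as follows. This is only an illustration under stated assumptions: each record is a hypothetical tuple (weatherman says rain, Indian says rain, it rained), p and q are measured on the same records, and I is taken as exp(mean of Y), the geometric mean of the probabilities given to what actually happened, so that a larger I is better. The function names are made up for the example.

```python
import math

def prob_rain(p, q_eff, w_says_rain, i_says_rain):
    """Two-forecaster formula with the second hit rate already shrunk to q_eff."""
    like_rain = (p if w_says_rain else 1 - p) * (q_eff if i_says_rain else 1 - q_eff)
    like_dry = (1 - p if w_says_rain else p) * (1 - q_eff if i_says_rain else q_eff)
    return like_rain / (like_rain + like_dry)

def score(records, p, q, L):
    """I = exp(mean log-probability assigned to the observed outcomes)."""
    q_eff = 0.5 + L * (q - 0.5)  # q' from the post: 0.5 at L = 0, q at L = 1
    total = 0.0
    for w_says_rain, i_says_rain, rained in records:
        pr = prob_rain(p, q_eff, w_says_rain, i_says_rain)
        total += math.log(pr if rained else 1 - pr)
    return math.exp(total / len(records))

def best_L(records, steps=100):
    """Scan L = 1.00, 0.99, ..., 0.00 and return the value with the largest I."""
    n = len(records)
    p = sum(w == r for w, _, r in records) / n  # measured hit rate, weatherman
    q = sum(i == r for _, i, r in records) / n  # measured hit rate, Indian
    grid = [k / steps for k in range(steps, -1, -1)]
    return max(grid, key=lambda L: score(records, p, q, L))
```

As a sanity check, if the Indian simply copies the weatherman, the scan should return L = 0, because the second opinion carries no extra information.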

This can be done with more than two participants in the contest also.
Are there any better ideas ?

I made some tests with this using real data and it works.
It gives almost the same results as the logarithmic pool method (see here for example:
http://homepages.inf.ed.ac.uk/miles/papers/acl05a.pdf).

Furthermore, the method can be generalised to m events and n predictors in stepwise mode (take one Indian at a time).
My question is the bolded text (the empirical importance functions). Can they be improved?

micromass · May 8, 2015

You should check out the basics of Bayesian statistics. This provides the framework for this very idea.

cosmicminer · May 8, 2015

Bayesian logic was used.
Also, the logarithmic pool formula maximizes entropy, which is really a long series of Bayesian experiments.

What the tests really do is find the parameter or parameters that maximize entropy, because we don't really know what those forecasters do.
Maybe they just copy each other.
Maybe only one of them really works to accomplish the task we have assigned them and the others make random guesses.
Maybe something in between.

Stephen Tashi · May 9, 2015

I agree that the general questions in the old thread are interesting, but the first post in the old thread https://www.physicsforums.com/threads/unorthodox-probability-theory.471171/ doesn't manage to state a specific mathematical problem.

Could it be that the correct probability in such situations is some function f(p,q) ? And if so how do we express it ?

There is not enough given information to compute "the correct" probability. The goal in such a practical problem is to "estimate" a probability. There are different kinds of statistical estimators and various criteria (e.g. minimum variance, unbiased, maximum likelihood) for what makes a given kind of estimator the "best" estimate. So what criteria shall we use?

More fundamentally, the probability we are to estimate has not been defined. I think the goal is to estimate "The probability of rain on day N given the past history of the previous N-1 days with respect to whether it actually rained or not and the daily rain-versus-no-rain predictions given by the two forecasters".

If "p" and "q" indicate the respective probabilities of the two forecasters being correct, then the notation "f(p,q)" is not a good notation for the solution to the above problem because it implies we know p and q. However, what we know in the above problem is only the history of whether the forecasters were correct or incorrect. So we are given observed frequencies of correctness, not the probabilities of being correct.

(For an actual problem, weather prediction is not a good example, due to the difficulty of defining distinctions like "rain" vs "no rain" - e.g. how much rainfall must we have and where must we have it in order to say we had rain? Modern forecasts often state the probability of rain instead of making definite rain-vs-no-rain prediction.)

cosmicminer · May 9, 2015

There are situations in which we do not universally agree on a physics model that gives the probability.
There are also situations in which we do have such a model and therefore we do not have to request the assistance of experts pools.

One situation for which physics gives the answer is when a meteor is about to strike the Earth. Where will it land among the five seas and the five continents? Unless the Coriolis force affects the latitude of the impact point (which I don't think it does), the probability for each region is the ratio of its area to the total surface area of the planet. The Pacific wins; it's a fact, and we don't care what the pundits think! Nevertheless it's always a chance event that remains undecided until the stone actually falls, and it may just as well skip the Pacific and go for the Atlantic.

One situation in which we do not have an accurate physics model is (or was, rather) the question of who was going to win the 2014 football World Cup in Brazil.
There was a consensus of opinion in favour of the hosts Brazil, I'm afraid, but at the same time the pundits thought highly of the Germans too. Argentina was also liked, but Holland not so much, and if the Dutch, who fought hard to the end, had succeeded, it would have been a big surprise. Well, three of those four teams were roughly equivalent, but in the case of Brazil the pundits (or pool experts) made a mistake.
Nevertheless who were we to know ? We trusted the experts on the basis of their success rate in the past.

The absence of accurate Physics models makes us resort to the help of learned individuals who submit their opinions.
In such cases those people tend to know some maths and analyze some statistics, but it seems no one possesses the ultimate physics model of the issue that would take the matter beyond human opinions.

So the problem is to work out an average.
We look at the past performances of our experts and agree to call "probability" the average worked out in such a way as to maximize the gain over those past performances.

So we ask the experts to give their estimates of the probabilities of the outcomes "Y1-Y2-Y3 ... Yn" of events of a certain type "X", where X is maybe "world cup competitions" and Y1-Y2-Y3 ... Yn the chances of each of the participants for the outright win.
The first one says (P1, P2, P3 ... Pn), the second says (Q1, Q2, Q3 ... Qn), the third (R1, R2, R3 ... Rn) and so on.
What is the average ?

Suppose we try some function of the type F(P,Q,R), which we call the "model function".
The best average is the one that maximizes the following quantity:

I = F1 * F2 * F3 * ...

Where F1, F2, F3 ... Fm were the probabilities assigned by the model to the now known outcomes of past experiments 1-2-3-4 ... m, conducted with the help of the very same experts.

For had we given to the world the F1, F2, F3 ... Fm as the "probabilities" for all those events, we would have given them the maximum for the composite event made up of all the past events put together and made into a single serial event.

The model that seems to work better than the others is the log-linear pool.
It goes like this:

Probability of outcome i = C * Pi^a1 * Qi^a2 * Ri^a3 ............... (1)

where Pi, Qi, Ri are the experts' estimates for outcome i, a1, a2, a3 ... are exponents, and C is a normalizing constant chosen so that the probabilities add up to 1.
Typically the best performing expert merits an exponent close to 1 and the others lower exponents (so if one is totally useless he earns a zero).
To find a1, a2, a3 ..., a simple number-crunching routine usually does the job.
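
A rough sketch of formula (1) and of a brute-force search for the exponents, assuming every expert gives strictly positive probabilities (a zero would break the logarithm) and that past records are stored as hypothetical pairs (list of expert probability vectors, index of the outcome that occurred). The coarse grid and all function names are illustrative choices, not the routine actually used here.

```python
import itertools
import math

def log_linear_pool(forecasts, weights):
    """Pooled probabilities: prob_i proportional to the product of P_k[i] ** a_k."""
    n_outcomes = len(forecasts[0])
    raw = [
        math.prod(f[i] ** a for f, a in zip(forecasts, weights))
        for i in range(n_outcomes)
    ]
    total = sum(raw)  # equals 1 / C, the normalizing constant
    return [r / total for r in raw]

def log_score(history, weights):
    """log of I = F1 * F2 * ... * Fm over the past experiments."""
    return sum(
        math.log(log_linear_pool(forecasts, weights)[winner])
        for forecasts, winner in history
    )

def fit_weights(history, n_experts, grid=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Try every combination of exponents on the grid and keep the best one."""
    return max(
        itertools.product(grid, repeat=n_experts),
        key=lambda w: log_score(history, w),
    )
```

For example, `log_linear_pool([[0.7, 0.3], [0.2, 0.8]], (1, 0))` simply returns the first expert's numbers, since an exponent of zero removes the second expert from the product. A finer grid or a proper optimizer could replace the coarse scan.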

I encountered this problem in 1999 for the first time in sports statistics.
I used formula (1) without knowing the name "logarithmic pool"; it just occurred to me, or maybe I saw a footnote somewhere.
Then in 2010, I think it was, I started this thread about the Indian and the weatherman. Later, in some operations research papers, I found the term "pool" that I use now, plus my older formula derived with Lagrange multipliers. What they seem to do, however, is assume that the log-probabilities add up linearly and hence arrive at formula (1), so there may be other ways too.
The alternative formulation I used (with the Indians) also seems to work well, and some improvement to it may be possible.

Now as for making the forecasters state their rain/no-rain probabilities rather than "it will rain" or "it won't rain", as the last poster suggests, I agree.
It creates difficulties when they don't do that.
In the original problem (Indian and weatherman) I say the Indian has an 85% success rate. So when he says something, we all agree he is right with probability 85%, because we have measured him. But he doesn't want to say "this time I'm 100% sure, or 70% sure, or 90% sure". He just doesn't like to!
The sportscasters are the same. We can't gather them and make them speak in numbers, and if we do reach out to them they will say "no!".
So we try to make the best out of the situation.

maswerte · Nov 19, 2015

The solution of the problem appears to be as I described it for the 2x2 case.
One of course has to take measurements to see what departure there is from the ideal case when the two predictions are not 100% independent.

But what if we have more than 2 predictions and more than 2 predictors ?
If we have n predictors then ok, it is:

Q = P1 . P2 . ... Pn / ( P1 . P2 . ... Pn + (1-P1).(1-P2). ... (1-Pn) )

(a continuation of the simple formula).
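
The n-predictor formula can be written as a short function. This is a sketch assuming full independence, with each P_k read as the probability that predictor k attaches to the event (so a forecaster with hit rate p who predicts the opposite contributes 1 - p). The name `combine_independent` is made up for the example.

```python
import math

def combine_independent(probs):
    """Q = P1*...*Pn / (P1*...*Pn + (1-P1)*...*(1-Pn)) for independent predictors."""
    for_event = math.prod(probs)                     # all statements, event happens
    against_event = math.prod(1 - p for p in probs)  # all statements, event fails
    return for_event / (for_event + against_event)
```

A predictor with P = 0.5 drops out of the result, which matches the L = 0 case of the two-predictor version.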
But what if there are more than 2 predictions ?
It baffles me.

Suppose the first weatherman says "I see rain with probability A1, cloudy with probability A2, and the remaining outcome with probability 1 - A1 - A2", and the second weatherman says "I see rain with probability B1, cloudy with probability B2, and the remaining outcome with probability 1 - B1 - B2". What is the formula now for the case of complete independence?

maswerte · Nov 21, 2015

No, the difficulty here is to approximate q, for the realistic situations.

q' = 0.5 + L . (q - 0.5)

is wrong - good only for two predictions.

Here is a paper that says something:

http://www.math.canterbury.ac.nz/research/ucdms2003n6.pdf

maswerte · Nov 22, 2015

Wrong again.
The theory as I described it works and the above formula is -functionally- correct.
To better the results of the standard theories the problem seems to be the choice of the form of the q' function.

* I'm sorry for writing some random things in my last post; I was a little confused.
