Binned Maximum Likelihood fit in python?

In summary, the conversation revolved around using Python for least-squares fits and the desire to start using likelihood methods for fitting binned and unbinned data. The person found some documentation in Scipy but had difficulty making it work for a simple exponential. They also mentioned finding code on stackexchange and asking for any functionality in Python equivalent to curve_fit from Scipy for binned/unbinned likelihood fits. The expert provided guidance on using Scipy's rv_continuous class for maximum likelihood curve model fitting and suggested looking into a course or book on statistical methods with Python. The conversation ended with the person stating they were able to get it to work after normalizing the pdf and expressing their hope to continue using it.
  • #1
ORF
TL;DR Summary
Is there any built-in function to perform Binned Maximum Likelihood fit in python standard libraries?
Hi,

I have been using Python for a while now, but so far only for least-squares fits using curve_fit from SciPy.

I would like to start using the likelihood method to fit binned and unbinned data. I found some SciPy documentation on how to implement an unbinned likelihood fit, but I have not managed to make it work for a simple exponential...

Unbinned likelihood fit:
from scipy.stats import rv_continuous
import numpy as np

class myfunc_gen(rv_continuous):

    "Exp distribution"

    def _pdf(self, x,a):

        return np.exp(x*a)

myfunc = myfunc_gen(name='exp')

a = 1.
x = myfunc.rvs(a, size=10)
a1, loc1, scale1 = myfunc.fit(x, a, floc=0, fscale=1)

I found that Pandas has some fitting capabilities, but they are still quite limited.

Question: is there any functionality in python equivalent to curve_fit from Scipy for Binned/Unbinned likelihood fits?

Thank you for your time.

Cheers,
ORF
 
  • #3
This is exactly what scipy.stats.rv_continuous.fit is for. Saying 'it doesn't work' is not going to find a solution; you need to be more specific:

Was the result not what you expected? Was it close but not accurate enough? Did it fail to execute? Does it run too slowly? Did your computer catch fire?
 
  • #5
Hi,

pbuk said:
Was the result not what you expected? Was it close but not accurate enough? Did it fail to execute? Does it run too slowly? Did your computer catch fire?

In this case the result is wrong.
For other simple fitting functions it either complains that convergence is very slow, or it reaches 100 iterations and stops without converging. Probably there is something wrong with my code.

Still, this method is for unbinned data. Is there any method for binned data?

Thank you for your time.

Cheers,
ORF
 
  • #6
ORF said:
In this case the result is wrong ... Probably there is something wrong with my code.
There is quite a lot wrong with your code. Take a look at how the expon distribution is defined in scipy's source. The key parts are:
Python:
    def _pdf(self, x):
        # expon.pdf(x) = exp(-x)
        return np.exp(-x)
Note that there is no scale parameter in there, _pdf must be defined with a scale factor of 1: you add the scale factor when creating an instance of the class or when calling its methods.

And note that the exponential PDF is not ## e^x ##.

Python:
expon = expon_gen(a=0.0, name='expon')
The exponential distribution is supported in ## [0, \infty) ## but the default support for a rv_continuous distribution is ## (-\infty, \infty) ##. This can be overridden with the (unhelpfully) named parameters a and b: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.rv_continuous.html

ORF said:
Still, this method is for unbinned data. Is there any method for binned data?
rv_histogram.fit?

I think you probably need a course or a book on statistical methods with Python, unfortunately I can't recommend any.
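
For reference, here is a minimal sketch of how the points above might be combined into a working unbinned fit. It is not the code from SciPy's source: the class name exp_rate_gen, the shape parameter lam, and the sample size are assumptions made for illustration. The pdf is normalized (## \lambda e^{-\lambda x} ##), the support is set with a=0.0, and loc and scale are held fixed so that only the shape parameter is fitted.

```python
import numpy as np
from scipy.stats import rv_continuous


class exp_rate_gen(rv_continuous):
    """Exponential distribution with a rate shape parameter (hypothetical name).

    pdf(x; lam) = lam * exp(-lam * x) for x >= 0, which integrates to 1.
    """

    def _pdf(self, x, lam):
        return lam * np.exp(-lam * x)

    def _cdf(self, x, lam):
        return 1.0 - np.exp(-lam * x)

    def _ppf(self, q, lam):
        # Inverse cdf, so that rvs() can sample without numerical inversion
        return -np.log1p(-q) / lam


# a=0.0 restricts the support to [0, inf)
exp_rate = exp_rate_gen(a=0.0, name="exp_rate")

rng = np.random.default_rng(12345)
true_lam = 2.0
x = exp_rate.rvs(true_lam, size=5000, random_state=rng)

# 1.0 is the starting guess for lam; loc and scale are fixed, not fitted
lam_hat, loc_hat, scale_hat = exp_rate.fit(x, 1.0, floc=0, fscale=1)
print(lam_hat)
```

For an exponential the maximum likelihood estimate of the rate is 1 / mean(x), which gives a quick check on the fitted value.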
 
  • #8
Thanks, but that example is exactly the same as the one in the documentation (replacing the beta distribution with an exponential). I would like to do unbinned and binned likelihood fits using a custom pdf/fitting function.

Thank you for your time.
 
  • #9
pbuk said:
There is quite a lot wrong with your code. Take a look at how the expon distribution is defined in scipy's source. The key parts are:
Python:
    def _pdf(self, x):
        # expon.pdf(x) = exp(-x)
        return np.exp(-x)
Note that there is no scale parameter in there, _pdf must be defined with a scale factor of 1: you add the scale factor when creating an instance of the class or when calling its methods.

And note that the exponential PDF is not ## e^x ##.

Python:
expon = expon_gen(a=0.0, name='expon')
The exponential distribution is supported in ## [0, \infty) ## but the default support for a rv_continuous distribution is ## (-\infty, \infty) ##. This can be overridden with the (unhelpfully) named parameters a and b: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.rv_continuous.html

rv_histogram.fit?

I think you probably need a course or a book on statistical methods with Python, unfortunately I can't recommend any.
Hi, thanks for your very complete explanation. After normalizing the pdf it converges nicely.

I hope I can continue from it.

Thanks,
ORF
 
  • #10
ORF said:
I hope I can continue from it.
Do let us know how you get on with rv_histogram.fit for your binned data - I have never used it, but it looks like it should work similarly to the continuous fit once configured properly.
 

FAQ: Binned Maximum Likelihood fit in python?

What is a binned maximum likelihood fit?

A binned maximum likelihood fit is a statistical method for estimating the parameters of a probability distribution from histogrammed data. The data are divided into bins, the assumed distribution predicts the expected count in each bin, and the parameters are chosen to maximize the likelihood of the observed bin counts, which are typically treated as Poisson or multinomial.
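
As a sketch of what is being minimized, assume independent Poisson counts in each bin. The symbols here are notation introduced for illustration: ## n_i ## is the observed count in bin ## i ##, ## \mu_i(\theta) ## the expected count, ## f(x;\theta) ## the normalized pdf, ## N ## the expected total number of events, and ## [x_i, x_{i+1}] ## the bin edges.

$$ -\ln L(\theta) = \sum_{i=1}^{N_\text{bins}} \left[ \mu_i(\theta) - n_i \ln \mu_i(\theta) \right] + \text{const}, \qquad \mu_i(\theta) = N \int_{x_i}^{x_{i+1}} f(x;\theta)\,dx $$

The constant collects the ## \ln n_i! ## terms, which do not depend on ## \theta ## and so do not affect the position of the minimum.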

How is a binned maximum likelihood fit performed in python?

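In Python, scipy.optimize.curve_fit is not a likelihood fit: it performs a least-squares fit. A binned maximum likelihood fit can instead be performed by writing the negative log-likelihood of the bin counts for the assumed distribution and minimizing it with a general-purpose optimizer such as scipy.optimize.minimize, which returns the estimated parameters.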
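
A minimal sketch of such a fit, assuming an exponential model and a Poisson likelihood for the bin counts, is shown below. The function name binned_nll, the binning, and the sample are choices made for illustration, not an established recipe. The expected total is fixed to the number of observed events in the fit range, so only the rate is fitted.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import expon

# Toy sample drawn from an exponential with a known rate
rng = np.random.default_rng(2024)
true_lam = 2.0
data = rng.exponential(scale=1.0 / true_lam, size=5000)

# Histogram the data on a fixed range
edges = np.linspace(0.0, 3.0, 31)
counts, _ = np.histogram(data, bins=edges)
n_in_range = counts.sum()


def binned_nll(params):
    """Poisson negative log-likelihood of the bin counts, up to a constant."""
    lam = params[0]
    if lam <= 0:
        return np.inf
    cdf = expon.cdf(edges, scale=1.0 / lam)
    # Probability of each bin, renormalized to the fitted range
    probs = np.diff(cdf) / (cdf[-1] - cdf[0])
    expected = n_in_range * probs
    return np.sum(expected - counts * np.log(expected))


result = minimize(binned_nll, x0=[1.0], method="Nelder-Mead")
lam_hat = result.x[0]
print(lam_hat)
```

Bin probabilities are taken from differences of the cdf rather than from the pdf at the bin centres, which avoids a bias when the pdf varies appreciably across a bin.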

What are the advantages of using a binned maximum likelihood fit?

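One advantage of using a binned maximum likelihood fit is that it treats the bin counts with Poisson (or multinomial) statistics, so it remains valid for bins with few or zero entries, where a least-squares fit with Gaussian uncertainties becomes unreliable. For large data sets it is also much cheaper to evaluate than an unbinned fit, because the likelihood is a sum over bins rather than over individual events.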

Are there any limitations to using a binned maximum likelihood fit?

One limitation of using a binned maximum likelihood fit is that binning discards the information about where each event falls inside its bin, so the result can depend on the choice of binning, and for small samples an unbinned fit is usually preferable. It also assumes that the data is independent and identically distributed.

How can the results of a binned maximum likelihood fit be interpreted?

The results of a binned maximum likelihood fit can be interpreted as the estimated parameters of the assumed distribution. These parameters can be used to make predictions or further analyze the data. It is also important to assess the goodness of fit, such as with a chi-square test, to determine the validity of the fit.
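
A chi-square check of this kind could be sketched as follows. The helper name pearson_chi2 and the counts below are made up for illustration, and the number of degrees of freedom assumes that the expected total was fixed to the observed total. The chi-square approximation is only reliable when the expected counts are not too small.

```python
import numpy as np
from scipy.stats import chi2


def pearson_chi2(counts, expected, n_fitted):
    """Pearson chi-square, degrees of freedom and p-value for binned data.

    Assumes the expected counts were normalized to the observed total,
    which removes one degree of freedom in addition to the fitted parameters.
    """
    counts = np.asarray(counts, dtype=float)
    expected = np.asarray(expected, dtype=float)
    chi2_value = np.sum((counts - expected) ** 2 / expected)
    ndof = counts.size - 1 - n_fitted
    p_value = chi2.sf(chi2_value, ndof)
    return chi2_value, ndof, p_value


# Made-up example: observed counts against the expectation from a
# hypothetical one-parameter fit
observed = [52, 38, 25, 19, 16]
predicted = [50.0, 37.0, 27.5, 20.4, 15.1]
chi2_value, ndof, p_value = pearson_chi2(observed, predicted, n_fitted=1)
print(chi2_value, ndof, p_value)
```

A very small p-value suggests the assumed distribution does not describe the data, whatever the fitted parameter values are.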
