Improving intuition on applying the likelihood ratio test

In summary, the likelihood ratio test compares how probable the data are under one hypothesis versus another. Working with the log-likelihood makes the comparison easier to follow: the test statistic derived from it approximately follows a Chi-Square distribution, and a higher likelihood for a hypothesis means more evidence in its favor. The MLE is an optimization problem: find the parameter values that make the observed sample most probable. The probability distributions involved can be estimated from the data, constructed from assumptions, or derived from first principles about the process itself, and the CLT underlies much of normal-distribution statistics.
  • #1
TheCanadian
I am trying to better understand the likelihood ratio test and have found a few helpful resources that explicitly solve problems, but I was curious whether you have any more to recommend - links that perhaps work out full problems and also nicely explain the theory. Similar links you have found illuminating for the Wald and Lagrange multiplier tests would also be of much interest!
 
  • #2
Hey TheCanadian.

The likelihood ratios are just probabilities with respect to each other. You have one probability for one hypothesis [given the data] and another probability for the other hypothesis [given the data].

It is easier to work with the log-likelihood and to understand how the logarithm changes as the probability changes.

You should find that as the probability decreases, the negative of the log-likelihood increases, so you get a large chi-squared statistic, which means the model is unlikely to fit the data you have.

Just remember that the probability is a probability of getting a particular parameter estimate given the sample data [you take sample data and you estimate a parameter based on that sample data].

The Chi-Square distribution is a statistical result of the log-likelihood, but the intuition behind interpreting the actual value is that higher probabilities correspond to a better likelihood of a hypothesis being true [or at least more evidence that it is], and you are in essence using the two probabilities to compare [relatively] just how much better one hypothesis is with respect to the other.

If the log-likelihood is confusing, then just think about when the different probabilities are greater or less than each other and how each is either close to zero or close to one.
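As a rough illustration of this point, here is a minimal sketch (a toy coin-flip example, not taken from the thread or the linked resources): as the observed data become less probable under the null hypothesis, the likelihood ratio shrinks and the chi-squared statistic ##-2\log\Lambda## grows.

Code:
import numpy as np
from scipy.stats import binom, chi2

n = 100                              # number of coin flips
for heads in (52, 60, 70, 85):       # observations increasingly far from H0: p = 0.5
    p_hat = heads / n                # MLE of p under the unrestricted alternative
    L0 = binom.pmf(heads, n, 0.5)    # likelihood of the data under H0
    L1 = binom.pmf(heads, n, p_hat)  # likelihood of the data under the alternative
    stat = -2 * np.log(L0 / L1)      # approximately chi-squared with 1 df
    print(heads, stat, chi2.sf(stat, df=1))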
 
  • #3
chiro said:
The likelihood ratios are just probabilities with respect to each other. You have one probability for one hypothesis [given the data] and another probability for the other hypothesis [given the data]. [...]

Thank you for the response. I guess my questions largely lie in how one constructs the probability distributions themselves. For example, in the first link, they state that the maximum likelihood estimate of ##\mu## is given by ## L(\overline{x}) = \prod_{i=1}^{n} \frac {1}{\sqrt{2\pi \overline{\sigma}^2}} e^{-\frac{(x_i -\overline{x})^2}{2\overline{\sigma}^2}} ##, although why these are valid maximum likelihood estimates is not very clear to me.
 
  • #4
The MLE is an optimization problem: you find the parameter value for which the probability of the sample data is greatest.

The probability distributions are either estimated from the data, constructed from assumptions, or are standard distributions whose properties are studied in statistical inference.
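A quick numerical check of that idea (toy data with an assumed known ##\sigma##, my own example rather than the one in the link): the normal log-likelihood of a sample, viewed as a function of the mean parameter, is largest at the sample mean, which is why ##\overline{x}## is the MLE of ##\mu##.

Code:
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x = rng.normal(loc=10.0, scale=2.0, size=50)   # simulated sample data
sigma = 2.0                                    # sigma treated as known

def log_likelihood(mu):
    # joint log-density of the sample for a candidate value of mu
    return np.sum(norm.logpdf(x, loc=mu, scale=sigma))

grid = np.linspace(8.0, 12.0, 4001)
best = grid[np.argmax([log_likelihood(m) for m in grid])]
print(best, x.mean())                          # the two values agree (to grid resolution)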
 
  • #5
TheCanadian said:
I guess my questions largely lie in how one constructs the probability distributions themselves.

Previous experience suggests that the volume ##X## (in fluid ounces) of a randomly selected jar of the company's honey is normally distributed with a known variance of 2.
 
  • #6
Do you know the Central Limit Theorem? This will be useful to understand a lot of normal distribution statistics.

With MLE you start out with a likelihood function that is either derived or flat-out assumed. The derivation is done from first principles of probability modeling [a good example is a binomial distribution for counts of independent events, or a Poisson for rates].

You will need to give us more information for us to assess how the likelihood is derived - whether it uses a first-principles approach or is just assumed.
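A small simulation of the CLT referred to above (the distribution and sample sizes are my own arbitrary choices): sample means of a strongly skewed distribution look more and more normal as the sample size grows, which is what justifies using the Normal for large-sample inference.

Code:
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(1)
for n in (2, 10, 100):
    # 20000 replications of the mean of n exponential(1) observations
    means = rng.exponential(scale=1.0, size=(20000, n)).mean(axis=1)
    # skewness of the sampling distribution shrinks toward 0, the value for a Normal
    print(n, skew(means))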
 
  • #7
chiro said:
Do you know the Central Limit Theorem? This will be useful to understand a lot of normal distribution statistics. [...]

I am aware of the Central Limit Theorem. So it appears you assume a model and continue adjusting/adding parameters such that your model matches observations?
 
  • #8
For MLE you assume that every sample point follows a distribution, use that to construct the likelihood, and then find the parameter values [that you are estimating] that maximize it.

It's a lot like maximizing a cost function or some other attribute - here you are optimizing the probability value given a sample with respect to a parameter you are estimating.

I mention the CLT because it says that given enough information you can approximate the sampling distribution of many estimators by a Normal distribution, and large-sample statistics assume this and use the Normal for inference.

The likelihood is often chosen by thinking about the process itself and deriving a likelihood function based on its attributes. You can also just estimate the distribution and update it from the data, but that will lack the grounding of a first-principles approach, since in the latter you deduce the likelihood from beliefs and ideas that give context to the data, as opposed to just taking the data and using it by itself.
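As a sketch of that first-principles route (a hypothetical count data set of my own, not from the thread): model counts of independent events as Poisson, write down the likelihood that assumption implies, and maximize it; the numerical optimum matches the known closed-form answer, the sample mean.

Code:
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import poisson

counts = np.array([3, 7, 4, 6, 5, 2, 8, 5])      # hypothetical event counts per interval

def neg_log_likelihood(lam):
    # negative joint log-probability of the counts under a Poisson(lam) model
    return -np.sum(poisson.logpmf(counts, lam))

result = minimize_scalar(neg_log_likelihood, bounds=(0.01, 50.0), method="bounded")
print(result.x, counts.mean())                    # MLE of the rate equals the sample mean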
 
  • #9
chiro said:
[...] I mention the CLT because it says that given enough information you can approximate the sampling distribution of many estimators by a Normal distribution, and large-sample statistics assume this and use the Normal for inference.
How do you approximate anything other than the sampling mean with the CLT?
 
  • #10
It's important to realize that the word "likelihood" is used deliberately: "likelihood" is not the same thing as "probability". When ##f(x)## is a probability density function, its evaluation ##f(a)## at a number ##x = a## is not a probability. The value ##f(a)## is a probability density. That is what "likelihood" means.

TheCanadian said:
For example, in the first link, they state that the maximum likelihood estimate of ##\mu## is given by ## L(\overline{x}) = \prod_{i=1}^{n} \frac {1}{\sqrt{2\pi \overline{\sigma}^2}} e^{-\frac{(x_i -\overline{x})^2}{2\overline{\sigma}^2}} ##, although why these are valid maximum likelihood estimates is not very clear to me.
Let ##g(y) = \prod_{i=1}^{n} \frac {1}{\sqrt{2\pi \overline{\sigma}^2}} e^{-\frac{(x_i -y)^2}{2\overline{\sigma}^2}} ##. What value of ##y## maximizes ##g(y)##? Is it clear that this is the mathematical question? As to why the answer is ##y_{max} = \frac{ \sum_{i=1}^n x_i}{n}##, it isn't a conclusion from a general principle of some sort. The answer comes from doing the math to maximize the particular function ##g(y)##. We could try to work out that math, if that is your question.

Or are you asking why ##g(y)## is the joint probability density for the measured data?

Hypothesis tests are subjective. A subjective line of thinking about the maximum likelihood test is that we should not reject the null hypothesis about a parameter value unless there is a different parameter value that makes the data a lot more probable. Since "likelihood" doesn't mean "probability", we must be careful in applying this intuition to probability density functions that are multi-modal or that take on maximum values at some number ##x = a## and then fall off sharply around ##x = a##. When such things happen, the maximum likelihood at ##x = a## isn't a good representation of the probability that the random variable is approximately equal to ##a##.
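For reference, the maximization mentioned above is straightforward calculus (treating ##\overline{\sigma}## as known): $$\log g(y) = -n\log\left(\overline{\sigma}\sqrt{2\pi}\right) - \frac{1}{2\overline{\sigma}^{2}}\sum_{i=1}^{n}(x_i - y)^2, \qquad \frac{d}{dy}\log g(y) = \frac{1}{\overline{\sigma}^{2}}\sum_{i=1}^{n}(x_i - y) = 0 \;\Longrightarrow\; y_{max} = \frac{\sum_{i=1}^{n} x_i}{n},$$ and the second derivative ##-n/\overline{\sigma}^{2}## is negative, so this critical point is indeed the maximum.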
 
  • #11
My understanding is that the likelihood function is the density of the sample data as a function of the (unknown) population parameters ##\theta_1, \theta_2,..,\theta_k##, i.e., ## L(x_1, x_2,..,x_n; \theta_1, \theta_2,..,\theta_k)=P(X_1=x_1,...,X_n=x_n \mid \theta_1, \theta_2,..,\theta_k) ##, and estimators obtained this way have nice properties, e.g. being asymptotically unbiased and having small variance. I believe that in OLS, if the errors (residuals) are IID normal ##(0, \sigma^2)##, then the coefficients are the maximum likelihood estimators of the regression line.
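A quick numerical illustration of that last point (a toy regression with simulated data and normal errors, assumptions of my own): the ordinary least-squares coefficients coincide with the values that maximize the normal likelihood of the residuals.

Code:
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(2)
x = np.linspace(0.0, 10.0, 60)
y = 1.5 + 0.8 * x + rng.normal(scale=1.0, size=x.size)   # simulated data, normal errors
X = np.column_stack([np.ones_like(x), x])                 # design matrix: intercept + slope

beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)          # ordinary least squares fit

def neg_log_likelihood(beta):
    # negative normal log-likelihood of the residuals (error sd treated as known)
    return -np.sum(norm.logpdf(y - X @ beta, scale=1.0))

beta_mle = minimize(neg_log_likelihood, x0=np.zeros(2)).x
print(beta_ols, beta_mle)                                  # essentially identical coefficients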
 

FAQ: Improving intuition on applying the likelihood ratio test

1. What is the likelihood ratio test and how does it work?

The likelihood ratio test is a statistical method used to compare two competing statistical models. It is based on the likelihood function, which is a measure of how well a given model fits the data. The test compares the likelihood of the data under the null hypothesis (a simpler model) to the likelihood of the data under the alternative hypothesis (a more complex model). If the likelihood of the data under the alternative model is significantly higher, the null hypothesis is rejected in favor of the alternative model.
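Here is a minimal worked sketch of that procedure (a toy example of my own, with a known error standard deviation assumed): testing ##H_0: \mu = 0## against an unrestricted mean for normal data, and converting the likelihood ratio statistic into a p-value using a chi-squared distribution with one degree of freedom.

Code:
import numpy as np
from scipy.stats import norm, chi2

rng = np.random.default_rng(3)
x = rng.normal(loc=0.4, scale=1.0, size=40)              # data actually generated with mu = 0.4

logL0 = np.sum(norm.logpdf(x, loc=0.0, scale=1.0))       # log-likelihood under H0: mu = 0
logL1 = np.sum(norm.logpdf(x, loc=x.mean(), scale=1.0))  # log-likelihood at the MLE (sample mean)
stat = -2 * (logL0 - logL1)                               # likelihood ratio test statistic
p_value = chi2.sf(stat, df=1)                             # one extra free parameter -> 1 df
print(stat, p_value)                                      # reject H0 at the 5% level if p_value < 0.05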

2. How can one improve their intuition on applying the likelihood ratio test?

To improve intuition on applying the likelihood ratio test, it is important to have a strong understanding of basic statistical concepts such as hypothesis testing, null and alternative hypotheses, and p-values. It is also helpful to practice using the test on different datasets and comparing the results to other statistical methods. Additionally, reading case studies and examples of the likelihood ratio test being applied in various fields can help build intuition.

3. What are the assumptions of the likelihood ratio test?

The likelihood ratio test assumes that the data follow a specific probability distribution, such as the normal distribution. It also assumes that the observations are independent, meaning that the value of one observation does not affect the value of another. Additionally, the test assumes that the models being compared are nested, meaning that one model is a simplified version of the other.

4. When should the likelihood ratio test be used?

The likelihood ratio test is typically used when comparing two nested models, where one model is a simplified version of the other. It is commonly used in regression analysis, where a simple linear model is compared to a more complex model with additional variables. The test can also be used in other statistical analyses, such as comparing different probability distributions or comparing the fit of different machine learning models.

5. How can one interpret the results of the likelihood ratio test?

The results of the likelihood ratio test are typically reported as a p-value, which gives the probability of obtaining data at least as extreme as what was observed, assuming the null hypothesis is true. A smaller p-value indicates stronger evidence against the null hypothesis and in favor of the alternative hypothesis. In general, if the p-value is less than a predetermined significance level (usually 0.05), the null hypothesis is rejected and the alternative hypothesis is supported. Equivalently, the likelihood ratio statistic itself can be compared to a critical value of the appropriate chi-squared distribution to determine the significance of the results.
