Estimate the source of a variable when there are several distributions

  • #1
uzi kiko
Hello everyone

In my study, I inject a different amount of fluid in each experiment, such as 1 ml, 2 ml..., and test the change in the general dielectric properties of the solution.

Now that I have made many (over 100) measurements for each injection volume, one can see that for each volume the measured dielectric values follow a normal distribution with a specific mean and variance.

For example, for an injection of 1 ml the results are distributed with μ1 and σ1, and for an injection of 2 ml with μ2 and σ2. There is some overlap between the various distributions.

Now suppose I get an X value of dielectric property, what is the correct way to estimate the probability that the value X came from a specific injection volume?

Seemingly, I could use the Z test for each distribution separately, but it seems to me that the Z test does not take into account the other distributions.

Thank you!
 
  • #2
This sounds like an application best suited for maximum likelihood
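
A minimal sketch of the maximum-likelihood idea, assuming each injection volume has already been summarized by a fitted normal distribution. The calibration numbers and the names `calibration` and `most_likely_volume` are illustrative assumptions, not values from the experiment. Note that this only picks the most likely volume; it does not by itself give a probability.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of the normal distribution N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

# Hypothetical calibration results: volume in ml -> (mean, std) of the readings.
calibration = {1.0: (0.0, 1.0), 2.0: (1.0, 1.0)}

def most_likely_volume(x, calibration):
    """Return the volume whose fitted distribution gives x the highest density."""
    return max(calibration, key=lambda v: normal_pdf(x, *calibration[v]))

print(most_likely_volume(0.2, calibration))  # closer to the 1 ml mean
print(most_likely_volume(1.1, calibration))  # closer to the 2 ml mean
```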
 
  • #3
Maybe Bayes?
 
  • #4
WWGD said:
Maybe Bayes?
Good idea, particularly if there is some prior information about which volumes are more probable.
 
  • #5
I'm always in favor of Bayesian inference if you figure out how to do it. Gaussian distributions give you priors that are fairly easy to manipulate. If not, and because I'm pretty sure that your process is linear, then a minimum variance estimator is a good fallback. Look up BLUE (Best Linear Unbiased Estimator) as used in, e.g., "spectral unmixing."
 
  • #6
Thank you very much for the super fast and great answers!
They are very helpful.

Let's say that I used MLE and came out with the most likely model.
For example, let's say that for an injection of 1 ml the values are distributed as X~N(0,1) and for 2 ml as X~N(1,1).

Now, after calculating the MLE for the newly observed data, I have estimates of μ and σ; let's say for the example that I get μ=1.1 and σ=0.9.

My question is, how can I calculate the probability that the observed data came from N(1,1)?
 
  • #7
Assuming the problem is linear, applying linear regression to the mean values gives you a first-order estimate. In the example above, the estimated volume would be 2.1 ml (not N(1,1)), and confidence (probability) can be inferred from the correlation coefficient. You need to first verify from your data that your system is linear.
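
The regression estimate above can be sketched as follows, assuming the two calibration points from post #6 (mean reading 0 at 1 ml, mean reading 1 at 2 ml). The helper names `fit_line` and `estimate_volume` are made up for illustration. With only two calibration points the line fits exactly, so the correlation coefficient carries no information here; a real check of linearity would need the means from more injection volumes.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Calibration points taken from the example: mean reading at each volume.
volumes_ml = [1.0, 2.0]
mean_readings = [0.0, 1.0]

slope, intercept = fit_line(volumes_ml, mean_readings)

def estimate_volume(reading):
    """Invert the fitted line to map a reading back to a volume in ml."""
    return (reading - intercept) / slope

print(round(estimate_volume(1.1), 3))  # 2.1
```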

To get a true probability, you could try applying Bayesian inference. There are other tests as well, but someone versed in classical statistics would know better than I.
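
As a rough sketch of what that Bayesian calculation could look like for the example in post #6: given a set of new observations, compute the log-likelihood of the whole set under each candidate distribution, add the log prior, and normalize. The sample values, the equal priors, and the function names are assumptions made for illustration only. The calculation uses the raw observations rather than the fitted μ=1.1 and σ=0.9.

```python
import math

def log_likelihood(sample, mu, sigma):
    """Log-likelihood of independent draws from N(mu, sigma^2)."""
    n = len(sample)
    sq = sum((x - mu) ** 2 for x in sample)
    return -n * math.log(sigma * math.sqrt(2.0 * math.pi)) - sq / (2.0 * sigma ** 2)

def posterior(sample, models, priors):
    """Posterior probability of each candidate model given the whole sample."""
    log_post = {k: math.log(priors[k]) + log_likelihood(sample, *models[k])
                for k in models}
    shift = max(log_post.values())  # subtract the max for numerical stability
    weights = {k: math.exp(v - shift) for k, v in log_post.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}

models = {"1 ml": (0.0, 1.0), "2 ml": (1.0, 1.0)}  # (mean, std) from post #6
priors = {"1 ml": 0.5, "2 ml": 0.5}                # assumed equal priors
sample = [1.3, 0.2, 2.0, 0.9]                      # made-up observations

print(posterior(sample, models, priors))
```

If some volumes are known to be injected more often than others, changing `priors` shifts the result accordingly.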
 
  • #8
uzi kiko said:
In my study, I inject a different amount of fluid in each experiment, such as 1 ml, 2 ml..., and test the change in the general dielectric properties of the solution.

Now suppose I get an X value of dielectric property, what is the correct way to estimate the probability that the value X came from a specific injection volume?

This is not a "well defined" mathematical question. (It's similar to asking how one finds the remaining sides of a triangle when given the length of one side and the size of one angle.)

One way to interpret "I get an X value of a dielectric property" is that you pick the X value at random from all the X values that you measured in your experiments, giving all the measured values an equal probability of being chosen. Let's suppose you did the same number of experiments with a 2 ml concentration as with the other concentrations.

Another way to interpret "I get an X value of a dielectric property" is that another person who did experiments tells you the X value. He picks this value at random from the experiments that he did. Perhaps he did many more tests using a 2 ml concentration than the other concentrations.

The answer to "the probability the value X came from the specific injection volume" depends on the probability distribution by which the injection volumes are selected. If you can specify that distribution (a "prior distribution"), then we can compute things like "the probability that the injection volume is 2 ml given X = 3.1".
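
To make the dependence on the prior concrete, here is a worked example using Bayes' theorem. The symbol π_k (the prior probability of volume v_k) and the restriction to two volumes are assumptions made for illustration; the distributions N(0,1) for 1 ml and N(1,1) for 2 ml are borrowed from post #6.

```latex
P(V = v_k \mid X = x)
  = \frac{\pi_k \, \mathcal{N}(x;\, \mu_k, \sigma_k^2)}
         {\sum_j \pi_j \, \mathcal{N}(x;\, \mu_j, \sigma_j^2)}

% Likelihood ratio at x = 3.1 for N(1,1) versus N(0,1):
\frac{\mathcal{N}(3.1;\, 1, 1)}{\mathcal{N}(3.1;\, 0, 1)}
  = \exp\!\left(\frac{3.1^2 - 2.1^2}{2}\right) = e^{2.6} \approx 13.46

% Equal priors, \pi_1 = \pi_2 = 0.5:
P(2\,\text{ml} \mid X = 3.1) = \frac{13.46}{1 + 13.46} \approx 0.93

% Unequal priors, \pi_1 = 0.9,\ \pi_2 = 0.1:
P(2\,\text{ml} \mid X = 3.1) = \frac{0.1 \times 13.46}{0.9 + 0.1 \times 13.46} \approx 0.60
```

The same measured value gives quite different answers under the two priors, which is why the prior has to be specified before the question has a definite answer.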
 

FAQ: Estimate the source of a variable when there are several distributions

How do I determine the source of a variable when there are several distributions?

To estimate the source of a variable when there are several candidate distributions, you can compare the likelihood of the observed value under each distribution (maximum likelihood). If you also know how probable each source is beforehand, you can combine those prior probabilities with the likelihoods using Bayes' theorem to get the probability of each source given the observation.

What factors should I consider when estimating the source of a variable?

When estimating the source of a variable, it is important to consider the characteristics of the distributions, such as their shape, spread, and central tendency. You should also consider the size of the samples and any potential confounding variables that may affect the relationship between the variable and the distributions.

Can I use visualizations to estimate the source of a variable?

Yes, visualizations such as histograms, box plots, and scatter plots can be useful in estimating the source of a variable. These visualizations can help you compare the distributions and identify any patterns or relationships between the variable and the distributions.

How can I determine if a variable is influenced by multiple distributions?

If a variable is influenced by multiple distributions, you may see overlapping or intersecting distributions in your visualizations. Additionally, statistical tests such as ANOVA or multiple regression can help you determine the extent to which each distribution contributes to the variability of the variable.

Are there any limitations to estimating the source of a variable when there are several distributions?

Yes, there are several limitations to consider when estimating the source of a variable with multiple distributions. These include the assumption of normality, the presence of outliers, and the potential for confounding variables. It is important to carefully consider these limitations and use appropriate methods to address them in your analysis.
