Creating a histogram and then applying a gaussian fit. Help

In summary: You should have a column of D values (the bin centers) and a corresponding column of frequencies (i.e. how many data points fell in each bin). For each bin center, use NORMDIST(x, mean, standard_dev, FALSE) to evaluate the normal density, then multiply it by the total number of data points and by the bin width. This gives the number of points that would fall in that bin if the data were normally distributed. Finally, add these expected counts to the histogram chart as a line series, with D on the x-axis, so that the normal distribution curve is overlaid on the histogram.
  • #1
absolute3
Ok, I need to take the data:

Code:
D1	   D2
6.5	   3
6	   4
6.7	   5.5
7	   3.8
6.3	   4.5
8.6	   5.8
5.5	   4
7	   3.5
7	   4.5
7	   5
6.5	   4
6.8	   3.2
7	   3.6
6	   4.5
6	   2.8

and make a histogram of the data, centered around 0 (i.e. 0 will not be an edge of a bin), and then fit a gaussian to the data.

This is the normal distribution equation:
http://img67.imageshack.us/img67/6014/equationma0.jpg
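The image behind the link is not reproduced in this thread. Judging from post #6, which refers to the parameters "mu", "sigma", and A, it presumably showed the Gaussian in the following form (a reconstruction, not a copy of the original image):

$$ f(x) = A \exp\!\left( -\frac{(x-\mu)^2}{2\sigma^2} \right), \qquad A = \frac{1}{\sigma\sqrt{2\pi}} $$

With this choice of A the curve encloses a total area of 1, i.e. it is a probability density.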

And I am to normalize the data using (D1-D2)/D1

I normally would not ask for help with something like this, but it is for a 1-unit seminar and this was all the information I was given. I have never had any statistics education. I tried looking up how to create a histogram and all of that, but reached a state of utter confusion when it came to "bins."

Can someone at least point me in the right direction?
 
  • #2
After you calculate the "normalized" data, D = 1 - D2/D1, you are to create multiple bins that will make the D's look like the "bell curve." It can help to experiment with different bin sizes to make the histogram as similar to the normal curve as possible. I guess "fitting a gaussian to the data" is one way of saying "calculate the average and the standard deviation for the D's."
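A minimal sketch of this step in Python (the thread itself works in Excel), using the 15 pairs from post #1. The variable names are illustrative. Note that statistics.stdev divides by n - 1, which matches Excel's STDEV (the sample standard deviation).

```python
import statistics

# Data from post #1: pairs of (D1, D2) measurements.
D1 = [6.5, 6, 6.7, 7, 6.3, 8.6, 5.5, 7, 7, 7, 6.5, 6.8, 7, 6, 6]
D2 = [3, 4, 5.5, 3.8, 4.5, 5.8, 4, 3.5, 4.5, 5, 4, 3.2, 3.6, 4.5, 2.8]

# "Normalized" values D = (D1 - D2) / D1, which equals 1 - D2/D1.
d_values = [(a - b) / a for a, b in zip(D1, D2)]

# Estimates of the Gaussian parameters: sample mean and sample standard
# deviation (n - 1 in the denominator), like Excel's AVERAGE and STDEV.
mean_d = statistics.mean(d_values)
sd_d = statistics.stdev(d_values)

for a, b, d in zip(D1, D2, d_values):
    print(f"{a:4} {b:4} -> D = {d:.4f}")
print(f"mean = {mean_d:.4f}, standard deviation = {sd_d:.4f}")
```

For this data every D lies between roughly 0.18 and 0.54, and the mean is about 0.38.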
 
  • #3
EnumaElish said:
After you calculate the "normalized" data, D = 1 - D2/D1, you are to create multiple bins that will make the D's look like the "bell curve." It can help to experiment with different bin sizes to make the histogram as similar to the normal curve as possible. I guess "fitting a gaussian to the data" is one way of saying "calculate the average and the standard deviation for the D's."


What exactly is a bin? And would you recommend doing this in Excel, or is there a superior program available online?
 
  • #4
A bin is a subinterval. Say, your data were randomly distributed between -10 and +10. Then you could represent this as a single bin, which would appear as a single column representing "all data." This would not be a very informative histogram. Alternatively, you could have 20 bins each with unit length, i.e. [-10 to -9], ..., [+9 to +10]. Excel is as good as any for this problem.

See http://en.wikipedia.org/wiki/Histogram
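One way to get bins "centered around 0", as the assignment asks, is to make every bin center a multiple of the bin width w, so that bin k covers [(k - 1/2)w, (k + 1/2)w) and 0 is the middle of bin 0 rather than an edge. The Python sketch below assumes that convention and half-open bins; the function names and sample values are hypothetical.

```python
import math
from collections import Counter

def bin_index(x, width):
    """Index k of the bin [(k - 0.5) * width, (k + 0.5) * width) containing x.

    Bin 0 is centered on 0, so 0 is never a bin edge.
    """
    return math.floor(x / width + 0.5)

def centered_histogram(values, width):
    """Return (bin_center, count) pairs, including empty bins between the
    lowest and highest occupied bins."""
    counts = Counter(bin_index(x, width) for x in values)
    return [(k * width, counts[k]) for k in range(min(counts), max(counts) + 1)]

# Example: unit-width bins, as in the -10..+10 illustration above.
sample = [-0.4, 0.2, 0.49, 0.6, 1.3, -1.7]
for center, count in centered_histogram(sample, 1.0):
    print(f"bin centered at {center:+.1f}: {count}")
```

Changing `width` is the "experiment with different bin sizes" step from post #2: narrower bins give more, shorter bars.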
 
  • #5
EnumaElish said:
A bin is a subinterval. Say, your data were randomly distributed between -10 and +10. Then you could represent this as a single bin, which would appear as a single column representing "all data." This would not be a very informative histogram. Alternatively, you could have 20 bins each with unit length, i.e. [-10 to -9], ..., [+9 to +10]. Excel is as good as any for this problem.

See http://en.wikipedia.org/wiki/Histogram

Last questions methinks:

1. How do I apply the equation for the normal distribution to my data in Excel?
 
  • #6
I suppose you should estimate the mean and standard deviation, and use your estimates as the parameters "mu" and "sigma" in your equation, with A = 1/(sigma * sqrt(2 * pi)).
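The formula in this reply can be checked numerically. This Python sketch uses hypothetical values mu = 0.38 and sigma = 0.1 (placeholders, not fitted estimates) and confirms that with A = 1/(sigma * sqrt(2 * pi)) the curve encloses an area of 1, using a simple Riemann sum over mu ± 8 sigma.

```python
import math

def gaussian(x, mu, sigma):
    """Normal density A * exp(-(x - mu)^2 / (2 sigma^2)) with A = 1 / (sigma * sqrt(2 pi))."""
    a = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return a * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

# Hypothetical parameters; in practice use the mean and standard deviation of D.
mu, sigma = 0.38, 0.1

# Riemann sum of the curve over mu +/- 8 sigma with a fine step.
step = sigma / 100
area = sum(gaussian(mu + k * step, mu, sigma) * step for k in range(-800, 801))
print(f"peak height = {gaussian(mu, mu, sigma):.4f}, area = {area:.6f}")
```

Because the area is 1 while the histogram bars are counts, the curve has to be multiplied by (number of data points) × (bin width) before it is drawn over the histogram.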

By the way, why use (D1-D2)/D1 instead of (D2-D1)/D1? Unless you deliberately want to change the sign of the difference.
 
  • #7
absolute3 said:
Last questions methinks:

1. How do I apply the equation for the normal distribution to my data in Excel?
Use the Excel functions AVERAGE and STDEV to calculate these parameters from the data. Then use NORMDIST to evaluate the normal distribution:

From Excel help:
NORMDIST returns the normal distribution for the specified mean and standard deviation. This function has a very wide range of applications in statistics, including hypothesis testing.

Syntax

NORMDIST(x,mean,standard_dev,cumulative)

X is the value for which you want the distribution.

Mean is the arithmetic mean of the distribution.

Standard_dev is the standard deviation of the distribution.

Cumulative is a logical value that determines the form of the function. If cumulative is TRUE, NORMDIST returns the cumulative distribution function; if FALSE, it returns the probability density function.
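For readers working outside Excel, a rough Python analogue of NORMDIST might look like the following. It is a sketch written from the documented behaviour above, not Excel's implementation, and the name normdist is hypothetical. The cumulative form uses math.erf, and the probability of a value falling in a bin is the difference of two cumulative values.

```python
import math

def normdist(x, mean, standard_dev, cumulative):
    """Python analogue of Excel's NORMDIST(x, mean, standard_dev, cumulative).

    cumulative=True  -> cumulative distribution function P(X <= x)
    cumulative=False -> probability density at x
    """
    if standard_dev <= 0:
        raise ValueError("standard_dev must be positive")
    z = (x - mean) / standard_dev
    if cumulative:
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return math.exp(-0.5 * z * z) / (standard_dev * math.sqrt(2.0 * math.pi))

# Probability of landing in the interval [-1, 1) for a standard normal:
# the difference of two cumulative values.
print(normdist(1.0, 0.0, 1.0, True) - normdist(-1.0, 0.0, 1.0, True))  # about 0.6827
```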
 
  • #8
Alright, I have my histogram, as well as my Standard Deviation and Mean for the data.

My final question is how do I get excel to actually overlay the normal distribution curve over my existing histogram?

Do I have to use NORMDIST to return the normal distribution first? And if so, what does the parameter 'X' represent?
 
  • #9
X is equivalent to your "normalized" variable, which I named D above.
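Putting the pieces together, here is a Python sketch of the overlay computation using the data from post #1. It assumes a bin width of 0.1 (a hypothetical choice) with bins centered on multiples of the width, so 0 is a bin center. It uses statistics.NormalDist (Python 3.8+), whose pdf method plays the role of NORMDIST(x, mean, standard_dev, FALSE), and computes each expected count as n × width × density at the bin center.

```python
import math
from collections import Counter
from statistics import NormalDist

# D = (D1 - D2) / D1 for the 15 pairs in post #1.
pairs = [(6.5, 3), (6, 4), (6.7, 5.5), (7, 3.8), (6.3, 4.5), (8.6, 5.8), (5.5, 4),
         (7, 3.5), (7, 4.5), (7, 5), (6.5, 4), (6.8, 3.2), (7, 3.6), (6, 4.5), (6, 2.8)]
d_values = [(d1 - d2) / d1 for d1, d2 in pairs]

n = len(d_values)
width = 0.1  # hypothetical bin width; bin centers are multiples of it, so 0 is a center
fit = NormalDist.from_samples(d_values)  # sample mean and sample standard deviation

# Observed counts per bin index k, where bin k covers [(k - 0.5)w, (k + 0.5)w).
counts = Counter(math.floor(d / width + 0.5) for d in d_values)

print(" center  observed  expected")
for k in range(min(counts), max(counts) + 1):
    center = k * width
    # Density at the bin center, scaled by n * width, is in the same units
    # as the bar heights (counts per bin).
    expected = n * width * fit.pdf(center)
    print(f"{center:7.2f}  {counts[k]:8d}  {expected:8.2f}")
```

In Excel the equivalent is a third column computing NORMDIST(bin center, mean, standard deviation, FALSE) multiplied by n and the bin width, placed beside the bin centers and counts and added to the histogram chart as a line series. For wide bins, n × (CDF at the upper edge - CDF at the lower edge) is a more exact alternative to the density at the center.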
 

FAQ: Creating a histogram and then applying a gaussian fit. Help

How do I create a histogram?

To create a histogram, you first divide the range of your data into intervals, or bins. Then you count how many data points fall within each bin and plot these frequencies as bars, using graphing software or a spreadsheet program. This gives a visual representation of the distribution of your data.
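A small Python sketch of those two steps: counting values into bins with user-chosen edges, then drawing the counts as a text bar chart. The helper names and example values are made up; bins are assumed half-open, [lower, upper), and values outside the outermost edges are ignored.

```python
from bisect import bisect_right

def histogram_counts(values, edges):
    """Count values in bins [edges[i], edges[i+1]); values outside the edges are ignored."""
    counts = [0] * (len(edges) - 1)
    for x in values:
        i = bisect_right(edges, x) - 1  # index of the last edge <= x
        if 0 <= i < len(counts):
            counts[i] += 1
    return counts

def print_histogram(values, edges, symbol="#"):
    """Draw one text bar per bin, one symbol per data point."""
    for lo, hi, c in zip(edges, edges[1:], histogram_counts(values, edges)):
        print(f"[{lo:5.2f}, {hi:5.2f})  {symbol * c}")

print_histogram([0.1, 0.15, 0.32, 0.35, 0.38, 0.61], [0.0, 0.2, 0.4, 0.6, 0.8])
```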

What is a gaussian fit?

A gaussian fit is the process of choosing the parameters of a gaussian (normal distribution) curve, namely its mean, standard deviation, and height, so that the curve matches a dataset as closely as possible. It is used to describe data that follow a bell-shaped curve, and the fitted mean and standard deviation summarize the central tendency and variability of the data.

Why is it important to apply a gaussian fit to a histogram?

Applying a gaussian fit to a histogram can help you better understand the distribution of the data. The fitted mean and standard deviation summarize its center and spread, and comparing the curve with the bars shows how closely the data follow a bell shape. Bins that lie far from the curve can point to outliers, skew, or other patterns in the data.

How do I apply a gaussian fit to a histogram?

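To apply a gaussian fit to a histogram, you can use graphing software or a spreadsheet program that has a built-in curve-fitting function. Alternatively, you can calculate the parameters of the gaussian yourself (the mean and standard deviation of the data), evaluate the curve at each bin center, scale it by the number of data points times the bin width, and plot it over the histogram.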

Can a gaussian fit be applied to any type of data distribution?

No, a gaussian fit is most commonly used for data that follows a normal distribution, or a bell-shaped curve. If your data does not exhibit this type of distribution, it may not be appropriate to apply a gaussian fit.
