Working out Trend line on a Scatter Graph

  • MHB
  • Thread starter masterims
  • Start date
  • Tags
    Graph Line
In summary: Solver add-in) 6. Set the target to the sum-of-squares cell. 7. Set the changing cell to the cell containing $k$. 8. Hit Solve. 9. Read the value of $k$ from the changing cell. 10. Use that value of $k$ in the formula. 11. Copy the formula down for all of the rows. 12. Now, create a scatter plot of the data, and add a scatter plot of the calculated values. 13. Add a trendline to the calculated values, and make sure the equation is displayed. 14. Voila!In summary, the conversation discusses the process of finding the line of best fit or trend line for a
  • #1
masterims
4
0
Hi All,

Thank you for taking your time to look at my question. I am a web developer and have created a online system that produces graphs but cannot understand how the line of best fit (trend line) is calculated. On paper its easy, just draw it with your hand in between the two lines. But I don't know how to work this out using the values I have.

My Task:
Workout the flow rate (Q) of a pumping station. In other words find out how fast the water reduces in a well from the top to bottom.

Physical Process:
The engineer records the water level at the top of the well. At set intervals he/she records how far the water goes down until it reaches a certain point at the bottom.

My Working out:
I have the time and water level from the engineer. Using this information I work out the change in time at every level recorded and calculate the flow rate. I know how to do this, that's all done.

Now I need to plot flow rate on the x-axis and water level on the y-axis. Once plotted it looks like this:

View attachment 2733

The only thing I am stuck on is drawing the trend line (line of best fit).
How can I do this using the level and flow rate values I have? Please help me?

Here is the data for the graph which I worked out using time and level.

Test 1
PumpRate... Level (mAOD)
775.514 ... 2.098
395.693 ... 2.077
265.949 ... 2.035
203.317 ... 2.013
163.946 ... 1.971
138.37 ... 1.938
120.358 ...1.909
106.457 ...1.873
95.097 ...1.826
86.054 ...1.78
78.778 ...1.737
72.565 ...1.69
68.755 ...1.685
64.85 ...1.66
61.943 ...1.651
59.652 ...1.651
57.419 ...1.643
55.136 ...1.623
52.763 ...1.589
50.358 ...1.543
47.734 ...1.476
45.491 ...1.416
41.943 ...1.279
38.523 ...1.133
34.838 ...0.957
31.368 ...0.777
28.238 ...0.602
26.57 ...0.503
25.187 ...0.418
25.063 ...0.41
25.283 ...0.424
25.283 ...0.424
25.283 ...0.424Test 2
PumpRate ... Level (mAOD)
785.623 ...2.089
399.303 ...2.057
263.957 ...1.981
201.55 ...1.952
161.327 ...1.892
135.408 ...1.844
116.19 ...1.785
101.385 ...1.719
91.214 ...1.68
84.197 ...1.666
78.62 ...1.656
74.084 ...1.649
70.211 ...1.641
66.7 ...1.627
63.238 ...1.599
60.349 ...1.576
57.458 ...1.54
54.837 ...1.502
52.564 ...1.467
50.518 ...1.432
48.624 ...1.395
41.32 ...1.084
40.166 ...1.056
39.332 ...1.04
38.027 ...0.994
36.788 ...0.946
35.326 ...0.879
34.304 ...0.833
33.321 ...0.785
32.27 ...0.728
31.243 ...0.668
30.042 ...0.591
28.411 ...0.477
27.325 ...0.397

I can also provide you with the time and water level data the engineer has collected if that's required.

Many thanks.
 

Attachments

  • Graph.jpg
    Graph.jpg
    17.7 KB · Views: 89
Mathematics news on Phys.org
  • #2
I have several comments:

1. It looks like you've plotted this data in Excel. If so, right-click the actual data in the scatter plot, and select Add Trendline. Make sure you pick the function that's closest to what you want, and Display the Equation and the $R^2$ value (measure of how good the fit is).

2. It doesn't look at all linear to me. It looks more like exponential approach to me. You could fit a straight line to it, but I'm not sure how valuable that'd be. I'm not sure Excel can fit an exponential approach, but you can always jury-rig it with the Excel Solver routine.
 
  • #3
Hi Ackbach,

Many thanks for getting back to me. I've plotted the graph in excel, my program produces the same. I've tried to add trend lines in excel but I don't get the curve I need.

In these examples the two tests are almost the same but there will be occasions where one curve will be a lot lower than the other. This is why I need to work out an average that goes in between them. Therefore the average line can be used.Using excel this is what I get with a 'exponential curve':

View attachment 2742

This is what I get with a 'Logarithmic curve' (The average is outside the two tests, not in between):

View attachment 2743What I'm really asking is , is it even possible mathematically? I have looked into 'least squares method' but that did not work either. Is there a method in maths out there can can be used to solve this? I just need to know what that method is, I can read up on it and work it out.

This is what I need, I've drawn this using a curved line in excel. Is there a way mathematically to achieve this?

View attachment 2746
Many thanks.
 

Attachments

  • exponential.jpg
    exponential.jpg
    22.1 KB · Views: 68
  • Logarithmic.jpg
    Logarithmic.jpg
    21.7 KB · Views: 75
  • curve.jpg
    curve.jpg
    18.9 KB · Views: 83
  • #4
There is a way to do this with least squares. The issue is that the kind of function you need to assume is not present in Excel's lineup. I would think a function of this kind would fit well:

$$\text{mAOD}(Q)=\text{mAOD}_{\text{final}}-(\text{mAOD}_{\text{final}}-\text{mAOD}_{\text{initial}}) \, e^{-k(Q-Q_{\text{initial}})}.$$

How did I get this function? Well, I essentially just copied the form of the solution of Newton's Law of Cooling, where objects exponentially approach the environment's temperature.

Now, you really only have one unknown here. You can read off $\text{mAOD}_{\text{initial}}, \; \text{mAOD}_{\text{final}}, \;$ and $Q_{\text{initial}}$ from the table. For Test 1, for example, you have
\begin{align*}
\text{mAOD}_{\text{initial}}&=0.424 \\
\text{mAOD}_{\text{final}}&=2.098 \\
Q_{\text{initial}}&=25.283
\end{align*}

The unknown is $k$. So here's what you do in Excel:

1. Create a single cell where you're going to put $k$.

2. Create a column of numbers equal to $\text{mAOD}(Q)$ according to the above formula. Don't forget to use dollar-column-dollar-row format for the known values, so it doesn't shift when you copy the formula down. Copy it down for all of, say, Test 1.

3. Create another column to the right of that, where you put the sum-of-squares. The formula here will be $(\text{mAOD}_{\text{calculated}}-\text{mAOD}_{\text{measured}})^2$.

4. Sum the squares column at its bottom.

5. Open up the Solver Routine (This should be in Tools, although you may have to install this as an add-on if it's not already there - google how to do this; you might also need your installation DVD), and try to minimize the sum-of-squares cell that you computed in Step 4 by changing the cell containing $k$.

6. If you wish, plot the calculated column along with the measured column against the flow column to see graphically how good the fit was.

That's it! I can't tell you off-hand how to compute the $R^2$ value, but there should be a way to do that manually; just google it.
 
  • #5
I absolutely love problems like this, so don't feel like I'm over-doing it here. I went ahead and did the analogous calculations in LibreOffice Calc (you have to install a non-linear solver extension to make it work). Here are the results for Test 1: $k=0.0375183810469435$, with a resulting graph of

View attachment 2750

You can see that from $Q=50$ to $Q=300$, the values diverge a bit from the measured values. You could probably improve this a bit by allowing the solver routine to change the initial and final values. By forcing the model to accept the actual initial and final values, you're sort of "straight-jacketing" the solution.
 

Attachments

  • LibreOffice Solver Results.png
    LibreOffice Solver Results.png
    26 KB · Views: 74
  • #6
Here's a solution where I've allowed the Solver Routine to vary the initial and final values. As you can see, it does much better for the $Q\in[50,300]$ range, but does not do nearly as well at predicting the final value.

View attachment 2751

One more option: you could "glue together" two separate solutions, with the boundary right at the $Q=50$ mark, which is where an expected solution kind of has to change a bit. That could allow a better overall fit; you'd have a piece-wise defined function at that point.

Finally, you could also do cubic splines. There are routines to let you do that. Cubic splines are nice because they are continuous, and their first derivatives are continuous.
 

Attachments

  • LibreOffice Solver Results with Initial and Final Varying.png
    LibreOffice Solver Results with Initial and Final Varying.png
    25 KB · Views: 79
  • #7
Thank you Ackbach, I am going to have a look at all this and see how it goes. I will let you know how I get on. It does sound complex so it may take a while for me to get my head around it.

I would most grateful if you could send me the LibreOffice file you have created so i can go over it please?

Kind regards,
Imran
 
Last edited:
  • #8
masterims said:
Thank you Ackbach, I am going to have a look at all this and see how it goes. I will let you know how I get on. It does sound complex so it may take a while for me to get my head around it.

I would most grateful if you could send me the LibreOffice file you have created so i can go over it please?

Kind regards,
Imran

Check your email for the attached file.
 

FAQ: Working out Trend line on a Scatter Graph

How do I create a trend line on a scatter graph?

To create a trend line on a scatter graph, you will need to use a graphing software or tool that has the capability to add a trend line. In most cases, you can simply right-click on the scatter plot and select "add trend line" or a similar option. You can also manually calculate the trend line by using a line of best fit formula and plotting the points on the graph.

What is the purpose of a trend line on a scatter graph?

A trend line on a scatter graph is used to show the general direction of the data and to identify any patterns or trends in the data. It can also help to make predictions or forecasts based on the data.

How do I interpret a trend line on a scatter graph?

The slope of the trend line on a scatter graph indicates the direction and strength of the relationship between the two variables. A positive slope indicates a positive correlation, meaning that as one variable increases, the other variable also increases. A negative slope indicates a negative correlation, meaning that as one variable increases, the other variable decreases. The steeper the slope, the stronger the relationship between the two variables.

Can a trend line be used to make predictions?

Yes, a trend line on a scatter graph can be used to make predictions or forecasts based on the data. However, it is important to note that the accuracy of these predictions depends on the strength and consistency of the relationship between the variables. A strong and consistent relationship will result in more accurate predictions, while a weak or inconsistent relationship may lead to less reliable predictions.

Is it necessary to have a trend line on a scatter graph?

No, it is not necessary to have a trend line on a scatter graph. It is a tool that can be useful in analyzing and interpreting the data, but it is not essential. Some data may not have a clear trend or pattern, in which case a trend line may not be appropriate. It is important to consider the purpose and context of the data before deciding whether or not to include a trend line on a scatter graph.

Similar threads

Back
Top