Least-squares - fitting an inverse function

In summary: a user is having trouble fitting a set of data points to an inverse (rational) function because the least-squares fit is unstable and heavily dependent on the initial guess. Reparameterizing the model helps, and other users suggest linearising the problem and solving it directly, or solving the rectangular system with QR factorization or Singular Value Decomposition. Later in the thread a second user, who is measuring distance with a capacitive sensor in real time, asks for a streaming version of the fit.
  • #1
kandelabr
I have a set of data points that I must fit to an inverse function like this:

y(x) = a/(x+b) + c

My problem is that least-squares fitting using this equation is extremely unstable and heavily dependent on the initial guess. No matter how accurate the parameters I start with are, the algorithm often goes berserk (trying 3.5e+58 instead of 1.2, etc.). It also doesn't matter which algorithm I use.

I guess there must be some mathematical preconditioning voodoo or something that I could use, but I can't find anything that works for me.

Any ideas?

Thanks!
 
  • #2
Try parameterizing it differently, but in a way that is equivalent.

Maybe: y(x) = a/(x/b + 1) + c

Try some different ones until you find something stable.
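(For illustration, a minimal SciPy sketch of fitting such a reparameterised model; the function names, the arrays x_data/y_data and the starting guess p0 are only placeholders:)
Code:
from scipy.optimize import curve_fit

def model(x, a, b, c):
    # the reparameterised form suggested above: y = a/(x/b + 1) + c
    return a / (x / b + 1.0) + c

def fit_reparameterised(x_data, y_data, p0=(1.0, 1.0, 0.0)):
    # p0 is a rough starting guess; a better one can come from the data itself
    popt, _ = curve_fit(model, x_data, y_data, p0=p0)
    return popt   # estimated (a, b, c)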
 
  • #3
kandelabr said:
I have a set of data points that I must fit to an inverse function like this:

y(x) = a/(x+b) + c

My problem is that least-squares fitting using this equation is extremely unstable and heavily dependent on the initial guess. [...]
It sometimes depends on the data you are trying to fit.

If the same data set contains 3.5E+58 and 1.2, you're probably going to run into problems with scaling the data. Sometimes, these problems can be minimized, but you'll have to provide more information about the problems you are experiencing and the data you are using.
 
  • #4
By the way, that is NOT an "inverse" function, it is a "rational" function. The word "inverse", applied to functions, means the inverse with respect to composition (a function g such that g(f(x)) = x), not the reciprocal.
 
  • #5
Whoa, sorry for late reply. Let me answer all of you at once.

Dr. Courtney said:
Try parameterizing it differently, but in a way that is equivalent.

Maybe: y(x) = a/(x/b + 1) + c

Try some different ones until you find something stable.

I have tried, and voila, it's much better. I have also found out that I can fix one parameter and fit only the others. That makes everything very simple.
This is a calibration of a somewhat simplified capacitive displacement sensor. It turned out that those simplifications bring complications everywhere else, so we abandoned this type of sensing and went straight to strain gauges. :) (We'll still be using the same calibration procedure in the non-simplified versions.)

SteamKing said:
It depends sometimes on the data you are trying to fit.

If the same data set contains 3.5E+58 and 1.2, you're probably going to run into problems with scaling the data. Sometimes, these problems can be minimized, but you'll have to provide more information about the problems you are experiencing and the data you are using.

You're right, but those values appear in the results, not in my data set. I just wanted to point out that if you get a value that high when you're expecting something 'normal', you know something went wrong.

HallsofIvy said:
By the way- that is NOT an "inverse" function, it is a "rational" function. The word "inverse", applied to functions means the inverse of the composition of two functions, not the reciprocal.

You're also right. I have no idea where I got that name from. It's an inverse of y(x) = x, though :)

Thank you all for your answers. I'm sure I'll run into more trouble in the future, so we'll discuss more later :)
 
  • #6
kandelabr said:
y(x) = a/(x+b) + c
[...]
Any ideas?

With

[itex]x \cdot y = a + b \cdot c - b \cdot y + c \cdot x[/itex]

you don't need an algorithm at all. But since this linearisation minimises a different error measure, and therefore finds a different minimum, you would need to check whether it works with your data.
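(For reference, this linearisation follows directly from the model: multiplying ##y = a/(x+b) + c## by ##(x+b)## gives ##xy + by = a + c(x+b)##, which rearranges to the equation above.)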
 
  • #7
What do you mean by "don't need an algorithm"? I'm solving an overdetermined system of equations using least squares. Am I missing something?
 
  • #8
A direct method to solve the problem (no iterative process, no initial guess required) is shown in the attachment.
This refers to §5 of the paper:

https://fr.scribd.com/doc/14819165/Regressions-coniques-quadriques-circulaire-spherique

Since this paper is in French, the equations are fully written out in the attachment, so one can apply the method without reading the referenced paper.

Sorry, I did not succeed in attaching the file. If interested, send me a private message.
 
  • #9
Looks interesting, I'll check it out.

thanks
 
  • #10
kandelabr said:
what do you mean with "don't need an algorithm"? I'm solving overdetermined system of equations using least squares.

In this case the least-squares method results in a system of three linear equations that can be solved explicitly.
 
  • #11
I actually tried that before - I chose random points and solved the system numerically with numpy, but the problem is that the measurements have some noise, so different sets of points give different results. I even averaged the results over multiple random selections of points, but it doesn't work that way.
 
  • #12
kandelabr said:
I actually tried that before - I chose random points and solved the system numerically with numpy, but the problem is that the measurements have some noise, so different sets of points give different results. I even averaged the results over multiple random selections of points, but it doesn't work that way.
If you want to use the entire data set, noise and all, you can form the normal equations and solve them, as is done for a linear regression. The problem is that, numerically, the solution of the normal equations may be unstable for some types of data, particularly if the data cover several orders of magnitude or more.

In those cases, forming the normal equations may not be advisable. You can still use all the data by writing one equation per data point and solving the resulting rectangular system with QR factorization or Singular Value Decomposition.
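(As an illustration, solving such a rectangular system in numpy could look like the sketch below, here written for the linearised model from post #6; the function and array names are placeholders:)
Code:
import numpy as np

def fit_hyperbola_qr(x, y):
    # One row per data point for the linearised model x*y = a' - b*y + c*x
    # (a' = a + b*c, see post #6), solved with a QR factorisation.
    A = np.column_stack([np.ones_like(x), -y, x])
    rhs = x * y
    Q, R = np.linalg.qr(A)                         # A = Q R, R upper triangular
    a_prime, b, c = np.linalg.solve(R, Q.T @ rhs)
    # np.linalg.lstsq(A, rhs, rcond=None)[0] solves the same system via SVD
    return a_prime - b * c, b, c                   # recover a from a' = a + b*c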
 
  • #13
Hello all,

I found this thread because I'm trying to do the same thing: measure a distance using a capacitive sensor. On top of that, I'm doing this as the data comes in, so a streaming algorithm would be best. Normal linear regression can easily be used with streaming data, since only summations are involved (i.e., you don't need to go over all the data for each incoming sample; you can just push a new sample in and pop the old one out). That's what we've been using so far.
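(For reference, a minimal sketch of such a running-sums, sliding-window linear regression; the class and variable names are mine:)
Code:
from collections import deque

class StreamingLinReg:
    # Sliding-window fit of y ~ m*x + n using running sums only:
    # push a new sample in, pop the oldest one out, no pass over the data.
    def __init__(self, window):
        self.buf = deque()
        self.window = window
        self.Sx = self.Sy = self.Sxx = self.Sxy = 0.0

    def add(self, x, y):
        if len(self.buf) == self.window:           # pop the oldest sample
            ox, oy = self.buf.popleft()
            self.Sx -= ox; self.Sy -= oy
            self.Sxx -= ox * ox; self.Sxy -= ox * oy
        self.buf.append((x, y))                    # push the new sample
        self.Sx += x; self.Sy += y
        self.Sxx += x * x; self.Sxy += x * y

    def coefficients(self):
        k = len(self.buf)
        det = k * self.Sxx - self.Sx ** 2
        m = (k * self.Sxy - self.Sx * self.Sy) / det
        n = (self.Sy - m * self.Sx) / k
        return m, n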
DrStupid said:
##x \cdot y = a + b \cdot c - b \cdot y + c \cdot x##

Now this is really helpful, because it would allow us to upgrade the algorithm from linear to hyperbolic regression (I know, it would still be linear regression, just over a hyperbolic model). I have tried the simple approach described at http://onlinecalc.sdsu.edu/onlineregression15.php, but that doesn't take the offset ##c## into account. I read the French paper (as far as I could), but I haven't been able to extrapolate it to this case. §5 is a bit too general for my abilities...

@JJacquelin or @DrStupid, could you send or transcribe the set of equations you mentioned? That would benefit a lot of people, I think.
 
  • #14
MarcJ said:
§5 is a bit too general for my abilities...

But it answers your question. Let me try to explain it with matrix notation:

With [itex]a = \left( {\begin{array}{*{20}c} {a_0 } \\ {a_1 } \\ \vdots \\ {a_m } \\ \end{array}} \right)[/itex] , [itex]f = \left( {\begin{array}{*{20}c}
{f_0 \left( {x_0 } \right)} & {f_1 \left( {x_0 } \right)} & \cdots & {f_m \left( {x_0 } \right)} \\ {f_0 \left( {x_1 } \right)} & {f_1 \left( {x_1 } \right)} & {} & {} \\
\vdots & {} & \ddots & {} \\ {f_0 \left( {x_n } \right)} & {} & {} & {f_m \left( {x_n } \right)} \\ \end{array}} \right)[/itex] and [itex] F = \left( {\begin{array}{*{20}c}
{F\left( {x_0 } \right)} \\ {F\left( {x_1 } \right)} \\ \vdots \\ {F\left( {x_n } \right)} \\ \end{array}} \right)[/itex]

(where x can be a vector itself) the equation

[itex]F\left( {x_k } \right) \approx \sum\limits_i {a_i f_i \left( {x_k } \right)}[/itex]

can be written as

[itex]F \approx f \cdot a[/itex]

and the minimization problem

[itex]\left( {f \cdot a - F} \right)^2 \Rightarrow Min.[/itex]

results in

[itex]a = \left( {f^T f} \right)^{ - 1} f^T F[/itex]

You can derive the elements of the pseudoinverse [itex]\left( {f^T f} \right)^{ - 1} f^T[/itex] manually or use the matrix equation directly. Even Excel can do that. But if you use Excel you might prefer the built-in functionality for multilinear regression (the LINEST function).
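(In numpy that matrix equation is a one-liner; a minimal sketch, assuming the design matrix f and the vector F have already been built as arrays:)
Code:
import numpy as np

def lsq_coefficients(f, F):
    # a = (f^T f)^-1 f^T F -- the matrix form of the least-squares solution;
    # equivalently: np.linalg.pinv(f) @ F
    return np.linalg.solve(f.T @ f, f.T @ F)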
 
  • #15
Thanks Doctor!

Thing is, I'm not using Excel or even high-level libraries. This runs on an embedded platform where memory is tight. All code is in C. So even for the linear regression we're using now, I have to spell out the equations and make sure nothing overflows. That works quite well, but the system would benefit from using all the information in the signal. We know the relationship: $$C(d) = C_0 + \epsilon_0 \epsilon_r \frac{A}{d}$$ We only change ##d##, and fringe effects are insignificant here, so the expression holds very well. ##C_0## is mostly due to wiring capacitance and is relatively large, so it cannot simply be ignored. We don't know ahead of time where we start on the curve.

For after-the-fact analysis I can use Numpy's polyfit or Scipy's curve_fit. The real-time process needs explicit equations, and my linear algebra is too rusty by now to expand ##\left( {f^T f} \right)^{ - 1} f^T## :oldconfused: (it's been over 15 years...). I hope you can help me here.
 
  • #16
MarcJ said:
This runs on an embedded platform where memory is tight.

That's not good for the solution above, because computing the pseudoinverse requires the full design matrix, which grows with the number of data points. But there is an equivalent solution that only needs a small matrix of sums (3×3 for three parameters):

With [itex]F_k : = F\left( {x_k } \right)[/itex] and [itex]f_k : = \left( {f_0 \left( {x_k } \right),f_1 \left( {x_k } \right), \ldots ,f_m \left( {x_k } \right)} \right)^T[/itex] the minimization problem

[itex]\sum\limits_k {\left( {f_k^T a - F_k } \right)^2 } \Rightarrow Min.[/itex]

results in

[itex]a = \left( {\sum\limits_k {f_k f_k^T } } \right)^{ - 1} \sum\limits_k {f_k F_k }[/itex]

With three parameters this means

[itex]\sum\limits_k {f_k F_k } = \left( {\begin{array}{*{20}c} {\sum\limits_k {f_{1,k} \cdot F_k } } \\ {\sum\limits_k {f_{2,k} \cdot F_k } } \\ {\sum\limits_k {f_{3,k} \cdot F_k } } \\ \end{array}} \right)[/itex]

and

[itex]\sum\limits_k {f_k f_k^T } = \left( {\begin{array}{*{20}c}
{\sum\limits_k {f_{1,k} f_{1,k} } } & {\sum\limits_k {f_{1,k} f_{2,k} } } & {\sum\limits_k {f_{1,k} f_{3,k} } } \\
{\sum\limits_k {f_{1,k} f_{2,k} } } & {\sum\limits_k {f_{2,k} f_{2,k} } } & {\sum\limits_k {f_{2,k} f_{3,k} } } \\
{\sum\limits_k {f_{1,k} f_{3,k} } } & {\sum\limits_k {f_{2,k} f_{3,k} } } & {\sum\limits_k {f_{3,k} f_{3,k} } } \\
\end{array}} \right): = \left( {\begin{array}{*{20}c}
{S_{11} } & {S_{12} } & {S_{13} } \\
{S_{12} } & {S_{22} } & {S_{23} } \\
{S_{13} } & {S_{23} } & {S_{33} } \\
\end{array}} \right)[/itex]

and the inverse is

[itex]\left( {\begin{array}{*{20}c}
{S_{11} } & {S_{12} } & {S_{13} } \\
{S_{12} } & {S_{22} } & {S_{23} } \\
{S_{13} } & {S_{23} } & {S_{33} } \\
\end{array}} \right)^{ - 1} = \frac{1}{K} \cdot \left( {\begin{array}{*{20}c}
{S_{22} S_{33} - S_{23} S_{23} } & {S_{13} S_{23} - S_{12} S_{33} } & {S_{12} S_{23} - S_{13} S_{22} } \\
{S_{13} S_{23} - S_{12} S_{33} } & {S_{11} S_{33} - S_{13} S_{13} } & {S_{12} S_{13} - S_{11} S_{23} } \\
{S_{12} S_{23} - S_{13} S_{22} } & {S_{12} S_{13} - S_{11} S_{23} } & {S_{11} S_{22} - S_{12} S_{12} } \\
\end{array}} \right)[/itex]

with

[itex]K = S_{11} \left( {S_{22} S_{33} - S_{23} S_{23} } \right) + S_{12} \left( {S_{13} S_{23} - S_{12} S_{33} } \right) + S_{13} \left( {S_{12} S_{23} - S_{13} S_{22} } \right)[/itex]

The implementation in C shouldn't be very difficult.
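(A minimal Python sketch of this accumulation scheme, mirroring what a C version could look like; the function name and the choice of basis f = (1, -y, x) for the linearised model from post #6 are my additions:)
Code:
def fit_hyperbola_sums(samples):
    # Accumulate the sums one (x, y) sample at a time and solve the 3x3 system
    # with the explicit cofactor formulas above.  Basis: x*y = a' - b*y + c*x,
    # with a' = a + b*c.
    S11 = S12 = S13 = S22 = S23 = S33 = 0.0
    r1 = r2 = r3 = 0.0
    for x, y in samples:
        f1, f2, f3 = 1.0, -y, x
        F = x * y
        S11 += f1 * f1; S12 += f1 * f2; S13 += f1 * f3
        S22 += f2 * f2; S23 += f2 * f3; S33 += f3 * f3
        r1 += f1 * F;   r2 += f2 * F;   r3 += f3 * F
    K = (S11 * (S22 * S33 - S23 * S23) + S12 * (S13 * S23 - S12 * S33)
         + S13 * (S12 * S23 - S13 * S22))
    a_prime = ((S22 * S33 - S23 * S23) * r1 + (S13 * S23 - S12 * S33) * r2
               + (S12 * S23 - S13 * S22) * r3) / K
    b = ((S13 * S23 - S12 * S33) * r1 + (S11 * S33 - S13 * S13) * r2
         + (S12 * S13 - S11 * S23) * r3) / K
    c = ((S12 * S23 - S13 * S22) * r1 + (S12 * S13 - S11 * S23) * r2
         + (S11 * S22 - S12 * S12) * r3) / K
    return a_prime - b * c, b, c   # recover a from a' = a + b*c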
 
Last edited:
  • #17
MarcJ said:
We know the relationship: $$C(d) = C_0 + \epsilon_0 \epsilon_r \frac{A}{d}$$ We only change ##d##, and fringe effects are insignificant here, so the expression holds very well. ##C_0## is mostly due to wiring capacitance and is relatively large, so it cannot simply be ignored.

Yesterday I was so busy with my formulas that I didn’t pay attention to your specific problem. Now I realize that this is just a simple linear regression:

[itex]y = m \cdot x + n[/itex]

with [itex]y = C[/itex], [itex]x = \frac{1}{d}[/itex], [itex]m = \varepsilon _0 \varepsilon _r A[/itex] and [itex]n = C_0 [/itex].
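(In numpy terms that suggestion is a one-liner; a quick sketch, with d and C as the assumed measurement arrays:)
Code:
import numpy as np

def fit_capacitance(d, C):
    # regress C against 1/d: slope m = eps0*epsr*A, intercept n = C0
    m, n = np.polyfit(1.0 / d, C, 1)
    return m, n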

Am I missing something?
 
  • #18
I'm getting closer, but we're not there yet. Let me first reply to your thoughts. We don't know the exact distance ##d## in the system, but we can make a well-defined move (##\Delta d##). That's what I meant when I said we don't know our starting point on the curve. So we need ##b## to account for an unknown offset in distance...

Somehow, I got hung up on how the product ##xy## would fit into the calculations. Then I realized it's perfectly straightforward. This is what I'm doing now: we're trying to find the coefficients in $$y = \frac{a}{x + b} + c$$ which can be written as $$xy = a + bc - by + cx$$ Replacing the constant term ##a + bc## with ##a'##, we need to minimize the errors ##\varepsilon_i## in the system $$x_i y_i = a' - b y_i + c x_i + \varepsilon_i$$
which can be written as $$\begin{align} \begin{pmatrix} x_1 y_1 \\ x_2 y_2 \\ \vdots \\ x_n y_n \end{pmatrix} &= \begin{pmatrix} 1 & -y_1 & x_1 \\ 1 & -y_2 & x_2 \\ \vdots & \vdots & \vdots \\ 1 & -y_n & x_n \end{pmatrix} \begin{pmatrix} a' \\ b \\ c \end{pmatrix} + \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix} \nonumber \\
\vec {xy} &= \mathbf F \vec a + \vec \varepsilon \nonumber \end{align}$$
Minimizing ##\vec \varepsilon##: $$\hat {\vec a} = \left(\mathbf F^T \mathbf F\right)^{-1} \mathbf F^T \vec{xy}$$
So I tried this with numpy on data that is similar to sensor data, and it works! Even better: we may be able to offload these calculations to the application processor, where we do have a Python environment available. The crucial lines are:
Code:
import numpy as np

def fit_hyperbola(x, y):
    # least-squares fit of x*y = a' - b*y + c*x with a' = a + b*c (x, y are arrays)
    xy = x * y                                        # element-wise products
    F = np.column_stack([np.ones_like(x), -y, x])     # one row per sample
    est = np.linalg.inv(F.T.dot(F)).dot(F.T).dot(xy)
    return [est[0] - est[1] * est[2], est[1], est[2]]   # [a, b, c]
Now the real world steps in, adding noise to the data. We know that the noise has a normal (Gaussian) distribution and does not scale with the amplitude of the signal. You'd think that in a large data set of, say, 1000 samples, the noise would average out. Unfortunately, it doesn't when the noise is significant. I analyzed part of a curve without noise (##a = 3000##, ##b = 1##, ##c = 107##, ##5 \leq x \leq 15##), with the numbers taken from an earlier curve-fitting experiment.
Code:
def f_inv(x, a, b, c):
    return a / (x + b) + c

a = 3000
b = 1
c = 107
# noise-free data set: 1000 samples with x running from 15 down to 5
x = np.linspace(15, 5, 1000)
y = f_inv(x, a, b, c)
Running the data set through the code above, the same coefficients are found. However, when I add noise comparable to a real-world signal, I see a significant error.
Code:
y += 10 * np.random.randn(*y.shape)
I'll see if I can upload some plots, and I can send the script if you like. Any idea how to improve the fit in the presence of noise?

BTW, typing the equations does take quite some time. Thanks for your efforts!
 
  • #19
I have cleaned up the Python code and created some plots for the values of ##a##, ##b## and ##c## mentioned above. Noise is simply added to the 'ideal' data; it has a mean of 0 and a standard deviation of 0, 1 and 10, respectively.
[Plots: figure_1.png, figure_2.png, figure_3.png — fitted curves for noise standard deviations 0, 1 and 10]

What I infer is that, at higher noise levels, the mean of the residual is still low, but only because positive and negative errors cancel out. I was hoping that the regression would lead to minimal errors over the entire data set...
 

Attachments

  • hypfit.py.txt (2.7 KB)
  • #20
MarcJ said:
Any idea how to improve the fit in the presence of noise?

As integrals are usually less sensitive to noise I tried

[itex]\int {xy\,dx} = \left( {a + bc} \right) \cdot \left( {x - x_0 } \right) + c \cdot \frac{{x^2 - x_0^2 }}{2} - b \cdot \int {y\,dx}[/itex]

(using Simpson's rule for integration) and got a much better result. Compared to a combination of a linear regression for a and c and a golden-section search for b, this linearisation appears to be almost as accurate but faster.
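(For illustration, a rough sketch of that integral linearisation; it uses the trapezoidal rule instead of Simpson's for brevity, and the function name is mine:)
Code:
import numpy as np
from scipy.integrate import cumulative_trapezoid

def fit_via_integrals(x, y):
    # Fit y = a/(x+b) + c through the integrated linearisation
    # int(x*y) = a'*(x - x0) + c*(x^2 - x0^2)/2 - b*int(y),  with a' = a + b*c.
    I_y = cumulative_trapezoid(y, x, initial=0.0)        # running integral of y
    I_xy = cumulative_trapezoid(x * y, x, initial=0.0)   # running integral of x*y
    A = np.column_stack([x - x[0], (x ** 2 - x[0] ** 2) / 2.0, -I_y])
    a_prime, c, b = np.linalg.lstsq(A, I_xy, rcond=None)[0]
    return a_prime - b * c, b, c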
 

FAQ: Least-squares - fitting an inverse function

What is the purpose of least-squares fitting for an inverse function?

The purpose of least-squares fitting for an inverse function is to find the best fit for a set of data points that follow a general trend of an inverse relationship. This allows scientists to determine the mathematical relationship between two variables and make predictions based on that relationship.

How does least-squares fitting work for an inverse function?

Least-squares fitting for an inverse function involves finding the curve of best fit that minimizes the sum of the squares of the vertical distances between the data points and the curve. This is done by adjusting the parameters of the inverse function until the sum of the squares is at a minimum.

What are the benefits of using least-squares fitting for an inverse function?

One of the main benefits of using least-squares fitting for an inverse function is that it allows for a more accurate representation of the relationship between two variables, especially when dealing with non-linear relationships. It also provides a way to quantify the uncertainty in the fitted parameters.

What are the limitations of using least-squares fitting for an inverse function?

One limitation of using least-squares fitting for an inverse function is that it assumes the data points are independent and have equal variance. This may not be true in all cases, which can lead to inaccurate results. Additionally, this method may not be suitable for all types of data, such as data with significant outliers.

Are there alternative methods to least-squares fitting for an inverse function?

Yes, there are alternative methods to least-squares fitting for an inverse function, such as maximum likelihood estimation and Bayesian inference. These methods may be more suitable for certain types of data or can provide additional information about the uncertainty in the fitted parameters.
