sysprog
@Gaussian97, any chance you could post the code in question?
Well, the code in question is more than 2000 lines, but the relevant part is:
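```python
import lhapdf

# The i and Q variables are not relevant here; they are simply passed through to f.
def deriv(f, i, Q, x0, epsilon):
    # Symmetric difference with a step proportional to x0, i.e. f is evaluated
    # at x0 + epsilon*x0 and x0 - epsilon*x0 (mathematically (1 ± epsilon) * x0).
    h = epsilon * x0
    result = (f(i, x0 + h, Q**2) - f(i, x0 - h, Q**2)) / (2 * h)
    return result

# The function we want to differentiate is:
pdf = lhapdf.mkPDF(PDFname)  # PDFname (the PDF set name) is defined elsewhere in the full program
f = pdf.xfxQ2                # called as f(parton_id, x, Q2)
```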
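Since lhapdf and `PDFname` are only available inside the full program, here is a minimal stand-alone sketch for trying `deriv` out. `toy_xf` is a hypothetical stand-in with the same argument order as `pdf.xfxQ2`, and the values 21 and 10.0 passed for `i` and `Q` are arbitrary placeholders; the point is only to compare the estimate against a known analytic derivative across (0, 1), including very small x.

```python
def deriv(f, i, Q, x0, epsilon):
    # Symmetric difference with a step proportional to x0 (same as the code above)
    h = epsilon * x0
    return (f(i, x0 + h, Q**2) - f(i, x0 - h, Q**2)) / (2 * h)


def toy_xf(i, x, Q2):
    # Hypothetical test function on (0, 1); unlike the real xfxQ2 it ignores i and Q2
    return x**0.5 * (1.0 - x) ** 3


def toy_xf_prime(x):
    # Analytic derivative of x**0.5 * (1 - x)**3, for comparison
    return 0.5 * x**-0.5 * (1.0 - x) ** 3 - 3.0 * x**0.5 * (1.0 - x) ** 2


for x in (1e-6, 0.1, 0.5, 0.9):
    est = deriv(toy_xf, 21, 10.0, x, 1e-5)
    exact = toy_xf_prime(x)
    print(f"x = {x:g}: estimate {est:.10g}, exact {exact:.10g}")
```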
FactChecker said: I think that, given equal amounts of round-off and truncation errors, this will tend to give a more accurate estimate at the midpoint value than a one-sided calculation would. This is just a gut feeling on my part and my knowledge of the numerical issues is too old to be reliable.
Yes, expanding the Taylor series for each side of the symmetric difference method shows that the ## \frac{h^2}{2!} f''(x) ## term cancels, so the error in ## f(x+h) - f(x-h) ## is ## O(h^3) ## instead of the ## O(h^2) ## error in ## f(x+h) - f(x) ## for a one-sided (Newton) method. After dividing by the step, the derivative estimate is accurate to ## O(h^2) ## rather than ## O(h) ##.
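Written out with a generic step ## h ## (the quoted code uses ## h = \varepsilon x ##), the Taylor expansion gives:
$$f(x \pm h) = f(x) \pm h f'(x) + \frac{h^2}{2!} f''(x) \pm \frac{h^3}{3!} f'''(x) + \frac{h^4}{4!} f^{(4)}(x) + O(h^5)$$
$$\frac{f(x+h) - f(x)}{h} = f'(x) + \frac{h}{2} f''(x) + O(h^2)$$
$$\frac{f(x+h) - f(x-h)}{2h} = f'(x) + \frac{h^2}{6} f'''(x) + O(h^4)$$
So the one-sided quotient carries an ## O(h) ## error driven by ## f'' ##, while in the symmetric quotient the ## f'' ## (and ## f^{(4)} ##) terms cancel and the leading error is ## \frac{h^2}{6} f'''(x) ##, which for the relative step becomes ## \frac{\varepsilon^2 x^2}{6} f'''(x) ##.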
Gaussian97 said: Summary: Question about an algorithm to compute the derivative of a function.
I'm not sure whether this is the correct forum for this question or whether I should post it in a math forum. While looking at some code I found a 'strange' implementation of the derivative of a function, and I wanted to know if any of you has an idea of why such an implementation is used.
The formula used is
$$f'(x) \approx \frac{f((1+\varepsilon)x)-f((1-\varepsilon)x)}{2\varepsilon x}$$
Of course, ##\varepsilon## should be a small number. I know that there are many ways to implement the derivative of a function numerically, and obviously the formula above does converge to the derivative in the limit ##\varepsilon\to 0## for differentiable functions.
My question is whether anyone else has used this formula instead of the usual ##f'(x) \approx \frac{f(x+\varepsilon)-f(x-\varepsilon)}{2\varepsilon}##, or knows of any advantage of this alternative formula.
Something that may be important is that the formula is used to compute derivatives of functions that are only defined on the interval ##(0,1)##, so I thought maybe this formula has some advantage when ##x \sim 0## or ##x \sim 1##?
For example, this formula has the advantage that even if ##x<\varepsilon## the argument will never be smaller than 0, which is probably one of the reasons for using it.
Does anyone have any information?
So I'm still thinking about this a lot... I think the (1 ± ε) form is a very accurate way to do this. It removes scale; I guess that's the best way I can put it. I actually like this a lot, and now that I get it, I'm going to rewrite some code.
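A small sketch of the point about ##x < \varepsilon##: with the usual absolute step, ##x - \varepsilon## leaves the domain (0, 1), whereas ##(1 - \varepsilon)x## stays positive for any ##0 < \varepsilon < 1##. The wrapper `checked` and the choice of `math.sqrt` as the test function are assumptions for illustration only.

```python
import math


def central_abs(f, x, eps):
    # Usual symmetric difference with an absolute step eps
    return (f(x + eps) - f(x - eps)) / (2 * eps)


def central_rel(f, x, eps):
    # Formula from the question: the step eps * x is proportional to x
    return (f((1 + eps) * x) - f((1 - eps) * x)) / (2 * eps * x)


def checked(func):
    # Hypothetical wrapper: refuse arguments outside (0, 1) instead of extrapolating
    def wrapped(t):
        if not 0.0 < t < 1.0:
            raise ValueError(f"argument {t!r} is outside (0, 1)")
        return func(t)
    return wrapped


f = checked(math.sqrt)   # test function restricted to (0, 1)
x, eps = 1e-6, 1e-4      # note x < eps

print(central_rel(f, x, eps))   # close to 0.5 / math.sqrt(1e-6) = 500
try:
    central_abs(f, x, eps)
except ValueError as err:
    print("absolute step failed:", err)
```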
pasmith said: Floating point arithmetic is inherently inaccurate, particularly if you're dealing with very small or very large numbers. You don't want ##1/\epsilon## to overflow, you don't want ##x \pm \epsilon## to be rounded to ##x##, and you don't want ##|f(x + \epsilon) - f(x - \epsilon)|## to be rounded to zero.
This is an attempt to avoid those problems.
If x is in (0,1), doesn't multiplying by it aggravate those problems?
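To make the rounding concern concrete, here is a sketch assuming IEEE 754 double precision (what Python floats are on common platforms): an absolute perturbation can be rounded away entirely when it is small compared with x, while a relative perturbation ##(1 + e)x## survives at any magnitude, provided e is comfortably larger than machine epsilon. The particular values are arbitrary.

```python
import sys

eps_mach = sys.float_info.epsilon   # 2**-52 for IEEE 754 double precision
tiny = 1e-17                        # an absolute step, far below eps_mach * 0.75

x = 0.75
print(x + tiny == x)                # True: the step is rounded away completely

y = 1e-20
print(y + tiny == y)                # False: the same step is huge relative to y

# A relative step changes v by the same relative amount at any magnitude,
# as long as e is comfortably above eps_mach:
e = 1e-8
print(all((1 + e) * v != v for v in (1e-300, 1e-20, 0.75, 1e300)))   # True
```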
I'm still trying to make sense of that fact.
pbuk said: It is not that we need to make the step size (which I am going to call ## h ## because ## \varepsilon ## is always machine epsilon) small near zero; we need to make the step size as small as we can everywhere: at least small enough that we can be sure that the ## O(h^3) ## term disappears below ## \varepsilon ##.
However, the smaller we make the step size in relation to ## x ##, the bigger the problem we are going to have with roundoff error: bear in mind that when we try to calculate ## f(x + h) ## we are actually calculating ## f(x + h + \varepsilon(x + h)) ##.
As a result, we can in general minimise the errors introduced by ## \varepsilon ## by making ## h ## proportional to ## x ##, and this is what the quoted code does. Exactly what proportion to choose depends on how badly behaved ## f(x) ## is, but given that we are trying to make the ## O(h^3) ## term disappear, something around ## \varepsilon^{1 \over 3} ## (adjusted for the range of ## f'''(x) ##) is probably a good place to start.
Yeah, for some reason I initially interpreted the function as f(x + e) etc. It makes a lot more sense with f(x * (1 + e)).
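A rough numerical check of pbuk's suggestion of a relative step around ## \varepsilon^{1 \over 3} ##, using sin at x = 0.3 as an arbitrary test case. The relative step is called `r` here to keep it apart from machine epsilon. The error should fall as r shrinks until roundoff takes over; exact values depend on the platform's sin, so the comments only describe the expected trend.

```python
import math
import sys


def central_rel(f, x, r):
    # Symmetric difference with step h = r * x, as in the quoted code
    h = r * x
    return (f(x + h) - f(x - h)) / (2 * h)


eps_mach = sys.float_info.epsilon     # machine epsilon for double precision, 2**-52
r_opt = eps_mach ** (1 / 3)           # about 6e-6
x = 0.3                               # arbitrary test point
exact = math.cos(x)                   # true derivative of sin at x

errors = {}
for r in (1e-2, 1e-4, r_opt, 1e-8, 1e-11, 1e-14):
    errors[r] = abs(central_rel(math.sin, x, r) - exact)
    # Large r: truncation error dominates; tiny r: roundoff error dominates
    print(f"r = {r:.1e}   error = {errors[r]:.1e}")
```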
Gaussian97 said: I don't know if this gives any extra information.
I think that if it had been in a language other than an interpreted one, it potentially could have. Thanks for the reply . . .
It seems pretty clear that this is what the implementation is trying to do, but after thinking about it more, I agree with @FactChecker, for a couple of reasons. I think the implementation in some cases introduces more problems than it solves, though perhaps not in this specific case. For example, there are problems with negative values of x, and with the artificial discontinuity introduced at x = 0. Also, if you take something like f(x) = sin(x), the answers for f'(x) and f'(x + 2nπ) will not be exactly the same. Obviously, scaling the step to the magnitude of x might be more important, but there are other ways to accomplish this without making the step a multiple of x (the hx term) or dividing by x. For example, h should be O(x · 2^-24), so a constant value can be chosen based on the floating-point exponent of x that still guarantees x ± h ≠ x, without all the weird side effects.
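One possible reading of the exponent-based suggestion, as a sketch only: take the binary exponent e of x from `math.frexp` and use the exact power of two ## h = 2^{e - \text{bits}} ##. The helper names are hypothetical, and `bits=18` is an assumption chosen because for double precision ## \varepsilon^{1/3} = 2^{-52/3} \approx 2^{-17.3} ##, in line with pbuk's estimate (the 2^-24 above was for 32-bit floats). Unlike ## h = \varepsilon x ##, this step does not collapse to zero at x = 0 and involves no division by x.

```python
import math


def pow2_step(x, bits=18):
    # Hypothetical helper: write x = m * 2**e with 0.5 <= |m| < 1 (math.frexp),
    # then use h = 2**(e - bits). h depends on x only through its binary exponent,
    # is an exact power of two, and lies between 2**-bits and 2**(1 - bits) times |x|.
    _, e = math.frexp(x)
    return math.ldexp(1.0, e - bits)


def central_pow2(f, x, bits=18):
    # Symmetric difference with the power-of-two step; no division by x
    h = pow2_step(x, bits)
    return (f(x + h) - f(x - h)) / (2 * h)


for x in (0.0, 1e-9, 0.3, 2.0):
    print(x, pow2_step(x), central_pow2(math.sin, x) - math.cos(x))
```

For x > 0 the step is at most about ##2^{-17}x##, so x − h stays positive; only x = 0 itself would step outside (0, 1).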
pbuk said: But the implementation is specifically and only for functions over the domain (0, 1), so none of that applies.
Right, I understood that. But in that case you can also just pick h to be 2^-24 (or maybe 2^-23?) for all x, for 32-bit floats.
Well, that's actually wrong if you use x very close to zero, but that can be accounted for.
valenumr said: Right, I understood that.
That does not appear to be the case.
Yes, I did get it, but I don't think this is a good implementation, for the reasons mentioned.
mgeorge001 said: I like this formula because in x + Eps, Eps has a dimension, usually ignored, but scaling is often important. This formula makes that explicit: Eps is dimensionless in (1 + Eps)x and we measure scale relative to 1, whereas now x is the only quantity with a dimension. Numerical analysts must take this sort of thing into account all the time to get "adaptive" algorithms. The one point to be aware of in allowing x to set the dimension (or absolute scale) in the problem is that if x = 0 falls within the map, this loses scale information, and you will see phenomena akin to the Gibbs phenomenon or Runge's phenomenon. Of course, there are stratagems to avoid the trivial setting x = 0 exactly, so strictly speaking, once you are aware, you can avoid program crashes, but the Gibbs phenomenon looks pretty unavoidable here. I guess I would use "weak" methods like test functions in Sobolev theory to avoid this. Other than a few minor issues like that, it seems superficially to me to warrant some investigation. In classical applied math and physics people did this sort of thing routinely, as with the dielectric constant that appears in classical electrodynamics. Lots of mathematicians are sharp about this sort of thing (e.g. Sobolev himself), but often there is a gap between the mathematicians and the numerical analyst practitioners. I am only a math teacher, so that's probably worse!
To me, it's not impractical at all to implement a method that provides the most accurate possible results within the capabilities of machine representation using the actual limit definition of a derivative.
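One way to see the point that ε is dimensionless in (1 + ε)x: if x is re-expressed in another unit, y = c·x, the relative-step estimate needs no change of ε and only picks up the chain-rule factor, whereas an absolute step ε must be rescaled by hand. A sketch under stated assumptions: `f`, `g` and the unit factor `c` are hypothetical, and c is a power of two so that the rescaling is exact in floating point.

```python
import math


def central_rel(f, x, eps):
    # Step proportional to x, as in the formula from the question
    return (f((1 + eps) * x) - f((1 - eps) * x)) / (2 * eps * x)


def central_abs(f, x, eps):
    # Step of fixed absolute size eps
    return (f(x + eps) - f(x - eps)) / (2 * eps)


c = 2.0 ** -30          # hypothetical change of units, y = c * x


def f(x):
    return math.sin(x)  # hypothetical function of x


def g(y):
    return f(y / c)     # the same function expressed in the new unit y


x, eps = 0.4, 1e-5
d_f = central_rel(f, x, eps)
d_g = central_rel(g, c * x, eps)
print(d_f, d_g * c)                     # identical: eps needed no rescaling
print(central_abs(g, c * x, eps) * c)   # eps is huge next to c*x, so this is far from cos(0.4)
```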
I think it is important to see the proposed formula more in light of classical "macroscopics", i.e. not trying to get at the highly accurate microscopic detail. I personally do not know whether the formula is of much use or not. We see the same "classical" problem of "loss of scale" that can haunt Newton's method for finding zeros. I only meant to point out that the proposal has some appeal in a "big picture" way. It is worth bearing in mind that classical Maxwell theory had to give way to quantum mechanics as ways of addressing microscopic behaviors became more relevant and accessible. But I don't think that implies "lack of utility" as there are instances where we want this sort of picture, and as we well know, Newton's method works pretty nicely in lots of cases despite grievous deficiencies.
I keep thinking of examples where this approach fails, and perhaps in specialized circumstances it works well, but I am not really seeing any utility other than expedient implementation.
rayj said: Another consideration is the origin of the data and how the derivatives will be used. If this is experimental data (as almost all data is), there will be noise. It is typical for noise to be high frequency, such as may be found with radio frequency interference on electrical transducers and on digitally computed results. When such noisy data goes into either of the derivative functions, the high frequency is amplified and may overwhelm the fundamental data.
This is a case where the two-sided estimate would tend to (slightly) mitigate the noise compared to the one-sided estimate (for equal ##\epsilon## values), because it averages the function change over a larger change of x.
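Under the assumption that each function evaluation carries independent noise of standard deviation σ, the one-sided quotient has noise standard deviation ##\sqrt{2}\,\sigma/h## and the two-sided one ##\sqrt{2}\,\sigma/(2h)##, i.e. half as large for the same step. A quick Monte Carlo sketch; the test function, noise level, step and seed are arbitrary choices.

```python
import math
import random
import statistics

rng = random.Random(12345)        # fixed seed so the run is reproducible
sigma, h, x = 1e-6, 1e-3, 0.5     # arbitrary noise level, step and evaluation point


def noisy_sin(t):
    # Assumed noise model: sin plus independent Gaussian noise on every evaluation
    return math.sin(t) + rng.gauss(0.0, sigma)


forward, central = [], []
for _ in range(20000):
    forward.append((noisy_sin(x + h) - noisy_sin(x)) / h)
    central.append((noisy_sin(x + h) - noisy_sin(x - h)) / (2 * h))

sd_forward = statistics.stdev(forward)   # expected near sqrt(2) * sigma / h
sd_central = statistics.stdev(central)   # expected near sqrt(2) * sigma / (2 * h)
print(sd_forward, sd_central, sd_forward / sd_central)   # ratio expected near 2
```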
Well, for sure you can have functions that just don't work well with basic numerical approaches. If the function is high frequency, for example, and you try to compute a derivative at what should be a maximum or minimum, it won't work well with a standard approach, but might be better with a ±dh. But I think that is really a more fundamental problem of scale and numerical accuracy in the simulation. I think it's reasonable (I haven't gotten deep on the math too much) that the extra error term (say h²) in a quadratic gets erased doing it this way, but in a lot of cases this term will be so much smaller than f(x) that it is below machine precision. Obviously not always. But if it does become a problem, I don't think the method of taking the derivative is the issue. It would be better to scale the function appropriately, if at all possible, to get more accurate results.
valenumr said: I haven't gotten deep on the math too much
<sigh> Then what is the point in commenting? Numerical analysis without the analysis is just guessing.
valenumr said: the extra error term (say h²) in a quadratic gets erased doing it this way
No, I repeat, the point in a two-sided calculation is that the ## f''(x) ## term is eliminated. Where ## f''(x) \gg f'(x) ##, which is of course true at an extremum, this leads to better accuracy. However, where this is not true the two-sided calculation can be less accurate, because you are doubling the effect of ## \epsilon ##.
I think your point about scale is significant. Thanks. You have a mature perspective on this, that's obvious. A lot of this kind of work boils down to assumptions about small/small: That's a tricky 0/0 issue, but with a lot of work, like being able to reliably neglect say an h^2, I think numerical experts can address the difficulties meaningfully. The 0/0 problem vis a vis logs is equivalent to the infinity - infinity problem. Both enter the picture sometimes.
pbuk said: No, I repeat, the point in a two-sided calculation is that the ## f''(x) ## term is eliminated. Where ## f''(x) \gg f'(x) ##, which is of course true at an extremum, this leads to better accuracy. However, where this is not true the two-sided calculation can be less accurate, because you are doubling the effect of ## \epsilon ##.
Perhaps I misread or misinterpreted that as saying the term is eliminated when ## f''(x) ## is zero, which over an interval is just a straight line and isn't that interesting. What I meant to say, and didn't express well, is that I'm pretty sure that if ## f''(x) ## is constant, the answer should be exact (within machine rounding precision).
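Both points can be checked on simple polynomials: in the symmetric quotient the ## f'' ## term cancels and the leading error is ## \frac{h^2}{6} f'''(x) ##, so a quadratic (constant ## f'' ##, zero ## f''' ##) is differentiated exactly up to rounding, while for ## x^3 ## the error is exactly ## h^2 ##. The functions and step sizes below are hypothetical test cases.

```python
def central(f, x, h):
    # Symmetric difference quotient
    return (f(x + h) - f(x - h)) / (2 * h)


def quad(x):
    # Hypothetical quadratic: f''(x) = 6 is constant, so f'''(x) = 0
    return 3 * x**2 + 2 * x + 1


def cubic(x):
    # Hypothetical cubic: f'''(x) = 6, so the error h**2 / 6 * f''' = h**2 survives
    return x**3


x = 0.5   # exact derivatives: quad'(0.5) = 5, cubic'(0.5) = 0.75
for h in (0.25, 0.1, 0.01):
    print(h, central(quad, x, h) - 5.0, central(cubic, x, h) - 0.75)
```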