Difference between MAPE and SSE

  • Thread starter: maistral
In summary: MAPE and SSE weight residuals differently. MAPE divides each residual by the measured value, so it is scale-independent and emphasizes relative (percentage) errors, while SSE emphasizes the largest absolute errors; which objective is appropriate depends on whether relative or absolute errors matter for the application.
  • #1
maistral
TL;DR Summary
Can someone give a link to, or perhaps a summary of, the fundamental differences between the two and how they behave as objective functions in minimization?
I am working on an equation that is supposed to model two dependent variables Y and Z using four parameters a, b, c, and d (for regression) and a single independent variable X. Given a set of values for X, I am going to regress a, b, c, and d to fit Ycalc and Zcalc to Yexpt'l and Zexpt'l.

My problem is this: I tried using both MAPE and SSE (the latter normalized via the standard deviation of each dependent variable) as objective functions:

##\text{MAPE} = \frac{100}{n_X} \sum_{i=1}^{n_X} \frac{|Y_{i,\text{calc}} - Y_{i,\text{expt'l}}|}{Y_{i,\text{expt'l}}} + \frac{100}{n_X} \sum_{i=1}^{n_X} \frac{|Z_{i,\text{calc}} - Z_{i,\text{expt'l}}|}{Z_{i,\text{expt'l}}}##

##\text{SSE} = \sum_{i=1}^{n_X} \left[ \frac{Y_{i,\text{calc}} - Y_{i,\text{expt'l}}}{\sigma_{Y_\text{calc}}} \right]^2 + \sum_{i=1}^{n_X} \left[ \frac{Z_{i,\text{calc}} - Z_{i,\text{expt'l}}}{\sigma_{Z_\text{calc}}} \right]^2##

All summations run from ##i = 1## to ##n_X##.

My issue is as follows: it always (at least for my requirements) ends up with MAPE doing a better job of determining the parameters a, b, c, and d in fitting Y and Z. Why is this so? What is the fundamental difference between the two, and when should or shouldn't I use MAPE / SSE?
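For concreteness, here is a minimal sketch of the two fits in Python (an assumption, since the thread does not say what software was used). The function `model` is a hypothetical placeholder, because the actual Y, Z equation is never given, and the synthetic data exist only so the sketch runs:

```python
import numpy as np
from scipy.optimize import minimize

def model(params, x):
    """Hypothetical placeholder for the real Y(x), Z(x) equation."""
    a, b, c, d = params
    return a * np.exp(b * x), c * x + d      # placeholder forms only

def mape_objective(params, x, y_exp, z_exp):
    # MAPE as defined above: percentage errors averaged over the n_X points
    y_calc, z_calc = model(params, x)
    n = len(x)
    return (100.0 / n) * np.sum(np.abs(y_calc - y_exp) / np.abs(y_exp)) \
         + (100.0 / n) * np.sum(np.abs(z_calc - z_exp) / np.abs(z_exp))

def sse_objective(params, x, y_exp, z_exp):
    # SSE as written above, normalized by the std. dev. of the calculated values
    y_calc, z_calc = model(params, x)
    return np.sum(((y_calc - y_exp) / np.std(y_calc)) ** 2) \
         + np.sum(((z_calc - z_exp) / np.std(z_calc)) ** 2)

# Synthetic stand-in data; in practice x, y_exp, z_exp come from experiment.
rng = np.random.default_rng(1)
x = np.linspace(0.1, 2.0, 20)
y_true, z_true = model([1.0, 1.5, 2.0, 0.5], x)
y_exp = y_true + rng.normal(0.0, 0.1, len(x))
z_exp = z_true + rng.normal(0.0, 0.1, len(x))
p0 = [0.8, 1.0, 1.5, 0.0]                    # initial guess for a, b, c, d

fit_mape = minimize(mape_objective, p0, args=(x, y_exp, z_exp), method="Nelder-Mead")
fit_sse  = minimize(sse_objective,  p0, args=(x, y_exp, z_exp), method="Nelder-Mead")
print("MAPE fit:", fit_mape.x)
print("SSE  fit:", fit_sse.x)
```

With noisy data the two objectives generally have different minimizers, so the two printed parameter sets will not be identical; that difference is what the rest of the thread is about.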
 
  • #2
maistral said:
I am going to regress a, b, c, and d to fit Ycalc and Zcalc to Yexpt'l and Zexpt'l.

My problem is this: I tried using both MAPE and SSE (the latter normalized via the standard deviation of each dependent variable) as objective functions:

Do you mean that you estimated a, b, c, d in two different ways: one by picking values that minimized MAPE, and the other by picking values that minimized SSE?
maistral said:
My issue is as follows: it always (at least for my requirements) ends up with MAPE doing a better job of determining the parameters a, b, c, and d in fitting Y and Z.

What criteria are you using to determine which method did a "better" job? How are you measuring the quality of the fit? Do you have a mathematical function that measures the "error" in the fit that is different from MAPE or SSE?
 
  • #3
Stephen Tashi said:
Do you mean that you estimated a, b, c, d in two different ways: one by picking values that minimized MAPE, and the other by picking values that minimized SSE?
Yup, this is what I meant. And it always ended up with the MAPE formulation doing a better job, at least for my requirements.

Stephen Tashi said:
What criteria are you using to determine which method did a "better" job? How are you measuring the quality of the fit? Do you have a mathematical function that measures the "error" in the fit that is different from MAPE or SSE?
No, I mean I have a dataset of (Y, Z) vs. X. When I fit Y and Z, I find that the curve resulting from MAPE behaves more properly. My data has a funny behavior: it grows roughly exponentially over the low and mid ranges, then suddenly shoots up at the end of the data range. Also, on a lesser note, I can draw certain linear or quadratic relations between a, b, c, and d and other variables, which 'generalizes' them somehow.

Actually, I've decided that I'm going to use MAPE because of this. I just cannot tell why MAPE does a better job than SSE, which goes back to my original question: what is the mathematical tendency of MAPE compared to SSE?
 
  • #4
Upon further reading I found out that MAPE is supposed to be scale-independent, and SSE is scale-dependent. May I know what scale-dependency means? And other than these differences, are there any other fundamental differences in mathematical tendency between the two?
 
  • #5
maistral said:
No, I mean I have a dataset of (Y, Z) vs. X. When I fit Y and Z, I find that the curve resulting from MAPE behaves more properly. My data has a funny behavior: it grows roughly exponentially over the low and mid ranges, then suddenly shoots up at the end of the data range.

what is the mathematical tendency of MAPE compared to SSE?

If you want mathematical answers, you'll have to ask questions that are mathematically precise. The visual appeal of a curve fit is subjective and can vary from person to person. To get specific advice, I suggest you post data or some graphs. That type of question usually gets a lot of suggestions.

Whether a curve fit "looks right" is not a purely mathematical question. It involves whatever field of science applies to the data.

maistral said:
Upon further reading I found out that MAPE is supposed to be scale-independent, and SSE is scale-dependent.
Those are somewhat ambiguous statements. Are you thinking of "MAPE" as a method of curve fitting, or as a single number that measures how well a curve fits?

Suppose we are measuring the X data in cm and the Y data in kg. We compute the curve that minimizes the mean absolute percentage error. The values of the parameters of the curve are ##a_1,b_1,c_1,d_1##. Then we express the Y data in grams (thus changing the scale of the Y data) and compute another curve that minimizes the mean absolute percentage error for the rescaled data. The parameters for that curve are ##a_2,b_2,c_2,d_2##. In general, it will turn out that ##a_1 \ne a_2, b_1 \ne b_2, c_1 \ne c_2, d_1 \ne d_2##. Considering "MAPE" to be a method of curve fitting, the values obtained for the parameters are not scale independent. However, the mean absolute percentage errors produced by the two curves are identical. So, considering MAPE to be a single number measuring how well a family of curves fits, its value is scale independent.
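A small numerical illustration of this point, as a sketch in Python; the straight-line model and the data below are made up purely for illustration:

```python
import numpy as np
from scipy.optimize import minimize

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])       # e.g. lengths in cm
y_kg = np.array([2.1, 3.9, 6.2, 8.1, 9.7])    # e.g. masses in kg

def mape(params, x, y):
    a, b = params                              # simple straight-line model y = a*x + b
    return np.mean(np.abs(a * x + b - y) / np.abs(y)) * 100.0

fit_kg = minimize(mape, [1.0, 0.0],    args=(x, y_kg),          method="Nelder-Mead")
fit_g  = minimize(mape, [1000.0, 0.0], args=(x, 1000.0 * y_kg), method="Nelder-Mead")

# The fitted parameters change when the units change ...
print("parameters (kg):", fit_kg.x)
print("parameters (g): ", fit_g.x)     # roughly 1000 times larger
# ... but the minimized mean absolute percentage error is essentially the same number.
print("MAPE (kg):", fit_kg.fun, "  MAPE (g):", fit_g.fun)
```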
 
  • #6
Hi, and thanks for replying.

Stephen Tashi said:
If you want mathematical answers, you'll have to ask questions that are mathematically precise. The visual appeal of a curve fit is subjective and can vary from person to person. To get specific advice, I suggest you post data or some graphs. That type of question usually gets a lot of suggestions.
Actually, I was referring to how MAPE reduces errors compared to SSE, because I find it weird that they have the same numerators and yet they converge in an entirely different manner.

Stephen Tashi said:
Whether a curve fit "looks right" is not a purely mathematical question. It involves whatever field of science applies to the data.

Actually, I can vouch for this: the SSE fit does not give a curve that "looks right", while the MAPE fit does. Did this come from a book, or from your experience? I think I would need to write up this argument.

For reference, these are my results for A vs. a parameter where the values for A are generalized via the parameter:

This one's from SSE
[attached plot: the A values obtained from the SSE fits, plotted against the parameter]


And this one's from MAPE
[attached plot: the A values obtained from the MAPE fits, plotted against the parameter]
 
  • #7
maistral said:
For reference, these are my results for A vs. a parameter where the values for A are generalized via the parameter:

I don't know what you mean by "the values for A are generalized via the parameter".

Relevant to the difference between MAPE and SSE fitting would be graphs of the same data with two curves fit to it, one minimizing MAPE and one minimizing SSE. The two graphs you show look like they plot different data.
 
  • #8
Stephen Tashi said:
I don't know what you mean by "the values for A are generalized via the parameter".

Relevant to the difference between MAPE and SSE fitting would be graphs of the same data with two curves fit to it, one minimizing MAPE and one minimizing SSE. The two graphs you show look like they plot different data.
They're supposed to be different.

It's like this: I fit A, B, C, and D on a Y, Z vs. X equation. Then I get different values of A, B, C, and D for each set of Y, Z vs. X data.

Then, if I generalize A, B, C, and D by plotting them against a certain parameter (an approach known to work in our field), those graphs appear. MAPE gives a better generalization of A than SSE.
 
  • #9
maistral said:
It's like this: I fit A, B, C, and D on a Y, Z vs. X equation. Then I get different values of A, B, C, and D for each set of Y, Z vs. X data.

Then, if I generalize A, B, C, and D by plotting them against a certain parameter (an approach known to work in our field), those graphs appear. MAPE gives a better generalization of A than SSE.

So this is a more complicated scenario than simply fitting one curve to one set of data. However, we should start by looking at how the curve fits for individual sets of Y, Z vs. X data look.

It's not clear what you mean by "generalizing A". Are you saying that you have a theoretical equation that predicts "A" as a function of the "certain parameter"? Does each set of Y, Z vs. X data correspond to a single value of that certain parameter?

I assume you understand how "percentage error" differs qualitatively from "squared error" - for example, that (110-100)/100 is the same percentage error as (1.1 - 1.0)/1.0 but that (110-100)^2 is a larger squared error than (1.1- 1.0)^2.
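For what it's worth, those two comparisons can be checked in a couple of lines (Python, purely for illustration):

```python
# Same relative (percentage) error, very different squared error:
print((110 - 100) / 100, (1.1 - 1.0) / 1.0)    # both 10 %
print((110 - 100) ** 2, (1.1 - 1.0) ** 2)      # 100 vs roughly 0.01
```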
 
  • #10
Hi. I was able to replicate my problem in a simpler manner. I hope this brings more insight.

So what I did was generate data from y = exp(x) + rand(). Then I fitted a quadratic function to the dataset twice: the first fit (orange) using the SSE formulation (minimizing the sum of squared residuals) and the second using the MAPE formulation (minimizing the mean absolute percentage error). The results are as follows:

[attached plot: y = exp(x) + rand() data with the quadratic fits from SSE (orange) and MAPE]


This is what I 'meant' before. Why are the MAPE and SSE curves (and, in turn, the coefficients of the fitted quadratic polynomial) not the same? While I know the objective functions are obviously not the same, what are the differences in how MAPE and SSE handle errors such that it ended up this way?

For my study, MAPE seems to work better, and it is the one used by almost all researchers in my field, but IMO it's so brain-dead to use something just because everyone is using it when no one knows the concept behind it. I still cannot explain why it works better than SSE, which is why I was asking how SSE or MAPE handles errors. Like, does the coefficient n in MAPE have any bearing? Or something?
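For readers who want to reproduce this comparison, here is a sketch of the same experiment in Python rather than the spreadsheet attached below; the x-range, number of points, and random seed are assumptions:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)                 # assumed seed
x = np.linspace(0.0, 4.0, 40)                  # assumed range and number of points
y = np.exp(x) + rng.random(len(x))             # y = exp(x) + rand(), as in the post

def quad(p, x):
    # quadratic model y = p0*x^2 + p1*x + p2
    return p[0] * x**2 + p[1] * x + p[2]

def mape(p):
    # mean absolute percentage error (y > 0 here, so no division problem)
    return np.mean(np.abs(quad(p, x) - y) / y) * 100.0

p_sse = np.polyfit(x, y, 2)                              # ordinary least-squares (SSE) fit
p_mape = minimize(mape, p_sse, method="Nelder-Mead").x   # MAPE fit, started from the SSE fit

# SSE concentrates on the large-y (large-x) end, where the absolute residuals are biggest;
# MAPE divides each residual by y, so it fights hardest for the small-y points.
print("SSE  coefficients:", p_sse)
print("MAPE coefficients:", p_mape)
```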

EDIT: Updating the spreadsheet file to include the following:

So I tested what happens if I have only a single measurement for a certain point, and now MAPE works better. SSE works better at the larger x-values, though:
[attached plot: the single-measurement case, comparing the SSE and MAPE fits]


May I know why this is happening?
 

Attachments

  • quadratic-regression-in-class-dataset.xlsx
  • #11
maistral said:
This is what I 'meant' before. Why are the MAPE and SSE curves (and, in turn, the coefficients of the fitted quadratic polynomial) not the same?
I can't understand why you would expect the fits to be the same curve. If there happened to be one curve that produced zero error at all points then I can understand why MAPE and SSE would both result in that curve. Otherwise, why should they produce the same result?

While I know the objective functions are obviously not the same, what are the differences in how MAPE and SSE handle errors such that it ended up this way?
Let's take a simple case. Suppose there are 2 data points, ## y = 10, y = 100## and you want to find a single number to approximate those two values.

The value ##a## that minimizes the MSE = ##(1/2)( (a-10)^2 + (a-100)^2) ## is ##a = (10 + 100)/2 = 55##.

The value ##a## that minimizes MAPE = ## (1/2)( |a-10|/10 + | a-100|/100) ## is ##a = 10##.

In a manner of speaking, minimizing MAPE doesn't consider the error between 100 and ##a=10## to be large, but SSE does.
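A quick numerical check of this two-point example (just a coarse grid search in Python):

```python
import numpy as np

a = np.linspace(0.0, 120.0, 120001)                       # candidate values of a
mse  = 0.5 * ((a - 10) ** 2 + (a - 100) ** 2)
mape = 0.5 * (np.abs(a - 10) / 10 + np.abs(a - 100) / 100)

print("MSE  is minimized near a =", a[np.argmin(mse)])    # about 55
print("MAPE is minimized near a =", a[np.argmin(mape)])   # about 10
```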

For my study, MAPE seems to work better, and it is the one used by almost all researchers in my field, but IMO it's so brain-dead to use something just because everyone is using it when no one knows the concept behind it.

Without the specifics of the problem, I can only speculate why MAPE would work better. Suppose we are trying to fit a theoretical curve ##A = g(y)## to data and we must do this by first fitting ##y = f(x)## to some data for ##x##. The actual function ##g(y)## might be something like ##g(y) = y + 1##, where an error in the value of ##g(y)## when the error in ##y## is ##\delta y = 10## is the same error regardless of whether we are dealing with ##y = 10## or ##y = 100##. On the other hand, the function ##g(y)## might be something like ##g(y) = (y + 1)/y##, where an error of ##\delta y = 10## in ##y## produces different errors in ##g(y)## depending on whether ##y = 10## or ##y = 100##. The first case suggests fitting ##f(x)## to data using SSE. The second case suggests fitting ##f(x)## to data using MAPE.
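A small numerical illustration of that speculation, using the two example functions from the paragraph above (the code itself is just a sketch):

```python
def g_additive(y):
    return y + 1             # an error in y carries over to g unchanged

def g_relative(y):
    return (y + 1) / y       # the effect of an error in y depends strongly on y

dy = 10.0
for y in (10.0, 100.0):
    print("y =", y,
          "| error in y+1:", g_additive(y + dy) - g_additive(y),
          "| error in (y+1)/y:", round(g_relative(y + dy) - g_relative(y), 5))
# y + 1 changes by exactly 10 in both cases;
# (y+1)/y changes by about -0.05 at y = 10 but only about -0.0009 at y = 100.
```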
 
  • #12
Stephen Tashi said:
Let's take a simple case. Suppose there are 2 data points, ## y = 10, y = 100## and you want to find a single number to approximate those two values.

The value ##a## that minimizes the MSE = ##(1/2)( (a-10)^2 + (a-100)^2) ## is ##a = (10 + 100)/2 = 55##.

The value ##a## that minimizes MAPE = ## (1/2)( |a-10|/10 + | a-100|/100) ## is ##a = 10##.

In a manner of speaking, minimizing MAPE doesn't consider the error between 100 and ##a=10## to be large, but SSE does.

Oh. This is what I meant about how they are handling errors. Thank you very much for this.
Stephen Tashi said:
Without the specifics of the problem, I can only speculate why MAPE would work better. Suppose we are trying to fit a theoretical curve ##A = g(y)## to data and we must do this by first fitting ##y = f(x)## to some data for ##x##. The actual function ##g(y)## might be something like ##g(y) = y + 1##, where an error in the value of ##g(y)## when the error in ##y## is ##\delta y = 10## is the same error regardless of whether we are dealing with ##y = 10## or ##y = 100##. On the other hand, the function ##g(y)## might be something like ##g(y) = (y + 1)/y##, where an error of ##\delta y = 10## in ##y## produces different errors in ##g(y)## depending on whether ##y = 10## or ##y = 100##. The first case suggests fitting ##f(x)## to data using SSE. The second case suggests fitting ##f(x)## to data using MAPE.

And this is the kind of argument I need for my study, lol. However I cannot... understand a few things.

As far as I understood from your example: if I have, say, a function g(y), and I have no prior knowledge of this function except the data generated from it, plus errors in y whose size is roughly the same for all values of y, I should use SSE?

And if, again, I have g(y) with no prior knowledge of the function except the data generated from it, plus errors whose size varies widely with the value of y, I should use MAPE?
 

FAQ: Difference between MAPE and SSE

What is MAPE and SSE?

MAPE stands for Mean Absolute Percentage Error and SSE stands for Sum of Squared Errors. They are both metrics used to measure the accuracy of a forecasting model.

What is the difference between MAPE and SSE?

The main difference is that MAPE measures the average percentage (relative) difference between the forecasted and actual values, while SSE measures the sum of the squared (absolute) differences. MAPE is a relative measure; SSE is an absolute measure.

Which metric is better to use, MAPE or SSE?

It depends on the specific needs and goals of the analysis. MAPE is more useful when the data spans a wide range of values and relative (percentage) errors are what matter. SSE is more appropriate when absolute errors matter equally across the range, since squaring gives the most weight to the largest absolute errors.

Can MAPE and SSE be used together?

Yes, MAPE and SSE can be used together to get a more comprehensive understanding of the accuracy of a forecasting model. MAPE can give an overall picture of the model's performance, while SSE can provide more detailed information about the magnitude of errors.

How do you interpret MAPE and SSE values?

MAPE is usually expressed as a percentage, with lower values indicating better accuracy. SSE is expressed in the squared units of the data, with lower values also indicating better accuracy. It is important to compare these values to a baseline model or other models to determine the effectiveness of the forecasting model.
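As a concrete illustration, here is how both numbers come out for a small made-up forecast (Python, purely illustrative):

```python
import numpy as np

actual   = np.array([100.0, 150.0, 200.0, 250.0])
forecast = np.array([110.0, 140.0, 210.0, 240.0])

sse  = np.sum((forecast - actual) ** 2)                     # 400.0, in squared data units
mape = np.mean(np.abs(forecast - actual) / actual) * 100.0  # about 6.4 %

print("SSE :", sse)
print("MAPE:", round(mape, 1), "%")
```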
