- #1
Feynstein100
We have a collection of 8 discrete data points. They are:
10, 20, 30, 20, 30, 40, 30, 40
In increasing order:
10, 20 (×2), 30 (×3), 40 (×2)
The harmonic mean of this data series is 22.86
I read on Wikipedia that the harmonic mean is skewed towards the smaller values, i.e. smaller values affect the HM more than larger values. So I thought that if we added two more data points, 20 and 30, the HM would get even smaller. And yet, when I calculated the HM of the new data series with 10 points:
10, 20 (×3), 30 (×4), 40 (×2)
it turned out to be 23.08, i.e. higher than before. Why did that happen?
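For anyone who wants to check the two figures, here is a minimal Python sketch. It uses `statistics.harmonic_mean` from the standard library; the variable names are only illustrative.

```python
from statistics import harmonic_mean

# The original 8 data points and the same series with 20 and 30 appended.
original = [10, 20, 30, 20, 30, 40, 30, 40]
extended = original + [20, 30]

hm_original = harmonic_mean(original)  # 8 / (sum of reciprocals) = 8 / 0.35
hm_extended = harmonic_mean(extended)  # 10 / (0.35 + 1/20 + 1/30)

print(round(hm_original, 2))  # 22.86
print(round(hm_extended, 2))  # 23.08
```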
One of the new points was lower than the HM, whereas the other was higher. I thought the HM would be pulled more strongly toward the lower value, which would bring the overall mean down. Ah, is it because the second data point was much higher than the HM?
In general, I'm interested in the question of how adding new datapoints will affect the HM of the existing data series.
We're not changing the endpoints; they remain constant. So any new point added will lie somewhere inside the bounds of the data series. In our example, those bounds are 10 and 40.
So I think the answer for a single point is quite simple. If the new point < HM, it lowers the HM. If the new point > HM, it increases the HM.
That seems straightforward when adding one data point, but what if we add several, in essence appending another data series to the existing one? Can we predict in advance whether the new HM will be higher or lower?
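One way to approach the multi-point case, sketched below under the assumption that all values are positive: if the existing series has n points with harmonic mean H_old and the appended series has m points with harmonic mean H_add, the combined HM is (n + m) / (n/H_old + m/H_add). That is a weighted harmonic mean of H_old and H_add, so it always lies between the two. The direction of the change should therefore depend only on whether the HM of the appended points is above or below the current HM. The helper name `hm_direction` is made up for this illustration.

```python
from statistics import harmonic_mean

def hm_direction(existing, added):
    """Predict how appending `added` moves the harmonic mean of `existing`.

    Assumes every value is positive. Compares the HM of the appended
    points with the HM of the existing points.
    """
    hm_old = harmonic_mean(existing)
    hm_added = harmonic_mean(added)
    if hm_added > hm_old:
        return "higher"
    if hm_added < hm_old:
        return "lower"
    return "unchanged"  # exact equality is fragile with floats

existing = [10, 20, 30, 20, 30, 40, 30, 40]
added = [20, 30]

print(round(harmonic_mean(added), 2))  # 24.0, which is above 22.86
print(hm_direction(existing, added))   # higher
```

In the example, the HM of the added pair {20, 30} is 24, which is above 22.86, so the combined HM goes up even though one of the two points is below the old HM. The single-point rule is the special case m = 1, where the HM of the added series is just the point itself.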