Recalculate a range by variable 'Median'?

In summary: Recalculating the income multipliers from a chosen median would produce a new table that would be more relevant to the personal income data that I am working with.
  • #1
marcophys
TL;DR Summary
From a data set with known median and population distribution - create a new data set from a variable median
Hello everyone :)
I'm struggling to wrap my head around recalculating a data set based upon median.

The data set represents a fixed distribution pattern of population to income group.
There is no data available for 'population to income group' at differing medians, hence we accept the distribution pattern as constant.
[Image: recalculation_by_median.png]

The population groups are calculated from the total population * multiplier (or vice versa).

Required
Create a new table, by changing the median.
In fact, the median will be quoted by 'Income'.
Unlike the 5k intervals shown in the image, the income range (and population groups) can be modified to display 1k intervals up to a known maximum of 250k.
Thereafter the final group will be 'Over 250k'.

By displaying 1k intervals, the median can be found using cross referencing.

Notes:
I have yet to develop the table to produce the income median value.
Instead, I wanted to see if anyone knows how the median can be linked to the multipliers.
In effect, the median could then recalculate the multipliers, and generate a new table.

In this way, a lower median would cause the population groups to the left to increase, and to the right to decrease, according to the initial distribution pattern.

It is understood, that the initial distribution pattern may not apply perfectly to the new median, but it will produce a more relevant distribution, than not redistributing the population.
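
One way to picture the idea above is to scale every income by the ratio of the new median to the old one, then recount people into the original bins. The sketch below is only an illustration under stated assumptions: incomes are assumed to be spread evenly inside each bin, the table is hypothetical (not the data from the image), and the names `cdf_at` and `rebin_for_median` are made up for this example.

```python
from bisect import bisect_right

def cdf_at(x, edges, counts):
    """People with income below x, assuming an even spread inside each bin."""
    if x <= edges[0]:
        return 0.0
    if x >= edges[-1]:
        return float(sum(counts))
    i = bisect_right(edges, x) - 1            # bin that contains x
    frac = (x - edges[i]) / (edges[i + 1] - edges[i])
    return sum(counts[:i]) + frac * counts[i]

def rebin_for_median(edges, counts, old_median, new_median):
    """Scale all incomes by new_median / old_median, then recount into the same bins."""
    k = new_median / old_median
    total = float(sum(counts))
    # Someone at income y after scaling was at y / k before scaling.
    inner = [cdf_at(e / k, edges, counts) for e in edges[1:-1]]
    cum = [0.0] + inner + [total]             # first and last bins absorb any overflow
    return [hi - lo for lo, hi in zip(cum, cum[1:])]

# Hypothetical table, for illustration only (not the data from the image).
edges = [0, 20_000, 40_000, 60_000, 80_000, 100_000]
counts = [30, 40, 20, 7, 3]                   # interpolated median is 30,000
new_counts = rebin_for_median(edges, counts, old_median=30_000, new_median=24_000)
print(new_counts)
```

With a lower median, the left-hand groups grow and the right-hand groups shrink, while the total is unchanged. Because the bins here are wide, the median interpolated from the rebinned table only lands near the target; narrower bins, such as the 1k intervals mentioned above, should bring it closer.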

Clearly, I can simply query the original data set and disregard medians.
However, including medians would produce more accurate results; hence my willingness to put the effort in.

Hopefully someone has dealt with similar requirements, and can share their knowledge :)
 
  • #2
I don't think you are using the words "median", "population" or "distribution" in the way we normally use them in statistics. In particular, with a "population" (sample size) of 104,975 people, the median income is about that of the 52,487th person ranked by income or, from the data in the table
$$ £40,000 + £5,000 \times \dfrac{ \frac {104,975} 2 - 48,224}{52,855-48,224} \approx £44,603 $$
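
The same interpolation can be written as a short function. This is a minimal sketch; the function and parameter names are illustrative, and the figures are the ones quoted in the formula above.

```python
def grouped_median(total, cum_below, bin_count, bin_lower, bin_width):
    """Interpolated median of binned data.

    total      -- total population across all bins
    cum_below  -- cumulative population in the bins below the median bin
    bin_count  -- population inside the median bin
    bin_lower  -- lower income edge of the median bin
    bin_width  -- width of the median bin
    """
    return bin_lower + bin_width * (total / 2 - cum_below) / bin_count

# Figures from the formula above: 48,224 people below 40,000 and
# 52,855 below 45,000, out of 104,975 in total.
median = grouped_median(
    total=104_975,
    cum_below=48_224,
    bin_count=52_855 - 48_224,
    bin_lower=40_000,
    bin_width=5_000,
)
print(round(median))  # 44603
```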

As for "creating a new table by changing the median" this doesn't make sense: the median is calculated from the table, not the other way round.

How we would normally proceed:
  • Plot the data on a chart.
  • Infer a distribution from the chart.
  • Estimate the distribution parameters from the sample statistics.
  • Calculate a goodness of fit of the hypothesised distribution and assess its suitability.

We then have a model for which we can vary the parameters to form a different distribution.
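
As a rough sketch of what "varying the parameters" could look like, the code below assumes a log-normal distribution, which is a common first guess for income data but is not something the data in this thread has been tested against. The spread `sigma`, the bin edges and the lower median of 38,000 are hypothetical values chosen only for illustration; in practice `sigma` would be estimated from the sample and the fit checked first.

```python
import math

def lognormal_cdf(x, median, sigma):
    """P(income <= x) for a log-normal with the given median and log-scale spread."""
    if x <= 0:
        return 0.0
    z = (math.log(x) - math.log(median)) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def bin_shares(edges, median, sigma):
    """Share of the population in each bin; the last bin is open-ended ('Over ...')."""
    cum = [lognormal_cdf(e, median, sigma) for e in edges] + [1.0]
    return [hi - lo for lo, hi in zip(cum, cum[1:])]

# Hypothetical values for illustration only; sigma would be estimated from the data.
edges = [0, 25_000, 50_000, 75_000, 100_000]   # lower edge of each bin
sigma = 0.8
base = bin_shares(edges, median=44_603, sigma=sigma)
lower = bin_shares(edges, median=38_000, sigma=sigma)

for lo, b, l in zip(edges, base, lower):
    print(f"from {lo:>7,}: {b:.3f} -> {l:.3f}")
```

For a log-normal, the median is the exponential of the location parameter, so changing the median while holding `sigma` fixed shifts the whole curve and the bin shares follow from it.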

Note, however, that your sample data has more than 20% of samples in the >£100k bin, so it is not possible to judge how good the fit of your model is above that value. It would help if you knew the sample mean.

Can I suggest that you say what data you are starting with and what you are actually trying to achieve with it, and we can help you get there?
 
  • #3
Thanks for the reply pbuk :smile:
Apologies for certain omissions ... and thank you for your median equation.
I simply totalled the population groups and divided by 2 (mimicking the method that is often quoted).

The Data Set
The data set population must be multiplied by 1,000 to match the 'State Count' population projection of US Males.
(The state count is marginally lower than series Low, of the 3 census projections: Low, Mid, High, in Table 1. 2020)

Because the series Low projection is very close to the State Count total (330,726 to 329,484), I created the full state count projection from the series Low.
The reason being that localised data is only available from the State Count.

I then rectified the published data to the State Count, to produce a working data set, that provides $1 to $2,499 increments to $99,999, and $100,000 to $149,999, $150,000 to $199,999, $200,000 to $249,999, $250,000+

All age groups tally to the state count, and various cross references indicate that the data set is now square, and correct to the information available.

The Original objective (behind this thread)
... was to enable querying by published income data, down to County level.

However, after reviewing that data today ... I now see that it does not correspond to the personal income data that I am working from.
Rather, it lists per capita income, which I believe is gained from total financial activity, divided by population.
Whereas the income data that I'm using is based upon monthly sampling of the population.
Hence, there is no correlation ... the quoted medians are way higher than the income study.
Hence, the original objective cannot be achieved.

Concerning the mathematical challenge :smile:
pbuk said:
As for "creating a new table by changing the median" this doesn't make sense: the median is calculated from the table, not the other way round.
Yes; I am aware of this principle, as general analysis of a table.

However, my thinking was that:
If we have a table that is fixed (all data known), it should be possible to modify that data by varying the key values: Total Population, and Median
... for the reason that (presumably) an equation can be written for the data set, where median is an outcome (median = xxx)
From that, the equation could be re-written whereby the median moves to the other side of the equation, in exchange for the movement of a value to the left of =

Hahahah! You will likely appreciate that I'm viewing this from first taught principles :smile:
It doesn't mean that I have the equation in my head.

I have a good mathematical brain (maths doesn't cause a mental shutdown), but I'm not a mathematician.
I am well aware that mathematicians can not only utilise tricks, but they can see in their mind what equations to use.
Each to his own, and we all have our bodies of work (knowledge) that can benefit each other.

For reasons stated above, it seems clear that the original objective cannot be achieved (annoyingly).
... but in truth, it is not overly critical.
I was wearing my engineering hat, whereby I wanted to go as deep as possible, into what the data could provide.

What I have discovered, is the limitation to that exercise (within feasible achievement).

Dealing With What I Have
What I have, is a data set that is correct to the studies.
I can modify it by adjusting Total Population (relatively easy).
Simply change the total to suit a region, and the population groups shrink accordingly, whilst maintaining the original pattern.
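
A minimal sketch of that population adjustment, using hypothetical bin counts and a made-up function name, could look like this:

```python
def scale_to_population(counts, new_total):
    """Rescale bin counts to a new total while keeping each bin's share unchanged."""
    old_total = sum(counts)
    return [c * new_total / old_total for c in counts]

# Hypothetical bin counts, for illustration only.
national = [30, 40, 20, 7, 3]
regional = scale_to_population(national, new_total=2_500)
print(regional)  # [750.0, 1000.0, 500.0, 175.0, 75.0]
```

Every group keeps the same multiplier (its share of the total), which is why the pattern is unchanged.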

The pattern cannot be correct, due to localised economic conditions.
As a consequence, I'm toying with a concept to optionally modify the query result.

Overall
I think that this requirement was a 'bridge too far'.
Thanks pbuk for taking the time to wrap your head around this conundrum :smile:
 
  • #4
marcophys said:
Thanks pbuk for taking the time to wrap your head around this conundrum :smile:
No problem.

marcophys said:
From that, the equation could be re-written whereby the median moves to the other side of the equation, in exchange for the movement of a value to the left of =
The problem is not that you can't alter the data to achieve a different sample median; it is that there are infinitely many ways you can do that, and the most obvious ways are unrealistic. For instance, you can increase the median by $5,000 simply by moving all of the $45-50k bin into the $50-55k bin and all of the $40-45k bin into the $45-50k bin, leaving the $40-45k bin empty.
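
The bin-shifting example can be checked with a few lines of code. The counts below are hypothetical and `table_median` is just an illustrative helper that interpolates the median from a binned table.

```python
def table_median(edges, counts):
    """Interpolated median of a binned table (edges has one more entry than counts)."""
    half = sum(counts) / 2
    cum = 0
    for lo, hi, n in zip(edges, edges[1:], counts):
        if n and cum + n >= half:
            return lo + (hi - lo) * (half - cum) / n
        cum += n
    raise ValueError("empty table")

# Hypothetical counts, for illustration only.
edges = [35_000, 40_000, 45_000, 50_000, 55_000, 60_000]
before = [400, 300, 200, 150, 100]       # 35-40k, 40-45k, 45-50k, 50-55k, 55-60k
# Move the 45-50k bin into 50-55k and the 40-45k bin into 45-50k.
after = [400, 0, 300, 200 + 150, 100]

shift = table_median(edges, after) - table_median(edges, before)
print(round(shift))  # 5000
```

The total population is the same in both tables and the median moves by exactly one bin width, yet the second table has an empty bin that no real income distribution would show.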
 
  • #5
pbuk said:
The problem is not that you can't alter the data to achieve a different sample median; it is that there are infinitely many ways you can do that, and the most obvious ways are unrealistic. For instance, you can increase the median by $5,000 simply by moving all of the $45-50k bin into the $50-55k bin and all of the $40-45k bin into the $45-50k bin, leaving the $40-45k bin empty.
Point taken.
Clearly this one was over the hills and far away.
I don't mind ... it's always good to try to push past the limits, because it is in that way that we find ourselves setting new limits.

Thanks again, for your time :smile:
 

FAQ: Recalculate a range by variable 'Median'?

What is the purpose of recalculation by variable 'Median'?

In this thread, recalculation by a variable median means rebuilding a binned income table so that it matches a chosen median while keeping the original distribution pattern. The median itself is the middle value in a range of data, which is useful for understanding the central tendency of the data.

How is the median calculated?

The median is calculated by arranging the data in ascending or descending order and finding the middle value. If there is an even number of data points, the median is the average of the two middle values.
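
For raw (unbinned) values, Python's standard library follows the same rule; the numbers below are arbitrary examples.

```python
import statistics

print(statistics.median([3, 1, 4]))      # 3
print(statistics.median([3, 1, 4, 2]))   # 2.5
```

For binned data such as the income table in this thread, the median has to be interpolated within the bin that contains the middle person, as shown in post #2.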

Why is the median sometimes preferred over the mean?

The median is sometimes preferred over the mean because it is less affected by extreme values or outliers in the data. This makes it a more robust measure of central tendency.

Can the median be used with any type of data?

No. The median requires data that can be ranked, so it can be used with numerical and ordinal data but not with unordered (nominal) categorical data. For nominal categories, the mode is the usual measure of central tendency.

How does recalculation by variable 'Median' affect the overall range of data?

Calculating the median does not change the data; it only identifies the middle value in the range. Rebuilding a table to match a different median, as discussed in the thread, does change the population in each group, and there are infinitely many ways to do so.
