- #1
foz84
- 1
- 0
Hello there
I am trying to develop to formula that will derive a used cars value from a database of used car values.
Assume we are trying to value Car A. The first step is to form a sample of cars matching Car A for each header of the database columns (make, model, variant, engine size, transmission etc etc). The sample size is set to min 3, max 9. Once this sample group has been identified we are left with, in most cases 9 records, with two variables; price and mileage. I then use a deviation filter to remove records that deviate heavily from the mean (price and mileage) of the sample. The inverse relationship between price/mileage means that the deviation of these values should cancel out to 0. I have a set value where if this deviates too much from 0 that record is classed as an outlier and removed from the sample.
Once the sample has been judged suitable to use, we then calculate a sample price per mile. This is the price range of sample/mileage range of sample.
Now to calculate the value of Car A, we match to the closest 3 mileage records in sample. We use the mean price of those 3 records + (mean mileage of those 3 records - Car A mileage) * Sample price per mile.
Hope this all makes sense!
Now this seems to be working rather well, but can run into problems when Car A mileage differs significantly the mean of the 3 closest records. If I summarise the final formula as V = P + (M - A) * S. When (M - A) is a large number it creates some funny results, can I somehow weight (unsure of terminology!) the formula so when (M - A) is large number we reduce the vale of S? Or any other approach?
My maths ability is rather limited and I have no idea how to approach this problem? Can algebra even offer a solution? Or is this a calculus problem? Please excuse my ignorance.
Would be extremely grateful for any help or suggestion offered?
Thanks
I am trying to develop to formula that will derive a used cars value from a database of used car values.
Assume we are trying to value Car A. The first step is to form a sample of cars matching Car A for each header of the database columns (make, model, variant, engine size, transmission etc etc). The sample size is set to min 3, max 9. Once this sample group has been identified we are left with, in most cases 9 records, with two variables; price and mileage. I then use a deviation filter to remove records that deviate heavily from the mean (price and mileage) of the sample. The inverse relationship between price/mileage means that the deviation of these values should cancel out to 0. I have a set value where if this deviates too much from 0 that record is classed as an outlier and removed from the sample.
Once the sample has been judged suitable to use, we then calculate a sample price per mile. This is the price range of sample/mileage range of sample.
Now to calculate the value of Car A, we match to the closest 3 mileage records in sample. We use the mean price of those 3 records + (mean mileage of those 3 records - Car A mileage) * Sample price per mile.
Hope this all makes sense!
Now this seems to be working rather well, but can run into problems when Car A mileage differs significantly the mean of the 3 closest records. If I summarise the final formula as V = P + (M - A) * S. When (M - A) is a large number it creates some funny results, can I somehow weight (unsure of terminology!) the formula so when (M - A) is large number we reduce the vale of S? Or any other approach?
My maths ability is rather limited and I have no idea how to approach this problem? Can algebra even offer a solution? Or is this a calculus problem? Please excuse my ignorance.
Would be extremely grateful for any help or suggestion offered?
Thanks
Last edited: