- #1
DaveC426913
I should have studied statistics in school.
I have been collecting trip data from my commute to and from work, and I want to determine which factors influence trip duration. There are multiple factors, which I cannot easily isolate.
The factors I'm most interested in are:
- duration of trip as a function of start time
- duration of trip as a function of day-of-week
- duration of trip as a function of route taken
What I want to do is to understand how best to analyze the data. For instance: if I want to examine a single factor, can I meaningfully 'normalize' the other factors?
Here is a snippet of the data I've collected.
Date    Day  AM/PM  Start  Finish  Temp  Route      Notes     Duration
Feb 11  Th   A      0814   0857    -9    QEW        Cold Dry  43m
Feb 11  Th   P      1823   1859    -9    QEW        Cold Dry  36m
Feb 12  Fr   A      0813   0859    -18   QEW        Freezing  46m
Feb 12  Fr   P      1748   1820    -8    QEW        Cold      32m
Feb 19  Th   A      0811   0847    -16   Lakeshore  Dry       36m
Feb 24  Tu   A      0804   0849    -12   QEW        Dry       45m
Mar 10  Tu   A      0821   0904    +1    Lakeshore  Wet       43m
I've only got about 2 dozen entries, so it may be problematic to chop up the data into small sections and analyze each factor in isolation. Is there a way of averaging the data to make better use of it?
For example, when I'm analyzing duration as a function of day-of-week, the AM/PM parameter is a confounding factor. (Notice that PM trips are noticeably shorter than AM trips, which could throw off my results.) But even if the absolute value of the PM times is offset, surely the day-of-week trend is still there. Do I have to throw away all the PM data when analyzing AM data for day-of-week trends? Or can I somehow normalize the PM data so that I can see overall trends across both AM and PM?
This will be of particular importance in that I have very little data (< 6 entries) for alternate routes (Lakeshore).
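To make the 'normalize' idea concrete, here is a rough sketch of what I have in mind, using only the seven rows shown above. It subtracts the AM or PM average from each trip and then averages the leftovers by day-of-week. The function names `center_by_period` and `mean_by_day` are just ones I made up, and the whole thing assumes the AM/PM effect is a simple additive offset, which may well be wrong. It also ignores the route entirely, so a single Lakeshore trip can skew a day's average.

```python
from collections import defaultdict
from statistics import mean

# The seven rows from the snippet: (day, AM/PM, route, duration in minutes)
trips = [
    ("Th", "A", "QEW", 43),
    ("Th", "P", "QEW", 36),
    ("Fr", "A", "QEW", 46),
    ("Fr", "P", "QEW", 32),
    ("Th", "A", "Lakeshore", 36),
    ("Tu", "A", "QEW", 45),
    ("Tu", "A", "Lakeshore", 43),
]


def center_by_period(rows):
    """Subtract the AM (or PM) mean from each trip's duration.

    Assumption: AM/PM shifts every trip by a constant number of minutes.
    """
    by_period = defaultdict(list)
    for _day, period, _route, minutes in rows:
        by_period[period].append(minutes)
    period_mean = {p: mean(v) for p, v in by_period.items()}
    return [
        (day, period, route, minutes - period_mean[period])
        for day, period, route, minutes in rows
    ]


def mean_by_day(centered_rows):
    """Average the centered durations for each day-of-week."""
    by_day = defaultdict(list)
    for day, _period, _route, residual in centered_rows:
        by_day[day].append(residual)
    return {day: mean(v) for day, v in by_day.items()}


centered = center_by_period(trips)
for day, residual in sorted(mean_by_day(centered).items()):
    # Positive means slower than the typical trip for that time of day.
    print(f"{day}: {residual:+.1f} min relative to the AM/PM average")
```

With this few rows the numbers mean very little; the point is only to show the kind of adjustment I am asking about, so that AM and PM trips can be pooled instead of throwing half the data away.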