Range of Difference: Bounds for Length of Stay

In summary, the thread discusses using hotel arrival and departure dates to assess whether guests' length of stay falls below a bound such as 7 days. The original poster considers the distribution of the maximum departure date minus the minimum arrival date, or of the maximum alone, but does not know the distributions of the arrival and departure dates. Replies suggest computing the length of stay for each booking directly, converting day-of-year dates to a format that survives year ends, and working with sample quantiles such as deciles. The thread closes with questions about testing a claimed average length of stay, constructing a confidence interval, and estimating the population range, all of which depend on the assumed distribution.
  • #1
WWGD
TL;DR Summary
Want to know if range of hotel stay in days satisfies given bound
Ok, so I'm given hotel data: {Arrival Date, Departure Date}, each expressed as the nth day of the year, and I want to estimate whether the range/difference, i.e. the length of stay, is below a bound. Say a week (7 days) for definiteness.

I'm thinking of using order statistics for the auxiliary variable Difference in Dates := D = Departure Date - Arrival Date, and then either use the distribution of the range ##D_{Max} - D_{Min}##, or maybe just the distribution of the Max.
Is this a good way?
Thing is, I don't know the distribution of either Arrival Date or Departure Date, so I don't see how to compute the distribution of any of these three (Max Departure, Min Arrival, Max Departure - Min Arrival) in order to compute the order statistics.
Maybe @StephenTashi can comment?
 
  • #2
WWGD said:
and use the distribution of the range ##D_{Max}- D_{Min}##

I don't understand the data. Is there a possibly different ##D_{max}## for each person who stayed at the hotel? Do most people in the data have more than one stay?
 
  • #3
Stephen Tashi said:
I don't understand the data.
Me either. If you are interested in the length of stay then surely it is trivial to compute it for each stay and do stats on that directly? Are you interested in some mathematical equivalence or actually computing values, and if the latter how is the data stored and what language are you using to analyse it?
 
  • #4
Stephen Tashi said:
I don't understand the data. Is there a possibly different ##D_{max}## for each person who stayed at the hotel? Do most people in the data have more than one stay?
Thanks for your reply. Apologies.
pbuk said:
Me either. If you are interested in the length of stay then surely it is trivial to compute it for each stay and do stats on that directly? Are you interested in some mathematical equivalence or actually computing values, and if the latter how is the data stored and what language are you using to analyse it?
Yes, this is what I meant to do, but I was on my phone, which was dying on me. I wanted to define order statistics on the length of stay = Departure Date - Arrival Date. Each date described as nth day of the year.
 
  • #5
Length of stay is a discrete random variable, taking values in the Natural Numbers and ##\{0\}##. As such, we can compute its deciles, including median, etc., unless I'm missing something. Or maybe there's some other statistic to evaluate claims about its range.
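As a rough illustration of the deciles and median mentioned here, the following sketch uses Python's standard `statistics` module. The list `stays` is made-up data and `stay_summary` is a hypothetical helper name, neither comes from the thread.

```python
import statistics

# Hypothetical lengths of stay in nights (illustrative data, not from the thread).
stays = [1, 1, 2, 2, 3, 3, 3, 4, 5, 7, 7, 10, 14]

def stay_summary(stays):
    """Return the sample median and the nine decile cut points."""
    median = statistics.median(stays)
    # n=10 yields the 9 cut points that split the sample into 10 groups.
    deciles = statistics.quantiles(stays, n=10, method="inclusive")
    return median, deciles

median, deciles = stay_summary(stays)
print("median:", median)
print("deciles:", deciles)
```

These are sample statistics; they describe the data in hand and say nothing by themselves about the population.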
 
  • #6
WWGD said:
Length of stay is a discrete random variable,
I would rather say length of stay can be modeled by a discrete random variable.

WWGD said:
taking values in the Natural Numbers and ##\{0\}##.
Unless this is the kind of hotel that rents rooms by the hour I think the range is strictly positive :wink:

WWGD said:
As such, we can compute its deciles, including median, etc., unless I'm missing something.
Yes of course, I'm still not seeing where the difficulty lies?

WWGD said:
Each date described as nth day of the year.
That will cause problems over year ends; I would be inclined to convert to some other format such as a POSIX timestamp so you have ## \text{days} = \left\lfloor \frac{\text{end} - \text{start}}{86400} \right\rfloor ##.
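A minimal sketch of the year-end issue, assuming the year of each date is also available (the thread only mentions the day of the year, so the year arguments are an assumption). It uses calendar dates from the standard `datetime` module rather than raw timestamps; the function names are hypothetical.

```python
from datetime import date

def nights_between(arrive: date, depart: date) -> int:
    """Whole days between two calendar dates; works across year boundaries."""
    return (depart - arrive).days

def nights_from_day_of_year(arrive_year, arrive_doy, depart_year, depart_doy):
    """Same calculation when each date is stored as (year, nth day of that year)."""
    # Day 1 of a year is 1 January, so add (doy - 1) days to that date.
    arrive = date.fromordinal(date(arrive_year, 1, 1).toordinal() + arrive_doy - 1)
    depart = date.fromordinal(date(depart_year, 1, 1).toordinal() + depart_doy - 1)
    return nights_between(arrive, depart)

# Arrive on day 364 of 2023 (30 December), leave on day 2 of 2024 (2 January).
print(nights_from_day_of_year(2023, 364, 2024, 2))  # 3
# Subtracting the raw day-of-year values would give 2 - 364 = -362.
```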
 
  • #7
pbuk said:
I would rather say length of stay can be modeled by a discrete random variable.
Unless this is the kind of hotel that rents rooms by the hour I think the range is strictly positive :wink:
Yes of course, I'm still not seeing where the difficulty lies?
That will cause problems over year ends; I would be inclined to convert to some other format such as posix timestamp so you have ## \text{days} = \left\lfloor \frac{\text{end} - \text{start}}{86400} \right\rfloor ##.
But in order to compute the order statistics, I need to know the distribution of the length of stay. How do I do that? Maybe a bootstrap? Sorry if my question is too simple; I'm not too familiar with this topic.
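For reference, a percentile bootstrap of the kind alluded to here might look like the sketch below. It is only an illustration under stated assumptions: the stays are treated as an independent sample, the data in `stays` are invented, and `bootstrap_interval` is a hypothetical helper, not something proposed in the thread.

```python
import random
import statistics

def bootstrap_interval(sample, statistic, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for `statistic` evaluated on `sample`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    # Resample with replacement and recompute the statistic each time.
    estimates = sorted(
        statistic(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )
    lo_index = int((alpha / 2) * n_resamples)
    hi_index = int((1 - alpha / 2) * n_resamples) - 1
    return estimates[lo_index], estimates[hi_index]

# Hypothetical lengths of stay in nights.
stays = [1, 1, 2, 2, 3, 3, 3, 4, 5, 7, 7, 10, 14]
low, high = bootstrap_interval(stays, statistics.median)
print("approximate 95% interval for the median:", low, high)
```

The bootstrap tends to behave poorly for extremes such as the sample maximum, so it is better suited to the median or mean than to the range.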
 
  • #8
WWGD said:
I need to know the distribution of the length of stay. How do I do that?
By constructing the sample space ## \{ l_i \} = \{ depart_i - arrive_i \} ##.
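A small sketch of this construction, assuming for simplicity that every stay falls within one calendar year so the day-of-year values can be subtracted directly. The data and the function names `empirical_distribution` and `share_at_most` are hypothetical.

```python
from collections import Counter

def empirical_distribution(arrivals, departures):
    """Relative frequency of each observed length of stay l_i = depart_i - arrive_i."""
    lengths = [d - a for a, d in zip(arrivals, departures)]
    counts = Counter(lengths)
    n = len(lengths)
    return {length: counts[length] / n for length in sorted(counts)}

def share_at_most(distribution, bound):
    """Empirical proportion of stays no longer than `bound` days."""
    return sum(p for length, p in distribution.items() if length <= bound)

# Hypothetical day-of-year values within a single year.
arrivals = [10, 12, 40, 41, 100, 200, 201, 300]
departures = [11, 15, 47, 43, 110, 203, 202, 307]

dist = empirical_distribution(arrivals, departures)
print(dist)
print(share_at_most(dist, 7))  # 0.875: seven of the eight stays are 7 days or less
```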
 
  • #9
WWGD said:
I wanted to define order statistics on the length of stay = Departure Date - Arrival Date.

More vocabulary issues: For a specific sample of data, "order statistics" is already a defined term - just like "sample mean" is already a defined term. An order statistic for a specific set of data is a constant. Considering it as a formula for computing that number, an order statistic is a random variable.

The same applies to terms like "quartiles" except that one might also apply such a term to a probability distribution instead of a sample. If you think of "quartiles" applying to a probability distribution then they are population parameters instead of sample statistics.
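To make the "order statistic as a random variable" reading concrete, here is the standard result under an assumption the thread does not state: that the lengths of stay ##L_1, \dots, L_n## are independent draws from a common distribution with CDF ##F##. Writing ##L_{(k)}## for the k-th smallest value, the event ##L_{(k)} \le x## occurs exactly when at least ##k## of the ##n## observations are ##\le x##, so

$$P\left(L_{(k)} \le x\right) = \sum_{j=k}^{n} \binom{n}{j} F(x)^{j} \left(1 - F(x)\right)^{n-j},$$

and for the maximum (##k = n##) this reduces to

$$P\left(L_{(n)} \le x\right) = F(x)^{n}.$$

In practice ##F## is unknown, which is why the distributional assumption matters.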
 
  • #10
For a concrete example, if the source data is a SQL table we might have
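SQL:
-- Summarise length of stay by decile. MySQL 8+ syntax is assumed:
-- DATEDIFF(end, start) returns whole days and NTILE is a window function.
SELECT
  decile
  , MIN(stay) AS min_stay
  , MAX(stay) AS max_stay
  , AVG(stay) AS mean_stay
FROM (
  -- One row per stay, tagged with its decile when ordered by length.
  SELECT
    DATEDIFF(depart, arrive) AS stay
    , NTILE(10) OVER (
      ORDER BY DATEDIFF(depart, arrive)
    ) AS decile
  FROM
    stays
) AS ranked_stays
GROUP BY
  decile
ORDER BY
  decile
;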
 
  • #11
pbuk said:
For a concrete example, if the source data is a SQL table we might have [SQL query from post #10]
Thanks. But how do I use this to test a claim about a given length of stay, or to construct a confidence interval of some sort? Say the claim is made that average length of stay is 5 days. What statistic do I compute, and what is its distribution? What is an estimator for the population range?
 
  • #12
WWGD said:
what is its distribution?
That is indeed the question. You see we can't say anything about the relationship between a sample and the population unless we know how the data are distributed.
WWGD said:
Say the claim is made that average length of stay is 5 days.
Well let's say the claim is made that length of stay is Poisson distributed with a mean value of 5 days. We could test this with a chi-squared test.
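A sketch of how such a test statistic could be computed with only the standard library. The binning (values 0 to 8 plus a pooled "9 or more" bin) and the name `chi_squared_poisson` are illustrative choices, not from the thread. Note also that a Poisson model puts some probability on 0 nights, which may or may not suit the data.

```python
import math
from collections import Counter

def poisson_pmf(k, mean):
    """P(X = k) for a Poisson random variable with the given mean."""
    return math.exp(-mean) * mean**k / math.factorial(k)

def chi_squared_poisson(stays, mean=5.0, max_bin=9):
    """Pearson chi-squared statistic for H0: stays ~ Poisson(mean).

    Bins are 0, 1, ..., max_bin - 1, plus a pooled tail bin "max_bin or more".
    Returns (statistic, degrees_of_freedom).
    """
    n = len(stays)
    counts = Counter(min(s, max_bin) for s in stays)  # pool the tail
    probs = [poisson_pmf(k, mean) for k in range(max_bin)]
    probs.append(1.0 - sum(probs))  # tail probability P(X >= max_bin)
    statistic = 0.0
    for k, p in enumerate(probs):
        expected = n * p
        statistic += (counts.get(k, 0) - expected) ** 2 / expected
    # The mean is fixed by the hypothesis, not estimated, so df = bins - 1.
    return statistic, len(probs) - 1

# Hypothetical sample of stays in nights.
stat, df = chi_squared_poisson([3, 4, 4, 5, 5, 5, 6, 6, 7, 9, 2, 5, 4, 6, 8])
print(stat, df)
```

The statistic would then be compared with a chi-squared distribution on `df` degrees of freedom, for example via `scipy.stats.chi2.sf(stat, df)` if SciPy is available. The usual rule of thumb that expected counts should not be too small applies, so a sample as small as the one above is for illustration only.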

WWGD said:
What is an estimator for the population range?
Again that depends on the distribution: many distributions including Poisson have no upper bound. On the other hand a linear distribution is bounded. But think about the implications of this: are you saying that by looking at a sample of some people that have stayed in some hotels during a certain period you want to draw a conclusion that nobody stays in any hotel ever for more than n days?

This situation has other dangers. Let's say you calculate that the average length of stay is 5 days, what does this actually tell you? Certainly not that people who stay for 5 days are your most important customers, for two reasons:
  1. You may not have any customers who stay for 5 days - the mean could be made up of 100 stays of 1 night and 200 stays of 7 nights! This points towards the bigger problem:
  2. Length of stay is probably not a useful statistic anyway; you probably want length of stay squared, which is what hotels generally measure, although in a slightly different form: they look at bed nights, or room nights. So in the above example we would see 100 bed nights on stays of 1 night and 1,400 bed nights on stays of 7 nights, so the bed-night-weighted average length of stay would be (100 x 1 + 1,400 x 7) / 1,500 = 6.6 nights.
So if you are looking for concise answers to come from means and variances you are going to have to know a lot more about the population distribution.
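The arithmetic in the two-point example above can be checked with a short sketch. The mapping `example` (nights to number of stays) encodes the 100 one-night and 200 seven-night stays from the post; the function names are hypothetical.

```python
def mean_stay(stay_counts):
    """Plain average length of stay; stay_counts maps nights -> number of stays."""
    total_stays = sum(stay_counts.values())
    total_nights = sum(nights * count for nights, count in stay_counts.items())
    return total_nights / total_stays

def bed_night_weighted_mean(stay_counts):
    """Average length of stay with each stay weighted by its bed nights."""
    total_nights = sum(nights * count for nights, count in stay_counts.items())
    weighted = sum(nights * nights * count for nights, count in stay_counts.items())
    return weighted / total_nights

example = {1: 100, 7: 200}  # 100 stays of 1 night, 200 stays of 7 nights
print(mean_stay(example))                # 5.0
print(bed_night_weighted_mean(example))  # 6.6
```

The weighted figure is where the "length of stay squared" comes from: each stay contributes its length once as a weight and once as the value being averaged.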
 

FAQ: Range of Difference: Bounds for Length of Stay

What is the Range of Difference in terms of Length of Stay?

The Range of Difference in terms of Length of Stay refers to the difference between the shortest and longest length of stay for a particular group or population. It is a measure of the variability or spread of data in terms of length of stay.

Why is it important to know the Range of Difference in Length of Stay?

Knowing the Range of Difference in Length of Stay can provide valuable insights into the characteristics of a group or population. It can help identify outliers or extreme values in length of stay, and can also be used to compare different groups or time periods.

How is the Range of Difference calculated?

The Range of Difference is calculated by subtracting the shortest length of stay from the longest length of stay in a set of data. For example, if the shortest length of stay is 3 days and the longest is 10 days, the Range of Difference would be 7 days (10-3=7).
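As a minimal sketch of this calculation (the list of stays is invented and `stay_range` is a hypothetical name):

```python
def stay_range(stays):
    """Sample range: longest stay minus shortest stay."""
    return max(stays) - min(stays)

# Shortest stay is 3 days and longest is 10 days, as in the example above.
print(stay_range([3, 5, 4, 10, 6]))  # 7
```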

Can the Range of Difference be negative?

No, the Range of Difference cannot be negative. It is always non-negative, because it is the longest length of stay minus the shortest, and it is zero only when every stay in the data has the same length.

How does the Range of Difference differ from other measures of variability?

The Range of Difference is a simple measure of variability that only takes into account the two extreme values in a set of data. Other measures of variability, such as standard deviation or variance, use all values in the data set and therefore give a fuller picture of how the data are spread.
