Different results for factor vs continuous

FallenApple · Apr 27, 2017

So, I'm fitting an interaction model: the response against treatment_type, its interaction with age, and controls for confounders. Age is continuous, with patients ranging from 20 to 90 years old.

So I have two models:

y = age + treatment_type + age*treatment_type

y = factor(age) + treatment_type + factor(age)*treatment_type

Basically, what I got was that age is highly significant in the continuous model: for slight deviations in age, there is a huge effect on treatment.

However, when age is factorized into groups, there is barely any interaction effect (p-value not significant). Why? What could be the reason for this? Is the continuous model so significant simply because patients differ from each other so much that age seems to have an effect when it really doesn't?

After all, it has no effect when looking at groups. But somehow, within the groups, slight deviations give a large effect.
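To make the contrast between the two specifications concrete, here is a minimal sketch of the design-matrix row each one would build for a single patient. It assumes just two treatments ("A" as the reference level, and "B") and four age bands; the band edges beyond "20-40, 40-60" are assumptions, and the function names are hypothetical, not taken from any package.

```python
# Illustrative only: two treatments ("A" is the reference, "B") and four
# age bands. The upper cut points are assumed; the thread only gives
# "20-40, 40-60 etc.".
BAND_EDGES = [20, 40, 60, 80, 100]


def band_index(age):
    """Return 0..3 for the band holding age (each band includes its upper edge)."""
    if age < BAND_EDGES[0]:
        raise ValueError("age below the assumed range")
    for i in range(len(BAND_EDGES) - 1):
        if age <= BAND_EDGES[i + 1]:
            return i
    raise ValueError("age above the assumed range")


def continuous_row(age, treatment):
    """Design row for y ~ age + treatment + age:treatment."""
    is_b = 1.0 if treatment == "B" else 0.0
    return [1.0, float(age), is_b, float(age) * is_b]


def factor_row(age, treatment):
    """Design row for y ~ band + treatment + band:treatment (band 0 is the reference)."""
    is_b = 1.0 if treatment == "B" else 0.0
    n_bands = len(BAND_EDGES) - 1
    band = band_index(age)
    dummies = [1.0 if band == j else 0.0 for j in range(1, n_bands)]
    return [1.0] + dummies + [is_b] + [d * is_b for d in dummies]


# Two patients in the same band: the continuous rows differ, the factor rows do not.
for age in (41, 59):
    print(age, continuous_row(age, "B"), factor_row(age, "B"))
```

Under these assumptions the continuous model spends one interaction column on a single straight-line difference in slopes across all ages, while the factor model spends three columns on band-by-band differences and cannot see any age variation inside a band. The two models are therefore answering different questions, so differing p-values are not by themselves a contradiction.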

FactChecker · Apr 27, 2017

Just to clarify, how are your age factor levels defined?

Dale · Apr 27, 2017

FallenApple said:

For slight deviations in age, there is a huge effect on treatment

That seems suspicious on its own. If an effect is large then you should clearly see it when you plot the data even without doing statistics. Is that the case?

How do your regression diagnostic plots look? Do you have some high leverage or otherwise suspicious points?

FallenApple said:

However, when age is factorized into groups, there is barely any interaction effect (p-value not significant).

How many groups have you factored age into? If you have factored it into many groups then you will have a model with a large number of degrees of freedom. A good statistics package will take that into account and reduce the significance correspondingly.
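One way to see the degrees-of-freedom point is to hold a test statistic fixed and vary the number of interaction terms it is spread over. The sketch below assumes the interaction is judged with a chi-square test (as in a likelihood-ratio comparison of nested models) and uses a made-up statistic of 12; the helper name is hypothetical, and the closed form used only covers even degrees of freedom.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square with even df.

    Uses the closed form exp(-x/2) * sum_{k < df/2} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers positive even df")
    half = x / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


statistic = 12.0  # made-up value, not from the thread
for df in (2, 6, 14):
    print(df, round(chi2_sf_even_df(statistic, df), 4))
```

With the same statistic, the p-value grows as the degrees of freedom grow, so an effect concentrated in one or two cells can be diluted when age is cut into many bands.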

FallenApple · Apr 28, 2017

FactChecker said:

Just to clarify, how are your age factor levels defined?

I've split it up into 4 sections. So basically 20-40, 40-60 etc.

FallenApple · Apr 28, 2017

Dale said:

That seems suspicious on its own. If an effect is large then you should clearly see it when you plot the data even without doing statistics. Is that the case?

How do your regression diagnostic plots look? Do you have some high leverage or otherwise suspicious points?

Many. But I can't reject those because they arise from some systematic process. I've accounted for that by using a negative binomial model.

Dale said:

How many groups have you factored age into? If you have factored it into many groups then you will have a model with a large number of degrees of freedom. A good statistics package will take that into account and reduce the significance correspondingly.

Just 4. But I've also factored it again into many more groups. And here's the plot.

It seems that they may be canceling out.

FactChecker · Apr 28, 2017

Looking at your data, it looks like there is only one (solid line treatment, age (18.9, 27.4]) combination that is significantly different from the others. (Are the different lines different treatments?) Is there much data in that combination category or could it be a small-sample outlier?

I recommend that you statistically analyse the one glaring (solid line treatment, age (18.9, 27.4]) combination as one step and then look at the others in a separate statistical analysis.
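Before analysing that combination on its own, it helps to know how many observations sit in each (treatment, age bin) cell. The following is a minimal sketch using made-up records; the bin start, bin width, treatment labels, and function names are all assumptions for illustration.

```python
from collections import Counter


def age_bin(age, start=20, width=10):
    """Return the (low, high) edges of the half-open bin [low, high) holding age."""
    k = int((age - start) // width)
    low = start + k * width
    return (low, low + width)


def cell_counts(records, start=20, width=10):
    """Count observations in each (treatment, age bin) cell."""
    return Counter((trt, age_bin(age, start, width)) for age, trt in records)


def sparse_cells(counts, min_n):
    """Cells with fewer than min_n observations, sorted for stable output."""
    return sorted(cell for cell, n in counts.items() if n < min_n)


# Made-up records: (age, treatment). Not the poster's data.
records = [(22, "solid"), (25, "solid"), (31, "solid"),
           (23, "dashed"), (24, "dashed"), (27, "dashed"), (29, "dashed"),
           (33, "dashed"), (35, "dashed"), (38, "dashed")]
counts = cell_counts(records)
print(counts)
print(sparse_cells(counts, min_n=3))
```

Cells with no observations at all never appear in the counter, so a complete check would also compare the observed cells against the full grid of treatments and bins.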

Dale · Apr 28, 2017

FallenApple said:

And here's the plot

That doesn't look like it should be non-significant. How does the data itself look? Can you see the interaction in the raw data?

FallenApple · Apr 28, 2017

Dale said:

That doesn't look like it should be non-significant. How does the data itself look? Can you see the interaction in the raw data?

When I increase the number of partitions in the factor, it seems to follow the same trend (just a bunch of zigzags, with the solid one being the most prominent). I think that is why it's highly significant for continuous age: even one slight increment in age could send it in a certain direction.

FallenApple · Apr 28, 2017

FactChecker said:

Looking at your data, it looks like there is only one (solid line treatment, age (18.9, 27.4]) combination that is significantly different from the others. (Are the different lines different treatments?) Is there much data in that combination category or could it be a small-sample outlier?

I recommend that you statistically analyse the one glaring (solid line treatment, age (18.9, 27.4]) combination as one step and then look at the others in a separate statistical analysis.

So split the data into two different sets? It is a small sample, less than 10% of the data set. Yet there are only a small number of people under this treatment option in the first place, so every data point counts. The response is a count of relatively rare events (negative side effects), so most of the responses would be zero anyway.

FactChecker · Apr 28, 2017

FallenApple said:

So split the data into two different sets? It is a small sample, less than 10% of the data set. Yet there are only a small number of people under this treatment option in the first place, so every data point counts. The response is a count of relatively rare events (negative side effects), so most of the responses would be zero anyway.

From the looks of the data, I think that it would be very misleading to allow the extreme result from one combination of (treatment, age) to influence your conclusions about the other combinations. In fact, the other treatments show, if anything, a slight bit of the opposite trend. If you do not address that combination separately, I don't think your conclusions will have any merit.

Dale · Apr 28, 2017

FallenApple said:

So split the data into two different sets? It is a small sample, less than 10% of the data set. Yet there are only a small number of people under this treatment option in the first place, so every data point counts. The response is a count of relatively rare events (negative side effects), so most of the responses would be zero anyway.

Then it doesn't sound like you will have enough data points to justify a large number of degrees of freedom. That is probably driving the lack of significance somewhat.

Also, do you see this interaction when you plot the data itself (not the fit)? I think I have asked this three times now.
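Looking at the data itself, rather than a fit, can be as simple as tabulating the raw response per cell. Since the response is described as a count of rare events, the sketch below reports the sample size, the mean count, and the share of zeros for each (treatment, age bin) cell. The records, bin settings, and function name are made up for illustration.

```python
from collections import defaultdict


def raw_cell_summary(records, start=20, width=10):
    """Per (treatment, age bin): n, mean count, and share of zero responses."""
    groups = defaultdict(list)
    for age, trt, y in records:
        low = start + int((age - start) // width) * width
        groups[(trt, (low, low + width))].append(y)
    summary = {}
    for cell, ys in groups.items():
        n = len(ys)
        summary[cell] = {
            "n": n,
            "mean": sum(ys) / n,
            "zero_share": sum(1 for y in ys if y == 0) / n,
        }
    return summary


# Made-up records: (age, treatment, event count). Not the poster's data.
records = [(22, "solid", 0), (25, "solid", 4), (34, "solid", 0), (36, "solid", 1),
           (23, "dashed", 0), (28, "dashed", 1), (31, "dashed", 0), (39, "dashed", 0)]
for cell, stats in sorted(raw_cell_summary(records).items()):
    print(cell, stats)
```

If one cell's mean rests on a single large count among a handful of patients, as the "solid" 20-30 cell does in this toy data, that is the kind of small-sample pattern worth checking before trusting either model's interaction term.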

FallenApple · Apr 28, 2017

Dale said:

Then it doesn't sound like you will have enough data points to justify a large number of degrees of freedom. That is probably driving the lack of significance somewhat.

Also, do you see this interaction when you plot the data itself (not the fit)? I think I have asked this three times now.

I see. That makes sense. So generally, would I need to have the samples balanced?

One thought: that might not work, because there are so few observations in that combination, in the tens compared to over a thousand in total.

But if age is continuous, then there is no combination sample of data. It's just the whole data set. Is this the correct way to see it?

I'm not sure what you mean. The pattern that I plotted was based on the data. It wasn't derived from a regression.
