Statistical analysis of COVID reinfection

In summary, the conversation discusses the use of a Markov chain model to determine the probability of reinfection with COVID-19. The speaker also suggests using logistic regression or searching for real medical studies for more insight. The possibility of using statistical tests, such as a Chi-square test of homogeneity or goodness of fit, is also mentioned. However, it is noted that there may not be enough data to reach significant conclusions.
  • #1
Vrbic
TL;DR Summary
How to sophistically calculate the probability of COVID reinfection from available data?
Hi, I'm a physicist, so I have a basic knowledge of probability, hypothesis testing, etc. I would like to calculate more sophistically, from the data available in my country, whether people who have already had COVID have a statistically significantly different probability of infection than people who are being infected for the first time.

Let's define reinfection as two infections (each confirmed by a test) separated by at least 60 days.

The available data are:
The total number of infected and the total number of recovered, both day by day (i.e., I immediately know the status 60 days earlier), and three records of the number of reinfected (from January, February, and March). I know that three records of reinfection are not much, but they should be enough to get at least a rough guess.

My question is: what procedure should I use to determine whether reinfection is more or less probable than a first infection?

Thank you all for your comments.
 
  • #2
You would probably use a Markov chain model. Maybe four states: uninfected, sick, dead, and recovered. There are no transitions into uninfected and no transitions out of dead. There is also no transition from uninfected to recovered. If you consider only COVID deaths then there would also be no transition from uninfected or recovered to dead.

With only three data points on the reinfection I doubt you will be able to get a good estimate of that probability.
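As a minimal sketch of the four-state chain described above, the model can be written as a transition matrix in Python. The daily transition probabilities are made-up placeholders, not estimates from any data set, and the helper name `step` is hypothetical. The recovered -> sick entry is the reinfection probability the question is about:

```python
# States of the chain, in the order used by the matrix rows and columns.
STATES = ["uninfected", "sick", "dead", "recovered"]

# Hypothetical daily transition probabilities (illustrative values only).
# Row = current state, column = next state; every row sums to 1.
P = [
    # uninf  sick   dead   recov
    [0.990, 0.010, 0.000, 0.000],  # uninfected: stays or gets sick
    [0.000, 0.900, 0.002, 0.098],  # sick: stays sick, dies, or recovers
    [0.000, 0.000, 1.000, 0.000],  # dead: no transitions out
    [0.000, 0.002, 0.000, 0.998],  # recovered: stays or is reinfected
]

def step(dist, matrix):
    """Advance a probability distribution over the states by one day."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]

# Start with everyone uninfected and propagate for 60 days.
dist = [1.0, 0.0, 0.0, 0.0]
for _ in range(60):
    dist = step(dist, P)

for name, prob in zip(STATES, dist):
    print(f"{name:>10}: {prob:.4f}")
```

Estimating the model would then mean fitting the uninfected -> sick and recovered -> sick entries to the daily counts and comparing them. Letting the matrix change from month to month is possible in principle, but it multiplies the number of parameters to estimate.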
 
  • #3
Dale said:
You would probably use a Markov chain model. Maybe four states: uninfected, sick, dead, and recovered. There are no transitions into uninfected and no transitions out of dead. There is also no transition from uninfected to recovered.

With only three data points on the reinfection I doubt you will be able to get a good estimate of that probability.

Thank you for your advice, I will try it!
 
  • #4
Vrbic said:
... sophistically
... sophistically
[Attached image: 1616502703587.png]
 
  • #5
Dale said:
You would probably use a Markov chain model. Maybe four states: uninfected, sick, dead, and recovered. There are no transitions into uninfected and no transitions out of dead. There is also no transition from uninfected to recovered. If you consider only COVID deaths then there would also be no transition from uninfected or recovered to dead.

With only three data points on the reinfection I doubt you will be able to get a good estimate of that probability.

Hi, again thank you for your response.
I read a bit about Markov chains, and I probably can't manage it on my own. Could you please point me to a similar solved problem that I could learn from? Is it all right for the transition probabilities to change over time? I mean that in January the probability of getting sick (transition uninfected -> sick) or of reinfection (recovered -> sick) will be different than in February, etc.

When I first thought about this, I had some kind of hypothesis testing in mind (I'm only familiar with ANOVA). Wouldn't that be a better approach to this problem?
 
  • #6
Vrbic said:
Could you please point me to a similar solved problem that I could learn from?
Not really. I am aware of Markov processes, but I have never actually used them myself, so I have no practical experience or concrete suggestions to offer.

Maybe one of my colleagues here will have more insight.

Vrbic said:
Is it all right for the transition probabilities to change over time?
Yes, but from what you described you probably do not have enough data to assess that.

Vrbic said:
When I first thought about this, I had some kind of hypothesis testing in mind (I'm only familiar with ANOVA). Wouldn't that be a better approach to this problem?
You could do a logistic regression. I don’t know what would make it better, but it certainly could be done.
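For a single yes/no predictor (previously infected or not), a logistic regression has a closed-form fit, so a rough sketch needs no fitting library. The counts below are hypothetical placeholders, and `logistic_fit_2x2` is an illustrative name rather than a library function:

```python
import math

def logistic_fit_2x2(cases_prev, n_prev, cases_naive, n_naive):
    """Fit logit(p) = b0 + b1 * previously_infected from grouped counts.

    With one binary predictor the maximum-likelihood fit reproduces the
    observed proportions, so b0 and b1 follow directly from the 2x2 table.
    All four cells must be nonzero.
    """
    a, b = cases_prev, n_prev - cases_prev      # previously infected: cases, non-cases
    c, d = cases_naive, n_naive - cases_naive   # never infected: cases, non-cases
    b0 = math.log(c / d)                        # log-odds of infection, never infected
    b1 = math.log((a / b) / (c / d))            # log odds ratio for prior infection
    se_b1 = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # Wald standard error of b1
    return b0, b1, se_b1

# Hypothetical counts: 12 reinfections among 4000 recovered people,
# 300 first infections among 20000 never-infected people.
b0, b1, se = logistic_fit_2x2(12, 4000, 300, 20000)
z = b1 / se
print(f"odds ratio = {math.exp(b1):.3f}, z = {z:.2f}")
```

A negative `b1` would mean lower odds of infection for previously infected people. The advantage of a full regression over this two-group shortcut is that further predictors (month, age group) can be added, if the data support them.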
 
  • #7
Dale said:
Not really. I am aware of Markov processes, but I have never actually used them myself, so I have no practical experience or concrete suggestions to offer.

Maybe one of my colleagues here will have more insight.

Yes, but from what you described you probably do not have enough data to assess that.

You could do a logistic regression. I don’t know what would make it better, but it certainly could be done.
Thank you again, I will look at it.
 
  • #8
Why waste time doing this yourself when you are not qualified? Do you want your doctor trying to do physics? Just search some real medical studies, like this:

https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(21)00575-4/fulltext
During the first surge (ie, before June, 2020), 533 381 people were tested, of whom 11 727 (2·20%) were PCR positive, and 525 339 were eligible for follow-up in the second surge, of whom 11 068 (2·11%) had tested positive during the first surge. Among eligible PCR-positive individuals from the first surge of the epidemic, 72 (0·65% [95% CI 0·51–0·82]) tested positive again during the second surge compared with 16 819 (3·27% [3·22–3·32]) of 514 271 who tested negative during the first surge (adjusted RR 0·195 [95% CI 0·155–0·246]). Protection against repeat infection was 80·5% (95% CI 75·4–84·5). The alternative cohort analysis gave similar estimates (adjusted RR 0·212 [0·179–0·251], estimated protection 78·8% [74·9–82·1]). In the alternative cohort analysis, among those aged 65 years and older, observed protection against repeat infection was 47·1% (95% CI 24·7–62·8). We found no difference in estimated protection against repeat infection by sex (male 78·4% [72·1–83·2] vs female 79·1% [73·9–83·3]) or evidence of waning protection over time (3–6 months of follow-up 79·3% [74·4–83·3] vs ≥7 months of follow-up 77·7% [70·9–82·9]).
Interpretation
Our findings could inform decisions on which groups should be vaccinated and advocate for vaccination of previously infected individuals because natural protection, especially among older people, cannot be relied on.
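As a quick check on the numbers quoted above, the crude (unadjusted) relative risk can be recomputed from the four counts in the abstract. This is only a sketch: the paper's headline figure is an adjusted RR, so the two need not agree exactly, and the variable names and the log-normal interval are choices made here, not taken from the paper:

```python
import math

# Counts quoted in the abstract above.
reinfected, prev_positive = 72, 11_068      # positive again / PCR-positive in first surge
infected, prev_negative = 16_819, 514_271   # positive / PCR-negative in first surge

risk_prev = reinfected / prev_positive      # about 0.65%
risk_naive = infected / prev_negative       # about 3.27%
rr = risk_prev / risk_naive                 # crude relative risk

# 95% interval from the usual normal approximation on log(RR).
se_log_rr = math.sqrt(
    1 / reinfected - 1 / prev_positive + 1 / infected - 1 / prev_negative
)
lo = math.exp(math.log(rr) - 1.96 * se_log_rr)
hi = math.exp(math.log(rr) + 1.96 * se_log_rr)

print(f"crude RR = {rr:.3f} (95% CI {lo:.3f} to {hi:.3f})")
print(f"crude protection = {100 * (1 - rr):.1f}%")
```

The crude relative risk comes out at roughly 0.20, close to the adjusted RR of 0·195 reported in the abstract.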
 
  • #9
It may help if you describe what you want to statistically test.
Do you want to show, with some certainty, that the rate of reinfection is reduced by the vaccine? If so, consider using a Chi-square test of homogeneity. See https://stats.stackexchange.com/questions/226789/how-to-compare-ratios-in-r.
Do you want to show, with some certainty, that the rate of reinfection is 0 after the vaccine? If so, consider using a Chi-square goodness of fit test.
There are several options. See this for some.
Keep in mind that many statistical tests are typically used so that the null hypothesis is retained unless there is very strong statistical evidence otherwise. I am not sure that there is enough data to reach those levels of significance.
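A chi-square test of homogeneity on a 2x2 table can be sketched with the standard library alone. The counts are hypothetical and the function name `chi2_homogeneity_2x2` is made up for illustration; no continuity correction is applied:

```python
import math

def chi2_homogeneity_2x2(a, b, c, d):
    """Pearson chi-square test of homogeneity for the table [[a, b], [c, d]].

    Returns (statistic, p_value) for 1 degree of freedom,
    without continuity correction.
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of chi-square with 1 d.o.f.: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical counts (infected, not infected) over the same period.
recovered = (3, 997)         # previously infected group
never_infected = (30, 970)   # group with no previous infection

stat, p = chi2_homogeneity_2x2(*recovered, *never_infected)
print(f"chi2 = {stat:.2f}, p = {p:.2g}")
```

The chi-square approximation becomes unreliable when expected cell counts are small (a common rule of thumb is below about 5), which is a real concern when reinfections are rare.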
 
  • #10
BWV said:
Why waste time doing this yourself when you are not qualified? Do you want your doctor trying to do physics? Just search some real medical studies, like this:

https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(21)00575-4/fulltext
Thank you very much, I will read it.

I understand your point and agree with it. I wouldn't want my doctor to publish any physics results, but if physics is his hobby, why not let him have fun at a basic level? That was the point of my query: an easy way to do it (not precise, perhaps following a solved problem from a book) and to have fun with it.
 
  • #11
FactChecker said:
It may help if you describe what you want to statistically test.
Do you want to show, with some certainty, that the rate of reinfection is reduced by the vaccine? If so, consider using a Chi-square test of homogeneity. See https://stats.stackexchange.com/questions/226789/how-to-compare-ratios-in-r.
Do you want to show, with some certainty, that the rate of reinfection is 0 after the vaccine? If so, consider using a Chi-square goodness of fit test.
There are several options. See this for some.
Keep in mind that many statistical tests are typically used so that the null hypothesis is retained unless there is very strong statistical evidence otherwise. I am not sure that there is enough data to reach those levels of significance.
Thank you! It seems very helpful.

I would like to test a hypothesis: The first infection and reinfection have the same probability.
I think that should be the first step. Then, if I can reject this hypothesis and there are enough data, I would like to try to estimate the difference in probability between first infection and reinfection, but just testing this hypothesis is enough for now.
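With very few reinfections, an exact test is one way to test this null hypothesis. Fisher's exact test was not proposed in this thread; it is swapped in here as an assumption, with hypothetical counts and an illustrative function name:

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more likely than the observed one
    # (small tolerance guards against floating-point ties).
    p_value = sum(
        q for q in map(prob, range(lo, hi + 1)) if q <= observed * (1 + 1e-9)
    )
    return min(1.0, p_value)

# Hypothetical counts; each group is (infected, not infected).
p = fisher_exact_two_sided(1, 199, 12, 188)  # recovered vs never infected
print(f"two-sided p = {p:.4f}")
```

A small p-value would be evidence against equal infection probabilities in the two groups, under the assumption that both groups had the same exposure and testing over the period.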
 
  • #12
Vrbic said:
Thank you! It seems very helpful.

I would like to test a hypothesis: The first infection and reinfection have the same probability.
I think that should be the first step. Then, if I can reject this hypothesis and there are enough data, I would like to try to estimate the difference in probability between first infection and reinfection, but just testing this hypothesis is enough for now.
One of the problems (there are many) with statistically analyzing this is that reinfection may be due to new variants that may not be in your past data at all. I'm afraid that there is no way to address that issue except by knowing the exact nature of the variation, how quickly it is spreading, where it is in the world, and what the consequences of the change are. There are many variants now and many more on the way.
 
  • #13
FactChecker said:
One of the problems (there are many) with statistically analyzing this is that reinfection may be due to new variants that may not be in your past data at all.
I definitely agree. I may get a fairly good estimate by using data from an earlier period, when only one variant was circulating. And I understand that this means my result is lost in history, because it will no longer apply.

I'm afraid that there is no way to address that issue except by knowing the exact nature of the variation, how quickly it is spreading, where it is in the world, and what the consequences of the change are. There are many variants now and many more on the way.
I understand, but as I mentioned earlier, I want to get familiar with a new statistical method, not to find the correct result, so for me the journey is the destination.
 
  • #14
Vrbic said:
I definitely agree. I may get a fairly good estimate by using data from an earlier period, when only one variant was circulating. And I understand that this means my result is lost in history, because it will no longer apply.
Not only "lost in history", but only analyzing history rather than predicting anything useful.
I understand, but as I mentioned earlier, I want to get familiar with a new statistical method, not to find the correct result, so for me the journey is the destination.
Fair enough. One thing that you should know about statistics is that it is very treacherous ground. There are many possible ways that results can be biased and conclusions can be wrong. There are problems with correlated variables, self-selecting samples, time-varying distributions, reversing cause-and-effect, etc., etc., etc. When you get experience in spotting the problems, you will see them every day in studies reported in the news.
 

FAQ: Statistical analysis of COVID reinfection

What is statistical analysis of COVID reinfection?

Statistical analysis of COVID reinfection is a method of using mathematical and statistical techniques to analyze data related to individuals who have been infected with COVID-19 more than once. This analysis can help researchers understand the likelihood and patterns of reinfection.

Why is statistical analysis of COVID reinfection important?

Statistical analysis of COVID reinfection is important because it can provide valuable insights into the effectiveness of vaccines and natural immunity against the virus. It can also help identify risk factors for reinfection and inform public health policies and strategies.

What data is used in statistical analysis of COVID reinfection?

The data used in statistical analysis of COVID reinfection includes information on individuals who have been infected with COVID-19 multiple times, such as their age, gender, underlying health conditions, and the time between infections. Other data, such as vaccination status and antibody levels, may also be included.

How is statistical analysis of COVID reinfection conducted?

Statistical analysis of COVID reinfection involves collecting and organizing data, selecting appropriate statistical methods, and analyzing the data to identify patterns and trends. This can include techniques such as logistic regression, survival analysis, and time series analysis.

What are the limitations of statistical analysis of COVID reinfection?

Statistical analysis of COVID reinfection is limited by the quality and availability of data, as well as potential biases in the data. It is also important to note that statistical analysis alone cannot determine causation, and other factors may influence reinfection rates, such as changes in virus variants or individual behaviors.
