lavoisier
Hello,
yesterday at work there was a discussion about genotoxicity testing, and a point caused some controversy. I wonder if you could shed some light, please.
To give you some context: we work in drug discovery, and before a substance can be administered to people, we need to check that it's not toxic, and in particular not genotoxic.
To assess that, we need to run a genotoxicity assay (called micronucleus, more information here: https://en.wikipedia.org/wiki/Micronucleus_test).
This assay has two versions: in vitro and in vivo. The in vivo assay is the one that matters (legally) to decide if the substance can be used in humans, but it's very expensive, so we tend to run the in vitro one first as an initial screen of (relatively) many substances. If the in vitro assay is negative, we run the in vivo assay for confirmation of negativity (non-genotoxicity). If the in vitro assay is positive, we must decide whether to drop the substance or run it in the in vivo one to de-risk it (because if the in vivo assay is negative, it does not matter that the in vitro one was positive: it is then considered an in vitro false positive; so we could say that the in vivo assay is considered a golden standard telling us the real genotoxicity).
And here's where Bayes comes into play (I think).
The in vitro assay is reported to predict in vivo genotoxicity with a sensitivity of 90% and a specificity of 40% (i.e. with a massive FPR of 60%).
One colleague raised the question: is it worth running the in vitro assay as an initial screen of potential candidate substances, given that among those that test positive, many of them will probably be false positives?
The doubt in that case is whether to progress them to in vivo, i.e. spend the money perhaps just to confirm that they are actually genotoxic (true positives), or to drop them even though they may actually be non-genotoxic (false positives).
[Consider that substances that get to this stage have undergone a long and expensive discovery process, and if they make it to human testing they can potentially generate a large return on the investment, so it is financially very risky to drop them unless it's quite sure they are toxic. That's why people are more concerned about false positives than about false negatives in this context].
On the whole I tend to agree. One would be tempted to say: just as well to select your substances by other criteria and only run the few you consider 'best' directly in the definitive in vivo assay.
However, I have doubts on this reasoning, in particular related to the prior probability of true genotoxicity (let's call it P(G); hence P(nonG) = 1-P(G), as there are only these two categories).
From the theory we know that P(G) is needed to calculate P(G|T+), where T+ is a positive outcome in the in vitro assay, and more importantly in this context, its complementary P(nonG|T+):
P(nonG|T+) = P(T+|nonG) * P(nonG) / P(T+) = ... = FPR*(1-P(G)) / [FPR*(1-P(G)) + TPR*P(G)]
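For concreteness, here is a minimal sketch of that formula in Python (the TPR and FPR defaults are just the figures quoted above; nothing else is assumed):

```python
def p_nonG_given_pos(p_G, tpr=0.90, fpr=0.60):
    """P(nonG | T+) by Bayes' theorem, given the prior P(G),
    the in vitro sensitivity (TPR) and false positive rate (FPR)."""
    p_nonG = 1.0 - p_G
    return (fpr * p_nonG) / (fpr * p_nonG + tpr * p_G)

# With an indifferent prior P(G) = 50%:
print(p_nonG_given_pos(0.50))   # 0.4
# With P(G) = 25% from 'random' chemical space:
print(p_nonG_given_pos(0.25))   # ~0.667
```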
Trouble is, I have no idea what P(G) is in the set of substances one may decide to test. It's not like the prevalence of a disease in a population, for which we have historical estimates, and perhaps even information on specific subsets. Here we can choose any set of substances, and some sets can have a higher or lower P(G) depending on the structures. But I don't know if and how that can be estimated.
Say that I take an indifferent approach and decide P(G) = 50%. Then:
P(nonG|T+) = 60%*50% / [60%*50% + 90%*50%] = 40%
This seems to oppose my colleague's view: if the test is positive, it is more likely that the substance is genotoxic (60%) than it isn't (40%), although quite weakly.
However, say that I have historical data saying that substances randomly chosen from chemical space are 25% likely to be genotoxic, and that the substances we may decide to test as drug candidates are essentially random themselves. Then P(G) = 25% and:
P(nonG|T+) = 60%*75% / [60%*75% + 90%*25%] = 66.7%
This seems to support my colleague's view: when the in vitro assay is positive, it's actually more likely that the substance is non-genotoxic (66.7%) than it is (33.3%).
I am aware that, in terms of probability ratios, I am always doing better using the assay than not using it, because:
P(nonG|T+)/P(nonG) = 40% / 50% = 0.8, in the first case
and
P(nonG|T+)/P(nonG) = 66.7% / 75% = 0.89, in the second case
so I know that choosing which substances to progress and which not to progress based on the positivity of the in vitro assay is always better than random.
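In fact, if I reason in odds rather than probabilities, there is one quantity that does not depend on the prior at all: the likelihood ratio (Bayes factor) of a positive result, TPR/FPR = 90%/60% = 1.5. A positive always multiplies the odds of genotoxicity by 1.5, whatever P(G) is. A quick numerical check of this (a sketch, using the same figures as above):

```python
def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)

def posterior_G(p_G, tpr=0.90, fpr=0.60):
    """P(G | T+) by Bayes' theorem."""
    return (tpr * p_G) / (tpr * p_G + fpr * (1.0 - p_G))

# The posterior-to-prior odds ratio equals TPR/FPR = 1.5 for any prior:
for p_G in (0.10, 0.25, 0.50, 0.90):
    print(odds(posterior_G(p_G)) / odds(p_G))   # 1.5 each time
```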
But people who don't appreciate such statistical subtleties will not be impressed if we tell them to drop one (or a whole series of) otherwise promising substance(s) based on a positive in vitro assay, while at the same time telling them that the probability of this positivity being confirmed in vivo is only 33.3%. They don't care that this is higher than in the 'starting' set (25%): they only see that we are killing their substance based on a laughable 1/3 probability that it's bad. They don't like that, believe me.
And in fact ours is a more or less arbitrary estimate of P(G), which however influences very much the final probability that a positive result in the pre-screen assay is confirmed in the 'golden standard' one.
Question 1: could we elaborate and/or present the data in such a way to make the assessment/decision about individual substances independent of the choice of prior?
The second doubt I have is whether the above story would change for series of structurally closely related substances, which is usually what we test in the in vitro assay.
I mean that we don't really sample randomly from chemical space: we tend to sample relatively small regions of it where molecules are rather similar to one another. And there is a general principle in medicinal chemistry, stating that structurally similar molecules tend to have similar biological activity (of which genotoxicity is one type). This principle has important exceptions (activity cliffs), but on the whole it's not going to be too wrong.
So even when we have information on P(G) for 'random', generic molecules, it may well be that the specific set of molecules a team submits to the in vitro micronucleus assay has an inherently higher (or lower) P(G) than the general 'population'.
Would you take that into account in other contexts, e.g. in a laboratory test for a disease? If you know that a specific ethnic group has a higher incidence of the disease, and a person from this ethnic group tests positive, will you use P(disease) for the general population or P(disease)' for that ethnic group, to calculate P(disease|test positive)?
This also led me to wonder whether getting multiple in vitro positive results for a series of closely related substances should make us change the estimate for P(G).
Say for instance that a team tested in vitro 10 very similar substances, and 8 of them were positive, 2 negative.
The probability that all 8 positives are true positives, based on a P(G) of 25% is (I think - please correct me if I'm wrong):
"P(8 G | 8 T+)" = [P(G|T+)]8 = [33.3%]8 = 0.015%
So there is a lot of hope that at least one of them can be salvaged by de-risking in the in vivo assay (although that will be expensive).
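The arithmetic of that step, as I understand it (assuming the 8 posterior probabilities can be treated as independent, which is itself debatable for similar structures):

```python
p_G_given_pos = 1 / 3                 # P(G | T+) with prior P(G) = 25%

# Probability that all 8 positives are true positives (genotoxic):
p_all_8_true = p_G_given_pos ** 8
print(f"{p_all_8_true:.3%}")          # ~0.015%

# Complement: probability that at least one is a false positive:
print(f"{1 - p_all_8_true:.3%}")
```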
However, isn't the fact that 8 out of 10 substances turned out positive in vitro suspicious, if the hypothesis that P(G) = 25% were true?
If we took 10 random substances from a set with P(G) = 25%, on average 2.5 of them would be genotoxic, 7.5 non-genotoxic. If we then tested all of them in the in vitro assay, on average 2.5*90% + 7.5*60% = 6.75 would turn out positive, 2.5*10% + 7.5*40% = 3.25 would turn out negative.
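That expected split can be sketched as a quick check (same TPR/FPR as above; n = 10, P(G) = 25%):

```python
n, p_G, tpr, fpr = 10, 0.25, 0.90, 0.60

# Marginal probability that a random substance tests positive in vitro:
p_pos = tpr * p_G + fpr * (1 - p_G)   # 0.675

exp_pos = n * p_pos                   # 6.75 expected positives
exp_neg = n * (1 - p_pos)             # 3.25 expected negatives
print(exp_pos, exp_neg)
```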
So shouldn't we review our estimate of P(G) in order for the averages to match our observations?
In this case: P(G) = (8/10 - 60%)/(90%-60%) = 66.7%.
Which would lead to a very different P(G|T+) = 90%*66.7% / [90%*66.7% + 60%*33.3%] = 75%
I am not convinced by this approach, though, because it looks quite circular, and would lead to inconsistencies in case the number of in vitro positives is more than what is expected. E.g. if we had 10 positives out of 10, by the above logic we should conclude P(G) = 40%/30% = 133% (!), because we don't accept that no negatives would be found. But this does happen in reality: there are series of molecules that are ALL positive in vitro...
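One standard patch for the 133% problem (a sketch of the idea, not a recommendation): treat the k positives out of n as a binomial sample with per-substance positive rate p = FPR + P(G)·(TPR − FPR), solve for P(G) by matching the observed rate, and constrain the result to [0, 1]. Since p increases monotonically with P(G), this clipped value is, if I'm not mistaken, also the maximum-likelihood estimate:

```python
def estimate_p_G(k, n, tpr=0.90, fpr=0.60):
    """Moment-matching estimate of P(G) from k in vitro positives
    out of n substances, clipped to the valid range [0, 1]."""
    raw = (k / n - fpr) / (tpr - fpr)
    return min(max(raw, 0.0), 1.0)

print(estimate_p_G(8, 10))    # ~0.667, as in the post
print(estimate_p_G(10, 10))   # 1.0 (raw value 1.33, clipped)
```

This makes the all-positive series interpretable (P(G) estimated at 100%) instead of impossible, though it obviously doesn't resolve the circularity worry.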
I have the impression that P(G) is really something that one must decide a priori and then leave alone; or am I wrong?
I tried reading some other threads and articles (see links below), but most are well above my level, and seem to address much more complex cases than this simple one I'm tackling, so I didn't get far with them.
https://www.physicsforums.com/threa...is-inference-predictions.829722/#post-5216570
https://cran.r-project.org/web/packages/LaplacesDemon/vignettes/BayesianInference.pdf
Question 2: is the observation of multiple in vitro micronucleus positives in closely related substances an indication that the underlying prior probability is higher for these substances than for generic chemical space? And if so, how should it be calculated?
Sorry for the long post - this is complicated stuff (at least, it is for me).
Thanks
L