22 Comments
Adam D. Borecky, MD

I agree that the 50% dropout rate is misleading relative to our typical clinical situation. I’ve also found the titration schedule on the label to be too fast for tolerability, and I suspect that real-world practice includes a lot more cross-tapering and off-label augmentation than was captured by the 52-week OLE. I have been getting some akathisia, but this is confounded by polypharmacy.

Annie

It will be interesting to see, as things move forward, how much the GI side effects become a barrier to its use. While the availability of Cobenfy may be useful for patients with pre-existing obesity, diabetes, high triglycerides, etc., it would be problematic if vomiting causes patients to bring up all their other meds, or if the constipation/diarrhea results in medication toxicity, vitamin deficiencies, etc.

Side effects like nausea, constipation, and diarrhea can be a real barrier to taking a medication. Such symptoms make daily life difficult and can prevent a patient from even being able to leave the house. So Cobenfy may help a proportion of patients towards some form of stability, but a subset of those same patients may be in the miserable position of having to weigh that against never being 5 seconds away from a loo, or being unable to travel in a car due to nausea.

Aussie Med Student

I notice that you call a 1 in 1,000 chance of clinically meaningful adverse cardiovascular effects "highly unusual". That's the traditional pre-vaccine mortality rate for measles, and measles is considered a highly lethal disease and a huge threat to health, even though the adverse event in question carries the same level of likelihood.

When will public health authorities tell parents death is a "highly unusual" outcome if their child has measles??? (The framing of risk is something that interests me.)

The usual strategy is "OMG, your child could die..." If that's a legitimate approach, then "OMG, you could have a serious cardiovascular adverse effect!" is just as legitimate for the risk of clinically meaningful cardiovascular adverse events here.

Nils Wendel, MD

I think there is a sense in which communicating risk exists on two axes: (1) likelihood and (2) seriousness of outcome. I think a 1 in 1000 risk of death can reasonably be messaged differently than a 1 in 1000 risk of reversible cardiovascular adverse events. How to do that is an interesting question and something we are not terribly consistent about, but I'm not sure that your example makes the point you want it to.

Benjamin Classen

Great recap! I’m wondering about the 2 deaths that were supposedly treatment-emergent; any thoughts on that?

Nils Wendel, MD

I think you misread the paper? It says: "both deaths were assessed by investigators as unrelated to KarXT treatment."

Benjamin Classen

Ah okay, I indeed misread the paper.

Michael Halassa

Nils, I think your perspective is reasonable but I'm not sure you're engaging with the heterogeneity. We've gone over this before so I won't rehash it but I think it is a serious issue. The average patient does not exist and these are all average data...

Nils Wendel, MD

Could you be more specific about what heterogeneity you’re referring to? Is it something about this paper in particular, or about Cobenfy in general?

Michael Halassa

Sure. It's neither about this paper in particular nor about XT specifically; it's about the framing more generally. This is how I interpret RCT data and how I would think about drawing conclusions regarding a drug like XT (or any drug for any patient, for that matter): https://michaelhalassa.substack.com/p/what-does-evidence-based-mean-for

Nils Wendel, MD

Ok, I read the article and I think I mostly agree with the point it's making. I'm not sure I understand why you brought this up here though. Is this just a general critique of my writing? Are you saying that I don't engage with study heterogeneity enough?

Michael Halassa

Not at all. I’m just saying that RCTs don’t engage with heterogeneity. I think your writing is thoughtful. I’m just saying that reporting on the RCT without the larger context we deal with (and I know you think about this hard) can give the impression that they dictate treatment despite inter-individual variability. Again, I get this is not your goal so perhaps I’m overstepping and if I did, I apologize.

Nils Wendel, MD

Oh no worries, didn't mean for my tone to come off as upset, I was just genuinely confused as to what you were getting at!

I take your point. I'm just trying to be a bit more succinct with some of the more factual stuff I'm doing (like this particular essay) so I can spend more time being long-winded in some of the more philosophical and psychological pieces I'm working on.

Michael Halassa

Understood. This XT thing is particularly on my mind, because I think you are right to point out that for positive symptoms, it seems to work more like an Abilify than a Zyprexa. But for the negative/cognitive, it’s much better than either (for some people). To demonstrate that through an RCT would be a huge undertaking, but regardless, clinical experience at least indicates it.

What’s unfathomable to me is that I can have someone on the inpatient unit demonstrably improve on XT (nursing notes would highlight brightening, improved clarity, and more interactions), only to come back and find that a covering psychiatrist switched them to a traditional antipsychotic without any explanation. The same is true of outpatient providers; I’ve had people decompensate and come back inpatient because of switching without rationale. Anyway, I recognize this may sound like a bit of a tangent, but I wanted to give more context on where some of my original comments were coming from.

Benjamin Classen

The average treatment effect from an RCT may very well be the best estimate EVEN IF the treatment effect is heterogeneous, depending on the amount of heterogeneity and the precision with which we can stratify.

Furthermore, I don’t see the empirical evidence that psychiatric drug treatment effects are heterogeneous to any appreciable degree.

And the main bottleneck is this: how many subgroups do you want to define? Say we devise 3 dimensions of stratification with two subgroups along each dimension (e.g., biomarker X low vs. high); that yields 2³ = 8 groups if you want to avoid lumping patient groups together, as the spirit of precision medicine dictates. Would you then run 8 RCTs? Who is going to run these trials?

If treatment effects of psychiatric drugs were truly heterogeneous, perhaps precision medicine's search for that heterogeneity would have borne more fruit over the last 70 years of using essentially identical drugs.

Michael Halassa

Benjamin, this response is very reasonable and I don’t disagree with the spirit of what you’re saying. The bottleneck is exactly right, and it is a formidable challenge indeed. The reason I’m enthusiastic about precision medicine in psychosis is exactly because of your last point: how do you expect any precision if all you have is variation on the same D2 blocker? But now we have muscarinic agents (XT is just the first), with other mechanisms in the pipeline. You’re right about the practical impediment to realizing this, though, and that’s why clinical intuition can be important and shouldn’t be automatically dismissed as noise.

Benjamin Classen

The value of clinical intuition is clear from the fact that all major psychiatric medications, xanomeline included, were discovered through clinical observation. That, to me, is clear. But that is about finding novel medications, and it's totally sensible. What I am skeptical of is searching for interactions between medication and patient covariates (which could not even be interpreted as causal, since covariates usually cannot be randomly assigned). This search would need to be guided by a strong understanding of the involved pathophysiology, i.e., it is not really feasible in psychiatry currently.

Michael Halassa

Thanks Benjamin. I agree with your point about the role of clinical observation in discovering new treatments. That has clearly been central in all of medicine, including psychiatry.

I’m not sure the distinction between drug discovery and moderator identification is as clear though. The same kind of observation that generates the hypothesis that a drug works can also generate the hypothesis that it works differently across patients. The causal interpretation is more challenging as you point out and your comment about 8 RCTs is great in that context. I think the practical solution will almost certainly be an optimization that stratifies up to a point compatible with randomized testing.

Now as far as the pathophysiology is concerned, I agree with you. Mechanistic understanding should guide how we think about response heterogeneity. In this case there are at least some biologically grounded priors: muscarinic receptor expression patterns, how ACh works, how people tend to reason by relying on semantic vs. working memory, etc., that make differential response plausible based on inter-individual variability in all these factors.

Where I would see things differently is in treating mechanism and stratification as strictly sequential. In many areas of medicine, predictive response patterns are identified empirically first and only later mapped onto underlying biology. The process tends to be iterative: early signals guide stratification, which in turn helps refine mechanistic understanding.

Again, I think your points are very reasonable; I just see some of the components differently.

Benjamin Classen

The average patient does not exist, but the best estimate of treatment efficacy for any given individual is still the group estimate from an RCT. To claim the contrary, please conduct another RCT in a predefined subgroup and show that its treatment effect is higher. Spurious (machine-learning-based) post-hoc subgroup analyses predicting response are a factory for false positives. Additionally, impressions of varying treatment effects based on clinical experience cannot disentangle varying treatment effects from varying covariate effects.

Michael Halassa

Thank you for your engagement.

The average treatment effect from an RCT is the best estimate for an individual only under the assumption of treatment effect homogeneity. When treatment effects are heterogeneous, which is common in psychiatry, the group mean is simply the average of many different individual responses rather than the optimal prediction for a given patient. That is precisely why the field of heterogeneous treatment effects and precision medicine exists.

In practice, observational data, biomarkers, and predictive modeling are often used to identify candidate response signatures before testing them prospectively in stratified trials. Those exploratory steps are usually what make the subgroup RCT possible in the first place.

Regarding machine learning analyses, the key issue is whether they are properly cross-validated and prospectively framed. Poorly controlled analyses can certainly generate false positives, but well-validated predictive models are widely used in medicine to generate hypotheses about response heterogeneity.

Real-world data analyses play an important role. Methods such as propensity matching and covariate adjustment are designed to separate covariate effects from treatment effects in observational cohorts. They are imperfect, as all non-randomized analyses are, but they are an important part of how precision medicine develops.

If treatment effects were truly homogeneous, precision medicine would not exist as a scientific field.