Quantifying clinical relevance in treatments for psychiatric disorders Review article| Volume 33, ISSUE 12, PB49-B61, December 01, 2011

Solving the Antidepressant Efficacy Question: Effect Sizes in Major Depressive Disorder

      Abstract

      Background

      Numerous reviews and meta-analyses of the antidepressant literature in major depressive disorder (MDD), covering both acute and maintenance treatment, have been published. Some claim that antidepressants are mostly ineffective and others that they are mostly effective, in either acute or maintenance treatment.

      Objective

      The aims of this study were to review and critique the latest and most notable antidepressant MDD studies and to conduct our own reanalysis of the US Food and Drug Administration database studies specifically analyzed by Kirsch et al.

      Methods

      We gathered effect estimates of each MDD study. In our reanalysis of the acute depression studies, we corrected analyses for a statistical floor effect so that relative (instead of absolute) effect size differences were calculated. We also critiqued a recent meta-analysis of the maintenance treatment literature.

      Results

      Our reanalysis showed that antidepressant benefit is seen not only in severe depression but also in moderate depression and confirmed a lack of benefit for antidepressants over placebo in mild depression. Relative antidepressant versus placebo benefit increased linearly from 5% in mild depression to 12% in moderate depression to 16% in severe depression. The claim that antidepressants are completely ineffective, or even harmful, in maintenance treatment studies overlooks the enriched design effect, which, in that analysis, was used to analyze placebo efficacy. The same problem exists for the standard interpretation of those studies, although they do not prove antidepressant efficacy either, since they are biased in favor of antidepressants.

      Conclusions

      In sum, we conclude that antidepressants are effective in acute depressive episodes that are moderate to severe but are not effective in mild depression. Except for the mildest depressive episodes, correction for the statistical floor effect proves that antidepressants are effective acutely. These considerations only apply to acute depression, however. For maintenance, the long-term efficacy of antidepressants is unproven, but the data do not support the conclusion that they are harmful.

      Introduction

      Much controversy has surrounded recent meta-analyses and randomized clinical trials (RCTs) of antidepressant efficacy in major depressive disorder (MDD), including in the nonscientific media. In this review, we use the concept of effect sizes to make clinical and scientific sense of what has become a cultural debate.
      Examined here are the most prominent RCTs or meta-analyses of RCTs published in the last 5 years for both acute and maintenance efficacy of antidepressants in MDD. A summary of the review of these studies is provided in Table I.
      Table I. Summary of analysis of reviews of antidepressant efficacy in RCTs of MDD.

      Study | N | Trials Reviewed | Effect Sizes (95% CI) | Comments
      Rush et al (Rush A.J., Trivedi M.H., Wisniewski S.R., et al. Acute and longer-term outcomes in depressed outpatients requiring one or several treatment steps: a STAR*D report); STAR*D RCT | 3671 | 1 | 67% acute remission, 26% maintenance remission | No pbo group. Good acute efficacy is shown, but maintenance efficacy is about one half less than acute efficacy.
      Kocsis et al (Kocsis J.H., Thase M.E., Trivedi M.H., et al. Prevention of recurrent episodes of depression with venlafaxine ER in a 1-year maintenance phase from the PREVENT Study) and Kornstein et al (Kornstein S.G., Kocsis J.H., Ahmed S., et al. Assessing the efficacy of 2 years of maintenance treatment with venlafaxine extended release 75-225 mg/day in patients with recurrent major depression: a secondary analysis of data from the PREVENT study); Maintenance RCT of venlafaxine vs pbo | First maintenance study (year 0) n = 1096; second maintenance study (year 1) n = 114 | 2 | 92% 2-year efficacy reported; this reflects 11% of original sample | “Super-enrichment” design. Second maintenance study sample was only ∼10% of the initial sample.
      Turner et al (Turner E.H., Matthews A.M., Linardatos E., et al. Selective publication of antidepressant trials and its influence on apparent efficacy); MA of FDA database of RCTs | 12,564 | 74 | 0.37 (0.33, 0.41) for published studies vs 0.15 (0.08, 0.22) for unpublished studies. ES of 0.31 (0.27, 0.35) when all studies are combined. | 31% of studies were unpublished, accounting for 27.5% of the sample.
      Kirsch et al (Kirsch I., Deacon B.J., Huedo-Medina T.B., et al. Initial severity and antidepressant benefits: a meta-analysis of data submitted to the Food and Drug Administration); MA of FDA database | 5133 | 35 | Overall standardized ES was 0.61. Absolute ES HDRS of 9.6 drug and 7.8 pbo. | NICE criterion for clinical significance was absolute ES of 3 HDRS points or standardized ES of d = 0.5 for AD-pbo difference. Overall nonstandardized effect size of 0.32 increases to 0.40 when corrected for baseline severity (authors do not discuss).
      Horder et al (Horder J., Matthews P., Waldmann R. Placebo, Prozac and PLoS: significant lessons for psychopharmacology); Reanalysis of Kirsch et al | 5133 | 35 | Absolute HDRS difference between AD and pbo = 2.70 (including negative unpublished studies) | Reanalysis was based on (1) random effects rather than fixed effects model as in Kirsch et al and (2) pooling ES differences study by study rather than summing all studies and then ES difference. These changes produce a much larger ES near the NICE threshold.
      Davis et al (Davis J.M., Giakas W.J., Qu J., et al. Should we treat depression with drugs or psychological interventions? A reply to Ioannidis); Narrative summary of MAs and RCTs | Not reported | Not reported | Mean acute difference between AD and pbo = 23.6%. Mean maintenance difference between AD and pbo = 36%. | Uncritical about bias toward ADs in maintenance studies using the enriched design.
      Fountoulakis and Möller (Fountoulakis K.N., Möller H.J. Antidepressant drugs and the response in the placebo group: the real problem lies in our understanding of the issue); Reanalysis of Kirsch et al | 5133 | 35 | Mean AD ES was 10.05, not 9.60, as in Kirsch et al. AD-pbo difference was 2.18, not 1.80 as in Kirsch et al. Venlafaxine and paroxetine absolute HDRS ES were 3.12 and 3.22, respectively, exceeding NICE threshold. Nefazodone and fluoxetine did not. | Reanalysis was based on weighting the mean difference by sample size.
      Andrews et al (Andrews P.W., Kornstein S.G., Halberstadt L.J., et al. Blue again: perturbational effects of antidepressants suggest monoaminergic homeostasis in major depression); MA of maintenance RCTs | 3454 | 46 | Risk difference AD-pbo for relapse = 0.20, meaning 20% increased rate of relapse with AD than with pbo. | MA used an “enriched” design in favor of the pbo arm.
      Briscoe and El-Mallakh (Briscoe B.E., El-Mallakh R.S. The evidence for the long-term use of antidepressants as prophylaxis against future depressive episodes); Reanalysis of maintenance RCTs | 44955 RCTs examined for AD efficacy after 6 mo. Four of 5 studies showed no benefit with AD over pbo. | Only analysis to correct for enriched design, which is biased in favor of ADs. Removes relapses due to AD withdrawal.
      Vöhringer and Ghaemi (present study); Reanalysis of Kirsch et al, MA to correct for statistical floor effect | 5133 | 35 | Relative effect size for mild depression was 5% (HDRS < 24), 12% for moderate (24 < HDRS < 28), and 16% for severe depression (HDRS > 28). | NICE criterion is met by 11.5% relative difference between AD and pbo. This analysis disproves the claim by Kirsch et al that only severe depression has clinically meaningful ES. Moderate depression also met NICE criterion.

      AD = antidepressant; CI = confidence interval; ES = effect size; FDA = US Food and Drug Administration; HDRS = Hamilton Depression Rating Scale; MA = meta-analysis; MDD = major depressive disorder; NICE = National Institute for Health and Clinical Excellence (UK); pbo = placebo; RCT = randomized clinical trial.
      In acute depression RCTs, some reviews involve reanalysis of the US Food and Drug Administration (FDA) database of RCTs conducted by pharmaceutical companies. The major non-pharmaceutical-industry study is the National Institute of Mental Health (NIMH)–sponsored Sequenced Treatment Alternatives to Relieve Depression (STAR*D) project (Rush A.J., Trivedi M.H., Wisniewski S.R., et al. Acute and longer-term outcomes in depressed outpatients requiring one or several treatment steps: a STAR*D report).
      The pharmaceutical trials have been analyzed and reanalyzed by different authors, with the most media attention being given to the analysis by Kirsch et al (Kirsch I., Deacon B.J., Huedo-Medina T.B., et al. Initial severity and antidepressant benefits: a meta-analysis of data submitted to the Food and Drug Administration). Other published analyses are also important (Ioannidis J.P. Effectiveness of antidepressants: an evidence myth constructed from a thousand randomized trials?).
      Maintenance RCTs for prevention of depressive episodes have been analyzed in the Cochrane database (Geddes J.R., Freemantle N., Mason J., et al. SSRIs versus other antidepressants for depressive disorder); most of these studies were conducted by pharmaceutical companies. The most prominent and highly marketed and cited recent study of the topic was a 2-year RCT of the antidepressant venlafaxine (Kornstein S.G., Kocsis J.H., Ahmed S., et al. Assessing the efficacy of 2 years of maintenance treatment with venlafaxine extended release 75-225 mg/day in patients with recurrent major depression: a secondary analysis of data from the PREVENT study).
      A recent reanalysis of the maintenance RCT studies has also examined the impact of antidepressant discontinuation, concluding that antidepressant use may cause long-term biological harm (Andrews P.W., Kornstein S.G., Halberstadt L.J., et al. Blue again: perturbational effects of antidepressants suggest monoaminergic homeostasis in major depression). The STAR*D study also provides data for analysis regarding maintenance prevention of depressive episodes in MDD (Rush et al, STAR*D report).

      Patients and Methods

      We analyzed recent prominent RCTs and meta-analyses that addressed antidepressant efficacy in MDD. We examined how assessment of effect sizes could clarify the controversies surrounding acute and maintenance efficacy of antidepressants in MDD. Effect estimates given by these studies are reported, along with their 95% CIs when available.

      Results

      Eleven prominent RCTs or meta-analyses of RCTs (2006–2011) are summarized in Table I. Each study is broken down in terms of the main aspects of its study design, clinical characteristics, and outcomes. Later those results are described in more detail in 2 sections (acute and maintenance studies) and are interpreted using effect size concepts. In Table II, we report our reanalysis of the results of a prominent meta-analysis (Kirsch et al) to correct for a statistical floor effect in mild depression. In doing so, we discovered that the claim that antidepressants are effective only in severe depression, not in moderate or mild depression, is wrong. They are also effective in moderate depression, as explained later.
      Table II. Relative effect size difference (drug/placebo) by depression severity in Kirsch et al's meta-analysis (n trials = 35).

      Depression Severity (% studies in database)* | Drug: Mean Baseline HDRS Score | Drug: Mean Final Change in HDRS Score | Drug: Relative Effect Size Measure (%)† | Placebo: Mean Baseline HDRS Score | Placebo: Mean Final Change in HDRS Score | Placebo: Relative Effect Size Measure (%)† | Relative Effect Size Difference (%) (Drug-Placebo)
      Mild (23%) | 22.6 | 8.8 | 39 | 23.2 | 8 | 34 | 5
      Moderate (54%) | 25.6 | 10.5 | 41 | 25.4 | 7.4 | 29 | 12
      Severe (23%) | 28.75 | 12.0 | 42 | 28.2 | 7.2 | 26 | 16

      HDRS = Hamilton Depression Rating Scale.
      * Mild = at least 1 arm (drug or placebo) is rated <24 on HDRS; moderate = at least 1 arm is rated >24 and <28 on HDRS; severe = at least 1 arm (drug or placebo) is rated >28 on HDRS.
      † Relative effect size = absolute mean HDRS change/mean baseline HDRS score.
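
      The relative effect sizes in Table II can be checked with a few lines of arithmetic. The sketch below uses the baseline and change values from the table and the footnoted definition (mean HDRS change divided by mean baseline HDRS score). The function names are ours, and subtracting the already-rounded percentages is an assumption made so that the output matches the published column; subtracting unrounded values could differ by a point.

```python
# Values transcribed from Table II: (mean baseline HDRS, mean change in HDRS) per arm.
TABLE_II = {
    "mild":     {"drug": (22.6, 8.8),   "placebo": (23.2, 8.0)},
    "moderate": {"drug": (25.6, 10.5),  "placebo": (25.4, 7.4)},
    "severe":   {"drug": (28.75, 12.0), "placebo": (28.2, 7.2)},
}


def relative_effect(baseline, change):
    """Relative effect size (%) = mean HDRS change / mean baseline HDRS score."""
    return 100.0 * change / baseline


def relative_difference(row):
    """Return (drug %, placebo %, drug - placebo), each rounded to a whole percent.

    Assumption: the difference is taken between the rounded percentages.
    """
    drug = round(relative_effect(*row["drug"]))
    placebo = round(relative_effect(*row["placebo"]))
    return drug, placebo, drug - placebo


for severity, row in TABLE_II.items():
    drug, placebo, diff = relative_difference(row)
    print(f"{severity:9s} drug {drug}%  placebo {placebo}%  difference {diff}%")
```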

      Discussion

      Acute Depression

      Analyses of the FDA Database

      The pharmaceutical industry is obligated to submit all data, positive or negative, regarding studies of drugs that receive FDA approval. Through the Freedom of Information Act, scholars have begun to get access to these FDA records. Previous systematic reviews of such studies of antidepressants in MDD have shown that many studies with negative results have gone unpublished. Turner et al showed that approximately 94% of the published literature on antidepressants in MDD demonstrates efficacy (positive studies), but when the unpublished FDA database is included, only 51% of all such studies (published and unpublished) show positive results. The standardized effect size fell from about 0.37 to 0.31 after including the negative unpublished studies, both effects being in the mild range.
      (Turner E.H., Matthews A.M., Linardatos E., et al. Selective publication of antidepressant trials and its influence on apparent efficacy.)
      The same year as the above analysis, another was published by Kirsch et al (Initial severity and antidepressant benefits: a meta-analysis of data submitted to the Food and Drug Administration) with a smaller sample of the FDA database (less than half the size of the analysis by Turner's group). It confirmed an unstandardized effect size of 0.32, similar to that for the previous analyses by Turner et al. The key difference was that Kirsch et al's meta-analysis focused on a clinical significance criterion set in the United Kingdom by the National Institute for Clinical Excellence (NICE): a 3-point difference on the Hamilton Depression Rating Scale (HDRS) or a 0.5 standardized effect size difference. As shown in Table I, the results of this reanalysis fell short of those effect size cutoffs, except for severe depression.
      In follow-up popularizations, the first author of that meta-analysis (Kirsch et al) interpreted his analysis as indicating that, in general, antidepressants do not have clinically meaningful effects in MDD. In the scientific paper, the authors were more circumspect although still critical; they attributed antidepressant benefit to only “the most extremely depressed patients” (Kirsch et al), although a HDRS cutoff of 28 is not, in clinical practice, descriptive of the most extremely depressed patients. Many such patients have HDRS scores in the 30s or higher. In this meta-analysis, the drug-placebo difference varied based on severity of illness, approximating 0 at a HDRS of 24, and reaching about 3 points at a HDRS of 28. The authors note that this effect was due to changes in the response to placebo, which fell with increasing severity, rather than the response to antidepressant, which was consistent. Although they noted this finding, the authors never grappled with its meaning. It would seem that mild depression is highly responsive to placebo but severe depression is not. The authors appear to conclude that antidepressants are not more effective in severe depression, but in fact they are. The loss of placebo “response” may not be the loss of a response to anything at all; placebo response reflects, in part if not in whole, the natural history of depressive episodes. Severe depression does not go away rapidly; if it is not treated, it remains. Antidepressants treat it and are effective. The authors do not see this because they have ignored the importance of the natural history of depressive episodes in assessing treatment effects.

      Reanalysis of the FDA Meta-Analyses: Correction for a Floor Effect Disproves Claims of Antidepressant Inefficacy

      A key statistical issue in comparisons of mild versus more severe depression, when using absolute effect sizes, is a floor effect. With a lower baseline HDRS score, the same drug-placebo effect (eg, 50% decrease in scores) produces smaller absolute differences (eg, 20 to 10 HDRS points—a 10-point difference) compared with a higher baseline HDRS score (30 to 15 HDRS points—a 15-point difference). In this meta-analysis, the drug-placebo difference, when adjusted for baseline severity of illness, increased in nonstandardized effect size from 0.32 to 0.40. In other words, some of the apparent lack of benefit of antidepressants in milder depression may be an artifact of this floor effect. Kirsch et al reported this result in a table but did not comment on it.
      Another way to address this problem is to report the relative (not absolute) drug-placebo difference, dividing absolute change by baseline severity of depression. This was not reported in Kirsch et al's analysis. For the first time, we provide such an analysis in this article.
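
      The floor effect described above can be made concrete with the two examples from the text (a 50% decrease from baseline scores of 20 and 30). The following is a minimal sketch; the function name is ours and the numbers are the illustrative ones used above, not trial data.

```python
def absolute_and_relative_change(baseline, final):
    """Return the absolute HDRS change and the same change as a fraction of baseline."""
    absolute = baseline - final
    relative = absolute / baseline
    return absolute, relative


# Same 50% relative improvement, different absolute point differences.
for baseline, final in [(20, 10), (30, 15)]:
    absolute, relative = absolute_and_relative_change(baseline, final)
    print(f"baseline {baseline}: absolute change {absolute} points, relative change {relative:.0%}")
```

      Because the relative change is identical in both cases, a comparison on the absolute scale alone would understate the effect at the lower baseline.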
      Table II shows the percentage differences in drug effect, with the absolute change in the drug group divided by the baseline HDRS score. Using this relative effect measure, antidepressants were somewhat less effective in milder depression (HDRS with baseline scores at ≤24) than in severe depression (baseline HDRS ≥28); the relative antidepressant versus placebo benefit increased linearly from 5% in mild depression to 12% in moderate depression to 16% in severe depression. The studies used in the meta-analysis (Kirsch et al) had a weighted mean baseline HDRS of 25.5 (Horder J., Matthews P., Waldmann R. Placebo, Prozac and PLoS: significant lessons for psychopharmacology). Using that baseline and the absolute improvement rates near those reported in the study (9.6 for drug, 7.8 for placebo) but widened to meet the NICE criterion of ≥3 points difference (ie, >10 for drug vs <7 for placebo), we can calculate that the NICE criterion would have been met with relative drug improvement of 39.2% (10/25.5) versus relative placebo improvement of 27.5% (7/25.5), for a drug-placebo relative difference of 11.7%. With this definition of the NICE criterion, antidepressants still do not meet that definition in mild depression (HDRS < 24), but they do meet it for both moderate (HDRS 24–28) and severe (HDRS >28) depression.
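
      The conversion of the NICE criterion to a relative threshold can be reproduced as follows. This is a sketch using only the figures quoted above (weighted mean baseline of 25.5, improvements of 10 and 7 points) and the drug-placebo differences from Table II; the variable names are ours, and rounding each percentage to one decimal before subtracting is an assumption chosen to match the 11.7% reported in the text.

```python
WEIGHTED_MEAN_BASELINE = 25.5  # weighted mean baseline HDRS quoted in the text
DRUG_CHANGE = 10.0             # drug improvement widened to meet the 3-point criterion
PLACEBO_CHANGE = 7.0           # placebo improvement widened to meet the 3-point criterion


def percent_of_baseline(change, baseline=WEIGHTED_MEAN_BASELINE):
    """Improvement as a percentage of baseline, rounded to one decimal place."""
    return round(100.0 * change / baseline, 1)


drug_relative = percent_of_baseline(DRUG_CHANGE)
placebo_relative = percent_of_baseline(PLACEBO_CHANGE)
relative_threshold = round(drug_relative - placebo_relative, 1)

# Drug-placebo relative differences (%) from Table II.
TABLE_II_DIFFERENCES = {"mild": 5, "moderate": 12, "severe": 16}
meets_criterion = {
    severity: difference >= relative_threshold
    for severity, difference in TABLE_II_DIFFERENCES.items()
}

print(drug_relative, placebo_relative, relative_threshold)
print(meets_criterion)
```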
      In this reanalysis, we used the same severity cutoffs as used by the authors of the meta-analysis: HDRS scores <24, 24 to 28, and >28. We labeled these 3 groups as mild, moderate, and severe, respectively. Despite analyzing their data in these 3 groupings, the authors of the meta-analysis claimed they had used the American Psychiatric Association's criteria for severity of symptoms (based on HDRS scores): mild (HDRS = 8–13), moderate (HDRS = 14–18), severe (HDRS = 19–22), and very severe (HDRS >22).
      In so doing, they ignored the fact that symptoms differ from episodes: the typical major depressive episode (MDE) produces HDRS scores of at least 18. Thus, by using symptom criteria, all MDEs are by definition severe or very severe. Clinicians know that some patients meet MDE criteria and are still able to work; indeed those around them frequently do not even recognize that such a person is clinically depressed. Other patients are so severely depressed that they function poorly at work, and their companions recognize that something is wrong. Some clinically depressed patients cannot work at all, and still others cannot get out of bed for weeks or months on end. Clearly, there are gradations of severity within MDEs, and the entire debate in the meta-analysis being discussed here is about MDEs, not depressive symptoms, since all patients had to meet MDE criteria in all the studies included in the meta-analysis (conducted by pharmaceutical companies for FDA approval for treatment of MDEs).
      The question, therefore, is not about severity of depressive symptoms but the severity of depressive episodes, assuming that someone meets Diagnostic and Statistical Manual of Mental Disorders (Fourth Edition) (DSM-IV) criteria for a MDE. On that question, a number of prior studies have examined the matter with the HDRS and with other depression rating scales, and the 3 groupings shown in Table II correspond rather closely to validated and replicated definitions of mild (HDRS <24), moderate (HDRS 24–28), and severe (HDRS >28) MDEs.
      (Schmitt A.B., Bauer M., Volz H.P., et al. Differential effects of venlafaxine in the treatment of major depressive disorder according to baseline severity; Montgomery S.A., Lecrubier Y. Is severe depression a separate indication? ECNP Consensus Meeting September 20, 1996, Amsterdam. European College of Neuropsychopharmacology; Feinberg M., Carroll B.J., Smouse P.E., Rawson S.G. The Carroll rating scale for depression III. Comparison with other rating instruments.)
      In other words, if one corrects for the statistical floor effect (which was also shown in the data reported by the authors, Kirsch et al, in a regression model correcting for baseline severity of illness), then the claim that antidepressants are effective only in the most extreme depressive conditions is disproven. Antidepressants are effective in moderate as well as severe depression.
      In sum, one can revise the conclusions of Kirsch et al after considering the analysis presented here. Instead of antidepressants being generally ineffective except in “the most severely depressed patients” (Kirsch et al), the reality is that antidepressants are generally effective except in the mildest depressive episodes.

      Other Reanalyses of the FDA Meta-Analyses: Correction of Pooling Methods Increases Effect Size to Clinical Significance

      Horder et al (Horder J., Matthews P., Waldmann R. Placebo, Prozac and PLoS: significant lessons for psychopharmacology) also reanalyzed the dataset in the above meta-analysis (Kirsch et al) and noted 2 errors in calculation of pooled effect size differences. In the original meta-analysis, the authors pooled all the antidepressant effect sizes (drug effect pre- and posttreatment), and then they pooled all the placebo effect sizes (pre- and posttreatment). They then subtracted these 2 pooled effect sizes. This is statistically incorrect. Pooled differences should be assessed within each study to maximally incorporate the benefits of randomization within each study. Thus, for a first study, the difference between drug and placebo should be calculated; for a second study, the same difference should be calculated, and so on. The pooled effect size for the meta-analysis should be the sum of each effect size difference between drug and placebo for each study, divided by the number of studies. Horder et al corrected the calculation using this approach to pooling effect size differences. They also used the absolute effect size difference on the HDRS, since all the studies used the same scales. They correctly noted that there is no need to use a standardized effect size measure (eg, Cohen's d) when all studies use the same outcome (HDRS); standardized effect sizes are used in an attempt to equalize different outcomes (eg, HDRS compared with different depression rating scales). By standardizing, the mathematical manipulations introduced may alter one's results somewhat, making them both less interpretable and less valid.
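
      The difference between the two pooling approaches described above can be illustrated with a small sketch. The two trials below are hypothetical (they are not taken from the FDA dataset), and weighting each pooled arm by its arm size is our assumption about how arm-level pooling might be done; the within-study version follows the description in the text (sum of per-study differences divided by the number of studies).

```python
# Hypothetical trials: mean HDRS improvement and sample size per arm.
TRIALS = [
    {"drug_change": 12.0, "drug_n": 200, "placebo_change": 9.0, "placebo_n": 50},
    {"drug_change": 8.0, "drug_n": 50, "placebo_change": 7.0, "placebo_n": 200},
]


def pooled_arms_difference(trials):
    """Pool each arm across trials (weighted by arm size), then subtract."""
    drug_n = sum(t["drug_n"] for t in trials)
    placebo_n = sum(t["placebo_n"] for t in trials)
    drug = sum(t["drug_change"] * t["drug_n"] for t in trials) / drug_n
    placebo = sum(t["placebo_change"] * t["placebo_n"] for t in trials) / placebo_n
    return drug - placebo


def within_study_difference(trials):
    """Take the drug-placebo difference inside each trial, then average them."""
    differences = [t["drug_change"] - t["placebo_change"] for t in trials]
    return sum(differences) / len(differences)


print("pooled arms, then subtract:", round(pooled_arms_difference(TRIALS), 2))
print("subtract within study, then pool:", round(within_study_difference(TRIALS), 2))
```

      In this made-up example the arm-level pooling mixes between-trial differences (such as baseline severity and allocation ratio) into the drug-placebo contrast; the direction and size of that distortion depend entirely on the data.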
      Finally, Horder et al used the most valid measure of meta-analytic effect, the random effects model, as opposed to fixed effects, as in the original review (Kirsch et al). Fixed effects models assume that all studies have similar variabilities; when one is comparing studies of different drugs in different patient populations that vary in severity of illness, which the authors showed was an important predictor of response, then the fixed effects assumption is not valid. The random effects assumption includes the idea that studies differ from one another in important respects. Fixed effects models only correct for sample size and assume no other kinds of error, whereas random effects models introduce a second correction for presumed error.
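
      To make the contrast between the two models concrete, the sketch below compares an inverse-variance fixed effects estimate with a random effects estimate. The DerSimonian-Laird estimator is used here as one common choice; the text does not state which estimator Horder et al applied, so that choice, the function names, and the three hypothetical studies are all assumptions for illustration.

```python
def fixed_effect(effects, variances):
    """Inverse-variance weighted mean: weights reflect sampling error only."""
    weights = [1.0 / v for v in variances]
    return sum(w * y for w, y in zip(weights, effects)) / sum(weights)


def random_effects(effects, variances):
    """DerSimonian-Laird estimate; returns (pooled effect, between-study variance)."""
    weights = [1.0 / v for v in variances]
    pooled_fixed = fixed_effect(effects, variances)
    # Cochran's Q measures how much studies disagree beyond sampling error.
    q = sum(w * (y - pooled_fixed) ** 2 for w, y in zip(weights, effects))
    c = sum(weights) - sum(w * w for w in weights) / sum(weights)
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)
    # The second correction: between-study variance is added to each study's variance.
    adjusted = [1.0 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(adjusted, effects)) / sum(adjusted)
    return pooled, tau2


EFFECTS = [1.0, 2.5, 4.0]     # hypothetical drug-placebo HDRS differences
VARIANCES = [0.25, 1.0, 1.0]  # hypothetical sampling variances

fixed = fixed_effect(EFFECTS, VARIANCES)
pooled_random, tau2 = random_effects(EFFECTS, VARIANCES)
print("fixed effects estimate:", round(fixed, 2))
print("random effects estimate:", round(pooled_random, 2), "tau^2:", round(tau2, 2))
```

      With heterogeneous studies, the random effects weights become more equal across studies, so a single large, precise study dominates the pooled estimate less than it does under fixed effects.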
      When making these 3 corrections—(1) pooling drug-placebo differences study by study, (2) using the absolute HDRS effect size difference only, and (3) using a random effects model for the meta-analytic summary—Horder et al found a much higher effect size (HDRS difference of 2.70, quite near the NICE cutoff of 3), as opposed to the clearly low HDRS difference effect size of 1.80 in the original meta-analysis.

      Other Reanalyses

      Fountoulakis and Möller (Fountoulakis K.N., Möller H.J. Antidepressant drugs and the response in the placebo group: the real problem lies in our understanding of the issue) also reanalyzed Kirsch et al's meta-analysis. They made one correction, a weighting of the mean difference in each study for sample size. In so doing, they found a slightly larger effect size of 2.18 but not one large enough to meet the NICE criterion. They also reported that when examined by drug type, venlafaxine and paroxetine met the NICE criterion of 3-point improvement, but nefazodone and fluoxetine did not. We would add that the nefazodone studies all involved mild depression (no baseline HDRS >25), and thus lack of benefit may reflect mildness of depression per se (when natural history leads to rapid recovery, as discussed below), rather than inefficacy of the drug itself.
      Other discussions of these meta-analyses have included a commentary by Ioannidis (Ioannidis J.P. Effectiveness of antidepressants: an evidence myth constructed from a thousand randomized trials?), who, like the group critical of antidepressants (Kirsch et al), concluded that these agents are largely ineffective. Ioannidis added a quantitative simulated analysis of a situation in which, if one assumes that the true effect size is small (eg, 0.20), then, with moderate or larger variability (due to small sample sizes), reported effect sizes would always be larger than the real effect size of 0.20. In other words, most effect sizes are probably inflated estimates of the real effect sizes, especially if studies are not large. Thus, if the debate is over whether 2.70 is close enough to the NICE threshold of 3 points, compared with 1.80, Ioannidis suggested that we should adjust conceptually for somewhat lower effect sizes than these exact numbers.
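
      The inflation argument can be sketched with a short simulation. This is not Ioannidis's own analysis: the trial size, the number of simulated trials, the normal approximation for the standard error, and in particular the assumption that only statistically significant positive results are "reported" are all ours, introduced to show one mechanism by which reported effects exceed a true effect of 0.20.

```python
import math
import random

TRUE_EFFECT = 0.20  # true standardized effect size assumed in the text
N_PER_ARM = 30      # hypothetical small trial
N_TRIALS = 5000     # hypothetical number of simulated trials
Z_CRITICAL = 1.96   # two-sided 5% significance threshold


def simulate_reported_effects(seed=0):
    """Return (all observed effects, effects reaching positive significance)."""
    rng = random.Random(seed)
    # Approximate standard error of a standardized mean difference for a small true effect.
    standard_error = math.sqrt(2.0 / N_PER_ARM)
    observed = [rng.gauss(TRUE_EFFECT, standard_error) for _ in range(N_TRIALS)]
    # Assumption: only significant results in favor of the drug get reported.
    significant = [d for d in observed if d / standard_error > Z_CRITICAL]
    return observed, significant


observed, significant = simulate_reported_effects()
print("mean of all simulated effects:", round(sum(observed) / len(observed), 2))
print("smallest significant effect:", round(min(significant), 2))
```

      Under these assumptions, a trial of this size can reach significance only if its observed effect is well above 0.20, so every "reported" estimate overstates the true effect, while the average across all simulated trials stays close to it.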
      His review has been challenged by Davis et al (Davis J.M., Giakas W.J., Qu J., et al. Should we treat depression with drugs or psychological interventions? A reply to Ioannidis). They emphasized that however one analyzes the antidepressant literature, the effect size of benefit with antidepressants over placebo is not 0. An effect size of 0.31 is a small effect size, but it is still an effect in some people. They pointed out that oncology studies, for instance, support the use of treatments with much smaller effect sizes because the conditions are otherwise terminal. They emphasized that because of the notable morbidity and mortality of severe depression, at least, any drug benefit is valuable. They based their conclusions on a narrative review of prior meta-analyses and major RCTs; their discussion of maintenance efficacy studies was uncritical, as will be discussed later.

      STAR*D

      The sometimes rancorous debates about pharmaceutical industry studies are limited by the fact that they are pharmaceutical industry studies. They were conducted for FDA registration and to market drugs for profit, not to learn the truth in any economically disinterested fashion. This is why the huge NIMH-sponsored, double-blind, randomized STAR*D study is of major importance in addressing the question of antidepressant efficacy. It was conducted by academic sites that carefully organized and conducted their studies to meet NIMH standards, not by for-profit research groups that tried to meet pharmaceutical industry standards. The latter setting often involves paying patients to participate, sometimes at rather high rates, and there are well-known concerns about the misrepresentation of data to meet recruitment goals. Further, the FDA database involves pooling many different studies with different drugs in different study subjects, sometimes in different countries. The heterogeneity introduced by such differences is the bane of such large meta-analyses. Such heterogeneity is a type of confounding bias, making the results of these huge meta-analyses somewhat doubtful, since the pooled results of studies are not randomized. Only the data within each study is randomized and thus free of confounding bias.
      This heterogeneity of data is not a minor issue, but it is one that many of the debaters ignore. A meta-analysis can never be taken at face value, because it is not randomized; meta-analyses are always observational and thus biased to a greater or lesser degree. All things being equal, a large single RCT is more valid than a meta-analysis, because the former is randomized and the latter is not.
      Thus, a single huge RCT, such as STAR*D, is more valid, based on confounding bias concerns, than the huge FDA meta-analyses of multiple RCTs. The main limitation of STAR*D is the absence of placebo controls, which means it cannot be used to determine definitively whether antidepressants work better than placebo. However, if antidepressants were nothing but placebos, we would legitimately expect rather low response rates in STAR*D, especially in severe depression.
      The main purpose of STAR*D was to learn which antidepressant treatments were effective in those who failed to remit initially with a single antidepressant trial. In stage 1, the antidepressant chosen was citalopram, a typical serotonin reuptake inhibitor, and it was given open label initially to identify nonresponders who were then randomized to various steps of other treatments. Perhaps not too surprisingly, initial response to citalopram was approximately 50% and initial remission about 30%.
      • Trivedi M.H.
      • Rush A.J.
      • Wisniewski S.R.
      • et al.
      STAR*D Study Team
      Evaluation of outcomes with citalopram for depression using measurement-based care in STAR*D: implications for clinical practice.
      The remaining subjects, who were all nonresponders to stage 1, were then randomized to 3 sequential stages of treatment. They continued down the tree of options if they failed to remit in any phase for as long as they were willing to stay in the randomized studies. In the second stage of treatment (either switching to a different antidepressant or augmenting with one), a similar rate of acute response was seen (about 50%). However, by stages 3 and 4, despite using agents previously shown to be most effective (eg, tricyclic antidepressants and monoamine oxidase inhibitors or lithium augmentation), acute response rates were around 20%. Further, by stages 3 and 4, remission and response rates were about the same (ie, a better response was not seen with a more liberal definition of improvement than used for remission). As the authors of STAR*D comment, these results can be read as good news in the sense that one can conclude, with multiple phases of treatment, that about 60% of patients will respond acutely (>50% improvement in depressive symptoms).
      • Rush A.J.
      • Trivedi M.H.
      • Wisniewski S.R.
      • et al.
      Acute and longer-term outcomes in depressed outpatients requiring one or several treatment steps: a STAR*D report.
      This seems much higher than one would expect from natural history.
      It should be noted that after the initial citalopram treatment phase, STAR*D was a double-blind, randomized study (though without a placebo arm). All stages from 2 onward involved randomized, not observational, data, and the results are as valid as any standard, randomized clinical trial.
      The results are not definitive, however, given the FDA database analyses, since the mean initial HDRS score was 21.8 in STAR*D, consistent with mild depression. In that group, one would expect much spontaneous recovery because of natural history or the nonspecific benefits of a placebo response. This possibility cannot be ruled out.

      Maintenance Efficacy of Antidepressants in Major Depressive Disorders

      Biases of the Enriched Design for Maintenance Efficacy

      Before examining analyses of maintenance studies in MDD, it is useful to understand how such studies are designed to appreciate why they are mostly biased in favor of antidepressants.
      Most maintenance studies of antidepressants start, before the randomized phase begins, with patients who have an acute MDE and are treated with the antidepressant being studied. Patients who respond to the antidepressant are entered into the maintenance study, but those who do not respond or do not tolerate the antidepressant are excluded. Thus, the study is already biased in favor of the antidepressant. Then patients are followed for 1 to 2 years. The majority of relapses occur in the first 6 months of follow-up, however. This design does not prove maintenance efficacy, because the maintenance phase of treatment in MDD does not begin until 1 year after the acute episode ends, which is when the natural remission of an acute depressive episode occurs.
      • Kraepelin E.
      Manic-Depressive Insanity and Paranoia.
      • Frank E.
      • Kupfer D.J.
      • Perel J.M.
      • et al.
      Three-year outcomes for maintenance therapies in recurrent depression.
      • Goodwin F.
      • Jamison K.
      Manic Depressive Illness.
      Thus, in the depression literature, there is a clear consensus that 1 year or longer is the relevant time frame to assess the prevention of new episodes. Even if not everyone agrees on the 1-year period, it would be reasonable to say that at least 6 months after the acute episode is needed to assess maintenance efficacy. Most maintenance RCTs fail to pass this simple test.
      This problem has been much discussed in the bipolar disorder literature,
      • Goodwin F.
      • Jamison K.
      Manic Depressive Illness.
      and we have related it previously to the maintenance studies of neuroleptics in bipolar disorder.
      • Goodwin F.K.
      • Whitham E.A.
      • Ghaemi S.N.
      Maintenance treatment study designs in bipolar disorder: do they demonstrate that atypical neuroleptics (antipsychotics) are mood stabilizers?.
      The early literature on lithium included both prophylaxis and relapse prevention methodologies. In the prophylaxis design, “all comers” were included in the study. Any patient who was euthymic, no matter how that person got well, was eligible to be randomized to drug versus placebo or control, including those with recent manic or depressive episodes. In the relapse prevention design, typically only patients who responded acutely to the drug being studied were then eligible to enter the randomized maintenance phase. Those who responded to the drug were then randomized to stay on the drug or be withdrawn from it (usually abruptly, sometimes with a taper) and switched to placebo.
      The prophylactic and relapse prevention designs obviously do not address the same questions about drug efficacy. In the lithium studies in which the relapse prevention design was used (ie, only initial lithium responders to acute treatment were included), there was evidence in the placebo group of lithium withdrawal following acute treatment.
      • Suppes T.
      • Baldessarini R.J.
      • Faedda G.L.
      • Tohen M.
      Risk of recurrence following discontinuation of lithium treatment in bipolar disorder.
      • Cavanagh J.
      • Smyth R.
      • Goodwin G.M.
      Relapse into mania or depression following lithium discontinuation: a 7-year follow-up.
      By design, those who reach the maintenance phase and are treated with placebo are in fact persons who responded acutely to the study drug (lithium) and then were abruptly discontinued. Thus, if the placebo relapse rate is very high and almost exclusively limited to the first 1 to 2 months after study initiation, then one is observing a withdrawal effect involving a relapse back into the same acute episode that had just been treated rather than a new episode. The relapse prevention design methodology confounds prevention of relapse back into the index episode with prevention of a new episode.
      Besides the problem of withdrawal relapse, a key aspect of the relapse prevention design is that it is definitely biased in comparison with active controls and it is very likely biased against placebo as well. Although such studies are randomized, they are only randomized after preselecting all subjects to be randomized as responsive to only 1 of the 2 arms of the study. Thus, randomization is, in effect, instituted after the study has already been biased in favor of 1 of the 2 treatments. To put it simply, if some people like chocolate ice cream and others like vanilla and we preselect only those who like chocolate ice cream to be randomized again to receive chocolate ice cream or vanilla ice cream, we will find that most chocolate ice cream lovers will continue to prefer chocolate ice cream. This does not prove that chocolate ice cream is superior to vanilla ice cream.
      The same principle applies to studies in which patients are preselected to respond to the study drug and later randomized to stay on the study drug or receive placebo. Again, the study would be biased in favor of the study drug and would not prove the inherent superiority of the study drug over placebo. A truly randomized study would have to either preselect subjects to be responsive to both treatments being studied or, as in the traditional prophylaxis study, make no preselection at all.
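The ice cream analogy can be restated as a small deterministic sketch. This is only an illustration: the 50/50 split of preferences, the function name `preference_rates`, and the assumption that a person is "satisfied" only when given the preferred flavor are all hypothetical and are not taken from any study discussed here.

```python
from fractions import Fraction

def preference_rates(population, preselect=None):
    """Share of subjects in each arm who receive the flavor they prefer.

    population: maps a flavor to the fraction of people who prefer it
                (hypothetical numbers, for illustration only).
    preselect:  if given, only people who prefer this flavor are randomized,
                which mimics the enriched (relapse prevention) design.
    """
    if preselect is not None:
        # Enrichment step: everyone who prefers another flavor is excluded.
        population = {preselect: Fraction(1)}
    total = sum(population.values())
    # Randomization gives each arm the same mix of preferences as the
    # group being randomized, so the share satisfied in an arm equals the
    # share of that group that prefers the arm's flavor.
    return {arm: population.get(arm, Fraction(0)) / total
            for arm in ("chocolate", "vanilla")}

population = {"chocolate": Fraction(1, 2), "vanilla": Fraction(1, 2)}

all_comers = preference_rates(population)                      # prophylaxis-style design
enriched = preference_rates(population, preselect="chocolate")  # enriched design

print("all comers:", all_comers)
print("enriched:  ", enriched)
```

In the unselected ("all comers") case the two arms look identical, whereas after preselection the chocolate arm appears fully superior, even though nothing about the flavors themselves has changed. The apparent advantage comes entirely from who was allowed into the randomization.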
      These inherent biases of the enriched maintenance design are key to analyzing meta-analyses of the maintenance antidepressant efficacy literature. None of those reviews, save one, addresses the relevance of the enriched design, and thus they draw incorrect conclusions, both for and against antidepressants.

      Maintenance Randomized Clinical Trials

      The standard review of the maintenance efficacy of antidepressants often involves reference to the Cochrane collaboration meta-analysis of published studies. In that report, 10 studies of serotonin reuptake inhibitors (n = 2080) and 15 of tricyclic antidepressants (n = 881), mostly with 1-year follow-up, showed maintenance benefit versus placebo.
      • Geddes J.R.
      • Freemantle N.
      • Mason J.
      • et al.
      SSRIs versus other antidepressants for depressive disorder.
      The longest follow-up with modern antidepressants was 2 years with venlafaxine.
      • Geddes J.R.
      • Freemantle N.
      • Mason J.
      • et al.
      SSRIs versus other antidepressants for depressive disorder.
      An obvious problem with simply stating the results this way is that this meta-analysis does not address the issue of publication bias. If the acute antidepressant studies are any indicator, it is likely that some negative results from maintenance studies with antidepressants in MDD exist but are unpublished, and they would reduce this reported effect size.
      A more important issue is the problem of enriched maintenance designs, which bias studies in favor of the drug (or of placebo, if analyses are enriched in the opposite direction, as discussed later). The only analysis of RCTs of antidepressants in MDD that has addressed the problem of enrichment is a recent paper by Briscoe and El-Mallakh.
      • Briscoe B.E.
      • El-Mallakh R.S.
      The evidence for the long-term use of antidepressants as prophylaxis against future depressive episodes.
      They address the problem of enrichment by limiting data analysis to ≥6 months after the acute depressive episode. By so doing, they exclude those who relapsed soon after the maintenance study started, right after the end of the acute episode. Those who received antidepressant and were switched to placebo would relapse rapidly in the first few months of the maintenance treatment. This discontinuation effect is an artifact of the enriched design and would not, in this view, demonstrate true recurrence of a new episode, but rather immediate relapse into the same episode that had been present in prior weeks. Only 5 RCTs provided data on relapse rates before and after 6 months. Limiting analyses to those studies, the researchers found that, given the biases of the enriched design, the majority of relapses (about two thirds) occurred in the first 6 months of follow-up. These were not new episodes of depression but withdrawal relapse into the same acute episode that had just occurred a few weeks or months earlier, before the maintenance study began. In the one third of relapses occurring after 6 months, and thus testing the proposition of whether new episodes were truly being prevented, 4 of 5 studies found no benefit with antidepressants over placebo.

      The Venlafaxine PREVENT Maintenance Study

      Many authors cite a recent, long, large study of venlafaxine as evidence for antidepressant maintenance efficacy in MDD.
      • Kornstein S.G.
      • Kocsis J.H.
      • Ahmed S.
      • et al.
      Assessing the efficacy of 2 years of maintenance treatment with venlafaxine extended release 75-225 mg/day in patients with recurrent major depression: a secondary analysis of data from the PREVENT study.
      This study purports to show major benefits with venlafaxine for maintenance treatment of MDD, but it really reflects what we might call super-enrichment. The study repeatedly picks out those who respond to venlafaxine and re-randomizes them to venlafaxine or placebo, thus repeatedly selecting a smaller and smaller group of highly venlafaxine-responsive patients. By 2 years, this small group is indeed very responsive to venlafaxine, but the findings from this group are hardly generalizable to a new patient who might be prescribed venlafaxine.
      The specific data are as follows: In that study, 1096 MDD patients initially received venlafaxine or fluoxetine for acute depression. A total of 715 responders were enrolled in 6-month blind continuation on the same treatment. After 6 months, 258 (36.1%, 258/715) of those acute responders remained well and entered maintenance phase A for 1-year treatment (randomized to venlafaxine vs placebo).
      • Kocsis J.H.
      • Thase M.E.
      • Trivedi M.H.
      • et al.
      Prevention of recurrent episodes of depression with venlafaxine ER in a 1-year maintenance phase from the PREVENT Study.
      After 1 year in maintenance phase A, 131 responders (83 venlafaxine, 48 placebo) entered phase B for a second year of maintenance (venlafaxine responders were re-randomized to venlafaxine versus placebo; placebo responders stayed on placebo, and fluoxetine responders stayed on fluoxetine).
      In the first year of maintenance treatment for the 258 responders, 23% of the venlafaxine-treated patients relapsed versus 42% of those receiving placebo. Thus, 77% of the venlafaxine group (n = 83) stayed well for 1 year after already preselecting those who had stayed well for 6 months (n = 258), who were selected after initially responding to treatment for an acute episode (n = 715), as described in the previous paragraph. This is only 11.6% (83/715) of initial sustained responders.
      Only 12.5% of placebo responders at 1 year relapsed at 2 years, but in re-randomized venlafaxine responders (another super-enrichment on top of all the prior enriched selection phases), 44.8% of the placebo group relapsed at 2 years versus 8.0% on venlafaxine. Or, as the pharmaceutical industry marketing emphasized, 92% of venlafaxine patients remained well at 2 years' follow-up. This 92% seems like a huge number, but because of super-enrichment it represents the repeated selection of a tiny group of patients who were highly responsive to venlafaxine. It is 92% of the 11.6% mentioned earlier (those who responded at 1 year), which is 10.7% of the original sustained responders. Once dropouts are included, the number of patients treated at 2 years, from an initial sample of >1000 patients, was 15 using placebo and 31 using venlafaxine, or 4.2% of the original sample.
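The arithmetic behind these shrinking percentages can be checked in a few lines. The sketch below only reuses the counts and rates quoted in the text; the variable names are ours and purely illustrative.

```python
# Counts and rates as quoted in the text (PREVENT study description).
initial_sample = 1096          # patients who started acute treatment
acute_responders = 715         # responders entering 6-month continuation
well_at_1_year = 83            # venlafaxine patients who stayed well through phase A
well_rate_year_2 = 0.92        # share of venlafaxine patients remaining well at 2 years
treated_at_2_years = 15 + 31   # placebo + venlafaxine patients left after dropouts

# Share of the initial sustained responders still well on venlafaxine at 1 year.
share_1_year = well_at_1_year / acute_responders

# The marketed "92%" applies only to that already selected subgroup.
share_2_years = well_rate_year_2 * share_1_year

# Patients actually treated at 2 years as a share of everyone who started.
share_of_sample = treated_at_2_years / initial_sample

print(f"well at 1 year:      {share_1_year:.1%}")
print(f"well at 2 years:     {share_2_years:.1%}")
print(f"treated at 2 years:  {share_of_sample:.1%}")
```

Expressing each stage as a fraction of the starting group, rather than of the previously selected subgroup, is what makes the effect of repeated enrichment visible.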

      Antidepressant Discontinuation Meta-Analysis

      The most recent review of the maintenance MDD literature represents a unique analysis.
      • Andrews P.W.
      • Kornstein S.G.
      • Halberstadt L.J.
      • et al.
      Blue again: perturbational effects of antidepressants suggest monoaminergic homeostasis in major depression.
      Andrews et al essentially conducted an enriched study of placebo response, that is, they selected the data for analysis based on a sample enriched for placebo responders and biased against those who responded to drugs. They then concluded that drugs were ineffective and even harmful. All they really proved—once again—is that the enriched maintenance design is biased against whatever one wants to bias it against.
      This analysis is the converse of the standard enriched design maintenance study, as described previously, which is enriched for drug response and biased against placebo response. The same limitations apply in both cases: enrichment does not prove the inefficacy or harm of the treatment that is not being enriched, nor does it prove the efficacy or benefit of the treatment that is being enriched.
      In this review, Andrews et al
      • Andrews P.W.
      • Kornstein S.G.
      • Halberstadt L.J.
      • et al.
      Blue again: perturbational effects of antidepressants suggest monoaminergic homeostasis in major depression.
      collected 7 studies of maintenance treatment with antidepressants versus placebo in which initial acute treatment was provided with the 2 arms; in these 7 studies, the maintenance phase involved continuation of those patients who had responded to placebo acutely. In those acute placebo responders, relapse in the maintenance phase was (not surprisingly) uncommon (24.7%). In contrast, 39 trials involved acute treatment with antidepressant versus placebo, in which the reviewers selected patients who responded to antidepressants acutely and then were randomized to receive placebo in maintenance treatment. In this group, which reflected a discontinuation of antidepressant after acute response, there was a 42.1% relapse.
      The authors interpreted these results as indicating harm with the use of antidepressants—results that they speculatively relate to animal data on monoaminergic effects of these agents. They concluded that the biological effects of antidepressants actually increase the risk of relapse in long-term treatment, compared with the risk of no treatment (placebo). This interpretation ignores the problems of the enriched design, and, as a result, this kind of analysis highlights the importance of always comparing treatment results to what happens in the natural history of an illness.
      This meta-analysis
      • Andrews P.W.
      • Kornstein S.G.
      • Halberstadt L.J.
      • et al.
      Blue again: perturbational effects of antidepressants suggest monoaminergic homeostasis in major depression.
      enriches the results for placebo response. The patients treated acutely who responded to placebo stayed on placebo; the patients treated acutely who responded to antidepressants were taken off antidepressants. One should ask why these placebo responders responded to placebo. Did they actually respond to placebo, in the sense that the inert pill directly produced a response, or was placebo a stand-in for natural recovery, that is, spontaneous remission as part of the natural history of recurrent, episodic depression?
      The last is a possibility for part, if not all, of the placebo “response.” More than a century of natural history research, especially before the treatment era in past decades, has established the fact that recurrent unipolar depression follows an episodic course, in which there are periods of acute symptoms and periods of natural remission.
      • Davis J.M.
      • Giakas W.J.
      • Qu J.
      • et al.
      Should we treat depression with drugs or psychological interventions? A reply to Ioannidis.
      • Trivedi M.H.
      • Rush A.J.
      • Wisniewski S.R.
      • et al.
      STAR*D Study Team
      Evaluation of outcomes with citalopram for depression using measurement-based care in STAR*D: implications for clinical practice.
      • Kraepelin E.
      Manic-Depressive Insanity and Paranoia.
      During periods of natural remission, patients stay well, often for years, without any treatment. The recovery of some patients on placebo, in those 7 studies, may well reflect natural cycling out of acute episodes in unipolar depression. Once patients have cycled out of acute episodes, they are in natural remission, which, in the case of recurrent unipolar depression, usually involves >1 year of remission before the next depressive episode.
      • Kraepelin E.
      Manic-Depressive Insanity and Paranoia.
      In the 7 placebo maintenance response studies, no study exceeded 12 months of follow-up; in reading the appendix attached to the meta-analysis, it appears that the mean duration of follow-up was <2 months in 6 of the 7 studies (range 1.4–1.9 months).
      In other words, the lack of relapse really means that a patient improved spontaneously from acute depression in a 2-month study (the usual duration of acute depression studies) and then remained well for another 2 months. This is not robust evidence of long-term stability on placebo but rather an indication that when spontaneous remission occurs from acute depression, it lasts at least 2 months (and indeed usually up to 1 year) without any treatment.
      In contrast, in the antidepressant discontinuation studies analyzed, all patients responded in the acute treatment phase (usually 2 months in duration), and then 42% relapsed during maintenance treatment after the antidepressant was discontinued. One might ask whether serotonin withdrawal syndrome, which can mimic depressive episodes, occurred in some cases. Aside from that issue, however, a century of natural history research has led to a clear consensus that the mean duration of a typical depressive episode in unipolar depression is 6 to 12 months.
      • Davis J.M.
      • Giakas W.J.
      • Qu J.
      • et al.
      Should we treat depression with drugs or psychological interventions? A reply to Ioannidis.
      • Trivedi M.H.
      • Rush A.J.
      • Wisniewski S.R.
      • et al.
      STAR*D Study Team
      Evaluation of outcomes with citalopram for depression using measurement-based care in STAR*D: implications for clinical practice.
      • Kraepelin E.
      Manic-Depressive Insanity and Paranoia.
      If a patient is treated to recovery at 2 months and then the treatment is stopped, such a patient will relapse into the mood episode rapidly, because the 6- to 12-month period of the biological persistence of a mood episode has not yet elapsed. This finding has been reported repeatedly with antidepressants in depression and with neuroleptics in mania.
      • Goodwin F.K.
      • Whitham E.A.
      • Ghaemi S.N.
      Maintenance treatment study designs in bipolar disorder: do they demonstrate that atypical neuroleptics (antipsychotics) are mood stabilizers?.
      In sum, this creative analysis of the maintenance MDD literature suffers from a complete lack of awareness of the impact of the enriched design; the analysis is enriched for placebo response and thus biased against antidepressant effect. The most conceptually parsimonious and empirically well-supported interpretation of these findings, based on extensive clinical literature in human beings (as opposed to speculative biological extrapolations from animal studies), would be to view them as a result of the natural history of depression, not as a specific harm from antidepressants or a special benefit from placebo.

      Maintenance Data in STAR*D

      Although STAR*D is mainly reported in terms of acute data, it also provides maintenance data, which may be the best evidence to date on long-term efficacy with antidepressants in unipolar depression. Further, STAR*D was designed to be generalizable to the real world of complex, comorbid, recurrently depressed patients, as opposed to the cleaner populations studied in most RCTs (designed for FDA registration by the pharmaceutical industry).
      As noted previously, STAR*D is a double-blind, randomized study; all the maintenance data after the first phase of treatment (ie, with the dozen or so antidepressant treatments given besides citalopram) involve randomized, not observational, data.
      The basic results are as follows: Of subjects who responded or remitted acutely with antidepressants in STAR*D, only about one half stayed well at 1 year (sustained remission). In other words, even after preselecting patients who have acute benefit with antidepressants, as noted earlier, only one half will maintain benefit. Since 50% get acute benefit, and 50% of that group have sustained maintenance benefit, only 25% of the overall sample has long-term maintenance remission with antidepressants in unipolar depression.
      • Rush A.J.
      • Trivedi M.H.
      • Wisniewski S.R.
      • et al.
      Acute and longer-term outcomes in depressed outpatients requiring one or several treatment steps: a STAR*D report.
      Based on STAR*D findings, the long-term benefit with antidepressants in unipolar depression appears to be much less than has often been assumed.
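The 25% figure is simply the product of the two approximate rates given above. As a minimal sketch, using only those rounded values (the variable names are illustrative):

```python
# Approximate rates as stated in the text for STAR*D.
acute_benefit = 0.50           # share of patients with acute response or remission
sustained_given_acute = 0.50   # share of those patients still well at 1 year

# Long-term remission in the overall sample is the product of the two stages.
long_term_remission = acute_benefit * sustained_given_acute

print(f"long-term remission in overall sample: {long_term_remission:.0%}")
```

Because both inputs are rounded ("about one half"), the result should be read as an approximation rather than an exact estimate.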

      Objections to Our Critique of Enriched Maintenance Designs

      The previous critique of enriched maintenance designs is neither widely known nor generally accepted. It is novel, rarely stated, and—when stated—strongly opposed by many researchers involved with maintenance studies in psychopharmacology.
      There has not been much published discussion of this topic, but one objection that could be raised is that the enriched design is not biased because those who respond acutely to a drug treatment are both “true drug responders” and “placebo drug responders,” meaning that some of them would have responded to placebo had they been given placebo. Thus, the design is not biased solely toward the study drug. This objection would make sense only if all patients were equally likely to respond to drugs or placebo; if 50% of patients “really” responded to drug (true drug response) and 50% would have responded to placebo had it been given (placebo drug response), then a maintenance randomization of those acutely responsive patients to drug versus placebo would be valid. Ironically, this would be the case only if the critique of Kirsch et al
      • Kirsch I.
      • Deacon B.J.
      • Huedo-Medina T.B.
      • et al.
      Initial severity and antidepressant benefits: a meta-analysis of data submitted to the Food and Drug Administration.
      is correct, that is, if antidepressants are not more effective than placebo for acute depression.
      If antidepressants are more effective than placebo for acute depression in most patients, as we believe we showed earlier, then the percentage of true drug responders should be higher than the percentage of those who would have responded to placebo anyway (placebo drug responders). In a hypothetical group of acutely depressed patients treated with antidepressant X and later randomized to a maintenance study of X versus placebo, the reality is that there would not have been a 50–50 split between true drug responders and placebo drug responders before maintenance randomization. The split would be 60–40 or 70–30 or even higher in favor of drug X. In other words, because antidepressants are better than placebo acutely, enrichment for acute efficacy before maintenance RCTs is indeed biased in favor of antidepressants as opposed to later treatment with placebo. Enrichment entails bias.
      Interestingly, many psychiatric researchers appear to understand this critique fully as applied to the maintenance meta-analysis by Andrews et al
      • Andrews P.W.
      • Kornstein S.G.
      • Halberstadt L.J.
      • et al.
      Blue again: perturbational effects of antidepressants suggest monoaminergic homeostasis in major depression.
      ; they appreciate that such an analysis entails “apples and oranges,” picking out placebo responders and comparing how they fared later when continued on placebo versus choosing drug responders and comparing how they later did when switched to placebo. Placebo responders are different from drug responders, it is said. We agree. All placebo responders, by definition, respond to placebo, whereas probably only some would respond to drugs. Thus, such analyses are biased in favor of placebo response.
      Although this enriched method (a species of selection bias that is unique to maintenance clinical trial design
      • Goodwin F.K.
      • Whitham E.A.
      • Ghaemi S.N.
      Maintenance treatment study designs in bipolar disorder: do they demonstrate that atypical neuroleptics (antipsychotics) are mood stabilizers?.
      ) is rejected by many in our field in relation to the claim that placebo is as good as or better than an antidepressant, the same method is used to assert that antidepressants are more effective than placebo. The reason for such selectivity about accepting or rejecting the same research methodology is not entirely clear.

      Conclusions

      Numerous reviews and meta-analyses of the antidepressant literature in MDD, both acute and maintenance, appear to make larger claims than their research methods allow. Specifically, based on the available FDA database analyses, it is false to claim that antidepressants are, in a general sense, ineffective in acute depressive episodes. The claim that they lack such benefits is disproved by standard valid methods of pooling effect size differences and by using appropriate meta-analytic models. Correction of those effect size differences for a floor effect, so that relative (instead of absolute) effect size differences are calculated, shows that antidepressant benefit is seen not only in severe depression, but also in moderate depression. These analyses confirm lack of benefit of antidepressants over placebo in mild depression. One can turn around the attention-getting conclusions of the review by Kirsch et al: Instead of concluding that antidepressants are ineffective acutely except for the most extreme depressive episodes, correction for the statistical floor effect proves that antidepressants are effective acutely except for the mildest depressive episodes. The claim that antidepressants are completely ineffective, or even harmful, in maintenance treatment studies involves an unawareness of the enriched design effect, which has been used to analyze placebo efficacy. The same problem exists for the standard interpretation of those studies, however; they do not prove antidepressant efficacy either, since they are biased in favor of antidepressants. In sum, in trying to make an objective and statistically valid assessment, we conclude that antidepressants are effective for acute depressive episodes that are moderate to severe but not mild. For maintenance efficacy, the research designs used have been biased in favor of antidepressants, and it would seem more objective to conclude that long-term antidepressant efficacy is not proved, but neither is the claim that antidepressants are harmful.

      Conflict of Interest Statement

      In the past 12 months, Dr. S. Nassir Ghaemi has received research grants from the NIMH and from Pfizer, Inc. He provided one-time research consultations to Pfizer, Inc. and Sunovion, Inc. Neither he nor his family holds equity positions in these or other companies. Dr. Paul A. Vohringer has no potential conflicts of interest to disclose.

      Acknowledgments

      This work was supported partly by grant 5R01MH078060 from the National Institute of Mental Health (S.N.G.) and a scholarship from the National Commission for Scientific and Technological Research (CONICYT) of the government of Chile (P.A.V.). The authors acknowledge the helpful input of Barney Carroll MD and Maurizio Fava MD for part of the manuscript. Both authors contributed equally to the conduct of the study and creation of the manuscript.

      References

        • Rush A.J.
        • Trivedi M.H.
        • Wisniewski S.R.
        • et al.
        Acute and longer-term outcomes in depressed outpatients requiring one or several treatment steps: a STAR*D report.
        Am J Psychiatry. 2006; 163: 1905-1917
        • Kirsch I.
        • Deacon B.J.
        • Huedo-Medina T.B.
        • et al.
        Initial severity and antidepressant benefits: a meta-analysis of data submitted to the Food and Drug Administration.
        PLoS Med. 2008; 5: e45
        • Ioannidis J.P.
        Effectiveness of antidepressants: an evidence myth constructed from a thousand randomized trials?
        Philos Ethics Humanit Med. 2008; 3: 14
        • Geddes J.R.
        • Freemantle N.
        • Mason J.
        • et al.
        SSRIs versus other antidepressants for depressive disorder.
        Cochrane Database Syst Rev. 2006; (CD001851)
        • Kornstein S.G.
        • Kocsis J.H.
        • Ahmed S.
        • et al.
        Assessing the efficacy of 2 years of maintenance treatment with venlafaxine extended release 75-225 mg/day in patients with recurrent major depression: a secondary analysis of data from the PREVENT study.
        Int Clin Psychopharmacol. 2008; 23: 357-363
        • Andrews P.W.
        • Kornstein S.G.
        • Halberstadt L.J.
        • et al.
        Blue again: perturbational effects of antidepressants suggest monoaminergic homeostasis in major depression.
        Front Psychol. 2011; 2: 159
        • Turner E.H.
        • Matthews A.M.
        • Linardatos E.
        • et al.
        Selective publication of antidepressant trials and its influence on apparent efficacy.
        N Engl J Med. 2008; 358: 252-260
        • Horder J.
        • Matthews P.
        • Waldmann R.
        Placebo, Prozac and PLoS: significant lessons for psychopharmacology.
        J Psychopharmacol. 2010 Jun 22; ([Epub ahead of print])
        • Rush A.J.
        • First M.B.
        • Blacker D.
        Handbook of Psychiatric Measures.
        American Psychiatric Press, Washington, DC 2000
        • Schmitt A.B.
        • Bauer M.
        • Volz H.P.
        • et al.
        Differential effects of venlafaxine in the treatment of major depressive disorder according to baseline severity.
        Eur Arch Psychiatry Clin Neurosci. 2009; 259: 329-339
        • Montgomery S.A.
        • Lecrubier Y.
        Is severe depression a separate indication?
        Eur Neuropsychopharmacol. 1999; 9: 259-264
        • Feinberg M.
        • Carroll B.J.
        • Smouse P.E.
        • Rawson S.G.
        The Carroll rating scale for depression.
        Br J Psychiatry. 1981; 138: 205-209
        • Fountoulakis K.N.
        • Möller H.J.
        Antidepressant drugs and the response in the placebo group: the real problem lies in our understanding of the issue.
        J Psychopharmacol. 2011 Sep 17; ([Epub ahead of print])
        • Davis J.M.
        • Giakas W.J.
        • Qu J.
        • et al.
        Should we treat depression with drugs or psychological interventions?
        Philos Ethics Humanit Med. 2011; 6: 8
        • Trivedi M.H.
        • Rush A.J.
        • Wisniewski S.R.
        • et al.
        • STAR*D Study Team
        Evaluation of outcomes with citalopram for depression using measurement-based care in STAR*D: implications for clinical practice.
        Am J Psychiatry. 2006; 163: 28-40
        • Kraepelin E.
        Manic-Depressive Insanity and Paranoia.
        (Barclay R.M., trans) in: Robertson G.M. E & S Livingstone, Edinburgh, UK 1921
        • Frank E.
        • Kupfer D.J.
        • Perel J.M.
        • et al.
        Three-year outcomes for maintenance therapies in recurrent depression.
        Arch Gen Psychiatry. 1990; 47: 1093-1099
        • Goodwin F.
        • Jamison K.
        Manic Depressive Illness.
        2nd ed. Oxford University Press, New York, NY 2007
        • Goodwin F.K.
        • Whitham E.A.
        • Ghaemi S.N.
        Maintenance treatment study designs in bipolar disorder: do they demonstrate that atypical neuroleptics (antipsychotics) are mood stabilizers?
        CNS Drugs. 2011; 25: 819-827
        • Suppes T.
        • Baldessarini R.J.
        • Faedda G.L.
        • Tohen M.
        Risk of recurrence following discontinuation of lithium treatment in bipolar disorder.
        Arch Gen Psychiatry. 1991; 48: 1082-1088
        • Cavanagh J.
        • Smyth R.
        • Goodwin G.M.
        Relapse into mania or depression following lithium discontinuation: a 7-year follow-up.
        Acta Psychiatr Scand. 2004; 109: 91-95
        • Briscoe B.E.
        • El-Mallakh R.S.
        The evidence for the long-term use of antidepressants as prophylaxis against future depressive episodes.
        in: Oral presentation at the American Psychiatric Association Annual Meeting, New Orleans, LA; May 22–26, 2010
        • Kocsis J.H.
        • Thase M.E.
        • Trivedi M.H.
        • et al.
        Prevention of recurrent episodes of depression with venlafaxine ER in a 1-year maintenance phase from the PREVENT Study.
        J Clin Psychiatry. 2007; 68: 1014-1023