A Critical Analysis of the “What We Know” Project’s Systematic Review on Transition Interventions
Introduction:
The What We Know Project at Cornell University published a widely cited literature review claiming that medical transition significantly improves mental health in transgender individuals. Although the review has had high visibility and impact in academic, clinical, and policy conversations, it has not been peer-reviewed, and concerns have been raised about its methodology, study selection, and the accuracy of its conclusions.
Methods:
This paper conducts a comprehensive methodological audit and replication of the review. The authors re-evaluated each study cited by the review to determine whether it met the review's inclusion criteria and whether its findings were categorized correctly. Each study was classified on five reported mental health outcomes (depression, anxiety, suicidality, quality of life, and general psychopathology) as positive, null, mixed, or negative. The analysis also included peer-reviewed studies omitted from the review and broke down findings by treatment type (hormones vs. surgery), study design (cross-sectional vs. longitudinal), and demographic group (FTM vs. MTF).
Results:
The literature analyzed in the review largely fails to demonstrate that medical transition reliably improves mental health outcomes. After correcting for misclassifications and adding omitted studies, only 30% of studies showed positive outcomes, while 62.5% reported null or mixed findings, and 7.5% reported negative results. Longitudinal studies, which offer stronger evidence of causality, yielded even fewer positive findings. A trend of more favorable results among FTM individuals in cross-sectional hormone studies was observed, likely influenced by testosterone’s intrinsic effects on mental health.
Conclusions:
These findings suggest that the original review significantly overstates the strength of the evidence, and they highlight the need for more rigorous, transparent, and objective research in this area.
The What We Know Project, an initiative based at Cornell University, has produced a widely cited review (What We Know Project, 2018) that claims to synthesize scholarly research on the impact of gender transition on the well-being of transgender people. Cited frequently in both policy and academic circles, this review is often presented as supporting the claim that medical transition improves mental health outcomes. Despite its impact, however, it has been widely critiqued for its lack of a transparent methodology, its lack of peer review, and its cherry-picked conclusions.
A number of independent critiques have outlined issues that damage the credibility of the review. For example, an assessment using PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) and AMSTAR 2 (A Measurement Tool to Assess Systematic Reviews) shows that the review does not comply with the requirements set for systematic reviews. In the words of an external analysis, “the authors provide no proof of a pre-registered protocol which means no evidence-based methodologies were used, it also lacks a clear justification within the review document stating reasons for not using standard processes” (Horvath, 2020). Moreover, the AMSTAR 2 evaluation concluded that “the selection of participants was not carried out in duplicate, nor was it described in a clear manner that could be trusted, posing a greater risk of bias” (Blake, 2019).
Previously highlighted critiques focus on the following key issues:
This paper conducts a thorough re-evaluation of the What We Know Project’s review, analyzing both its methodology and the individual studies it includes. Unlike previous critiques that focused primarily on procedural flaws, this analysis replicates and reassesses the review’s findings to determine whether they accurately reflect the available research. By systematically reviewing each study, identifying misclassifications, and including omitted research, this paper aims to provide a more rigorous and objective assessment of the impact of medical transition on transgender well-being.
This paper systematically replicates and audits the What We Know Project's review of the evidence on gender transition and the well-being of transgender persons. The purpose is not to generate an entirely new meta-analysis or systematic review; it is to evaluate whether the classifications and conclusions of the original review accurately reflected the cited studies and whether they followed a consistent and transparent methodology.
This re-evaluation includes:
This re-evaluation uses the classification scheme of the What We Know Project, which sorts study findings into four groups:
Positive: A study reports statistically significant improvement on a mental health measure after a transition-related intervention.
Null: A study finds no statistically significant change on any mental health outcome after the transition.
Negative: A study reports statistically significant deterioration in a mental health outcome after the transition.
Mixed: A study reports both positive and null results; importantly, the findings of these studies are usually inconclusive or weak, or do not lend themselves to definite generalizations.
Each study was reclassified based on these criteria in order to assess the reliability and validity of the original review's classification.
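The four categories above can be expressed as a small decision rule. The following is a minimal sketch rather than the procedure actually used in this audit: the function name, the "+", "0", "-" coding of per-measure results, and the handling of combinations that the definitions do not name (for example, positive together with negative) are all assumptions made for illustration.

```python
from enum import Enum


class Category(Enum):
    POSITIVE = "positive"
    NULL = "null"
    NEGATIVE = "negative"
    MIXED = "mixed"


def classify_study(directions):
    """Map per-measure results to one of the four categories.

    `directions` holds one entry per validated mental health measure:
    "+" = statistically significant improvement,
    "0" = no statistically significant change,
    "-" = statistically significant deterioration.
    This coding is hypothetical and not taken from the original review.
    """
    found = set(directions)
    if not found or not found <= {"+", "0", "-"}:
        raise ValueError("expected a non-empty list of '+', '0', or '-'")
    if found == {"+"}:
        return Category.POSITIVE
    if found == {"0"}:
        return Category.NULL
    if found == {"-"}:
        return Category.NEGATIVE
    # Assumption: any combination of directions counts as mixed; the
    # definitions only name the positive-plus-null case explicitly.
    return Category.MIXED


print(classify_study(["+", "0"]).value)
```

Under this sketch a study counts as positive only if every assessed measure improved, which is a stricter reading than "improvement on a mental health measure"; a looser reading would shift some mixed studies into the positive group.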
In addition, each study was classified as longitudinal or cross-sectional according to its methodological design. Longitudinal studies were those with a pre-post design or multi-time-point assessments of participant outcomes that allowed investigators to observe change over time within the same participants; these included prospective cohort studies and retrospective studies with documented pre- and post-measures. Cross-sectional studies were defined as studies comparing measurements taken at a single point in time, either across different groups (e.g., treatment versus no treatment) or within a population, without consideration of within-subject change over time. Studies that simply compared groups (without a temporal dimension) were classified as cross-sectional. This operational definition provided a consistent way to evaluate the level of evidence and the strength of causal inference across the literature reviewed.
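The design rule can be reduced to a single question: how many times were the same participants measured? The sketch below illustrates that reading; the function name and the `within_subject_timepoints` coding variable are hypothetical and are not part of the original audit.

```python
def classify_design(within_subject_timepoints):
    """Return "longitudinal" or "cross-sectional" for one study.

    `within_subject_timepoints` is the number of occasions on which the
    SAME participants were measured (a hypothetical coding variable).
    Two or more occasions allow within-subject change to be observed.
    """
    if within_subject_timepoints < 1:
        raise ValueError("a study needs at least one measurement occasion")
    if within_subject_timepoints >= 2:
        return "longitudinal"
    return "cross-sectional"


# A pre-post study with baseline and follow-up measurements.
print(classify_design(2))
# A single-occasion comparison of treated and untreated groups.
print(classify_design(1))
```

A group comparison with no repeated measurement is coded as one occasion and therefore falls under cross-sectional, in line with the operational definition above.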
For this audit, data were gathered from two principal sources:
The original What We Know Project review and the full text documents for all studies cited.
The 17 secondary reviews, meta-analyses, and guidelines that the What We Know Project cited as additional evidence.
Reclassification of studies - All cited studies were read in full and reclassified according to the definitions above, with the aim of determining whether the original review accurately portrayed each study's findings and coded its outcome correctly.
Bias analysis - Studies that were published within the review's time period and were relevant to transition and mental health were identified and evaluated to determine whether their omission from the original review had introduced bias.
Omission of non-validated measures - Studies that evaluated only subjective or aesthetic satisfaction (e.g., satisfaction with surgical results) and did not assess mental health using validated outcome measures were excluded from this re-analysis, even though they were included in the original review.
Re-evaluation of secondary sources - All 17 secondary sources (reviews, meta-analyses, and guidelines) were read and assessed for whether they supported or misrepresented the claims made in the What We Know Project.
By applying the same classification framework and reviewing each piece of cited evidence, this analysis allows for a transparent replication and correction of the What We Know Project's conclusions regarding medical transition and mental health.
When we re-evaluated the full set of 55 studies discussed in the What We Know Project's review, we found substantial misclassification, studies that should not have been included, and inconsistent application of the review's own inclusion criteria. Of the 51 studies that the review identified as "positive evidence" for the mental health effects of gender transition:
12 out of 51 studies (23.5%) were misclassified [7]-[18]: their findings did not support a "positive" conclusion (i.e., they did not report statistically significant improvements on valid, reliable mental health outcomes).
15 of the remaining 39 studies (38.5%) did not belong in the review because they did not examine mental health or well-being in relation to transition [19]-[33], instead exploring unrelated or vague constructs such as surgical satisfaction or self-reported cosmetic results.
12 of the remaining 24 studies (50%), which were not misclassified and were retained as “positive”, did not measure valid mental health outcomes [34]-[45] and relied on unvalidated or indirect self-reports.
Two studies using the same participant sample were counted separately, inflating the count of “positive” findings.
In summary, only 12 of the 51 studies (23.5%) identified as "positive" were both correctly classified and relevant [46]-[57].
In comparison, the four studies classified as null or mixed were all correctly classified [58]-[61], addressed mental health, and met the stated inclusion criteria of the review.
When the 12 misclassified "positive" studies are taken into account, the results change from the originally reported 93% positive and 7% null/mixed to:
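The successive exclusions can be checked with simple arithmetic. The sketch below is illustrative only: the `screen` helper and the step labels are hypothetical, and the counts are those stated in the text. The pair of studies sharing one participant sample is not modeled, because the text does not say how it affects the final count.

```python
def screen(start, steps):
    """Apply successive exclusions and report the share removed at each step.

    Each share is computed against the studies still remaining before
    that step, matching how the percentages are stated in the text.
    """
    remaining = start
    rows = []
    for label, removed in steps:
        share = round(100 * removed / remaining, 1)
        remaining -= removed
        rows.append((label, removed, share, remaining))
    return rows


steps = [
    ("misclassified as positive", 12),
    ("not about mental health or well-being", 15),
    ("no validated mental health measure", 12),
]
rows = screen(51, steps)
for label, removed, share, remaining in rows:
    print(f"{label}: -{removed} ({share}% of those left), {remaining} remain")

retained = rows[-1][3]
print(f"retained: {retained} of 51 = {round(100 * retained / 51, 1)}%")
```

Each percentage uses a different denominator (51, then 39, then 24), so the three shares should not be added together.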
Positive studies: 43%
Null/Mixed studies: 57%
If we focus only on longitudinal studies, which are better suited to establishing causality, the results are as follows:
Positive (longitudinal): 38%
Null/Mixed (longitudinal): 62%
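The shift in the overall percentages can be reproduced from the counts given earlier. The sketch below assumes that the corrected tally covers 28 studies: the 12 correctly classified positive studies, the 4 original null/mixed studies, and the 12 misclassified studies counted as null/mixed. That denominator is inferred from the reported percentages rather than stated in the text. The longitudinal breakdown is not recomputed because its underlying counts are not given at this stage.

```python
def shares(counts):
    """Convert category counts to whole-number percentages."""
    total = sum(counts.values())
    return {name: round(100 * n / total) for name, n in counts.items()}


# Counts as reported by the original review (55 studies).
original = {"positive": 51, "null/mixed": 4}

# Assumed corrected tally (28 studies): the 12 misclassified studies
# move to null/mixed, and out-of-scope studies are left out entirely.
corrected = {"positive": 12, "null/mixed": 4 + 12}

print("original: ", shares(original))
print("corrected:", shares(corrected))
```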
In addition to re-evaluating the studies cited in the What We Know Project’s review, this analysis also considered studies that were omitted from the original review but met its stated inclusion criteria: studies published in the relevant period (1991 to 2017), focused on medical transition (e.g., hormone treatment or surgery), and reporting on clinically relevant mental health outcomes.
The omitted studies were located through a purposive supplementary search consisting of:
Examination of the 17 reviews, meta-analyses, and guidelines cited by the What We Know Project, which include studies that were not included in the main review.
Review of frequently discussed or cited studies in academic discussions, research commentaries, and policy reports.
While the process did not follow a registered or exhaustive systematic search protocol, it prioritized studies that were commonly referenced in scientific or public debates, were cited within the secondary sources the What We Know Project relied on, or were highly relevant to the question of mental health outcomes post-medical transition.
The 12 omitted studies [62]-[73] were reviewed individually and categorized by their result type and study design:
After reclassification and the addition of relevant omitted studies, the evidence base shifts substantially from what was originally presented in the What We Know Project review. Notably, every omitted study reviewed showed null, mixed, or negative results, and none reported statistically significant positive results, which suggests possible systematic selection bias in the original review.
Out of a total of 40 studies that meet the review’s inclusion criteria and focus specifically on valid mental health outcomes, only 12 (30%) provide statistically significant positive results associated with medical transition. In contrast, 25 studies (62.5%) reported null or mixed results, meaning either no statistically significant changes were reported or the findings were too weak or contradictory to support strong conclusions. In addition, there were 3 studies (7.5%) that found significant negative outcomes following medical transition.
When the analysis is restricted to the longitudinal studies (n = 18), which are generally better suited to assessing causal relationships, the results look similar. Only 5 of these studies (27.8%) reported positive effects, while 12 (66.7%) reported null or mixed outcomes and 1 (5.6%) reported negative outcomes. In this more rigorous subset, the evidence points predominantly to a null result, contrary to the overwhelmingly positive effect reported by the What We Know Project.
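The final distributions follow directly from the stated counts. The sketch below recomputes them and, as an illustration only, derives the remaining non-longitudinal studies by subtraction; that step assumes each of the 40 studies is counted exactly once as either longitudinal or cross-sectional, which the text does not state explicitly. The variable and function names are hypothetical.

```python
from collections import Counter


def distribution(counts):
    """Convert category counts to percentages with one decimal place."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


all_studies = Counter(positive=12, null_or_mixed=25, negative=3)
longitudinal = Counter(positive=5, null_or_mixed=12, negative=1)

# Assumption: every study not coded longitudinal is cross-sectional.
cross_sectional = all_studies - longitudinal

print("all (n=40):         ", distribution(all_studies))
print("longitudinal (n=18):", distribution(longitudinal))
print("remainder:          ", dict(cross_sectional))
```

The longitudinal percentages sum to 100.1 because each is rounded independently.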
Cross-Sectional Studies
Surgery
Depression: Out of 7 studies, 2 reported positive effects [48,55], 4 null [16,15,10,64], and 1 negative [58]. Some of these included signs of pro-transition bias. Overall, the cross-sectional findings do not show that surgery improves depression (null).
Anxiety: All 4 studies reported null results [64,16,15,58]. No positive associations were found. This outcome is therefore null.
Suicidality: Of 3 studies, 2 reported null results [10,66] and 1 reported a negative outcome [62]. No studies found improvement. This suggests null to negative results.
Quality of Life (QoL): Among 4 studies, 1 reported improvement [46], while 3 found null results [14,12,63]. Overall interpretation: null.
SCL / Psychiatric Distress: No cross-sectional studies assessed this outcome in relation to surgery.
Hormones
Note: Testosterone’s inherent psychological effects, especially in FTM individuals, may confound these findings.
To explore whether hormone therapy outcomes tend to be more favorable for female-to-male (FTM) individuals, we examined cross-sectional studies specifically addressing depression and anxiety, the two outcomes with the largest number of studies. While the overall number of studies remains limited, a pattern is observable. For depression, all three FTM-only studies reported positive results, while all five null results came from either mixed or male-to-female (MTF) samples. Similarly, for anxiety, two of the four positive studies were FTM-only, with the remaining two positive results from mixed and MTF groups, respectively. Neither of the null results for anxiety came from FTM-only samples. This suggests a lean toward more favorable outcomes in FTM participants. While the finding isn't definitive, it does align with the possibility that testosterone’s natural antidepressant and anxiolytic effects could contribute to these results independently of gender-affirming treatment, possibly confounding interpretations of treatment efficacy in these groups.
Depression: 10 studies examined this outcome. 5 reported positive results and 5 reported null. Two of the positive studies had clear sample bias. Since many positive results come from FTM samples where testosterone may have independent effects, the overall result is best described as mixed.
Anxiety: Of 6 studies, 4 found positive effects and 2 null. Most positive results again come from FTM or mixed groups. The evidence suggests mixed results.
Suicidality: 3 studies assessed this outcome, 2 reporting null findings and 1 reporting a negative finding. None found positive associations. The outcome is null to negative.
QoL: Of 6 studies, 5 found significant improvement, 1 null. These results support a positive association between hormone therapy and perceived QoL in cross-sectional research.
SCL / Psychiatric Distress: 2 studies assessed this and both reported null results. Therefore, null.
Longitudinal Studies
Surgery
Depression: 3 studies analyzed this outcome. None found improvement; 2 found no difference and 1 reported worsening symptoms. This indicates null to negative results.
Anxiety: 3 studies assessed anxiety. Two found no improvement, one found a negative association. Overall outcome is null to negative.
Suicidality: Only 2 studies assessed this. One found improvement and the other found worsening. Given the contradiction, this is best classified as null.
QoL: 4 studies examined quality of life. Two found no change, one found mixed results, and one found negative outcomes. Collectively, this supports a null interpretation.
SCL / Psychiatric Distress: 3 studies assessed this outcome and all found no significant improvement. This confirms a null result.
General Mental Health: One study reported no overall improvement in mental illness after surgery. This is a null result.
Hormones
Possible mood benefits from testosterone itself should be considered, particularly in FTM samples, as noted above.
Depression: 4 studies examined this; 2 found improvement, 2 found no change. This outcome is mixed.
Anxiety: One study found a positive result. No others addressed this outcome. Interpretation: positive, though based on limited evidence.
Suicidality: No longitudinal studies on hormones assessed this.
QoL: One study found mixed results. The evidence base is too limited to draw strong conclusions, so the outcome is mixed.
SCL / Psychiatric Distress: Two studies found significant improvement. This supports a positive association between hormone therapy and reduction in psychological distress.
Outcome | Cross-sectional surgery | Cross-sectional hormones | Longitudinal surgery | Longitudinal hormones |
Depression | null | mixed | null to negative | mixed |
Anxiety | null | mixed | null to negative | insufficient data |
Suicidality | null to negative | null to negative | null | no data |
QoL | null | positive | null | insufficient data |
SCL | no data | null | null | positive (2 studies) |
The table above summarizes the overall results for hormones and surgery across the different mental health measures in longitudinal and cross-sectional studies.
This analysis was done as an audit and re-evaluation of the What We Know Project’s review, not as a systematic review. Therefore, the following limitations should be noted. First, we did not identify omitted studies through a registered or comprehensive systematic search protocol. Instead, we focused on peer-reviewed articles and reports that appeared to fit the review’s stated criteria, relying on targeted literature searches, references from the review’s own secondary sources, and prominent studies that were commonly cited in public and academic conversations.
Second, while we made every effort to reclassify study outcomes using the same definitions (positive, null, negative, mixed), these categories are in part subjective, especially when study results were ambiguous or partial.
Third, this analysis did not implement any statistical weighting or meta-analytic synthesis but used a frequency-based approach to reveal trends in findings. A frequency-based approach does give a clearer picture of how studies were classified and subsequently interpreted, but does not account for sample size, effect size or study quality beyond a broad classification by design type (e.g., longitudinal vs. cross-sectional).
Finally, we sought to minimize bias in our analysis; however, this critique has not yet been peer-reviewed and should be interpreted as part of a larger dialogue about research standards and transparency.
The re-evaluation of the What We Know Project’s review reveals a substantially more mixed and inconclusive body of evidence than the originally claimed 93% positive and 7% null/mixed result. Our analysis showed that after correcting for misclassified studies, removing irrelevant studies, and adding studies that were omitted, only 30% of the relevant studies show positive mental health outcomes after medical transition. A substantial majority (greater than 60%) of the relevant studies have null or mixed outcomes, and a small but notable portion (7.5%) showed negative outcomes. Longitudinal studies, which provide stronger grounds for causal inference, showed the same predominantly null pattern.
These values contradict the often-stated simplification that the literature overwhelmingly shows positive mental health outcomes following medical transition. Although the literature does contain a meaningful number of positive studies, the review's broader claim is weakened by its tendency to exaggerate positive outcomes, its inclusion of studies whose outcome measures were unrelated to health or mental health, and its reliance on unsystematic and other low-quality (e.g., grey literature) secondary sources. There is a need for stronger, more robust, systematic, and transparent practices.
Future work should include longitudinal designs with high-quality validated measures; clear outcome definitions applied consistently to all studies within a review; more comprehensive review protocols that follow established systematic review methods such as PRISMA; peer review of completed reviews; and the use of AMSTAR 2 to evaluate review quality. Policymakers and clinicians reading the existing literature should assess each study closely and ensure that strong evidence supports any general claims about clinical effectiveness.
1. What We Know Project, Cornell University, “What Does the Scholarly Research Say about the Effect of Gender Transition on Transgender Well-Being?” (online literature review), 2018.