Interpretation of clinical trial results: a committee opinion

This document provides guidance, background, and tips on how to recognize quality trials and focuses on evaluating the validity, importance, and relevance of clinical trial results. This document replaces the document of the same name, last published in 2008 (Fertil Steril® 2008;90:S114–20). (Fertil Steril® 2020;113:295–304. ©2019 by American Society for Reproductive Medicine.)

Evidence from clinical trials is fundamental to ethical medical practice. Along with patient preferences, circumstances, and clinical experience, evidence is central to effective clinical decision-making. Applying evidence to clinical questions requires filtering in the form of three questions. First, do the trial results reflect true effects of intervention, rather than artifactual ones (validity)? Second, do the results suggest that the intervention is clinically useful (importance)? Third, could the results apply to individual patients encountered in daily practice (relevance)? This document provides background and tips on how to recognize trials of quality and focus on evaluating the validity, importance, and relevance of clinical trial results (Table 1).

Table 1. Questions to help interpret study results using three filters: study validity, clinical importance, and clinical relevance.

Filter	Questions
Filter I: Are the Study Methods Valid?	Was the assignment of patients randomized? Was the randomization list concealed? Was follow-up sufficiently long and complete? Were all patients analyzed in the groups to which they were allocated?
Filter II: Are the Study Results Clinically Important?	Was the outcome of sufficient importance to recommend treatment to patients? Was the treatment effect large enough to be clinically relevant? Was the treatment effect precise? Are the conclusions based on the question posed and the results obtained?
Filter III: Are the Results Relevant to Your Practice?	Is the study population similar to the patients in your own practice? Is the intervention reproducible and feasible in your own clinical setting? What are your patient’s personal risks and potential benefits from the therapy? What alternative treatments are available?

BACKGROUND

Chance, Bias, and Treatment Effect

There are three reasons why an intervention may appear to be effective: chance, an accidental event; bias, a systematic deviation from the truth caused by extraneous factors other than the intervention; and truth, a real treatment effect. Chance must always be considered when interpreting trial results and is explored in this document’s section on appropriate statistical interpretation. Bias may enter studies of all types but is least likely to be present in well-designed and executed clinical trials. Finally, although results from a valid study may be statistically significant, they may not translate into a clinically important benefit. A true effect may be too small or unimportant to help an individual patient.

Clinical Trials

Clinical trials are experimental studies that compare a specific intervention with an alternative intervention, placebo, or no treatment, with measurement of specific outcomes. Random allocation to intervention or control groups is a key step in trial design. Random allocation is designed to balance the distribution of prognostic factors between the groups. Prognostic factors that are linked to the outcome but independent of intervention may confound the study results if they are unevenly distributed between groups. In subfertility, female age and duration of subfertility are typical prognostic factors and potential confounders; examples in a menopause trial include severity of symptoms and time since menopause. A major strength of random allocation is its potential to distribute known and unknown confounders evenly between intervention and control groups. This balance is essential when the outcome of interest occurs independently of treatment, which is common with subfertility and menopausal symptoms.
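To see why random allocation balances prognostic factors, the short Python simulation below assigns each subject to one of two arms with a simulated fair coin and compares the mean age in each arm. It is a sketch only: the cohort and its ages are hypothetical, not trial data. Because the coin flip ignores every characteristic of the subject, the same balancing applies to factors that were never measured.

```python
import random
import statistics

def randomize_and_compare(ages, seed=2024):
    """Allocate each subject 1:1 by simple randomization (a simulated fair coin)
    and return the mean of the prognostic factor in each arm."""
    rng = random.Random(seed)
    arm_a, arm_b = [], []
    for age in ages:
        # The allocation ignores the subject's age (and every other characteristic).
        (arm_a if rng.random() < 0.5 else arm_b).append(age)
    return statistics.mean(arm_a), statistics.mean(arm_b)

# Hypothetical cohort: 10,000 women with ages spread evenly between 20 and 40 years.
cohort_rng = random.Random(1)
ages = [cohort_rng.uniform(20, 40) for _ in range(10_000)]

mean_a, mean_b = randomize_and_compare(ages)
print(f"Mean age, arm A: {mean_a:.2f}; arm B: {mean_b:.2f}")
```

In a large cohort the two means differ by only a fraction of a year. Run on a few dozen subjects, the same code can leave a noticeable imbalance, which is why randomization alone does not guarantee balance.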

Maximizing the Value of Time Spent Appraising Studies

Although clinical trials provide the most valid evidence for addressing therapeutic questions, their relevance and quality vary. The CONSORT (Consolidated Standards of Reporting Trials) guidelines were initially developed in the mid-1990s, and refined in 2010, to provide guidance for authors in an effort to improve the reporting of study results (1). Adherence to the CONSORT checklist provides authors with a comprehensive framework to improve the clarity and transparency of reporting study methodology, results, and conclusions (Table 2). The checklist can also serve as a guide for readers to assess the quality of reporting. Guidelines for efficient study interpretation have been published elsewhere (2, 3).

Table 2. CONSORT 2010 checklist of information to include when reporting a randomized trial* (taken from Schulz 2010 (1)).

Section	Topic	Item No.	Item checklist
Title and abstract		1a	Identification as a randomized trial in the title
Title and abstract		1b	Structured summary of trial design, methods, results, and conclusions (for specific guidance see CONSORT for abstracts [21,31])
Introduction	Background and objectives	2a	Scientific background and explanation of rationale
Introduction		2b	Specific objectives or hypotheses
Methods	Trial design	3a	Description of trial design (such as parallel, factorial) including allocation ratio
		3b	Important changes to methods after trial commencement (such as eligibility criteria), with reasons
	Participants	4a	Eligibility criteria for participants
		4b	Settings and locations where the data were collected
	Interventions	5	The interventions for each group with sufficient details to allow replication, including how and when they were actually administered
	Outcomes	6a	Completely defined pre-specified primary and secondary outcome measures, including how and when they were assessed
		6b	Any changes to trial outcomes after the trial commenced, with reasons
	Sample size	7a	How sample size was determined
		7b	When applicable, explanation of any interim analyses and stopping guidelines
	Randomization
	Sequence generation	8a	Method used to generate the random allocation sequence
		8b	Type of randomization; details of any restriction (such as blocking and block size)
	Allocation concealment mechanism	9	Mechanism used to implement the random allocation sequence (such as sequentially numbered containers), describing any steps taken to conceal the sequence until interventions were assigned
	Implementation	10	Who generated the random allocation sequence, who enrolled participants, and who assigned participants to interventions
	Blinding	11a	If done, who was blinded after assignment to interventions (for example, participants, care providers, those assessing outcomes) and how
		11b	If relevant, description of the similarity of interventions
	Statistical methods	12a	Statistical methods used to compare groups for primary and secondary outcomes
		12b	Methods for additional analyses, such as subgroup analyses and adjusted analyses
Results	Participant flow (a diagram is strongly recommended)	13a	For each group, the numbers of participants who were randomly assigned, received intended treatment, and were analyzed for the primary outcome
		13b	For each group, losses and exclusions after randomization, together with reasons
	Recruitment	14a	Dates defining the periods of recruitment and follow-up
		14b	Why the trial ended or was stopped
	Baseline data	15	A table showing baseline demographic and clinical characteristics for each group
	Numbers analyzed	16	For each group, number of participants (denominator) included in each analysis and whether the analysis was by original assigned groups
	Outcomes and estimation	17a	For each primary and secondary outcome, results for each group, and the estimated effect size and its precision (such as 95% confidence interval)
		17b	For binary outcomes, presentation of both absolute and relative effect sizes is recommended
	Ancillary analyses	18	Results of any other analyses performed, including subgroup analyses and adjusted analyses, distinguishing pre-specified from exploratory
	Harms	19	All important harms or unintended effects in each group (for specific guidance see CONSORT for harms [28])
Discussion	Limitations	20	Trial limitations, addressing sources of potential bias, imprecision, and, if relevant, multiplicity of analyses
	Generalizability	21	Generalizability (external validity, applicability) of the trial findings
	Interpretation	22	Interpretation consistent with results, balancing benefits and harms, and considering other relevant evidence
Other information	Registration	23	Registration number and name of trial registry
	Protocol	24	Where the full trial protocol can be accessed, if available
	Funding	25	Sources of funding and other support (such as supply of drugs), role of funders

In this summary, the elements of critical appraisal have been organized to first address study validity, then clinical importance, and finally, relevance to your practice (Table 1). It is logical to filter in this sequence because a trial that is of insufficient quality to meet validity criteria may be bypassed without an assessment of importance or clinical relevance. Validity can be assessed from a perusal of the methods (and sometimes the methods section of the abstract) without reading the entire paper, thus making the most of the limited and valuable reading time available to clinicians.

Does the research question specify the population, intervention, and outcomes? Good trials provide a succinct and clear statement of the research question, which is paramount to interpreting the results. Subject characteristics, such as stage of disease, gender, age, and ethnicity, must be defined before extrapolating from the trial to individual patients or populations. The dose and mode of administration of the intervention determine whether it is relevant to clinical practice. The choice of outcomes or endpoints should be clearly stated. A published clinical study will be used to illustrate this and other key points of this discussion.

Example: Among infertile women with polyendocrine metabolic ovarian syndrome (PMOS), is clomiphene citrate or letrozole more effective in achieving live birth? (4) The cited report should clearly define the population, the intervention, and the primary outcome.

Is the question clinically important and unanswered? Good trials address questions that are important enough to involve human subjects, where the value of medical or other alternatives remains in doubt. Papers that are worth reading should also show, preferably through a systematic literature review, that the question has not already been answered.

Example: Polyendocrine metabolic ovarian syndrome (formerly known as polycystic ovary syndrome) is one of the most common causes of female infertility and affects 5%–10% of reproductive-aged women. Clomiphene citrate has been used for decades as first-line ovulation induction therapy. However, its limitations include poor efficacy, a high multiple pregnancy rate, and an undesirable side-effect profile. Previous studies of treatment were limited by insufficient power and by the use of surrogate endpoints such as ovulation or hormone levels. This study sought to compare the safety and efficacy of clomiphene citrate with those of letrozole in achieving live birth, the most meaningful outcome in infertility studies, in women with PMOS (5).

FILTER I: ARE THE STUDY METHODS VALID?

Once it is determined that a study has a reasonable chance of addressing the clinical question, it is time to look closely at the quality of the methods to decide whether the results are valid.

1. Was the assignment of patients randomized?

Random allocation is the cornerstone of a clinical trial. Unless this process is truly impartial, maldistribution of important confounders between groups may occur. Open random number tables or pseudo-random methods, such as allocation by chart number or social insurance number, are insecure and should not be trusted. The most secure methods blind the investigators to group assignment. Two further questions about the balance between groups after randomization are relevant to the overall validity of a trial.

Was randomization effective?

Randomization does not guarantee a balanced distribution of confounders. The number of subjects and the distribution of important prognostic factors should be similar between the groups. This information may be in the methods, but frequently is presented in the first results table. Significant imbalance may reflect insecure randomization or the play of chance. Both should be considered when assessing results.

Were interventions other than the one(s) under study evenly distributed between groups?

Co-intervention, the planned or unplanned exposure of subjects to a potentially effective maneuver other than the intervention under study, happens even in carefully executed trials. Reporting such exposures allows the reader to decide if results may be biased by uneven distribution of these post-randomization confounders.

Example: A total of 750 patients with polyendocrine metabolic ovarian syndrome were randomized to treatment. A total of 158 women dropped out or were excluded from further analysis: 85/376 (22.6%) in the clomiphene group and 73/374 (19.5%) in the letrozole group, P=.30. This suggests that the results of the study were not biased by differences in withdrawal between treatment groups (4).
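The reported P value for the difference in withdrawal can be checked with a two-proportion z-test. The sketch below assumes an uncorrected test of two independent proportions (equivalent to a chi-square test without continuity correction); it does not claim to be the exact test the authors used.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for a difference between two independent proportions.

    The standard error uses the pooled proportion, as is usual under the null
    hypothesis of no difference. Returns (z, two-sided P value).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Withdrawals/exclusions from the example: clomiphene 85/376, letrozole 73/374.
z, p = two_proportion_z_test(85, 376, 73, 374)
print(f"z = {z:.2f}, P = {p:.2f}")
```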

2. Was the randomization list concealed?

Unless it is impossible for recruitment personnel to know which allocation is coming up next, conscious or unconscious steering of patients may introduce imbalance between the groups. The order of allocation must be concealed in addition to ensuring that patients, clinicians, and outcome assessors are blinded, because allocation concealment cannot always be achieved simply by blinding. Third-party randomization by phone or pharmacy is the most secure option. Numbered, opaque, sealed envelopes are less expensive and reasonably tamper-proof.

The importance of designs that conceal the order of allocation was illustrated by a systematic review of 250 trials. Those which did not describe the method of concealment, or employed an insecure method, reported treatment effects that were 33% and 41% higher, respectively, than studies reporting secure allocation methods (6).

Example: Subjects were randomized in a 1:1 ratio using stratified randomization with permuted blocks, via a secure web-based randomization service (4).
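A stratified permuted-block scheme of this kind can be sketched as follows. This is an illustration, not the trial's randomization service: the strata names and seed are hypothetical, the block sizes of two, four, and six come from the trial description, and in practice the generated list would be held by a third party and concealed from recruiting staff.

```python
import random

def permuted_block_list(n_subjects, rng, block_sizes=(2, 4, 6), arms=("A", "B")):
    """1:1 allocation list built from randomly permuted blocks of varying size.

    Every block holds equal numbers of each arm in shuffled order, so group
    sizes stay close at all times; varying the block size makes the next
    assignment harder to predict.
    """
    allocations = []
    while len(allocations) < n_subjects:
        size = rng.choice(block_sizes)
        block = list(arms) * (size // len(arms))
        rng.shuffle(block)
        allocations.extend(block)
    return allocations[:n_subjects]

def stratified_lists(strata, n_per_stratum, seed=None):
    """One independent permuted-block list per stratum (hypothetical strata)."""
    # Without a seed, draw from the operating system's randomness source.
    rng = random.SystemRandom() if seed is None else random.Random(seed)
    return {stratum: permuted_block_list(n_per_stratum, rng) for stratum in strata}

lists = stratified_lists(["site 1", "site 2"], n_per_stratum=20, seed=7)
for site, allocation in lists.items():
    print(site, "".join(allocation), "A:", allocation.count("A"), "B:", allocation.count("B"))
```

Within any stratum the arms can differ by at most half the largest block (here, three subjects), which is the balancing property that motivates blocking.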

Were subjects and assessors blinded to intervention and was a placebo used? Where decisions about treatment are made by caregivers and decisions about outcomes involve judgment, blinding is essential to prevent conscious and unconscious bias. Subfertility trials, particularly surgical ones, are rarely blinded. However, even objective outcomes such as pregnancy may be influenced by knowledge of exposure. For this reason, blinding and the use of placebo are both positive features of a trial.

Example: The study was a double-blinded, multicenter randomized trial. The primary outcome was live birth during the treatment period, defined as delivery of any viable infant (4). Live birth is the most relevant and meaningful primary outcome in an infertility trial, and previous randomized studies of letrozole were limited by small sample sizes and inconsistent study designs.

3. Was follow-up sufficiently long and complete?

Loss to follow-up of more than 20% of subjects is likely to seriously undermine the validity of results; less than 5% loss is reassuring. For rates in between, it may be helpful to consider how study findings would change if all lost subjects had conceived, or if all had failed to conceive. This “sensitivity analysis” tests the robustness or reliability of findings. If similar proportions of subjects are lost from intervention and control groups, the effects of loss to follow-up are more likely to be balanced.
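The best-case/worst-case sensitivity analysis is simple arithmetic. The sketch below uses entirely hypothetical numbers (200 women per arm, 20 lost from each) to show how the two extreme assumptions bracket the complete-case result.

```python
def sensitivity_bounds(events_t, followed_t, lost_t, events_c, followed_c, lost_c):
    """Complete-case, best-case, and worst-case risk differences (treatment
    minus control) when lost subjects are assumed to have all conceived or
    all failed to conceive.

    events_*   : pregnancies observed among subjects who completed follow-up
    followed_* : subjects who completed follow-up
    lost_*     : subjects lost to follow-up
    """
    n_t = followed_t + lost_t
    n_c = followed_c + lost_c
    complete_case = events_t / followed_t - events_c / followed_c
    # Best case for treatment: every lost treated subject conceived, no lost control did.
    best = (events_t + lost_t) / n_t - events_c / n_c
    # Worst case for treatment: the reverse assumption.
    worst = events_t / n_t - (events_c + lost_c) / n_c
    return complete_case, best, worst

# Hypothetical trial: 60/180 vs 45/180 pregnancies among completers, 20 lost per arm.
complete_case, best_case, worst_case = sensitivity_bounds(60, 180, 20, 45, 180, 20)
print(f"complete case {complete_case:+.1%}, best case {best_case:+.1%}, "
      f"worst case {worst_case:+.1%}")
```

Here the worst case reverses the direction of the effect, so even 10% loss would make this hypothetical result fragile.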

Example: A total of 750 patients with polyendocrine metabolic ovarian syndrome were randomized to treatment. Participants received up to five treatment cycles, with study visits to determine ovulation and pregnancy, and pregnancy outcomes were then tracked. A total of 158 women dropped out or were excluded from further analysis: 85/376 (22.6%) in the clomiphene group and 73/374 (19.5%) in the letrozole group, P=.30. The authors acknowledge that the dropout rate was higher than expected, but the rates were similar in the two groups (4). This suggests that the results of the study were not biased by differences in withdrawal between treatment groups.

4. Were all patients analyzed in the groups to which they were allocated?

An important issue is whether all subjects randomized to intervention or control are included in an intention-to-treat analysis. In such an analysis, subjects who do not complete treatment (and may therefore have a suboptimal response) and those who switch to the alternative treatment are kept in their allocated groups. In subfertility trials, subjects who conceive spontaneously after randomization but before the intervention are analyzed with the group to which they were allocated. An intention-to-treat analysis resembles clinical practice, where patients frequently decide to stop or switch treatments. Therefore, the results of an intention-to-treat analysis are relevant to patients having their initial discussion about treatment, when their treatment course and follow-up are still uncertain. If a study fails to include all randomized subjects in this way, it is likely to overestimate the size of the effect of the intervention.
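A toy example makes the contrast between intention-to-treat and per-protocol analysis concrete. The ten records below are hypothetical; the three subjects who stopped treatment early (two of them in the intervention arm) had no live birth, so excluding them, as a per-protocol analysis does, inflates the apparent benefit.

```python
from collections import defaultdict

# Hypothetical records: (arm assigned at randomization, completed treatment?, live birth?)
records = [
    ("intervention", True, True), ("intervention", True, True),
    ("intervention", True, False), ("intervention", False, False),
    ("intervention", False, False),
    ("control", True, True), ("control", True, False), ("control", True, False),
    ("control", True, False), ("control", False, False),
]

def live_birth_rates(records, per_protocol=False):
    """Live-birth rate per arm, grouped by the arm ASSIGNED at randomization.

    Intention-to-treat (default): every randomized subject is counted.
    Per-protocol: subjects who did not complete treatment are excluded.
    """
    births, totals = defaultdict(int), defaultdict(int)
    for arm, completed, live_birth in records:
        if per_protocol and not completed:
            continue
        totals[arm] += 1
        births[arm] += int(live_birth)
    return {arm: births[arm] / totals[arm] for arm in totals}

itt = live_birth_rates(records)
pp = live_birth_rates(records, per_protocol=True)
print("ITT risk difference:", round(itt["intervention"] - itt["control"], 2))
print("Per-protocol risk difference:", round(pp["intervention"] - pp["control"], 2))
```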

Example: A total of 750 patients were randomly assigned to clomiphene citrate or letrozole in a 1:1 ratio, using permuted blocks of two, four, or six, for up to five treatment cycles. The last enrolled patient finished study medication in July 2012, and the last birth was reported in February 2013. There were no significant differences in dropout or exclusion rates and no significant differences in reasons for withdrawal. Patients were analyzed in the groups to which they were assigned. No crossovers were reported (Figure 1) (4).

Figure 1. Enrollment and outcomes of the trial (4). Reprinted with permission.

FILTER II: ARE THE STUDY RESULTS CLINICALLY IMPORTANT?

Having established that the quality of the study design is sufficiently good to ensure that the results are valid, the next step is to look critically at the results and determine whether they are important enough to matter in clinical practice. In other words, would patients be interested in hearing about this outcome, and is the effect large enough to make a difference in their clinical management?

1. Was the outcome of sufficient importance to recommend treatment to patients?

Clinicians should make their own judgments about the clinical relevance of surrogate outcomes; for example, oocyte number, implantation rate, and positive pregnancy test are not clinically important outcomes in most circumstances. Such surrogate outcomes are often used incorrectly to increase study power and efficiency of follow-up.

In subfertility trials, live birth is the generally accepted primary endpoint. Secondary outcomes, such as multiple pregnancy and neonatal morbidity rates, should also be reported, since they are essential elements of effectiveness.

Example: Live-birth rate was the primary outcome assessed. This is the most relevant and meaningful outcome in an infertility study and was a major strength of the study design.

2. Was the treatment effect large enough to be clinically relevant?

A short summary of measures of treatment effect is useful before tackling this question. In assessing the occurrence or nonoccurrence of an event such as live birth or disease, four simple expressions are frequently used:

Relative risk (RR)—the ratio of the probability of success with experimental treatment over the probability with the control treatment;
Risk difference (RD)—the absolute difference between the probability of success with experimental treatment and the probability of success with the control treatment;
Number needed to treat (NNT)—the number of subjects that must be treated to achieve one more outcome with intervention than control;
Odds ratio (OR)—the ratio of the odds of success with experimental treatment over the odds with the control treatment. The odds of success are the probability of success divided by the probability of failure.

For an event that occurs in 6 of 10 individuals, the rate or probability is 6/10; the odds, however, are 6/4, that is, p/(1 − p). Odds ratios are easier to calculate but more difficult to interpret, because odds are seldom used in clinical practice, where risks or rates are more intuitive. The odds ratio is mainly useful in retrospective case-control studies, because in case-control studies the odds ratio approximates the risk ratio. However, in prospective studies, the odds ratio approximates the risk ratio only when the rare disease assumption is met, that is, when the outcome of interest occurs in less than 10% of the study population. The treatment effect presented depends on the study question, the study design, and the findings the authors are trying to emphasize.
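The arithmetic in this paragraph can be checked directly. The sketch below converts a probability to odds and compares the risk ratio with the odds ratio for a rare and a common outcome; the event probabilities are illustrative, not taken from any trial.

```python
def odds(p):
    """Odds corresponding to probability p: p / (1 - p)."""
    return p / (1 - p)

def risk_and_odds_ratios(p_treat, p_control):
    """Return (risk ratio, odds ratio) for two event probabilities."""
    return p_treat / p_control, odds(p_treat) / odds(p_control)

print(f"Odds for 6 events in 10: {odds(6 / 10):.2f}")  # 6/4

for p_treat, p_control in [(0.02, 0.01), (0.60, 0.30)]:
    rr, or_ = risk_and_odds_ratios(p_treat, p_control)
    print(f"risks {p_treat:.2f} vs {p_control:.2f}: RR = {rr:.2f}, OR = {or_:.2f}")
```

With event rates of 1% and 2% the two ratios nearly coincide (RR 2.00, OR 2.02); with rates of 30% and 60% the odds ratio (3.5) greatly exaggerates the risk ratio (2.0).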

Example: In the PPCOS II trial, the authors presented the ratio of the cumulative incidence of live birth as the primary outcome measure. Patients were followed for up to five treatment cycles to allow sufficient time to achieve live birth. As shown in Figure 2, the group of women who received letrozole had more live births than the group who received clomiphene (103/374, 27.5% vs. 72/376, 19.1%; P=.007). The number of live births was 103 of 374 patients (0.28) after letrozole and 72 of 376 patients (0.19) after clomiphene. The risk ratio is the ratio of these cumulative incidence rates per person. Women who took letrozole were found to have 1.44 times the rate of live birth of women who took clomiphene over five treatment cycles (4).

Using the PPCOS II trial data, we can calculate relative risk and odds ratios.

A 2 × 2 table provides a template for calculating relative risk and odds ratios.

	Exposed (letrozole)	Control (clomiphene)	Total, n
Live birth	103 (A)	72 (B)	175
No live birth	271 (C)	304 (D)	575
Total	374	376	750

RR = [A / (A + C)] / [B / (B + D)]
[103 / (103 + 271)] / [72 / (72 + 304)] = 1.44

OR = (A × D) / (B × C)
(103 × 304) / (72 × 271) = 1.60
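The same 2 × 2 calculations can be written as a short function. This is a minimal sketch using the cell labels A to D from the table above; the function name is illustrative.

```python
def two_by_two(a, b, c, d):
    """Risk ratio and odds ratio from a 2 x 2 table laid out as in the text:

                  exposed  control
    event            a        b
    no event         c        d
    """
    risk_exposed = a / (a + c)
    risk_control = b / (b + d)
    risk_ratio = risk_exposed / risk_control
    odds_ratio = (a * d) / (b * c)
    return risk_ratio, odds_ratio

rr, or_ = two_by_two(a=103, b=72, c=271, d=304)
print(f"RR = {rr:.2f}, OR = {or_:.2f}")  # RR = 1.44, OR = 1.60
```

Because live birth is common in this trial (about one in four women), the odds ratio sits noticeably above the risk ratio.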

The measure of effect that makes the most sense in clinical practice is the RD, because it is a natural description of the difference between outcomes and has a straightforward interpretation. Also, the RD is the clinically important difference used to calculate sample size in the planning stage of most clinical trials. More importantly, the inverse of the RD is the NNT, an estimate of how many persons would need to receive the experimental intervention for there to be one more (or one fewer) event than among the controls. The NNT is usually expressed for the period during which the treatment is given or effective. Absolute benefit and the number needed to treat are crucial to patients choosing treatments, because relative risk or benefit alone may be quite misleading.

RD = 27.5% − 19.1% = 8.4% (letrozole live-birth rate minus clomiphene live-birth rate)

NNT = 1 / 0.084 = 11.9, rounded up to 12
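The risk difference and number needed to treat follow from the same counts. In this sketch the NNT is rounded up to a whole patient, matching the convention in the text; a zero or negative risk difference (no benefit, or harm) is rejected rather than converted to a number needed to harm.

```python
import math

def risk_difference_and_nnt(events_t, n_t, events_c, n_c):
    """Risk difference (treatment minus control) and number needed to treat."""
    rd = events_t / n_t - events_c / n_c
    if rd <= 0:
        raise ValueError("no benefit: NNT is undefined (consider a number needed to harm)")
    nnt = 1 / rd
    return rd, nnt, math.ceil(nnt)  # ceil: round up to a whole patient

rd, nnt, nnt_whole = risk_difference_and_nnt(103, 374, 72, 376)
print(f"RD = {rd:.1%}, NNT = {nnt:.1f}, rounded up to {nnt_whole}")
```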

Example: The absolute effect of treatment must be calculated: the calculation above shows the difference in live-birth rate between groups to be 8.4% (see the 2 × 2 table above). To express this figure as a number of patients, take the reciprocal of 0.084, which gives the number needed to treat shown above. Rounding upward, approximately 12 women must be treated with letrozole to achieve one additional live birth (4).

Figure 2. Outcomes with regard to live birth, ovulation, pregnancy, pregnancy loss, and fecundity. Reprinted with permission (4).

An additional attraction of the absolute measures (RD and NNT) is that they are free from the misinterpretations that accompany relative measures (RR and OR). For example, a 35% increase in breast cancer risk (RR=1.35) before age 35 among oral contraceptive users may be misinterpreted as a 35% incidence of breast cancer (6).

This example highlights why clinicians should focus on absolute rather than relative effects when reading study reports and talking to patients.

With this background on treatment-effect measurement, clinicians should ask two questions to determine whether the treatment effect was large enough to matter.

What was the size of the treatment effect?

The results are not clinically important unless the effect is both statistically significant and large enough to be clinically meaningful. The effect of the intervention on the primary outcome should be sufficiently different from the effect of the alternative that the average patient would have no hesitation in making a choice.

Example: The absolute difference in live-birth rate between groups was 8.4% (95% CI 2.4, 14.4). The live-birth rate was 27.5% in women who took letrozole versus 19.1% in women who took clomiphene (4).

What did the investigators consider clinically important?

If a trial is large enough, it may demonstrate statistically significant differences between intervention and control groups that are too small to have any clinical importance. Examine the methods section to see whether the authors have considered and defined a “clinically significant difference” and whether they used this difference to calculate the sample size for their study (7).

Example: In infertile patients with PMOS, a live-birth rate of 27.5% compared with 19.1% is clinically meaningful, and an absolute increase of 8.4 percentage points in live birth would be clinically important to patients (4).

Clinicians can make their own judgment about clinically important differences because that is exactly how investigators arrive at the estimates for their sample-size calculations. If a clinician believes that the anticipated effect size is not clinically important, even statistically significant results would not be clinically useful.
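The step from a clinically important difference to a sample size can be sketched with the standard normal-approximation formula for comparing two proportions (two-sided test, no continuity correction). The 40% and 60% rates in the usage line are illustrative only; trial statisticians may use other formulas or software that give slightly different numbers.

```python
from math import ceil, sqrt
from statistics import NormalDist

def n_per_group(p_control, p_treat, alpha=0.05, power=0.80):
    """Approximate subjects needed per arm to detect p_control vs p_treat."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided type I error
    z_beta = NormalDist().inv_cdf(power)           # power = 1 - type II error
    p_bar = (p_control + p_treat) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p_control * (1 - p_control) + p_treat * (1 - p_treat))) ** 2
    return ceil(numerator / (p_treat - p_control) ** 2)

print(n_per_group(0.40, 0.60))  # a 20-point difference needs 97 per arm
```

Halving the clinically important difference roughly quadruples the required sample size, which is why the choice of that difference matters so much.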

3. Was the treatment effect precise?

Statistical tests are done to determine whether a given result might have happened by chance. Over time, reporting of statistical tests has evolved into a yes/no answer centered on the conventional 5% threshold, even though P values of 4% and 6% carry similar weight. A more informative guide is the confidence interval (usually 95%), because it shows the range of results that might be expected if the study were repeated many times in the same setting. If the confidence interval is narrow, the study gives a more precise estimate of the true treatment effect. Better precision reduces the uncertainty that goes with applying estimates from a trial to patients, no matter how similar the patients may be to the trial subjects.

Are trial results statistically significant?

A statistically significant result is simply one that would be unlikely to occur by chance alone and is therefore likely to reflect an effect of the intervention. The accepted probability of declaring a difference when none truly exists (type I error, α) is commonly set at 1/20, or 5%. Statistical tests express how compatible the observed results are with chance alone, as P values and/or confidence intervals. The confidence interval gives the range of values for the true population effect that are compatible with the data; a 95% interval is constructed so that, over many repetitions of the study, 95% of such intervals would contain the true value. In the following example, the confidence interval for the ratio of live-birth rates with letrozole versus clomiphene is provided and interpreted.

Example: The live-birth ratio in patients who took letrozole as compared with clomiphene was 1.44 (95% confidence interval 1.10, 1.87). Because the interval excludes 1.0 (no difference), the result is statistically significant at the 5% level. Values between 1.10 and 1.87 are those most compatible with the data: letrozole may increase the rate of live birth by as little as 10% or as much as 87% relative to clomiphene (4).
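The intervals quoted in these examples can be approximated from the crude 2 × 2 counts. The sketch below uses the Wald interval for the risk difference and the log-scale interval for the risk ratio; these are standard textbook methods, not necessarily the ones the trial's authors used, although with these counts they agree with the published values to the precision reported.

```python
from math import exp, log, sqrt
from statistics import NormalDist

def ci_risk_difference(a, n1, b, n2, level=0.95):
    """Wald confidence interval for the risk difference a/n1 - b/n2."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p1, p2 = a / n1, b / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    rd = p1 - p2
    return rd, rd - z * se, rd + z * se

def ci_risk_ratio(a, n1, b, n2, level=0.95):
    """Confidence interval for the risk ratio, computed on the log scale."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    rr = (a / n1) / (b / n2)
    se_log = sqrt(1 / a - 1 / n1 + 1 / b - 1 / n2)
    return rr, exp(log(rr) - z * se_log), exp(log(rr) + z * se_log)

rd, rd_low, rd_high = ci_risk_difference(103, 374, 72, 376)
rr, rr_low, rr_high = ci_risk_ratio(103, 374, 72, 376)
print(f"RD {rd:.1%} (95% CI {rd_low:.1%} to {rd_high:.1%})")
print(f"RR {rr:.2f} (95% CI {rr_low:.2f} to {rr_high:.2f})")
```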

If no difference is detected between intervention and control, some clinicians (often those interested in carrying out a similar study) will check whether the trial was large enough to detect a clinically significant difference before dismissing the intervention as useless (8).

Did the study have adequate power?

The probability that a study will, by chance, fail to detect a real difference of a specified size (type II error, β) is often set at 0.1 or 0.2. In other words, the investigators accept a 10% or 20% chance that a real treatment effect exists but will remain undetected; the study's power (1 − β) is then 90% or 80%.

Few clinicians need to take an interest in these post-hoc power estimates, but analysis programs are available on the Internet to simplify the calculations. If the power to detect a difference of the reported size were, say, less than 60%, then additional adequately powered studies are needed to answer the clinical question.
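For readers who want to check power themselves, an approximate calculation for a two-arm comparison of proportions is short. This sketch uses the unpooled normal approximation and ignores the negligible chance of a significant result in the wrong direction; the 40% and 60% rates are illustrative, and dedicated software may give slightly different answers.

```python
from math import sqrt
from statistics import NormalDist

def power_two_proportions(p_control, p_treat, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided test comparing two proportions."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    se = sqrt((p_control * (1 - p_control) + p_treat * (1 - p_treat)) / n_per_arm)
    return NormalDist().cdf(abs(p_treat - p_control) / se - z_alpha)

# Illustrative: power to detect 40% vs 60% with 50 and with 100 subjects per arm.
for n in (50, 100):
    print(n, f"{power_two_proportions(0.40, 0.60, n):.0%}")
```

With 50 subjects per arm, power for this illustrative difference is only about one in two, below the 60% level at which the text suggests further studies are needed.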

4. Are the conclusions based on the question posed and the results obtained?

Once study validity, clinical importance, and statistical significance have been evaluated, it is time to weigh the conclusions. Has the primary question been answered, and how confident are the investigators of their answer’s validity? Be wary of trials that report no difference in the primary outcome but emphasize a (statistically significant) secondary endpoint. Remember that if enough comparisons are made, some will appear statistically significant by chance: on average, one in 20 when α is set at 0.05. If comparisons are made between subgroups of patients after trial design and execution (post hoc), chance findings that seem significant are more likely. Consider these post-hoc subgroup analyses to be hypothesis-generating, not hypothesis-testing. They are legitimate only to the extent that they point the way to a promising new study to test the finding in an independent setting.
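The “one in 20” rule follows from the definition of α. Assuming the comparisons are independent and no true differences exist, the sketch below computes the expected number of false-positive results and the chance of at least one; the Bonferroni threshold shows one common, conservative correction.

```python
def expected_false_positives(k, alpha=0.05):
    """Expected number of 'significant' results among k comparisons with no true effect."""
    return k * alpha

def familywise_error(k, alpha=0.05):
    """Probability of at least one false positive among k independent null comparisons."""
    return 1 - (1 - alpha) ** k

def bonferroni_threshold(k, alpha=0.05):
    """Per-comparison significance level that keeps the family-wise error at or below alpha."""
    return alpha / k

for k in (1, 5, 20):
    print(f"{k:>2} comparisons: expected false positives {expected_false_positives(k):.2f}, "
          f"P(at least one) {familywise_error(k):.2f}, "
          f"Bonferroni threshold {bonferroni_threshold(k):.4f}")
```

With 20 comparisons, one false positive is expected and the chance of at least one is nearly two in three.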

FILTER III: ARE THE RESULTS RELEVANT TO YOUR PRACTICE?

1. Is the study population similar to the patients in your own practice?

Enrollment in a trial is based on explicit criteria that are often narrow. These criteria must be carefully considered before extrapolating trial results to individual patients.

Example: Women aged 18–40 years with polyendocrine metabolic ovarian syndrome, defined using modified Rotterdam criteria, who were not taking confounding medications, had at least one patent fallopian tube and a normal uterine cavity, had a male partner with a sperm concentration of at least 14 million per mL, and committed to having regular intercourse during the study with the intent of pregnancy (4).

Those outside these boundaries may respond to treatment differently. One evidence-based medicine book (Straus 2005) poses a different question to achieve the same consideration: is your patient (or your practice) so different from the study patients or practices that the study results could not apply? When analyzing a study and its conclusions, assess whether the study population resembles the patients you see closely enough that the intervention would likely have the same effect. In this case, the participants were representative of a typical patient population with polyendocrine metabolic ovarian syndrome, so a similar treatment effect would be expected.

2. Is the intervention reproducible and feasible in your own clinical setting?

The nature and components of the intervention should be clear enough to indicate whether the intervention is feasible. Is it available locally to be purchased or acquired? Is it affordable in monetary and time costs? Is it accessible without further training? Direct and indirect costs can be forbidding limitations on the feasibility of an intervention.

Example: Both drugs used in the PPCOS II study are oral medications readily available to prescribers. Both are designated pregnancy category X by the FDA, although clomiphene is approved for ovulation induction. Letrozole does not have FDA approval for ovulation induction but is commonly used off-label for this indication. The authors did not provide any information regarding differences in cost between the drugs (5).

3. What are your patient’s personal risks and potential benefits from the therapy?

Individual reckoning of benefits and risks may be necessary in some cases. Most often, the individual reckoning will be approximate and intuitive, but sometimes an explicit calculation can be made.
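As a minimal sketch of such an explicit calculation, the Python below converts event counts into an absolute risk difference, a number needed to treat (NNT) or to harm (NNH), and a ratio of the chance of being helped to the chance of being harmed, in the spirit of evidence-based medicine texts such as (2). The function names are hypothetical, and the counts in the usage lines are invented for illustration, not taken from any trial. The optional `f_benefit` and `f_harm` factors rescale the trial's absolute effects for a patient whose baseline risk differs from that of the control group. This assumes that the relative treatment effect holds across baseline risk.

```python
from fractions import Fraction
import math


def risk_difference(events_tx: int, n_tx: int, events_ctl: int, n_ctl: int) -> Fraction:
    """Absolute risk difference: event proportion with treatment minus event proportion with control.

    Fractions keep the arithmetic exact, so 1/RD is not distorted by floating-point error.
    """
    return Fraction(events_tx, n_tx) - Fraction(events_ctl, n_ctl)


def number_needed(rd: Fraction) -> float:
    """Patients treated per one extra event (NNT or NNH): 1/|RD|, rounded up by convention."""
    if rd == 0:
        return math.inf  # no difference between arms, so no finite number needed
    return math.ceil(1 / abs(rd))


def likelihood_helped_vs_harmed(rd_benefit: Fraction, rd_harm: Fraction,
                                f_benefit: Fraction = Fraction(1),
                                f_harm: Fraction = Fraction(1)) -> Fraction:
    """Ratio of a patient's chance of benefit to chance of harm.

    f_benefit and f_harm express the patient's baseline risk relative to the trial's
    control group (f = 2 means twice the control-group risk).
    """
    return (abs(rd_benefit) * f_benefit) / (abs(rd_harm) * f_harm)


benefit = risk_difference(30, 100, 20, 100)  # hypothetical: desired outcome in 30% vs 20%
harm = risk_difference(8, 100, 4, 100)       # hypothetical: adverse event in 8% vs 4%
print(number_needed(benefit), number_needed(harm))  # 10 25
print(likelihood_helped_vs_harmed(benefit, harm))   # 5/2
```

With these invented numbers, treating 10 patients yields one extra success and treating 25 yields one extra adverse event, so a typical patient is about 2.5 times as likely to be helped as harmed. The result still has to be weighed against how much the patient values each outcome.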

Example: The live-birth rate was higher with letrozole than with clomiphene, and the rate of pregnancy loss, duration of pregnancy, birthweight, and neonatal complications did not differ between groups. The twin pregnancy rate was lower with letrozole (3.9%) than with clomiphene (6.9%), although the authors acknowledge that the study was underpowered to detect a between-group difference. There were four major congenital anomalies in the letrozole group and one in the clomiphene group (4). These findings should be discussed with patients to guide treatment decisions.

4. What alternative treatments are available?

After the clinician has found the study that addresses the clinical question, ensured that the results are valid and clinically important, and judged that the results are relevant to clinical practice, one question remains: is there an alternative treatment that might be considered in place of the now-proven intervention under study? More importantly, are any of the available alternatives supported by evidence that is as valid and important as the evidence supporting the intervention under study?

Example: Ovulation induction is the most effective treatment for achieving conception in infertile women with PMOS. Alternative strategies, including metabolic treatments such as metformin, have been investigated without evidence of benefit (9). Further studies are needed to evaluate whether a subset of patients may derive greater benefit or whether other metabolic agents show more promise. Lifestyle modification, including weight loss, has also been evaluated, with evidence of an increase in unassisted conception as well as in conception following clomiphene (10, 11).

SUMMARY

Appropriate interpretation of study results involves the use of three filters:
1. Appraise the validity of the study.
2. Assess the clinical importance of the results.
3. Make a judgment about the clinical relevance of the results to your patients.
If the methods of a study are not valid, it may be wise to move on to another report without wasting valuable time assessing importance or relevance.
Key elements of validity include the security of the randomization process, completeness of follow-up, and an intention-to-treat analysis.
The clinical importance is best evaluated based on the absolute treatment effects: the risk difference and the number needed to treat.
If the results are relevant to your practice, then cost and potential adverse effects are key issues when patients are making treatment choices.
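Intention-to-treat analysis, one of the validity elements listed above, is simple to state but easy to depart from. The following is a minimal Python sketch on an invented eight-person dataset. The `Participant` record, the arm labels "A" and "B", and `event_rates` are hypothetical. The sketch contrasts intention-to-treat with a per-protocol analysis to show how dropping participants who never received their allocated treatment can inflate an apparent treatment effect. It does not handle participants with missing outcomes, which a real analysis also has to address.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Participant:
    assigned: str            # arm allocated at randomization
    received: Optional[str]  # arm actually received; None if never treated
    outcome: bool            # True if the outcome of interest occurred


def event_rates(participants, intention_to_treat=True):
    """Outcome proportion per arm.

    Intention-to-treat: every participant is counted in the arm to which they were allocated.
    Per-protocol (for contrast): only participants who received their allocated arm are counted.
    """
    events = defaultdict(int)
    totals = defaultdict(int)
    for p in participants:
        if not intention_to_treat and p.received != p.assigned:
            continue  # per-protocol drops crossovers and participants never treated
        totals[p.assigned] += 1
        events[p.assigned] += p.outcome  # bool counts as 0 or 1
    return {arm: events[arm] / totals[arm] for arm in totals}


# Invented data: two participants in arm A never started treatment and had no event.
trial = [
    Participant("A", "A", True), Participant("A", "A", True),
    Participant("A", None, False), Participant("A", None, False),
    Participant("B", "B", True), Participant("B", "B", False),
    Participant("B", "B", False), Participant("B", "B", False),
]
print(event_rates(trial))                            # {'A': 0.5, 'B': 0.25}
print(event_rates(trial, intention_to_treat=False))  # {'A': 1.0, 'B': 0.25}
```

In this toy example, the per-protocol analysis doubles arm A's apparent success rate simply by discarding the participants for whom allocation did not translate into treatment. Intention-to-treat analysis keeps them and so preserves the balance created by randomization.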

Acknowledgments: This report was developed under the direction of the Practice Committee of the American Society for Reproductive Medicine as a service to its members and other practicing clinicians. Although this document reflects appropriate management of a problem encountered in the practice of reproductive medicine, it is not intended to be the only approved standard of practice or to dictate an exclusive course of treatment. Other plans of management may be appropriate, considering the needs of the individual patient, available resources, and institutional or clinical practice limitations. The Practice Committee and the Board of Directors of the American Society for Reproductive Medicine have approved this report.

This document was reviewed by ASRM members and their input was considered in the preparation of the final document. The following members of the ASRM Practice Committee participated in the development of this document. All Committee members disclosed commercial and financial relationships with manufacturers or distributors of goods or services used to treat patients. Members of the Committee who were found to have conflicts of interest based on the relationships disclosed did not participate in the discussion or development of this document.

Alan Penzias, M.D.; Kristin Bendikson, M.D.; Samantha Butts, M.D., M.S.C.E.; Tommaso Falcone, M.D.; Susan Gitlin, Ph.D.; Clarisa Gracia, M.D., M.S.C.E.; Karl Hansen, M.D., Ph.D.; Micah Hill, D.O.; William Hurd, M.D., M.P.H.; Sangita Jindal, Ph.D.; Suleena Kalra, M.D., M.S.C.E.; Jennifer Mersereau, M.D.; Randall Odem, M.D.; Robert Rebar, M.D.; Richard Reindollar, M.D.; Mitchell Rosen, M.D.; Jay Sandlow, M.D.; Peter Schlegel, M.D.; Anne Steiner, M.D., M.P.H.; Cigdem Tanrikut, M.D.; and Dale Stovall, M.D.

REFERENCES

1. Schulz KF, Altman DG, Moher D; CONSORT Group. CONSORT 2010 statement: updated guidelines for reporting parallel group randomized trials. Obstet Gynecol 2010;115:1063–70.
2. Straus SE, Richardson WS, Glasziou P, Haynes RB. Evidence-based medicine: how to practice and teach EBM. 3rd ed. Edinburgh: Churchill Livingstone; 2005.
3. Guyatt GH, Sackett DL, Cook DJ; Evidence-Based Medicine Working Group. Users’ guides to the medical literature. II. How to use an article about therapy or prevention. Are the results of the study valid? JAMA 1993;270:2598–601.
4. Legro RS, Brzyski RG, Diamond MP, et al. Letrozole versus clomiphene for infertility in the polycystic ovary syndrome. N Engl J Med 2014;371:119–29.
5. Legro RS, Kunselman AR, Brzyski RG, Casson PR, Diamond MP, Schlaff WD, et al. The Pregnancy in Polycystic Ovary Syndrome II (PPCOS II) trial: rationale and design of a double-blind randomized trial of clomiphene citrate and letrozole for the treatment of infertility in women with polycystic ovary syndrome. Contemp Clin Trials 2012;33:470–81.
6. Schulz KF, Chalmers I, Grimes DA, Altman DG. Assessing the quality of randomization from reports of controlled trials published in obstetrics and gynecology journals. JAMA 1994;272:125–8.
7. Lehr R. Sixteen S-squared over D-squared: a relation for crude sample size estimates. Stat Med 1992;11:1099–102.
8. UK National Case-Control Study Group. Oral contraceptive use and breast cancer risk in young women. Lancet 1989;1:973–82.
9. Legro RS, Barnhart HX, Schlaff WD, Carr BR, Diamond MP, Carson SA, et al. Clomiphene, metformin, or both for infertility in the polycystic ovary syndrome. N Engl J Med 2007;356:551–66.
10. Clark AM, Ledger W, Galletly C, Tomlinson L, Blaney F, Wang X, et al. Weight loss results in significant improvement in pregnancy and ovulation rates in anovulatory obese women. Hum Reprod 1995;10:2705–12.
11. Legro RS, Dodson WC, Kris-Etherton PM, Kunselman AR, Stetter CM, Williams NI, et al. Randomized controlled trial of preconception interventions in infertile women with polycystic ovary syndrome. J Clin Endocrinol Metab 2015;100:4048–58.
