Journal Club Global: Emulated Trials - A New Research Method With Insights Into Fertility Vitamin Supplements
Transcript
At ASRM 2025, the Fertility and Sterility Journal Club Global explored emulated trials, spotlighting vitamin D's role in fertility. Experts reviewed a secondary analysis of the FAST trial, showing that preconception vitamin D sufficiency in females was linked to higher live birth rates. Using target trial emulation, the study mirrored randomized trial design with observational data, enhancing causal interpretation. The panel emphasized design clarity, DAGs, and minimizing bias. Emulated trials were presented as a vital method when RCTs aren't feasible, offering a path to more accurate, actionable fertility research using real-world data.
Panelists:
- Dr. Enrique Schisterman
- Dr. Sunni Mumford
- Dr. Christos Venetis
- Dr. Kurt Barnhart
- Dr. Julia DiTosto
- Dr. Allison Eubanks
Fertility and Sterility Hosts:
- Dr. Pietro Bortoletto
- Dr. Micah Hill
Right, welcome. Thank you to all of those who are here on time. Hopefully some more people come.
We're excited to have everyone here for Fertility and Sterility Journal Club Global Live at ASRM 2025 in San Antonio, Texas. I'm very excited to have a discussion today on emulated trials. This is a new research method that gives us insights into fertility and other things that we can study maybe in a little bit of a different way than we have previously.
Today we're going to be talking about vitamin D supplements. We have a world-class team of experts up here to learn from. In the interest of time, so we can spend most of our time on the discussion, I'll just go down the row quickly.
At the far end we have Dr. Allison Eubanks. She's the first-ever editorial fellow for Fertility and Sterility. It's a new program we implemented a year ago to try to bring in young future potential editors for journals and teach them about the editorial process. She's been doing a fantastic job, so she got an extra year for her good work.
For those of you who don't know, next is Kurt Barnhart. He's the editor-in-chief of a little journal called Fertility and Sterility. He also is an expert epidemiologist and does a lot of clinical trials so he is on today as a panelist.
We have Enrique Schisterman. He is one of the authors on the article that we will be discussing today. He's also an expert epidemiologist and the editor-in-chief of the American Journal of Epidemiology, here with his team from Penn.
We have the first author of the paper, Julia DiTosto. She is also from Penn. Next to her is Sunni Mumford, also from Penn.
She is one of the lead methodologic editors for Fertility and Sterility, helping us elevate the quality of the studies that we publish. Next we have Christos Venetis. He is an expert in both research and clinical trials, and he has written some papers on emulated trials, so he has his feet in both worlds.
He spends time in Greece and Australia, a worldwide expert. We thank you, sir, for spending time with us today. Next is my partner in crime, Pietro.
We do the media stuff for Fertility and Sterility. I'm Micah Hill, the media editor. So we're going to jump right in.
Allison will come and give us a five-minute setup of what this paper was so that we all understand what it is we're talking about. Thank you, Allison. All right.
Welcome, everybody. So just to introduce this concept: we all know that randomized controlled trials are the gold standard, but many fertility questions can't be answered by randomized trials. They're often too expensive or ethically impractical.
Meanwhile, we already have rich observational data that we could use, but the traditional analyses we generally apply don't give us causal answers. So target trial emulation bridges that gap, offering a framework that borrows the design logic of RCTs while using real-world data. A target trial emulation design is about designing first and analyzing second.
You start by specifying your hypothetical RCT, who's eligible, what the exposure is, how long follow-up lasts, and what you would measure. Then you apply the structure to the observational data, the same eligibility exposure definitions and analysis plan. It's a transparency tool that reduces bias and makes results interpretable as causal effects.
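To make that "design first, analyze second" idea concrete, here is one way the protocol elements of a hypothetical target trial might be written down before any analysis. This is a minimal sketch; the field names and example values are illustrative assumptions, not the published protocol.

```python
from dataclasses import dataclass

@dataclass
class TargetTrialProtocol:
    """Elements of the hypothetical RCT, specified before any analysis.

    Illustrative sketch only; the values below are assumptions, not the
    paper's actual protocol.
    """
    eligibility: list[str]     # who would be enrolled
    strategies: list[str]      # the "arms" we pretend to randomize to
    time_zero: str             # when eligibility is met and follow-up starts
    outcomes: list[str]        # what would be measured
    follow_up: str             # how long participants would be followed
    analysis: str              # the pre-specified causal contrast

protocol = TargetTrialProtocol(
    eligibility=["couple presenting for fertility care",
                 "baseline serum vitamin D measured"],
    strategies=["sufficient: >= 30 ng/mL", "not sufficient: < 30 ng/mL"],
    time_zero="presentation for fertility treatment, when eligibility is assessed",
    outcomes=["live birth", "pregnancy loss", "semen quality"],
    follow_up="from time zero until live birth or end of study",
    analysis="weighted comparison emulating the intention-to-treat contrast",
)
```

Writing the protocol down first, as a trialist would, is what forces the eligibility, exposure, and follow-up decisions to be made before the data are touched.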
So these graphics are hopefully going to help us understand what we're doing here. On the left is a standard randomized controlled trial: participants are enrolled, randomized into exposure groups, and followed prospectively, and the causal contrast is built into the design.
On the right, a typical retrospective study defines the exposures after outcomes have occurred, which risks immortal time bias and selection bias. The emulated trial shown in the middle sits between the two. It mirrors the structure of an RCT but uses existing observational data.
The design defines a clear time zero for eligibility, specifies the exposure as if it were randomized, and applies statistical adjustments like inverse probability weighting to recreate what randomization would have done. It brings the rigor of a randomized design to observational data.
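As a rough illustration of that weighting step, here is a minimal sketch of inverse probability weighting. The data frame, column names, and logistic propensity model are assumptions for illustration, not the authors' code.

```python
# Minimal IPW sketch: weight each participant by the inverse of the probability
# of the exposure group they actually fell into, given baseline confounders, so
# the weighted sample mimics the balance randomization would have produced.
# Assumes a pandas DataFrame `df` with a binary exposure column `sufficient`
# (vitamin D >= 30 ng/mL), a binary outcome `live_birth`, and baseline confounders.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

CONFOUNDERS = ["age", "bmi", "season"]  # illustrative list only

def ipw_risk_ratio(df: pd.DataFrame) -> float:
    # 1. Propensity model: probability of being vitamin D sufficient at baseline.
    model = LogisticRegression(max_iter=1000).fit(df[CONFOUNDERS], df["sufficient"])
    ps = model.predict_proba(df[CONFOUNDERS])[:, 1]

    # 2. Unstabilized inverse probability weights.
    weights = np.where(df["sufficient"] == 1, 1.0 / ps, 1.0 / (1.0 - ps))

    # 3. Weighted risk in each emulated "arm", and the risk ratio between arms.
    exposed = (df["sufficient"] == 1).to_numpy()
    outcome = df["live_birth"].to_numpy(dtype=float)
    risk_exposed = np.average(outcome[exposed], weights=weights[exposed])
    risk_unexposed = np.average(outcome[~exposed], weights=weights[~exposed])
    return risk_exposed / risk_unexposed
```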
And this approach has become increasingly important in fertility research, where patient heterogeneity and time-sensitive exposures make RCTs difficult. So for the first paper, we are lucky enough to have the author with us today. The paper seen here is a secondary analysis of the FAST trial.
That was 2,370 infertile couples originally randomized to folic acid and zinc supplementation. For this paper, though, they asked a simple but important question: does preconception vitamin D status affect live birth outcomes? Running an actual randomized trial to answer that is nearly impossible. You can randomize supplements, but you can't truly randomize somebody's vitamin D status, because it's influenced by too many other factors: exercise, diet, sun exposure.
So instead, the authors used target trial emulation to model what a randomized trial would look like with this observational data. They used serum vitamin D in both partners, categorizing levels as deficient, insufficient, or sufficient. Outcomes included live birth and secondary endpoints like pregnancy loss and semen quality.
And they adjusted for baseline confounders using inverse probability weighting, essentially recreating a randomized comparison. The study found that preconception vitamin D sufficiency, defined as levels at or above 30 nanograms per milliliter, was associated with about a 28 percent higher chance of live birth. When both partners were sufficient, live birth rates were roughly 38 percent, versus 29 percent where both were deficient.
These results were strongest in normal and obese BMI groups. The second paper proposes standards for how non-randomized fertility studies should emulate trials. The goal is to make observational research credible and actionable where RCTs are not possible.
They emphasize clear design: define the question, pre-specify the protocol, and use DAGs to map confounding instead of relying on data-driven adjustments. The central idea is to design observational studies to mirror the randomized trial that would be conducted if it were feasible, making results more transparent, reproducible, and less biased. Target trial emulations are designed to reduce the kinds of bias that often exist in observational data by enforcing structure, defining eligibility, exposure, and follow-up the same way we would in a randomized trial.
This approach strengthens causal inference and interpretation; it doesn't create new data, but it applies disciplined design principles to the data we already have. And next, we'll look at how that's actually done with our panel. Great.
Thank you, Allison. So, we'll start with the first author. So, Julia, you know, we've had a lot of, not trials, but observational data over the last 20 years looking at vitamin D and fertility outcomes.
It's looked at vitamin D with all sorts of outcomes across medicine. Why did you decide to look at this again, and why did you choose an emulated trial to address that question? How did you think it could improve upon the existing literature? Thank you for that question.
I think a really important thing to consider when you set up an emulated trial is why that method is particularly impactful and beneficial for the question of interest. And specifically for vitamin D, we know that an individual can achieve their vitamin D levels through multiple sources: through diet, physical activity, or supplementation. And oftentimes trials that use a method like supplementation to change vitamin D levels are not necessarily accounting for these other factors.
And so, we wanted to see how vitamin D levels within an individual's body, particularly looking at biomarkers, might impact fertility outcomes. This is not something that you can randomize individuals to because, like I said, individuals can achieve their vitamin D status through different sources. So that's why a target trial emulation framework was particularly well suited to this question.
In addition, our data source was particularly well suited to answer this question. We used a secondary analysis of a randomized controlled trial that was a preconception cohort. And as many people can probably understand, conducting a randomized controlled trial of a preconception exposure on an outcome is really difficult, because you need to include individuals who eventually do get pregnant and those who do not.
So, the sample size needed is larger in general. And so we decided to leverage the data from the FAST trial, which had already created this cohort, with a clearly defined baseline as well as measured outcomes. And so it seemed like a particularly well-suited data set to answer our question.
For Enrique and Sunni: you know, I've sat in the room when you were at the NIH and you were teaching fellows, and we were talking through how we were doing retrospective studies. Just tell me, how is your approach different now, doing this with an emulated trial, versus when I was sitting in the room learning from you how to do retrospective analyses? So, first of all, the key component of what Julia was saying is one of the issues that we struggle with a lot. When you have an exposure that can come from different sources, you wouldn't be able to do a randomized trial.
So, vitamin D will come from the sun, from diet, from supplements. So how do you randomize that? The question of vitamin D levels circulating in the body is really hard to study. And so a biomarker is a way to get at the question of all the sources combined.
And so, Julia's approach was to try to answer that question: what's the level, from all sources combined, that will lead to either a positive or negative outcome? The thing that has changed over the years is the understanding that observational studies were not precise enough in answering, or even posing, the question. The questions were too vague, and when you have vague questions, the answers are also all over the map.
What this approach has brought us is to be much more precise with the questions, which trials are actually really good at, right? So we ask: will this level versus that level of some drug lead to some positive or negative outcome? In observational data, we used to ask the question, does smoking prevent pregnancy? And we never got to how much smoking, how many cigarettes per day, what's the brand? That specificity of the questions in observational data was not there when we were back at NIH. And so this is the framework that lets us think of observational data with that precision. And that's a huge change overall.
Sunni, when you're considering sources of observational data, is every observational data set amenable to emulated trial methodology, or are there certain data sets that are more amenable? I think that you could use almost any observational data set in the target trial framework. The key is really thinking about the question that you want to answer and thinking about what that trial looks like. Who would be randomized? What am I randomizing? Thinking about the inclusion and exclusion criteria and then mapping that to your observational data.
So, we've done this in observational data from a randomized trial. We've also been doing this in Optum, which is insurance claims data, and in electronic medical record data. And I think there, it really highlights some of the complexities of this, because you can create a cohort in EHR data pretty easily.
But when you start to think about, if I was going to randomize someone, what would that person look like? How would I follow them up? And making sure that you had the cohort set up in that way. I feel like that's when it becomes really powerful. And so, really any data set can be leveraged in this way when you're setting up the questions as an emulated trial.
Knowing what you know now, having to make the observational data fit into this methodology, if you're in the audience planning an observational study, planning to put forth some efforts towards data collection, are there things you would prospectively add to make sure that you can then go backwards or others could go backwards and utilize your data for emulation? Yeah, for sure. Biostatisticians always want more data. The answer is always going to be yes.
Don't we all? Reasons for loss to follow-up or discontinuation of the treatment or exposure are critical. So if you're thinking in the trial framework, you will be collecting data on the reason somebody stopped taking the drug, on side effects, on everything that is related to the completion of the trial. We want the same things in the observational data.
And so, that's the framework. We need to be thinking that that observational data will mimic a trial. And so, we'll do the same thing.
One other thing about what Sunni was saying: one of the big changes is in observational analysis. If I ask anybody here, would you remove anybody from your trial post-randomization? It would be intuitive to all of you, all of us, that you don't remove anybody from a trial post-randomization. Yet in observational data, many of the analyses have removed individuals from the study after the equivalent of randomization.
That correction alone has reduced the bias we have seen by really large amounts. That's one of the main sources of the conflicting results between observational studies and clinical trials. So that framework of thinking of an observational study as if it were a trial changes the way we think and reduces the bias that we induce.
So, I don't often disagree with Enrique and Sunni, but it's not a disagreement. It's an elaboration. Sunni said that any observational data set would do, and I think it's just that we need good-quality ones.
Any data set will not do. It has to be a data set where the person of interest would be in it to begin with. You can't force the patients you want to emulate into a data set that doesn't represent them very well, and it has to have precision on what you're trying to measure and what your outcome is. Sometimes observational data sets are limited because they just don't have the information or the specificity of the variable you want.
I know that was a threshold comment and you understood that, but that's what was so beautiful about your trial, to go back to it: they had levels to begin with. That wouldn't have been in just anybody's data set. The fact that you had a preconception cohort with baseline levels allowed this emulation based on that level, and then it obviously followed people forward, because those are the people I wanted to know about.
I wanted to know preconception. You couldn't take that question and add it to an insurance claims database as easily. That's what I mean.
I was just trying to make that point. Question: Julia, before we go on, can you just describe the main finding? What overall was the primary outcome or finding for the study? We envisioned three target trials, actually, in this analysis.
We envisioned one trial where we're randomizing the female partner to vitamin D levels, another trial where we're randomizing the male partner to vitamin D levels, and finally a third trial that would randomize the couple to vitamin D levels. We found that female vitamin D status was particularly predictive of live birth: those who had a deficient vitamin D status preconception had a lower likelihood of having a live birth. We did not find any associations with pregnancy loss in the male, female, or couple models.
We also did not find any associations between male vitamin D preconception and semen quality. Our results are suggestive that vitamin D, particularly for the female partner, is important for live birth. A lot of the couple associations that we were finding seemed to be driven by the female partner.
You had an interesting line in the paper that said, this is akin to a behavioral modification trial rather than a trial trial. Can you just explain what that means? And what I'm trying to get at is, is this an association? Do we think vitamin D is causing this? And ultimately, I'm sure the audience wants to know, should I be putting my patients on vitamin D? Does this data justify that? I know I just asked three questions in one, but if you can sort of expand on that a little bit. Thank you for the opportunity to clarify that.
I believe that might have been a typo, actually, saying trial trial. I think I meant to say a supplementation trial, because when we think about these biomarkers, especially vitamin D, the initial response might be, well, this is the same as a supplementation trial, because that is potentially the easiest way to change someone's levels. But because we were looking at vitamin D status, we don't know the differential impacts of changing your vitamin D levels through supplementation versus through physical activity or diet.
And each of those might have a different impact on the outcome of interest, because each is working through a different mechanism. And that is something that we weren't able to tease out in this analysis, because we were looking at circulating levels of vitamin D rather than the change in vitamin D due to initiating one of these interventions. But I do hope that our data could influence the future conduct of a randomized trial that tries different interventions to change vitamin D levels.
Yeah. Christos. I just want to add to that.
I've been really interested in vitamin D and thinking about what that trial would look like, and I've gotten stuck many times. Do we give a high-dose supplement of vitamin D? Do we randomize someone to go out in the sun for 15 minutes without sunscreen? What would that look like? And so this, in a way, I think is a good way to look at it, because we're thinking about all the potential behaviors that could influence vitamin D, whether it's supplementation or time in the sun, and what it takes to get that level for different people. And so that's another reason why we... Which is why you used the term behavioral modification, all those things that play into vitamin D status.
Christos. So thank you, and thank you for the opportunity to be here. And obviously, fantastic work, and it was quite interesting actually going through the paper in detail.
But that was one of my first observations. I tried to put myself in the clinician's view: okay, what have I learned from this emulated trial? Usually when we do a randomized controlled trial, we assume that there is an intervention that we're going to start applying if the results are good, expecting the same outcome. As I think you very clearly described, we don't know whether there is such an intervention, because some people might start giving vitamin D, but we don't know whether exogenous administration of a vitamin D supplement helps, or maybe it's exposure to the sun, or maybe it's better diet.
So if you were to quantify this, if you were to come up with some type of recommendation for behavioral modification, since I think the paper says we don't even know whether some of these patients were taking vitamin D, what would you recommend? What type of behavioral modification would you recommend when you don't actually know the composition, and when within your cohort there may be subgroups of patients that benefited much more and other subgroups that did not benefit at all, just because they got their vitamin D from exogenous supplementation rather than from diet or sun exposure? That was one of my first reactions: what does that mean? Do you have any input? Based on this, we can't make a solid recommendation about what to do to change someone's vitamin D level. One thing that I found frustrating about the trials of vitamin D supplementation is that they're pretty much null for many different outcomes. So it's not clear to me that supplementation, if we go that route, will be the ticket, because it hasn't been in many studies.
So it's something I've been struggling with: how we should encourage people to change their vitamin D. It seems supplementation might not be the only component. Yeah, sure. And part of that same conceptual struggle that I have is, when we're doing an emulated target trial, and that's what I've been taught by my colleagues here, a crucial determination is: when is time zero? So when is time zero for you? When did this behavioral modification happen, and when should we start counting from? Because you're essentially assessing the effect of a modification, yet you're not assessing from when that modification started until you reach those levels of deficient, insufficient, or sufficient.
So we, go ahead. We consider time zero to be when they came to start fertility treatment. And so these behavioral modifications would need to have started potentially sometime before that.
Is there any immortal time bias there? No, no. What I randomize is any of the possible ways that you get to that level. And so there are many different ways that you can get to that level, and you randomize being at that level or not at that level.
The other thing I wanted to add to your question, Chris, is that not even randomized trials answer all the possible questions. So from the data gap we had before, which was that we didn't know if vitamin D levels affect reproduction, we got an answer that they do. We still don't know the source.
And so this could lead us to say, well, maybe we need to do randomized trials on all the possible sources, or different trials on the different sources. Or our next study, going back to what Micah was asking, will need to measure the different sources and supplement the type of questions that we're asking, such as diet, time in the sun, supplements, to get at what the factors were that led you to that point. And so... So, Kurt, you have an example of a similar methodologic study you did that is maybe an easier question.
Do you want to tell us about that, or do you have comments on the discussion we just had? Yeah, let me see if I can focus it. So we very quickly got to the what's-the-recommendation part, right? But I think we glossed over: what is the emulated trial here? The beauty of this is finding a sample that you can look at. And remember, randomization in a randomized clinical trial is getting people randomly allocated to groups so that they're relatively equal.
The randomization puts them in the same circumstances, and now you can follow them going forward with a minimum of bias and a minimum of confounding. That's the goal here. This trial wasn't give them something, see how it works.
It was basically let me get two groups of people that are very much the same except for only their vitamin D levels preconceptionally. Now we can follow them forward over time and see if they had a higher chance of having a live birth, and they did. So it's kind of a strange methodology, but I hope you all understand what we're talking about.
The goal here is to use observational data to recreate a trial. And you made some allusions. We can show some slides.
We did a trial that I think might be conceptually easier to understand because it had an intervention, but I want to focus on what made it the same as this trial and what made it emulated. So a couple of years ago, we did an emulated trial of a randomized trial of fresh versus frozen. You can go back just to the first one.
So, fresh embryo transfer or frozen embryo transfer: that's the intervention, and the question is which is going to be better. And if you did a randomized trial, you theoretically could come up with an isolated group of otherwise very equal patients and find out whether the isolated intervention worked. So there are some randomized trials, and there was one done in China.
And one of the big problems with randomized trials, and you can all say this in your sleep, is, oh, but that was a different population. That wasn't my group. They treated it differently. This particular trial was in Chinese women, but they're not American women.
They don't do IVF the same. So the generalizability is difficult. So we said, well, why don't we use the SART CORS database to emulate the trial in the United States and see if we could pretend to randomize people to a fresh-versus-frozen trial.
Now, we use the CDC data set, not the SART data set. And by the way, I want to call out Dmitry Kissin, who won an award earlier today for his great work in all of this. But the problem was the SART database is not a randomized trial database.
It's what we do all together. And the biggest confounding factor is something called confounding by indication. I treat one patient different than another for sometimes reasons I'm not even aware of.
They have this condition, therefore they get this treatment. They have this condition, they get another treatment. So we have to take that out of the model.
And to get an emulated trial in this data set, if you go to the next slide, we took the randomized controlled trial criteria from another trial and we said, let's restrict the population to only the people that are a certain age, had a certain IVF protocol, had a certain number of embryos to freeze. So you're not just taking people that didn't have embryos to freeze. You're taking out hyperstimulation risks.
You restricted the age, all these wonderful parameters of a randomized trial. And then you could whittle down this huge SART database to only people that met those conditions. And these boxes will show you how much we whittled it down.
I know you probably can't read the slides, but we started with hundreds of thousands of people, and then we started applying the criteria. It's only their first transfer. It's only ages between 20 and 35. It's only an elective transfer.
They didn't have pre-implantation genetic testing. They had at least four embryos to freeze, so they had a choice. You whittled the population from hundreds of thousands to hundreds.
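Mechanically, that whittling is just a chain of eligibility filters applied in sequence, with the sample size logged at each step. Here is a hedged sketch; the column names and cut points are illustrative stand-ins, not the actual registry variables.

```python
import pandas as pd

def apply_eligibility(df: pd.DataFrame) -> pd.DataFrame:
    """Apply RCT-style eligibility criteria one at a time, logging attrition.

    Illustrative sketch: the column names below are assumptions, not the
    real registry fields.
    """
    steps = [
        ("first transfer only",        lambda d: d[d["transfer_number"] == 1]),
        ("age 20 to 35",               lambda d: d[d["age"].between(20, 35)]),
        ("no preimplantation testing", lambda d: d[~d["pgt"]]),
        ("at least 4 embryos frozen",  lambda d: d[d["embryos_frozen"] >= 4]),
    ]
    print(f"starting cohort: {len(df)}")
    for label, rule in steps:
        df = rule(df)
        print(f"after {label}: {len(df)}")  # CONSORT-style attrition log
    return df
```

The printed counts at each step are exactly the "boxes" on the slide: a flow diagram of who would have entered the hypothetical trial.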
And now you've got people that would have entered your randomized clinical trial. Now you can go forward. So the next slide shows, I want to show you the difference.
And this is what I highlight in every talk about observational studies versus randomized trials. If we just said, take the first transfer, everybody in the SART CORS database, it would have shown the fresh embryo transfer was, I'm sorry, the frozen embryo transfer was 45 percent better.
That was because of confounding. That was because of bias that we didn't understand in the trial. And we whittled it down to only young people.
Wait a minute: the odds ratio was getting smaller. When we whittled it down to only their first transfer, the elective one, it whittled down even more.
When we whittled it down to the right age, right protocol, right indications, and you had four embryos to freeze, there was practically a null effect. So the effect that we saw in the database, if you looked at it observationally, let me just compare the patients, was very different than if you applied an emulated clinical trial to remove all that extraneous bias and confounding by indication, and you got a very different answer. So that's what we think an emulated trial is doing for us.
It's not just comparing two groups in observational data and trying to control for confounding. It's the act of putting a trial on top of it that removes some of that confounding and bias that you might not even know is in there. I'll leave it there.
There's a couple of slides I can get into in more detail if we need to, but that was the concept I wanted to get across to everybody, and we go back to our vitamin D trial. How did you do that in the vitamin D trial? Were you able to whittle down to only the people that were eligible, as opposed to the outliers or the confounders or things like that? I think one of the things that was slightly different with the vitamin D paper is that we used data from a clinical trial that was a very strict preconception cohort. So the cohort by definition already had pretty strict inclusion criteria. But, for example, I'm doing a few other projects where I'm using insurance claims data to emulate target trials, and in that case, I need to recreate the exclusion criteria in a cohort from a general population.
And so I think when you have another trial to base it off of, like how you were talking about in your data set, you applied the criteria from the randomized controlled trial in China that you were trying to mimic in SART data. You had that baseline to compare against. And I think when you have a cohort that is more general, like SART or EHR data or insurance claims, you need to be really intentional about who you are including in your cohort. You have to make sure that they're eligible for the specific treatments of interest, and that requires more data manipulation, which really is just a fundamental part of target trial emulation.
It's more about the setting up of the data, how you actually frame the research question, who's eligible, and at what time you start to follow people, versus using all these sophisticated methods in the analyses. I think the bulk of the work in target trial emulation is the setting up of the data and the setting up of your question. One point based off this that I wanted to make: when I was first learning about target trials, one professor I had stated that a target trial is not necessarily the ideal randomized controlled trial that you would like to conduct. It is a realistic trial that you would like to conduct. Sometimes it has to be an iterative process between the trial that you would like to do and the data set that you have available and what's actually feasible within it. It requires going back and forth, revisiting the target trial that you would like to conduct and how you're actually able to emulate it in observational data. Perhaps we would have liked to conduct a trial on vitamin D that specified the specific intervention, but with the data available, we had to go back and iterate our target trial to align with something we're actually able to do. And I think that clarity in defining the research question, as Enrique's been talking about, is one of the most fundamental components of target trial emulation and how we work to better analyze observational data to give us answers that are closer to the truth.
I think for the people in the room for whom this is a newer concept, one thing that was highlighted here was that you can do a target trial emulation both with an intervention and with an observational exposure, so the vitamin D study versus Dr. Barnhart's study: is there an intervention in the setup? So that was a concept highlighted here, that you can do it either way. The other thing is to always look at the supplementary attachments to studies, because the setup for the vitamin D study is outlined very beautifully in the supplemental work attached to the study. To get a better understanding of all that setup work we were just talking about, it's all highlighted nicely in the supplement. So if people in the audience would like to ask questions for our panel, we will open up the mics as we continue the conversation here for the next 20 minutes or so. If you have questions, just come up to the microphones. While we're waiting for that, I heard a little bit of interplay between power and precision.
Can you just expound on that a little bit more? One of the trials you did, Kurt, went from 100,000 patients down to far fewer, but you got more precision and your answer changed. Is there a trade-off, and is that trade-off maybe a good one in this case? Well, many times you have observational data, especially administrative data, that's huge, and you have way more power than you need. So I think it's more important to whittle down the population to the true population you're trying to look at, and you're still going to end up with a lot of people, with power and precision. So I think it's the population that matters, not the size of your database, because a lot of the people in your database are not going to inform your question; they're on the extremes, or they carry bias for those reasons. Kurt, do you think that by using the target trial framework, this allows you to potentially have better accuracy as well? Because with real-world evidence, sometimes precision gets very high, but accuracy is not necessarily there, because there's too much clinical or epidemiological noise.
So do you feel that just because you're applying the same inclusion and exclusion criteria, you can actually get much closer to the true effect size in the population? I see we have some questions, but I want to add on to the other part of that. Even though we got the population whittled down, and I showed you that that was the majority of the confounding, what I called confounding by indication, we were treating patients that had better prognoses and freezing their embryos; we were doing that innately. That's what was giving us the wrong answer. But my point is that even though we make an emulated trial, it's not randomized, so there's still going to be residual confounding, and that has to be treated as well.
So there's a second part of the emulated trial we haven't talked about yet, which is still the statistical considerations, once you get your population correct. Can I add one more thing to Kurt? And I know you're waiting for a while, I'm sorry. Kurt started to say there was a trial in China, and there was a problem of generalizability.
It wasn't, is the effect real or not? He wanted to see if the trial in China could be replicated in the U.S. If I wanted to do a real trial, I would use the same sample size and run a new trial. Nobody would ask why you used 2,000 people instead of 100,000. The population that mimicked that trial in China was around the size that was sufficient to answer the question that he was asking, about generalizability, nothing else.
And so he was able to answer that question. We got there by mimicking that imaginary trial that never happened. Dr. Hanson, you've been waiting.
Yes. Thank you, Carl Hanson, University of Oklahoma. Thanks, this has been a fantastic presentation, and I'm very interested in the strategy.
So two questions. One has to do with the data. It would seem like using RCT data in many cases would be ideal, because it's prospectively collected and frequently, although not always, of higher quality. But how do you manage it when the original trial had an intervention that had a significant impact on the outcome that you're interested in? Is that managed at the randomization stage in some fashion, where you try to have equal numbers from both original groups in the two arms of your emulated trial, or do you manage that statistically at the end? So that's one question.
The second has to do with when you have these really large databases like SART, can you take two different samples from it? So maybe I want to, my first emulated trial is going to be a thousand patients, and I'm going to do another one with a thousand patients using that same database just to maybe validate your findings. Thank you. The second question is easier, I can take that.
I mean, what we did with the SART database: the original trial was hundreds, right? And we whittled it down to 40,000 in the group. So we clearly had more power than the randomized trial. But you're asking a very intriguing question, which is, what if I did multiple emulated trials in the same dataset? What if I purposely did trials of a thousand each, but then picked another thousand, and then another thousand, and did like a Monte Carlo simulation? That's an intriguing idea.
I don't know if that would get a different answer or a better answer than using the whole dataset, but an interesting comment. But then I'll let these guys answer the question about if you're using a dataset that was a randomized trial, there are some particular issues that you have to work through. To do an emulated trial is not assumption free.
And so nothing comes for free. And there is not a single answer for every situation. I think it's situation specific.
And so sometimes the randomization that was originally applied will affect your results and how you think about that. Sometimes it won't. I think in the case of Julia, it was randomized to zinc and folic acid, and that does not affect the levels of vitamin D. So she was relatively safe.
But in other cases, maybe. And so you've got to be thinking that maybe it's not a pure randomized trial; it's a block-randomized trial.
And so it depends on the specific question that you're asking. Hi, good morning. This is Hector Hernandez from Mexico.
I'm doing my fellowship in reproductive medicine. I have a question for you. Would you recommend using vitamin D as a daily supplement for every type of patient, the ones that are sufficient, insufficient, or deficient? And for how long? Until the patient achieves pregnancy, or how long? Who would like to tackle that? Well, based on this, we can't make those recommendations, because we were not testing the supplement specifically, just looking at their vitamin D levels.
So unfortunately, I don't have good data to back that up. So there's two different questions. The trial that they eloquently presented demonstrated, in my mind, that people with low levels have worse pregnancy outcomes than people with high levels.
And it wasn't just an observational question with lots of noise. It was very well done. The second question is, what do I do about that? And that trial hasn't been done yet.
It's intuitive to say I should supplement them, but we really don't know if that would work. Yeah. Thank you.
Chris, Julia. I just wanted to go back to the question about emulating multiple trials within the same data set. I think that was a really interesting question.
And I think what would potentially be better would be if you had two data sets that had similar variables of interest and the same potential to emulate the trial. And then you were able to emulate a trial in each of those and compare the results, rather than sampling within the one data set that you have, because you're likely to get the same answer if you're using the same data set. But I think it would be really interesting if you were attempting to emulate the same trial in different data sets.
I think if you got the same answer, it would increase the robustness of your findings and likely be more evidence of generalizability. But I think if you're doing it within the same data set, even if you got the same answer, I don't know if that would tell you much more. I think it would be more beneficial to use the biggest sample size that you could in that data set and then try and also apply it in another data set and see how the results compare.
We'll do one more question and then I want to get to this slide so you can just teach us a little bit about what Kurt was saying about how you actually set up the trial. Please. Oh, now it's coming up.
I'm from UMass. And so very interesting talk. I have some questions for the epidemiologists among the panel.
So for the really large data sets, for everyday readers of the journals: let's say one paper is presenting emulated clinical trial methodology and other papers are using propensity score matching or propensity score weighting, which we know also take advantage of large sample sizes in general. How would you advise readers on the differences, nuances, and pitfalls of each approach? So I think you need both. What we're advocating here very clearly is the setting up of the study, and I want to go into that, which is important.
So the emulation of the trial, setting it up is one aspect of it, but then it's still not a randomized trial. So you still need to adjust for confounding. So for example, in the paper that I presented, we first set up the study.
I showed you it took away a lot of the confounding, but then we applied logistic regression to see if we could control even more. And we also applied propensity scores to see if we could do any more. And we also applied something called a propensity model for predicting future pregnancy.
Three different ways of controlling residual confounding were also necessary. Luckily they gave us similar answers, but just because you restrict the population doesn't take away the residual confounding in your population. But I would say that the two steps are better than just doing a large data set with a propensity score, because then you haven't applied the metrics of a trial to it.
And that's why I often see large database studies where they say, well, we used logistic regression, we used propensity scores, but I'm still not convinced that they focused the question enough, and there could still be a lot of residual confounding in it. And to add to what Kurt was saying: in an emulated trial, you have to emulate all aspects of the trial. One part of emulating the trial is emulating randomization.
And so the statistical methods that we have, such as propensity scores, inverse probability weighting, or regular adjustment, are a way to try to emulate the balance that exists between the treatment and the placebo in a randomized trial. So this idea of emulating that piece, in essence, is to try to emulate all the pieces of a randomized trial. And so that's one of those pieces.
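One standard diagnostic for whether the weighting actually emulated that balance is the standardized mean difference for each confounder, computed before and after weighting; values below roughly 0.1 are a common benchmark. A minimal sketch follows, with illustrative names, assuming an exposure column and weights like those from the IPW sketch earlier.

```python
import numpy as np
import pandas as pd

def weighted_smd(df: pd.DataFrame, covariate: str, exposure: str,
                 weights: np.ndarray) -> float:
    """Standardized mean difference for one covariate between exposure groups.

    Pass weights of all ones for the unweighted (pre-adjustment) comparison;
    pass the inverse probability weights to check post-weighting balance.
    """
    in_group = (df[exposure] == 1).to_numpy()
    x = df[covariate].to_numpy(dtype=float)
    mean_1 = np.average(x[in_group], weights=weights[in_group])
    mean_0 = np.average(x[~in_group], weights=weights[~in_group])
    # Simple pooled (unweighted) standard deviation as the denominator.
    pooled_sd = np.sqrt((x[in_group].var() + x[~in_group].var()) / 2)
    return (mean_1 - mean_0) / pooled_sd

# Usage sketch: balance is conventionally adequate if |SMD| < 0.1 after weighting.
# for c in CONFOUNDERS:
#     print(c, weighted_smd(df, c, "sufficient", np.ones(len(df))),
#           weighted_smd(df, c, "sufficient", weights))
```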
So we have a few minutes left, and so I just wanted to go to the primary team. Teach us what this DAG is, and how did this conceptually help you set up what we've been talking about for emulating the trial? So one of the first things that we did before actually beginning the analysis is creating a DAG, or a Directed Acyclic Graph. And the goal of a DAG is to display your assumptions about how different variables are connected.
So you'll see, this is a little bit of a complex DAG because, as I mentioned earlier, we envisioned three different trials, changing the unit of randomization. So I've displayed all of the variables that I thought were related to female preconception vitamin D and the outcomes of live birth and pregnancy loss, as well as doing the same for the male partner. And then I thought of every single variable that might be associated with both the exposure of interest, preconception vitamin D, and the outcome of interest, regardless of whether we had that data available in our data set.
And so you'll notice that previous vitamin D is all the way on the left in italics. And the reason I had it in italics is because that's not a variable that we have measured in the data set, but it is associated with both the exposure and the outcome. And so this is just laying out all of your assumptions of how you think these variables are related.
And one thing that is particularly crucial in setting up your DAG is temporality. You'll notice that the top of the figure runs from prior to baseline through follow-up, and where the variables are placed along it, you can envision it as an x-axis, marks when they're actually measured. So each thing is happening after the other.
And something we want to be really conscientious about is not adjusting for post-baseline variables, because that's not something we would do in a randomized trial, but it's unfortunately often done in observational studies. So for instance, the type of fertility treatment in this cohort occurred after preconception vitamin D was measured. Because of that, if we were to adjust for type of fertility treatment, which is actually on the pathway to the outcome of interest, we might induce bias.
So instead, we decided not to adjust for that variable, and we outlined our assumptions by writing out the DAG and thinking through the temporality of variables. That led us to the conclusion that adjusting for type of fertility treatment might actually induce bias rather than remove it.
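A DAG like this can also be encoded and checked programmatically. The toy sketch below, using the networkx library with node names simplified well beyond the paper's actual figure, shows the distinction Julia draws: a shared cause of exposure and outcome is a confounder to adjust for, while a post-baseline variable on the causal pathway, like fertility treatment, is a mediator to leave alone. The simple path checks here are a deliberate simplification of the full back-door criterion.

```python
import networkx as nx

dag = nx.DiGraph()
dag.add_edges_from([
    ("bmi", "vitamin_d"), ("bmi", "live_birth"),   # shared cause: confounder
    ("season", "vitamin_d"),                        # cause of exposure only
    ("vitamin_d", "live_birth"),                    # effect of interest
    ("vitamin_d", "fertility_treatment"),           # post-baseline variable...
    ("fertility_treatment", "live_birth"),          # ...on the pathway: mediator
])
assert nx.is_directed_acyclic_graph(dag)

exposure, outcome = "vitamin_d", "live_birth"

# Confounders: nodes with a path into the exposure AND a path to the outcome
# that does not run through the exposure (simplified back-door reasoning).
dag_without_exposure = dag.copy()
dag_without_exposure.remove_node(exposure)
confounders = [n for n in dag_without_exposure.nodes
               if n != outcome
               and nx.has_path(dag, n, exposure)
               and nx.has_path(dag_without_exposure, n, outcome)]

# Mediators: nodes on a directed path from exposure to outcome; adjusting for
# these post-baseline variables can induce bias rather than remove it.
mediators = [n for n in dag.nodes
             if n not in (exposure, outcome)
             and nx.has_path(dag, exposure, n)
             and nx.has_path(dag, n, outcome)]

print("adjust for:", confounders)        # ['bmi']
print("do not adjust for:", mediators)   # ['fertility_treatment']
```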
Yeah, so the idea here with the DAG is, again, to try to emulate randomization. So I'm putting down in this type of graph all the possible variables and how they are related to each other. And therefore, there is some methodology involved in this, not complicated, that will allow me to know if I still have confounding or not. The key piece here is that, based on displaying everything in this graph, I can tell whether my model will still have confounding and whether I will be able to emulate randomization.
So that's another of the key developments that allows us to get an observational study closer to a randomized clinical trial. One little thing that I will say is that this methodology will never overcome one of the strengths that a randomized trial has, which is that I cannot control for things that I don't know. Meaning a randomized trial will balance both the things that I measure and know and the things that I didn't measure, if the randomization worked well and the trial was large enough.
In an observational study or in an emulated trial, I cannot control for things I never knew about. And so when Kurt was talking about residual confounding, that's the key limitation here: things may exist in nature that I'm not aware of at all. Enrique, is there a way you would advise whoever is considering doing an emulated trial to test or potentially assess residual bias? Well, for things that I don't know, it's hard to do.
But I could say, well, what if there is something that I don't know that exists out there, associated with both the exposure and the outcome, and I can simulate how much the effect would change. It's called a sensitivity analysis. But beyond that, I couldn't do more than what we know at this point.
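That kind of sensitivity analysis can be made quantitative. One widely used version is the E-value of VanderWeele and Ding, which asks how strongly, on the risk ratio scale, an unmeasured confounder would have to be associated with both exposure and outcome to fully explain away an observed association. A small sketch follows; the 1.28 risk ratio simply reuses the roughly 28 percent figure quoted earlier, for illustration only.

```python
import math

def e_value(rr: float) -> float:
    """E-value for an observed risk ratio (VanderWeele & Ding, 2017):
    the minimum strength of association, on the risk ratio scale, that an
    unmeasured confounder would need with both exposure and outcome to
    fully explain away the observed effect."""
    if rr < 1:            # for protective effects, invert first
        rr = 1.0 / rr
    return rr + math.sqrt(rr * (rr - 1.0))

# Illustration with the roughly 28 percent higher chance quoted earlier:
print(round(e_value(1.28), 2))  # ~1.88: a confounder would need associations
# of risk ratio ~1.9 with both exposure and outcome to explain this away.
```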
Do you see the future, especially in a field like ours where it can be very complex to do randomized controlled trials, do you see a future where people are building randomized controlled trials and then, in anticipation of that study, building the DAGs in advance for a set of emulated trials around that randomized controlled trial? Do you see that as the way our field is moving? I think that's the only way to do it, because a randomized trial will answer the one or two questions it randomized. But we all know we have multiple people asking, well, what about this? And what about that? And what about the combination of the two? And what if you did this or that? Those additional questions are best suited for an emulated trial. So for example, what would have happened if nobody missed a single treatment? Or what would have happened if you changed the medication? All kinds of things that happen in reality in a trial could be answered in an emulated trial.
So I do think that that will supplement trials in an incredible way. Kurt, did you have anything you wanted to add on the DAGs? I know we sometimes talk about these on the podcast when we're looking at other studies that have used this stuff at all. Yeah, I think DAGs are underutilized partly because they're hard to understand.
You look at this, and it looks very complicated. But it actually is really not. You're trying to quantify what will affect your answer.
And then what am I able to control for? What am I not able to control for? And the biggest mistake people make when they submit papers to our journal is they over-control. And they're controlling for things that were actually post-randomization in your emulated trial, which you shouldn't do. So this lays it out for you, and I think everybody should take the time to learn it a little bit, because it really sets the stage.
And if you just take a data set and say, here's my exposure and outcome, and start torturing it, you're going to make lots of assumptions that are incorrect, which potentially can lead to the wrong answer. Which leads to my final point, as I start drawing to a conclusion here: are emulated trials going to give you the truth? No, we don't know that. Every trial is an approximation of the truth.
We like a randomized trial because it controls everything for us. But we don't know if emulated trials are the truth. It's just the answer we got.
Again, for lots of reasons, if we did it again or with a different data set, we might get a different answer. But that's the goal of research: to get the best approximation you can, given the ability of the data in front of you. Christos, final summary thoughts on emulated trials and this discussion? No,
I think it's a fantastic methodology and a framework to get much higher-quality observational studies, or non-interventional studies, and to have a much better approximation of the truth. And to the many people who are purists, and I happen to be one of them, who say nothing is as good as a randomized controlled trial, what I always say is that in the absence of a randomized controlled trial, you should still try to get as close to the truth as possible, and an emulated trial certainly seems to be the right way.
So I would invite anyone who has observational data of sufficient quality to start thinking of utilizing this framework, because we have so many questions that we are not going to have randomized data for. So we might as well utilize the actual non-interventional data we have. From the primary author team, your final summary take-home points, what should we have learned from this discussion today? I would just say be open to it.
I feel like it can be intimidating or challenging to start doing it, but it's something that you just take step by step. So with drawing the DAG, you start with your exposure, your outcome, and you build it from there. And with the trial, emulated trial, you start with each component and walk through it, and it's very doable.
So I would just encourage you to give it a try. I echo that. And recently, there were these guidelines that were published in JAMA and BMJ on what should actually be reported in a target trial emulation, and they go through step-by-step of everything that you should consider, both in how you conduct your target trial as well as what you're reporting.
And I think including all of those assumptions and decisions in your paper, whether in supplemental tables, appendices, or figures, is really important for the clarity and transparency of research. Enrique? No, I will echo what Sunni and Julia said. I think this is a real opportunity.
There has been a big change in the landscape, and I think there are going to be even more and more advances in the field that will allow us to answer questions that we weren't sure we can, so I'm excited about what's coming. Dr. Wild, I'm sorry, we're out of time for questions at this point. I apologize.
And Dr. Barnhart, final take-homes, thoughts from you on this. I know you've said in the past that SART studies, we don't always do them the best way, and as a SART president, I'd like to know the best way we can look at these big data sets. What should we learn at the SART level and with big data from this? Yeah, to be specific to our field: first of all, the BMJ and JAMA papers are spectacular resources for emulated trials, but there is one that describes how to do emulated trials in our field, so look that one up as well.
But I would think that we do have data sets in our field that are amenable to this. The SART database is one, and I would much prefer a rigorous trial that was set out with parameters: this is who I'm trying to study, this is the question I'm asking, this is the population I'm answering it in, this is all the specificity of it, and then do your control, rather than just saying I used the whole data set and here's my answer. Because I think that could be wrong, and, what's the right way to say it? We've moved beyond that.
We need more sophisticated analysis than just a logistic regression, is I guess what I'm trying to say. Pietro, take us home. I'd like to thank our panelists and Dr. Eubanks, our first FNS editorial fellow for the wonderful presentation and discussion today.
If you're leaving here and you're interested in staying connected with Fertility and Sterility, this QR code will allow you to be involved in the peer review process with us. But if you're not tired of my voice, Micah's voice, and Kurt's voice, there's always the FNS family of podcasts. We just launched a brand-new show in the FNS podcast family, FNS Roundtable, which dives into the Views and Reviews and Fertile Battle sections of the journal, trying to provide more content for the people who are fans of the podcast. And if you enjoy the journal club format, there is one more opportunity before the end of the year in November.
We're going to be discussing outcomes in SART clinics versus non-SART clinics in November, led by Dr. Steve Spandorfer from Cornell. That's all the time we have for today. Thank you all for being here during your lunch break, and we look forward to seeing you around the meeting.