Suddenly, Trait-Based Embryo Selection


[see footnote 4 for conflicts of interest]

In 2021, Genomic Prediction announced the first polygenically selected baby.

When a couple uses IVF, they may get as many as ten embryos. If they only want one child, which one do they implant? In the early days, doctors would just eyeball them and choose whichever looked healthiest. Later, they started testing for some of the most severe and easiest-to-detect genetic disorders like Down Syndrome and cystic fibrosis. The final step was polygenic selection - genotyping each embryo and implanting the one with the best genes overall.

Best in what sense? Genomic Prediction claimed the ability to forecast health outcomes from diabetes to schizophrenia. For example, although the average person has a 30% chance of getting type II diabetes, if you genetically test five embryos and select the one with the lowest predicted risk, they’ll only have a 20% chance. Since you’re taking the healthiest of many embryos, you should expect a child conceived via this method to be significantly healthier than one born naturally. Polygenic selection straddles the line between disease prevention and human enhancement.

In 2023, Orchid Health entered the field. Unlike Genomic Prediction, which tested only the most important genetic variants, Orchid offers whole genome sequencing, which can detect the de novo mutations involved in autism, developmental disorders, and certain other genetic diseases.

Critics accused GP and Orchid of offering “designer babies”, but this was only true in the weakest sense - customers couldn’t “design” a baby for anything other than slightly lower risk of genetic disease. These companies refused to offer selection on “traits” - the industry term for the really controversial stuff like height, IQ, or eye color. Still, these were trivial extensions of their technology, and everybody knew it was just a matter of time before someone took the plunge.

Last month, a startup called Nucleus took the plunge. They had previously offered 23andMe style genetic tests for adults. Now they announced a partnership with Genomic Prediction focusing on embryos. Although GP would continue to only test for health outcomes, you could forward the raw data from GP to Nucleus, and Nucleus would predict extra traits, including height, BMI, eye color, hair color, ADHD, IQ, and even handedness.

And this week, Herasight entered the space with the most impressive disease risk scores yet, an IQ predictor worth 6-9 extra points, and a series of challenges to competitors, whom they call out for insufficient scientific rigor. Their most scathing attack is on Nucleus itself, accusing its predictions of being misleading and unreliable.

Let’s start with the science, then move on to the companies and see if we can litigate their dispute.

Polygenic embryo screening is a natural extension of two well-validated technologies: genetic testing of embryos, and polygenic prediction of traits in adults.

Genetic testing of embryos has been done for decades, usually to detect chromosomal abnormalities like Down Syndrome or simple single-gene disorders like cystic fibrosis. It’s challenging - you need to take a very small number of cells (often only 5-10) from a tiny proto-placenta that may not have many cells to spare, and extract a readable amount of genetic material from this limited sample - but there are known solutions that mostly work.

But most traits are polygenic, requiring information about thousands or tens of thousands of genes to predict. These are too complicated to understand fully at current levels of technology, but some studies have chipped away at the problem and gotten a partial understanding. Often this looks like being able to predict a few percent of the variance in a trait, and determine whether someone’s genetic risk is slightly higher or lower than average.

Polygenic prediction of traits in adults is still young and full of hidden pitfalls. Last month, we discussed how some early studies unknowingly conflated direct genetic effects and various confounders - for example, they tended to pick up on genes associated with well-off ethnic groups or families who had good health outcomes for social reasons. Pinpointing the direct component requires an additional step where researchers validate their algorithms within families (for example, on pairs of siblings where one has a higher polygenic score than the other) to see how much predictive power remains. This is especially important for embryo selection companies, whose entire value proposition depends on comparing two genomes from the same family.

How have they done? It depends on the number of embryos they have to work with; the more embryos, the better you can do by selecting the best.
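To get a feel for how the gain scales with embryo count, here is a minimal sketch (my own illustration, not any company's method). It assumes each embryo's polygenic score is an independent draw from a standard normal distribution around the family average, and computes the expected score of the best of n embryos in within-family standard deviations. The function names are hypothetical.

```python
import math

def normal_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def normal_cdf(x):
    """Cumulative distribution of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def expected_best_of(n, lo=-8.0, hi=8.0, steps=16000):
    """Expected maximum of n independent standard normal draws.

    Integrates x * n * pdf(x) * cdf(x)**(n - 1) with the trapezoid rule.
    Assumption: embryo scores are independent and normally distributed.
    """
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * x * n * normal_pdf(x) * normal_cdf(x) ** (n - 1)
    return total * h

if __name__ == "__main__":
    for n in (1, 2, 5, 10, 20):
        print(n, round(expected_best_of(n), 2))
```

Under these assumptions the returns diminish: the best of five sits about 1.16 standard deviations above the family average, and the best of ten about 1.54. Turning that into a risk reduction for a real disease also requires knowing how much variance the score explains within families, which this sketch does not model.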

Here is a table of different companies’ reported risk reductions, slightly adjusted for different reporting conventions but otherwise taking all claims at face value (we’ll talk about how wise that is later).

Some people might genuinely want to select on a single condition. For example, people with a strong family history of schizophrenia might want to minimize the chance of their children getting the disease; for these people, reducing schizophrenia risk by 58% (while keeping everything else constant) sounds pretty good.

Everyone else probably wants a generically healthy embryo with low risk of all conditions. Exactly how this works depends on the customer’s own values - would they prefer an embryo with lower cancer risk to one who will have fewer heart attacks? - and the exact benefits will depend on how parents make that decision. Genomic Prediction and Herasight try to help by providing semi-objective measures of which embryo is overall healthiest according to different conditions’ effects on longevity and patient-rated quality of life.

For Genomic Prediction, that’s the “embryo health score”. If you selected the single highest-health-score embryo from a set of five, here’s how they’d do:

For Herasight, it’s a “polygenic longevity index”. They don’t give exact risk reduction numbers for each disease, saying that it depends too much on a couple’s specific family history, but say that most people gain 1-4 years of healthy life (when I test it on a set of twenty embryos, the healthiest gets an extra 1.66 years).

How much would you pay to give your children an extra 1-4 years of healthy life?

This is no longer a hypothetical question. Here are the costs of the companies in this space:

Is it worth it?

If:

The claimed risk reductions are accurate
You value your kids’ health about as much as your own
You have a low time discount rate
You’re well-off enough that these aren’t impossible sums of money for you
You’re okay using expected utility calculations where a 50% chance of preventing X is half as good as fully preventing X.

…then I’ll go out on a limb and say yeah, obviously it’s worth it.

Consider e.g. Genomic Prediction, which costs $3,250 for five embryos and claims to lower absolute risk of Type 2 diabetes by 12%. That implies that not getting Type 2 diabetes is worth $27,000. Ask anybody dealing with regular insulin injections (let alone limb amputations) whether it would be worth $27,000 to wave a magic wand and not have Type 2 diabetes! It’s not a hard question! And that’s just one of a dozen conditions you can lower the risk for! Other ones, like not getting breast cancer, might be so valuable that it’s hard to even attach numbers!
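The arithmetic behind that $27,000 figure is just price divided by absolute risk reduction. A minimal sketch, using only the numbers quoted above (the function name is hypothetical):

```python
def implied_value_per_case(cost, absolute_risk_reduction):
    """Break-even value of avoiding one case of the condition.

    If paying `cost` lowers the chance of the condition by
    `absolute_risk_reduction` (as a fraction), the purchase breaks even
    when avoiding the condition is worth cost / absolute_risk_reduction.
    """
    return cost / absolute_risk_reduction

# Figures from the text: $3,250 for five embryos, 12% absolute reduction.
value = implied_value_per_case(3250, 0.12)
print(round(value))
```

This takes the claimed risk reduction at face value and uses the expected-utility assumption from the list above, where a 12% chance of preventing something is worth 12% of fully preventing it.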

(but maybe the low time discount rate is a mistake? Suppose you invest the $3,250 in an index fund that makes 7% over inflation, then give it to your future child when they turn 45 (average age of type 2 diabetes diagnoses). Now it’s worth $75,000. Is this the “true” cost of the intervention? Does it matter that this counterfactual is fake and most people don’t do this?)
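For anyone who wants to play with the discount-rate counterfactual, here is a rough sketch of the compounding. The assumptions are mine: annual compounding, a constant 7% real return, and exactly 45 years of growth.

```python
def future_value(principal, real_return, years):
    """Value of a lump sum after compounding annually at a fixed real return."""
    return principal * (1 + real_return) ** years

# Assumptions: $3,250 invested once, 7% over inflation, 45 years.
fv = future_value(3250, 0.07, 45)
print(round(fv))
```

With these particular assumptions the result lands around $68,000, in the same ballpark as the figure above; the exact number moves a lot with the return and the holding period, which is sort of the point.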

What about IQ? Six extra IQ points (Herasight’s estimate with five embryos) is about a quarter of the gap between the average person and the average Ivy League student. The benefits of intelligence are hard to quantify, but it’s been shown to have probably-causal positive effects on income, mortality, and achievement. Probably the income effects alone make up for the cost of intervention - again assuming total parent-child altruism and low discount rate.

So if we accept all of these claims and assumptions, the choice seems obvious. It’s probably even obvious for governments to pay for all citizens to get these, given how much they’d save on health care costs.

Critics have raised both scientific and ethical objections to polygenic embryo screening.

Polygenic embryo selection has been condemned by various bodies including the Society For Psychiatric Genetics, the European Society of Human Genetics, and the Behavioral Genetics Society. Their statements are . . . not good. They tend towards vague language about how people are more than just their genes, or how no genetic test can be perfect, or how embryo screening is not exactly the same thing as some other form of screening which has a longer history and more proponents. “Although in general higher scores mean you are more likely to have a condition, many healthy people will have high scores; others might develop the condition even with a low score”, says the Society for Psychiatric Genetics, as if they have just blown the lid off some dastardly conspiracy. “Screening embryos for psychiatric conditions may increase stigma surrounding these diagnoses”, they continue - an objection which, taken seriously, could be used to ban every form of medical treatment. We will mostly ignore these people and try to imagine the objections that mildly competent critics might raise, some of which will coincidentally overlap with the content of the non-hypothetical statements.

Scientific Objection: Efficacy

Are we sure this works at all?

A typical polygenic score is created by collecting thousands or millions of adult genomes, then matching genetic information with surveys about who has the trait/condition of interest. Reputable studies then test these scores on holdout samples - adults who were not used to make the score - to see if they still accurately predict who has the trait/condition. Polygenic embryo selection depends on an assumption that the scores which work in these kinds of retrospective tests will also work prospectively on embryos. This assumption hasn’t been formally proven in studies (which would require years to decades to conduct), but seems common-sensical.

The strongest challenge to the application of polygenic scores for embryo selection comes from a recent body of research showing that most scores combine causal genetic effects with population stratification, and therefore can be expected to lose much of their predictive power when comparing two members of the same family (e.g. two embryos from the same couple). There is increasing agreement in the field that unless scores are validated within families, headline results like “decreases risk of X by Y%” will be large overestimates.
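To see why within-family validation matters, here is a toy simulation. Everything in it is an assumption of mine, not data from any company or study: a polygenic score with a true direct effect of 0.3, plus a family-level confounder (think stratification or shared environment) that tracks the family's average score with weight 0.4. It compares the slope you would estimate across the whole population with the slope estimated from sibling differences.

```python
import random

def slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var

def simulate(families=20000, direct_effect=0.3, confounding=0.4, seed=0):
    """Return (population slope, within-family slope) for simulated sibling pairs."""
    rng = random.Random(seed)
    sd = 0.5 ** 0.5  # half the score variance between families, half within
    scores, traits = [], []
    score_diffs, trait_diffs = [], []
    for _ in range(families):
        family_mean = rng.gauss(0, sd)
        sibs = []
        for _ in range(2):
            score = family_mean + rng.gauss(0, sd)
            # The confounder depends on the family, not on the individual sibling.
            trait = direct_effect * score + confounding * family_mean + rng.gauss(0, 1)
            sibs.append((score, trait))
            scores.append(score)
            traits.append(trait)
        # Differencing siblings cancels anything shared by the family.
        score_diffs.append(sibs[0][0] - sibs[1][0])
        trait_diffs.append(sibs[0][1] - sibs[1][1])
    return slope(scores, traits), slope(score_diffs, trait_diffs)

population_slope, within_family_slope = simulate()
print(round(population_slope, 2), round(within_family_slope, 2))
```

In this toy setup the population estimate comes out near 0.5 while the sibling-difference estimate recovers something close to the true 0.3. Embryo selection only ever compares siblings, so the second number is the one that matters.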

When I talked to company representatives, they all said that they took accuracy extremely seriously and had various white papers and journal articles where anyone could double-check their methodology. But I attended an industry conference a few months ago, and the gossip level was comparable to a high school cafeteria (minus the sex rumors - most of the attendees were having their own kids via IVF). Everyone had some story about someone being careless or fudging their numbers.

Some of the conflicts broke out into the open on Wednesday, when Herasight left stealth and published a white paper and associated blog post. They criticize Genomic Prediction for reporting between-family rather than within-family results, and Orchid for smuggling a term for age into their Alzheimer’s predictor (unsurprisingly, this makes it work better). We’ll get to their accusations against Nucleus below. Note that this was recent enough that competitors haven’t had time to respond or to air their own criticisms of Herasight; if this happens, I’ll try to keep you updated.

Maybe this is cope, but my optimistic perspective is that this bounds the damage. This obviously isn’t a field capable of maintaining a conspiracy of silence. But aside from the Nucleus allegations, the complaints aren’t existential. Maybe some numbers are too high, maybe some predictors are slightly rigged. But the more we learn about these admittedly concerning problems, the more we can hope that we’d have heard about it if nothing worked at all.

Overall my strongest opinion on the scientific criticisms is:

People on all sides have cited Alex Young as an authority on how polygenic scores can be confounded or misleading.
Last week Alex Young revealed that he had been working with Herasight while it was in stealth mode, and endorses their research.
LOL.
Probably that means Herasight’s products are okay.
That serves as proof-of-concept that this technology can work, and means other companies’ claims are at least plausible.

Scientific Objection: Antagonistic Pleiotropy

This is a fancy term for “sometimes genes that are good in one way are bad in other ways”. For example, there is a gene that decreases the risk of lung cancer, but increases the risk of leukemia. If you selected against lung cancer, you might give your child higher leukemia risk. Several of the professional societies raise this concern, and Sasha Gusev gives several examples here, including a correlation between education/IQ and anorexia.

When I think about these concerns, I consider the following thought experiment: suppose that I had a natural, unselected child, and that child became high school valedictorian and got into Harvard. Would my first reaction be “Oh no! This slightly raises her risk of anorexia!”? If not, why should this be our reaction to artificially increasing IQ? Genetic selection isn’t doing some different, magical thing. It’s just picking from within the natural IQ/anorexia variation. If you would be happy to have higher IQ (or lower breast cancer risk, or lower schizophrenia risk) naturally, you should be happy to get it through selection too.

(Objection one: suppose that the genetic component of IQ is net negative, but the environmental component is net positive to an even greater degree. Then IQ itself might be net positive - so you could still celebrate your valedictorian child - but since the genetic component alone is bad you wouldn’t want to select for it. I have never heard anyone seriously claim this; most studies suggest that genetic components of good things are good in the expected ways, and most critics don’t get this far. I mention it for the sake of completeness only.)

(Objection two: is the example above just saying that I value IQ more than non-anorexia? If so, couldn’t I give an alternate example of learning that my child isn’t anorexic, celebrating this seemingly-obviously-good fact, but actually this means they have lower IQ and based on my stated values I should be sad? I don’t think so. There is no claim that the increased anorexia risk from raising IQ is exactly as bad as the IQ increase is good - for example, you could imagine a world where going from moron to supergenius only raises anorexia risk 0.0001%. More generally - although not rigorously - selecting for X should usually increase X more than it increases tangentially-correlated construct Y. So selecting for IQ should be net positive, even though it might slightly increase anorexia risk, and selecting for anorexia should be net negative, even though it might slightly increase IQ. I think this is the intuition that drives parents to be happy both when they learn that their child is smart, and when they learn their child doesn’t have anorexia - not just an intuition that one trait matters more than the other)

But also, here’s the table of correlated genetic risks for psychiatric disorders:

…where blue means that lowering the risk of one disease also lowers the risk of the other, and red means the opposite (as in the IQ - anorexia example above).

Here’s the same table for other conditions, courtesy of Genomic Prediction (except I flipped the colors from the original, to match the one above):

Aside from two bright orange squares (gallstones vs. hypertension and hypothyroidism - I don’t know what’s up with this and it doesn’t seem to be a widely-appreciated result) we see that most correlations are zero or positive - that is, selecting against one disease selects against another or at worst does nothing. In this ocean of blue, worrying about those few orange squares feels a bit motivated.

Hans Jonas-ism says that no medical intervention may ever cause any harm, no matter how much benefit it produces. By this standard, perhaps slightly raising the risk of gallstones in the process of preventing various cancers and psychoses and other forms of human misery is unacceptable. To anyone with the more normal perspective where something with large benefits and tiny downsides is still pretty good, I don’t think the antagonistic pleiotropy argument carries much weight.

Ethical Objection: Cost

No way around this one: if these products work, they mean that rich people can have healthier/smarter/taller/prettier kids than poor people.

One might object that at least they’re in good company: other products which help rich kids get healthier/smarter/taller/prettier than poor kids include private tutors, gyms, hair salons, health insurance, clothing, books, and food. Is this really the time to declare ourselves against this kind of thing? But maybe we should fight against expanding this already-bloated category. Or maybe there’s something more final about a genetic advantage.

Maybe a stronger argument is that rich people get first crack at every new technology, but poor people usually follow close behind. The first cellphone, in 1982, cost $12,000 in today’s dollars. Now you can get something a thousand times better for $50, and Kenyan pastoralists use cell phones to get weather predictions from the local shaman. The trajectory of genetics has been even more striking: sequencing a single genome cost about $100 million in 2000 and is somewhere around $100 today.

Polygenic embryo selection has the potential to follow a similar path. There are two associated costs - sequencing the embryos, and running the analysis. Sequencing costs are decreasing and may eventually be comparable to the sorts of genetic screening (for e.g. Down Syndrome) that most families get anyway. Analysis costs are mostly the one-time expense of inventing the predictor; we might expect these to follow the same pattern as generic medications, where cutting-edge technology is jealously guarded and expensive, but last decade’s technology has made its way off patent and is cheap-to-free. A few groups have already created free open-source predictors; so far these are much worse than the private companies’ versions, but one of last year’s ACX Grantees is working on a better one.

Also, it would be crazy for any forward-thinking government not to cover this; it could save hundreds of thousands of dollars in future health care expenses. In countries with public health care, this comes directly out of the government treasury; even in the US, it’s covered by Medicare after age 65. The government should be begging people to select embryos.

The most persistent cost barrier is likely to be in vitro fertilization itself, a necessary precursor. In the US, 2-3% of babies are born through IVF. For those kids, this is a no-brainer - even if the cost never comes down, the cheaper products are only a fraction of total IVF expense. What about the other 98%? If those parents feel like they have to get embryo selection (and therefore IVF) to keep up, this could be a significant burden. IVF isn’t fun - it requires pumping a woman full of mind-altering hormones for weeks, extracting eggs in a minor surgery, and then implanting embryos in another minor surgery, all with a decent chance that some step will fail and you’ll have to do it all again. It also costs $15,000 in the US (less in poorer countries), and unlike the genetics, the cost has barely gone down in the past twenty-five years.

Some countries, including Israel, offer free IVF for anybody who wants it. And universal basic IVF is surprisingly popular even in the usually government-phobic United States - Donald Trump made it part of his campaign platform. So there’s a plausible path to embryo selection for everyone who wants it.

But it’s still going to take a while, it will hit different people at different times, and so far there’s no way around the month or two of various miserable medical procedures for women.

Ethical Objection: Personhood

Is it really correct to say that you have reduced someone’s risk of breast cancer by 46%, if what you’ve really done is closer to replacing them with a different person who is 46% less likely to have breast cancer? I cover this one in more depth here.

Ethical Objection: Race

This one is awkward: right now the technology works best for white people.

Most genetic data available for research/commercial use comes from the UK, US, and Europe - areas which are mostly white. Asian biobanks, and those serving US minority communities, have been more reluctant to share data. So we know a lot about the genetics of white people, and only a limited amount about the genetics of anyone else. Companies are suitably embarrassed about this, and researchers in the field are working hard to wring every ounce of information out of the minority data they have. But for now, white people are the clear winner.

Here’s data from Herasight:

A European family with five embryos and no family history can cut their diabetes risk by 47% and an African family 29%, with everyone else in between.

As usual, all companies say that they adjust their scores based on the couple’s genetic ancestry. As usual, Herasight challenges them to publicly release data on exactly how they performed the adjustments and how well they work. All companies say they are working as hard as they can to improve cross-ancestry portability, but that progress will remain limited until governments collect/release better genetic data on non-white populations.

Ethical Objection: Selection

At some point, you’ve got to choose.

Genomic Prediction and Herasight offer scores that aggregate overall health risks.

And this is the best case scenario! Herasight offers predictors for IQ, height and BMI; Nucleus offers those plus eye color and hair color. A parent might encounter a situation where the embryo with their favorite eye color also has the highest cancer and schizophrenia risk, and choose to doom their child to cancer and schizophrenia because they really want pretty eyes.

On average, even if everyone in the world selected for eye color, it wouldn’t raise cancer and schizophrenia risk. No not-deliberately-perverse polygenic selection choice can make your child worse off in expectation. Still, suppose you got cancer, and your mom admitted that she selected you for pretty eyes and didn’t even check the cancer column of the embryo selection report. How would you feel?

And would you feel better or worse than someone whose parents didn’t do embryo selection at all, and spent the money on a Caribbean vacation? What if they selected your brother for everything great, then had you naturally? What if they selected you for IQ, but actually you are very stupid, and you were one of the 20% of cases where a predictor that’s right 80% of the time gets it wrong?

Mark my words, one day there will be entire subfields of therapy dedicated to these issues.

Even as outsiders criticize the whole field, Herasight has launched a full-scale attack on competitor Nucleus.

Herasight’s white paper compares its own predictors (favorably) to those of Orchid and Genomic Prediction…

…but refuses to acknowledge Nucleus at all. In a supplementary note, the authors explain why: they accuse Nucleus of being so bad that it would “not yield a reliable or meaningful addition to our analysis”.

They say Nucleus has inflated the accuracy of their scores. This is most dramatic for a few conditions like ADHD, where the leading published polygenic score is based on 2,300,000 variants but explains only ~1% of variance in the condition. Nucleus’ score is based on 12 variants and (implicitly) claims to explain 3-6%. This doesn’t make sense.

Some of Nucleus’ other scores do use millions of variants. But many of these are 5-10 year old scores downloaded from open-source catalogs, whose accuracy statistics are easily available and far less than Nucleus claims. Here is what Herasight finds when they double-check Nucleus’ numbers:

On their Substack, Herasight also criticizes Nucleus’ monogenic screening product. They point out cases where it fails to properly screen for the conditions it claims. For example, the Nucleus website advertises screening for spinal muscular atrophy:

But on their gene list…

…they don’t screen for SMN, which causes 95% of spinal muscular atrophy cases. They only screen for UBA1, which causes a distinct and much rarer condition called x-linked infantile spinal muscular atrophy. Professional organizations publish guidelines for what genes need to be screened in a screening product, and Nucleus does not appear to be following them.

In further discussion, Herasight continued with exhaustive criticism of essentially everything Nucleus had ever done down to the smallest detail. Nucleus reports list the same baseline disease risk regardless of patient ancestry, but different ancestry groups should have different risks. Nucleus’ physician reports sometimes list lower-than-average risk for patients with positive polygenic scores. Nucleus’ age-based risk tables don’t distinguish between age and cohort effects (is this bad? see footnote). My favorite critique is that Nucleus wrote a blog post criticizing competing company Orchid…

…which included a section on how Orchid is a polygenic selection company, and polygenic selection companies are inherently “sketchy” and “honestly should be illegal”. But Nucleus is also a polygenic selection company! This is like Marlboro attacking Camel on the grounds that cigarettes are addictive and should be banned! Obviously something went wrong here - my guess is AI - and it’s a really bad look, especially when these scientific issues are so hard to litigate, and so many of us will have to go off gestalt impressions of corporate culture.

Nucleus states that they validate their models internally and intend to make their results public soon.

It’s hard not to love this technology. Lots of people (and the aforementioned professional organizations) manage anyway, but it’s hard.

If this were a single-use medical treatment, delivered by a doctor after someone got the relevant condition, it would be one of the biggest advances of the decade - imagine a drug that cures 10 - 40% of breast cancers with no side effects! But in fact, it works for breast cancer, and schizophrenia, and heart attacks, and approximately everything else. The only things comparable are antibiotics and GLP-1RAs.

And then there’s the IQ effects. Even after studying the literature, people have wildly different opinions about the importance of IQ. One of the most important debates is to what degree IQ differences are a cause of poverty, a consequence of poverty, or both. I lean towards both - a country with limited access to schools and medical care will have low average IQ, but as a consequence it probably won’t become the next big semiconductor hub. This technology could close half the IQ gap between poor and middle-income countries, or between middle-income and rich. Or it could give rich countries average IQs that have never been seen before, and let us see what kind of O-ring technologies (and new forms of social cooperation) lie just beyond the frontier.

(this is the nice quantifiable argument in favor of IQ enhancement, but I find myself more convinced by fuzzier things - how much is it worth to be able to enjoy great art and literature? To fully comprehend what we know of nature, and be able to fully appreciate the mystery of the rest? To have a sense of why society works the way it does, instead of feeling like you’re being blown back and forth by institutions you don’t really understand? Amateur psychoanalysts like to say that the only people who care about IQ are those looking for an excuse to boast about how high their own is, but my experience is the opposite: I care about IQ because I bang up against the limits of my own a thousand times a day, and I hate it. I fantasize about ways to make my children smarter than I am for the same reason a dog confined in a tiny crate might fantasize about getting her puppies adopted out to a nice house with a big grassy yard.)

My biggest qualm is that it might not matter. This is such a tiny foothill, flanking such a vast and foreboding range of mountains, that it might be a mistake to care about it at all.

Selecting the best of five or ten embryos is not a very effective way to get the genes you want. There are things in the pipeline that will make this look like Hippocrates draining black bile. By the time the first polygenically selected children are adults, they’ll be old news.

And then there’s AI. The average age at diagnosis for Type II diabetes is 45 years. Will there still be people growing gradually older and getting Type II diabetes and taking insulin injections in 2070? If not, what are we even doing here?

Many people in the transhumanist community are still bullish on this technology. They think - well, there’s still an outside chance that something comes up and AGI takes another few decades. If we can enhance humans to be smarter, healthier, and more determined by the time it arrives, maybe we’ll have a better chance. Or maybe, if there’s a positive optimistic vision of a human-based high-tech future, people will be more willing to delay AI in the first place.

I like this argument, but I also think it’s worth stepping back. What’s the point of anything? Why have kids at all in a world that’s changing this fast? Why save for the future? At some point your answer has to be romantic and aesthetic - it’s never been clear whether anything you do matters in any ultimate sense, but you’ve got to act as if it does and hope for the best.

From that perspective, this is the most romantic technology of all. You’re not just giving a better life to your kids. Genes travel from generation to generation; you’re giving a better life to your grandkids, your great-grandkids and so on to the point 1.77*log₂(population) generations from now when you are the ancestor of everybody and nobody. Somebody in Macaronesia in 3525 AD will avoid getting breast cancer because of you (if there is still cancer; if there are still breasts).
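For the curious, the 1.77*log₂(population) formula is easy to evaluate. The inputs below are my own assumptions (a world population of 8 billion and 25 years per generation), so treat the output as a rough sketch only.

```python
import math

def generations_to_universal_ancestry(population):
    """Evaluate the 1.77 * log2(population) rule quoted in the text."""
    return 1.77 * math.log2(population)

# Assumptions: 8 billion people, 25 years per generation.
gens = generations_to_universal_ancestry(8e9)
years = gens * 25
print(round(gens, 1), round(years))
```

That comes to a bit under sixty generations, or somewhere around fifteen centuries depending on the generation length you pick.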

Some combination of reasonable cost-benefit analysis and romantic/aesthetic commitments makes me want to have children despite the uncertainty, and the same combination made me sign up to use this technology despite the same. More later on how that’s going.
