India’s vast linguistic, cultural, and social diversity has long been evident, but only now are scientists beginning to uncover the genetic richness underpinning it. In a new study in Cell, researchers reported sequencing the genomes of 2,762 Indians from 23 States and Union Territories. The data captured variation across caste, tribal groups, language, geography, and rural-to-urban settings, offering the most comprehensive genomic map of India to date.
The findings are striking. The study confirmed that Indian ancestry draws on three primary sources and explored how this layered history, along with entrenched social practices such as endogamy, continues to shape health and disease risk today.
One migration, many mixtures
Using mutations as genetic clocks, the study confirmed that present-day Indians descend primarily from a single out-of-Africa migration around 50,000 years ago. Although archaeology suggests earlier human presence in the subcontinent, “those populations may not have survived or left lasting genetic traces,” said Elise Kerdoncuff, the study’s first author.
The researchers modelled Indian ancestry as a blend of three ancient populations: indigenous hunter-gatherers known as Ancient Ancestral South Indians; Iranian-related Neolithic farmers, best represented by fourth-millennium BC herders from Sarazm in present-day Tajikistan; and Eurasian Steppe pastoralists, who arrived around 2000 BC and are associated with the spread of Indo-European languages.
While most Indians fall along a genetic spectrum reflecting different proportions of these three ancestries, individuals from East and Northeast India, and a subset from Central India, carry additional East Asian-related ancestry, reaching up to 5% in West Bengal. This admixture likely occurred around 520 AD, after the decline of the Gupta Empire, although it may instead be linked to an earlier spread of rice farming.
Legacy of endogamy, kinship
India’s population structure reflects long-standing practices of marriage within communities, known as endogamy. This has produced strong founder effects, in which variants carried by a small group of ancestors become common among their descendants over generations. As a result, Indians, especially South Indians, have two to nine times more homozygosity than Europeans or East Asians, meaning they more often inherit identical copies of a gene from both parents.
Every individual in the study had at least one genetic relative among the other participants, indicating levels of relatedness far exceeding those seen elsewhere. This tight-knit structure may make recessive disorders, which arise when a person inherits faulty copies of the same gene from both parents, more common than currently recognised.
One example is a pathogenic variant in the BCHE gene, linked to severe reactions to anaesthesia, which was found to be enriched in Telangana.
Like all non-Africans, Indians carry traces of ancient interbreeding with other hominins, with Neanderthal or Denisovan segments covering up to 1.5% of the genome in some individuals. They also carry the widest variety of Neanderthal segments. “Multiple waves of migration, followed by caste-based endogamy, likely fixed archaic segments within specific groups, contributing to this high diversity,” said Lomous Kumar, a population geneticist at the Centre for Anthropobiology and Genomics of Toulouse, France.
Neanderthal-derived sequences are enriched in immune system genes. One such region on chromosome 3, linked to severe COVID-19, is especially common in East and Northeast India. Denisovan variants appear in immune-related pathways and in regions such as the major histocompatibility complex (MHC), a gene cluster central to detecting and fighting infections. “Enrichments in TRIM and BTNL2, involved in mounting immune responses to viruses, suggest that some variants were retained because they conferred an adaptive advantage,” Dr. Kerdoncuff said. “As humans moved into new environments, inheriting these variations from archaic populations likely helped them adapt to unfamiliar pathogens.”
Only a part of the story
The researchers uncovered 2.6 crore previously undocumented genetic variants. Of these, over 1.6 lakh were protein-altering variants absent from global databases, and about 7% were linked to thalassemia, congenital deafness, cystic fibrosis, and metabolic disorders.
“This highlights how neglected Indians are in genomic surveys,” Dr. Kerdoncuff said, “limiting scientific discovery and reducing the accuracy of risk predictions. The promise of precision medicine for underrepresented populations ultimately suffers.” Dr. Kumar added: “Within India as well, population-specific rare and unique variants continue to make the scenario complex,” emphasising that localised efforts are also imperative.
To help close this gap, Dr. Kerdoncuff said, the team is expanding the study to include more genetically isolated communities. They’re also studying proteins and metabolism to better understand how genes influence health outcomes. In parallel, they’re developing new tools to trace the origins of disease-linked genes in Indian populations.
To make medicine truly inclusive, India’s vast genetic diversity must be central to global research and matched by deeper, community-level efforts at home.
Anirban Mukhopadhyay is a geneticist by training and science communicator from Delhi.