The mutagenic forces shaping the genomes of lung cancer in never smokers

Data availability

Normal and tumour-paired CRAM files for the WGS data for the individuals in the Sherlock-Lung study and the EAGLE study have been deposited in dbGaP under the accession numbers phs001697.v2.p1 and phs002992.v1.p1, respectively. Detailed access information for the publicly available datasets is provided in Supplementary Table 13. Data from rnaturalearthdata v.1.0.0 (https://github.com/ropensci/rnaturalearthdata) were used to generate maps. Data on passive smoking and PM2.5 estimates of outdoor air pollution are available in Supplementary Table 14. The human reference genome GRCh38 was downloaded from the GATK resources at https://github.com/broadinstitute/gatk/blob/master/src/test/resources/large/Homo_sapiens_assembly38.fasta.gz.

Code availability

The WGS bioinformatics pipelines are available at https://github.com/xtmgah/Sherlock-Lung. The Battenberg SCNA calling algorithm is available at https://github.com/Wedge-lab/battenberg.

Acknowledgements

This work was supported by the Intramural Research Program of the National Cancer Institute, US NIH (project ZIACP101231 to M.T.L.); by the NIH grants R01ES032547-01, R01CA269919-01 and 1U01CA290479-01 to L.B.A.; and by a Packard Fellowship for Science and Engineering to L.B.A. The research performed in the L.B.A. laboratory was also supported by the Sanford Stem Cell Institute at the University of California San Diego. M.D.-G. and P.G.-G. were awarded fellowships within the Generación D initiative, Red.es, Ministerio para la Transformación Digital y de la Función Pública, for talent attraction (C005/24-ED CV1), funded by the European Union NextGenerationEU funds, through the Plan de Recuperación, Transformación y Resiliencia (PRTR). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript. The computational analyses reported in this manuscript used the Triton Shared Computing Cluster at the San Diego Supercomputer Center of the University of California San Diego. We thank the study participants; P. Kraft for reading and commenting on the manuscript; and the staff at Westat for their assistance with collecting samples and corresponding clinical data. This work used the computational resources of the NIH HPC Biowulf cluster (http://hpc.nih.gov).

Author information

Author notes

  1. These authors contributed equally: Marcos Díaz-Gay, Tongwu Zhang

Authors and Affiliations

  1. Department of Cellular and Molecular Medicine, University of California San Diego, La Jolla, CA, USA

    Marcos Díaz-Gay, Azhar Khandekar, Christopher D. Steele, Burçak Otlu, Shuvro P. Nandi, Raviteja Vangara, Erik N. Bergstrom, Mariya Kazachkova & Ludmil B. Alexandrov

  2. Department of Bioengineering, University of California San Diego, La Jolla, CA, USA

    Marcos Díaz-Gay, Azhar Khandekar, Christopher D. Steele, Burçak Otlu, Shuvro P. Nandi, Raviteja Vangara, Erik N. Bergstrom, Mariya Kazachkova & Ludmil B. Alexandrov

  3. Moores Cancer Center, University of California San Diego, La Jolla, CA, USA

    Marcos Díaz-Gay, Azhar Khandekar, Christopher D. Steele, Burçak Otlu, Shuvro P. Nandi, Raviteja Vangara, Erik N. Bergstrom, Mariya Kazachkova & Ludmil B. Alexandrov

  4. Digital Genomics Group, Cancer Genomics Program, Spanish National Cancer Research Center (CNIO), Madrid, Spain

    Marcos Díaz-Gay & Pilar Gallego-García

  5. Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, USA

    Tongwu Zhang, Phuc H. Hoang, Azhar Khandekar, Wei Zhao, Jian Sang, John P. McElderry, Caleb Hartman, Frank J. Colón-Matos, Mona Miraftab, Monjoy Saha, Olivia W. Lee, Kristine M. Jones, Jianxin Shi, Nathaniel Rothman, Qing Lan, Bin Zhu, Stephen J. Chanock, Jiyeon Choi & Maria Teresa Landi

  6. Department of Pathology, Centre Hospitalier de l’Université de Montréal, Montreal, Quebec, Canada

    Charles Leduc

  7. Department of Pathology and Laboratory Medicine, Memorial Sloan Kettering Cancer Center, New York, NY, USA

    Marina K. Baine, William D. Travis & Soo-Ryum Yang

  8. Department of Pathology, Brigham and Women’s Hospital, Boston, MA, USA

    Lynette M. Sholl

  9. Institut Universitaire de Cardiologie et de Pneumologie de Québec – Université Laval, Quebec City, Quebec, Canada

    Philippe Joubert & Yohan Bossé

  10. Department of Health Informatics, Graduate School of Informatics, Middle East Technical University, Ankara, Turkey

    Burçak Otlu

  11. Cancer Evolution and Genome Instability Laboratory, Francis Crick Institute, London, UK

    Oriol Pich & Charles Swanton

  12. Cancer Research UK Lung Cancer Centre of Excellence, University College London Cancer Institute, London, UK

    Charles Swanton

  13. Institute of Population Health Sciences, National Health Research Institutes, Zhunan, Taiwan

    Chao Agnes Hsiung

  14. National Institute of Cancer Research, National Health Research Institutes, Zhunan, Taiwan

    I-Shou Chang

  15. Queen Mary Hospital, University of Hong Kong, Hong Kong, China

    Maria Pik Wong

  16. Department of Pathology, University of Hong Kong, Hong Kong, China

    Kin Chung Leung

  17. Cancer Genomics Research Laboratory, Leidos Biomedical Research, Frederick National Laboratory for Cancer Research, Frederick, MD, USA

    Kristine M. Jones

  18. Ben May Department for Cancer Research, University of Chicago, Chicago, IL, USA

    Yang Yang, Xiaoming Zhong & Lixing Yang

  19. Division of Pulmonary and Critical Care Medicine, Mayo Clinic, Rochester, MN, USA

    Eric S. Edell

  20. Biobanco IBSP-CV FISABIO, Valencia, Spain

    Jacobo Martínez Santamaría

  21. Red Valenciana de Biobancos FISABIO, Valencia, Spain

    Jacobo Martínez Santamaría

  22. Department of Cancer Epidemiology, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL, USA

    Matthew B. Schabath

  23. Thoracic Surgery, Roswell Park Comprehensive Cancer Center, Buffalo, NY, USA

    Sai S. Yendamuri

  24. Department of Cancer Epidemiology and Primary Prevention, Maria Skłodowska-Curie National Research Institute of Oncology, Warsaw, Poland

    Marta Manczuk & Jolanta Lissowska

  25. Department of Environmental Epidemiology, Nofer Institute of Occupational Medicine, Łódź, Poland

    Beata Świątkowska

  26. Department of Clinical Epidemiology, N.N. Blokhin National Medical Research Centre of Oncology, Moscow, Russia

    Anush Mukeria, Oxana Shangina & David Zaridze

  27. Institute of Public Health and Preventive Medicine, Second Faculty of Medicine, Charles University, Prague, Czech Republic

    Ivana Holcatova

  28. Department of Oncology, Second Faculty of Medicine, Charles University and Motol University Hospital, Prague, Czech Republic

    Ivana Holcatova

  29. Department of Occupational Health and Toxicology, National Center for Environmental Risk Monitoring, National Institute of Public Health, Bucharest, Romania

    Dana Mates

  30. International Organization for Cancer Prevention and Research (IOCPR), Belgrade, Serbia

    Sasa Milosavljevic

  31. Clinic of Pulmonology, Clinical Center of Serbia, Belgrade, Serbia

    Millica Kontic

  32. Sylvester Comprehensive Cancer Center, Department of Medicine, University of Miami Miller School of Medicine, Miami, FL, USA

    Bonnie E. Gould Rothberg

  33. Department of Environmental Health, Harvard T.H. Chan School of Public Health, Boston, MA, USA

    David C. Christiani

  34. Department of Medicine, Massachusetts General Hospital, Boston, MA, USA

    David C. Christiani

  35. Genomic Epidemiology Branch, International Agency for Research on Cancer (IARC/WHO), Lyon, France

    Valerie Gaborieau & Paul Brennan

  36. Princess Margaret Cancer Center, University of Toronto, Toronto, Ontario, Canada

    Geoffrey Liu

  37. IHU RespirERA, Biobank-BB-0033-0025, Côte d’Azur University, Nice, France

    Paul Hofman

  38. Department of Human Genetics, University of Chicago, Chicago, IL, USA

    Lixing Yang

  39. University of Chicago Medicine Comprehensive Cancer Center, University of Chicago, Chicago, IL, USA

    Lixing Yang

  40. Department of Mathematics, Harvard University, Cambridge, MA, USA

    Martin A. Nowak

  41. Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA

    Martin A. Nowak

  42. Manchester Cancer Research Centre, University of Manchester, Manchester, UK

    David C. Wedge

  43. Manchester NIHR Biomedical Research Centre, Manchester, UK

    David C. Wedge

  44. Department of Pathology, Yale School of Medicine, New Haven, CT, USA

    Robert Homer

  45. Department of Clinical Sciences and Community Health, University of Milan, Milan, Italy

    Angela C. Pesatori

  46. Fondazione IRCCS Ca’ Granda Ospedale Maggiore Policlinico, Milan, Italy

    Angela C. Pesatori & Dario Consonni

  47. Sanford Stem Cell Institute, University of California San Diego, La Jolla, CA, USA

    Ludmil B. Alexandrov

Authors

  1. Marcos Díaz-Gay
  2. Tongwu Zhang
  3. Phuc H. Hoang
  4. Charles Leduc
  5. Marina K. Baine
  6. William D. Travis
  7. Lynette M. Sholl
  8. Philippe Joubert
  9. Azhar Khandekar
  10. Wei Zhao
  11. Christopher D. Steele
  12. Burçak Otlu
  13. Shuvro P. Nandi
  14. Raviteja Vangara
  15. Erik N. Bergstrom
  16. Mariya Kazachkova
  17. Oriol Pich
  18. Charles Swanton
  19. Chao Agnes Hsiung
  20. I-Shou Chang
  21. Maria Pik Wong
  22. Kin Chung Leung
  23. Jian Sang
  24. John P. McElderry
  25. Caleb Hartman
  26. Frank J. Colón-Matos
  27. Mona Miraftab
  28. Monjoy Saha
  29. Olivia W. Lee
  30. Kristine M. Jones
  31. Pilar Gallego-García
  32. Yang Yang
  33. Xiaoming Zhong
  34. Eric S. Edell
  35. Jacobo Martínez Santamaría
  36. Matthew B. Schabath
  37. Sai S. Yendamuri
  38. Marta Manczuk
  39. Jolanta Lissowska
  40. Beata Świątkowska
  41. Anush Mukeria
  42. Oxana Shangina
  43. David Zaridze
  44. Ivana Holcatova
  45. Dana Mates
  46. Sasa Milosavljevic
  47. Millica Kontic
  48. Yohan Bossé
  49. Bonnie E. Gould Rothberg
  50. David C. Christiani
  51. Valerie Gaborieau
  52. Paul Brennan
  53. Geoffrey Liu
  54. Paul Hofman
  55. Lixing Yang
  56. Martin A. Nowak
  57. Jianxin Shi
  58. Nathaniel Rothman
  59. David C. Wedge
  60. Robert Homer
  61. Soo-Ryum Yang
  62. Angela C. Pesatori
  63. Dario Consonni
  64. Qing Lan
  65. Bin Zhu
  66. Stephen J. Chanock
  67. Jiyeon Choi
  68. Ludmil B. Alexandrov
  69. Maria Teresa Landi

Contributions

Conceptualization: M.D.-G., T.Z., L.B.A. and M.T.L. Methodology: M.D.-G., T.Z., L.Y., J. Shi, D.C.W., B.Z., L.B.A. and M.T.L. Formal analysis: M.D.-G., T.Z., P.H.H., A.K., W.Z., C.D.S., B.O., S.P.N., R.V., E.N.B., M. Kazachkova, J. Sang, J.P.M., C.H., O.W.L., K.M.J., P.G.-G., Y.Y., X.Z., L.Y., M.A.N., J. Shi, B.Z. and J.C. Pathology work: C.L., M.K.B., W.D.T., L.M.S., P.J., R.H. and S.-R.Y. Resources: O.P., C.S., C.A.H., I.-S.C., M.P.W., K.C.L., E.S.E., J.M.S., M.B.S., S.S.Y., M. Manczuk, J.L., B.S., A.M., O.S., D.Z., I.H., D.M., S.M., M. Kontic, Y.B., B.E.G.R., D.C.C., V.G., P.B., G.L., P.H., N.R., A.C.P., D.C., Q.L., S.J.C. and M.T.L. Data curation: P.H.H., T.Z., F.J.C.-M., M. Miraftab, M.S. and O.W.L. Writing (original draft): M.D.-G., T.Z., L.B.A. and M.T.L. Writing (review and editing): M.D.-G., T.Z., C.S., L.Y., M.A.N., D.C.W., B.Z., S.J.C., J.C., L.B.A. and M.T.L. Visualization: M.D.-G., T.Z., L.B.A. and M.T.L. Supervision: L.B.A. and M.T.L.

Corresponding authors

Correspondence to Ludmil B. Alexandrov or Maria Teresa Landi.

Ethics declarations

Competing interests

L.B.A. is a co-founder, CSO, scientific advisory member and consultant for io9, has equity and receives income. The terms of this arrangement have been reviewed and approved by the University of California San Diego in accordance with its conflict-of-interest policies. L.B.A. is also a compensated member of the scientific advisory board of Inocras. L.B.A.’s spouse is an employee of Biotheranostics. E.N.B. and L.B.A. declare a US provisional patent application filed with the University of California San Diego with serial number 63/269,033. L.B.A. also declares US provisional applications filed with the University of California San Diego with serial numbers 63/366,392, 63/289,601, 63/483,237, 63/412,835 and 63/492,348. L.B.A. is also an inventor of US patent 10,776,718 for source identification by non-negative matrix factorization. L.B.A. and M.D.-G. further declare a European patent application with application number EP25305077.7. S.-R.Y. has received consulting fees from AstraZeneca, Sanofi, Amgen and AbbVie, and speaking fees from AstraZeneca, Medscape, PRIME Education and Medical Learning Institute. The remaining authors declare no competing interests.

Peer review

Peer review information

Nature thanks the anonymous reviewers for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Association of mutational signature prevalence and driver mutations with geographical region, biological sex and EGFR mutation status in LCINS adenocarcinoma cases.

a, DBS, ID, CN and SV mutational signatures enrichment analysis with geographical regions. Horizontal lines marking statistically significant thresholds were included at 0.05 (dashed orange line) and 0.01 FDR value levels (dashed red line). Blue-coloured signatures were enriched in North American and European patients, whereas red-coloured signatures were enriched in East Asian patients. Statistical significance was evaluated using multivariable logistic regression models for geographical regions and adjusted by age, sex, and tumour purity. b, SBS, DBS, ID, CN, and SV mutational signatures enrichment analysis with biological sexes. Blue-coloured signatures were enriched in males, whereas red-coloured signatures were enriched in females. Statistical significance was evaluated using multivariable logistic regression models for biological sex and adjusted by age, genetic ancestry, and tumour purity. c–e, Detail of the enrichment of EGFR (c), TP53 (d) and KRAS (e) driver mutations in North American and European versus East Asian LCINS adenocarcinoma cases. f, Driver mutations enrichment analysis with biological sexes. Blue-coloured genes were enriched in males, whereas red-coloured genes were enriched in females. Statistical significance was evaluated using multivariable logistic regression models for biological sex and adjusted by age, genetic ancestry, and tumour purity. g, Quantification of the tumour mutational burden for TP53 wild-type and mutant tumours across EGFR mutation status (n = 271 TP53 wild-type EGFR wild-type, n = 241 TP53 wild-type EGFR mutant, n = 81 TP53 mutant EGFR wild-type, n = 144 TP53 mutant EGFR mutant). Statistical significance was evaluated using a multivariable linear regression model for EGFR mutation status and adjusted by age, sex, ancestry, and tumour purity. The line within the box indicates the median, the lower and upper ends indicate the 25th and 75th percentiles, whiskers show 1.5 × interquartile range, and values outside are shown as individual data points. h, SBS, DBS, ID, CN and SV mutational signatures enrichment analysis with EGFR mutation status. Blue-coloured signatures were enriched in EGFR mutant tumours, whereas red-coloured signatures were enriched in EGFR wild-type tumours. Statistical significance was evaluated using multivariable logistic regression models for EGFR mutation status and adjusted by age, sex, genetic ancestry, and tumour purity.
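The box plot convention used in these legends (median line, box ends at the 25th and 75th percentiles, whiskers at 1.5 × interquartile range, remaining values drawn individually) can be sketched as follows. This is a minimal illustration rather than the authors' plotting code. It assumes the common Tukey convention in which each whisker ends at the most extreme observation still inside the 1.5 × IQR fence, and it assumes linear-interpolation quartiles. The function name `box_plot_summary` and the input values are hypothetical.

```python
import statistics


def box_plot_summary(values):
    """Summarize values the way a Tukey-style box plot would draw them.

    Assumption: quartiles use linear interpolation ("inclusive" method),
    and whiskers stop at the most extreme point within 1.5 x IQR.
    """
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Points inside the fences determine where the whiskers end.
    inside = [v for v in data if low_fence <= v <= high_fence]
    # Points outside the fences are drawn as individual data points.
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": outliers,
    }


# Made-up burden values, not taken from the study.
summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(summary)
```

With these made-up values, the single extreme observation falls outside the upper fence and would be plotted as an individual point, while the upper whisker ends at the largest remaining value.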

Extended Data Fig. 2 Genomic differences between lung cancers from smokers and lung cancers from never-smokers, and landscape of 56 LCINS tumours that exhibit SBS4 activity.

a, Differences between smoker and never-smoker lung cancer cases across SBS signatures. Volcano plot (top) indicating the enrichment of SBS signature prevalence in never-smokers (left) and smokers (right) with lung cancer. Statistically significant enrichments were evaluated using multivariable logistic regression models for smoking status and adjusted by age, sex, histology, genetic ancestry, and tumour purity. Firth’s bias-reduced logistic regressions were used for regressions presenting complete or quasi-complete separation. P-values were adjusted for multiple comparisons based on the total number of mutational signatures considered, and adjusted p-values were reported as FDR values. Horizontal lines marking statistically significant thresholds were included at 0.05 (dashed orange line) and 0.01 FDR levels (dashed red line). Bar plot (bottom) indicating prevalence by smoking history. b, Tumour mutational burden differences between SBS4-positive (n = 56) and negative (n = 815) LCINS tumours for SBS, DBS, ID, CN segments, and SV events. Statistical significance was evaluated using two-sided Wilcoxon rank sum tests. The line within the box indicates the median, the lower and upper ends indicate the 25th and 75th percentiles, whiskers show 1.5 × interquartile range, and values outside are shown as individual data points. c–e, Mutational signature landscape for SBS (c), DBS (d) and ID (e) mutation types, including absolute and relative number of mutations assigned to each mutational signature, unsupervised clustering based on the signature contributions, and sample-level annotations of sex, genetic ancestry, passive smoking, and accuracy of signature reconstruction based on cosine similarity. f, Driver mutations landscape, including different types of genomic alterations, as well as sample-level annotations of sex, genetic ancestry, histology, and tumour purity. 
g, Enrichment of EGFR p.L858R hotspot driver mutations in SBS4-positive tumours from never-smokers compared to smokers using multivariable logistic regressions considering clinical and epidemiological covariates, including age, sex, genetic ancestry, histology, and tumour purity (n = 5 mutated non-smoker cases, n = 1 mutated smoker case, n = 51 non-smoker wild-type cases, n = 301 smoker wild-type tumours). Error bars indicate 95% CIs.
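The multiple-comparison adjustment mentioned for panel a, in which p-values are adjusted over the total number of signatures tested and reported as FDR values, can be illustrated with a short sketch. The legend does not name the adjustment procedure, so the Benjamini-Hochberg method used here is an assumption. The function name `benjamini_hochberg` and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (FDR values).

    Assumption: the legend only says "FDR"; BH is one standard choice
    and may differ from the procedure the authors applied.
    """
    m = len(p_values)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up p-values for four hypothetical signatures, not study results.
raw = [0.01, 0.04, 0.03, 0.005]
fdr = benjamini_hochberg(raw)
for p, q in zip(raw, fdr):
    print(f"p = {p:.3f}  FDR = {q:.3f}  below 0.05: {q < 0.05}  below 0.01: {q < 0.01}")
```

In this toy input every test clears the 0.05 threshold but none clears 0.01, which corresponds to points lying between the dashed orange and dashed red lines of a volcano plot.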

Extended Data Fig. 3 Topographical characteristics of 56 LCINS and 68 lung cancers from smokers exhibiting SBS4 activity.

a,b, Distribution of SBS4 mutations with replication timing in our cohort of never-smokers (a) and in the smokers from the PCAWG cohort (b). Data are separated into deciles, with each segment harbouring 10% of the observed replication time signal in the x-axis, and the normalized mutational density displayed in the y-axis. Black dashed lines represent the behaviour of simulated mutations. c,d, Association of SBS4 mutations with nucleosome occupancy in never-smokers (c) and smokers (d). The solid blue line represents real somatic mutations, whereas the dashed grey line indicates the distribution of simulated mutations. Both lines show the average nucleosome signal in the y-axis, using a genomic window of 2 kilobases centred around the SBS4-associated mutations in the x-axis. e,f, Strand asymmetry of SBS4-associated mutations in comparison to simulations and considering lagging and leading DNA strands, transcribed and untranscribed DNA regions and genic and intergenic genomic locations in never-smokers (e) and smokers (f). The number of circles represents the odds ratio and the colour the corresponding strand/region of statistically significant asymmetries.
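The decile view in panels a and b can be sketched roughly as follows. This is a simplified illustration under stated assumptions and not the topography tool used in the study. It assumes the genome has already been split into regions that each carry a replication timing signal and a mutation count, that deciles hold equal numbers of regions, and that density is normalized so the ten values sum to 1. The function name `decile_mutation_density` and the toy input are hypothetical.

```python
def decile_mutation_density(regions, n_bins=10):
    """Bin regions by replication signal and return normalized mutation density.

    regions: list of (replication_signal, mutation_count) tuples.
    Assumptions: equal-count bins over regions sorted by signal, and
    normalization by the total mutation count (values sum to 1).
    """
    if len(regions) < n_bins:
        raise ValueError("need at least one region per bin")
    ordered = sorted(regions, key=lambda region: region[0])
    n = len(ordered)
    counts = []
    for k in range(n_bins):
        # Integer arithmetic keeps bins contiguous and nearly equal in size.
        chunk = ordered[k * n // n_bins:(k + 1) * n // n_bins]
        counts.append(sum(mutations for _, mutations in chunk))
    total = sum(counts)
    if total == 0:
        raise ValueError("no mutations to normalize")
    return [count / total for count in counts]


# Toy input: 20 regions whose mutation count grows with the signal value.
toy_regions = [(signal, signal) for signal in range(20)]
for decile, density in enumerate(decile_mutation_density(toy_regions), start=1):
    print(f"decile {decile}: {density:.3f}")
```

Applying the same function to simulated mutations would give the reference profile that the black dashed lines represent, so the observed and simulated profiles can be compared decile by decile.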

Extended Data Fig. 4 Influence of passive smoking on the landscape of ID, DBS, CN and SV signatures in LCINS.

a–e, Differences in DBS, ID, CN, and SV burden using univariate comparisons based on two-sided Wilcoxon rank sum tests (a) as well as multivariable linear regressions considering clinical and epidemiological covariates (b–e), including age, sex, genetic ancestry, and tumour purity (n = 250 passive smokers, n = 208 non-passive smokers). The line within the box indicates the median, the lower and upper ends indicate the 25th and 75th percentiles, whiskers show 1.5 × interquartile range, and values outside are shown as individual data points (a). Error bars indicate 95% CIs (b–e). f, Enrichment of mutational signatures derived from DBS, ID, CN, and SV alterations. Horizontal lines marking statistically significant thresholds were included at 0.05 (dashed orange line) and 0.01 FDR value levels (dashed red line). Statistical significance was evaluated using multivariable logistic regression models for passive smoking history and adjusted by age, sex, genetic ancestry, histology and tumour purity.

Extended Data Fig. 5 Effects of PM2.5 exposure in large genomic alterations in LCINS.

a,b, Differences in the number of CN segments and SV events using univariate comparisons based on two-sided Wilcoxon rank sum tests (a) as well as multivariable linear regressions, considering clinical and epidemiological covariates (b), including age, sex, genetic ancestry, histology, and tumour purity, for patients diagnosed in geographical regions with high and low PM2.5 exposure levels (threshold defined at 20 μg m−3; n = 440 high-pollution group, n = 413 low-pollution group; only samples for which the country of origin was known are included). The line within the box indicates the median, the lower and upper ends indicate the 25th and 75th percentiles, whiskers show 1.5 × interquartile range, and values outside are shown as individual data points (a). Error bars indicate 95% CIs (b). c, Volcano plots indicating enrichment of mutational signatures derived from CN and SV alterations. Horizontal lines marking statistically significant thresholds were included at 0.05 (dashed orange line) and 0.01 FDR value levels (dashed red line). Statistical significance was evaluated using multivariable logistic regression models for PM2.5 exposure levels and adjusted by age, sex, genetic ancestry, histology, and tumour purity.

Extended Data Fig. 6 Assignment of mutational signatures and estimation of telomere length using data from control and PM2.5-exposed mice.

a,b, Box plots comparing the mutations assigned to SBS5 (a) and the estimations for the telomere length ratio between the tumour and normal samples (b). Two-sided Student’s t-tests were used to calculate statistical significance. The line within the box indicates the median, the lower and upper ends indicate the 25th and 75th percentiles, and whiskers show 1.5 × interquartile range. n = 5 (control mice) and n = 5 (PM2.5-exposed mice); data from a previous study (ref. 49).

Extended Data Fig. 7 Mutagenic effects of PM2.5 exposure in LCINS cases excluding SBS4 contributions.

a, Quantification of SBS burden excluding SBS4 mutations for patients living in geographical regions with high and low PM2.5 exposure levels (threshold defined at 20 μg m−3; n = 440 high-pollution group, n = 413 low-pollution group; only samples for which the country of origin was known are included). Statistical significance was evaluated using two-sided Wilcoxon rank sum tests. The line within the box indicates the median, the lower and upper ends indicate the 25th and 75th percentiles, whiskers show 1.5 × interquartile range, and values outside are shown as individual data points. b, Forest plot corresponding to a multivariable linear regression considering high or low PM2.5 exposure group, age, sex, genetic ancestry, histology and tumour sample purity as covariates and SBS burden as dependent variable (threshold defined at 20 μg m−3; n = 440 high-pollution group, n = 413 low-pollution group; only samples for which the country of origin was known are included). Error bars indicate 95% CIs. c, Scatter plot showing a significant correlation between individual sample estimates of PM2.5 exposure and SBS burden. Statistical significance was evaluated using a multivariable linear regression of the individual PM2.5 estimates per sample and mutation burden (log10 scale), and adjusted by age, sex, genetic ancestry, histology and tumour purity. Blue lines and bands indicate univariate linear regressions and 95% CIs for average mutation burden versus average PM2.5 estimates.

About this article

Cite this article

Díaz-Gay, M., Zhang, T., Hoang, P.H. et al. The mutagenic forces shaping the genomes of lung cancer in never smokers. Nature (2025). https://doi.org/10.1038/s41586-025-09219-0

  • Received: 16 May 2024

  • Accepted: 30 May 2025

  • Published: 02 July 2025

  • DOI: https://doi.org/10.1038/s41586-025-09219-0
