A robust epigenetic classifier for smoking status inference using Illumina EPIC methylation data
Zhu, T., Faragó, T., Bollepalli, S., Heikkinen, A., Hukkanen, M., Raitakari, O., Lehtimäki, T., Korhonen, T., Kaprio, J., Fang, F., Lawrence, K. G., Sandler, D. P., Spildrejorde, M. R., Gervin, K., Pan, Y., Costeira, R., Bell, J. T., & Ollikainen, M. (2026). EpiSmokEr2: A robust epigenetic classifier for smoking status inference using Illumina EPIC methylation data. Epigenomics, 1-11. Advance online publication. https://doi.org/10.1080/17501911.2026.2630841
AIM: Tobacco smoking induces persistent DNA methylation (DNAm) changes in blood that can serve as long-term biomarkers for smoking exposure. We aimed to develop and validate a DNAm classifier of smoking status using Illumina EPIC array data.
METHODS: We built Epigenetic Smoking status Estimator2 (EpiSmokEr2), a Least Absolute Shrinkage and Selection Operator (LASSO) regression-based DNAm classifier using 511 CpGs from Illumina Infinium MethylationEPIC array (EPIC) data. The model was trained on 1343 samples from the Young Finns Study cohort and validated across six independent datasets from four cohorts and two array platforms (EPIC and EPICv2).
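As a rough illustration of how a sparse, LASSO-type classifier turns CpG methylation values into a predicted probability, the sketch below scores a sample with a small logistic model. It is not the EpiSmokEr2 implementation: the CpG names, weights, intercept, reference values, and the choice to substitute a reference value for a missing CpG are all hypothetical assumptions made for this example.

```python
import math

# Hypothetical sparse model: these CpG names and weights are invented for
# illustration and are NOT the 511 CpGs or coefficients of EpiSmokEr2.
INTERCEPT = -0.5
COEFS = {"cpg_a": -4.0, "cpg_b": 2.5, "cpg_c": 1.0}

# Hypothetical reference beta values, used here when a CpG is absent from the
# input. How the real package treats missing CpGs is not stated in the abstract.
REFERENCE = {"cpg_a": 0.8, "cpg_b": 0.3, "cpg_c": 0.5}


def predict_probability(betas):
    """Return the logistic probability of 'current smoker' for one sample.

    `betas` maps CpG name -> methylation beta value in [0, 1].
    CpGs missing from `betas` fall back to the reference value.
    """
    score = INTERCEPT
    for cpg, weight in COEFS.items():
        value = betas.get(cpg)
        if value is None:
            value = REFERENCE[cpg]
        score += weight * value
    # Logistic link converts the linear score into a probability.
    return 1.0 / (1.0 + math.exp(-score))


sample = {"cpg_a": 0.3, "cpg_b": 0.6, "cpg_c": 0.5}
print(round(predict_probability(sample), 3))
```

Because LASSO shrinks most coefficients to exactly zero, only the retained CpGs need to be looked up at prediction time, which is why a dictionary of non-zero weights is enough for this sketch.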
RESULTS: EpiSmokEr2 achieved an average sensitivity of 0.87 and specificity of 0.86 in distinguishing current from never smokers. Predicted smoking status correlated strongly with established DNAm smoking scores and GrimAge, indicating its ability to capture biologically relevant smoking effects. Simulation analysis showed that EpiSmokEr2 remained robust with up to 10% of CpGs missing.
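For readers less familiar with the metrics reported above, the following minimal sketch shows how sensitivity and specificity are computed when "current" is treated as the positive class and "never" as the negative class. The function name and the toy labels are assumptions for illustration only and are not data from the study.

```python
def sensitivity_specificity(actual, predicted, positive="current"):
    """Compute (sensitivity, specificity) for a two-class problem.

    Sensitivity = true positives / all actual positives.
    Specificity = true negatives / all actual negatives.
    """
    tp = fn = tn = fp = 0
    for a, p in zip(actual, predicted):
        if a == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        else:
            if p == positive:
                fp += 1
            else:
                tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative sample")
    return tp / (tp + fn), tn / (tn + fp)


# Toy labels, invented for this example.
actual = ["current", "current", "current", "never", "never", "never", "never"]
predicted = ["current", "current", "never", "never", "never", "never", "current"]

sens, spec = sensitivity_specificity(actual, predicted)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```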
CONCLUSION: EpiSmokEr2 provides a reliable DNAm-based estimator of smoking status. It is available as an open-source R package on GitHub, facilitating broad use in epidemiological and clinical research.