Assessing gene-environment interactions in genome-wide association studies: Statistical approaches

By Phillip Cooley, Robert Clark, R Folsom

In this report, we address a scenario that uses synthetic genotype case-control data that is influenced by environmental factors in a genome-wide association study (GWAS) context. The precise way the environmental influence contributes to a given phenotype is typically unknown. Therefore, our study evaluates how to approach a GWAS that may have an environmental component. Specifically, we assess different statistical models in the context of a GWAS to make association predictions when the form of the environmental influence is uncertain. We used a simulation approach to generate synthetic data corresponding to a variety of possible environmental-genetic models, including a “main effects only” model as well as a “main effects with interactions” model. Our method takes into account the strength of the association between phenotype and both genotype and environmental factors, but we focus on low genetic and environmental risks that necessitate using large sample sizes (N = 10,000 and 200,000) to predict associations with high levels of confidence. We also simulated different Mendelian gene models, and we analyzed how the collection of factors influences statistical power in the context of a GWAS. Using simulated data provides a “truth set” of known outcomes such that the association-affecting factors can be unambiguously determined. We also test different statistical methods to determine their performance properties. Our results suggest that the chances of predicting an association in a GWAS are reduced if an environmental effect is present and the statistical model does not adjust for that effect. This is especially true if the environmental effect and genetic marker do not have an interaction effect. The functional form of the statistical model also matters. The more accurately the form of the environmental influence is portrayed by the statistical model, the more accurate the prediction will be. Finally, even with very large sample sizes, association predictions involving recessive markers with low risk can be poor.
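As a rough illustration of the kind of synthetic data described above, the following Python sketch draws subjects under a logistic risk model with a genotype term, an environmental exposure term, and an optional interaction term. It is not the authors' simulation code. The logistic form, the Hardy-Weinberg genotype draw, the binary exposure, and every name and parameter (`maf`, `exposure_prev`, `beta_g`, `beta_e`, `beta_ge`, and the function names) are assumptions made for illustration only. The sketch also draws a population sample and labels each subject as case or control; it does not perform balanced case-control sampling.

```python
import math
import random


def genotype_score(n_risk_alleles, gene_model):
    """Code a genotype (0, 1, or 2 risk alleles) under a Mendelian gene model."""
    if gene_model == "additive":
        return float(n_risk_alleles)
    if gene_model == "dominant":
        return 1.0 if n_risk_alleles >= 1 else 0.0
    if gene_model == "recessive":
        return 1.0 if n_risk_alleles == 2 else 0.0
    raise ValueError(f"unknown gene model: {gene_model}")


def simulate_subject(rng, maf, exposure_prev, beta0, beta_g, beta_e,
                     beta_ge, gene_model):
    """Draw one subject: (risk-allele count, exposure flag, case flag)."""
    # Assumed: Hardy-Weinberg equilibrium, i.e. two independent allele draws.
    n_risk = (rng.random() < maf) + (rng.random() < maf)
    # Assumed: a single binary environmental exposure.
    exposed = 1 if rng.random() < exposure_prev else 0
    g = genotype_score(n_risk, gene_model)
    # Assumed logistic risk model; beta_ge = 0 gives a "main effects only"
    # form, and beta_ge != 0 adds a gene-environment interaction.
    logit = beta0 + beta_g * g + beta_e * exposed + beta_ge * g * exposed
    p_case = 1.0 / (1.0 + math.exp(-logit))
    is_case = 1 if rng.random() < p_case else 0
    return n_risk, exposed, is_case


def simulate_cohort(n, seed=0, **params):
    """Draw n subjects with a seeded generator so runs are reproducible."""
    rng = random.Random(seed)
    return [simulate_subject(rng, **params) for _ in range(n)]
```

Calling `simulate_cohort(10000, maf=0.2, exposure_prev=0.3, beta0=-2.0, beta_g=0.2, beta_e=0.3, beta_ge=0.0, gene_model="recessive")` would produce one hypothetical "main effects only" data set; the parameter values here are placeholders, not values taken from the report. Because the generating model is known, any statistical method applied to such data can be checked against the true effects.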


Cooley, P., Clark, R., & Folsom, R. (2014). Assessing gene-environment interactions in genome-wide association studies: Statistical approaches. (RTI Press Publication No. RR-0022-1405). Research Triangle Park, NC: RTI Press.

© 2019 RTI International. This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.


Philip C. Cooley, MS, Senior Fellow in bioinformatics and high-performance computing, is a principal scientist with more than 50 years of experience developing computer models for the study of environmental health and infectious and chronic disease. Cooley has designed and implemented a series of influenza transmission models for the study and management of pandemic flu. He has also designed a model to study the double burden of malnutrition in Indonesia. His current research includes an assessment of statistical methods for biomarker explorations in the context of genome-wide-analysis studies.

Robert F. Clark, PhD, is a senior genetic epidemiologist in RTI International’s Genetic Epidemiology and Omics research program. Throughout most of his career, he has focused on multidisciplinary work in the omics and systems biology of various multifactorial disorders, and since 1992, he has conducted many genetic studies of neurodegenerative diseases; breast, bone, and brain cancers; nicotine, heroin, and cocaine addiction; and aortic aneurysms.

Ralph Folsom, PhD, is an expert in the design and analysis of complex probability samples. Working on the nation's largest household survey (the National Survey on Drug Use and Health or NSDUH), Dr. Folsom initiated innovative weight adjustment methods based on his logistic response propensity and exponential poststratification models. This pioneering work led to the sophisticated GEM weight adjustment methods currently employed for NSDUH. Dr. Folsom also introduced model-based imputations for missing frequency of use and income data items, and he has been an influential collaborator in the development of NSDUH's current Predictive Mean Neighborhoods (PMN) imputation methodology. Dr. Folsom has recently led RTI's innovative work in small area estimation research. In addition to his innovative work on many complex survey efforts, Dr. Folsom has made significant contributions to the development of RTI's computer software for survey data analysis, SUDAAN.