The influence of errors inherent in genome-wide association studies (GWAS) in relation to single-gene models
Nearly one thousand human genome-wide association studies (GWAS) have examined over 210 diseases and traits and found over 1,200 SNP associations. With improved genotyping technologies and the growing number of available markers, case-control GWAS have become a key tool for investigating complex diseases. This study assesses the influence of genotype and diagnosis errors in GWAS by analyzing a synthetic gene dataset that incorporates factors known to influence association measurement. Monte Carlo methods were used to generate the synthetic data, which incorporated gene inheritance, relative risk level, disease penetrance, genotype distribution, and sample size, as well as the two error factors that are the focus of this study. The resulting dataset provides a truth set for assessing statistical method performance and association sensitivity.
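As an illustration of the kind of Monte Carlo generation described above, the following minimal sketch simulates a case-control genotype table for a single biallelic locus. It is not the study's actual procedure. All function and parameter names are hypothetical, and the sketch assumes Hardy-Weinberg genotype frequencies, a symmetric diagnosis error rate for cases and controls, and genotype errors that replace a call with one of the other two genotypes at random.

```python
import random


def genotype_freqs(p):
    """Hardy-Weinberg frequencies for 0, 1, 2 copies of the risk allele (assumed)."""
    q = 1.0 - p
    return [q * q, 2 * p * q, p * p]


def penetrances(f0, rr, model):
    """Disease probability per genotype under a simple inheritance model."""
    if model == "dominant":
        risks = [1.0, rr, rr]
    elif model == "recessive":
        risks = [1.0, 1.0, rr]
    elif model == "multiplicative":
        risks = [1.0, rr, rr * rr]
    else:
        raise ValueError("unknown inheritance model: " + model)
    # f0 is the baseline penetrance; cap at 1 so probabilities stay valid.
    return [min(1.0, f0 * r) for r in risks]


def conditional_freqs(p, f0, rr, model):
    """Genotype distributions given true case or true control status (Bayes' rule)."""
    g = genotype_freqs(p)
    f = penetrances(f0, rr, model)
    case = [gi * fi for gi, fi in zip(g, f)]
    ctrl = [gi * (1.0 - fi) for gi, fi in zip(g, f)]
    return [c / sum(case) for c in case], [c / sum(ctrl) for c in ctrl]


def simulate(n_cases, n_controls, p, f0, rr, model, geno_err, diag_err, rng):
    """Return observed genotype counts per label, with both error types applied."""
    case_dist, ctrl_dist = conditional_freqs(p, f0, rr, model)
    table = {"case": [0, 0, 0], "control": [0, 0, 0]}
    for label, n in (("case", n_cases), ("control", n_controls)):
        for _ in range(n):
            # Diagnosis error (assumed symmetric): the recorded label is wrong
            # with probability diag_err, so the true status is the opposite.
            misdiagnosed = rng.random() < diag_err
            truly_case = (label == "case") != misdiagnosed
            dist = case_dist if truly_case else ctrl_dist
            g = rng.choices([0, 1, 2], weights=dist)[0]
            # Genotype error (assumed uniform): the call is replaced by one
            # of the two other genotypes with probability geno_err.
            if rng.random() < geno_err:
                g = rng.choice([x for x in (0, 1, 2) if x != g])
            table[label][g] += 1
    return table


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the run is reproducible
    counts = simulate(1000, 1000, p=0.3, f0=0.05, rr=1.5, model="dominant",
                      geno_err=0.01, diag_err=0.05, rng=rng)
    print(counts)
```

Repeating such a simulation many times and applying an association test to each replicate gives an empirical power estimate, since the true model behind every replicate is known.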
Although the relationship between genotype and diagnosis error measures and loss of statistical power was previously understood, our results quantify and document its extent. Our results also demonstrate that for low-risk non-recessive loci, sample sizes in the range of 1,000 to 2,000 cases will achieve the 80% power threshold at a type I error level of 10⁻⁸, even with realistic genotype and phenotype error assumptions. Nevertheless, the sample size increase needed to compensate for power loss due to genotype and diagnosis errors should not be underestimated. Our estimates indicate that the required increase is in the range of 20% to 40%, depending on the gene inheritance model assumed.