An integrative data mining approach to identifying adverse outcome pathway signatures
The Adverse Outcome Pathway (AOP) framework is a tool for making biological connections and summarizing key information across different levels of biological organization to connect biological perturbations at the molecular level to adverse outcomes for an individual or population. Computational approaches to explore and determine these connections can accelerate the assembly of AOPs. By leveraging the wealth of publicly available data covering chemical effects on biological systems, computationally-predicted AOPs (cpAOPs) were assembled via data mining of high-throughput screening (HTS) in vitro data, in vivo data and other disease phenotype information. Frequent Itemset Mining (FIM) was used to find associations between the gene targets of ToxCast HTS assays and disease data from Comparative Toxicogenomics Database (CTD) by using the chemicals as the common aggregators between datasets. The method was also used to map gene expression data to disease data from CTD. A cpAOP network was defined by considering genes and diseases as nodes and FIM associations as edges. This network contained 18,283 gene to disease associations for the ToxCast data and 110,253 for CTD gene expression. Two case studies show the value of the cpAOP network by extracting subnetworks focused either on fatty liver disease or the Aryl Hydrocarbon Receptor (AHR). The subnetwork surrounding fatty liver disease included many genes known to play a role in this disease. When querying the cpAOP network with the AHR gene, an interesting subnetwork including glaucoma was identified. While substantial literature exists to support the potential for AHR ligands to elicit glaucoma, it was not explicitly captured in the public annotation information in CTD. The subnetwork from this analysis suggests a cpAOP that includes changes in CYP1B1 expression, which has been previously established in the literature as a primary cause of glaucoma. These case studies highlight the value in integrating multiple data sources when defining cpAOPs for HTS data.