

A Cloud-Based Data Democratization Tool

New NIH data platform makes it easier for underserved groups and researchers to study biomedical topics, including COVID-19.


To accelerate efficient biomedical research by making innovative data more accessible.


In partnership with the University of North Carolina (UNC) Renaissance Computing Institute (RENCI), we serve as the coordinating center for BioData Catalyst, a cloud-based platform to advance therapeutics and prevention for heart, lung, blood, and sleep disorders.


BioData Catalyst provides an accessible data platform, making studies more available to researchers and clinicians studying contributing factors of disease, including COVID-19. It sets a new standard for federally funded data democratization and modernization tools.

The field of biomedical research has generated an enormous and increasing amount of data in recent years. Somewhere in the data, scientists will find valuable answers to the greatest health challenges of our time, including COVID-19.

However, researchers will never find these answers if they aren’t able to access and interpret the data. Currently, they must operate across a vast data field to develop new tools for testing, treating, and preventing disease. Researchers must rely on relatively new, high-capacity computing equipment, leaving certain underrepresented groups and smaller research centers unable to participate in studies. In addition, once a researcher has downloaded large files, they often lack the tools to properly analyze and use the data in a timely manner. In short, there is more information than ever before, but it is harder to find and analyze, even for experienced investigators.

The National Heart, Lung, and Blood Institute (NHLBI), part of the NIH, confronted this issue. It funded high-value data sets but realized the data wasn’t usable for most researchers and clinicians because they lacked the computing infrastructure to analyze the data, or even the storage space to download it. Heart, lung, blood, and sleep (HLBS) researchers and clinicians also saw a drastic increase in the volume of genomic data, largely driven by the genomic revolution in the study of the human genome and DNA. The NHLBI recognized this lack of accessibility and formulated a plan to “democratize” data: to make digital information available and usable for average non-technical users in the scientific community. The solution had to be cost-effective and compatible with standard computers.

Creating BioData Catalyst for Data Democratization

To address this data accessibility gap, the NHLBI developed BioData Catalyst, a user-friendly platform that operates in a cloud environment, allowing partners to share and use large sets of information with relative ease. The platform functions similarly to an online shopping cart: scientists select the tools and data sets they need and work with them inside a single system. The BioData Catalyst platform offers services for both early-career scientists and experienced principal investigators. While the platform is available to anyone, most of its data is available only to those who apply for and are granted access, because the information comes from controlled human studies.

The BioData Catalyst mission is to “develop and integrate advanced cyberinfrastructure, leading-edge tools, and FAIR (findable, accessible, interoperable, and reusable) data to support the NHLBI research community and accelerate discovery,” said Rebecca Boyles, RTI Director, Center for Data Modernization Solutions. “It is exciting to be part of an innovative effort to create a data ecosystem that will expedite scientific discovery and hasten the development of new treatments and prevention strategies.”

RTI Co-Leads Coordination for BioData Catalyst Platform

The BioData Catalyst ecosystem is a joint effort of the NHLBI and data science experts in academic institutions and research organizations. The NHLBI funded five teams, including the Renaissance Computing Institute (RENCI) and an RTI team led by the Research Computing Division, which together serve as the BioData Catalyst Coordinating Center.

“We have a long history collaborating with RENCI to create data organization and solutions for the NIH,” said Boyles. “As a team, we oversaw the data collection and organization for the NIH Helping to End Addiction Long-term (HEAL) Initiative.”

As Co-Principal Investigator of the BioData Catalyst Coordinating Center, Boyles works with Stan Ahalt, RENCI Director, to bring together scientists who usually compete but now work together for BioData Catalyst’s success. “We talk a lot about putting aside our own affiliations to drive BioData Catalyst forward,” said Boyles. “We operate on the principle of inclusion, respecting all voices, and making all work as transparent as possible.”

BioData Catalyst now offers over 1,200 different analytical tools and hosts more than three petabytes of data. As part of this collaborative team, we successfully hosted NHLBI’s precision medicine data sets, called TOPMed. Our team also uploaded COVID-19 clinical trials and data on sickle cell disease. We oversaw a peer review for scientific career investigators to use the platform and have funded three rounds of fellows in a program that engages early-career researchers with the BioData Catalyst platform and its datasets and tools.

BioData Catalyst Provides Information to Treat COVID-19

Currently, RTI and its partners are working to increase the types of data available on the BioData Catalyst platform. The most immediate updates to the platform addressed the COVID-19 pandemic. Because the novel virus affects the respiratory system, it is essential for researchers and clinicians to have quick access to lung images and chest CT scans. These scans are often too large to share by email, so researchers have resorted to mailing them on hard drives. BioData Catalyst is working to address this challenge, and our team is working with our NIH client to drive innovation that enables novel analysis.

BioData Catalyst Setting the Foundation for Similar Data Democratization Tools

Our collaborative work with BioData Catalyst sets a new standard for federally funded data democratization and data modernization. Its tools and services also work towards including more diverse participants and audiences. Researchers are no longer dependent on advanced computing systems, funds, or tools to analyze large sets of data.

The BioData Catalyst platform has directly accelerated users’ research. Prior to its launch, downloading and analyzing the data for a study could take months. Now, it takes only a few days. For instance, one researcher processed over 5,000 samples in six days.

“Analyzing large sequencing datasets in a short amount of time is not only possible but also straightforward with the NHLBI BioData Catalyst,” said Dr. Jean Monlong, a former BioData Catalyst fellow from the University of California, Santa Cruz. “Without it, I would have to spend months downloading huge amounts of data and potentially flooding our computer servers with my analysis.”

BioData Catalyst’s success also led our experts and collaborators at RENCI to assist the NIH with similar cloud-based data programs. While our team constantly tests and monitors BioData Catalyst to make sure it operates smoothly for researchers, we also look for ways to address gaps within the platform and create solutions. For example, BioData Catalyst’s initial platform did not help researchers find data based on components of the study method. The team envisioned a search application, DUG, which allows users to understand the scope of different studies. This tool is being implemented across different NIH cloud platforms so that a researcher can navigate between them seamlessly and conduct analysis.

These and other aspects of BioData Catalyst create the foundation for future cloud-based services in the biomedical research space. We share NIH’s goal of making data available across various platforms and welcoming a wider audience of researchers and clinicians. Data modernization and democratization are a step toward health solutions that lead to a better future, because they increase the number of people able to ask essential questions in the field of biomedical research.