Latest Census Data Add Realism to Infectious Disease Modeling

RESEARCH TRIANGLE PARK, N.C. — Researchers at RTI International have added a new level of realism to simulation tools used by researchers and public health officials nationwide to study and deal with infectious disease outbreaks in the United States.

Simulations of the type that use the new data are called agent-based models or ABMs. These models rely on computerized representations of populations—or "synthetic" populations—that accurately reflect the size and demographic characteristics of communities across the nation.

RTI's U.S. Synthetic Population Database, which is free to the public, was originally developed in 2007 by a team of geographers and computer scientists collaborating on the Models of Infectious Disease Agent Study (MIDAS), funded by the National Institute of General Medical Sciences. The original synthetic database used 2000 census data, while the updated database uses information from the 2005–2009 American Community Survey, an ongoing survey by the U.S. Census Bureau.

By basing the new synthetic population on recent U.S. census data, RTI researchers have significantly enhanced the accuracy of the populations used by these types of models.

"The new data, based on recent census information, represent more realistically the attributes of American communities and the characteristics of people and households within them," said Bill Wheaton, director of RTI's Geospatial Science and Technology Program. "The more accurate the underlying synthesized agent-based population is, the greater the potential that models accurately simulate social processes. With this synthetic population update, researchers can be confident that they are representing the U.S. population accurately in their models."

The updated database contains a computerized representation of every household in the United States (more than 112 million). The data also include a modeled latitude/longitude coordinate for each household, household income, household size, and the sex, race, and age of household occupants. Although the database is not an actual map of U.S. households, it represents the distribution of U.S. households.
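As a rough illustration of the attributes listed above, one household record could be represented along the following lines. This is a minimal sketch only: the class names, field names, and sample values are hypothetical and do not reflect the actual schema or contents of the RTI database.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Person:
    # Attributes the article says are recorded for household occupants.
    age: int
    sex: str
    race: str


@dataclass
class Household:
    # Modeled (not actual) coordinate for the household.
    latitude: float
    longitude: float
    income: int
    occupants: List[Person] = field(default_factory=list)

    @property
    def size(self) -> int:
        # Household size is derived here from the occupant list (an assumption).
        return len(self.occupants)


# Made-up sample values, for illustration only.
example = Household(
    latitude=35.9,
    longitude=-78.9,
    income=52000,
    occupants=[Person(34, "F", "white"), Person(36, "M", "white"), Person(5, "F", "white")],
)
print(example.size)
```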

The database is derived from publicly available data sources and does not contain any information that would allow specific families or individuals to be identified.

"These synthetic populations, which were developed by RTI with support from the MIDAS program, have proven to be extremely valuable tools for examining policy options for preparing for and responding to infectious disease outbreaks," said Irene Eckstrand, Ph.D., who helps oversee the MIDAS program at the National Institutes of Health's National Institute of General Medical Sciences. "We are pleased that this new database is broadly available and will contribute to making better decisions to protect human health."

Agent-based models are used to simulate large-scale social systems and to test the effects of different policy scenarios on how people behave. While the U.S. Synthetic Population Database was created to support simulations of infectious diseases, the data can also be used in agent-based models that simulate the effects of policy changes on residential choice, how people might modify their transportation choices given different options for public transportation, or how changes in insurance coverage might affect adoption of various screening tests for disease.

RTI researchers recently used synthetic data to simulate the potential rate of electric vehicle purchases depending on the number and locations of charging stations.

"Data that capture the variability and detail of the real world are often confidential, but important for models to reflect complex processes," said Diane Wagener, Ph.D., a principal investigator at RTI, who helped develop the user interface. "The synthesized data that RTI has developed overcome this hurdle and enable researchers to develop useful models."

Phil Cooley, who has developed several RTI simulation models, said, "We simulate complex, large-scale social systems by assigning behaviors and activities to the synthetic agents to assess the consequences of the interactions of agents with each other and their environments."
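The idea of assigning behaviors to synthetic agents and observing their interactions can be sketched with a toy example. The code below is not one of RTI's or the MIDAS groups' models. It is a simplified illustration in which every name, rule, and probability is an assumption: agents live in households, an infection spreads more readily within a household than across the community, and infectious agents recover at a fixed daily rate.

```python
import random
from dataclasses import dataclass


@dataclass
class Agent:
    household_id: int
    state: str = "S"  # S = susceptible, I = infectious, R = recovered


def step(agents, rng, p_household=0.3, p_community=0.02, p_recover=0.2):
    """Advance the toy epidemic by one day, updating agents in place."""
    infectious = [a for a in agents if a.state == "I"]
    infectious_households = {a.household_id for a in infectious}
    prevalence = len(infectious) / len(agents)

    # Compute all new states first so updates within a day are simultaneous.
    new_states = []
    for a in agents:
        if a.state == "S":
            # Community risk scales with the share of infectious agents.
            risk = p_community * prevalence
            if a.household_id in infectious_households:
                # Combine with the extra risk from an infectious housemate.
                risk = 1 - (1 - risk) * (1 - p_household)
            new_states.append("I" if rng.random() < risk else "S")
        elif a.state == "I":
            new_states.append("R" if rng.random() < p_recover else "I")
        else:
            new_states.append("R")

    for a, s in zip(agents, new_states):
        a.state = s


def run(n_households=200, household_size=3, days=30, seed=1):
    """Run the toy model and return the agents plus daily infectious counts."""
    rng = random.Random(seed)
    agents = [Agent(h) for h in range(n_households) for _ in range(household_size)]
    agents[0].state = "I"  # seed a single infection
    history = []
    for _ in range(days):
        step(agents, rng)
        history.append(sum(a.state == "I" for a in agents))
    return agents, history


agents, history = run()
print(history)
```

In a study of the kind described in this article, the agents and households would come from a synthetic population rather than being generated uniformly, and a policy scenario such as a school closure would be expressed as a change to the contact rules.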

Available free of charge and easily downloadable, the data are provided to researchers and the public health community to support the development of complex computational models that will improve the human condition, which is RTI's fundamental mission.

"The MIDAS research program has shown that detailed and realistic computer simulation models can be important new tools to evaluate ways to respond to public health challenges such as the H1N1 pandemic," said John Grefenstette, Ph.D., director of University of Pittsburgh's Public Health Dynamics Laboratory, a MIDAS research group. "Our team has performed several studies using the RTI U.S. synthetic population, evaluating vaccination policies, school closure policies, and the effects of subway commuting. These studies rely on having an up-to-date model of the U.S. population, and we are delighted that RTI has continued to provide this critical database to the research community."

The researchers also plan to enhance this version of the database by creating linkages between school-aged children in the database and actual schools. Creation of the synthesized databases has been streamlined to the point where a new database may be generated each year to coincide with yearly American Community Survey five-year estimates.


  • RTI International has updated the U.S. Synthetic Population Database using data from the 2005–2009 American Community Survey
  • The update will improve accuracy of simulations that test the effects of different events and policy scenarios on how people behave
  • The database contains a computerized representation of every household in the United States