Sandy Preiss

Research Data Scientist


MS, Analytics, Institute for Advanced Analytics at North Carolina State University
MA, International Studies, North Carolina State University
BA, Psychology, University of North Carolina at Chapel Hill


Alexander “Sandy” Preiss is a research data scientist who combines expertise in machine learning, natural language processing, and simulation modeling with domain knowledge in public health and international development. His knowledge and insights have been crucial in collaborations with federal and state agencies, including the National Institutes of Health (NIH), National Science Foundation (NSF), and the North Carolina Department of Health and Human Services (NC DHHS).

As a contributor to the NIH’s National COVID Cohort Collaborative (N3C) and Researching COVID to Enhance Recovery (RECOVER) Initiative, Sandy uses billions of electronic health records to help understand, treat, and prevent Long COVID. He uses causal inference methods like target trial emulation to study the real-world effectiveness of Long COVID treatments and preventatives. In addition, Sandy helped build machine learning models to predict whether and when patients develop Long COVID. In 2020, Sandy was part of an RTI team that developed weekly hospital demand forecasts for the NC DHHS, which were reviewed by the Secretary of Health and the Governor.

Sandy also applies modern natural language processing techniques to a wide variety of text data, including social media, scientific documents, and clinical notes. For an NIH-funded grant, he used Reddit data to study how people self-treat opioid withdrawal symptoms. He is lead data scientist for an NSF initiative to infer a taxonomy of research topics from dissertation abstracts. As generative language models have advanced rapidly in recent years, Sandy has used these models in several project contexts, including generating names for text clusters and extracting information from journal articles for a systematic review.

In the simulation realm, Sandy has worked on several agent-based and microsimulation projects. He ported a colorectal cancer microsimulation model from proprietary to open-source software and is using the model to study the cost-effectiveness of screening tests and guidelines for the Centers for Disease Control and Prevention and the National Cancer Institute. Sandy also extended an agent-based model of the North Carolina health care system to simulate nursing home staff networks, and used the model to study how staffing policies affect COVID-19 infection rates in nursing homes.

Prior to joining RTI, Sandy managed the evaluation of public health interventions at the American Cancer Society and Ipas. He helped diverse stakeholder groups use data to make decisions, develop strategies, and answer research questions.

Get in Touch

To speak to this expert or inquire about RTI services, you can reach us at +1 919 541 6000 or use the contact form below. For media inquiries, please reach out to our Media Relations team at news@rti.org.