Peter Baumgartner is a senior data scientist with expertise in computational social science, focusing on natural language processing (NLP), machine learning (ML), and software development. He was the lead data scientist on an R21 grant from the National Institute on Drug Abuse (NIDA) to understand how people self-treat opioid withdrawal symptoms, using over 3.5 million narrative reports from reddit.com. For this project, Peter engineered a named entity recognition pipeline to identify substances and their effects. He also worked on the Survey of Earned Doctorates, developing an approach that uses state-of-the-art NLP models to analyze responses to open-ended survey questions about the impact of COVID-19 on respondents' experiences. Peter also collaborated on the NC DHHS-funded Rapid COVID-19 Hospital Capacity Scenario Modeling and Forecasting project, where he was responsible for increasing the runtime speed of an agent-based model by 72 percent. In addition, Peter led the development and evaluation of a custom warrant management software application, built in partnership with the Greensboro, North Carolina, Police Department, which helped the agency prioritize warrant service based on agency policy.
Open-source software developed by Peter includes tools for calculating annotator agreement, discovering multi-word expressions, quickly generating Voronoi diagrams, calculating bootstrap statistics, performing few-shot classification, tokenizing text from HTML documents, and calculating corpus statistics. He also co-developed a machine learning model and coding tool that allows researchers and analysts to classify verbatim criminal offense texts according to the National Corrections Reporting Program (NCRP) offense code classification.
Peter most recently worked at Explosion AI as a Consulting Services Lead and Machine Learning Engineer. In this role, he led a team of consultants and developers, providing technical guidance and support to clients in NLP and ML. He developed and delivered machine learning models for clients across a range of industries, including transportation, health care, and legal. His contributions to the spaCy natural language processing library include improvements to debugging utilities, data visualization, model training speed, and documentation.
Before joining RTI in 2015, Peter worked as a consultant at Deloitte in its Advanced Analytics and Modeling practice. While there, he worked on several analytics initiatives in insurance underwriting, workforce analysis, veteran career services, health care and life sciences, and business development.