Predicting age groups of Twitter users based on language and metadata features

Antonio A Morgan-Lopez; Annice E Kim; Robert F Chew; Paul Ruddle

Morgan-Lopez, A. A., Kim, A. E., Chew, R. F., & Ruddle, P. (2017). Predicting age groups of Twitter users based on language and metadata features. PLoS One, 12(8), Article e0183537. https://doi.org/10.1371/journal.pone.0183537

Abstract

Health organizations are increasingly using social media, such as Twitter, to disseminate health messages to target audiences. Determining the extent to which the target audience (e.g., a particular age group) was reached is critical to evaluating the impact of social media education campaigns. The main objective of this study was to examine the separate and joint predictive validity of linguistic and metadata features in predicting the age of Twitter users. We created a labeled dataset of Twitter users across different age groups (youth, young adults, adults) by collecting publicly available birthday announcement tweets using the Twitter Search application programming interface. We manually reviewed the results and, for each age-labeled handle, collected the 200 most recent publicly available tweets and the handle's metadata. The labeled data were split into training and test datasets. We created separate models to examine the predictive validity of language features only, metadata features only, language and metadata features combined, and words/phrases from another age-validated dataset. We estimated accuracy, precision, recall, and F1 metrics for each model. An L1-regularized logistic regression model was fit for each age group, and predicted probabilities between the training and test sets were compared for each age group. Cohen's d effect sizes were calculated to examine the relative importance of significant features. Models containing both tweet language features and metadata features performed the best (74% precision, 74% recall, 74% F1), while the model containing only Twitter metadata features was the least accurate (58% precision, 60% recall, and 57% F1). Top predictive features included use of terms such as "school" for youth and "college" for young adults. Overall, it was more challenging to predict older adults accurately. These results suggest that examining linguistic and Twitter metadata features to predict youth and young adult Twitter users may be helpful for informing public health surveillance and evaluation research.
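
As an illustration of the evaluation quantities named in the abstract, the following is a minimal sketch of how per-class precision, recall, and F1, and a Cohen's d effect size, could be computed. It is not the authors' code. The function names, the toy labels, and the toy feature rates are all hypothetical, and the pooled-standard-deviation form of Cohen's d is an assumption, since the abstract does not say which variant was used.

```python
from statistics import mean, variance


def per_class_scores(y_true, y_pred, label):
    """Precision, recall, and F1 for one class treated as the positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def cohens_d(group_a, group_b):
    """Cohen's d using the pooled sample standard deviation (assumed variant)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / pooled_var ** 0.5


# Toy labels, made up for illustration; not data from the study.
y_true = ["youth", "youth", "young_adult", "adult", "adult", "young_adult"]
y_pred = ["youth", "young_adult", "young_adult", "adult", "youth", "young_adult"]
for label in ("youth", "young_adult", "adult"):
    p, r, f = per_class_scores(y_true, y_pred, label)
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")

# Hypothetical per-user rates of one language feature in two groups.
youth_rates = [0.30, 0.25, 0.35, 0.40]
other_rates = [0.10, 0.15, 0.05, 0.10]
print(f"Cohen's d = {cohens_d(youth_rates, other_rates):.2f}")
```

Scoring each class separately, as here, makes it visible when one group is harder to predict than the others. The abstract does not say how the reported summary figures were averaged across classes.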
