• Article

Methodological considerations in analyzing Twitter data

Twitter is an online microblogging tool that disseminates more than 400 million messages per day, including vast amounts of health information. Twitter represents an important data source for the cancer prevention and control community. This paper introduces investigators in cancer research to the logistics of Twitter analysis. It explores methodological challenges in extracting and analyzing Twitter data, including characteristics and representativeness of data; data sources, access, and cost; sampling approaches; data management and cleaning; standardizing metrics; and analysis. We briefly describe the key issues and provide examples from the literature and our studies using Twitter data to understand public health issues. For investigators considering Twitter-based cancer research, we recommend assessing whether research questions can be answered appropriately using Twitter, choosing search terms carefully to optimize precision and recall, using respected vendors that can provide access to the full Twitter data stream if possible, standardizing metrics to account for growth in the Twitter population over time, considering crowdsourcing for analysis of Twitter content, and documenting and publishing all methodological decisions to further the evidence base


Kim, A., Hansen, H., Murphy, J., Richards, A., Duke, J., & Allen, J. (2013). Methodological considerations in analyzing Twitter data. Journal of the National Cancer Institute Monographs, 2013(47), 140-146. https://doi.org/10.1093/jncimonographs/lgt026