Success story

Penn Medicine Center for Digital Health

See how the Penn Medicine CDH uses Twitter data to understand the COVID-19 health crisis.


Here's the tl;dr


When the COVID-19 pandemic hit, the Penn Medicine Center for Digital Health (CDH) was uniquely positioned to make sense of the evolving public health crisis and people’s perceptions of it. Drawing on natural language processing techniques from its previous work, the center created a web-based COVID-19 Twitter map detailing public sentiment, reported symptoms, state-by-state data views, and more to help inform local community responses.


The team at the Penn Medicine Center for Digital Health has worked with social media data on a number of projects to improve health care delivery. Among them is their collaborative project with the World Well-Being Project, which establishes a real-time baseline of mental health and well-being in the US by identifying feelings of loneliness, psychological stress, and other early signs of mental health issues through linguistic markers in Tweets.

“The CDH was created by Penn Medicine in part to apply insights derived from the study of social media to the way care could be delivered in the future,” said Dr. Raina Merchant, Founding Director of the center and Penn Medicine Associate Vice President/Digital Health. “These Tweets and pictures, posted articles, and Retweeted content may give us important context and insight into how patients live their lives and consequently what health problems may arise or be exacerbated.”

So when COVID-19 emerged, the team was quick to spot changes.

“It became obvious very early on in the pandemic that there were spikes of psychological symptoms happening beyond a few standard deviations,” recalls Dr. Sharath Guntuku, Assistant Professor (Research) in the Department of Computer and Information Science, and lead research scientist at the center. “That got us thinking, we needed to understand more of what was happening.”


Since the emergence of social media, there has never been a global event as significant as COVID-19, and the center was well placed to make sense of this evolving public health crisis. “With a new challenge like COVID, a systematic study of social media can help to uncover risk factors, shed light on public sentiment and even inform how to frame public health messaging or combat misinformation,” Merchant noted. “So when the COVID crisis started, it made sense for our team to pivot rapidly and begin studying COVID-related content.”

Drawing on natural language processing techniques from its previous work, the center’s research team quickly created a web-based COVID-19 Twitter map to examine aggregate increases in reported symptoms or anxiety levels.

“The way people talk online really informs us of their concerns. So we thought there would be a lot of value in understanding what people were saying in different areas, particularly the variances in places that were harder hit compared to others,” Dr. Guntuku shared. The goal is to use this data to provide current, local, and actionable information for patients, providers, health systems, and policy makers. “We think that the COVID-19 Twitter map could help policy makers and health systems predict outbreak hotspots or a second wave coming in the fall, when traditional cold and flu season is also emerging,” adds Elissa Klinger, Assistant Director of the Center for Digital Health.

When Twitter launched its COVID-19 stream in April 2020, one of the first applications for access came from the Center for Digital Health and World Well-Being Project collaboration. The team was interested in how this data could augment its Twitter map. The main challenges were the volume of the data (the COVID-19 stream was at least 10x as large as the datasets the team had previously worked with) and the need to validate data at scale. The team also anonymized all the data and ran it through several custom machine learning tools to develop sentiment scores. “Processing and analyzing this amount of data on a daily basis was an exciting challenge,” says Garrick Sherman, Senior Data Scientist at the World Well-Being Project. “We are employing systems and models that have been developed over several years, but this was our first opportunity to apply them to a public health emergency in real-time.”

Undaunted by the extra effort, analysts at the center see incredible value in mining social media data. Compared with the traditional survey and interview datasets that often inform health systems and public policy, observing conversations on Twitter may offer more predictive power. “We see people’s own thoughts and feelings, shared in their own words,” says Lauren Southwick, a research manager at the center. “This lets us uncover different terminologies, expressions, or sentence structures that help us better identify indicators of illness, anxiety, or isolation.”

With the ability to understand population-level moods and symptoms, the team can quickly validate new information, such as emerging symptoms, and rapidly iterate on its predictive models. “When the CDC added six new COVID-19 symptoms on April 17, we could go back to see those symptoms being discussed on Twitter as far back as early March,” notes Dr. Guntuku. Analysts can then apply the linguistic characteristics of these conversations to real-time data, potentially identifying regions that may see a surge in cases.



The COVID-19 Twitter map includes charts detailing sentiment, reported symptoms, state-by-state data cuts, and broader data on the COVID-19 outbreak. Beyond the physical effects of the pandemic, the research team notes that the dashboard can also illustrate its emotional impact.

“The language people across the US are using in Tweets in 2020 indicates much higher levels of stress, anxiety, and loneliness when compared to language of Tweets in the previous year,” shares Dr. Lyle Ungar, World Well-Being Project’s Principal Investigator and Professor of Computer and Information Science at the University of Pennsylvania. “This provides empirical evidence of the psycho-social effects of the pandemic. You have to stop the chain of transmission and provide testing and intensive medical care to all who need it, but we can also start thinking about how we provide for the mental health and social needs of a nation undergoing a stressful, long-term health care emergency.”

Beyond population-level insight and understanding, the team is particularly interested in how this data can help communities and individuals rapidly receive local, relevant information and resources to cope with a major public health crisis. A recently launched initiative, Penn Medicine With You, uses aggregate regional information from Twitter to inform its website and text-messaging service, which disseminate relevant, timely resources.

Explore all the work this team is doing related to the pandemic at the Center for Digital Health COVID-19 Hub.
