Clinically-relevant COVID-19 tweets authored by health-care professionals from January to June 2020

The rapid evolution of the COVID-19 pandemic has underscored the need to disseminate the latest clinical knowledge quickly during a public-health emergency. One surprisingly effective platform for healthcare professionals (HCPs) to share knowledge and experiences from the front lines has been social media (for example, the "#medtwitter" community on Twitter). However, identifying clinically relevant content on social media without manual labeling is challenging because of the sheer volume of irrelevant data. This dataset attempts to automatically extract tweets authored by HCPs and then filter them for clinically relevant content.

The dataset is derived from a large set of English tweets related to COVID-19 (retweets and bots removed) from January to June 2020 (version 14). We use a regex-based filter on user names, screen names, and bios to identify likely HCPs, narrowing the set from around 52 million tweets to around 1 million. We then augment the dataset with any additional tweets from threads in which at least one tweet is already present in the dataset. The result is tweets_level_0.csv. This set contains almost all self-declared HCPs but also includes some false positives; we therefore developed an iterative relevance-filtering pipeline that uses topic modeling and MetaMap concept annotation to identify and enrich clinically relevant content. Subsequent files represent the outputs of each iteration of filtering. Please see our preprint for more details about our filtering method.

Each CSV file includes the following fields: "id" (the tweet ID, accessible using the Twitter API), "thread_id" (a generated value that is shared by multiple tweets in the same thread), and "date" (the date that the tweet was posted). Due to Twitter policies, we cannot provide the contents of the tweets, and ask that you "hydrate" the tweets using a Twitter API tool such as twarc.
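As a minimal sketch of preparing one of the CSV files for hydration, the Python below reads the "id" and "thread_id" columns, groups tweet IDs by thread, and writes one ID per line, which is the input format hydration tools generally expect. The sample rows, the thread ID values, the date format, and the helper names (read_tweet_index, write_id_file) are illustrative assumptions, not taken from the dataset.

```python
import csv
import io
from collections import defaultdict


def read_tweet_index(handle):
    """Read a CSV with 'id', 'thread_id', 'date' columns.

    Returns (ids, threads): the tweet IDs in file order, and a dict
    mapping each thread_id to the tweet IDs that share it.
    IDs are kept as strings so large values are never rounded.
    """
    ids = []
    threads = defaultdict(list)
    for row in csv.DictReader(handle):
        tweet_id = row["id"].strip()
        ids.append(tweet_id)
        threads[row["thread_id"].strip()].append(tweet_id)
    return ids, dict(threads)


def write_id_file(ids, handle):
    """Write one tweet ID per line."""
    for tweet_id in ids:
        handle.write(tweet_id + "\n")


# Made-up sample rows; real values and formats may differ.
SAMPLE = """id,thread_id,date
1234567890123456789,t1,2020-03-01
1234567890123456790,t1,2020-03-01
1234567890123456999,t2,2020-04-15
"""

ids, threads = read_tweet_index(io.StringIO(SAMPLE))
out = io.StringIO()
write_id_file(ids, out)
```

With a real file, open tweets_level_0.csv in place of the in-memory sample and write to a file such as ids.txt. With twarc this is then typically passed to a hydrate command (for example `twarc2 hydrate ids.txt tweets.jsonl`), though the exact invocation depends on the twarc version installed.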
Note that some tweets may have been deleted since the collection of our dataset and will no longer be available.
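One way to see which tweets are no longer available is to compare the requested IDs against the IDs that actually came back from hydration. The sketch below assumes the hydrated output is JSON Lines with one tweet object per line carrying an "id_str" field, as in the Twitter API v1.1 tweet format; other tools or API versions may nest tweets differently, so the parsing step would need adjusting. The helper names and the sample data are hypothetical.

```python
import json


def hydrated_ids(jsonl_lines):
    """Collect tweet IDs from JSON Lines output.

    Assumes each non-empty line is one tweet object with an 'id_str' field.
    """
    found = set()
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue
        tweet = json.loads(line)
        found.add(str(tweet["id_str"]))
    return found


def missing_ids(requested, found):
    """Return requested IDs that were not hydrated, in the original order."""
    return [tweet_id for tweet_id in requested if tweet_id not in found]


# Made-up example: three IDs requested, two returned.
requested = ["101", "102", "103"]
hydrated = [
    '{"id_str": "101", "full_text": "example"}',
    '{"id_str": "103", "full_text": "example"}',
]
missing = missing_ids(requested, hydrated_ids(hydrated))
```

Reporting the number of missing IDs alongside any analysis makes it clear how much of the original set could still be retrieved.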
