
Published on 12 June 2020

Coronavirus (COVID-19) Geo-tagged Tweets Dataset

Lamsal, Rabindra

Description

This dataset contains the IDs of geo-tagged tweets. The tweets were captured by an ongoing project deployed at https://live.rlamsal.com.np. The geolocation data was extracted from tweets that mentioned "corona", "coronavirus", "covid", or possible variants of "sarscov2", "nCov", "covid-19", and "ncov2019". In compliance with Twitter's content redistribution policy, only the tweet IDs are shared. You can reconstruct the dataset by hydrating these IDs. Every tweet ID in this dataset belongs to a tweet that was posted with an exact location. Please note that this dataset should be used solely for non-commercial research purposes (ignore every other LICENSE category given on this page).

Note: I started sharing the IDs of tweets that had exact "point" location information only on April 28, 2020, after genuine requests from academic researchers who did not want to hydrate the full lists of IDs (more than 170 million tweets) shared in the Coronavirus (COVID-19) Tweets Dataset.

If you need geolocation-based data starting from March 20, 2020, use the Coronavirus (COVID-19) Tweets Dataset and hydrate the IDs while applying the following condition:

    import json

    # data is the raw JSON string of one hydrated tweet
    data = json.loads(data)
    if data["coordinates"]:
        longitude, latitude = data["coordinates"]["coordinates"]

The data is available in two formats: CSV and JSON. I'll be sharing new files every day, and the files will be named by period. For example, april28-june5.zip will contain the tweet IDs and sentiment scores of the tweets (in CSV and JSON formats) that were created between April 28, 2020, and June 05, 2020.

Why are only tweet IDs being shared? Twitter's content redistribution policy restricts the sharing of tweet information other than tweet IDs and/or user IDs. Twitter wants researchers to always pull fresh data, because a user might delete a tweet or make their profile protected. If the same tweet has already been pulled and shared publicly, the user or community could be exposed to inferences drawn from shared data that no longer exists or has since become private.
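
The coordinate condition described above can be sketched as a small filter over hydrated tweets. This is a minimal illustration, not the author's pipeline: it assumes the hydrated output is stored as JSON Lines (one tweet object per line) in the Twitter API v1.1 tweet format, where "coordinates" is either null or a GeoJSON object holding [longitude, latitude]. The function name iter_point_locations and the sample records are hypothetical.

```python
import json
from typing import Iterable, Iterator, Optional, Tuple


def iter_point_locations(
    lines: Iterable[str],
) -> Iterator[Tuple[Optional[str], float, float]]:
    """Yield (tweet_id, longitude, latitude) for tweets with an exact point location.

    Assumption: each line is one hydrated tweet serialized as JSON, with the
    v1.1-style "coordinates" field (null, or a GeoJSON Point).
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tweet = json.loads(line)
        coords = tweet.get("coordinates")
        # Keep only tweets carrying an exact point, as in the condition above.
        if coords and coords.get("type") == "Point":
            longitude, latitude = coords["coordinates"]
            yield tweet.get("id_str"), longitude, latitude


# Hypothetical sample records, for illustration only.
sample = [
    '{"id_str": "1", "coordinates": {"type": "Point", "coordinates": [85.324, 27.7172]}}',
    '{"id_str": "2", "coordinates": null}',
]
points = list(iter_point_locations(sample))
```

With real data, the same function can be fed an open file handle instead of the sample list, since it only needs an iterable of lines. Tweets without an exact point location are skipped.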

Citations (0)

Mentions (0)

Metrics

Dataset Index

1.4

FAIR Score

58%

Citations

0

Mentions

0

Publication Details

DOI

Publisher

IEEE DataPort

Assigned Domain

Subfield

Infectious Diseases

Field

Medicine

Domain

Health Sciences

Confidence Score

57%

Source

Scholar Data Model

Keywords

COVID-19, Machine Learning, Corona Tweets Dataset, COVID-19 Tweets Dataset, Corona Tweets, COVID-19 Tweets, Corona Twitter Sentiment, COVID-19 Twitter Sentiment, SARS-CoV-2 Tweets Dataset, SARS-CoV-2 Twitter Sentiment

Normalization Factors

FT

13.46

CTw

1.00

MTw

1.00