Dataset

Computational journalism has long depended on in-house, private data, so only a few researchers have benefited from it. Recently, there have been significant efforts to make valuable datasets publicly accessible to the research community. We hope that NECO will be the place for researchers to discuss such datasets.

Coronavirus (COVID-19): more will come…

For those interested in tracking the media narratives around COVID-19,
GDELT-Television news on COVID-19
GDELT-Online news on COVID-19

For those interested in understanding public discourse around COVID-19,
COVID-19-TweetIDs

For those interested in real-world statistics of COVID-19,
PyCOVID Package
2019-nCoV Data Processing Pipelines and datasets
COVID-19 Dashboard

Yahoo News Feed
Yahoo News Feed dataset, version 1.0
Yahoo has recently released a massive collection of user interactions on the news feeds of Yahoo services, including Yahoo News. We appreciate Yahoo’s contribution to the research community and hope to see much inspiring work come from it.

The GDELT Project
The GDELT Project
The GDELT Project is one of the largest collections of news data. Supported by Google Ideas, it monitors “the world’s broadcast, print, and web news” (see GDELT 2.0 for details). Since February 2015, GDELT has continuously added experimental features, such as categorizing “all of the news imagery” (see Visual GKG for details).