I want to know about recently available datasets for fake news analysis.
-
There are methods/algorithms to automatically identify fake news, see e.g. 10.1002/pra2.2015.145052010082. I don't know whether there are open datasets. – FuzzyLeapfrog Feb 05 '17 at 20:46
-
So, you want samples of fake news so that you can analyze these news articles, right? – Nicolas Raoul Mar 23 '17 at 07:34
-
I need an annotated dataset with fake and real news articles with their links – Paramie.Jayasinghe Mar 31 '17 at 06:36
4 Answers
Buzzfeed News has been doing work on this, and has published data related to fake news, news patterns, and social media patterns on their Github: https://github.com/BuzzFeedNews/everything. Might be a good repo to browse.
Here are some of the datasets available for fake news detection:
LIAR dataset: https://www.cs.ucsb.edu/william/data/liar_dataset.zip
BS Detector: https://github.com/bs-detector/bs-detector
-
-
The LIAR dataset is at: https://www.cs.ucsb.edu/~william/data/liar_dataset.zip – James O'Brien Mar 14 '19 at 21:17
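To check what the LIAR archive actually contains before writing a parser, a minimal sketch like the following might help. It uses the URL from the comment above, which may have moved since it was posted, and it deliberately makes no assumption about the file names or column layout inside the zip. The helper names `fetch_bytes` and `list_zip_members` are made up for this illustration.

```python
import io
import urllib.request
import zipfile
from typing import List

# URL copied from the comment above; it may no longer resolve.
LIAR_URL = "https://www.cs.ucsb.edu/~william/data/liar_dataset.zip"


def fetch_bytes(url: str, timeout: float = 30.0) -> bytes:
    """Download a URL and return the raw response body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def list_zip_members(data: bytes) -> List[str]:
    """Return the file names stored in a zip archive held in memory."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return archive.namelist()
```

Calling `list_zip_members(fetch_bytes(LIAR_URL))` should print the names of the files in the archive, provided the link is still live.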
You should check out the Observatory on Social Media (OSoMe) at Indiana University. The team have been archiving 10% of public activity on Twitter for the last 10 years. The data isn't directly available to people not affiliated with the University, but they have a number of algorithms and visualization tools that you can run against the data.
- They have a service called 'BotSlayer' which you can set up yourself on a free AWS instance to track certain hashtags and key phrases.
- There is also 'Botometer', which will assess any Twitter user name and score it based on how 'bot-like' it is.
- Finally, they have a tool called 'Hoaxy' which allows you to visualize the spread of a news or fake-news story across Twitter to see which accounts are sharing/re-tweeting it.
Kaggle hosts a dataset where the CSV has URL, title, text, and a flag "reliable" or "unreliable"
https://www.kaggle.com/c/fake-news/data
id: unique id for a news article
title: the title of a news article
author: author of the news article
text: the text of the article; could be incomplete
label: a label that marks the article as potentially unreliable
1: unreliable
0: reliable
Accessing the data requires registration.
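As a rough illustration of working with the fields listed above, here is a minimal sketch that counts reliable and unreliable articles using only the standard library. The three sample rows are invented for the example and are not taken from the Kaggle data; the function name `count_labels` is also just a placeholder.

```python
import csv
import io
from collections import Counter

# Mapping of the label values described above to readable names.
LABEL_NAMES = {"1": "unreliable", "0": "reliable"}


def count_labels(csv_file) -> Counter:
    """Count articles per label in a CSV with a 'label' column."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        # Anything other than "0" or "1" is counted as "unknown".
        counts[LABEL_NAMES.get(row["label"], "unknown")] += 1
    return counts


# Tiny made-up sample using the documented columns (not real data).
SAMPLE = """id,title,author,text,label
0,Example headline A,Jane Doe,"Some article text, possibly incomplete",1
1,Example headline B,John Roe,Other article text,0
2,Example headline C,,Third article text,1
"""

sample_counts = count_labels(io.StringIO(SAMPLE))
print(sample_counts)
```

For the downloaded file, the same function could be called on something like `open("train.csv", newline="", encoding="utf-8")`, where the file name is an assumption. If very long article texts trigger a field-size error, raising the limit with `csv.field_size_limit` may be necessary.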