
There are some very good sites showing the current state of infections, but I would like to find a dataset of COVID-19 infections, deaths and number of tests by day and by country. The best sources I have found so far don't quite meet what I'm after.


Marcus D

15 Answers

The JHU dashboard you linked has its data available as CSV on GitHub:

https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data

In particular, these CSV files contain the full historical time series:

https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_time_series


Here is a simple Python 3 scraper for the "confirmed cases" series, which reads the online CSV and does some parsing.

https://gist.github.com/philshem/fb60c1697f46f66b184c1f624283fd6a

#!/usr/bin/python3

# pip3 install pandas matplotlib
import pandas as pd
from matplotlib import pyplot as plt

# this gets the worldwide confirmed cases from JHU
# (the original time_series_19-covid-Confirmed.csv was later replaced by this file)
url = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv'

# read/download csv
df = pd.read_csv(url)

# filter on Swiss data
# df = df[df['Country/Region'] == 'Switzerland']

# remove unneeded columns
df = df.drop(columns=['Province/State', 'Lat', 'Long'])

# unpivot aka "melt": one row per original row and date
df = df.melt(id_vars=['Country/Region'], var_name='date', value_name='confirmed_cases')

# rename column
df = df.rename(columns={'Country/Region': 'country'})

# convert date column (e.g. '1/22/20') to date-type
df['date'] = pd.to_datetime(df['date'], format='%m/%d/%y')

# some countries are split into provinces: sum them per country and date
df = df.groupby(['country', 'date'], as_index=False)['confirmed_cases'].sum()

df.to_csv('confirmed_coronavirus.csv', index=False)

# plot the daily total over all remaining countries (worldwide unless filtered above)
total = df.groupby('date')['confirmed_cases'].sum()
plt.plot(total.index, total.values, '-')
plt.xticks(rotation=45)
plt.tight_layout()
plt.savefig('confirmed_coronavirus.png')
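The JHU time series are cumulative totals, so figures "by day" have to be derived by differencing consecutive days. Below is a minimal standard-library sketch of that step for one country, assuming the wide JHU layout (`Province/State`, `Country/Region`, `Lat`, `Long`, then one column per date in m/d/yy format). Treating the first day's value as that day's new count and clamping negative differences (downward data corrections) to zero are my own assumptions; you may prefer to keep the negatives. With pandas, `df.groupby('country')['confirmed_cases'].diff()` on the date-sorted frame from the scraper above does the same differencing.

```python
import csv
import io
from datetime import datetime


def daily_new_counts(csv_text, country):
    """Return (date, new_count) pairs for one country from a wide JHU time series.

    Assumes the JHU layout: Province/State, Country/Region, Lat, Long,
    then one cumulative column per date in m/d/yy format.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    date_columns = header[4:]
    totals = [0] * len(date_columns)
    for row in reader:
        if row[1] != country:
            continue
        # a country can be split into several provinces: add them up per date
        for i, value in enumerate(row[4:]):
            totals[i] += int(value or 0)
    dates = [datetime.strptime(d, '%m/%d/%y').date() for d in date_columns]
    new_counts = []
    previous = 0  # the first day's value is treated as that day's new count
    for total in totals:
        # downward corrections would give negative values; clamp them to zero
        new_counts.append(max(total - previous, 0))
        previous = total
    return list(zip(dates, new_counts))


if __name__ == '__main__':
    # toy input in the JHU layout, not real figures
    sample = (
        'Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20\n'
        'A,Testland,0,0,1,1,4\n'
        'B,Testland,0,0,0,3,3\n'
    )
    for day, count in daily_new_counts(sample, 'Testland'):
        print(day, count)
```

With the toy input, the two provinces sum to cumulative totals of 1, 4 and 7, so the sketch prints 1, 3 and 3 new cases for 22 to 24 January 2020.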

There is also an API source for the JHU data:

https://covid19api.com/#details

Access data on COVID19 through an easy API for free. Build dashboards, mobile apps or integrate into other applications. Data is sourced from Johns Hopkins CSSE.

Documentation: https://documenter.getpostman.com/view/10808728/SzS8rjbc
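A minimal sketch of consuming such a JSON API with only the standard library. The response shape (a list of per-day records with a `Date` timestamp and numeric fields such as `Confirmed` and `Deaths`) is an assumption on my part, as is whichever endpoint you pass in; check both against the Postman documentation above before relying on this. Fetching and parsing are kept separate so the parser can be tried on a hand-written sample.

```python
import json
from urllib.request import urlopen


def fetch_json(url):
    """Download and decode a JSON document (url: whatever endpoint the docs give)."""
    with urlopen(url) as response:
        return json.load(response)


def records_by_day(records, fields=('Confirmed', 'Deaths')):
    """Turn a list of per-day records into {YYYY-MM-DD: {field: value}}.

    Assumes each record has an ISO timestamp under 'Date'
    (e.g. '2020-03-01T00:00:00Z'); the field names are assumptions too.
    Missing fields default to 0.
    """
    by_day = {}
    for rec in records:
        day = rec['Date'][:10]  # keep only the YYYY-MM-DD part
        by_day[day] = {f: int(rec.get(f, 0)) for f in fields}
    return by_day


if __name__ == '__main__':
    # hand-written sample in the assumed shape, not real API output
    sample = [
        {'Date': '2020-03-01T00:00:00Z', 'Confirmed': 1, 'Deaths': 0},
        {'Date': '2020-03-02T00:00:00Z', 'Confirmed': 2, 'Deaths': 0},
    ]
    print(records_by_day(sample))
```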

philshem
  • This dataset seems to have stopped updating as of March 22, 2020. Any ideas for an alternative? – David Nehme Mar 25 '20 at 20:45
  • @DavidNehme seems just a lag to update the files. See yesterday e.g. https://github.com/CSSEGISandData/COVID-19/blob/master/csse_covid_19_data/csse_covid_19_daily_reports/03-25-2020.csv – philshem Mar 26 '20 at 19:28

Index of Countries (community wiki)

Global

Europe

North America

Asia

Oceania

Marcus D
philshem
  • For Germany, I'd like to point to https://github.com/jgehrcke/covid-19-germany-gae -- official time series data for individual counties / states. Proper CSV files for robust parsing. Updated daily. With improving history as historical data gets better over time. – Dr. Jan-Philip Gehrcke Dec 27 '20 at 16:15

The Italian data from the Civil Protection Agency is updated daily at

https://github.com/pcm-dpc/COVID-19

There are a few data sets (CSV) in there. Aggregated data is published at http://opendatadpc.maps.arcgis.com/apps/opsdashboard/index.html#/b0c68bce2cce478eaac82fe38d4138b1
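Since the question also asks about tests, here is a minimal sketch that derives daily tests, daily new cases and the share of positive tests from the national CSV. The file path and the column names `data` (ISO timestamp), `totale_casi` (cumulative cases) and `tamponi` (cumulative tests) are assumptions based on how I remember the repository; check its README for the current layout. The first row yields its cumulative totals, because differencing starts from zero.

```python
import csv
import io

# assumed raw URL of the national file in the pcm-dpc repository
NATIONAL_CSV = ('https://raw.githubusercontent.com/pcm-dpc/COVID-19/master/'
                'dati-andamento-nazionale/dpc-covid19-ita-andamento-nazionale.csv')


def daily_tests_and_cases(csv_text):
    """Return (day, new_tests, new_cases, positive_share) tuples.

    Assumes the columns 'data' (ISO timestamp), 'totale_casi' (cumulative
    cases) and 'tamponi' (cumulative tests); other columns are ignored.
    """
    rows = sorted(csv.DictReader(io.StringIO(csv_text)), key=lambda r: r['data'])
    result = []
    prev_cases = prev_tests = 0  # so the first row yields its cumulative totals
    for row in rows:
        cases, tests = int(row['totale_casi']), int(row['tamponi'])
        new_cases, new_tests = cases - prev_cases, tests - prev_tests
        # share of that day's tests that were positive; None if no new tests
        share = new_cases / new_tests if new_tests > 0 else None
        result.append((row['data'][:10], new_tests, new_cases, share))
        prev_cases, prev_tests = cases, tests
    return result
```

To run it on the real data, download `NATIONAL_CSV` (for example with `urllib.request.urlopen`) and pass the decoded text to `daily_tests_and_cases`.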

Bruce Becker

I parse the WHO situation reports from March 1 onwards and convert them to CSV; older reports are too unstructured. See https://github.com/gibello/whocovid19 (the CSV reports are in data/csv). The site also links to ECDC and Johns Hopkins data.


The CDC has US-wide testing numbers (specimens tested by CDC labs and US public health labs) per date:

https://www.cdc.gov/coronavirus/2019-ncov/cases-updates/testing-in-us.html

It's an HTML table, and I haven't found a more machine-readable source, but with Python and pandas you can easily read the data into a DataFrame:

import pandas as pd

# read_html returns a list of DataFrames, one per <table> on the page
df = pd.read_html('https://www.cdc.gov/coronavirus/2019-ncov/cases-updates/testing-in-us.html')

print(df[0])

To dump the table to a local CSV file (the result will need some post-processing):

df[0].to_csv('cdc.csv', index=False)

The resulting cdc.csv starts like this:

Date Collected,CDC Labs,US Public Health Labs
1/18,4,0
1/19,0,0
1/20,7,0
1/21,3,0
1/22,10,0
1/23,36,0
1/24,53,0
1/25,101,0
1/26,79,0
1/27,77,0
...

You'll need these packages

pip install pandas html5lib lxml
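The post-processing could look like the following minimal standard-library sketch. It assumes the three-column layout shown above (a date cell like `1/18` without a year, then two counts), supplies the year itself (2020 by default), skips rows without a month/day pair, and keeps only the digits of each count cell so that thousands separators or footnote marks do not break the conversion. Adjust it if the table you download differs.

```python
import csv
import io
import re
from datetime import date


def clean_cdc_rows(csv_text, year=2020):
    """Return (date, cdc_labs, public_health_labs) tuples from the dumped table.

    Assumptions: the date cell is 'm/d' without a year (so the year is passed in),
    and count cells may contain footnote marks or thousands separators.
    """
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row
    cleaned = []
    for row in reader:
        parts = re.findall(r'\d+', row[0]) if row else []
        if len(parts) < 2:
            continue  # blank, totals or notes rows
        month, day = int(parts[0]), int(parts[1])
        # keep digits only, so '1,234' or '77*' become 1234 and 77
        counts = [int(re.sub(r'\D', '', cell) or 0) for cell in row[1:3]]
        cleaned.append((date(year, month, day), *counts))
    return cleaned
```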
philshem

German data can be found here:

https://www.rki.de/DE/Content/InfAZ/N/Neuartiges_Coronavirus/Fallzahlen.html

Parseltongue
  • German data on the county level are linked here: https://opendata.stackexchange.com/a/17304/23668 – Peter Mar 29 '20 at 14:22

German Covid-19 data from RKI (Robert Koch-Institut) on the state (Bundesländer) and county (Kreise) level can be found here.

Data on the county level include patient information such as age group and gender. County level data can be merged with county level information from the INKAR database via county id (Kreiskennziffer).
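A minimal sketch of that merge, assuming both tables have already been loaded as lists of dicts. The key `county_id` and the other field names are hypothetical placeholders, not the real RKI or INKAR headers. The detail worth handling explicitly is that the Kreiskennziffer is a five-digit code with a leading zero for some states (e.g. 01001), which is easily lost when a file is read as numbers.

```python
def normalise_county_id(value):
    """Return a Kreiskennziffer as a five-character string with leading zeros.

    Files read as numbers turn e.g. '01001' into 1001 or 1001.0.
    """
    return str(int(float(value))).zfill(5)


def merge_on_county(cases, indicators):
    """Left-join case records with indicator records on the county id.

    Both arguments are lists of dicts. The key 'county_id' is a hypothetical
    placeholder: rename it to the real column headers of your downloads.
    Case fields win if both records use the same key.
    """
    lookup = {normalise_county_id(rec['county_id']): rec for rec in indicators}
    merged = []
    for rec in cases:
        key = normalise_county_id(rec['county_id'])
        # counties without a match keep only their case fields
        merged.append({**lookup.get(key, {}), **rec, 'county_id': key})
    return merged
```

With pandas, reading the id column as text (`pd.read_csv(..., dtype={'county_id': str})`) and then using `pd.merge(..., how='left')` avoids the lost-zero problem in the same way.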

Update (2020-03-30): There is now a register of intensive care capacities in German hospitals.

Update (2020-04-06): An overview of the situation in Austria can be found here. Find the official dashboard of the Ministry of Health (Gesundheitsministerium) here.

Update (2020-04-06): A timeseries of cases and deaths in Germany can be found here (updated every couple of days).

Update (2020-11-12): Find an overview of antibody studies in Germany here.

Peter
  • A plot and .csv of COVID-19 cases and number hospitalized in Germany per week, 2 March to 23 August, are under https://gist.github.com/denis-bz/. Also, jgehrcke/covid-19-germany-gae had .csv files with RKI data per day and city / Landkreis, updated twice a week or so. – denis Aug 28 '20 at 15:36

Several sources for Swiss COVID-19 case data:

quasi-official: https://github.com/openZH/covid_19

unofficial: https://github.com/daenuprobst/covid19-cases-switzerland

Comparison of Swiss sources, including links to them: https://observablehq.com/@republik/sars-cov-2-covid-19-data

more to come (hopefully data, not cases)

philshem

Spanish data can be found here:

https://www.mscbs.gob.es/profesionales/saludPublica/ccayes/alertasActual/nCov-China/situacionActual.htm

Parseltongue

My take on UK data: https://github.com/sainnr/covid19-uk-data-capture. The goal is to represent the daily updates published by UK official bodies in a machine-readable format.

Currently, it consists mainly of two parts:

  • number of cases within the UK (fatal/recovered/positive/total) from 30 Jan till today
  • number of confirmed positive cases broken down by region (local authority)

I'm the sole maintainer at the moment and do my best to update it within 24 hours of the daily numbers being published.

Hope this helps someone build something useful on top!


Chinese data: https://github.com/Avens666/COVID-19-2019-nCoV-Infection-Data-cleaning-

It is incomplete (from February to March 19th) and the explanations are in Chinese.

Vicky Ding

ECDC (European Centre for Disease Prevention and Control) world data in CSV format, frequently updated: https://opendata.ecdc.europa.eu/covid19/casedistribution/csv

(From https://www.ecdc.europa.eu/en/publications-data/download-todays-data-geographic-distribution-covid-19-cases-worldwide - there's also xml and json)
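A minimal sketch for turning that CSV into per-country daily series. It assumes the columns `dateRep` (dd/mm/yyyy), `cases`, `deaths` and `countriesAndTerritories` as the file was published at the time, and that each row holds the new cases and deaths reported that day; check the header of the file you actually download. Cumulative cases then follow from a running sum over the date-sorted rows.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime
from itertools import accumulate


def series_by_country(csv_text):
    """Return {country: [(day, new_cases, new_deaths, cumulative_cases), ...]}.

    Assumes the columns dateRep (dd/mm/yyyy), cases, deaths and
    countriesAndTerritories, with each row holding that day's new counts.
    """
    grouped = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        day = datetime.strptime(row['dateRep'], '%d/%m/%Y').date()
        grouped[row['countriesAndTerritories']].append(
            (day, int(row['cases']), int(row['deaths'])))
    result = {}
    for country, rows in grouped.items():
        rows.sort()  # oldest first, whatever order the file uses
        running = accumulate(new_cases for _, new_cases, _ in rows)
        result[country] = [
            (day, new_cases, new_deaths, total)
            for (day, new_cases, new_deaths), total in zip(rows, running)
        ]
    return result
```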

casper.dcl

I have made a repository on GitHub that crawls Wikipedia for data on Southeast Asian countries (Singapore, Malaysia, Vietnam, Thailand, etc.), since data for the other countries is already widely collected.

https://github.com/caiyundong/covid19-sea

The same crawling approach can be used for other countries, and the data can be exported for API use or as a file (JSON or CSV).


Novel COVID-19 datasets by country and state in CSV format, frequently updated: WorldData.AI

Catherine

ESRI has its own Open Data portal:

https://hub.arcgis.com/search

It has shapefiles that you can preview before downloading them.

PROBERT