In [1]:
import sys
from generallib import *

#connection = getConnection()

startDate = '2020-01-01'
thisYearRolling = 1
lastYearRolling = 7

thisYearRollingLabel = ''
lastYearRollingLabel = ' (rolling 7d avg)'
In [2]:
display(md('# COVID-19 Pandemic'))

COVID-19 Pandemic

In [3]:
display(md('This is an analysis of the effects of COVID-19. Dates covered are '+startDate+' to yesterday.'))
display(md("Compiled ""%Y-%m-%d %H:%M:%S")+" UTC."))

This is an analysis of the effects of COVID-19. Dates covered are 2020-01-01 to yesterday.

Compiled 2020-03-28 12:00:04 UTC.

Why Should I Care?

Some may be tempted to downplay the severity of this pandemic. After all, a single-digit mortality rate doesn't seem to warrant the kind of panic we may see in a Hollywood blockbuster based on a Michael Crichton novel.

Let's get some perspective by really looking at the numbers.

In 2018, out of a group of 100,000 randomly selected Americans, about 724 died.(1) This is an average annual mortality rate of 0.72%. The average American knows around 600 people.(2) Of course, this varies by location, profession, and personality. But on average, you may learn that someone you know personally has died once every two years or so.

Now, let's compare this to our current situation. Mortality rate for COVID-19 has been calculated to be anywhere from 1% to 4%.(3,4) The consensus seems to be that the real mortality rate for COVID-19 will be around 2%.

In recent years, the annual flu infects and produces symptoms in around 35,000,000 Americans each year, or a little over 10% of the total population.(5) In your average-size circle of acquaintances, then, around 60 people may show flu symptoms. Now imagine that COVID-19 spreads at the same rate as the common flu. On average, at least one of those people will die. And all of this goes down in the span of a couple months.

The situation may actually be worse, as recent research indicates that there is a high rate of asymptomatic COVID-19 infections, meaning people are walking around with it and have no idea.(4) This is what leads to the higher mortality calculations of 3% or 4%, because the deaths are carefully recorded, while total infections may be under-counted by a large margin. If those estimates are true, two or three people you know may die.

COVID-19 Statistics

Data sourced from Johns Hopkins CSSE and is available here.

Infections and Deaths Overview

In [4]:
covidinfdf = pd.read_csv('')
covidinfdf = covidinfdf.set_index(['Country/Region','Province/State'])
#covidinfdf = covidinfdf.drop('China',level=0)
covidinfdf = covidinfdf.drop(columns=['Lat','Long'])

coviddeathdf = pd.read_csv('')
coviddeathdf = coviddeathdf.set_index(['Country/Region','Province/State'])
coviddeathdf = coviddeathdf.drop(columns=['Lat','Long'])

covidrecovdf = pd.read_csv('')
covidrecovdf = covidrecovdf.set_index(['Country/Region','Province/State'])
covidrecovdf = covidrecovdf.drop(columns=['Lat','Long'])
newcols = {}
for sub in covidrecovdf.columns:
    newcols[sub] = sub.replace('/2020','/20')
covidrecovdf = covidrecovdf.rename(columns=newcols)
In [5]:
covidtotdf = pd.DataFrame(data=None, columns=covidinfdf.columns)
covidtotdf.loc['TotalInfections'] = covidinfdf.sum(axis=0)
covidtotdf.loc['TotalDeaths'] = coviddeathdf.sum(axis=0)
covidtotdf.loc['TotalRecovered'] = covidrecovdf.sum(axis=0)

covidtotdf = covidtotdf.T = 'Date'
covidtotdf = covidtotdf.reset_index()

covidtotdf = covidtotdf[-30:]
covidtotdf['Date'] = [datetime.datetime.strptime(d,'%m/%d/%y') for d in covidtotdf['Date']]
In [6]:
fig = generateGenericGraphDF('Total Infections, Deaths, and Recovered Worldwide',covidtotdf,['TotalInfections','TotalDeaths','TotalRecovered'],labels=['infections','deaths','recovered'])

Total COVID-19 infections worldwide.

In [7]:
fig = generateGenericGraphDF('Total Deaths Worldwide',covidtotdf,['TotalDeaths'],labels=['deaths'],ylabel='Deaths')

Total deaths caused by COVID-19 worldwide.

Infections and Deaths in Specific Regions

In [8]:
covidusdf = pd.DataFrame(data=None,columns=coviddeathdf.columns)
covidusdf.loc['usdeath'] = coviddeathdf.loc[['US']].sum(axis=0)
covidusdf.loc['usinf'] = covidinfdf.loc[['US']].sum(axis=0)
covidusdf.loc['usrecov'] = covidrecovdf.loc[['US']].sum(axis=0)

covidusdf.loc['chinadeath'] = coviddeathdf.loc[['China']].sum(axis=0)
covidusdf.loc['chinainf'] = covidinfdf.loc[['China']].sum(axis=0)
covidusdf.loc['chinarecov'] = covidrecovdf.loc[['China']].sum(axis=0)

covidusdf.loc['italydeath'] = coviddeathdf.loc[['Italy']].sum(axis=0)
covidusdf.loc['italyinf'] = covidinfdf.loc[['Italy']].sum(axis=0)
covidusdf.loc['italyrecov'] = covidrecovdf.loc[['Italy']].sum(axis=0)

covidusdf.loc['spaindeath'] = coviddeathdf.loc[['Spain']].sum(axis=0)
covidusdf.loc['spaininf'] = covidinfdf.loc[['Spain']].sum(axis=0)
covidusdf.loc['spainrecov'] = covidrecovdf.loc[['Spain']].sum(axis=0)

covidusdf = covidusdf.T = 'Date'
covidusdf = covidusdf.reset_index()
covidusdf = covidusdf[-30:]
covidusdf['Date'] = [datetime.datetime.strptime(d,'%m/%d/%y') for d in covidusdf['Date']]
covidusdf = covidusdf.set_index('Date')

covidusdf['usdailydeath'] = covidusdf['usdeath'] - covidusdf['usdeath'].shift()
covidusdf['usactive'] = covidusdf['usinf'] - covidusdf['usrecov'] - covidusdf['usdeath'] + covidusdf['usdailydeath']
covidusdf['usdeathprob'] = covidusdf['usdailydeath'] / covidusdf['usactive']

covidusdf['chinadailydeath'] = covidusdf['chinadeath'] - covidusdf['chinadeath'].shift()
covidusdf['chinaactive'] = covidusdf['chinainf'] - covidusdf['chinarecov'] - covidusdf['chinadeath'] + covidusdf['chinadailydeath']
covidusdf['chinadeathprob'] = covidusdf['chinadailydeath'] / covidusdf['chinaactive']

covidusdf['italydailydeath'] = covidusdf['italydeath'] - covidusdf['italydeath'].shift()
covidusdf['italyactive'] = covidusdf['italyinf'] - covidusdf['italyrecov'] - covidusdf['italydeath'] + covidusdf['italydailydeath']
covidusdf['italydeathprob'] = covidusdf['italydailydeath'] / covidusdf['italyactive']

covidusdf['spaindailydeath'] = covidusdf['spaindeath'] - covidusdf['spaindeath'].shift()
covidusdf['spainactive'] = covidusdf['spaininf'] - covidusdf['spainrecov'] - covidusdf['spaindeath'] + covidusdf['spaindailydeath']
covidusdf['spaindeathprob'] = covidusdf['spaindailydeath'] / covidusdf['spainactive']
In [9]:
fig = generateGenericGraphDF('Infections in Epicenters',covidusdf,['usinf','chinainf','italyinf','spaininf'],labels=['infections in US','infections in China','infections in Italy','infections in Spain'],ylabel='Infections')

COVID-19 infections reported in epicenters.

In [10]:
fig = generateGenericGraphDF('Deaths in Epicenters',covidusdf,['usdeath','chinadeath','italydeath','spaindeath'],labels=['deaths in US','deaths in China','deaths in Italy','deaths in Spain'],ylabel='Deaths')

COVID-19 deaths reported in epicenters.

Daily Death Probability

Actively infected total is calculated by subtracting total previous deaths and total recovered from total infections. Daily Death Probability (DDP) is the ratio between deaths on a particular day and that total. This metric is a good indicator of quality of healthcare and is less likely to be affected by external factors than other metrics like daily mortality. It does require that its constituent components---infections, deaths, and recovered---be accurately counted.

If this is not the case, then second-best scenario is consistent inaccuracy, in which case different countries cannot be compared to each other, but a sudden or gradual change in this metric for a particular country is meaningful.

In [11]:
fig = generateGenericGraphDF('Daily Death Probability (DDP) in Epicenters',covidusdf,['usdeathprob','chinadeathprob','italydeathprob','spaindeathprob'],labels=['probability in US','probability in China','probability in Italy','probability in Spain'],ylabel='Probability')

Probability that any actively infected person will die on a given day.

DDP smooths over time as both numbers of actively infected and deaths increase. Of note:

  • Spain's DDP has slowly increased over time. This may indicate increasing burden on healthcare infrastructure.
  • Italy's DDP has recently started declining. This coincides with the recent transition from exponential to linear growth in deaths and may indicate "turning the corner" in that country.
  • US's DDP has decreased significantly, most likely due to increased testing.

COVID-19 Event Timeline

Below is a timeline of events related to COVID-19 that may have affected US consumer behavior.

In [12]:
dates = [
names = [
    '1st US infection',
    'WH forms task force',
    'US travel restrictions',
    '1st non-Chinese death',
    'Diamond Princess',
    'Dow drops 1000',
    '1st untraceable US case',
    '1st death in US',
    '10 dead in WA',
    '100k worldwide',
    '500 US cases',
    'Nat\'l emerg.',
    'NYSE halted',
descriptions = [
    'First COVID-19 infection in the US',
    'White House announces a dedicated task force',
    'Travel restrictions for those entering the US who have recently traveled in China',
    'First death of a COVID-19 victim outside of China',
    'Diamond Princess quarantine reported by media',
    'Dow Jones sheds 1000 points, beginning a five-day correction',
    'First case in the US that could not be traced to an origin',
    'First death of a COVID-19 victim in the US',
    'Four more dead in Washington state, bringing total to ten in that state',
    'Worldwide infections pass 100,000 mark',
    'Over 500 infections in the US',
    'WHO officially declares COVID-19 a pandemic',
    'President Trump declares national emergency',
    'NYSE temporarily halted after 2,725 point drop',
    'DHS issues "no unnecessary travel" advisory',
    'Congress agrees on $2 trillion stimulus bill'
timelinedf = pd.DataFrame()
timelinedf['Date'] = dates
timelinedf['Event'] = names
timelinedf['Description'] = descriptions
timelinedf = timelinedf.set_index('Date')
fig,axis = getTimeline("COVID-19 Event Timeline",dates,names,interval=3)

Details on most recent events:

In [13]:
timelinedf = timelinedf.reset_index()
def prettyDateFormat(val):
    valDate = datetime.datetime.strptime(val,'%Y-%m-%d')
    return valDate.strftime('%b %d')

timelinedf['Date'] = timelinedf.apply(lambda x: prettyDateFormat(x['Date']), axis=1)
Date Event Description
Feb 26 1st untraceable US case First case in the US that could not be traced to an origin
Feb 29 1st death in US First death of a COVID-19 victim in the US
Mar 04 10 dead in WA Four more dead in Washington state, bringing total to ten in that state
Mar 06 100k worldwide Worldwide infections pass 100,000 mark
Mar 08 500 US cases Over 500 infections in the US
Mar 11 Pandemic WHO officially declares COVID-19 a pandemic
Mar 13 Nat'l emerg. President Trump declares national emergency
Mar 16 NYSE halted NYSE temporarily halted after 2,725 point drop
Mar 16 Advisory DHS issues "no unnecessary travel" advisory
Mar 25 Stimulus Congress agrees on $2 trillion stimulus bill


  1. "Mortality in the United States, 2018", CDC. (
  2. "The Average American Knows How Many People?", NY Times. (
  3. "The WHO Estimated COVID-19 Mortality at 3.4%. That Doesn't Tell the Whole Story", Time. (
  4. "Coronavirus disease 2019 (COVID-19) Situation Report – 46", WHO. (
  5. "Disease Burden of Influenza", CDC. (