8. A Simple Model for Forecasting Final Fatalities

(4/3/20) I believe models should be as simple as possible and should rely as much as possible on hard data, e.g., deaths.

Always check my post called “Daily Rumblings” for late-breaking updates.

The Gaussian Model introduced in my last blog post (#7) can be extended to forecast the number of fatalities that will occur by the time the epidemic in a particular population reaches recovery. If one monitors the death rate per unit time (days, weeks, etc.), one can match the shape of that curve to a Gaussian growth-and-recovery curve and determine how far up or down the curve the actual data lie. Then, from the number of fatalities that have occurred as of the latest date, one can extrapolate how many more deaths will occur as the population traverses the rest of the curve to recovery. The Figure below shows how this works.

Gaussian fatality model using the observed death rate data for Italy. The horizontal-axis numbers represent weeks for convenience, but the model itself does not depend on time.
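The post matches the observed curve to the Gaussian by eye. One way to automate that matching step, swapped in here rather than taken from the post, is a least-squares fit: the logarithm of a Gaussian is a parabola, so fitting a parabola to ln(daily deaths) recovers the amplitude, peak position and width. The function name and the synthetic data are hypothetical, and with real, noisy counts that are still on the rising side the fitted peak can shift considerably from one day's data to the next.

```python
import math


def fit_gaussian_log_quadratic(days, daily_deaths):
    """Fit rate(t) = A * exp(-(t - mu)**2 / (2 * sigma**2)) to daily death counts.

    ln(rate) is a parabola in t, so a least-squares parabola through (t, ln y)
    yields A, mu and sigma. Days with zero deaths are skipped (log undefined).
    Returns (A, mu, sigma).
    """
    pts = [(t, math.log(y)) for t, y in zip(days, daily_deaths) if y > 0]
    if len(pts) < 3:
        raise ValueError("need at least three days with nonzero deaths")
    t0 = sum(t for t, _ in pts) / len(pts)  # centre t to keep the system well conditioned
    pts = [(t - t0, ly) for t, ly in pts]
    # Normal equations for ln y = a + b*u + c*u**2, with u = t - t0.
    s = [sum(u ** k for u, _ in pts) for k in range(5)]
    r = [sum(u ** k * ly for u, ly in pts) for k in range(3)]
    m = [[s[i + j] for j in range(3)] + [r[i]] for i in range(3)]
    # Gaussian elimination with partial pivoting on the augmented 3x4 matrix.
    for col in range(3):
        piv = max(range(col, 3), key=lambda i: abs(m[i][col]))
        m[col], m[piv] = m[piv], m[col]
        for i in range(col + 1, 3):
            f = m[i][col] / m[col][col]
            for j in range(col, 4):
                m[i][j] -= f * m[col][j]
    # Back substitution for the coefficients (a, b, c).
    coef = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        coef[i] = (m[i][3] - sum(m[i][j] * coef[j] for j in range(i + 1, 3))) / m[i][i]
    a, b, c = coef
    if c >= 0:
        raise ValueError("no downward curvature in ln(deaths) yet; peak not identifiable")
    sigma = math.sqrt(-1.0 / (2.0 * c))
    mu = t0 - b / (2.0 * c)
    amp = math.exp(a - b * b / (4.0 * c))
    return amp, mu, sigma


# Synthetic, hypothetical data: only the rising side (days 0-9) of a curve peaking at day 12.
days = list(range(10))
deaths = [800 * math.exp(-(t - 12) ** 2 / 32) for t in days]
amp, mu, sigma = fit_gaussian_log_quadratic(days, deaths)
z_latest = (days[-1] - mu) / sigma  # latest date's position, in standard deviations from the peak
```

The quantity `z_latest` plays the role of reading off "which week" the latest data sit at on the rate curve; a negative value means the peak is still ahead.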

This model assumes that the rate of deaths (and of case prevalence too, if only that could be measured well) rises and then falls. A Gaussian works well because it begins by rising exponentially, then becomes relatively linear before rolling over and peaking. The recovery is assumed to follow a similar trend in reverse, as shown in the bottom plot above; the death rate data for China (Post #7) bear this out. The total number of deaths up to a particular point on the Gaussian rate curve is obtained by integrating the death rate up to that point, as shown in the middle plot above. So if one knows where on the Gaussian rate curve a particular population lies, the final death count can be extrapolated from the current death count. The factors that convert current deaths to final deaths are shown in the top plot above and in the Table on the right.
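One way to write down what the bottom, middle and top plots encode, assuming the death rate is an exact Gaussian in the horizontal-axis position x with peak μ and width σ (symbols introduced here, not taken from the post), is the following. The rate integrates to a normal CDF Φ, and the conversion factor from current to final deaths depends only on the standardized position z:

```latex
r(x) = A\,\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)

D(x) = \int_{-\infty}^{x} r(u)\,du = A\,\sigma\sqrt{2\pi}\;\Phi\!\left(\frac{x-\mu}{\sigma}\right),
\qquad \Phi(z) = \tfrac{1}{2}\left[1 + \operatorname{erf}\!\left(\tfrac{z}{\sqrt{2}}\right)\right]

D_\infty = \lim_{x \to \infty} D(x) = A\,\sigma\sqrt{2\pi}

F(x) = \frac{D_\infty}{D(x)} = \frac{1}{\Phi(z)}, \qquad z = \frac{x-\mu}{\sigma}
```

Because F depends only on z, stretching or shrinking the time unit on the axis changes μ and σ together and leaves the forecast unchanged.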

We illustrate the method with Italy. The death rate in Italy has been rising exponentially but is beginning to show a perceptible slowing from pure exponential growth (the pseudo-linear region). These daily death counts are overlaid on the Gaussian rate curve as closely as we can judge by eye. There is large uncertainty, particularly in the near-linear region of the Gaussian: we could just as easily place the Italy data so that the last date overlays week 4.5 rather than week 5.5 as shown. We therefore take these two placements as the uncertainty bounds for the final fatality forecast. The dotted lines mark these two limits; tracing each up to the multiplicative factor in the top plot and multiplying by the current total deaths gives the range of final fatalities.
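The bracketing step can be sketched as follows, assuming the rate curve is an exact Gaussian so that the factor is 1/Φ(z). The peak position and width (`peak_week`, `sigma_weeks`) and both function names are hypothetical; the post reads its factors off the figure rather than from a formula, and the example inputs are illustrative, not Italy's actual curve.

```python
import math


def fatality_factor(week, peak_week, sigma_weeks):
    """Final deaths / deaths so far, when the latest data sit at `week` on a
    Gaussian rate curve centred at `peak_week` with width `sigma_weeks`."""
    z = (week - peak_week) / sigma_weeks
    fraction_done = 0.5 * math.erfc(-z / math.sqrt(2.0))  # standard normal CDF, Phi(z)
    return 1.0 / fraction_done


def final_fatality_range(current_deaths, week_early, week_late, peak_week, sigma_weeks):
    """Lower and upper final-death forecasts from two candidate placements.

    The later placement (further along the curve) has the smaller factor and
    therefore gives the lower forecast.
    """
    low = current_deaths * fatality_factor(week_late, peak_week, sigma_weeks)
    high = current_deaths * fatality_factor(week_early, peak_week, sigma_weeks)
    return low, high


# Hypothetical inputs: 10,000 deaths so far, latest point at week 4.5 or 5.5,
# peak assumed at week 7 with a 1.5-week width.
low, high = final_fatality_range(10_000, 4.5, 5.5, peak_week=7.0, sigma_weeks=1.5)
print(f"{low:,.0f} to {high:,.0f} final deaths")
```

Using `erfc` rather than `1 + erf` keeps Φ(z) accurate for placements well before the peak, where the factor becomes very large.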

The results of this model for our highlighted countries and U.S. states are tabulated in the Table below, using the observed death rates and total deaths as of 04/01/2020 (see Post #7 for those data).

For a particular population, the lower the week number at which it sits on the curve, the further it is from recovery and the greater its fatality factor relative to the current total deaths. The following observations can be made:

  • The U.S. total fatality is projected to be between 74,365 and 391,502. The large uncertainty is because the current death rate is still on the steep part of the Gaussian curve.
  • China is already near full recovery, so the 1-week uncertainty amounts to only about 12 deaths out of more than 3,000.
  • Iran appears to be at the peak of the death rate curve, which implies a doubling of the current deaths as it progresses down the curve.
  • Regarding the severity in different countries, the U.S. is projected to have the largest final death count in the world, though Italy, Spain and France are projected to have greater death counts per capita (expressed as per million in the above Table).
  • Regarding the U.S. states, Washington is furthest along the fatality (Gaussian) rate curve and should peak shortly. New York is still in dangerous territory, with a death rate that is still growing exponentially. California is further along, but still near exponential. New York is projected to have a final per capita fatality count more than 10x that of Washington and California.
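Several of these observations follow directly from a Gaussian factor of 1/Φ(z), where z is how many standard deviations the latest data sit past the peak (a notation assumed here, not used in the post). The sketch below checks that the factor is exactly 2 at the peak, matching the doubling projected for Iran, and close to 1 well past the peak, as for China; it also computes the spread between the post's U.S. lower and upper projections. Taking z = 3 as "well past the peak" is an arbitrary illustrative choice.

```python
import math


def fatality_factor_z(z):
    """Final/current deaths when the latest data sit z standard deviations
    from the peak of a Gaussian death-rate curve: 1 / Phi(z)."""
    return 1.0 / (0.5 * math.erfc(-z / math.sqrt(2.0)))


# At the peak (z = 0) exactly half the deaths have occurred, so the factor is 2 (cf. Iran).
peak_factor = fatality_factor_z(0.0)

# Well past the peak the factor approaches 1, so placement barely matters (cf. China).
late_factor = fatality_factor_z(3.0)

# Ratio between the post's upper and lower U.S. projections (about 5.3x).
us_low, us_high = 74_365, 391_502
us_spread = us_high / us_low
```

The large U.S. spread reflects the same effect in reverse: before the peak, a one-week shift in placement moves Φ(z) by a large relative amount.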

There have been a number of reports of projected deaths in the news, some of them outlandish because they assume no reduction from social isolation, and many that include a host of variables. Our U.S. administration is now projecting 100,000 to 240,000 total deaths, which falls between our uncertainty limits. The University of Washington updates its projections nearly daily and currently forecasts the following (https://covid19.healthdata.org/projections):

  • U.S.: 93,531 deaths, with the peak death rate 13 days away. This lies at the bottom end of our range, and we forecast the peak to be about 3 weeks away.
  • Washington: 978 deaths, with the peak 7 days away. This lies at the bottom end of our range, and we forecast the peak to be about 1.5 weeks away.
  • New York: 16,261 deaths, with the peak 8 days away. This is below our lower estimate, and we forecast the peak to be about 2.5 weeks away.
  • California: 5,068 deaths, with the peak 24 days away. This is higher than our upper forecast; they apparently believe that CA is further from its peak than our estimate of 2 weeks.

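Comparing an outside projection with this model's range is a simple interval check. This sketch uses only the U.S. numbers quoted above (our range of 74,365 to 391,502, the administration's 100,000 to 240,000, and the University of Washington's 93,531); the state ranges live in the Table and are not reproduced here. The helper name `compare_to_range` is hypothetical.

```python
def compare_to_range(projection, low, high):
    """Place an outside projection relative to this model's [low, high] range."""
    if projection < low:
        return "below range"
    if projection > high:
        return "above range"
    # Relative position inside the range: 0% = lower limit, 100% = upper limit.
    position = (projection - low) / (high - low)
    return f"within range ({position:.0%} of the way from low to high)"


US_LOW, US_HIGH = 74_365, 391_502  # this post's U.S. final-fatality range
outside = {
    "University of Washington (U.S.)": 93_531,
    "Administration, low end": 100_000,
    "Administration, high end": 240_000,
}
for name, value in outside.items():
    print(name, "->", compare_to_range(value, US_LOW, US_HIGH))
```

Reporting the relative position, not just in/out, makes "at the bottom end of our range" a concrete statement.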
There are several caveats and assumptions to this model:

  • Death rates may not follow a Gaussian, nor do they necessarily rise and fall symmetrically. However, historical data, such as China's data for the current epidemic and data from the 1918 Spanish Flu, appear to follow nearly Gaussian behavior.
  • We make no assumptions regarding social distancing, other interventions, anti-viral treatment, etc. We assume these are all embedded in the reported death rate data.
  • We assume reported death rates and totals are accurate. They are certainly more accurate than reported case prevalence and incidence, which are heavily dependent on testing and generally vastly understated relative to the real numbers.
  • This model does not have a time component. In fact, virus epidemics know no time. However, for convenience we have labeled the horizontal axes of the plots above with numbers that, as best we can deduce from observed data, represent weeks.

The utility of this model is that it is based solely on hard data, namely deaths, and does not rely on less certain variables. As each country and state moves up and over the curve, we will be able to refine the final fatality projections and reduce the uncertainty.