21. Biweekly Update: A New Model

(6/20/20) We modelers are having a field day battling for supremacy. Well, I am supreme, but nobody has noticed. So, I made some important upgrades, and now I am supremer. Spread the word.

The Problem

Now for some tech talk. There are two camps in COVID-19 modeling, each trying to be the crystal ball:

  • The epidemiologists, who have sophisticated kinetic models based on solving several coupled differential equations with many variables, e.g., infection rate, incubation time, recovery time, population densities, mobility, immunization rates, etc. Their greatest value lies in taking all the data after the fact, determining these variables for a particular epidemic, and then using them for future epidemics of the same disease. They say that they can predict based on previously determined variables and certain assumptions, but they don’t predict very far out. These models are sometimes categorized as “dynamic” models, though I never understood what is dynamic about them.
  • Then there are the forecasters, like me, who realize that we don’t know all these variables and who only care to forecast ahead using whatever reliable data there is. I chose deaths since they are pretty real and measurable. Others have chosen the number of cases, which I have argued is less reliable because it is convoluted with the extent of testing; I have shown that the reported and real numbers of cases were off by more than 10x. They are now generally within a factor of 2-3x, but that is still too large for me. We forecasters use real-time data, and our models are categorized as “statistical” models.

The epidemiologists like to criticize these statistical forecasting models because they keep changing their forecasts. Well, that is what forecasts do; would you really not want the weatherman to update things based on new information? Conversely, the forecasters cite the litany of unknown variables and the inflexibility of epidemiology (let’s shorten to Epi) models as their shortcomings.

So, at first the two groups defended their corners of the box, but what is reassuring about scientists is that they are by nature introspective and searching for the truth, and the truth doesn’t lie in either of these corners. Also reassuring is competition, not only to be right but also not to be wrong. So, both groups are now seeing the limitations of their models and starting to adopt pieces of each other’s, and we are getting hybrid models.

So, this brings us to how I ended up in the middle of the box with a bunch of Epi modelers. Despite my being a scientist and needing to understand everything about a process, I am also trying to be pragmatic about solving problems, and given that COVID-19 doesn’t give you an eternity to solve a problem, we need to come up with practical tools. This is my business side talking, and it seems to mesh well with my science side. So, for this time-critical problem, with not enough time to do a PhD thesis on it (because you know that takes at least 4 weeks), you reach for whatever resources get you the answer. Looking at previous epidemics (and let’s thank China for doing this all for us before they infected the rest of the world), you see that infections and deaths go up and then go down. Based on the recent China epidemic and also data for the 1918 Spanish Flu, these trends look rather Gaussian in shape. So, for forecasting, you look for a shape function that is realistic to history and use it to take the emerging data trends and project them forward.
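To make the idea of a shape function concrete, here is a minimal Python sketch, assuming a daily death-rate series and a pure (symmetric) Gaussian shape. It fits the curve with a log-quadratic least-squares trick, since the logarithm of a Gaussian is a parabola; this is a swapped-in illustration, not necessarily how the model in this post is fitted. The function names and the synthetic numbers are hypothetical.

```python
import numpy as np

def gaussian_rate(t, amplitude, t_peak, sigma):
    """Death rate (deaths/day) for a symmetric Gaussian shape function."""
    return amplitude * np.exp(-((t - t_peak) ** 2) / (2.0 * sigma**2))

def fit_gaussian_log_quadratic(t, rate):
    """Fit a Gaussian by fitting a parabola to ln(rate).

    ln(A * exp(-(t - tp)^2 / (2 sigma^2))) = c2*t^2 + c1*t + c0, so the
    Gaussian parameters follow from the quadratic coefficients.
    Only strictly positive rates can be used in the log.
    """
    t = np.asarray(t, dtype=float)
    rate = np.asarray(rate, dtype=float)
    mask = rate > 0
    c2, c1, c0 = np.polyfit(t[mask], np.log(rate[mask]), 2)  # highest power first
    if c2 >= 0:
        raise ValueError("data do not curve downward; no Gaussian peak")
    sigma = np.sqrt(-1.0 / (2.0 * c2))
    t_peak = -c1 / (2.0 * c2)
    amplitude = np.exp(c0 - c2 * t_peak**2)
    return amplitude, t_peak, sigma

# Synthetic example (hypothetical numbers): observe days 0-40 only,
# then project the fitted curve forward through day 100.
days = np.arange(0, 41)
observed = gaussian_rate(days, 2000.0, 35.0, 12.0)
A, tp, s = fit_gaussian_log_quadratic(days, observed)
forecast = gaussian_rate(np.arange(41, 101), A, tp, s)
```

With real, noisy data the fit would be weighted by whatever lies on the rising side, which is exactly why a purely symmetric shape struggles once the decline departs from the rise.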

The Gaussian model (you know it as the Bell Curve) worked really well on the upside of the epidemic, but with social distancing and then easing, the decline after the peak of the death (and case) rate was not symmetric with the rise. At first this was easily fixed with an asymmetric Gaussian model that I introduced, which gave different rise and fall characteristics. But then this decay shape didn’t match well further into the recovery because of persistent cases and deaths, largely due to the relaxing of social restrictions.
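An asymmetric Gaussian of this kind can be sketched by using one width before the peak and another after it. The Python below is a hypothetical illustration (names and values are ours): it gives the rate, the cumulative deaths up to any day via the error function, and the total area, where each half-Gaussian contributes amplitude × σ × √(π/2).

```python
import math

def asymmetric_gaussian_rate(t, amplitude, t_peak, sigma_rise, sigma_fall):
    """Death rate with separate Gaussian widths before and after the peak."""
    sigma = sigma_rise if t < t_peak else sigma_fall
    return amplitude * math.exp(-((t - t_peak) ** 2) / (2.0 * sigma**2))

def asymmetric_gaussian_cumulative(t, amplitude, t_peak, sigma_rise, sigma_fall):
    """Cumulative deaths from -infinity up to day t (area under the rate)."""
    half = math.sqrt(math.pi / 2.0)  # area of a unit half-Gaussian per unit sigma
    if t < t_peak:
        # Part of the rising half-Gaussian lying to the left of t.
        return amplitude * sigma_rise * half * math.erfc(
            (t_peak - t) / (sigma_rise * math.sqrt(2.0)))
    rise_area = amplitude * sigma_rise * half
    return rise_area + amplitude * sigma_fall * half * math.erf(
        (t - t_peak) / (sigma_fall * math.sqrt(2.0)))

def asymmetric_gaussian_total(amplitude, sigma_rise, sigma_fall):
    """Total deaths: the two half-Gaussian areas added together."""
    return amplitude * math.sqrt(math.pi / 2.0) * (sigma_rise + sigma_fall)

# Hypothetical parameters: peak on day 30, falling twice as slowly as it rose.
total_deaths = asymmetric_gaussian_total(2000.0, 10.0, 20.0)
```

Because the fall width only enters after the peak, a slower decline raises the forecast total without disturbing the fit to the rise.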

So how do we deal with this? Epi models don’t usually allow their sacred infectious parameter R0 to change, so they pretty much get it wrong. And a Gaussian model like mine, even adjusted for a different recovery, doesn’t handle the tail, where deaths continue, very well.


To better understand the Epi models and where they break down, I programmed the coupled set of differential equations for what is called a SEIR model (Susceptible, Exposed, Infectious, Recovered). You can look up the equations by just Googling SEIR, so I won’t show them here. I also added death, since that is what we are trying to forecast, but that is easy: just pick a mortality factor for the recovered population. The problem with the standard SEIR model is that the transmission factor R0 is a constant. SEIR models are intended for epidemics that literally infect everyone, so the susceptible population goes to zero and you have herd immunity. There are models in which not all recovered people become immune and some feed back into the susceptible population, but we don’t need to consider this here because we are looking at a range where a minority of the population gets infected. My Gaussian model doesn’t care about the fraction of the susceptible population that gets infected. We just care about deaths, and from the death rates and the total we can derive curves for prevalence (active cases or infectious) and incidence (exposed), as I’ve lectured in earlier posts.
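For readers who would rather not Google, here is a minimal Python sketch of a standard SEIR model with a death compartment added as described: a fixed mortality factor splits the flow out of the infectious compartment between D and R. The assumptions are ours, not the post’s: β = R0·γ, a constant total population N, fixed-step fourth-order Runge-Kutta integration, and hypothetical parameter values.

```python
import numpy as np

def seird_rhs(y, r0, incubation_days, infectious_days, mortality):
    """Right-hand side of an SEIR model with an added death compartment.

    y = [S, E, I, R, D]. A fraction `mortality` of those leaving I goes to D,
    the rest to R. R0 is constant here, as in the standard model.
    """
    s, e, i, _, _ = y
    n = y.sum()                       # total population, constant in time
    gamma = 1.0 / infectious_days     # removal rate from I
    kappa = 1.0 / incubation_days     # rate at which E becomes infectious
    beta = r0 * gamma                 # assumes R0 = beta / gamma
    new_infections = beta * s * i / n
    return np.array([
        -new_infections,
        new_infections - kappa * e,
        kappa * e - gamma * i,
        (1.0 - mortality) * gamma * i,
        mortality * gamma * i,
    ])

def integrate_seird(y0, days, steps_per_day=10, **params):
    """Fixed-step RK4 integration; returns an array with one row per day."""
    h = 1.0 / steps_per_day
    y = np.asarray(y0, dtype=float)
    rows = [y.copy()]
    for _ in range(days):
        for _ in range(steps_per_day):
            k1 = seird_rhs(y, **params)
            k2 = seird_rhs(y + 0.5 * h * k1, **params)
            k3 = seird_rhs(y + 0.5 * h * k2, **params)
            k4 = seird_rhs(y + h * k3, **params)
            y = y + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        rows.append(y.copy())
    return np.array(rows)

# Hypothetical example: 330 million people, 1,000 initially exposed.
trajectory = integrate_seird(
    [330e6 - 1000, 1000, 0, 0, 0], days=200,
    r0=2.5, incubation_days=5.0, infectious_days=7.0, mortality=0.01,
)
deaths = trajectory[:, 4]  # cumulative deaths D(t), one value per day
```

These are five coupled equations, and with R0 held constant the only way the curve bends over is by depleting S, which is the limitation discussed above.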

In order to make the SEIR model work better for COVID-19, it needs an adjustable R0: one value representing transmission before people realize they need to be careful, plus, as implemented here, two more R0 values for when social distancing and then easing occur. The equations then also need the times at which these changes take effect. It starts to get very complicated. But now the SEIR dynamic model has provisions for statistical forecasting.
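Making R0 adjustable mostly means making it a function of time. The Python below is a hypothetical sketch, not the model behind this post: three piecewise-constant values R0(1), R0(2), R0(3) switch at times τ(1,2) and τ(2,3), which we assume are measured in days from the start of the simulation. It uses simple forward-Euler steps so the switches are easy to follow, and all parameter values are made up.

```python
from bisect import bisect_right

def r0_schedule(t, r0_values, switch_days):
    """Piecewise-constant R0(t) (hypothetical helper).

    r0_values = (R0(1), R0(2), R0(3)); switch_days = (tau_12, tau_23), sorted,
    in days from the start. R0(k+1) applies once t reaches the k-th switch.
    """
    return r0_values[bisect_right(switch_days, t)]

def simulate_deaths(y0, days, r0_values, switch_days, incubation_days=5.0,
                    infectious_days=7.0, mortality=0.01, steps_per_day=20):
    """Forward-Euler SEIR-D with time-varying R0; returns cumulative deaths per day."""
    s, e, i, r, d = map(float, y0)
    n = s + e + i + r + d
    gamma, kappa = 1.0 / infectious_days, 1.0 / incubation_days
    h = 1.0 / steps_per_day
    deaths = [d]
    for day in range(days):
        for step in range(steps_per_day):
            t = day + step * h
            beta = r0_schedule(t, r0_values, switch_days) * gamma
            new_inf = beta * s * i / n
            # Tuple assignment: every right-hand side uses the old values.
            s, e, i, r, d = (
                s - h * new_inf,
                e + h * (new_inf - kappa * e),
                i + h * (kappa * e - gamma * i),
                r + h * (1.0 - mortality) * gamma * i,
                d + h * mortality * gamma * i,
            )
        deaths.append(d)
    return deaths

y0 = (330e6 - 1000, 1000, 0, 0, 0)
no_distancing = simulate_deaths(y0, 200, (2.5,), ())
with_distancing = simulate_deaths(y0, 200, (2.5, 0.9, 1.1), (30.0, 90.0))
```

Comparing the two runs shows how a drop and a partial rebound in R0 reshape the death curve; fitting the three R0 values and two switch times to data is where the extra parameters come from.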

The asymmetric Gaussian model I originally postulated had only a single change in transmission rate (our σ value), which was pegged at the peak of the death rate curve. However, this didn’t account for social easing, so I added another one. So, the asymmetric Gaussian statistical forecasting model now has provisions analogous to those of a SEIR model. However, this is not some amazing unification of diametrically different models, because R0 and our σ values do not have simple relationships. So, we treat σ as a fitting parameter. The whole point here is to come up with a shape function that fits the previous data and extrapolates well into the future.
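The three-σ version can be sketched the same way, though how the pieces are joined here is our assumption rather than a description of the published model. In this hypothetical Python sketch, σ(1) governs the rise, σ(2) the fall after the peak, and σ(3) the tail starting τ(2,3) days after the peak (measuring τ(2,3) from the peak is also our assumption). The third piece is rescaled so the curve stays continuous at the switch.

```python
import math

def three_sigma_gaussian_rate(t, amplitude, t_peak, sigma1, sigma2, sigma3, tau23):
    """Hypothetical piecewise-Gaussian death-rate shape with an easing tail.

    sigma1: rise before the peak; sigma2: fall after the peak;
    sigma3: tail once easing begins tau23 days after the peak.
    """
    x = t - t_peak
    if x < 0:
        return amplitude * math.exp(-x * x / (2.0 * sigma1**2))
    if x < tau23:
        return amplitude * math.exp(-x * x / (2.0 * sigma2**2))
    # Start from the value reached at the switch, then decay with sigma3.
    at_switch = amplitude * math.exp(-tau23**2 / (2.0 * sigma2**2))
    return at_switch * math.exp(-(x * x - tau23**2) / (2.0 * sigma3**2))

# Hypothetical values: easing 20 days after the peak widens the tail.
tail_with_easing = three_sigma_gaussian_rate(90.0, 2000.0, 30.0, 10.0, 15.0, 40.0, 20.0)
tail_without = three_sigma_gaussian_rate(90.0, 2000.0, 30.0, 10.0, 15.0, 15.0, 20.0)
```

Setting σ(3) equal to σ(2) recovers the two-σ asymmetric Gaussian exactly, so the extra regime only matters once the tail departs from the original fall.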

OK, so here are the results. First, the table comparing the inputs to the three models described.

σ and τ are in units of days

The thing to notice here is that we tried to make the inputs as similar as possible between the two models. However, there is not a one-to-one correspondence of variables, and, as noted above, the R0 and σ values, which represent transmission in each model, are very different because they plug into very different equations. The following plots show death rate and cumulative deaths for the three models in the above table.

Data for U.S. death rate up to 6/12/20 and corresponding curve fits for the three statistical models considered here. The cumulative death count on this date was 113,820.

The key observations are:

  • The asymmetric Gaussian (red curve) does well fitting to the rise and about halfway down the fall. However, it does not forecast the slowing in the decline of the death rate. This is also seen in the cumulative deaths.
  • The SEIR Gaussian and SEIR Statistical models now forecast nearly exactly the same for the parameters in the table above. The former may be a little better at the onset of the epidemic, as can be seen from the slightly earlier rise of the latter, but this is inconsequential when integrating over the entire death rate curve.

Now we can summarize the forecast of total deaths at various future dates. The intent here was to show that the simpler SEIR Gaussian model can replicate the forecasting of the more complicated SEIR Statistical model, and they are very close, as you can see, but this was by design. Having now put some SEIR into the Gaussian model, we can perhaps anticipate social behavior better. However, we caution that we are assuming no further social easing; if social behaviors worsen, there could be a much longer death tail and even a resurgence. The UW IHME model, which has consistently underforecast, changed its algorithm a few weeks ago and now shows rampant growth in certain populations (such as CA), which results in much higher forecasted values for the U.S. as a whole. We shall see. As they say, “it is difficult to make predictions …. especially about the future.”

SEIR forecasts for U.S. deaths assuming current social easing conditions.
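Turning a fitted death-rate curve into totals at future dates is bookkeeping: start from the last observed cumulative count and add the forecast daily deaths. This Python sketch assumes the fitted model is available as a function of days after the last data date, which is a hypothetical interface. The starting count of 113,820 on 6/12/20 comes from the death-rate figure caption, while the constant 500 deaths/day is a placeholder chosen so the arithmetic is easy to check, not a forecast.

```python
from datetime import date, timedelta
from typing import Callable, Dict, Iterable

def forecast_totals(last_date: date, last_cumulative: float,
                    daily_rate: Callable[[int], float],
                    target_dates: Iterable[date]) -> Dict[date, float]:
    """Cumulative deaths on each target date after last_date.

    daily_rate(k) is the model's forecast of deaths on day k after last_date
    (k = 1, 2, ...); it stands in for any fitted death-rate curve.
    """
    targets = set(target_dates)
    horizon = (max(targets) - last_date).days
    totals = {}
    running = float(last_cumulative)
    for k in range(1, horizon + 1):
        running += daily_rate(k)
        day = last_date + timedelta(days=k)
        if day in targets:
            totals[day] = running
    return totals

# Placeholder rate of 500 deaths/day, for illustration only.
totals = forecast_totals(date(2020, 6, 12), 113_820, lambda k: 500.0,
                         [date(2020, 6, 22), date(2020, 7, 12)])
```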

The key take-home is that the SEIR Gaussian model can offer forecasting power comparable to that of the SEIR Statistical model, but with less machinery: it uses a single function and needs only the variables σ(1), σ(2), σ(3), and τ(2,3). The parameter τ(1,2) can be fixed at zero, and the only other variable used, d, is for calculating and forecasting case prevalence and incidence. The SEIR Statistical model requires solving five coupled differential equations and needs the variables r, R0(1), R0(2), R0(3), τ(1,2), and τ(2,3).

The scientific version of this description has been posted on MedRxiv (https://www.medrxiv.org/content/10.1101/2020.06.21.20136937v1.article-metrics). The original model can be found at https://www.medrxiv.org/content/10.1101/2020.05.16.20104430v2

17. Weekly Update: Kudos to NY and NJ

(5/14/20) Most of the world and the U.S. states are improving in key statistics, but agonizingly slowly, with some exceptions that we highlight. Fortunately, the two biggest hot spots in the world, NY and NJ, appear to be recovering well. Every other statistic for the U.S., however, lags the rest of the world and underscores the serious consequences of our nation’s delayed and unprepared response to COVID-19.

The plots below show the familiar death rate curves for hotbed countries and U.S. states. We retain Iran for one more week and plan to show Sweden next week as an example of a lackadaisical approach to social containment.

There were no new upgrades internationally in our 3-color ranking system, and Spain is on the verge of a downgrade for its stubbornly persistent death rate. Domestically, we gave NY and NJ well-deserved upgrades, but WA is on the brink of a downgrade. The NY and NJ death rates are declining faster than those of most other populations, as you can see from the plots below. This can turn at any point, and NJ still shows signs of new outbreaks, so hopefully they do not relax social restrictions too aggressively and start another firestorm. In fact, the whole COVID-19 situation around the world feels like a huge forest fire that we may believe we are just about to contain, but a sudden change in weather could cause another uncontrollable outbreak. With the social, economic, and political pressure to increase social easing, this is bound to happen. Two states that were early leaders in taming the outbreak, WA and CA, are now having a tough time reducing deaths and active cases, as evidenced by the plots below. (You can read about the specifics of CA and Orange County in the just-released Post 16. Can Orange County, CA Begin Opening this Week?)

We continue to plot a symmetric Gaussian but for visualization only. Our analyses now use asymmetric functional fits that we will detail in a separate post in the near future.

Next is our familiar table of forecasted total deaths, prevalence (current cases), and incidence (new cases), along with their values per capita (per million people), as well as the dates we consider to be the earliest to begin a gradual easing of social distancing. These results fully incorporate our asymmetric Gaussian model, introduced last week and to be described in a future post and publication. We remind readers that these forecasts do not account for future premature social easing that could set off new outbreaks. The forecasts do, however, capture the effects of social distancing to date, as reflected in the actual death data.

The threshold prevalence for the easing date was raised this week from 100 to 200 active cases per million population for no better reason than I think I was being too stringent. This number really depends on the tolerable death rate, which is a subject we will treat in a future post.
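The easing-date rule can be written down directly. In this hypothetical Python sketch, we assume a daily prevalence (active case) series is already available, however it was derived, and take the easing date to be the first day at or after the prevalence peak on which active cases per million population fall below the threshold (200 by default, matching the new threshold). Searching only from the peak onward is our assumption, so that the early rising phase, when prevalence is also low, is not mistaken for recovery.

```python
from datetime import date, timedelta
from typing import Optional, Sequence

def easing_date(start: date, prevalence: Sequence[float], population: float,
                threshold_per_million: float = 200.0) -> Optional[date]:
    """First day at or after the prevalence peak with fewer active cases per
    million population than the threshold; None if that never happens.

    prevalence[k] is the number of active cases on start + k days.
    """
    per_million = [p / population * 1e6 for p in prevalence]
    peak = max(range(len(per_million)), key=per_million.__getitem__)
    for k in range(peak, len(per_million)):
        if per_million[k] < threshold_per_million:
            return start + timedelta(days=k)
    return None
```

Because the threshold enters only through the comparison, raising it from 100 to 200 simply moves the returned date earlier for a declining population.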

Key observations include:

  • The U.S. trails the rest of the world: It is hard to criticize our country, but we can’t ignore tough lessons, not just for the next pandemic but for this one, if our administration makes yet another mistake, sends premature messages on social easing, and digs us into a deeper death trap. By every statistical measure, the U.S. lags the rest of the world in handling COVID-19: it (i) was next to last to declare an emergency (the U.K. was last), (ii) was the last to reach the death rate peak, (iii) was last to implement testing and protective gear, both still at inadequate levels per capita, (iv) has 5% of the world’s population but 30% of the deaths and nearly half of the prevalence (active cases), (v) will be the last where it is safe to ease social restrictions (but we will not be the last to implement easing), and (vi) has seen the most upward revisions of death forecasts by major models of any country (see plot below), meaning our social distancing is not being rigorously practiced.
  • Jack’s rant: I have resisted taking any political views in this forum and have just reported the facts like a good impartial scientist hoping that policy makers will respond appropriately to these facts. However, our nation continues to mismanage this pandemic and is now further sidelining and ostracizing well-meaning medical experts from reporting the truth in order to push a politically motivated agenda to revive the economy. I am all for revitalizing the economy, but at what cost? I will have more to say on this in a future post. But we all have to start speaking out as this callous behavior is needlessly costing tens of thousands of American lives!
  • NY and NJ: These states have made excellent progress in reducing the death rate; however, because they started at such a high level, they still have the largest per capita death rates in the world, at 69 and 130 deaths per week per million population, respectively, vs. 48, 45, and 34 for the U.K., Sweden, and the U.S., respectively, the worst among countries.
  • Social easing: It is understandable that we must give great consideration to the economy, but we will be worse off if we socially ease prematurely. Easing as little as 2 weeks too soon could lead to epidemic growth again and require another 2 months of social distancing. That is an atrocious tradeoff.

The table below compares our total death forecasts to the benchmark model from the Institute for Health Metrics and Evaluation (IHME) at the University of Washington (UW) (http://www.healthdata.org/covid/).

The IHME model dropped the reporting of ‘days from peak’, but they do report the peak date, so we can calculate it for the table above.

The two comparative models give similar results (plots below), suggesting a similar algorithm, e.g., a strong dependence on death statistics. By some measures we may be performing better in terms of week-to-week volatility and quickness to detect new trends, as can be seen visually in the plots below. To compare volatility, we calculated the sum of squares for error (SSE) of each model’s weekly forecasts relative to its latest forecast values. By this SSE measure, the IHME model forecasts have varied more from week to week than the present model’s for all but one of the cases (France). Averaged over the international populations and the U.S. states that we track, the SSEs are, respectively, 26% and 26% for our model vs. 41% and 46% for the IHME model (lower means less variability). At present we do not see a penalty for the present model’s relative stability, but time will tell. It also appears that the IHME model is about a week behind the trends that we are forecasting, as evidenced by its weekly adjustments tending toward values we forecasted the previous week. On the other hand, they have made a brazen call on doubling the U.S. forecasted total deaths (not helping their volatility factor), a trend we also see but not to the same magnitude. We hope they are wrong for our country’s sake!
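The post does not spell out how the volatility SSE is normalized to a percentage, so this Python sketch is only one plausible reading and is not the calculation behind the 26%, 41%, and 46% figures: each earlier weekly forecast of total deaths is compared with the latest forecast, each difference is divided by the latest value, and the squares are summed. The function name and example numbers are hypothetical.

```python
from typing import Sequence

def relative_sse(weekly_forecasts: Sequence[float]) -> float:
    """Sum of squared deviations of earlier weekly forecasts from the latest one.

    Each deviation is divided by the latest forecast, so the result is a
    dimensionless fraction (multiply by 100 for a percentage). This
    normalization is an assumption made for illustration.
    """
    latest = weekly_forecasts[-1]
    return sum(((f - latest) / latest) ** 2 for f in weekly_forecasts[:-1])

# Hypothetical weekly forecasts of total deaths, oldest first.
steady = relative_sse([98_000, 101_000, 100_000])
jumpy = relative_sse([70_000, 130_000, 100_000])
```

A model whose successive forecasts hover near its latest value scores low, while one that swings widely from week to week scores high, which is the sense in which lower means less variability.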