COVID-19 Projections Collaboration**

Eugene Kolker, PhD
Member, COVID-19 Projections Collaboration

We offer the scientific, government, business, and policy communities a simulation tool to predict and monitor the effects of the changing dynamics of coronavirus disease 2019 (COVID-19) on the overall fatality rate, one of the most significant determinants of pandemic policy. The tool enables users to predict and monitor weekly fatalities, helping them balance public health and economic wellbeing.

Given the variations in model-derived predictions, and despite striving for consensus, experts often disagree about how the pandemic is likely to evolve. It is therefore essential to develop new, complementary approaches to pandemic modeling that can ease tensions and point to an evidence-based, rather than trial-and-error-based, pathway to recovery.

Background and guidelines for reopening

We propose democratizing the policy response through a policy-relevant, user-centric, SIR-driven (Susceptible, Infected, Recovered),1 and robust approach using Monte Carlo simulations2 of COVID-19 fatalities. We are a team of researchers, physicians, executives, managers, and students with medical, legal, technical, and academic backgrounds.

This article is our collaborative, volunteer-driven contribution to the ongoing efforts to protect lives and minimize fatalities while steps are taken to restart the American economy. With minor modifications, the proposed approach is applicable worldwide to any country, province, state, county, or city with a population of approximately 1 million or more.

So far, the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic has infected over 7 million people worldwide, and COVID-19 has killed over 400,000 people.3–5 The United States has been hit especially hard, with over 100,000 deaths, a quarter of worldwide fatalities.3–5 [For a breakdown of U.S. statistics, please visit the COVID-19 Projections Collaboration website.6 There, a Supplementary Materials page includes the breakdown in Table S1, Demographics and COVID-19 Data.]

The pandemic has led to widespread quarantines, social distancing rules, and shutdowns at the federal, state, county, and city levels. These measures have been effective in slowing the spread of SARS-CoV-2, minimizing fatalities, and enabling the healthcare system to keep up with immediate surges in demand and save lives.3–10

As infection rates and fatalities decline, experts and policymakers are introducing guidelines for gradual reopening.9 [For a summary of these guidelines, please visit the COVID-19 Projections Collaboration website, which includes a page titled Guidelines for Reopening.6] These efforts are accompanied by legitimate concerns about viral resurgence if proper measures are not taken. Many states are announcing plans to reopen in phases, with each phase gradually relaxing earlier restrictions. These plans are heavily influenced by epidemiological and public health models, for which we propose an enhancement.

Monte Carlo simulations

We propose a Monte Carlo simulation approach2 to estimate the ranges of key parameters relevant to the questions: “When will things return to a new ‘norm’?” and “What is the best way to monitor the rapidly changing situation at regional levels?” Monte Carlo simulations use predefined probability distributions of key input variables to calculate unknown outcomes, such as how many fatalities could occur.

These calculations are repeated for thousands of iterations, each generating a set of outputs. Together, the iterations yield a range of estimates that can be held with higher confidence than a single point estimate.
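To make the idea concrete, here is a minimal Python sketch of a Monte Carlo estimate. It is not the authors' Excel/SIPmath model: it assumes a hypothetical 10,000 infections and an IFR that is uniformly uncertain between 0.5% and 1.5%, both placeholder values rather than figures from this article, and reports the 5th, 50th, and 95th percentiles of the simulated fatalities.

```python
import random
import statistics


def simulate_fatalities(infections, ifr_low, ifr_high, iterations=10_000, seed=42):
    """Monte Carlo sketch: draw an IFR per iteration and compute fatalities.

    The uniform IFR distribution and all inputs are illustrative assumptions.
    Returns the 5th, 50th, and 95th percentiles of simulated fatalities.
    """
    rng = random.Random(seed)  # seeded so results are reproducible
    outcomes = [infections * rng.uniform(ifr_low, ifr_high) for _ in range(iterations)]
    # quantiles(n=20) returns 19 cut points: the 5th, 10th, ..., 95th percentiles.
    cuts = statistics.quantiles(outcomes, n=20)
    return cuts[0], cuts[9], cuts[18]


# Hypothetical inputs: 10,000 infections, IFR somewhere between 0.5% and 1.5%.
p5, p50, p95 = simulate_fatalities(infections=10_000, ifr_low=0.005, ifr_high=0.015)
print(f"5th: {p5:.0f}  median: {p50:.0f}  95th: {p95:.0f}")
```

A single point estimate (for example, 100 fatalities at the midpoint IFR) would hide that, under these assumptions, values from roughly 55 to 145 are all plausible.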

The proposed Monte Carlo simulation approach is built with four customizable parameters: Basic Reproduction Number (R0, a measure of the rate of transmission), Infection Fatality Rate (IFR), Weeks from Infection to Recovery/Fatality, and Weekly Fatality Threshold (WFT). [For additional technical details, please visit our COVID-19 website.6]

WFT is the average weekly number of U.S. fatalities from flu-like illnesses over the previous decade, expressed per 100,000 people, based on data from the Centers for Disease Control and Prevention,11 and rescaled for different regions, states, counties, and cities.6 Our model enables robust fatality prediction and can be used to monitor weekly outcomes. To mitigate the effects of noise, we build our simulations on data from three distinct sources.3–5
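The rescaling step is simple proportional arithmetic. The sketch below is an illustration, not the authors' calculation: it backs out a per-100,000 weekly rate from the Washington State figures given in this article (WFT ~65.8 for ~7.62 million people) and applies it to other populations, ignoring the seasonal adjustment that the full WFT includes.

```python
def scale_wft(rate_per_100k, population):
    """Rescale a weekly flu-like fatality rate per 100,000 people to a population."""
    return rate_per_100k * population / 100_000


# Assumption: back out a per-100,000 weekly rate from the Washington State figures
# reported in this article (WFT ~65.8 for ~7.62 million people). Seasonality is ignored.
rate = 65.8 / 7.62e6 * 100_000  # ~0.86 weekly fatalities per 100,000 people

for name, population in [("New York State", 19.45e6), ("King County, WA", 2.25e6)]:
    print(f"{name}: WFT ~{scale_wft(rate, population):.1f}")
```

The New York value this yields, about 168, is close to the ~167.9 reported later in the article, which suggests the two state thresholds share a similar per-capita rate. The King County value is only what this simple scaling implies, not a figure reported by the authors.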

Extrapolating the number of infections

We modeled data from the seven states constituting the Northeast Pact and the three states constituting the West Coast Pact. [For additional details, see Figures S2.A & S1.A, respectively, in the Supplementary Materials presented on our website.6] Additionally, we illustrate the relevance of our approach with two sample counties: King County in Washington State and Westchester County in New York State. Weekly raw data from the three sources were downloaded over a period of about two months ending May 18, 2020.

In the proposed Monte Carlo simulation approach, each of the four parameters requires Low, Likely, and High values. For the first three parameters (R0, IFR, and Weeks from Infection to Recovery/Fatality), these values are obtained from the SIR disease spread models1 for New York, the hardest-hit state.3–6 In other words, the outputs from other SIR models are used here as the inputs for our Monte Carlo model.
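One way to turn Low, Likely, and High values into something a simulation can sample is sketched below. The article does not say which distribution the Excel/SIPmath model fits to the three values; the triangular distribution peaked at the Likely value is an assumption, and the numbers are hypothetical placeholders, not the SIR-derived inputs. WFT could be handled the same way.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ThreePoint:
    """Low / Likely / High estimate for one customizable parameter."""
    low: float
    likely: float
    high: float

    def __post_init__(self):
        if not self.low <= self.likely <= self.high:
            raise ValueError("expected low <= likely <= high")

    def sample(self, rng: random.Random) -> float:
        # Assumption: a triangular distribution peaked at the Likely value.
        # random.triangular takes its arguments in (low, high, mode) order.
        return rng.triangular(self.low, self.high, self.likely)


# Hypothetical placeholder values, not the SIR-derived inputs used by the authors.
params = {
    "R0": ThreePoint(0.7, 0.9, 1.1),
    "IFR": ThreePoint(0.004, 0.008, 0.012),
    "weeks_infection_to_outcome": ThreePoint(2.0, 3.0, 4.0),
}

rng = random.Random(7)
one_draw = {name: p.sample(rng) for name, p in params.items()}
print(one_draw)
```

Each Monte Carlo iteration would take one such draw per parameter, so the spread between Low and High directly controls the width of the resulting forecast range.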

The fourth parameter, WFT, should be recalculated to reflect the seasonality of weekly flu-like fatalities and different population sizes.6,11 These four customizable parameters, along with recent weekly fatality data, enable relevant, case-specific, targeted, and timely projections. We performed the Monte Carlo simulations using the SIPmath™ Modeler Tools12 and Microsoft Excel. [For procedural details, please visit our website, which includes a page titled Excel Instructions.6]

Using weekly fatality data from the previous week(s), one can extrapolate the number of infections two weeks earlier and then project later weeks by multiplying the number of infections by R0. [For an example of such an extrapolation, please visit our website and download the Excel document provided.6] From the number of new infections, we forecast weekly fatalities by multiplying the number of infected people by the fatality rate distribution. This calculation draws on the number of fatalities during the previous weeks, according to the simulation’s “time to no longer contagious” distribution. The simulation runs 1000 iterations to forecast weekly fatalities and then graphs the 5th, 50th, and 95th percentile values for each week.
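The procedure above can be sketched in Python as follows. This is a simplified, hypothetical reconstruction, not the authors' spreadsheet: it assumes triangular sampling of (low, likely, high) tuples, a fixed two-week lag from infection to death, and that R applies once per generation of g weeks, so infections change by a factor of R ** (1 / g) each week. All input values are placeholders.

```python
import random
import statistics


def tri(rng, three_point):
    """Sample a (low, likely, high) tuple with a triangular distribution (assumed)."""
    low, likely, high = three_point
    return rng.triangular(low, high, likely)


def forecast(last_week_deaths, weeks_ahead, r0, ifr, gen_weeks, runs=1000, seed=0):
    """Hypothetical sketch of a weekly fatality forecast.

    r0, ifr, and gen_weeks are (low, likely, high) tuples. Returns one
    (p5, p50, p95) tuple of projected fatalities per forecast week.
    """
    rng = random.Random(seed)
    samples = [[] for _ in range(weeks_ahead)]
    for _ in range(runs):
        r, f, g = tri(rng, r0), tri(rng, ifr), tri(rng, gen_weeks)
        # Deaths observed last week reflect infections about two weeks earlier.
        infections = last_week_deaths / f
        # Assumption: R applies once per generation of g weeks, so the
        # week-over-week multiplier is R ** (1 / g).
        weekly_multiplier = r ** (1.0 / g)
        for week in range(weeks_ahead):
            # Each step advances both the infection week and the death week by one.
            infections *= weekly_multiplier
            samples[week].append(infections * f)  # fatalities = infections x IFR
    bands = []
    for week_samples in samples:
        cuts = statistics.quantiles(week_samples, n=20)
        bands.append((cuts[0], cuts[9], cuts[18]))  # 5th, 50th, 95th percentiles
    return bands


bands = forecast(last_week_deaths=300, weeks_ahead=6,
                 r0=(0.6, 0.8, 1.0), ifr=(0.004, 0.008, 0.012), gen_weeks=(2.0, 3.0, 4.0))
for week, (p5, p50, p95) in enumerate(bands, start=1):
    print(f"week +{week}: 5th {p5:.0f}, median {p50:.0f}, 95th {p95:.0f}")
```

Because this sketch uses one IFR draw per run both to back-calculate infections and to convert them back to deaths, IFR cancels out of the projected fatalities; it would matter in a fuller model where, for example, IFR changes over time or the infection estimates are used in their own right.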

Our model calculated the WFT for Washington State (population of ~7.62 million) to be ~65.8 and forecast that the state’s COVID-19 fatalities would drop below its WFT the week of May 25 (Figure 1A). Washington State’s King County (population of ~2.25 million) is also predicted to drop below its WFT the week of May 25. [For details, visit our website’s Supplementary Materials and review Figure S1.B.6]

Figure 1A. Weekly fatality forecast for Washington State. Range of fatality forecast from Monte Carlo simulations. Data derived from the COVID Tracking Project3 were visualized with Tableau.


New York State (population of ~19.45 million) has a WFT of ~167.9. The state’s COVID-19 fatalities are predicted to fall below its WFT the week of June 15 (Figure 1B). New York State’s Westchester County (population of ~1 million) is predicted to drop below the county’s WFT the same week as the rest of New York State. [For details, visit our site’s Supplementary Materials and review Figure S2.B.6] These projections can inform decisions on when to reopen Washington State’s King County and New York State’s Westchester County, as well as the rest of Washington State and New York State. [For more details, visit our website’s Supplementary Materials and review Table S2.6]
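Reading a crossing week off a forecast amounts to finding the first week in which the chosen percentile line drops below WFT. The article does not state which percentile it compares against WFT; defaulting to the 95th percentile as the conservative choice is an assumption, and the forecast bands below are hypothetical, not the Washington or New York results.

```python
def first_week_below(bands, wft, percentile_index=2):
    """Return the 1-based forecast week in which the chosen percentile drops below WFT.

    bands holds one (p5, p50, p95) tuple per week. Index 2 (95th percentile) is
    the default as an assumed conservative choice; returns None if never below.
    """
    for week, band in enumerate(bands, start=1):
        if band[percentile_index] < wft:
            return week
    return None


# Hypothetical forecast bands, compared against Washington State's WFT of ~65.8.
bands = [(150, 200, 260), (110, 150, 200), (80, 110, 150),
         (55, 80, 110), (40, 58, 80), (28, 42, 60)]
print(first_week_below(bands, 65.8))                      # 95th percentile -> 6
print(first_week_below(bands, 65.8, percentile_index=1))  # median -> 5
```

Choosing the 95th rather than the 50th percentile delays the signal by a week in this example, trading speed for a lower risk of reopening too early.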

Figure 1B. Weekly fatality forecast for New York State. Range of fatality forecast from Monte Carlo simulations. Data derived from the COVID Tracking Project3 were visualized with Tableau.

Accuracy, limitations, and future developments

We evaluated the accuracy of predictions of the week in which COVID-19 fatalities fall below WFT (the expected number of flu-like fatalities for the same week in an average year). The forecasts vary by ±1 week when WFT is varied by ±50%, depending on location. The approach always uses the most conservative estimates derived from the different data sources.3–6
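A short calculation shows why a large change in WFT can move the crossing by only about a week. As an illustrative assumption, not the authors' evaluation method, suppose weekly fatalities fall geometrically, D_k = D_0 · g^k with 0 < g < 1. They drop below WFT after roughly k* = ln(WFT / D_0) / ln(g) weeks, so multiplying WFT by a factor s shifts the crossing by about ln(s) / ln(g) weeks. The sketch below steps week by week using a hypothetical starting level of 500 weekly fatalities.

```python
def crossing_week(d0, growth, wft, max_weeks=520):
    """First whole week k >= 0 in which d0 * growth**k falls below wft.

    Steps week by week instead of using the closed form, avoiding
    floating-point surprises exactly at the threshold. Returns None if the
    series has not dropped below wft within max_weeks (e.g., growth >= 1).
    """
    k, deaths = 0, d0
    while deaths >= wft:
        if k >= max_weeks:
            return None
        deaths *= growth
        k += 1
    return k


# Hypothetical starting level of 500 weekly fatalities and WFT ~65.8, varied by +/-50%.
for growth in (0.5, 0.7):
    weeks = [crossing_week(500, growth, s * 65.8) for s in (1.5, 1.0, 0.5)]
    print(f"weekly factor {growth}: crossing weeks {weeks}")
```

With weekly halving (factor 0.5), changing WFT by ±50% moves the crossing by at most one week; with a slower 30% weekly decline (factor 0.7) it moves it by up to two, so this sensitivity depends on how quickly fatalities are falling.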

Like any model, ours is only as good as its assumptions. We hope that our approach can underscore the importance of obtaining more accurate values for key parameters. Better estimates of R0, numbers of infected people, and population density in various locations will yield better estimates of IFR.

R0 can vary with geography, weather, and public health measures. Additionally, IFR might change as medical interventions improve, and uncertainty remains regarding the number of fatalities directly caused by SARS-CoV-2. The model’s accuracy would improve with tighter ranges for the customizable parameters in various regions. For improved accuracy, future models could incorporate antibody rates measured in random samples across the United States. [For technical details, please visit our website.6] If any of our assumptions fail, the model can readily be rerun, on a weekly or daily basis, with a revised distribution for R0.

Per George Box and Norman Draper,13 “Essentially, all models are wrong, but some are useful.” We strove to develop a relatively simple, easy-to-use, and useful model, and we hope it encourages the collection of more accurate and diverse data for key parameters, enabling more advanced and accurate models.


The proposed approach can be used to project and monitor fatalities at the country, province, state, county, and city levels, provided that the population is large enough (over 1 million) to accurately estimate a range for R0. The model simulates key pandemic parameters, including IFR, Weeks from Infection to Recovery/Fatality, WFT, and a range of R0 values derived from existing epidemiological data for a given location. Our model can be especially useful for experts and policymakers who may lack access to reliable data regarding R0 for their districts and may, consequently, face tremendous uncertainty over possible changes in R0 as states begin to reopen.
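The text does not specify how a location's R0 range is derived from its epidemiological data. As one hypothetical possibility, the sketch below assumes that each week's fatalities equal the previous week's multiplied by R ** (1 / g), where g is an assumed number of weeks per infection generation, and inverts that relation to turn recent week-over-week ratios into a (low, likely, high) range. The fatality counts are invented for illustration.

```python
import statistics


def r_range_from_deaths(weekly_deaths, gen_weeks):
    """Hypothetical: derive a (low, likely, high) R range from recent weekly deaths.

    Assumes deaths track infections with a fixed lag and that each weekly ratio
    equals R ** (1 / gen_weeks), so R = ratio ** gen_weeks.
    """
    estimates = [
        (current / previous) ** gen_weeks
        for previous, current in zip(weekly_deaths, weekly_deaths[1:])
        if previous > 0
    ]
    if not estimates:
        raise ValueError("need at least two consecutive weeks with nonzero deaths")
    return min(estimates), statistics.median(estimates), max(estimates)


# Hypothetical five weeks of fatalities for one location, 3-week generation time.
low, likely, high = r_range_from_deaths([420, 360, 330, 270, 240], gen_weeks=3)
print(f"R range: low {low:.2f}, likely {likely:.2f}, high {high:.2f}")
```

Small weekly counts make these ratios noisy, which is consistent with the requirement above that the population exceed about 1 million.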

This approach empowers policymakers and healthcare and business professionals to make better-informed, data-driven decisions on how to begin to reopen and how to proactively monitor afterward. The proposed SIR-derived Monte Carlo approach is available not only to experts (policymakers, physicians, and healthcare managers), but also to the public at large via a downloadable Excel file. [For details, visit our website.6] We welcome actionable feedback from the users of our Monte Carlo model.


*This article is an edited and abbreviated version of a more formal academic paper that can be found at

**Members of the COVID-19 Projections Collaboration are as follows: Eugene Kolker, PhD, Alexander Huber, Gurkirat Singh Sekhon, Sritham Thyagaraju, Minghao Fu, Isaac M. Krasnopolsky, Andrea Davidovich, Marita Acheson, MD, Dmitri Adler, Anthony M. Avellino, MD, MBA, Philip A. Bernstein, PhD, Paul E. Buehrens, MD, FAAFP, Patrick J. Boyd, JD, Drexel DeFord, MPA, MSHI, Yakov Grinberg, MD, Rose Guerrero, MD, Dawn Josephson, Raif Khassanov, Evelyne Kolker, Eugene Luskin, Aliona Rudys, MD, Irine Vaiman, MD, and Aleksandr Zhuk, PhD. Alexander Huber, Gurkirat Singh Sekhon, Sritham Thyagaraju, and Minghao Fu contributed equally to this article. Please address correspondence to Eugene Kolker (

1. Mubayi A, Kribs C, Arunachalam V, Castillo-Chavez C. Studying complexity and risk through stochastic population dynamics: Persistence, resonance, and extinction in ecosystems. In: Rao ASRS, Rao CR, eds. Handbook of Statistics. Amsterdam: Elsevier; 2019; 40: 157–193.
2. Rubinstein RY, Kroese DP. Simulation and the Monte Carlo Method. 3rd ed. Hoboken, NJ: John Wiley & Sons; 2016.
3. Our Data. The COVID Tracking Project.
4. COVID-19 Projections. Institute for Health Metrics and Evaluation, University of Washington.
5. COVID-19 Coronavirus Pandemic. Worldometer.
6. Monte Carlo Simulations to Predict and Monitor Reopening after COVID-19 Outbreak. The COVID-19 Web Resource.
7. Chaney S, Morath E. April Unemployment Rate Rose to a Record 14.7%. Wall Street Journal. May 8, 2020.
8. Allen D, Block S, Cohen J, et al. Roadmap to Pandemic Resilience. Harvard University. April 20, 2020.
9. Rivers C, Martin E, Watson C, et al. Public Health Principles for a Phased Reopening During COVID-19: Guidance for Governors. Johns Hopkins Center for Health Security. April 17, 2020.
10. Kissler SM, Tedijanto C, Goldstein E, Grad YH, Lipsitch M. Projecting the transmission dynamics of SARS-CoV-2 through the postpandemic period. Science 2020; 368: 860–868. DOI: 10.1126/science.abb5793.
11. National Center for Health Statistics. Centers for Disease Control and Prevention.
12. Probability Management.
13. Box GEP, Draper NR. Empirical Model-Building and Response Surfaces. Hoboken, NJ: John Wiley & Sons; 1987: 424.

Interview with Eugene Kolker, PhD

GEN: Gene, you are not an expert in epidemiology or virology. What prompted you to tackle a project involving COVID-19?

My research led to two huge questions: “When will things return to a new ‘norm’?” and “What is the best way to monitor the rapidly changing situation at regional levels?” I assembled and led a multidisciplinary team of researchers, physicians, managers, and students. Together, we developed a breakthrough data science approach to forecast reopening and conduct follow-up monitoring of COVID-19 fatalities at the federal, state, county, and city levels.

We felt it essential to develop complementary approaches to pandemic modeling to ease tensions (in an already intense election year) and find an evidence-based pathway to recovery that could obviate trial and error. Indeed, I did not specialize in epidemiology or virology. My doctorate is in bioinformatics and computational biology from the Weizmann Institute of Science, and I was the co-founding editor of Big Data and OMICS: A Journal of Integrative Biology, published by Mary Ann Liebert, Inc. It was vital to think out of the box, so we decided to utilize the outputs from other SIR models as the inputs for our Monte Carlo Simulations.

As a result, we wrapped our model into a downloadable Excel file, which can be used not only by the experts (policymakers, physicians, and healthcare managers), but also by the public at large.

GEN: Could you tell us about your background?

My dad was an engineer, and my mom was a pediatrician. As far back as I remember, I always wanted to help people, especially those in pain. After earning my PhD, I consulted on and off, worked at several enterprises, and co-founded three startups.

After the first startup was acquired, I worked for almost a decade at Seattle Children’s, helping our clients (patients and families). I subsequently moved to New York and joined IBM, working with clients on a global scale. I also teach AI at NYU. My technical forte is robust models, data/AI/analytics solutions, and learning/teaching. My five key principles are: 1) respect people, humor, and data; 2) master your work; 3) question basic assumptions; 4) challenge conventional opinions; and 5) take calculated risks.

Last month, I joined DataArt, a global software engineering firm that takes a “human approach” to solving problems. I am super excited to help our clients in any way I can.
