If one is not a statistician, it can be challenging to make sense of news reports about the data used to analyze and predict the impact of COVID-19. In this Q&A, one of the country’s top experts in actuarial science, enterprise risk management, and predictive modeling, Rick Gorvett, FCAS, CERA, MAAA, ARM, FRM, Ph.D., helps shed some light.
Professor Gorvett, Chair of Bryant’s Mathematics Department, offers guidance on how to better understand the data that leaders rely on to make decisions that have far-reaching implications for economies, communities, and individuals around the world.
In the simplest terms, Gorvett explains, “If we can mathematically model how a disease spreads, we can test ways of preventing it. We can find where to put resources to stem future outbreaks.”
What is statistical/mathematical modeling, and why is it so important as the world manages through COVID-19 now and during past outbreaks and pandemics?
A mathematical model is an attempt to quantify and describe real-world phenomena – whether physical, biological, or behavioral. One subset of such models is epidemiological models, which describe the transmission of infectious diseases and their potential propagation through a population or society. By mathematically simulating the transmission of a virus amongst individuals, the macroscopic behavior of the disease system can be observed, and its characteristics better understood.
Such a model can become the basis for assessing the potential impact of public policy initiatives to fight the spread of a disease. By changing certain parameters in the model and running the simulation over and over again, we can identify which containment strategies are most (or least) likely to produce favorable results, and even indicate the degree to which infection or death is expected to be reduced.
A common approach to epidemiological modeling involves the SIR framework, in which individuals are classified into one of three states: Susceptible, Infected, and Recovered. (Death is sometimes added as a fourth state; an individual who dies exits the model from that point forward.) A key aspect of an SIR model is using data associated with the virus (or other relevant information, perhaps data or knowledge gained from studying similar viruses) to determine the values of its parameters – in particular, the likelihood of an individual moving between those states. An example is the important R0 parameter: the average number of individuals that each infected person will in turn infect. Similarly, the likelihoods of susceptible individuals being infected, of infected individuals dying, and of infected individuals recovering are critical values and assumptions for these models.
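The state transitions described above can be sketched in a few lines of code. The following is a minimal discrete-time SIR simulation; the population size, transmission rate (beta), and recovery rate (gamma) are purely illustrative assumptions, not estimates for COVID-19.

```python
# Minimal discrete-time SIR simulation.
# All parameter values below are illustrative assumptions only.

def simulate_sir(population=1_000_000, initial_infected=10,
                 beta=0.3, gamma=0.1, days=200):
    """Return daily (S, I, R) counts for a basic SIR model.

    beta  - expected new infections per infected person per day
            (when the whole population is susceptible)
    gamma - daily recovery rate (1 / mean infectious period)
    R0 = beta / gamma (here 3.0)
    """
    s, i, r = float(population - initial_infected), float(initial_infected), 0.0
    history = [(s, i, r)]
    for _ in range(days):
        # New infections scale with contacts between S and I individuals.
        new_infections = beta * s * i / population
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history

history = simulate_sir()
peak_infected = max(i for _, i, _ in history)
print(f"Peak simultaneous infections: {peak_infected:,.0f}")
```

Changing beta in this sketch mimics the effect of containment measures such as social distancing: lowering it below gamma (i.e., pushing R0 below 1) causes the outbreak to die out rather than grow.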
What other disciplines are involved in examining and analyzing COVID-19, its behavior, impact, and recommendations for responding? And how do they work together?
As with many applications of mathematics and statistics, modeling disease is a strongly multidisciplinary effort. In addition to involving mathematical and medical expertise, an epidemiological model will ideally draw on the skills of people in psychology and behavioral science (for input regarding how people act in response to the virus and to containment efforts), economics and finance (to understand the potential impact on economic and financial markets of control efforts), political science (especially in a global pandemic, where data regarding the disease and its effects must be understood within the context of different political systems and ideologies), communication (to transmit information to the public in an effective manner), and many more disciplines.
What are the data telling you and other experts about the characteristics of COVID-19?
The biggest problem in modeling COVID-19 is probably the lack of data – or, sometimes, verifiable data – in areas that are critical to developing a precise model. When political leaders hold their daily press conferences – many of which have been good, respectable, and informative events – they generally provide that day’s “big picture” statistics, such as number of newly-confirmed cases, number of hospitalizations, and number of deaths. These are all important values in which the public is understandably interested.
But from a modeling perspective, important quantities are missing – particularly the denominators. Model parameters often involve “rates” – for example, how something changes over time, or how one statistic relates to another. Consider the questions left unanswered when the number of newly confirmed positive cases is reported. Out of how many tested individuals? And what portion of the population is currently being tested? Because of a shortage of available testing resources, to a large extent only suspected potential positive cases are being tested – which means we have no good or complete data regarding the infection rate of the virus in the general population. And what percentage of positive cases are asymptomatic? We don’t know, because we are not testing the general population. Knowing the denominators, so that the ratios or percentages can be determined, is critical when interpreting statistics.
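The denominator problem can be made concrete with a small calculation. The numbers below are entirely made up for illustration; the point is that when testing targets suspected cases, positivity among those tested says little about prevalence in the general population.

```python
# Illustrative (made-up) numbers showing why the denominator matters.

confirmed_positive = 5_000
tests_performed = 20_000       # mostly symptomatic / suspected cases
population = 10_000_000

# Same numerator, two very different denominators:
test_positivity = confirmed_positive / tests_performed   # 25%
confirmed_per_capita = confirmed_positive / population   # 0.05%

print(f"Positivity among tested: {test_positivity:.1%}")
print(f"Confirmed cases per capita: {confirmed_per_capita:.2%}")
```

Neither ratio is the true infection rate: the first is inflated by selective testing, while the second misses every untested infection, including asymptomatic ones.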
Are there other pandemics with similar characteristics and patterns? What is different about COVID-19?
To the extent that a pandemic involves a virus that is a recurrence of (or is very similar to) an earlier and well-understood virus, one could expect the modeling of a new outbreak to be relatively straightforward. But there has been little opportunity to study and understand COVID-19, and so we don’t know much about it – or whether it’s similar to previous causes of pandemics. This makes modeling difficult, which is why its predictions vary far more widely than those of epidemiological models for better-understood viruses.
Last month, Imperial College London (ICL) released a report that caused leaders of the world to take stronger action across the board. What was in that report that had not been previously understood?
On March 16, ICL released a study that suggested that the potential impact of the Coronavirus in the UK and the US was considerably larger than people had been thinking in the prior weeks and months.
This model, based largely on data emerging from the UK and Italy, suggested that, without mitigation or control measures, the need for intensive care beds due to the virus would utterly overwhelm existing capacity. The report predicted over 2 million deaths in the US, and over half a million deaths in the UK, if the virus went unchecked.
The ICL team’s modeling also indicated that we basically need to attempt suppression of the virus – anything less, and as soon as any controls intended to restrict the virus were relaxed, a deadly second wave of infections would follow. In their opinion, only some combination of public health measures has the potential to be effective – measures such as home isolation (and home quarantine for those with symptoms), social distancing (especially of those over age 70 because of their perceived greater susceptibility), and the closure of schools and universities.
It seems every day there are new reports and interpretations of models predicting a range of scenarios and outcomes. What are some things to keep in mind as we try to make sense of these reports?
The difference between the ICL and the Oxford studies (which came out about one week later) is a good example of how different model assumptions can lead to vastly diverging results and indicated public policy options. The Oxford study posited a different starting premise (and one that they felt was consistent with the observed data), specifically that a huge part of the population had already been infected, that the deaths we are seeing currently are the lagged result of the virus’s impact, and that most people in the population have now recovered and are immune. With a much larger “infected” population in the denominator, relative to the number of deaths, the probability of death given an infection would be much lower than people are currently thinking.
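The arithmetic behind that divergence is simple. Using hypothetical numbers (not drawn from either study), the same observed death count implies very different fatality rates depending on how large the infected population, the denominator, is assumed to be.

```python
# Hypothetical numbers only: how the assumed size of the infected
# population changes the implied fatality rate for the same death count.

deaths = 10_000
infected_if_only_confirmed = 500_000      # ICL-style narrow denominator
infected_if_widespread = 25_000_000       # Oxford-style broad denominator

fatality_rate_narrow = deaths / infected_if_only_confirmed   # 2.0%
fatality_rate_broad = deaths / infected_if_widespread        # 0.04%

print(f"Narrow denominator: {fatality_rate_narrow:.2%}")
print(f"Broad denominator:  {fatality_rate_broad:.2%}")
```

A fifty-fold difference in the assumed denominator produces a fifty-fold difference in the implied lethality – with no disagreement at all about the observed deaths.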
As we collect more data, and as we test more widely, many of the rates that are critical to proper analysis and prediction will be valued more accurately and realistically. As time goes on, I suspect we will begin to see the various models and their indications converging and becoming more consistent with one another.
Also, we need to keep in mind that, in addition to starting with different assumptions, different models may have different purposes, or even different perspectives. Even models that appear to be producing very different results may actually be consistent with each other.
Why are some of the models so different?
Three factors: data used, assumptions made, and model purposes. The context of COVID-19 means that the nature, quality, and completeness of the data is not what we would like. Different assumptions, for example regarding the values of parameters, can lead to very different results. And different models can have different purposes – and thus, while the results are different, they may not necessarily be inconsistent with each other.
What else is important to understand as things develop over the coming weeks and months?
Until more is known and understood about COVID-19, and more and better data are collected, it is difficult to assess the relative reliability of different models. The technical modeling process itself is well-established. It is mostly the assumptions, the estimates of parameter values, and the purpose to which a model is directed that produce different results amongst models. Those results differ significantly in the case of COVID-19 because of the data issues and the lack of full understanding of the characteristics and behavior of the virus.