Of Apples and Oranges in a Pandemic

Written by Michael T. LeVasseur, PhD, MPH, a visiting assistant professor in the Department of Epidemiology and Biostatistics at Drexel University’s Dornsife School of Public Health. LeVasseur has commented extensively in news stories about COVID-19.

As state and local governments discuss when to lift stay-at-home orders and other restrictions, a lot is at stake. Before decisions about quality of life (and yes, financial and political decisions) are made, here’s what we need in order to obtain true infection rates, fix inconsistent reporting and accurately understand the data needed to re-open the country safely.

1. Accurate and transparent data needs to be shared with the public.

On April 8, 2020, speaking on Good Morning America, Dr. Deborah Birx, coronavirus response coordinator for the White House Coronavirus Task Force, said: “We are concerned about the metro area of Washington and Baltimore. We are concerned right now about the Philadelphia area.”

This news came as a surprise to Philadelphia Commissioner of Public Health Dr. Tom Farley, who said: “I don’t know exactly what numbers she is looking at, and I doubt she is looking at numbers as updated as we are…but at the moment, things are looking a little bit better.”

And they were looking better. In fact, on April 8, the number of positive cases had a doubling time of seven days and the growth rate in cases was 9.6 percent (down from 44.3 percent at the beginning of the local epidemic).
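The link between a growth rate and a doubling time can be checked with a short calculation. The sketch below assumes that the quoted growth rates are daily percentage increases in cumulative cases and that growth is exponential; neither assumption is stated explicitly here, and the function name is illustrative.

```python
import math


def doubling_time_days(daily_growth_rate: float) -> float:
    """Days needed for cases to double at a constant daily growth rate.

    The rate is a fraction, so 0.096 means 9.6 percent per day (assumed unit).
    """
    if daily_growth_rate <= 0:
        raise ValueError("growth rate must be positive")
    return math.log(2) / math.log(1 + daily_growth_rate)


# Growth rates quoted in the text, read as daily rates (an assumption).
early_epidemic = doubling_time_days(0.443)
april_8 = doubling_time_days(0.096)

print(f"Early epidemic: doubling in about {early_epidemic:.1f} days")
print(f"April 8: doubling in about {april_8:.1f} days")
```

Under these assumptions, a 9.6 percent daily growth rate corresponds to cases doubling in roughly a week, compared with about two days at 44.3 percent.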

As it turns out, Birx was looking at the percent of positive COVID-19 tests, which just days before was 32.7 percent. This is an alarming positivity rate considering that other locations had a much lower proportion of positive cases. How can it be that Philadelphia had such a high rate of positive tests, but such a low growth rate?

It all comes down to selection bias. In short, selection bias occurs when a sample is drawn from a population that has a different chance of having the outcome than the target population. In this case, the target population is all residents of the city of Philadelphia. But what population was the sample actually drawn from? Considering the limited testing available in Philadelphia relative to other areas, the answer is: individuals at high risk of contracting COVID-19.

In the end, the positivity rate in Philadelphia was higher simply because we were testing people who were more likely to be positive. Therefore, Birx wasn’t making a fair “apples-to-apples” comparison between Philadelphia’s rate and that of other cities. However, considering the limitations in testing capacity and inconsistencies in data collection and reporting across jurisdictions, she was working with the best information that she had.
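To see how much testing priorities alone can move the positivity rate, consider a toy city. Every number below is hypothetical and chosen only for easy arithmetic; the group names and the `positivity` helper are illustrative, not drawn from any health department's data.

```python
from dataclasses import dataclass


@dataclass
class Group:
    name: str
    population: int
    infected: int


def positivity(tests_by_group: dict, groups: list) -> float:
    """Expected share of positive tests when people are tested at random
    within each group, given how many tests each group receives."""
    expected_positives = 0.0
    total_tests = 0
    for group in groups:
        n_tests = tests_by_group.get(group.name, 0)
        expected_positives += n_tests * group.infected / group.population
        total_tests += n_tests
    return expected_positives / total_tests


# Hypothetical city of 1,000,000 people, 5 percent of whom are infected.
groups = [
    Group("high_risk", population=100_000, infected=30_000),
    Group("low_risk", population=900_000, infected=20_000),
]
true_prevalence = sum(g.infected for g in groups) / sum(g.population for g in groups)

# Same 10,000 tests, two different testing priorities.
targeted = positivity({"high_risk": 10_000}, groups)
representative = positivity({"high_risk": 1_000, "low_risk": 9_000}, groups)

print(f"True prevalence:           {true_prevalence:.1%}")
print(f"Positivity, high-risk only: {targeted:.1%}")
print(f"Positivity, representative: {representative:.1%}")
```

The same city and the same number of tests yield a positivity rate of either 30 percent or 5 percent, depending only on who is tested.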

2. Increase capacity for testing.

Shortages in testing kits and reagents have forced states and municipalities to make difficult decisions about who gets tested. In Philadelphia, for example, the Department of Health has recommended testing only those over the age of 50, people who live in a group setting (such as prisons or nursing homes), health care workers exposed to someone who tested positive, or individuals who have symptoms and have been in contact with a known case and/or those who are hospitalized with symptoms.

These priorities reflect the outbreak that Philadelphia is experiencing, where, according to an April 20 Philadelphia Department of Public Health press release, 52 percent of those who have succumbed to the virus are residents of long-term care facilities.

To better understand the meaning of data across cities, let’s look at epidemic and demographic characteristics, timeline of cases reported and testing in eight U.S. cities: Philadelphia, Baltimore, Washington, D.C., New York City, Los Angeles, Chicago, Miami and New Orleans.

  • New Orleans tests at higher rates than many other U.S. cities, conducting more tests per capita (4,594 tests per 100,000 people) than New York City (2,826 per 100,000 people).
  • Philadelphia, Washington, D.C., Los Angeles, Chicago, and Miami are all conducting tests in lower numbers — around 1,500 per 100,000 people.
  • Baltimore lags with 516 tests per 100,000 people.

Despite these vast differences in testing rates and testing priorities, Philadelphia, Baltimore and New Orleans actually all have similar rates of positive cases (28.1-31.5 percent).

Does this mean that these three cities are experiencing similar rates of infection? Probably not, but differences in testing rates and testing priorities prevent us from comparing cities to one another. 

3. States should report their testing consistently.

Local health departments must decide how to uniformly report the following information:

  • The number of tests conducted
  • Who is getting tested
  • When people are being tested (i.e. how soon after symptom onset)
  • The lag between when a test was conducted and when the results came in
  • Which date is being reported — the date the test came back positive or the date the test was conducted
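One way to make these items concrete is a shared record format in which each date is stored separately and the lags are computed rather than implied. The sketch below is only an illustration: the class name, field names, and example values are all hypothetical and do not represent any health department's actual schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class CovidTestRecord:
    """One test, with the reporting items listed above kept as separate fields."""
    testing_criterion: str         # who is getting tested, e.g. "hospitalized"
    symptom_onset: Optional[date]  # None if asymptomatic or unknown
    test_conducted: date           # date the test was conducted
    result_reported: date          # date the result came back
    positive: bool

    def days_from_onset_to_test(self) -> Optional[int]:
        """How soon after symptom onset the person was tested."""
        if self.symptom_onset is None:
            return None
        return (self.test_conducted - self.symptom_onset).days

    def reporting_lag_days(self) -> int:
        """Lag between when the test was conducted and when the result came in."""
        return (self.result_reported - self.test_conducted).days


# A made-up record for illustration only.
record = CovidTestRecord(
    testing_criterion="health_care_worker",
    symptom_onset=date(2020, 4, 3),
    test_conducted=date(2020, 4, 6),
    result_reported=date(2020, 4, 10),
    positive=True,
)
print(record.days_from_onset_to_test(), record.reporting_lag_days())
```

With both dates stored explicitly, any jurisdiction's case counts can be tabulated by test date or by result date, so that two reports can be put on the same footing before they are compared.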

Without this, epidemiologists, public officials and all citizens are “flying blind,” without a way to make any determinations of where we stand along the epidemic curve, when it’s going to be okay to open our communities back up to business and what areas need increased public health surveillance.

Data scientists who provide policy makers with predictive models of this epidemic must make substantial assumptions about the data, which can considerably bias their estimates. Reliable forecasting requires reliable and consistently reported data. This is known as the “garbage in, garbage out” principle of predictive modeling: poor-quality data inputs result in poor prediction outputs.

The basic reproductive rate, R0, which is the average number of people to whom one person will spread the virus, is usually estimated using any of a variety of complex equations, but in principle it is simply the product of three terms: the contact rate (c), the per-contact probability of infection (p), and the duration of infectiousness (d).

While we do not know the true value of any of these variables, the per-contact probability of infection and the duration of infectiousness (the p*d portion of the equation for R0) likely do not vary across locations.

The contact rate, however, does.
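The product form of R0 described above can be written out directly. In the sketch below, the values of c, p, and d are placeholders chosen for easy arithmetic, not estimates for COVID-19 or for any real city; they only show how a difference in contact rate carries straight through to R0 when p and d are held fixed.

```python
def basic_reproductive_rate(contact_rate: float,
                            p_infection: float,
                            duration_days: float) -> float:
    """R0 = c * p * d, with c in contacts per day and d in days."""
    return contact_rate * p_infection * duration_days


# Placeholder values, assumed identical in both locations.
P_INFECTION = 0.02   # per-contact probability of infection (hypothetical)
DURATION_DAYS = 10   # duration of infectiousness in days (hypothetical)

# Only the contact rate differs between the two hypothetical locations.
r0_dense = basic_reproductive_rate(15, P_INFECTION, DURATION_DAYS)
r0_sparse = basic_reproductive_rate(5, P_INFECTION, DURATION_DAYS)

print(f"Dense location:  R0 = {r0_dense:.1f}")
print(f"Sparse location: R0 = {r0_sparse:.1f}")
```

Because R0 is a simple product, tripling the contact rate triples R0 when the other two terms are unchanged.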

Consider that New York City has a population density of 27,751 people per square mile. This means that each individual in New York City has far more contact with other people than someone who lives in New Orleans with a population density of 2,029 people per square mile. This alone means that any epidemic in New York City is going to have a growth rate that is much higher than elsewhere.

Despite this, New York City and New Orleans have approximately the same number of cases per 100,000 residents (1,409 vs 1,445, respectively). However, remember, New Orleans is testing its population at a higher rate than New York City, and New Orleans’ testing guidelines are based on providers’ discretion, whereas New York City is testing only those who are hospitalized with symptoms.
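Dividing the per-capita case counts by the per-capita testing rates quoted earlier shows how differently the two cities arrive at similar case numbers. This is a rough sketch that assumes both figures count people over the same period; it is not stated here whether the testing figures count people or individual tests.

```python
# Figures per 100,000 residents, as quoted in the text.
cities = {
    "New York City": {"cases_per_100k": 1409, "tests_per_100k": 2826},
    "New Orleans": {"cases_per_100k": 1445, "tests_per_100k": 4594},
}


def percent_positive(cases_per_100k: float, tests_per_100k: float) -> float:
    """Share of tests that were positive, as a percentage."""
    return 100 * cases_per_100k / tests_per_100k


positivity_by_city = {
    name: percent_positive(d["cases_per_100k"], d["tests_per_100k"])
    for name, d in cities.items()
}

for name, pct in positivity_by_city.items():
    print(f"{name}: {pct:.1f} percent positive")
```

Under that assumption, about half of New York City's tests are positive compared with just under a third in New Orleans, even though the two cities report nearly identical cases per capita.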

Does this mean that the coronavirus is spreading faster in New Orleans than in New York City?

No. It is possible that the epidemic started earlier in New Orleans than it did in New York City and therefore spread further prior to being detected. It is equally possible that New Orleans is capable of identifying more cases than New York City and truly has a lower number of cases per capita. In short, it’s difficult to ascertain the true reproductive rate because the entire population of these cities is not being tested. 

As communities begin to look for benchmarks for when to open back up, these questions need to be addressed. Lives and the short- and long-term success of the economy are at stake. Any comparison between localities is going to be heavily biased by testing priorities, differences in reporting and the demographic makeup of each region. Without addressing these incompatibilities, we risk undoing all the work we’ve done to mitigate the spread of this virus.

This is why we cannot begin to lift stay-at-home orders and other restrictions until we have the testing capabilities that will allow us to answer these questions:

  • How many people have this virus?
  • How many people have recovered from this virus?
  • How infectious is the virus right now?

Until then, we’re just making educated guesses because relying on outcomes from other countries, states or cities is essentially a comparison between apples and oranges.

Data collected April 14, 2020
*Number of tests estimated based on total tests conducted in the state.


Note: Data on testing priorities came either from each city’s own department of health, or from the state department of health in cases where a city deferred to the state’s guidelines. Data on the number of positive cases and the number of people tested come from the city or state department of health and were collected on April 14, 2020. Exact numbers of tests conducted for Baltimore, Los Angeles, and Chicago were not available and were estimated as a function of the total number of tests conducted in the state, the number of cases in the city, and the number of cases in the state. These estimates were adjusted using a bias correction factor of 0.615, reflecting the fact that this technique overestimated the number of tests conducted within cities that did report the number of tests conducted. Important dates related to the outbreak are based on news and government reports.
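The note names the inputs to the estimate but not its exact form. One plausible reading, offered here only as an assumption, is that state tests were allocated to the city in proportion to its share of state cases and then multiplied by the correction factor. The function name and the example inputs below are hypothetical; only the 0.615 factor comes from the note.

```python
BIAS_CORRECTION = 0.615  # correction factor given in the note


def estimate_city_tests(state_tests: int,
                        city_cases: int,
                        state_cases: int,
                        correction: float = BIAS_CORRECTION) -> float:
    """Estimate tests conducted in a city from state-level totals.

    Assumed form: state tests * (city cases / state cases) * correction.
    The proportional allocation is an assumption, not the documented method.
    """
    if state_cases <= 0:
        raise ValueError("state_cases must be positive")
    return state_tests * (city_cases / state_cases) * correction


# Made-up inputs for illustration only.
estimate = estimate_city_tests(state_tests=200_000, city_cases=5_000, state_cases=20_000)
print(f"Estimated city tests: {estimate:,.0f}")
```

In this example the city accounts for a quarter of state cases, so the uncorrected estimate is 50,000 tests, which the correction factor reduces to about 30,750.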

Michael T. LeVasseur, PhD, is a Visiting Assistant Professor in the Department of Epidemiology and Biostatistics at Drexel University’s Dornsife School of Public Health. He thanks Leslie McClure, PhD, and Michael Donnelly for their insights in the preparation of this commentary.
