Sam Scarpino on Modeling Disease Transmission & Interventions

Episode Notes

“We should not have a strategy that involves killing a sizable percentage of the population. But, even if you were going to get over that ethical hurdle, [herd immunity for Covid-19] still isn't going to work.”
- Sam Scarpino

For this special mini-series covering the Covid-19 pandemic, we will bring you into conversation with the scientists studying the bigger picture of this crisis, so you can learn their cutting-edge approaches and what sense they make of our evolving global situation.

This week we speak with Samuel V. Scarpino, who earned his PhD at UT Austin before becoming an Omidyar Fellow at The Santa Fe Institute, and now an Assistant Professor in the Network Science Institute at Northeastern University. In this episode, we glance off the surface of his extensive epidemiological research to discuss the complexity of interacting biological and behavioral contagions, analyzing Chinese mobility data to evaluate pandemic interventions, and the problem of unequal data collection due to socioeconomic inequality.

Note that this episode was recorded on March 20th and we’d like to issue a blanket disclaimer that our understanding of the novel coronavirus pandemic evolves by the hour. We believe this information to be up to date at the time of publication but the findings discussed in this episode could soon be refined by more research.

Sam’s Website & Twitter Page.

Read the papers we discuss in this episode at Sam’s Google Scholar Page.

Visit our website for more information or to support our science and communication efforts.

Join our Facebook discussion group to meet like minds and talk about each episode.

Podcast Theme Music by Mitch Mignano.

Episode Transcription

Michael: Sam Scarpino. Thank you for joining us.

Sam: Thanks for having me.

Michael: Yeah. Circumstances, less than great. But then again, this is a moment where folks with your expertise get to rise to the occasion. I interviewed your coauthor, Laurent Hébert-Dufresne…however you say French-Canadian names…

Sam: That's a great pronunciation.

Michael: …earlier this week, but the recording is lost in ZOOM Cloud processing limbo. So I do want to just briefly cover a couple of topics that I covered with him, knowing that it may be a while before we actually get to put those out on the feed. And one of those is the paper you wrote, with him and Jean-Gabriel Young, “Macroscopic patterns of interacting contagions are indistinguishable from social reinforcement.”

Sam: Yes.

Michael: So I think this is a really key point for people to get. I think it's good to lead with this one. Can you talk a little bit about the thoughts behind this particular study, what drove you to do this, and how the results that you came to differ from traditional thinking about this?

Sam: Absolutely. The reason that we started thinking about this project is that for years, probably at least a decade or more now, there's been an observation that the mathematics behind how we think certain kinds of social contagions spread... so you could think of a meme or a hashtag on Twitter or anything that we think is spreading like a contagion but is not a biological pathogen... that the mathematics of how some of those move, from person to person, look very similar to how some biological pathogens may be interacting with each other as they spread through a population of hosts.

In particular, you get an increase in the importance of the nonlinear dynamics associated with the disease spread, similar to what happens when you have certain kinds of social contagions. And this led us to try to answer the question, is the similarity kind of superficial? Doesn't mean that it's not interesting. Or is it something that actually is indicative of a much deeper relationship between the way in which we study social contagion and the way in which we study biological contagion?

Michael: So if I'm reading this paper correctly, one take-home is that beliefs that engender specific kinds of behaviors act as a sort of co-infection with actual biological pathogens. And that certain ideas spread in a way that primes a population for an epidemic. I mean that's a piece of it, right?

Sam: It is a piece of it. Well, there's sort of two main conclusions of the paper. The first is, the answer to the question that we were posing is yes. There is a deep relationship. In particular, you can show mathematically that these models have interacting pathogens. So two viruses spreading through the population can be mapped onto a model of social contagion. So that was the answer to the scientific question we were interested in.

To answer your question, what this means is that there are lots of ways in which two social contagions could interact with each other. Two biological contagions could interact with each other. And, or, as you were pointing out, a social contagion could be interacting with a biological contagion. A couple of examples of that from the COVID outbreak: one of the first cases of COVID that we detected in the United States was in Boston. And this was a college student who had come back recently from China and had pretty mild symptoms—mild enough that, almost certainly, we would not have picked up on this case, except for the fact that everybody was hyper-attuned to this new emerging disease outbreak. And as a result, we are identifying earlier cases.

And so this, of course, identifying cases... then these individuals can self-isolate, as we're doing now, and try to prevent ongoing transmission. And so not only does the social contagion around fear or concern around COVID… not only does it increase maybe the chance that we detect some of these cases, so it affects our datasets, which affects our public health understanding. It actually can really feed back on the transmission process, because once we identify that individual, then they self-isolate until the symptoms clear, and it prevents ongoing transmission.

Michael: So I'll leave deeper discussion of that paper for our eventual Laurent episode.

Sam: Yes.

Michael: But, just want to make a note that in your discussion, you say, "Interacting simple contagions are mathematically equivalent to complex contagions if we assume well-mixed populations." And that seems like a segue into another piece. This is a medRxiv preprint on “The effect of human mobility and control measures on the COVID-19 epidemic in China,” where you're using real-time mobility data from Wuhan and other detailed case data, including travel history. You're looking at how social isolation measures were able to interrupt this there. And maybe look forward with that insight into how comparable measures might work in other places. Can you talk about this one a little, please?

Sam: Yeah. Absolutely. To your point about mass action mixing... so what we mean by that is that individuals mix randomly following kind of an ideal gas law for how they interact. It's clearly not a realistic assumption. And in fact, it's interesting that we have that in our paper, because one of the things that we spend a lot of time working on, both in my group and in collaboration with Laurent and others, is really relaxing those assumptions. So this is kind of unusual territory for us to be in. However, what we were able to show is that in some limited network cases you get the same results. One of the things that happens with networks—and this is why we were interested in the mobility for COVID—is that lots of different kinds of social network structures can also drive the sorts of nonlinear dynamics that we see in these interacting pathogens.

So, if you have an arbitrarily complex social network, you can get arbitrarily anything with respect to the dynamics. And so, we need to place some constraints on what we're studying in order to get a clean solution mathematically. So that then takes us to thinking about the mobility stuff for COVID19, which is, as I just mentioned, we know that social networks, mobility networks, that's what's moving around these pathogens. And there were a number of things that were unprecedented with respect to the initial Chinese response. Wuhan is over 10 million people. It's one of the largest cities in the world. Cordoning off a city of that size is completely unprecedented in the history of humanity. However, there were also a bunch of other measures, some of which we're seeing now in the U.S. and South Korea and Italy, which is the shelter-in-place, reduced social gatherings, remote work, those kinds of things.

And so we wanted to understand, what was the relative importance of those different measures? And also incorporate other things that we're all grappling with, like the availability of molecular testing, etc. So that was really the focus of this study, is given the importance we know exists for social networks and mobility and all of these things that happen in China, what can we learn about what works? And what the complications were?
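
Editor's note: the "mass action" or well-mixed assumption Sam describes above can be made concrete with a textbook SIR model, in which new infections are proportional to the product of the susceptible and infectious fractions. The sketch below is only an illustration and is not the model from either paper discussed here. The function name `simulate_sir`, the parameter values, and the idea of representing reduced mobility as a halved contact rate `beta` are all assumptions chosen for the example, not values fitted to COVID-19.

```python
def simulate_sir(beta, gamma, s0, i0, days, dt=0.1):
    """Forward-Euler integration of a mass-action SIR model on population fractions.

    beta  : transmission rate per day (hypothetical value, not fitted to data)
    gamma : recovery rate per day (hypothetical value, not fitted to data)
    """
    s, i, r = s0, i0, 1.0 - s0 - i0
    history = [(0.0, s, i, r)]
    steps = int(round(days / dt))
    for step in range(1, steps + 1):
        # Mass-action term: every susceptible is equally likely to meet
        # every infectious individual, like molecules in an ideal gas.
        new_infections = beta * s * i * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((step * dt, s, i, r))
    return history


# Illustrative comparison: the same outbreak with and without a halved
# contact rate (a crude stand-in for reduced mobility and gatherings).
baseline = simulate_sir(beta=0.5, gamma=0.2, s0=0.999, i0=0.001, days=200)
reduced = simulate_sir(beta=0.25, gamma=0.2, s0=0.999, i0=0.001, days=200)

peak_baseline = max(i for _, _, i, _ in baseline)
peak_reduced = max(i for _, _, i, _ in reduced)
print(f"peak infectious fraction, baseline: {peak_baseline:.3f}")
print(f"peak infectious fraction, reduced contact: {peak_reduced:.3f}")
```

A model like this has no network structure at all, which is exactly the limitation Sam points to: real contact and mobility networks can produce dynamics that a single well-mixed population cannot.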

Michael: In Figure 4 in this paper, which talks about shifting age and sex distributions over time, it looks as though you see a demographic shift in age cohorts and in male-female sex ratios during the course of this intervention. What is going on with that?

Sam: That's right. So we think that the situation there is that differences in the early part of the outbreak in Wuhan, which we still don't fully understand, caused there to be a biased sex ratio in terms of the number of cases. So more men than women. And for a respiratory pathogen, with approximately 50% male, 50% female in the population, we would not necessarily expect there to be a statistically significant bias.

And then also with the age... we see it in Wuhan... you have a different age distribution than in the rest of China. And with the mobility, what we're able to see is you actually get the kind of shifting of these age distributions in such a way that it is strongly indicative of the importance of the mobility in moving the pathogen around. So we took the fact that you have this unusual age distribution to then try and assess the role of mobility. Or at least provide an additional line of evidence that mobility was really key for moving this pathogen around.

Michael: If I'm reading this correctly, you conclude that their combination of interventions was successful.

Sam: That's right. Wuhan has basically had multiple days strung together of almost no cases and all over China, excuse me, all over China they're coming back to work. You know, manufacturing is coming back online and so it really does seem to have worked. Similar measures worked in South Korea, although they didn't end up implementing the strict mobility cordons that we saw in China. Of course South Korea caught it a little bit earlier, which makes it easier to control with less severe measures.

Michael: Yeah, you end this by making an important point, which is, “More analyses will be required to determine how to optimally balance expected positive effects on health with negative impacts on individual liberties, the economy, and society at large.” When I asked people on social media what kind of questions they had for this mini-series, this was one of the big questions: how can we possibly begin to understand something as complex and multidimensional as, “Where do we draw the line?” Questions of who's holding the knife come up a lot here. I mean, even our manager of communications, Jenna Marshall, was curious to know if you have any thoughts on the longer term implications of the virus and its containment.

Sam: It's a great question. I had a conversation this morning with a journalist who had very similar thoughts and concerns around, “How do we balance the high value that we place on privacy with the need to respond effectively to this outbreak?” And that's something that we're going to have to have a conversation about very quickly as a society. And I do think that there's a role here for technology because there should be a way in which we can get some of the critical information that we need to respond without necessarily the wholesale… violation is probably too strong a word, but… the wholesale sacrifice of some of our individual privacy. My hope is that for example, we would be interested in finding out the fraction of people that have symptoms, and how many social contacts they might've had in a particular window of time around them getting symptoms.

Those are the kinds of things that we could have, even if it's collected at the individual level, reported out in aggregate such that you're protecting privacy, and would still get most of the benefits from capturing that information. So I do really think this is a situation where we're going to have to balance carefully the costs and benefits. And I think as you were sort of alluding to, there are of course some longer-term consequences. As soon as you give up a little bit of ground on privacy, it can be hard to get that ground back. And so it's not just that we're looking at this outbreak, it's that we're looking at what kind of precedents, legal and otherwise, this sets for privacy concerns going forward.
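
Editor's note: as a rough illustration of "collected at the individual level, reported out in aggregate," the sketch below takes per-person records and publishes only regional summaries of the two quantities Sam mentions: the fraction of people with symptoms and the number of contacts of symptomatic people. Everything in it is hypothetical: the record fields, the function name `aggregate_reports`, and the minimum cell size of 5. Hiding small groups is a common disclosure-control practice that is assumed here for the example. It is not something described in the interview, and on its own it is not a formal privacy guarantee.

```python
from collections import defaultdict


def aggregate_reports(records, min_cell_size=5):
    """Summarize individual records per region, hiding regions with few people.

    records       : iterable of dicts with hypothetical keys
                    "region", "symptomatic" (bool), "contacts" (int)
    min_cell_size : regions with fewer people than this are suppressed (None)
    """
    totals = defaultdict(lambda: {"people": 0, "symptomatic": 0, "contacts": 0})
    for rec in records:
        cell = totals[rec["region"]]
        cell["people"] += 1
        if rec["symptomatic"]:
            cell["symptomatic"] += 1
            cell["contacts"] += rec["contacts"]

    summary = {}
    for region, cell in totals.items():
        if cell["people"] < min_cell_size:
            summary[region] = None  # too few people to report safely
            continue
        n_sym = cell["symptomatic"]
        summary[region] = {
            "symptomatic_fraction": n_sym / cell["people"],
            "mean_contacts_of_symptomatic": cell["contacts"] / n_sym if n_sym else None,
        }
    return summary


# Made-up records for two regions; no real data is used here.
records = (
    [{"region": "A", "symptomatic": True, "contacts": 4},
     {"region": "A", "symptomatic": True, "contacts": 6}]
    + [{"region": "A", "symptomatic": False, "contacts": 3} for _ in range(4)]
    + [{"region": "B", "symptomatic": True, "contacts": 2},
       {"region": "B", "symptomatic": False, "contacts": 1}]
)

summary = aggregate_reports(records)
print(summary)
```

Only the per-region summaries leave the function; the individual rows never appear in the output, and region "B" is withheld because it contains too few people.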

Michael: There was a New York Times article recently on Kinsa Smart Thermometers. They were talking about how they've been using these internet connected thermometers to predict the spread of the flu and that they're now tracking the coronavirus or they believe they are tracking the coronavirus in real time.

Millions of people have these smart thermometers. This is one of these sort of interesting situations that I think dovetails nicely into another paper that you lead authored, up on arXiv right now, “Socioeconomic bias and influenza surveillance,” and how exactly we're currently deploying different datasets and how we might start thinking about coming up with sort of more even-handed and humane ways of tracking these things. This piece was about influenza specifically, but could you talk a little bit about how the U.S. is currently tracking epidemics, what kind of data sets they're using and what kind of biases in that data you found in this study?

Sam: Yeah, that's a great point and it's something that I'm really interested in and I think has also really got to be top of mind for us with respect to COVID19. Some of us are fortunate enough to have jobs that will, at least seemingly, continue through the epidemic, so that we have enough money to buy food and keep our families fed and healthy. Large percentages of the population don't.

I think the number is something like 40 - 50% of the U.S. doesn't have over $400 in emergency money that they can tap into for any reason. And so, one of the things we need to be very cautious about and thoughtful about with respect to our COVID response, is these individuals who are in either at risk or marginalized communities. You close the schools… well, schools also are an important source of lunch and before and after work care for large percentages of the population. And so making sure that we can continue to provide these kinds of services is really going to be critical.

So what we show in this paper that you mentioned builds on one of the longest-running results in epidemiology, which is that individuals who are in lower socioeconomic groups have higher health burden. And oftentimes that's because they are forced to live in more environmentally marginal parts of our communities, they have lower nutrition. More recently, especially in countries like the U.S. with our health care system, they don't have access to the same kinds of health care, which causes an increase in health burden. That's known.

So we show that quite clearly in the state of Texas: three times the population-controlled number of hospitalizations due to influenza in the lowest 25% of individuals by income. However, what we also show is that our ability to forecast the hospital demand in those most at-risk populations, keeping in mind they have the largest burden of hospitalization, is much less accurate than in the individuals in the upper three-fourths of the income distribution. And that is precisely because all of the data that we have, for the most part, comes from healthcare systems. And these individuals are sicker because they don't all have access to the healthcare system.

So that's why we refer to this as kind of a blind spot, which is that the data that we have is biased against the individuals who are most at risk precisely because they don't have access to the healthcare system. And of course this also, at least in our minds, fits into this broader narrative around biases when it comes to data-driven decision making, algorithmic decision making, machine learning, artificial intelligence, that we don't have any data on these populations and as a result it is essentially impossible to generate actionable forecasts on their demand.

Michael: You make a point here that it's important to improve the timeliness, the accuracy, situational awareness forecasting, but it's sort of a garbage-in-garbage-out scenario unless you actually are getting a complete, or a reasonably complete, dataset.

I remember one of the paper’s co-authors, Lauren Ancel Meyers, an SFI External Professor, when she gave her Ulam Lecture last year on “Preventing the Next Pandemic,” she talked about some of these new methods of data collection. I mean it's not super new and in fact it's been canceled, but Google Flu Trends correlating search data, people looking up Google searches for signs, symptoms, and treatment. What are some of the other new ways of harvesting large datasets that you think might help patch up this sort of under-reported quartile of the population?

Sam: Absolutely. That's something we looked at a little bit in the paper. We show that Google Flu Trends does not ameliorate any of the data bias. And one of the things, again, that's sort of a blind spot in our minds is we think of everybody having access to a high speed internet and smartphones. Even though a large percentage of the population does, there's still a sizable percentage that does not, and not surprisingly, that overlaps with the percentage of the population that are most at risk for influenza, which are these individuals in lower socioeconomic groups and the elderly. And so we think that's a reason that there's a gap there.

In this case it's really tough because if they don't have access to any of these healthcare providers systems, how do you actually get information on what's going on in these populations, is something that lots of state and governments are constantly wrestling with. Maybe I'm not going to have a good answer because I think the tech solutions that we're all excited about and can provide lots of really valuable information about other parts of the population are still largely going to be blind to these individuals.

Michael: We've got a question from social media about what data points you personally are tracking. What do you find to be the most salient metrics to stay on top of right now in updating your understanding and your modeling of this outbreak?

Sam: For me, I think the most important data points to track are things like ventilator demand, number of individuals in the ICUs, those pieces, because one, that's what we're watching very carefully in terms of any risks for hospitals being overwhelmed, but also because they're going to be much less sensitive to reporting issues. They're going to be biased in their own way, but the test rates don't affect... well, they do affect it, sort of, but they don't really affect the number of people that need ventilators. If you need a ventilator, you need a ventilator, and even if you haven't gotten the COVID test yet and you need a ventilator, you still need the ventilator. So those kinds of information, they both tell us what the risks are for how close we're getting to a hospital reaching capacity. But they also help us benchmark the test positivity rates, the case reports, all the other pieces of information that we know are biased because of under-reporting. We start to leverage these different data sets against each other to get a more complete picture of how many cases are out there.

Michael: When I spoke to Andy Dobson earlier this week, he made a good point about how herd immunity requires the majority of the population to develop an immunity. And right now China reported, what was it, 100,000 confirmed cases or something like that? It was so extremely below the number of people that would have to have contracted it for the population to have developed herd immunity. And so it looks like we're sort of lurching back into business-as-usual or at least the first few countries to be infected are… and Andy was kind of convinced that this is going to become an endemic infection, that it's something that we're going to be dealing with for a long time. I saw that MIT Technology Review put up a thing about modeling intermittent social distancing over the next 18 plus months. It seems like a lot of people are expecting us to be able to get back to normal, but normal is no longer in view anymore.

Sam: Well, I think we're probably going to be in a situation where normal is going to be very different certainly for a period of time into the future. The point around how we're going to deal with herd immunity is a really complex one because as you pointed out, China didn't get anywhere near herd immunity. Obviously they put in these fairly incredible measures, and I'm not saying incredible in terms of whether they're good or bad, I'm saying incredible in terms of that they’re totally unprecedented. But still, maybe somewhere in the range of a half a percent to 2 or 3% of the population got infected. That's not even anywhere near close to the herd immunity threshold. The other piece of it is we don't know about how long lived the immunity is for this pathogen. So there's good evidence that individuals will be immune for a period of time.

We don't know what that period of time is. And so the herd immunity also presupposes that individuals will be immune for a year or two years. They might not be. The other is, that also assumes that people are not going to take measures into their own hands. So we can see all over the United States that people were reducing social contacts well before any mandatory orders were put into place because the social contagions are spreading as well as the biological contagion, and we're starting to change our behavior. And so what you're really risking with that herd immunity strategy is really just killing a bunch of people and not actually getting the outcome you want. And so I really think it's not really an ethical question, for example, because it was clear the UK had already made that ethical decision. I think the real question is, it's very damn unlikely to work.

And so that to me is the problem. I mean, actually the real problem is the ethical problem. We should not have a strategy that involves killing a sizable percentage of the population. But, even if you were going to get over that ethical hurdle, it still isn't going to work. And I think that's the situation that was really unfortunate about those conversations, although I think it's very clear that the scientific community, especially in the UK, was able to inform the government and help them think through that scenario. I will say that, to me, one of the really bright spots, such as there can be bright spots during this outbreak, aside from penguins roaming around the Shedd Aquarium, has been the tight interface between public health, academic scientific research, the private sector, government, NGOs, everybody kind of coalescing around this initiative: sharing data, sharing resources, building models, helping to inform public policy, supporting each other. That has been really exciting to see and I think is a big part of the reason why we're starting to move in the right direction with respect to the response.
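
Editor's note: to give a sense of the gap Sam describes between the share of people infected and the herd immunity threshold, the sketch below uses the standard textbook approximation for a well-mixed population with lasting immunity, where the threshold is 1 - 1/R0. The R0 values are illustrative assumptions and are not taken from the interview; the 3% figure is the upper end of the infected range Sam mentions. As he notes, the approximation also assumes durable immunity and unchanged behavior, neither of which is guaranteed.

```python
def herd_immunity_threshold(r0):
    """Fraction of a well-mixed population that must be immune so that each
    case causes fewer than one new case on average (textbook approximation).
    """
    if r0 <= 1.0:
        return 0.0  # an outbreak with R0 <= 1 dies out without herd immunity
    return 1.0 - 1.0 / r0


# Upper end of the infected share Sam mentions ("2 or 3%").
infected_share_upper = 0.03

# Hypothetical R0 values, chosen only to illustrate the scale of the gap.
for r0 in (2.0, 2.5, 3.0):
    threshold = herd_immunity_threshold(r0)
    gap = threshold - infected_share_upper
    print(f"R0={r0}: threshold {threshold:.0%}, shortfall {gap:.0%}")
```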

Michael: Yeah. Actually that, I think you sort of preempted my last question to you, which is, what is the good news? And it sounds like: collaboration, learning an effective real-time response to crisis.

Sam: Collaboration is good news. It's good news that we're taking this thing seriously. It's super important. We're ramping up the testing, and we're going to need that testing when we try to come back to normal. The strategies that are working really well in South Korea, Singapore, Hong Kong, Taiwan, and now mainland China involve this sort of high rate of testing, coupled with case isolation when you identify cases, to help bring things back to normal. And so this test volume that we have now is going to be super important for our current response, but it's also going to be incredibly important for our ability to come a little bit more back to normal a little bit earlier, because once the cases start to come back down we can engage in this kind of “test, isolate, measure” that will allow us to try to control the outbreak as best as possible.

Michael: Awesome. Thanks Sam. I look forward to having you back on the show when we're no longer on a war footing.

Sam: Absolutely. Me too. Thanks so much for this. And have a great day.