Fighting Hate Speech with AI & Social Science (with Joshua Garland, Mirta Galesic, and Keyan Ghazi-Zahedi)

Episode Notes

The magnitude of interlocking “wicked problems” we humans face today is daunting…and made all the worse by the widening schisms in our public discourse, the growing prominence of hate speech and prejudicial violence. How can we collaborate at scale if it’s not even safe to act as citizens, to participate in a sufficiently diverse society, without becoming targets? The World Wide Web has made it easier than ever for hate groups to organize…but also grants new power to those willing to oppose the hateful. New tactics such as “counter speech” have sprung up to depolarize society. But do they work? Can organized nonviolent interventions restore civility and save our public spaces? Or does the ensuing arms race only bring our fora closer to collapse?

Welcome to COMPLEXITY, the official podcast of the Santa Fe Institute. I’m your host, Michael Garfield, and each week we’ll bring you with us for far-ranging conversations with our worldwide network of rigorous researchers developing new frameworks to explain the deepest mysteries of the universe.

This week’s episode features three authors of new research on hate speech and counter speech — SFI Applied Complexity Fellow Joshua Garland, Professor Mirta Galesic, and External Professor Keyan Ghazi-Zahedi — who, along with co-authors Laurent Hébert Dufresne and Jean-Gabriel Young, have discovered patterns in the Twitter data that just might help save the Web. Over the next hour, we’ll discuss how they use AI to classify hate speech and counter speech, what this reveals about the hidden structure of our conversations, and how it offers hope for social media just when we need it most…

To learn more about SFI's work on counter speech, and the new CounterBalance seminar series, please visit

If you value our research and communication efforts, please consider making a donation, or joining our Applied Complexity Network. Also, we hope you’ll help this show find new listeners by rating and reviewing us at Apple Podcasts. Thank you for listening!

This Week’s Guests:

Joshua Garland

Mirta Galesic (who also appeared on this show for Episode 9)

Keyan Ghazi-Zahedi

Papers we discuss in this episode:

Countering hate on social media: Large scale classification of hate and counter speech

• As-yet-untitled follow-up paper TBA  (we will add this link as soon as it's available)

Join our Facebook discussion group to meet like minds and talk about each episode.

Podcast Theme Music by Mitch Mignano.

Follow us on social media:

Episode Transcription

MICHAEL GARFIELD: Well, Mirta, Joshua, Keyan, it's a pleasure to have you on Complexity Podcast.

ALL: Thank you for having us.

MICHAEL GARFIELD: And I know that the papers that we're going to discuss today are kind of unique to the show in two ways: one is that this is a very timely and intense area of study, a challenging topic. And then the other is that you are in the middle of the second paper. This is probably the earliest that we've even discussed a particular piece of research on the show. So, we're like volcanologists here down in the mouth of the science volcano, and it's really exciting. The right place to start, probably, is with a little bit of personal background. So, introduce yourselves and talk about how you got into science, and how you got into this particular project in whatever order y'all want to speak.

KEYAN GHAZI-ZAHEDI: So my name is Keyan. I'm a computer scientist by education. I live in Germany, and did my PhD on robotics and evolutionary algorithms. So, I tried to understand how the brain works and tried to find some principles in our deficient systems that could explain how the systems work. Then I moved to Max Planck about 12 years ago, 13 years ago, did my postdoc there, my habilitation, and was also working on understanding how the brain works. The last research question I was interested in was understanding how a body contributes to intelligence, so how much less does our brain have to think because the body is the way it is. I actually wanted to measure that, I wanted to quantify that. Living in Germany, and seeing what's happening, seeing the rise of an alt-right party, and also seeing how civil discourse is shifting and breaking down, I was wondering if we could measure what's going on. I started discussing that with Josh and Mirta at a very early stage, and I had this really small question, and wasn’t sure that it was actually something to go after. But from this small question, the three of us kind of generated this project. Is that helpful?

MICHAEL GARFIELD: Yeah. Yeah. What about you, Josh?

JOSHUA GARLAND: I'm currently an Applied Complexity Fellow at the Santa Fe Institute. Previously, I was an Omidyar Fellow at the Santa Fe Institute. I grew up as a mathematician and then slowly…I guess I started out very theoretical with most of my work, and then I slowly and progressively moved more and more towards applied things, mostly because, as I was doing theoretical research or theoretical math, I saw that I really enjoyed it. But I also saw the world, in my eyes, sort of falling down around me, and as a mathematician, I was really curious about how I could contribute to helping on these problems. So, for example, I worked a long time in cardiac electrophysiology. I worked for quite some time on climate research, and studying the paleoclimate. And now I've primarily switched to working in hate speech.

I take the mathematics and the time series analysis and machine learning tools that I'm comfortable with and try to utilize those, coupled with experts in the field, to understand and try to have an impact on real, societal problems. This project is, in some ways, no different than those projects for me; I see a huge problem right now in civil discourse, and a transition from civil discourse to really polarized, hateful discourse, and I'm curious as a mathematician, how can I contribute to real societal change in that regard? So, I think in terms of how we got into this project, I've seen this big shift in civil discourse in the last two or three years towards being much more hateful and much more polarized. And, there doesn't seem to be a real social theory about how you interact with these people. And so, you know, if you are an immigrant journalist and every single time you post a story, you're constantly being attacked by the alt-right who are saying, “You have no business here, you shouldn't be reporting this,” or “They're doxing your children,” or any number of things that these people say. Then what's an appropriate response? How do you come back at this person? Do you even come back at this person? And there's really no good social guidance for that.

And so, you know, I would see people around me that were being attacked in this way or female scientists, for example, who are hearing, “You have no business in science. You shouldn't be a scientist as a woman,” and they have no idea how to respond. So, for me, that's an interesting question: what's an appropriate response when you're being cyber-bullied or when you're receiving hate speech online? Do you respond? Do you try to get your friends to help protect you? Do you just block the person? Do you ban the person? What's the appropriate response? And, you know, Keyan and I have talked about this several different times, for a variety of reasons.

Recently, there’s been an alt-right party that's getting quite a bit of power in Germany, called the Alternative für Deutschland, or the AfD. And alongside them, there's an active hate group that promotes their message, and I think to call them anything but a hate group would be a mistake. And what was interesting is that they marked themselves. So, from a computer scientist's perspective, that's very interesting because a lot of times distinguishing “hate” from “not hate” is quite hard and subjective. Like, what's really just “right” versus “alt-right”? But the big difference here was that they actually started marking themselves as being part of this hate group, and that was very useful as a labeling set for a machine learning person. And then, Keyan and I both got really excited when a TV show host, Jan Böhmermann, announced a movement, called Reconquista Internet, that was aimed at countering this group. And we were lucky enough to have a lot of insider information from an anonymous source that gave us the IDs, and a lot of other different things about this group, so that we could start monitoring their counter speech efforts, as well.

So, from a machine learning perspective, this is fantastic, because now I actually have a data set where I can start thinking about these questions that I've been mulling over in my mind for some time, and thinking about a kind of rigorous approach to quantifying the effectiveness of counter speech. What are the effective strategies? Things like that. So for me, it was a no brainer to jump in and start helping Keyan with this project.

MICHAEL GARFIELD: Mirta, for those who didn't hear episode nine with you, how would you introduce yourself and your line into this particular study?

MIRTA GALESIC: Thank you. I’m Mirta Galesic, I’m a professor at the Santa Fe Institute. And, I got lucky to get involved with these guys studying hate and counter speech because it's in line with my longstanding interest in social norms, and how they're formed, and how they affect what we believe and how we behave. And one important way in which social norms are formed is through language, through communication, through observing what other people are saying or doing. If people are starting to endorse certain ideas that might be potentially dangerous in some way, or could be frightening to certain groups of people, while they're still only in the realm of language – and some people say, well, this is not that important – eventually, once put in the open, these ideas have the potential to convince others about them. And eventually if enough people share a certain idea, they could start acting on it.

And so, I do believe that we cannot look at language in a vacuum, and that it's not necessarily okay to just use tactics that we’ve used before when we were not so connected, such as “don't feed the trolls,” which is to not answer, and people will just go away. In this super connected world they are not going to go away, because it's easy to find like-minded people online, even if they are very rare. So, it's important to somehow react and make sure that at least many different lines of discourse are open and are talked about. Hate speech has this property of being threatening and invoking negative emotions in people, like fear and anger. So many people just tend to withdraw or leave the situation, and that's the problem, because then there is no alternative narrative. And so, to an outsider, if somebody stands by, this might signal that this is actually an acceptable way of thinking or behaving.

One way that has been considered for a long time and was implemented – it's implemented still in Europe – to counter just the worst kinds of hate speech is censorship. But of course, this is problematic in many ways: who decides, how is this implemented, how to recognize this ever-evolving language? Another promising way is actually empowering citizens to counter hate. And as Joshua and Keyan were saying, I mean, there is no established way. There is no good theory of how this should be done. We know something from the research of bullying – this is traditional research on bullying in schools, bullying in workplaces – we know what the empower tool is, we know some of what victims can do, but in the online realm, it's mostly qualitative research. And one big problem is that there is no large data set where there are labeled speakers of counter speech and hate that could be studied over time, to see how these two different kinds of speech interact with each other, and what actually is effective counter speech.

So when Keyan and Joshua came out with this huge data set, and the enormous ability to analyze this in a matter of what seems to me now, seconds, as a psychologist, I was just enchanted by this. This is the very first time in social science that we have such a huge data set spanning six years, and millions and millions of tweets running in parallel to many important societal events, terrorist attacks, political rallies, and now we can study how these two different kinds of speech interact. It's basically a gold mine for many different things we can do. And so, we did some of them now and we are still in the process of doing many others.

MICHAEL GARFIELD: Yeah, that's a great place to dive in. I want to back up a little bit and talk about the earlier efforts that have been made to try and classify hate speech and the comparably fewer efforts that have been made to classify counter speech, and to talk about how people were trying to go about this, and why you feel that that was insufficient enough to launch into this project. And then, in sort of a longer frame – this is like the subject of the entire first paper – but how did you think differently about this? How did you decide to adopt a different approach and what factors ended up being more useful than those from some of the earlier studies that have been done on this?

MIRTA GALESIC: I would say hate speech classification has been around for quite a long time. There are a lot of attempts, and there are a lot of good algorithms that are achieving good scores. The problem always with hate speech, and also a problem for us, is that language is continuously evolving, especially given that the algorithms for discovering hate speech are used to ban the hate speech, so you're basically removing the language once you discover it. And of course, other kinds of language are being developed. So, this is an ongoing battle. It’s a very interesting area that has a lot of players. The area of counter speech is less well studied. Perhaps one reason is that social media companies are hoping that they could just somehow censor hate speech, ban it, or filter it out, and this is very difficult for reasons I just mentioned. So, the idea of empowering citizens to do counter speech is actually relatively new. I mean, it has been promoted, especially in the context of cyber bullying of youth, but in the context of hate speech, it is relatively new. And so, there are several really good studies on counter speech. However, there are limitations because they are done on small data sets, you know, up to 10,000 posts, classified by hand, by students or by researchers themselves. This limits both the time span in which the speech can be analyzed, and the diversity. I think, there are good qualitative insights, there is some beginning of formal to counter speech. There is still not enough empirical data to really discern the main ways in which people can engage in counter speech, and in particular, to measure how that interacts with the hate speech. That was just outside the realm of possibility for most researchers in the area of counter speech.

JOSHUA GARLAND: Yeah, I think one of the biggest differentiators is just the labeling problem. All the data sets from the past, or a lot of the prior studies, which are fantastic studies and really got this kicked off, were really relying on hand labeling every single instance. And so, there was really no automated way of doing this. I think a big part of it is there aren't very many organized counter speech groups and certainly not organized counter speech groups that label themselves. Maybe there were a few people trying to resist hate speech in some particular way, but they're not labeling themselves, and they may be taking part in other things. You really have to go through the history of all the things they're saying and, you know, break this apart, and say “This was counter speech,” “This was not,” “This is counter speech,” “This is not,” and it's super time-consuming. As a result, a lot of the studies that have been published before ours had, in some instances, maybe a hundred instances of counter speech, and it's really challenging for any kind of language model when you have maybe a hundred sentences to learn from.

One of the things that really differentiated us was that we were able to collect not hundreds of instances, but tens of millions of instances to start learning with. And I think this really set us apart from that group in terms of that data. Hate speech classification has been around for a long time and people have been working on it, but it is super subjective. And what I consider hate speech, someone else might not consider hate speech. So again, what's really nice about this is that we have a group that's labeling themselves as such, and so we have people that have dedicated accounts just for hate speech. So we can really pull and extract a lot of really good sample sets to learn the different languages and learn a lot of the covert and overt signals that they're putting forth, as opposed to having to try to rely on some very small hand-coded set.

KEYAN GHAZI-ZAHEDI:  Maybe one thing to add is that we also see that identifying hate seems to be easier because hate is mostly directed to one specific group of persons, immigrants, for example. And then the content is pretty much similar. But for counter speech, it’s more diverse. It can be irony, they can be making fun, or they could be sending pictures. So, it's way harder to identify and classify counter speech than it is to identify hate speech.

MICHAEL GARFIELD: In your section on just defining, for the purposes of this study, “hate” and “counter speech,” I thought it was useful that you clarify that there's both a narrow and a broad definition of hate speech, and that like you all have already alluded to, it has in certain respects more to do with the impact or the outcome than it actually has to do with the context in this expanded sense. That it's not simply about overt insult, or discrimination, or using the intimidating or pejorative language, but that it can also just incite hate. And so, as you're saying also with counter speech, that one of the tactics for counter speech – and I want to get into this a little bit more with you later about the different tactics  – but one of those tactics is just sort of absurdity, that it’s kind of like just jamming the conversation, that it's not necessarily specific to particular symbols or signifiers.

So just as a way of framing this within a sort of broader complex systems understanding, in terms of understanding a particular organism’s functional relationships within a food web, rather than say the anatomy of that particular organism. That's how I understood this. What you're talking about are the ways that we can identify these patterns of behavior activity in terms of their fruits, socially speaking, in terms of the way that they're organized structurally.

JOSHUA GARLAND: I think one thing that sort of touches on you taking this from a complex systems view…a lot of the prior studies have viewed both utterances of speech and counter speech in isolation. And so you have, in isolation, this particular utterance of hate speech, but you don't have any context of the conversation at large. And so, it's really hard to understand effectiveness and understand the interactions between hate speech and counter speech, if you only have utterances in isolation. So taking more of a complex systems perspective here, what we’re interested in is more than just the individual utterance; we were interested in the dynamics and how those dynamics between the two groups played out over time. And so, another thing that really sets apart our study from other groups is the way that we're approaching the study, because we actually have hundreds of thousands of fully resolved conversations between these two groups, on neutral grounds. And this is not something that other people have, period. Maybe they have a few conversations, but definitely not at the magnitude or over the longitudinal depth that we have. And so, one of the things that we tried to do was, we were actually able to collect nearly 200,000 reply trees, which are conversations that occurred between the two groups, and then we can use our classifiers. So, then we actually started extracting the dynamics, and I can replay them for any given conversation in a year: What was the proportion of hate speech? What was the proportion of counter speech? Was counter speech directly replying to hate speech, or were they just replying to the conversation? There are several different macro- and micro-level analyses, from a dynamics perspective but also from a network perspective, as we can discuss, if you'd like.

And so, one thing that really kind of shifted our thing from a machine learning classifier task to a more complex systems lens was shifting into studying the dynamic interplay between these two groups, by really studying their interactions, in depth.
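
The reply-tree questions raised above (what share of a conversation is hate or counter speech, and whether counter speech replies directly to hate) can be sketched in a few lines. This is only an illustration, not the authors' code: the `Tweet` record, its field names, and the toy labels are all assumptions made for the example.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Tweet:
    tweet_id: int
    parent_id: Optional[int]  # None for the root tweet of the tree
    label: str                # hypothetical labels: "hate", "counter", "neutral"


def proportions(tree):
    """Share of each label among all tweets in one reply tree."""
    counts = Counter(t.label for t in tree)
    return {label: n / len(tree) for label, n in counts.items()}


def direct_counter_replies(tree):
    """Count counter-speech tweets whose parent tweet is labeled hate."""
    by_id = {t.tweet_id: t for t in tree}
    return sum(
        1
        for t in tree
        if t.label == "counter"
        and t.parent_id in by_id
        and by_id[t.parent_id].label == "hate"
    )


# Toy conversation: a neutral news tweet with four replies.
tree = [
    Tweet(1, None, "neutral"),
    Tweet(2, 1, "hate"),
    Tweet(3, 2, "counter"),  # replies directly to the hate tweet
    Tweet(4, 1, "counter"),  # replies to the root instead
    Tweet(5, 2, "hate"),
]
shares = proportions(tree)
direct = direct_counter_replies(tree)
```

Here `shares` gives the proportion of each label in the conversation, and `direct` counts only the counter speech that answers a hateful tweet rather than the conversation at large.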

KEYAN GHAZI-ZAHEDI: Well, I'm wondering if we should explain why we used reply trees, and where we got them from. Would that be helpful?

MICHAEL GARFIELD: Yes, that was my next question for you.

KEYAN GHAZI-ZAHEDI: So, what we looked for was where to scrape, where to get these reply trees from. What would be a good source of reply trees? And in Germany, we have certain news agencies which are considered very reliable and actually neutral, as far as you can be in journalism, right? And what we saw over time was that as the alt-right party was getting more extreme to the right, these kinds of news outlets came more under attack. They get more replies to each tweet that they tweet. So, their tweet level is almost constant, 40 tweets a day, more or less, but then we see that as public discourse shifts more to the right, and things can be said today that were impossible to say years ago, you see more and more replies to every tweet by these agencies.

And actually, looking at what happened even to myself, when you see tweets by these news agencies, you see a disconnect in the replies, right? You see some neutral tweet about the migration pact, for example, and over a couple of days, you see very hateful tweets, hateful answers to these messages. And what can happen is that you get the impression that the news agency is tweeting one thing, but that the public thinks something different. It can happen that somebody neutral, like a new user to Twitter just looking for news input, kind of drifts off to get more and more extreme to the right. So that's why we look for these neutral grounds, to see how discussions evolve on these Twitter feeds and how effective counter speech would be in this environment. But please add to that, Josh and Mirta.

JOSHUA GARLAND:  So, we actually took a very standard machine learning approach. Our special sauce, if you will, was the data and the labeled data. We had a bunch of labeled data that made it much easier, but the actual pipeline is not very extraordinary. So, what we effectively did was we trained document embeddings, as they're called. We trained a bunch of document embeddings, and then we coupled several document embeddings with logistic regression in order to do the classification stage. Would it be helpful if I go into what any of those words mean?

MICHAEL GARFIELD: Yeah. Okay. I'm sure. I’m trying to make fewer assumptions on the part of our listening audience. So, if you're listening and you're like, wow, they're really dumbing it down for us then…sorry.

JOSHUA GARLAND: So I think one of the fundamental things you have to do when you're building a machine learning model for language is understand how they kind of mathematize language. You know, when you see a tweet in English or in German or whatever it may be, that doesn't directly translate to something that a neural net or a classifier can use, off the shelf. So, you have to take the language that you're trying to use and you have to translate that language to something that math understands. You need to translate it to some vector representation. But the difficulty in doing that is translating the sentence into a vector representation that somehow has the same meaning as the sentence in English or in German. And that's actually the really hard part, because you kind of want to build in context, you want to build in as much of the sarcasm and context, and little nuances in the language and understanding. You don't want to simply map every word to one element of the vector and then just have the first element of the vector be "a," the second element of the vector be "be," and so on. That doesn't make any sense, because then the vectors don't play with each other in space, right? And you want them to interact with each other in space in a way that you would think of them interacting with each other, if you're just having a conversation. And one way you can do that is you can build off this intuition of prediction.

So, as I'm listening to someone speak, I have this kind of model in my mind where I'm trying to predict the next words that they're going to say. I can kind of keep up with the conversation, and I can play this out. Whenever I say "fire," there are only so many words that I can say after the word fire and make sense. So maybe I say, fire-man, maybe I say fire-truck, but I'm probably not going to say, like, fire-Soviet, right? Like I'm not going to like jump completely out of context. So basically, I train a neural net to guess the next word that I'm going to say, and as the neural net gets better and better at guessing the next word that I'm going to say, what it does naturally is it starts understanding the context of the words that I'm saying. It understands that firemen and firetruck all kind of go in the same part of semantic space, and policemen are maybe a little bit different, like police cars and policemen are maybe in a little different part of space, but they're also sort of in the same context space. They're sort of in the same area of space and they're probably very different than school-house or Congress or these other kinds of things that you can think of.
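
The "guess the next word" intuition above can be illustrated with something much simpler than a neural net. The sketch below swaps in a plain bigram counter (an assumption for illustration, not the model used in the study) and a made-up four-sentence corpus, just to show that after "fire" only a few continuations are plausible.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count, for every word, which words were seen directly after it."""
    follow = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for current, following in zip(words, words[1:]):
            follow[current][following] += 1
    return follow


def next_word_probs(follow, word):
    """Relative frequency of each word observed after `word`."""
    counts = follow.get(word, Counter())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


# Made-up toy corpus (hypothetical data).
corpus = [
    "the fire truck arrived",
    "the fire man waved",
    "the fire truck left",
    "the police car arrived",
]
model = train_bigrams(corpus)
probs = next_word_probs(model, "fire")
```

In this toy corpus `probs` contains only "truck" and "man"; a word like "soviet" never follows "fire," so it gets no probability at all. A neural language model generalizes the same idea by learning vectors for words rather than a lookup table of counts.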

So, what you can do is you build up this kind of semantic, contextual understanding of language, and then we can utilize that to map words and language into some kind of higher-order space where we can then put classifiers. And then I can say, okay, “take all the words that sort of represent emergency, like first responders, into one bin and take all the words that kind of represent government and put them in another bin,” and you can start taking these vectors and then map them into different things, map them into different categories. It’s no different than what we do in the hate speech and counter speech categories. So we have these neural nets that build up a semantic understanding of the German language that can take a political German tweet and map it into some space that kind of understands the context as best as it can.

And then what we can do, once we have all the vectors mapped into some big space and labeled, is say, “Oh, all of these vectors that clustered next to each other are hate speech and all the vectors that clustered over here are counter speech, and all the vectors that are kind of mixed in the middle are neutral speech.” And then we can just use logistic regression or a simple classifier to split that space, and given an out of context tweet or a tweet that we haven't seen before, we can pipe it through the neural net. We can map the tweet text or the German or whatever it is to a mathematical representation, and that mathematical representation is then placed somewhere in the vector space, and then we can just ask, “Is this vector more similar to counter speech? Is it more similar to hate speech? Is it more neutral speech?”
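
As a rough sketch of the "split that space" step described above: the code below trains a tiny logistic regression on two-dimensional toy vectors. Everything here is an assumption for illustration. The vectors stand in for document embeddings (real ones would come from a trained model and have many more dimensions), only two classes are used instead of three, and the training loop is plain stochastic gradient descent rather than any particular library.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(vectors, labels, lr=0.5, epochs=500):
    """Fit weights w and bias b with plain stochastic gradient descent."""
    w = [0.0] * len(vectors[0])
    b = 0.0
    for _ in range(epochs):
        for x, target in zip(vectors, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            error = p - target  # gradient of the log loss w.r.t. the logit
            w = [wi - lr * error * xi for wi, xi in zip(w, x)]
            b -= lr * error
    return w, b


def prob_counter(w, b, x):
    """Model's probability that vector x is counter speech (label 1)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


# Hypothetical 2-D stand-ins for tweet embeddings.
hate_vectors = [[0.9, 0.1], [0.8, 0.2], [1.0, 0.0]]
counter_vectors = [[0.1, 0.9], [0.2, 0.8], [0.0, 1.0]]
X = hate_vectors + counter_vectors
y = [0, 0, 0, 1, 1, 1]  # 0 = hate, 1 = counter

w, b = train_logistic_regression(X, y)
unseen_a = prob_counter(w, b, [0.15, 0.85])  # lies near the counter cluster
unseen_b = prob_counter(w, b, [0.85, 0.15])  # lies near the hate cluster
```

An unseen vector that falls near the counter-speech cluster gets a probability above 0.5, and one near the hate cluster gets a probability below 0.5, which is the "which side of the boundary is this vector on?" question in its simplest form.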

So that's kind of the big, high-level idea of how we accomplish this. And then there are tons of nuances that you can get into, right? I just gave you the most vanilla possible representation, but then you can start thinking about how to weight them by term frequency versus the inverse document frequency. Can I build that in? Where do I start the weights of the neural net? Do I start them in the pre-trained state? Do I train them in a randomized state? And you can start thinking about all of these different ways to kind of tweak it and make it better, but at the end of the day, really, those are just like turning the screws on the engine, the pipeline remains the same.

And so the other thing that we do on top of that, is that we don't necessarily just build one model because there may not be a best model. One thing that you should know is that I just described this piece of machinery where you take spoken language and you turn it into math space. There are a million different parameters that you can tweak and turn and twist. And each time you twist and turn these different parameters or knobs or screws in your algorithm, the way that the language gets represented is totally different. So, for example, you can think about asking it not to predict the next word I'm going to say, but to predict the third or fourth word I'm going to say. Or skip all these middle words, or only try to predict words that I've seen a hundred times or a thousand times.

So you can imagine there's all these different knobs that you can turn. And so, one thing we do is we parse our data set – these millions of tweets that are both hate and counter speech – we parse them into all sorts of different buckets, and then we have neural nets with particular parameters and logistic regression with particular parameters and we say, “Go learn this language model, and go learn a separation or a decision boundary between hate and counter speech." And we call that an expert. So we can give an expert a spoken piece of language, and then we can say, given that spoken piece of language, put it in a bin of hate, counter, or neutral.
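
One way to picture the "different buckets, different knobs" setup described above is as a grid of configurations, each paired with its own random subsample of the data. The sketch below is hypothetical: the knob names (`skip`, `min_count`), their values, and the sampling scheme are assumptions chosen to mirror the examples in the conversation, not the parameters used in the study.

```python
import itertools
import random

# Hypothetical knobs, loosely following the ones mentioned above.
SKIP_DISTANCES = [1, 3, 4]      # predict the next word, or the 3rd/4th word ahead
MIN_WORD_COUNTS = [100, 1000]   # ignore words seen fewer times than this


def make_expert_configs(tweet_ids, sample_size, seed=0):
    """Pair every knob combination with its own random bucket of tweets."""
    rng = random.Random(seed)  # seeded so the buckets are reproducible
    configs = []
    for skip, min_count in itertools.product(SKIP_DISTANCES, MIN_WORD_COUNTS):
        configs.append(
            {
                "skip": skip,
                "min_count": min_count,
                "training_ids": rng.sample(tweet_ids, sample_size),
            }
        )
    return configs


# Toy stand-in for a labeled corpus: just 10,000 tweet ids.
configs = make_expert_configs(list(range(10_000)), sample_size=2_000)
```

Each entry in `configs` would then be handed to a training routine to produce one expert, so that no two experts see exactly the same slice of language under the same settings.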

We then have a different expert, which you can think of as just a different person's understanding of language, and it's going to see a different subset of German. It's going to see a different subset of hate speech and counter speech. It's going to be trying to accomplish different tasks. So, its semantic understanding of the language will be very different or quite different than the first expert, or at least that's what we hope will happen. So you can train many, many, many of these experts, and then you basically collaborate, and say that I give the same tweet to many different experts that have very different understandings of the language, and I hope that each one of the experts can vote on whether it thinks it's hate, counter, or neutral. Once you build that framework, you can kind of have a consensus protocol where you can say that these three things say it's hate speech, but the other 22 think it's counter speech, so it's probably counter speech, it’s more counter speech. And then you can start defining thresholds where you have to be super confident that it's hate speech, or super confident that it's counter speech because now you have this consensus protocol to build off of. That’s exactly what we do, we take a tree or we take a conversation that occurred on Twitter, and we have all of these different experts that have been trained on these very diverse data sets with different parameters and different understandings of language, and we go down the tree, through the conversation and say, for each node in this tree, is it hate speech? Is it counter speech? Is it just regular neutral speech? Because you can’t expect every single conversation to be pure hate and pure counter speech. There's going to be a, “I really like fries,” or whatever they're saying on Twitter, just mixed in, and we have to be able to distinguish that. Right?
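
The voting idea above can be sketched with a simple majority count. This is a minimal illustration under the assumption that each expert returns a single label; the function name and the returned vote share are choices made for the example, and the numbers reuse the 3-versus-22 split mentioned in the conversation.

```python
from collections import Counter


def consensus(votes):
    """Return the most common label and the share of experts that chose it."""
    counts = Counter(votes)
    label, n = counts.most_common(1)[0]
    return label, n / len(votes)


# Three experts say hate, the other 22 say counter.
votes = ["hate"] * 3 + ["counter"] * 22
label, share = consensus(votes)
```

With these votes the panel settles on "counter" with a share of 22 out of 25, or 0.88, which could then be compared against whatever confidence threshold is required.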

So, we can ask the panel of experts to be at a certain level of confidence. If you're at least 93% confident that this is hate speech, we'll mark it as hate speech. Or if you're 93% confident that this is counter speech, we can mark it that way. And then you can actually just eliminate everything in the middle. That's the protocol we take. What that allows us to do is then study these hundreds of thousands of conversations we’ve collected in this longitudinal framework and study how hate speech has evolved. This is how the proportion of hate speech has increased or decreased. This is how counter speech is affected. This is how the intensity increases, right? Like, so you can imagine studying intensity by thinking, Oh, suddenly we're 99% confident that everything in this conversation was hate speech. So, the intensity, meaning what they feel comfortable saying, is becoming much, much more intense to where it's not even a question if this is hate speech anymore, this is not a borderline case anymore. It's very adamant. And so, by passing all these conversations through, we can really start understanding the dynamics and how they interplay between each other.
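
The consensus protocol described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `consensus_label` is hypothetical, and treating the share of experts that agree as the panel's "confidence" is an assumption made for the sketch. Only the 93% threshold and the 3-versus-22 example come from the conversation.

```python
from collections import Counter


def consensus_label(votes, threshold=0.93):
    """Return the majority label if its vote share meets the threshold, else None.

    `votes` holds one label per expert ("hate", "counter", or "neutral").
    Using the vote share as the panel's confidence is an assumption of
    this sketch, not something stated in the conversation.
    """
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    share = count / len(votes)
    # Anything below the threshold is "in the middle" and gets dropped.
    return label if share >= threshold else None


# Three experts say hate and 22 say counter: 88% agreement on "counter".
votes = ["hate"] * 3 + ["counter"] * 22
print(consensus_label(votes, threshold=0.80))  # passes a looser threshold
print(consensus_label(votes, threshold=0.93))  # too uncertain at 93%, so None
```

Raising the threshold trades coverage for certainty: fewer tweets get a label, but the ones that do are the cases where the panel nearly unanimously agrees.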

MICHAEL GARFIELD: Just to pull out a little bit, and look at this in a general way, one of the things that I really think is going to become kind of more and more applicable in terms of the way that we negotiate the ambiguity of our complex world in the years to come…this kind of approach also could work very well – it probably is being studied in application on the recognition of deep fakes and this kind of a thing. We're getting to a point where having total confidence about something or relying exclusively on one point of view is no longer sufficient. And I really like, not only that you're talking about training multiple different experts in this way, and comparing them, and creating like a voting system, and establishing a range of confidence, but also that you tested that against human judges. And, you know, it starts to look kind of like the so-called centaurs, where you have teams of humans and machines working together in concert. Do any of you want to speak to how you selected a panel of human judges? Because that's kind of interesting. Although I'm sure this is a pretty common practice, to me, on the outside, I found it cool how you checked your math against human intuition.

MIRTA GALESIC: So should I take that?

JOSHUA GARLAND:  Yeah, you're the human expert.

MIRTA GALESIC: No! Yeah, so we found a number of German native speakers through Amazon Mechanical Turk and we filtered them through several iterations. They all had to pass a test, which was a pretty difficult task, similar to the task they were supposed to solve. And that involved them understanding a story in German that was followed by several comments by readers of an article in German. And they had to say whether the readers agreed or disagreed with the article, which required quite a nuanced understanding. And then we gave them some of these tweets to code. They had to say whether this was hate speech or counter speech, or neither. And then in the process, we've seen that some people were doing it much better than others. We were inspecting some part of what they did, and we realized that some people were just doing this randomly. I had a couple of conversations with some people who said, you know, “I really need the money. I'm sorry, I did everything randomly. Please pay me. I don't speak really German,” but somehow they passed the test. Anyhow, after a couple of such rounds of testing them on smaller samples, we ended up with around 25 really good coders that I hope also to use in the future that then reclassified. And so, interestingly, these human coders were extremely well correlated with the classifier scores, but more so for the hate part of the speech. As Keyan was saying before, there are many ways in which one can do counter speech, and so the correlation was a bit lower for the counter part. It was still in a monotonic relationship. It was nice, but humans tend to classify counter closer to the neutral speech than they classify hate.

JOSHUA GARLAND: I have something to add. One thing that I think is really important when doing any kind of artificial intelligence is to keep humans in the loop. And I think that's something that's sort of underappreciated in a lot of artificial intelligence. And I had a recent – actually not recent anymore, I think it's been five years….one of SFI’s external professors, Barbara Grosz, is one of the big figures right now in artificial intelligence, in my mind. And she talks about how we really shouldn't be shooting for artificial intelligence. We should be shooting for assistive intelligence, and we should be giving artificial intelligence tasks that humans aren't good at, like pattern recognition, but then we should be verifying these things, and putting a human in the loop to kind of subsidize that. So when Mirta was willing to start a Mechanical Turk thing, where we could kind of verify that our classifiers weren’t just absolutely crazy, that they weren't labeling very strange patterns, and to see whether they were matching with human intuition, I was all about that.

MICHAEL GARFIELD: Yeah! So, I hope now's a good time to dig into your findings and to talk about what you noticed in both the structure of conversations, as well as how these things played out over the years of data that you collected.

MIRTA GALESIC: So our main question is whether counter speech works, is it effective in curbing hate? And in particular, whether organized counter speech is better than individual efforts. And it is a very, very difficult question because it is a multifaceted problem and you can observe it many different ways. For example, just think about how you would go about it. Well, you can measure the amount of hate and counter speech online, the proportion. You can measure the average score that a classifier or a human gives to this hate or counter speech to see whether the intensity rises over time. And you can measure it to a finer level, because you can look at how often hate speech is replied to, or how often counter speech is replied to, and what does it do? What happens after hate speech is posted and then counter speech replies? What does it do to the rest of the conversation?

Then you can also measure the number of likes and retweets, these kind of indicators. And also, you can go even finer, and that's actually a topic of our further project, like what type of counter speech is best? Do you use humor? Should you try to convince with facts? Should you post pictures of puppies to drown the discussion? What is actually effective? So, this would help us then to empower individuals to do it better. Because it's such a multifaceted problem, we were actively looking at different measures of the effectiveness. And we find they all suggest that there is value in counter speech. And Keyan can speak more about that, because I know that by now he is trained to speak about it, but there is value to counter speech.

We see that after this Reconquista Internet shows up, the organized counter speech movement, there is an increase in frequency of counter tweets and a slight decrease in the frequency of hate tweets. The counter speech has more power, it is more likely to change the discussion afterwards in a reply tree, in the conversation. So, once the counter speech shows up, it is now more powerful in changing it towards counter speech, rather than keeping it hateful as it was. And so, it seems to us in particular, that this organized element is important because there was counter speech before. We see it’s very prevalent across the whole period of this study from 2015 and earlier, we see a lot of counter speech, not only when this organized group shows up in May or late April of 2018. It's always been there, but when organized hate shows up at the end of 2016, beginning of 2017, the individual counter speech loses its power dramatically. The whole narrative turns red. Red is our code for hate in our graphs. It becomes much more difficult for individual counter speakers to balance this discourse. Why? We know from all the literature on bullying that it's difficult to stand up to the bully when one is alone, especially when there are several bullies. You know, it is frightening. It promotes all kinds of negative emotions. And also, one doesn’t see much point if one’s voice is drowned out immediately by a lot of hate messages. It’s emotionally taxing and it doesn't seem useful, and so people slowly give up. But once the counter speech starts to organize in late April, May of 2018, then we see the difference, because now they come in groups, they know that they have counter speakers in support of each other. There are several of them to respond to haters, and the post helps them to overcome these negative emotions related to this, and also to reinforce their own social norm, to show that there is more than one person who is speaking up. That’s what we found.
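
Two of the simpler measures mentioned above, the proportion of each kind of speech in a conversation and how often hate speech draws a counter speech reply, can be sketched like this. It is a toy illustration only: the record layout (`id`, `parent`, `label`) and the function names are hypothetical, and the five-tweet reply tree is made up for the example rather than taken from the study's data.

```python
from collections import Counter

LABELS = ("hate", "counter", "neutral")


def label_proportions(tweets):
    """Share of tweets in a conversation carrying each label."""
    counts = Counter(t["label"] for t in tweets)
    total = len(tweets)
    return {label: counts[label] / total for label in LABELS}


def reply_rate(tweets, parent_label, reply_label):
    """Share of `parent_label` tweets with at least one direct `reply_label` reply."""
    by_id = {t["id"]: t for t in tweets}
    parents = [t for t in tweets if t["label"] == parent_label]
    if not parents:
        return 0.0
    answered = {
        t["parent"]
        for t in tweets
        if t["label"] == reply_label
        and t["parent"] in by_id
        and by_id[t["parent"]]["label"] == parent_label
    }
    return len(answered) / len(parents)


# Made-up reply tree: a neutral root, two hate replies, one of them countered.
tree = [
    {"id": 1, "parent": None, "label": "neutral"},
    {"id": 2, "parent": 1, "label": "hate"},
    {"id": 3, "parent": 2, "label": "counter"},
    {"id": 4, "parent": 1, "label": "hate"},
    {"id": 5, "parent": 4, "label": "neutral"},
]
print(label_proportions(tree))
print(reply_rate(tree, "hate", "counter"))
```

Computed per conversation and tracked over time, numbers like these give the kind of longitudinal picture described above. As noted in the discussion that follows, they show association, not cause.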

KEYAN GHAZI-ZAHEDI:  Well, what we see in these conversations with the rise of the organized counter speech group is a kind of signaling behavior. I would see in reply to hate tweets sometimes, hashtags were used, and it seems like these hashtags are used to call other people from the group to join the conversation, which wasn't there before organized counter speech was there, right? And one thing we try to understand is when we see patterns. We see that there's a hate tweet and somebody's replying to that, if a specific group of people is always replying to hate speech, did this have an effect on reducing hate over time? Because if we look at the reply trees, the data we have, it seems to be something that's going on in there.

MIRTA GALESIC: I should say, a standard disclaimer, that this kind of data is wonderful, but we cannot decide anything about causal effects. It's very difficult. Of course, society is very complex. There are many things going on at the same time and our data strongly suggests that there is an association between organized counter speech and a more balanced discourse. We should note that the whole society at that time, the general society, was going into a direction of really being pissed off with so much hate. So there was, again Keyan who lives there will know more about it, but there were also some large scale, organized rallies against racism. At the same time, there were also pro-Nazi rallies. So there were a lot of things going on at the same time. Sometimes it's difficult to see the pattern, but because we are looking at it in so many different ways, and with so many different measures, it seems to us that what's emerging is a picture that could suggest that there is actually effective, organized counter speech.

MICHAEL GARFIELD: Yeah, actually one of the key takeaways I got out of your current paper in draft was, towards the end you mentioned that from this data set, the extremity of both types of speech increased over time. So, you're talking about a more balanced discourse on the whole, but it's occurring within an increasingly extreme and polarized environment. And just to link that to other work that's gone on at SFI, where for all the world it looks like there are a number of different ways that modern society is pulling apart at the seams. And given your own sort of personal story, Mirta, growing up in a nation coming apart, I'm curious what the orbital view of that is. I mean, on the one hand, obviously counter speech reducing the frequency of hate speech is good on its own, but on the other side, thinking about this increasingly polarized discourse, is it helping or is it just contributing to the balkanization of our cultures here?

MIRTA GALESIC: Yeah, it's an arms race. Once one group starts to speak in extreme terms, it almost seems like the group who is not participating can be losing. At some point…before, at least we could say, “Well, look at this extreme person, we will ostracize them. They will learn the error of their ways. Nobody will buy into these stories,” but these days it just doesn't stop, because there are so many ways to connect with other extreme speakers. And so, the force of extreme speech continues, and it seems that the old rule of not feeding the trolls doesn't work anymore. And so without, even in a way replying in force, I worry that the new kind of speech that's insulting to everyone and everybody, that makes most people feel uncomfortable, might just become the norm and people just retreat and just not engage anymore because it's becoming too toxic in a way. You just stay home.

And so at some point…of course it's not healthy in the long run, it's a very stressful situation, like an organism fighting a disease, if you want, it's stressful. If it takes too long, it's not going to be well for the organism, or for the society. But at some point, I think we need to activate our immune system and fight back. Hopefully in a while we can come back to a more neutral discourse and discuss soccer, and healthy diet options or whatever we were talking about before this all started. That's my impression. But basically, we are all learning as we go. So I'm sure there are many other opinions, maybe even among my coworkers.

JOSHUA GARLAND: Yeah, we actually see this exact thing, which I think is really interesting. And so one thing that was surprising to me, but showed up in the data, was that, you know, before Reconquista Germanica, which was the hate group, showed up, what you'd see is that most conversations started on the hate side of the spectrum. So, they were a little bit more hateful than neutral discourse, but only slightly. So, what would happen is that you'd have slightly hateful responses to these news things, and then people would come into the conversation and neutralize the conversation. People would come in and respond to the hate, and the whole attitude of the conversation would shift back towards neutral conversation. So some people were angry and responded right away. People would talk some sense into them, and it would kind of calm things down, and you'd come back to a neutral discourse. And that happened emergently or naturally over time. And what happened with Reconquista Germanica was we saw a big shift once they came around.

So, once there was organized hate, what happened was that the conversation started hateful. And instead of kind of shifting back to neutral, instead of being kind of watered back down, what we see is that they would either stay hateful, or get even more hateful. So, they would be able to reinforce the hateful rhetoric as opposed to kind of this natural diffusive measure that was occurring. They were reinforcing the hate. And what occurred over time was that this reinforcement mechanism got more intense and more intense over time, where people got more and more bold about saying things they're not supposed to be saying in Germany, where hate speech is illegal. And so you kind of look at a conversation and you see that like, “Oh, well the whole discourse is getting kind of more and more hateful, and now it's even more hateful the next day,” but you kind of get the lobster-in-a-pot effect, where you just don't notice it getting more and more hateful. What we were seeing before Reconquista Internet, or the counter group, was that the conversation would start out hateful and then drift more and more towards hate. And you saw the norm or the natural state of these conversations being more filled with hate. What you see right after Reconquista Internet is an ability for the conversation to be dragged from this fairly hateful state, down into a neutral counter state. And so, you see this return to more of a neutral state, but then it also seemed to have this relaxation effect after the shock point. We see that Reconquista Internet kind of shocked the system away from this hateful norm, back to something more neutral.

But then there was retaliation, right? Different hate groups retaliated and you see more people jumping on that bandwagon and then you see more people countering them, you see more people hating against them, more people countering, and then you see the split in discourse. So you actually see a lot more polarization. And so from one aspect, you can say that overall, if you just think about the effectiveness of counter speech as being this single facet, like, “Did the proportion of hate speech decrease over time?” then I think you can see in our data that the proportion of hate speech did decrease over time. You actually see this gap between neutral counter discourse and hate discourse being like 15 to 20% on average in conversations drop significantly after Reconquista Internet, closer to a couple percentage points. So, you do see this massive drop in proportion. So from that, yes, counter speech worked, it’s effective, it dropped hate, but then you think about the other facets, like what did it do to our society? Some of the measures we're currently looking at are polarization measures, and it does seem that having an organized hate group and an organized counter group, grabbing the conversation and pulling it into different directions, is splitting the conversation a bit, and that's actually causing a great deal of polarization. And so, although that brought down hate, it maybe caused polarization. And so, does that count as being ineffective or is it effective in a different way? It's a really, really hard problem to think about, especially in the growing polarized climate, not just in the U.S., but internationally, like in Germany and Brazil and many different places. One of the things that our team is really struggling with currently is how do you even think about effectiveness? If you view it from one angle, yeah, it was effective, but it caused these other problems.

And then when you mix into this conversation these causal effects, well, there are also political rallies, and there were also these other events happening at this time. It makes it a really, really fascinating, really interesting data set to work on…but it’s definitely a really challenging thing to think about.

MICHAEL GARFIELD: One of the most interesting things for me, being the Santa Fe Institute social media guy – because part of my job, unfortunately, is trying to participate in the kind of dehumanizing affordances of social media as they exist in 2020, and being a technologist of attention: trying to game what we know, what I'm learning from SFI about neuroscience and cognitive science, and use it to craft the stickiest most attractive communiques. And people like Tristan Harris have spoken about this other arms race going on in the tech sector, that is, how do we mop up the most attention? – One of the more interesting pieces for me out of this second draft paper is what is it that hate speech groups and counter speech groups are actually looking for as targets? And then, how they're engaging the features of the media that they're choosing to engage with? I would love to hear one or all of you speak to what properties are actually getting identified here. And then what it seems like those properties indicate about the strategies of these groups, and how they’re deliberating their methods of engagement here.

JOSHUA GARLAND: In an effort to try to understand the dynamic interplay between the counter and the hate group, one thing we are interested in is inferring the strategies that are being used by them. So, we're viewing it from the outside. We're not privy to the war room of either Reconquista Germanica or Reconquista Internet, so we don't know what strategies they're employing. We can record the conversation. We can say, “This is hate speech.” “This is counter speech.” “This is how they interacted.” “This is neutral speech.” We can do that, but we can't say that we know that Reconquista Germanica went after this particular tweet because they use more hashtags, they use more media. We can’t say what it was that attracted so much attention from the hate group to a particular tweet. And so, one thing we were interested in is doing inference on the strategies. Can we reverse infer what strategies are being utilized by the hate group and the resistance group in the background?

So, the way that we approached that was with two of our other co-authors who aren't present: Jean-Gabriel at University of Vermont, now, I believe. He just switched from University of Michigan. And Laurent, who was a former postdoc at SFI, and is a professor at Vermont. We’re in collaboration with them, and what we did was use choice theory to understand the choices that are being made by the group. The way that this kind of plays out is when you’re a user that goes to a social media conversation, like a Twitter reply tree, you have a choice of where you attach to, and who you reply to. Do you reply to the root node, or do you choose to reply to someone that used your favorite hashtag, or particular media, or a particular content, or do you only reply to your friends who are also counter speech? Do you also only refer to your friends that are only hate speech now? What is the strategy that's being employed by these two groups? And while one thing that we're interested in is whether or not counter speech is even effective…let's say that it is, let's just make the hypothesis that counter speech has some effect on discourse. Then what is the most effective strategy in doing that? And, in order to answer that question, we'd really need to be able to infer the strategies that are being used. To do this, we use choice theory, and that allows us to infer the choices that are being made when people approach a conversation.

And so, we have several interesting findings in the paper about what choices each group was making, and they seem to be much more structural than content-based, which we found very interesting. I think one of our primary conclusions from this analysis was that both Reconquista Germanica and Reconquista Internet were very good at choosing to attach to the tree in ways that made them maximally visible…so they seemed to be much less interested in particular content, but were replying to the tree in a particular way where their content would be most visible to an outside observer, which is a very interesting perspective. And the way that they do this is by leveraging things like the display algorithm in such a way as to make them able to do this. For example, by replying directly to the root, or replying to very new tweets or replying to very heavily liked or retweeted tweets, they could leverage the display algorithm in ways to propagate their message forward. And, you know, just like we can't claim causality with the time series analysis we're doing, we also can't say that we can know the strategy that they're using, or say that Reconquista Internet was fundamentally leveraging Twitter's algorithm to maximize display preference. But we will say that the way that they chose to interact did maximize their display chances, and so that's kind of the reason we have that take in our paper. We found it really exciting.
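
To make the idea of a "choice" concrete, here is a minimal sketch of one common discrete-choice formulation, the conditional (multinomial) logit: each tweet already in the reply tree is a candidate to reply to, and the probability of picking it grows with a weighted sum of its features. Whether this matches the exact model in the paper is an assumption. The feature names (`is_root`, `log_likes`, `age_hours`), the weights, and the example values are all hypothetical and chosen only for illustration.

```python
import math


def choice_probabilities(candidates, weights):
    """Conditional-logit probability of replying to each candidate tweet.

    `candidates` is a list of feature dicts, one per tweet already in the
    tree; `weights` maps feature names to coefficients. Both the features
    and the coefficients here are illustrative assumptions.
    """
    utilities = [sum(weights[k] * c[k] for k in weights) for c in candidates]
    top = max(utilities)  # subtract the max for numerical stability
    exps = [math.exp(u - top) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical weights: prefer the root, liked tweets, and recent tweets.
weights = {"is_root": 2.0, "log_likes": 0.5, "age_hours": -0.3}
candidates = [
    {"is_root": 1, "log_likes": 3.0, "age_hours": 5.0},  # the root tweet
    {"is_root": 0, "log_likes": 1.0, "age_hours": 1.0},  # an older reply
    {"is_root": 0, "log_likes": 0.0, "age_hours": 0.1},  # a brand-new reply
]
for features, p in zip(candidates, choice_probabilities(candidates, weights)):
    print(features, round(p, 3))
```

In an inference setting the direction is reversed: the observed replies are the data, and the weights are fitted so that the choices people actually made become as likely as possible. Comparing fitted weights across the hate, counter, and neutral groups is one way differences in strategy could show up.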

MICHAEL GARFIELD: There's another thing here, and I want to be respectful to y’all, and wrap this here shortly, but the key point that I was really looking forward to discussing with you – Joshua, I remember you giving a talk at SFI, I think it was last year, where you were talking about what you can see from orbit with these conversation trees, and how the network structure of these different conversations yields some insight into the nature of those conversations. And to me, this was such a profound flip in the way that I thought about the possibility for designing future user interfaces. Given that the Twitter API is available, you know, what would it look like, and what would the benefits be of, for example, (and I know I've been haranguing you about this for almost a year), but the idea of designing a different user interface for Twitter that allows you to see the quality of a conversation before you decide to participate in it, just looking at the structural features instead of the content features. So, could you talk a little bit about that piece of it, about what you can see just by looking at the graph rather than by digging into the tweets themselves?

JOSHUA GARLAND: So, in some of our preliminary analysis, what we saw was that there were structural differences in how the hate and counter groups interacted, and how the neutral speech interacted. So, there are different distributions in the choices they’re making. And so, just like I was just talking about with the choice theory, what you can say is that, given that you're part of the hate group, what strategy do you use to attach to the tree? Or given that you're a counter group, how do you attach to the tree? But you can also ask, if you're just a regular generic citizen, not part of the hate or the counter group, how do you attach to the tree? And some of our analysis showed that looking at how different people chose to attach to the tree was predictive of whether they were a hate member, a counter member, or just a neutral citizen. I think that's what I was talking about then.

And so, in our early analysis– and I think we should probably check this because I think it changed quite a bit since we changed the algorithm – but what we saw was that haters attached directly to the roots, a lot of hate speech was attaching directly to the root node. They were not participating in a conversation. And for example, you saw a lot of counter speech, attaching to the leaves, or attaching to the circumference of the network. But, again, I think that changed quite a bit. Mirta, do you remember this?

KEYAN GHAZI-ZAHEDI: I wonder if you could use that to actually identify hate and counter accounts? Could that be like one piece of the puzzle?

JOSHUA GARLAND: I think if you actually got a really good understanding of the choices that are being made as to strategies, that could be utilized. I think one of the more interesting ways to utilize that would be actually to change the display algorithms to kind of mix between the two. So there's a display algorithm where I know exactly the kind of tweets that are going to be displayed prominently, and the kind of tweets that are going to kind of get pushed down. If I want to push my message as a member of a hate group, or a member of the counter group, I can leverage the display algorithm to propagate my message. And there have been many studies that have shown that, right? Where groups on Reddit or groups on Twitter have attached in particular ways to conversations, or liked in particular ways to propagate a message forward and to make it seem like it's a more prominent message than it really is. And surprisingly, you can do that with very little interaction. You can kind of propagate a message to the front page of Reddit, things like that. And so, what I think might be really useful is that if companies like Twitter or Facebook or Reddit looked at what the strategies are that are being leveraged, and then were able to mix up their display algorithms so that you can't leverage these things as easily, I think that can be super beneficial.

KEYAN GHAZI-ZAHEDI: I was asking, if you could change the display algorithm, could you actually reduce hate speech? Is that what you were saying?

JOSHUA GARLAND: Maybe. I think if you have a fixed display algorithm, you can weaponize that, right? As soon as the display is fixed, it's weaponizable. But if you randomize the display algorithm on a daily or weekly basis, where you don't know what's going to be the prominent tweet that's shown, then it's really hard to weaponize that display algorithm. That’s all I was getting at.
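
As a thought experiment, the idea of rotating the display rule so that it cannot be anticipated might look something like the sketch below. Everything in it is hypothetical: the three ranking rules, the tweet fields (`likes`, `posted_at`), and the `secret` value are invented for illustration, and nothing here reflects how Twitter or any other platform actually ranks replies.

```python
import random
from datetime import date

# Hypothetical ranking rules; each maps a tweet to a sort key.
RANKING_RULES = {
    "most_liked": lambda t: -t["likes"],
    "newest": lambda t: -t["posted_at"],
    "oldest": lambda t: t["posted_at"],
}


def rule_for_day(day, secret="hypothetical-secret"):
    """Pick the day's ranking rule pseudo-randomly.

    Seeding with a secret plus the date keeps the choice stable within a
    day but hard to predict from outside (an assumption of this sketch).
    """
    rng = random.Random(f"{secret}:{day.isoformat()}")
    return rng.choice(sorted(RANKING_RULES))


def display_order(tweets, day):
    """Order a conversation's replies using whichever rule applies that day."""
    return sorted(tweets, key=RANKING_RULES[rule_for_day(day)])


replies = [
    {"id": 1, "likes": 40, "posted_at": 100},
    {"id": 2, "likes": 3, "posted_at": 300},
    {"id": 3, "likes": 12, "posted_at": 200},
]
today = date(2020, 5, 1)
print(rule_for_day(today), [t["id"] for t in display_order(replies, today)])
```

A group that always replies to the newest or most-liked tweet only benefits on the days when that rule happens to be active, which is the sense in which a rotating rule is harder to weaponize than a fixed one.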

KEYAN GHAZI-ZAHEDI:  So kind of messing with this strategy of posting hate.

JOSHUA GARLAND:  Yeah, exactly.

MICHAEL GARFIELD: I mean, that would probably strike at the heart of the ad revenue model, though. You got to have one that doesn't give massive preference to engagement, because that's a huge well-recognized piece of the way that social media has contributed to the polarization of discourse. Pissing people off is a better way to get eyes on your posts.

JOSHUA GARLAND: No, I think that that's very fair. And you know, it really gets back to the kind of multifaceted nature of effectiveness. If you do counter speech, and you drive everybody off the platform so there's no hate speech but there's no speech, nobody's talking at all, nobody talks to each other, was that actually effective? From Twitter's perspective, that's awful because there's no more ad revenue, right? So, it's very ineffective to have counter speech if you drive everybody off the platform, but by the same accord, if all of the Nazis come in and they just hate speech, hate speech, hate speech, and that drives everybody off, okay, was that effective? Because then they're no longer propagating their message. It's this really interesting interplay between what do you mean when you say “effective,” and what do you mean when you try to parse this apart? And if you view it from the perspective of an average citizen or if you view it from a politician’s standpoint, or from the ad revenue model standpoint, “effectiveness” wildly changes. It's this moving target that we're trying to analyze in this work, and it's proving to be super challenging.

MICHAEL GARFIELD: So that brings us into the last question that I have, about the glowing open questions that you're hoping to answer in this paper, in its final form, and in follow up research. And then, what are you hoping that people latch onto and take home from this, and consider how that might change the everyday practical implications and so on. You spoke to some of that already, but I feel like that's a good place to button this up.

JOSHUA GARLAND:  What do we want the final takeaway to be?

KEYAN GHAZI-ZAHEDI: Maybe I can give it a shot. So, my hope would be from the current research that we have that the take-home message is to organize yourself. It seems to have an effect, and it seems that organized counter speech is more effective in countering hate. That'd be my hope in the long term. I would hope to, and I know this is not realistic, but maybe to have a handbook for counter speech: what would be the best strategies for specific forms of hate? It’s not so much a one to one rule, you know, “if this comes, then do this,” “if that comes, then do that,” but maybe the most effective strategies, or most likely effective strategies for specific types of hate, and how could you counter that in a civil way without destroying the discourse? That will be my hope for the long run.

MIRTA GALESIC: Yeah, I agree with that. I think that the first question was “does it matter at all?” And it seems like it matters. And now the next question is how can we tailor it so that it's even more effective? How can we empower others to do it? And that's what Keyan is saying. What are the strategies that actually work? Should we attack? Should we laugh? Should we help the weak teams to survive more? What can we do?

MICHAEL GARFIELD: And in a way that doesn't just accelerate the arms race, hopefully, I imagine.

KEYAN GHAZI-ZAHEDI:  Yes, in a way that will bring it back to the civil discourse like it was before, when we talked about recipes and football scores, and not about violence, right?

JOSHUA GARLAND: Yeah. I mean, can we curb hate, or could we come up with effective strategies that are backed by science that really tell you what is an effective way to counter hate without just increasing polarization? And I think that polarization is destroying our society in so many ways. You know, we're getting to a point in the US where you can't send a news article from particular outlets to another friend without being worried you're going to ruin that friendship. And people say two words and you just know that they're on the right, or they're on the left, and so you just don't speak to them anymore. It’s just toxic and that's not the way that we're going to go forth in our society. We can't have a transmission of ideas that way. You know, the right has great ideas. The left has great ideas. We need to come together to be a better people. And I think that one of the most important things that I hope…you know, you said what I get to hope, so I get to pick anything I want: I hope that we come up with effective ways to curb hate both on the right and on the left, and return to a civil discourse where we can come together as a society, and we can talk about pizza without worrying about whether Obama had pizza, or if Trump had pizza. That's not the point. We can have a great pizza recipe and just discuss that. Bring it back to civil discourse.

And what I'm hoping is that we can come up with a rigorous social theory that tells people how to counter hate in a productive way that's non-polarizing. And I also hope that our work alleviates a lot of the ethical burden on social media platforms. Right now, you know, the government has kind of shelled off, in the US in particular, the government has kind of shelled off all the social, ethical, legal responsibility of censorship and propagation of ideas onto these social media platforms. It's really not fair. So you'd say, “Facebook, you need to worry about censoring hate speech, and you need to be worried about censoring these groups, and it's your choice who you ban and how you propagate these ideas and how you deal with misinformation and disinformation,” etc. It puts a huge amount of burden on these companies. And I think if we could empower citizens to do this, and citizens to come in and say, “We don't support these ideas. We don't hate immigrants. We don't hate the women in science. We don't hate all these things that the alt-right is against,” and we have proper ways to respond that shift our society back to a neutral place, I think that would be a fantastic outcome of this study. And I think it’s something that's actually achievable that we can do.

MICHAEL GARFIELD: Wow! I'm imagining this fantasy future in which Facebook is actually paying me to engage in de-polarization of discourse, that we've been volunteering for years, but anyway.

JOSHUA GARLAND: One of the exciting things about the way our group is approaching this problem is trying to bridge the gap between academics and industry. There are a lot of problems in academics that can't be solved by academics alone, and the same on the industry side. They need an academic/industry perspective. And so, one of the things our team is trying to do is marry together, not just mathematicians and social scientists and network theorists and all these different groups on the academic side, but we're also trying to bridge the gap to actual industry companies. So, actually trying to talk with Twitter and Facebook and Reddit, and actually bring them on board, because we think this is actually going to be meaningfully impactful in the space. You not only need the academics thinking really hard about the background theory and talking to each other across disciplines like we always do at SFI, we really need to bring in industry and say, “How do we take what we're finding in this, what we're understanding, and apply it on an industrial scale?” We have a recent initiative at SFI where we're marrying academics to industry through a fellowship where each fellow is tied to academic advisors, as well as being tied to industry advisors, in an effort to bridge the gap between academics and industry. That's called the Applied Complexity Fellowship, and that is a brand-new fellowship that we're starting at SFI, and we encourage you to reach out and discuss it with us if you're interested.

MICHAEL GARFIELD: Awesome. It's been absolutely wonderful to talk about your research with you, and thank you for doing this. And, there is a lot of stuff that comes out of SFI that I feel has really sort of profound social implications, and this is right up there at the top of the list. So, thanks all. I hope that this podcast draws some useful attention to you and your research.

MIRTA GALESIC: Thank you for having us.