Teaching Artificial Intelligence to Zap Hate Speech

How can spaces where people gather on the internet be safeguarded from trolls poisoning them with hate speech? Liam Hebert wondered if artificial intelligence could be trained to do the job.

So, he dived deep into the complex world of teaching machines to grasp what people truly intend when, in conversation, they use certain words that can have different meanings in different group contexts. After all, a word that is a vicious insult in one setting can be harmless in another.

Hebert tackled the challenge as he pursued a PhD in computer science at the University of Waterloo in Ontario — and produced a hate-speech zapping deep learning model so effective that when he was done, Google hired him as a research scientist. Hebert has also received prestigious awards including the IEEE Canadian Foundation, the Nick Cercone Graduate Scholarship in Computer Science award and a Vanier scholarship from the Natural Sciences and Engineering Research Council of Canada.

We are two secondary school students who participated in a UBC research project that invited us to note our worries about the future and interview someone working to make things better. Racist and sexist speech online is something each of us encounters often, so we wanted to know if a technological fix might be invented. That led us to reach out to Hebert and talk to him about his project and why he pursued it.

Along the way, we also spoke with Takara Small, a Toronto-based journalist who covers technology and its social impacts for news outlets including the CBC and BBC. She reminds us that even if AI gets better at detecting and removing hate speech online, “a lot of the platforms that many people use are designed by private companies and their filters are designed to work based on what they feel is important, what words, what ideas, what topics they feel their audience should have access to or be able to talk about.”

The people who run these “private entities decide what words, what language is allowed and it may differ from what the general public feels is acceptable, but it doesn’t necessarily mean that they’re not working how their builders want them to.” One example that comes to mind is Elon Musk’s X platform, which is home to a lot of bigoted views.

But Small considers combatting hate speech online “an incredibly important topic because a lot of these online spaces are how young people and people of all ages primarily engage with the world. That is their window into learning about other cultures, other people, world news. It’s not necessarily from fact-based reporters like myself. Sometimes it’s from individuals who have no interest in truth telling and reporting on the real world.”

So addressing hate speech online, Small adds, is “important not just because it forms how we engage with each other, but because it’s become the primary place where people learn about what’s happening around them.”

Tech issues journalist Takara Small says combatting digital hate speech is ‘incredibly important’ because many young people ‘primarily engage with the world’ in online spaces. Photo submitted.

Small notes that legal regulation of hate speech can be an important tool alongside any technological fixes. “If you look at racism and how hate speech is treated in Canada versus in other places, for example, the European Union, there’s a real divergence,” she says.

“The European Union has stricter rules when it comes to that. There are actually quite hefty fines in some instances, and then the consequences can grow from there. I think it goes to show that governments can have a huge role to play when it comes to the type of speech we see online, the type of racism, and that it exists everywhere.”

With Small’s perspectives in mind, we conversed by Zoom with Hebert in San Francisco about the power and limits of AI in detecting hate speech, how atoms and molecules helped him figure out his approach, the rough experiences as a kid that still motivate him, and more. This conversation has been edited for length and clarity.

Can you tell us a bit about how you became interested in doing something about online hatred?

I grew up when the internet was popping off and becoming big. I got cyber-bullied a ton when I was a kid. Middle school and high school was not a great time for me, and it made me feel like I wanted to do something about it.

That’s what led me to do graduate studies and make tools that can empower other people, so that they don’t find themselves in the same place that I did.

It was really personal because I know how much it can mentally affect you, being targeted by hate speech.

Also, when you see things like the Jan. 6 attack [by a mob on the U.S. Capitol in 2021], a lot of that could have been triggered by the perpetuation of hateful speech and hateful stereotypes and things like that.

What are some challenges you faced when you started trying to make those tools?

My research was really around this idea of context. For example, if you have the word ‘fire,’ what that word means can change dramatically depending on the context in which it’s said. If you say ‘fire’ and you’re a mechanic, that can mean firing up the engine and turning on the car. Or if you’re a soldier and you say ‘fire,’ it could mean shooting your gun. Or if you’re in a really crowded movie theatre and you hear the word ‘fire,’ that’s an emergency and everyone needs to evacuate and it could be life or death.

And so, my research asked how we can teach AI models to not just understand individual pieces of data, like the word ‘fire,’ but really understand the context in which it was said. That led to the realization that what counts as hate speech really depends on the context of the conversation.

If, for example, you have the words, “ew, that’s gross,” that could be in response to someone offering you pineapple on pizza. And that’s a normal thing to say, if you happen to think pineapple on pizza is gross, right? But if it’s, “ew, that’s gross” in response to, you know, LGBTQ rights, that becomes hate speech.

It kind of goes both ways. A good example is RuPaul’s Drag Race. That’s a famous example of what existing hate speech systems before I did my work would constantly get wrong. They would automatically assume a saying is hate speech when words were being used [in a way that was reclaiming language as a marginalized community]. My work tried to basically solve that. So can we understand when people are actually celebrating LGBTQ culture? We’re not trying to say mean things when we use slurs as reclaimed words that were used as slurs against our own community. That’s where my approach came in.

It’s important to be proactive, but it’s also important to be transparent. I would never want an AI system that does blanket removal — we’re going to impose our own beliefs and remove everything. It’s important to do it very carefully and be really transparent about the guidelines that you’re imposing, how your data was collected to train the system.

Interesting! Are you optimistic about AI in general, and do you think AI can help diminish online discrimination?

I love AI systems. That’s why I’m doing this. But I think it’s important to not apply AI systems blindly. It’s important to look at the predictions that you make, and the data that you’re using to train them. Both are really important things that a lot of people overlook.

Technology journalist Takara Small warned us that AI can be programmed to reflect biases of those who create it.

When you train a machine learning model, it’s important to be really careful with the data that you pick. A big problem with hate speech is making sure that you have representative data. If you are drawing from online communities, they can be very different. If I were to just take data from really popular communities that are always super left-leaning and are mostly older people, for example, then some communities might fall between the cracks.

In terms of AI for racism and hate speech, you don’t want to make a system that is a positive feedback loop. Let’s say you have an online social media platform where human labellers are labelling all the data, saying, “this is hate speech,” or, “this is not hate speech.” They have their own biases, like saying, “Oh, if I hear a certain word, I’m always going to say it’s hate speech.” If you train an AI system on just that data, you’re just going to perpetuate those biases. AI systems, especially when it comes to racism and online discourse, serve as an amplifier for your own biases that are in the data.

What about the role of human moderators to keep hate speech in check on online platforms?

It’s important to have moderation, for sure. And communities have to be involved. A lot of my research focused on Reddit, [which] comprises many different communities. It’s important that the communities themselves are stakeholders in whatever moderation you do. I think a good example of a poorly moderated platform would be just some company saying, “We’re going to impose this moderation thing and everyone’s going to use it,” and that’s it. It’s important to do it in a really grassroots fashion.

Can you help us understand what’s new about how your AI system works?

Our key breakthrough was realizing that conversations kind of look like molecules, which is a crazy connection to make. The idea is that there’s a lot of AI models that work in chemistry. Chemistry is all molecules. What we kind of saw was that we can treat a conversation like a molecule. We can say that all the atoms are comments, and the bonds are like the replies. If two comments reply to each other, it’s like two atoms with a bond connecting them.

We took a lot of the systems that worked well in chemistry, and then we adapted them to work on conversations. Then we had to do some work in terms of saying, how do we select what part of the conversation to keep, what part do we remove? A lot of it is just noise — it’s unrelated. Once we figured out that piece, everything fell together.
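
To make the molecule analogy concrete, here is a minimal Python sketch of a conversation stored as a graph, with comments as nodes and replies as edges. It is an illustration only, not Hebert’s model: the toy thread, the function names `build_conversation_graph` and `neighbours_within`, and the hop limit are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical toy thread: (comment_id, parent_id, text).
# A parent_id of None marks the post that started the conversation.
comments = [
    ("c1", None, "Pineapple on pizza, anyone?"),
    ("c2", "c1", "ew, that's gross"),
    ("c3", "c1", "I love it"),
    ("c4", "c2", "Fair, it's not for everyone"),
]


def build_conversation_graph(comments):
    """Treat comments as nodes ("atoms") and replies as undirected edges ("bonds")."""
    nodes = {comment_id: text for comment_id, _, text in comments}
    adjacency = defaultdict(set)
    for comment_id, parent_id, _ in comments:
        if parent_id is not None:
            adjacency[comment_id].add(parent_id)
            adjacency[parent_id].add(comment_id)
    return nodes, dict(adjacency)


def neighbours_within(comment_id, adjacency, max_hops=1):
    """Collect the comments reachable within max_hops replies (breadth-first)."""
    seen = {comment_id}
    frontier = [comment_id]
    for _ in range(max_hops):
        next_frontier = []
        for node in frontier:
            for neighbour in adjacency.get(node, ()):
                if neighbour not in seen:
                    seen.add(neighbour)
                    next_frontier.append(neighbour)
        frontier = next_frontier
    return seen - {comment_id}


nodes, adjacency = build_conversation_graph(comments)
# The context around "ew, that's gross" is the post it answers and the reply it drew.
print(sorted(neighbours_within("c2", adjacency)))  # prints ['c1', 'c4']
```

Limiting the neighbourhood to a few hops is one simple way to decide which parts of a thread to keep and which to treat as noise. The selection step Hebert describes may work differently.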

How have you tested your system? What are some concrete, measurable, positive results that you’ve seen?

The way that we collected our data was from Reddit. We had labelled and worked with 18,000 comments from 8,266 discussions across 850 different communities.

We had all the conversations, and we labelled them. If they contained slurs, we’d label whether this would be an example of a hateful slur or a reclaimed slur, meaning a slur that’s not actually hateful because it’s reclaimed by the community. We had things like person-directed abuse. Then we had affiliation-directed [meaning hateful comments towards group identities]. We had all these different classes of hate speech, and then we evaluated the model on basically a held-out set of data. We trained on about 80 per cent of the data, and then 20 per cent of it we used to test the capabilities. This is data that the model never saw.

We were able to evaluate on each of those categories of hate speech how the model performed, and then we had an overall metric as well. So overall, we were able to get about 88 per cent accuracy across those three classes, which was the state of the art. It beat many other systems.
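
The evaluation Hebert describes, holding out about 20 per cent of the labelled data and scoring each class as well as the whole, can be sketched in a few lines of Python. This is a generic illustration, not the team’s code: `train_test_split`, `accuracy_report`, the seed and the toy labels are assumptions made for this example.

```python
import random
from collections import Counter


def train_test_split(examples, test_fraction=0.2, seed=0):
    """Shuffle a copy of the data and hold out a fraction the model never sees."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def accuracy_report(true_labels, predicted_labels):
    """Return overall accuracy and a per-class accuracy dictionary."""
    correct = Counter()
    total = Counter()
    for truth, prediction in zip(true_labels, predicted_labels):
        total[truth] += 1
        if truth == prediction:
            correct[truth] += 1
    per_class = {label: correct[label] / total[label] for label in total}
    overall = sum(correct.values()) / len(true_labels)
    return overall, per_class


# Ten hypothetical example ids split 80/20.
train, test = train_test_split(list(range(10)))
print(len(train), len(test))  # prints 8 2

# Toy held-out labels and predictions (hypothetical, not results from the study).
truth = ["hateful", "hateful", "reclaimed", "reclaimed"]
predicted = ["hateful", "reclaimed", "reclaimed", "reclaimed"]
overall, per_class = accuracy_report(truth, predicted)
print(overall)    # prints 0.75
print(per_class)  # prints {'hateful': 0.5, 'reclaimed': 1.0}
```

Reporting per-class scores alongside the overall figure matters because a model can look accurate overall while still misreading a smaller class, such as reclaimed slurs.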

What about the fact that trolls try and figure out ways to defeat comment moderation systems by substituting in asterisks or spelling hate speech differently?

It’s really hard. This is a whole concept called dog whistles. AI systems can be good at disambiguating what letter an asterisk would have been. I think a really strong signal is just seeing how people react in response to what is being said. The easiest example is when someone says something really, really hateful, and then all the replies are like, “Why would you say that?” or “That’s so mean.” We can use that as context to say this is actually a really hateful comment. In the same way, we can see when the topic is something that’s really divisive and politically charged. Then we can kind of fill in the gaps using that information. It’s definitely a big problem. We’re fixing it.

When you haven’t been designing anti-hate speech AI systems, where have you put your energies?

As part of my PhD studies, I mentored nine undergrad students and also some high school students, all towards social good causes. I helped the high school students develop a sign language glove.

A lot of people don’t understand ASL [American Sign Language] so we looked for a solution. Already there are a lot of AI systems out there that rely on a camera. So you could have a camera pointing at you and you would do your gestures, and then you’d be able to translate it to language.

But if you’re out in public, you’re not going to have a camera right in front of you.

Our idea was, can we make this an embedded technology? Can we have gloves that someone could put on that come with a speaker, so that you could do the gestures and then an AI system, small enough that you could fit in your phone, could then translate those gestures into spoken words? That would let you actually interface with the public. We wrote a paper on that, and that got accepted at Neural Information Processing Systems — a really big AI conference.

Also, when I was mentoring in undergrad, I led the building of a satellite that detected debris in the ocean. It’s still up in space. It’s got about three years left before its orbit brings it back down and it burns up in the atmosphere. It’s too bad. We don’t like space pollution, but we’re trying to solve ocean pollution. I’ve always been inspired by mentorship and social good causes.

This story is part of a series by secondary school students, ‘Reporting on Better Futures.’

Deciding which words are toxic in different online worlds is complex. This engineer cracked the code. A Tyee Q&A.
