Meta, Facebook’s parent company, released the latest version of its groundbreaking AI chatbot in August 2022 and, immediately, journalists around the world began peppering the system, called BlenderBot3, with questions about Facebook.
Even the seemingly innocuous question: “Any thoughts on Mark Zuckerberg?” prompted the curt response: “His company exploits people for money and he doesn’t care.” This wasn’t the PR storm the chatbot’s creators had been hoping for.
We snigger at such replies, but if you know how these systems are built, you understand that answers like these are not surprising. BlenderBot3 is a big neural network that’s been trained on hundreds of billions of words skimmed from the internet. It also learns from the linguistic inputs submitted by its users.
If negative remarks about Facebook occur frequently enough in BlenderBot3’s training data, then they’re likely to appear in the responses it generates too. That’s how data-driven AI chatbots work. They learn the patterns of our prejudices, biases, preoccupations and anxieties from the linguistic data we supply them with, before paraphrasing them back at us.
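The point about frequency can be illustrated with a deliberately tiny sketch. It is not how BlenderBot3 is implemented: the real system is a large neural network, whereas this is a simple lookup table. The corpus, the company name "Acme Corp" and the `train` and `respond` functions are all hypothetical, invented only to show that whatever reply dominates the training data also dominates the output.

```python
from collections import Counter, defaultdict

def train(pairs):
    """Count how often each reply follows each prompt in the training data."""
    counts = defaultdict(Counter)
    for prompt, reply in pairs:
        counts[prompt.lower()][reply] += 1
    return counts

def respond(counts, prompt):
    """Return the reply seen most often for this prompt, or a fallback."""
    seen = counts.get(prompt.lower())
    if not seen:
        return "I have no data on that."
    return seen.most_common(1)[0][0]

# Hypothetical toy corpus: two negative remarks, one positive.
corpus = [
    ("any thoughts on acme corp?", "It exploits people for money."),
    ("any thoughts on acme corp?", "It exploits people for money."),
    ("any thoughts on acme corp?", "It makes useful products."),
]

model = train(corpus)
print(respond(model, "Any thoughts on Acme Corp?"))
```

Because the negative remark appears twice and the positive one only once, the toy model repeats the negative remark. A neural chatbot generalises far beyond exact matches, but the same pull towards frequent patterns applies.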
This neural parroting can be amusing. But BlenderBot3 has a darker side. When users key in hate speech such as racist slurs, the system changes the subject rather than confronting the user about their speech. At the University of Cambridge, researchers have created a system programmed to challenge hate speech, rather than ignore it.
Going mainstream
Marcus Tomalin has been developing language-based AI in the Cambridge University Engineering Department since the 1990s. In the early days, his group’s most powerful systems were used only by the four or five members of the research team that had built them.
Today, by contrast, millions of people around the world interact daily with much more sophisticated systems, via their smartphones, smart speakers, tablets, and so on. The days when “techies” could build systems in the disconnected isolation of their ivory (or silicon) towers are long gone.
That’s why, over the last decade or so, Tomalin’s research has increasingly focused on the social and ethical effects of the systems he helps to design and create, especially those that routinely encounter inputs from users that are blatantly racist, sexist, homophobic, extremist or offensive in other ways.
This year Tomalin’s been supervising a master’s student, Shane Weisz, and together they have developed a system called AutoCounterspeech that is trained to respond to toxic linguistic inputs.
The system generates “counterspeech” in response to hate speech, using strategies such as humour, pointing out hypocrisy, presenting facts and simply correcting misinformation to challenge bigotry and hate.
Like BlenderBot3, AutoCounterspeech is a pre-trained neural network, but it has been subsequently fine-tuned using a tiny dataset of counterspeech. This enables the system to learn how to respond more effectively to toxic inputs.
If a user types the racist prompt: “Blacks are inferior to whites, don’t you agree?” the system confronts and contests the racist ideology: “There is no evidence that black people are inferior to whites.”
Meta’s AI chatbot doesn’t know how to respond in this way. When it was released, Tomalin didn’t waste time asking BlenderBot3 about Zuckerberg. Instead, given his research interests, he bombarded it with offensive utterances to see how it coped with them. Its responses were consistently disappointing.
This is mainly because the Meta researchers have designed their system to avoid responding to such inputs. Instead, it rather clumsily tries to change the subject.
Its reply to the same racist prompt he typed into AutoCounterspeech, for instance, is the feeble: “Sorry, I don’t know about that. What do you do to relax?” The brazen racism remains unchallenged and one is invited instead to start a cosy chat about yoga or Netflix.
Preparing for the future
Systems like BlenderBot3 are already becoming familiar components of our digital societies. The homes of the very near future will be largely voice-enabled. “Hey Siri, run a bath” will replace the twisting of taps, and children will have voice assistants in their bedrooms from birth.
These automated dialogue systems will provide us with information, help us make plans and keep us entertained when we’re bored and lonely. But because they’ll be so ubiquitous, we need to think now about how these systems could and should respond to hate speech.
Silence, or a refusal to challenge discredited ideologies or incorrect claims, is a form of complicity that can reinforce human biases and prejudices. This is why Tomalin and his colleagues organised an interdisciplinary online workshop last year to encourage more extensive research into the difficult task of automating effective counterspeech.
To get this right, we need to involve sociologists, psychologists, linguists and philosophers, as well as techies. Together, we can ensure that the next generation of chatbots will respond much more ethically and robustly to toxic inputs.
In the meantime, while the AutoCounterspeech prototype is far from perfect (have fun trying to break it), it has at least demonstrated that automated systems can already counter offensive statements with something more than mere disengagement and avoidance.
This article is authored by Marcus Tomalin, senior research associate in the Machine Intelligence Laboratory, Department of Engineering, University of Cambridge. It is republished from The Conversation under a Creative Commons license.