The implications of generative AI for trust and safety

Leaving generative AI unchecked risks flooding platforms with disinformation, fraud and toxic content. But proactive steps by companies and policymakers could stem the tide.

Freely available AI tools such as OpenAI's ChatGPT and Dall-E can quickly generate high-quality, bespoke text and images. But this same capability enables the proliferation of harmful AI-created content.

Although generative AI has potential for positive applications, its rapid adoption requires efforts to mitigate risks and prevent abuse. As AI use becomes more widespread, technologists, companies and policymakers must implement safeguards and change incentives around safety and ethics in AI development.

Since the beginning of the generative AI boom in fall 2022, "there's already been a pretty significant impact on the internet writ large," said Jesse Lehrich, co-founder of Accountable Tech, a nonprofit tech watchdog group. "Generative AI has become so widely embraced, and companies have been so eager to deploy tools that leverage it and seize on the moment in the AI hype cycle. ... I think the internet has very quickly been flooded by a significant amount of content that is AI-generated."

Proliferation of high-quality harmful content at scale

In addition to a range of positive and productive applications, generative AI is already being used to create harmful content, ranging from disinformation to phishing scams to abuse and hate speech.

"Generative AI has good and bad components to it," said Matar Haller, vice president of data and AI at ActiveFence, a content detection and moderation company. "And if we think about the bad components -- basically, it's lowered the bar of entry for malicious actors."

A key appeal of AI text and image generators is their ability to quickly produce high-quality output that can rival human-created work. But this capability also makes it easier to produce more compelling and convincing harmful content. While disinformation, fraud and abuse certainly aren't new risks online, generative AI changes the scale.

"I think there's a pattern that has emerged on the internet, which is, as capabilities are increasingly democratized, volume increases," said Reggie Townsend, vice president of data ethics at analytics company SAS and a member of the U.S. National Artificial Intelligence Advisory Committee (NAIAC). "And so you see a higher volume of good stuff -- people trying to do the right thing -- and you see a higher volume of disinformation, misinformation -- people trying to do malicious things."

Historically, creating harmful content has typically required sacrificing either quality or scale. In phishing attempts, for example, attackers could produce either massive amounts of low-quality messages or smaller, targeted spear phishing campaigns. While the latter tends to result in more believable messages that trick more victims, it also requires a significant investment in time and effort.

"Now, suddenly, you could do both," Haller said. "You could output extremely high-quality [content], but also at scale."

Generative AI lowers barriers to creating harmful content

To inform its content detection models, ActiveFence monitors threat actor communities on dark web forums and instant messaging channels. In 2023, the company released a report based on this monitoring that found an increase in AI-generated harmful content since early 2023.

One area ActiveFence has seen grow significantly is generative AI-created child sexual abuse material (CSAM), Haller said. In monitored communities, ActiveFence has noticed an uptick in conversations that involve sharing tips on tuning models and crafting prompts to generate CSAM. "In the dark web rings, they're discussing it, and they're excited about it," she said. "They're sharing it ... not only the CSAM itself, but also how to produce it themselves."

This highlights one of the risks associated with open source AI models, Haller noted. Open source generative models such as Stable Diffusion can be fine-tuned for a range of specialized tasks, often relatively cheaply.

And while this is an advantage in many business and research contexts, open source models can also be customized for malicious purposes, such as creating CSAM. "It makes it almost easier to consume that kind of content when you can just do it yourself [with generative AI]," she said.

Likewise, Lehrich has concerns about generative AI facilitating the creation of believable political propaganda. He contrasted the output of today's text and image generators with Russian disinformation in the 2016 U.S. presidential election.

Although sometimes compelling, that content often had tells revealing it wasn't written by a native English speaker or someone familiar with U.S. culture and politics. But tools such as ChatGPT can produce well-written English-language text, potentially enabling more compelling disinformation at scale.

"It lowers the bar to entry for running that kind of propaganda scheme or disinformation scheme, whether it's for political gain or profit," Lehrich said.

What generative AI's potential harms mean for businesses

These risks might seem far removed from many business AI use cases. But they should serve as a warning of the potential trust and safety implications of deploying generative AI, especially in public-facing tools.

"The impact and nature of harm is dependent on context," Lehrich said. "For some things, it may be just a funny mistake or not a huge deal if an AI system gets a fact wrong. In other contexts, it could potentially be fatal."

Haller mentioned an incident in 2023 in which the National Eating Disorders Association (NEDA) announced it would replace human helpline employees with a chatbot shortly after workers voted to unionize. Yet NEDA shut down the chatbot shortly thereafter, following reports that its "advice" in fact promoted disordered eating. User Sharon Maxwell, for example, told NPR that the chatbot offered her weight loss tips such as calorie restriction.

Even when such outcomes are unintentional, the effects are real, underscoring the importance of not rushing into unsafe, poorly considered AI deployments. "Irrespective of the intent, if your impact is one that is disempowering, if your impact is one that is biasing and discriminatory ... we also have a responsibility to correct that wrong," Townsend said.

Over the next year or two, expect some disillusionment around the capabilities of tools such as ChatGPT, especially in business settings, Townsend said. He noted the increasing interest in specialized enterprise generative AI models over their general consumer counterparts. Often, this interest stems from organizations' desire for a secure, customized model fine-tuned on their internal data, but it could also mean less bias and increased safety.

In large data sets derived from web scraping -- such as the data used to train ChatGPT -- the most prevalent data isn't always the most accurate and is often more discriminatory, Townsend said. Greater control over training data and practices, in addition to having business benefits, could thus also lead to safer and more trustworthy models.

Incorporating safety by design into AI deployments

To address potential harms, companies must consider safety early on -- including planning how to handle harmful content well before putting anything into production.

"We just need much better, more robust content moderation and safeguards for [AI] outputs," Haller said.

This requires creating accessible tools and guidelines that enable AI developers to build safer products. Tools for moderating prompts and conversations can help ensure enterprise chatbots stay on topic and nontoxic, for example. Other safeguards could include blocking or throttling access for users who repeatedly try to circumvent safety measures and guardrails.
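The throttling safeguard mentioned above can be sketched as a sliding-window strike counter. This is a minimal illustration under stated assumptions: the class name `GuardrailThrottle`, the three-strike limit, the 10-minute window and the one-hour cooldown are hypothetical choices, and deciding what counts as a "violation" is left to whatever prompt moderation a deployment already uses.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class GuardrailThrottle:
    """Hypothetical sketch: block a user for a cooldown period after
    repeated guardrail violations within a sliding time window."""

    def __init__(self, max_strikes: int = 3, window_s: float = 600.0,
                 cooldown_s: float = 3600.0) -> None:
        self.max_strikes = max_strikes
        self.window_s = window_s
        self.cooldown_s = cooldown_s
        self._strikes: Dict[str, Deque[float]] = defaultdict(deque)  # violation times per user
        self._blocked_until: Dict[str, float] = {}                  # unblock time per user

    def is_blocked(self, user_id: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        return user_id in self._blocked_until and self._blocked_until[user_id] > now

    def record_violation(self, user_id: str, now: Optional[float] = None) -> bool:
        """Record one violation and return True if the user is now blocked."""
        now = time.monotonic() if now is None else now
        strikes = self._strikes[user_id]
        strikes.append(now)
        # Forget violations that fall outside the sliding window.
        while now - strikes[0] > self.window_s:
            strikes.popleft()
        if len(strikes) >= self.max_strikes:
            self._blocked_until[user_id] = now + self.cooldown_s
            strikes.clear()
        return self.is_blocked(user_id, now)


throttle = GuardrailThrottle()
for t in (0, 10, 20):  # three violations within ten minutes
    throttle.record_violation("user-42", now=t)
print(throttle.is_blocked("user-42", now=30))    # True
print(throttle.is_blocked("user-42", now=3700))  # False: cooldown has expired
```

Counting strikes in a window, rather than banning on a single hit, leaves room for users who trip a filter once by accident while still slowing down persistent jailbreak attempts.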

Using AI to make AI safer

Ongoing human oversight remains necessary to update models in response to evolving attempts to "jailbreak" them through creative prompts, as well as to weigh in on tricky edge cases. But automation and AI could help companies scale moderation in response to a growing volume of harmful content, protecting human moderators in the process.

"There's a lot of really horrific content out there," Haller said. "There's no reason for me to expose human moderators to it if I can remove it off the bat."
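One way to spare moderators the worst material while still reserving tricky edge cases for human judgment is to triage content by a classifier's confidence. The sketch below assumes some upstream detection model returns a harm score between 0 and 1; the `triage` function and its 0.95 and 0.5 thresholds are hypothetical and would need tuning on real data.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    action: str  # "remove", "human_review" or "allow"
    score: float


def triage(score: float, remove_at: float = 0.95, review_at: float = 0.5) -> Decision:
    """Route content by an assumed harm score in [0, 1]."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be in [0, 1]")
    if score >= remove_at:
        # High confidence: remove without exposing a human moderator.
        return Decision("remove", score)
    if score >= review_at:
        # Uncertain: an edge case for a human moderator to decide.
        return Decision("human_review", score)
    return Decision("allow", score)


for s in (0.99, 0.7, 0.1):
    print(s, triage(s).action)  # remove, human_review, allow
```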

For example, existing harassment detection models could be incorporated into chatbot applications to detect abusive language in prompts and output. "If we can build tools ... to provide content moderation and incorporate safety by design -- sort of use AI to make AI safer -- then I think that we'll be in a much better position," Haller said.
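A minimal sketch of checking both the prompt and the model's output follows. The keyword-based `looks_abusive` function is only a hypothetical stand-in for a trained harassment detection model, and `generate` represents whatever function calls the underlying chatbot; neither reflects a specific vendor's API.

```python
from typing import Callable

# Hypothetical stand-in for a trained harassment classifier; a real system
# would call an actual detection model instead of matching a word list.
ABUSIVE_TERMS = {"idiot", "worthless", "loser"}
REFUSAL = "Sorry, I can't help with that."


def looks_abusive(text: str) -> bool:
    words = {w.strip(".,!?").lower() for w in text.split()}
    return not words.isdisjoint(ABUSIVE_TERMS)


def moderated_chat(prompt: str, generate: Callable[[str], str],
                   is_abusive: Callable[[str], bool] = looks_abusive) -> str:
    # Screen the user's prompt before it reaches the model.
    if is_abusive(prompt):
        return REFUSAL
    reply = generate(prompt)
    # Screen the output too: a clean prompt can still yield an abusive reply.
    if is_abusive(reply):
        return REFUSAL
    return reply


def echo_bot(prompt: str) -> str:
    return f"You said: {prompt}"


print(moderated_chat("Hello there", echo_bot))       # You said: Hello there
print(moderated_chat("You are an idiot", echo_bot))  # Sorry, I can't help with that.
```

Passing the classifier in as a parameter keeps the wrapper independent of any particular detection model, so a stronger one can be swapped in without touching the chat flow.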

Generative AI models themselves can create synthetic data that can be used to refine content detection and moderation systems. Artificially generated data that closely mimics a phishing attempt or harassing social media post could be fed into detection models to improve their ability to identify such content in the real world.
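As a rough illustration, the sketch below produces labeled synthetic phishing-style messages and mixes them into a training set. Simple templates stand in for a generative model here, and the template text, slot values and `synthesize_phishing` helper are all hypothetical; a real pipeline would likely prompt a model for more varied examples and review them before training.

```python
import random
from typing import List, Tuple

# Hypothetical templates and slot values standing in for generative model output.
TEMPLATES = [
    "Your {service} account has been suspended. Verify your details at {link} within {hours} hours.",
    "Unusual sign-in detected on your {service} account. Confirm your password here: {link}",
]
SLOTS = {
    "service": ["bank", "email", "payroll"],
    "link": ["https://example.invalid/verify", "https://example.invalid/login"],
    "hours": ["12", "24", "48"],
}


def synthesize_phishing(n: int, seed: int = 0) -> List[Tuple[str, int]]:
    """Return n synthetic (text, label) pairs, where label 1 means phishing."""
    rng = random.Random(seed)  # seeded so the generated set is reproducible
    examples = []
    for _ in range(n):
        template = rng.choice(TEMPLATES)
        # str.format ignores slot values a template does not use.
        text = template.format(**{k: rng.choice(v) for k, v in SLOTS.items()})
        examples.append((text, 1))
    return examples


real_data = [("Lunch at noon tomorrow?", 0), ("Your invoice is attached.", 0)]
training_set = real_data + synthesize_phishing(5)
print(len(training_set))  # 7
```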

"It's literally the same tool, but it's being used for a purpose that's beneficial as opposed to harmful," Townsend said.

The future of generative AI content regulation

Existing and upcoming regulations will shape the future of generative AI development and deployment. But to be effective, they'll need to include changes to the current incentive structure.

If consequences for unethical or unlawful behavior amount to "a slap on the wrist," as Lehrich put it, AI companies won't be motivated to deploy AI responsibly. "Any for-profit business is going to continue to maximize value based on the rules that exist," he said.

In July 2023, the White House announced that seven leading AI companies had agreed to take certain steps to develop AI responsibly. But some, including Lehrich, are skeptical of voluntary commitments that lack the force of law, pointing to the failure of social media platforms to self-regulate. "If we actually care about curtailing these kinds of harms, then we need actual accountability measures with clear enforcement mechanisms," he said.

Lehrich suggested starting by fully enforcing existing laws, as many harmful uses of generative AI might already be illegal under current statutes such as federal anti-discrimination laws. Although the specifics of how those laws apply to AI have yet to be worked out, developing that case law could be faster than writing and passing new legislation for a still-nascent technology, he said.

"I think the main way that you change the incentive structure in the immediate term, as we try to enact new laws and regulation, is to bring strong enforcement that causes [companies] to at least rethink what they're doing and what the risks are for violating the existing law," Lehrich said.

Looking ahead, new laws and policies could standardize expectations around risk assessments and monitoring before and after AI deployment. On the model development side, companies might be required to maintain records of what data AI systems were trained on, their models' intended use and foreseeable risks, and the mitigations for those risks. After deployment, this could be paired with ongoing risk monitoring and unambiguous prohibitions on certain harmful uses of generative AI.
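To make the record-keeping idea concrete, a pre-deployment record could be modeled as a small data structure. The `ModelRecord` fields below are hypothetical and not drawn from any existing or proposed regulation; they simply mirror the items listed above: training data, intended use, foreseeable risks with their mitigations, and prohibited uses.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ModelRecord:
    """Hypothetical documentation record kept before and after deployment."""
    model_name: str
    training_data_sources: List[str]
    intended_use: str
    # Each foreseeable risk maps to its mitigation; None marks an open risk.
    risks: Dict[str, Optional[str]] = field(default_factory=dict)
    prohibited_uses: List[str] = field(default_factory=list)

    def unmitigated_risks(self) -> List[str]:
        """List risks that do not yet have a recorded mitigation."""
        return [risk for risk, mitigation in self.risks.items() if mitigation is None]


record = ModelRecord(
    model_name="support-bot-v1",
    training_data_sources=["internal help-center articles"],
    intended_use="answering customer billing questions",
    risks={"toxic output": "output moderation filter", "factual errors": None},
    prohibited_uses=["medical or dietary advice"],
)
print(record.unmitigated_risks())  # ['factual errors']
```

A check like `unmitigated_risks()` could serve as a simple gate in a release process, flagging open risks before anything goes into production.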

Education and clarity are key to accurately assessing AI risk

Realistic education about AI's harms is also essential for the public, business leaders and policymakers. Better AI literacy enables informed decisions about its use, and transparency around when users are engaging with AI-generated content could also help address disinformation concerns.

"We think there's an interest -- or we think there should be an interest -- in finding ways of building a broad base of national understanding and awareness around what AI actually is," Townsend said in reference to his work with NAIAC, which recently released its initial recommendations on AI. "Because if it's as ubiquitous a technology as we suspect it will be going forward, then it's important that all of us have some general understanding [of AI], much like we all have general understanding about electricity."

But, Townsend cautioned, that education should avoid excessive fearmongering and an overemphasis on hypothetical concerns. "The conversation about responsible AI has to include responsible rhetoric about AI," he said.

Recent conversations focusing heavily on existential AI risk could cause more harm than good, especially when those conversations happen to the exclusion of discussions of present harms such as algorithmic bias and disinformation. While there are real risks associated with AI, it's crucial to accurately educate people on those risks, including what they have within their control to do about them.

"As a technologist, obviously, I want to see people use these capabilities," Townsend said. "As a citizen, I want to see people use these capabilities in ways that are beneficial to themselves, their communities and the people around them."
