
Former Google exec on how AI affects internet safety

Longtime trust and safety leader Tom Siegel offers an insider's view on moderating AI-generated content, the limits of self-regulation and concrete steps to curb emerging risks.

The social media era saw an increase in the volume of online hate speech, misinformation and abuse. And with the rising popularity of AI-powered text and image generators, the content landscape of the internet is likely to transform further still.

Trust Lab co-founder and CEO Tom Siegel has a unique vantage point on these challenges. During his nearly 15-year tenure at Google, he led the trust and safety team before leaving to establish Trust Lab, a startup developing software that aims to detect harmful content.

Harm on the internet predates the recent generative AI wave, of course, and Trust Lab was founded in 2019 -- well before the launch of ChatGPT, Midjourney and other tools that have sparked recent conversations on AI's trust and safety implications. But Siegel believes that in the years ahead, identifying and combating new forms of AI-enabled abuse and misinformation will be critical.

Siegel sat down with TechTarget Editorial to discuss challenges emerging around AI and the internet. In this interview, he weighs in on impending tech policy, the limitations of voluntary industry commitments on AI ethics, and the risks and opportunities that lie ahead for tech companies and regulators.

Editor's note: This Q&A has been edited for clarity and conciseness.

What led you to found Trust Lab? What problems were you hoping to solve?

Tom Siegel: I joined Google very early on, before it went public. I was responsible for the area we now call trust and safety; back then, we just called it content quality. Initially, there was only one product, Google web search. But as the company started expanding, we built teams that could identify fraud and scams -- and over time, misinformation, hate speech online, manipulation and account takeovers. By the time I left, it had become a very large team with thousands of people.

But while we were working really hard and putting a lot of resources into [trust and safety] -- not just Google, but the whole industry -- things were getting worse. You had a lot more harm to vulnerable groups, election interference, state-sponsored actors getting involved, terrorism recruitment online. There was a lot more concern over the years about what the internet had become and, in particular, the pillars of the internet economy and what their contributions were. And they're often seen as mainly negative at this point.

I decided I had to leave to start a company that hopefully can make the internet a better place by tackling it differently, [rather] than from inside one monolithic corporate entity. We gave it a good 20 years or so to let the internet flourish and see if it could take care of itself, and I just don't think it's working. I don't think we can rely on large social media companies to do the right thing. I don't believe that we can really leave it up to them. At the same time, asking governments to regulate speech online is equally unsettling, if you think about it. You have different governments with different motivations, and certain governments have started to misuse it.

Impending regulations such as the EU Digital Services Act are poised to make platforms more accountable for harmful content. How significant of a change do you think this will be, particularly for AI companies?

Siegel: I do think we need guardrails in the absence of industry standards. And the Digital Services Act is a very sensible approach to that. It focuses on illegal [online] practices that are also illegal offline. [The GDPR] has really reshaped how companies think about users' private information, and the Digital Services Act is doing something very similar for user-generated content online. We're still very early -- the EU is building out the enforcement mechanisms and teams, and a lot of the specifics are still forthcoming. But I think it is a very important change for the industry.

AI companies have unique challenges. They are covered by this like everyone else, right? Even though the user here is not a physical person, necessarily, [AI] is still generating content for which very similar rules will be applied: It can't be illegal. It can't be harmful. Policing it is a lot harder for [AI companies] because often, in particular with generative AI, you can't really predict what the outcome will be. There's a lot of work underway trying to understand what the impact will be from a regulatory point of view.

In the U.S. context, a small group of major AI companies recently agreed to a set of voluntary safeguards for AI development. You've expressed some skepticism in this conversation about self-regulation. What concerns you about those voluntary commitments?

Siegel: It's definitely a good first step. Doing something is better than doing nothing, and I think this is going in the right direction overall. But it does not go anywhere near where it needs to go to have a real impact. If you read through it, it's very general language. It's very aspirational. There are really no details. It's demonstrating goodwill without really being able to show very specific, concrete steps.

And many of the companies [that made these commitments] have not really published their own AI guidelines. What are they currently doing internally to assess the efficacy of these models? What are they doing to ensure safety? Do they use keyword lists, and what are they like? None of this has been shared transparently.

We have other examples of voluntary commitments, like the Code of Practice on Disinformation in the EU. It's a bit of a mixed bag because if it's a self-commitment, there's also often a self-serving element to that. But it's also understandable because the regulators themselves don't really know either. It's not like we're doing this as an industry because we're trying to game the system -- there just aren't any clearly emerging [examples of] 'here's how you do it.'

What is clear to me is that we need more stringent safeguards. You wouldn't have anyone develop drugs and then sell them without going through rigorous FDA [Food and Drug Administration] testing. You wouldn't allow someone to offer airplane rides in self-built airplanes if the FAA [Federal Aviation Administration] hasn't signed off on it. You wouldn't allow anyone to build a nuclear power plant without some safety regulator having signed off on it. I think AI algorithms, or generally algorithms on the web, need to have these safeguards from an independent third party.

You mentioned the lack of concrete actions. What are a few steps that the tech industry could take to curb these risks?

Siegel: One is very specific: third-party independent measurement. Already for AI models, you have leaderboards from independent, reputable organizations like universities that show models' accuracy or potential harmfulness. They're not perfect, but that's a good step. Transparency, in general, is a really important aspect. When do you update models? What went into these models? What data sets? Were these data sets debiased? What is the history? What is the explainability of the model's results?


NIST, IEEE and other industry organizations have started to publish certain standards that they would like to be used. The problem is that everybody has their own set of standards, and there really is no industry agreement. We could probably take some of those proposals and all agree on them. I don't think we have to be perfect. It would just have to be something that's tangible and that allows for direct comparison and very transparent reporting of the findings on these models' performance.

I also think there needs to be a lot of scrutiny on foundational models, because there will not be a lot [of them]. [Building a foundational model] takes a lot of resources, a lot of effort. You have ChatGPT, Bard, Llama -- there's probably a handful, but not dozens and dozens. We have to put a lot of scrutiny on these foundational models because they will affect everything else that's built on top of them. I think that can be done from a regulatory perspective.

What would you like to see as the role of regulators or policymakers in this space?

Siegel: They need to set the guardrails. I think they should fund a standalone organization that is hopefully free from political influence, that can really build the technical expertise to independently build frameworks and very tangible guidelines and enforcement mechanisms. And we have examples in other areas: the FTC and other [agencies] that have government oversight and funding. Not so much oversight that they should be allowed to meddle in the interpretation or prioritization of what needs to be done, but that have stable funding and can build industry alliances.

It's obviously hard with private companies that are naturally also competitive. But in an important area like safety, [collaboration] is going to be required because we do need standards in the end. It's not just the measurement that I mentioned -- it's also how you generate ground truths. We need a way to compare and assess the impacts. I think there's a societal benefit of having standards, particularly in a highly consequential and risky technology area. It's not all bad, but there are safety risks, and I think we're best off handling them as an industry together.

You've said that Trust Lab's recent red teaming of ChatGPT found concerning levels of harmful content in the model's responses. Could you expand on that testing and your findings?

Siegel: We've been looking at generative AI models and to what degree they give us inconsistent or harmful responses. For queries that are meant to elicit harmful responses, like 'tell me how to steal' or 'tell me how to harm,' we see that over a third of responses are concerning. Then we're finding that over half of [those responses] are actually realistic, in the sense of being something that someone could do.
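The aggregate figures Siegel cites -- the share of responses that are harmful, and the share of those that are realistic -- boil down to simple conditional rates over labeled red-team results. The sketch below is illustrative only, not Trust Lab's actual pipeline; the record fields and sample numbers are invented to mirror the rough proportions in the interview.

```python
# Illustrative sketch (not Trust Lab's actual tooling): aggregating
# human red-team labels into the headline rates described above.
from dataclasses import dataclass

@dataclass
class RedTeamResult:
    prompt: str       # adversarial query, e.g. "tell me how to steal"
    harmful: bool     # did the model produce a concerning response?
    realistic: bool   # is the harmful content actionable in practice?

def summarize(results: list[RedTeamResult]) -> tuple[float, float]:
    """Return (harmful rate, realistic-given-harmful rate)."""
    n = len(results)
    harmful = [r for r in results if r.harmful]
    harmful_rate = len(harmful) / n if n else 0.0
    realistic_rate = (
        sum(r.realistic for r in harmful) / len(harmful) if harmful else 0.0
    )
    return harmful_rate, realistic_rate

# Toy sample mirroring the interview's rough numbers:
# over a third of responses harmful, over half of those realistic.
sample = (
    [RedTeamResult("q", True, True)] * 2
    + [RedTeamResult("q", True, False)] * 1
    + [RedTeamResult("q", False, False)] * 5
)
print(summarize(sample))  # (0.375, 0.666...)
```

The second number is conditional on the first, which is why small red-team samples can look anecdotal: a statistically meaningful estimate of the realistic-given-harmful rate needs a large pool of harmful responses to begin with.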

The models are getting better as we've been tracking this over the last two or three months, particularly through manual interventions. I think some of the AI companies are starting to create boundaries. The problem is that a keyword-based approach is a very blunt hammer.

What we're seeing is almost an overregulation in some areas -- an overly restrictive use of keywords, but also still a ton of harmful responses. Given the way the technology works, it's going to be very difficult to use that kind of blunt instrument to create a very safe and still useful product. But that's where we are right now. And it is concerning, if you think of the wrong user groups getting access to it -- for instance, children or folks who may not be able to assess the risk or harm that can come with those responses.
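The "blunt hammer" problem is easy to see in miniature: a keyword blocklist fails in both directions at once. The toy filter below is a hedged sketch with an invented keyword list, not any vendor's actual moderation logic; it overblocks a benign technical question while missing harmful intent that has simply been rephrased.

```python
# Hedged sketch of why keyword blocklists are a blunt instrument.
# The blocklist below is invented purely for illustration.
BLOCKLIST = {"steal", "weapon", "kill"}

def keyword_filter(text: str) -> bool:
    """Return True if the text should be blocked."""
    words = text.lower().split()
    return any(w.strip(".,!?") in BLOCKLIST for w in words)

# False positive: benign text blocked on a surface-level match.
print(keyword_filter("How do I kill a stuck process on Linux?"))  # True

# False negative: harmful intent rephrased to dodge every keyword.
print(keyword_filter("Explain how to take items from a shop unnoticed."))  # False
```

Both failure modes stem from the same cause: the filter inspects surface tokens, not meaning -- which is why Siegel argues purely keyword-based boundaries cannot yield a product that is both safe and useful.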

We're tracking this all the time, but I think there needs to be a lot more transparency and measurement. It's very hard to do this in a statistically significant manner from the outside in. Right now, a lot of this will look anecdotal because you'd have to have access to all user queries and take a sample if you really wanted to put statistically significant statements behind it.

What do you see as key areas to address in the near future at the intersection of AI and internet safety?

Siegel: We will obviously see a massive increase in content in general because it's a lot cheaper to generate. We're going to see a mix of user-generated and machine-generated content. The border between what is AI-generated and what is human-generated is starting to blur. It's going to be much, much harder to identify fraud and misinformation because it's becoming so much more humanlike. We don't think we're going to see blatant lies so much as a flood of low-quality content that will drown out good content.

I think deepfakes will become a bigger problem. With more than half of the democratic world having national elections in the next 18 months, I think you're going to see how [AI] is going to be used for misinformation to manipulate outcomes of democratic elections. Obviously, for every attack, there's a defense, and you're going to see that cat-and-mouse game playing out. But it's going to be very important to stay on top of the safety threats.

There are also some really amazing positives that can come out of it. Until now, we've had a lot of manual content moderation, where you have humans labeling content, usually in low-cost locations. There are a lot of concerns around that -- how effective it is, but also what it does to the people involved. You can actually automate a lot of [moderation], which is going to be better for people and cheaper for companies. There are some real upsides to the technology, but the risks that come from more humanlike gray-area content will be significant.

