Podcast: Sama responds to AI data labeling criticism

The data annotation and labeling vendor, which has faced criticism for its practices in Africa despite its social good mission, is out with a new platform to reduce model failure.

Data labeling and annotation vendor Sama seeks to make an impact not only in the tech market, but also in parts of the world where it's hard for people to partake in the digital economy.

As a women-led B Corporation chartered to do social and environmental good, Sama employs numerous people in countries such as Kenya, CEO Wendy Gonzalez said on the latest episode of the Targeting AI podcast from TechTarget Editorial. She said the company has created more than 10,000 jobs in those regions.

Yet Sama has faced intense criticism for paying substandard wages to workers in Africa and subjecting them to inhumane work environments by requiring them to view and then label offensive and violent images.

On the podcast, Gonzalez blamed some of the practices on Sama's former client, generative AI giant OpenAI. She also argued that her company created decently paying jobs for people who otherwise would have trouble gaining employment.

"It went beyond the boundaries of work that we were comfortable doing," Gonzalez said. "It was only in existence for a handful of months."

Meanwhile, Sama's business mission is to help enterprises minimize the risk of AI model failure using its data annotation services.

New multi-cloud integration

Most recently, on Jan. 24, the vendor introduced a multi-cloud integration strategy in its platform to increase the speed of new project onboarding.

The integration allows enterprises to keep their data on one of the three top cloud providers -- AWS, Microsoft and Google -- while still giving Sama access to the data. It also enables faster onboarding to the Sama platform and includes an integration suite compatible with Python SDKs and the Databricks platform.

The integration reduces the cost of data egress because it eliminates the need for organizations to move data around in a multi-cloud model deployment, Gartner analyst Sid Nag said.

"It speeds up application development via integration with other SDKs and programming language models while conforming to compliance and security models," he added.

However, it's unclear how the Sama product gets access to the data contained in an organization's primary cloud provider, Nag continued.

Ethics of data annotation and labeling

While Sama has found success in the data annotation niche, it has navigated a turbulent history in Africa.

Sama came under fire while performing contracted work for OpenAI in November 2021. On behalf of OpenAI, Sama hired data labelers in Kenya for a take-home pay of about $2 per hour. The labelers were charged with trying to remove toxic data from the training data sets of tools such as ChatGPT.


However, some of the workers accused Sama of making them read sexually disturbing texts while paying them unfairly low wages.

Although the work was beyond the norms of what Sama says it usually does in regions such as Kenya, the incident still raised questions about the ethical implications of data labeling and what human workers are asked to do when removing toxic data from generative AI systems such as ChatGPT.

For Gonzalez, it has to do with the types of jobs available for workers such as those in Kenya and how those workers can be a part of the digital economy.

"If there were plentiful jobs, meaning you sort of take it or leave it, then that would be amazing," she said on the podcast. "But that's not the situation. Being able to have people from around the world -- globally in particular, the ones that have the greatest barriers to employment -- have access to the digital economy is important."

Complete and effective data is also important, she continued.

"You need a human in the loop to then validate that the AI or the model is interpreting that data as expected," Gonzalez said. "If it isn't, then you need to be able to flag that, and then reflect and retrain that model."

Esther Ajao is a TechTarget Editorial news writer covering artificial intelligence software and systems. Shaun Sutner is a journalist with 34 years of experience, including 25 years as a reporter for daily newspapers. He is a senior news director for TechTarget Editorial's information management team, covering artificial intelligence, customer experience and unified communications software, and analytics and data management technology. Together, they host the Targeting AI podcast.
