Enterprises are crowdsourcing AI development

By crowdsourcing AI development, enterprises can broaden the knowledge base of their machine learning applications, and early adopters are showing promising results.

The idea that a collection of people can make better decisions has been around since the early days of democracy. Over the years, statisticians discovered that the wisdom of crowds could even be harnessed for analytical decision-making.

Now, AI researchers are pursuing several different approaches to combine AI and crowds to make more informed decisions, improve predictions and advance data labeling for machine learning.

There are certainly obstacles to crowdsourcing AI development. First, you need to know how to create an effective feedback loop between the AI and participants. It's also important to note that not all individuals perform a given task the same way. Some thought must be given to finding the right weight for individual input.

But due to the potential benefits of pairing crowdsourced wisdom with AI, enterprises and software vendors are working to overcome these challenges.

A long history of the wisdom of crowds

In the early 1900s, Sir Francis Galton discovered that the average of a crowd's weight guesses was often more accurate than the guesses of individual experts. Over the years, researchers have toyed with variations of this basic notion to improve the accuracy of collective predictions.
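The averaging effect is easy to illustrate. The following is a minimal sketch, not a reconstruction of Galton's analysis: the guesses, the true weight and the function names are all made up for illustration. It compares the error of the crowd's average guess with the average error of the individual guesses.

```python
from statistics import mean, median


def crowd_estimate(guesses, use_median=False):
    """Combine individual guesses into a single crowd estimate."""
    if not guesses:
        raise ValueError("need at least one guess")
    return median(guesses) if use_median else mean(guesses)


def mean_individual_error(guesses, truth):
    """Average absolute error across the individual guesses."""
    return mean(abs(g - truth) for g in guesses)


# Hypothetical data: ten weight guesses in pounds, invented for this example.
true_weight = 1200
guesses = [1050, 1320, 1180, 1275, 1110, 1240, 1195, 1300, 1090, 1230]

crowd = crowd_estimate(guesses)
crowd_error = abs(crowd - true_weight)
typical_error = mean_individual_error(guesses, true_weight)

print(f"crowd estimate: {crowd}")
print(f"crowd error: {crowd_error}")
print(f"average individual error: {typical_error}")
```

The error of the averaged guess can never exceed the average individual error, because overestimates and underestimates offset each other. Passing use_median=True makes the estimate less sensitive to a few wild guesses.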

More recently, the U.S. Intelligence Advanced Research Projects Activity launched the Hybrid Forecasting Competition to improve prediction via human and AI integration. The goal of the project is to improve the accuracy of predictions of worldwide geopolitical issues, including foreign elections and disease outbreaks. The program is working with researchers from several universities to test new human-machine interfaces that can be combined with machine learning in various ways.

Improving predictions with swarms

Meanwhile, private companies are working to develop crowdsourced AI tools that may have more immediate applications outside of the intelligence community.

For example, Unanimous AI is working with algorithms modeled on the natural principle of swarm intelligence to connect groups of people in real time via a collaboration platform. Teams can share data visualizations and analyses, enabling them to think together as a hive mind and converge on optimized solutions.

"We believe this approach will enable human groups to form super-intelligent systems, achieving remarkable amplifications of insights over the next few years," said Louis Rosenberg, CEO at Unanimous. There is some evidence that feedback loops that facilitate communication in swarms can outperform crowds of individuals at a given task.

The Unanimous Swarm AI platform has been used in a variety of disciplines. For example, Bustle Digital Group, a web publisher with titles aimed primarily at millennial women, used the Swarm platform to make optimized sales forecasts of women's garments during the most recent holiday season. Boeing used Swarm to optimize the input collected from military pilots regarding the design of aircraft cockpits. Stanford Medical School used Swarm to generate optimized diagnoses from small groups of practicing radiologists, reducing diagnostic errors.

Better discussions

IBM recently debuted its Project Debater platform in a live matchup with one of the world's top debaters. Although it lost its first debate, which was held at IBM's 2019 Think conference in San Francisco, it demonstrated the ability to mine large bodies of text generated by crowds and organize them into arguments in a discussion.

One component of this platform, called Speech by Crowd, used natural language processing (NLP) to mine discussion boards and summarize a collection of arguments.

"It is challenging because you can imagine different people making the same argument with different words," said Ranit Aharonov, manager of the Project Debater Team at IBM.

One of the biggest challenges lies in finding an appropriate way to structure natural text. To overcome this challenge, Epistema, an Israeli software vendor that is building a knowledge management and decision-support tool, has developed a system that uses structured questions that can capture different lines of arguments in a graph knowledge base. The idea is to enable semantic analysis that surpasses more typical NLP techniques, like those used in Project Debater, said Joab Rosenberg, founder and CEO of Epistema.

The Epistema crowdsourcing AI tool has been used by investment researchers to answer questions like "What will oil prices be in 2019?" and "Will Iran withdraw from the nuclear agreement in 2019?" It has also been used by business users to answer questions such as "What should our priorities be in the next quarter?" and "Why are we losing customers in the Asia markets?"

Improved data labeling

One of the earliest uses of Amazon's Mechanical Turk crowdsourcing service was to label data for machine learning consumption.

This approach has expanded to support projects like the University of California, San Francisco (UCSF) and IBM's website, which recruits thousands of eyes to examine scientific images on behalf of researchers. Although not every participant is a trained scientist, individual mistakes tend to cancel out when the results are combined using the right algorithms, said Zev Gartner, associate professor in UCSF's Department of Pharmaceutical Chemistry.
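One simple way to combine labels so that individual mistakes cancel out is a majority vote. The sketch below is illustrative only and is not the algorithm used by UCSF or IBM. The image IDs, label names and the majority_vote function are hypothetical.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most common label and the share of annotators who chose it."""
    if not labels:
        raise ValueError("need at least one label")
    counts = Counter(labels)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(labels)


# Hypothetical annotations: each image was labeled by several volunteers.
annotations = {
    "img_001": ["cell", "cell", "debris", "cell", "cell"],
    "img_002": ["debris", "cell", "debris", "debris"],
    "img_003": ["cell", "debris"],
}

consensus = {
    image_id: majority_vote(labels)
    for image_id, labels in annotations.items()
}

for image_id, (label, agreement) in consensus.items():
    print(f"{image_id}: {label} (agreement {agreement:.2f})")
```

The agreement score gives a rough signal of label quality. Images with low agreement, such as the evenly split third example, could be routed to more annotators or to an expert rather than accepted as is. A production system would likely also weight annotators by their track record.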

Figure Eight Inc., a data labeling and enrichment service based in San Francisco, has been using AI to streamline this process so that workers can do more, faster and at a higher quality. The Figure Eight software has been used to detect toxic speech on comment boards, to train AI to detect retina separation and tears in medical images, to identify branded products in images, and to transcribe speech.

One of the biggest challenges of crowdsourcing this aspect of AI development is detecting and eliminating bias from training data sets, said Ben Kearns, vice president of engineering at Figure Eight. For example, in sentiment analysis, the meaning and toxicity of words, phrases and images vary based on the reader's cultural norms. Words and images that readers in Cleveland would classify as derogatory might not be the same as those that readers in New York would.

"We believe that the future is machines automating the repetitive tasks, while humans do things that are more and more complex or less well-defined," Kearns said. "Humans are better at work that requires more judgment than is currently available in the state-of-the-art AI."
