Google AI researchers recently published tangible results from the tech giant's Project Euphonia to create better AI-driven tools for understanding impaired speech.
In an Aug. 13 blog post, the researchers noted particular successes they had achieved in work with ALS patients.
The apparent conversational AI training advance came as tech vendors have made major strides in natural language processing (NLP), natural language generation (NLG) and conversational technologies and products, and as demand for those products has soared.
Enterprises increasingly deploy chatbots and conversational agents to augment customer interactions, enhance marketing efforts and simplify data querying, and consumers have begun to rely on AI assistants in their smartphones and chatbots on the other side of a helpline.
In recent months, major tech vendors have published research papers and have claimed to have made advancements in NLP and NLG technologies, as well as with conversational AI training methods. The efforts of Google, IBM and others demonstrate the demand for these technologies, and the competitive, fast-paced nature of developing them.
IBM conversational AI training techniques
IBM in October 2018 published results -- later updated in July 2019 -- from a new set of data an IBM research team created to improve a conversational agent's ability to offer helpful suggestions within a group chat that may have multiple conversations happening at once.
Currently, AI models have trouble accurately processing and responding to multiple conversations at once, explained Luis Lastras, principal research staff member and senior development manager at IBM Watson.
"Usually humans do well when determining when response matters to them or not, but a machine doesn't have that capability today," said Lastras, who led the team that authored the paper.
To create a conversational agent that could operate well in a group chat, the team had to build a massive data set -- about 30 times larger than typical data sets -- to teach the model. The data set, according to the paper, contained "77,563 messages manually annotated with reply-structure graphs that both disentangle conversations and define internal conversation structure."
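The reply-structure annotations tie each message to the earlier message it responds to, which is what lets a system pull interleaved conversations apart. A minimal sketch of that disentanglement step, using a hypothetical message format (the dataset's actual annotation schema is described in IBM's paper):

```python
# A minimal sketch of conversation disentanglement from reply-structure
# annotations. The (msg_id, reply_to_id) format is hypothetical, not the
# IBM dataset's actual schema.

def disentangle(messages):
    """Group messages into conversations by following reply edges.

    `messages` is a chronological list of (msg_id, reply_to_id) pairs,
    where reply_to_id is None for a conversation-starting message.
    """
    root = {}  # msg_id -> id of the conversation's first message
    for msg_id, reply_to in messages:
        root[msg_id] = root[reply_to] if reply_to is not None else msg_id

    conversations = {}
    for msg_id, _ in messages:
        conversations.setdefault(root[msg_id], []).append(msg_id)
    return list(conversations.values())

# Two conversations interleaved in one channel:
log = [(1, None), (2, None), (3, 1), (4, 2), (5, 3)]
print(disentangle(log))  # [[1, 3, 5], [2, 4]]
```

The hard part in practice is producing those reply edges in the first place, which is why the team needed so much manually annotated data.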
Manually annotating that data was time consuming, and Lastras said he couldn't comment on whether IBM would commercialize the conversational AI training techniques.
He noted, however, that enterprises typically already have enough data to train such a model. The training method, he continued, could become more important as enterprises increasingly rely on multi-conversation chat environments such as Slack or Microsoft Teams.
In a separate paper published in July 2019, Lastras' team highlighted a training method for end-to-end dialogue systems in call centers that would use a combination of human agents and conversational AI agents to better handle customer questions.
A traditionally built model might fail or respond incorrectly if a human issues a response it has not been trained on, said Jatin Ganhotra, a research engineer at IBM Watson and an author of the paper.
The conversational AI training method IBM proposed would have a human agent intervene when a model cannot understand a response. The customer would then get the correct response, and the model would then learn from the human agent's response for the future.
The premise, however, requires that a human agent always give a correct response, which Ganhotra said shouldn't be a problem for large-scale enterprises.
Enterprises typically maintain smoothly run call centers staffed with experienced employees, he said. If a human agent cannot successfully answer a customer question, then it usually is passed on to a supervisor, and the model can then learn from the supervisor.
The conversational AI training method puts the customer first, Ganhotra said, ensuring that first and foremost, they receive the help they need.
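The loop Ganhotra describes can be sketched as a confidence-gated fallback: the model answers when it is sure, and otherwise a human agent answers and that exchange is logged as new training data. The threshold, function names and toy model below are illustrative assumptions, not IBM's implementation:

```python
# A minimal sketch of human-in-the-loop dialogue training: low-confidence
# turns are escalated to a human agent, and the human's answer is saved
# as a new training pair. Threshold and names are illustrative only.

CONFIDENCE_THRESHOLD = 0.7
new_training_pairs = []  # (customer_message, correct_response) for retraining

def handle(customer_message, model, human_agent):
    response, confidence = model(customer_message)
    if confidence < CONFIDENCE_THRESHOLD:
        # Model is unsure: the human agent answers, and the model can
        # later learn from that answer.
        response = human_agent(customer_message)
        new_training_pairs.append((customer_message, response))
    return response

# Toy stand-ins for the trained model and the human agent:
def toy_model(msg):
    return ("Please restart the router.", 0.9) if "router" in msg else ("", 0.1)

def toy_agent(msg):
    return "Let me look into that for you."

print(handle("My router is down", toy_model, toy_agent))  # model answers
print(handle("Billing dispute", toy_model, toy_agent))    # human answers
print(new_training_pairs)  # the escalated turn, saved for retraining
```

Either way the customer gets an answer immediately, which is the customer-first property Ganhotra emphasizes.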
Google AI and ALS
Meanwhile, for the Google automatic speech recognition models to understand impaired speech, the models are first trained on thousands of hours of non-regionalized and non-impaired speech, the blog post and an accompanying research paper noted. After the training, the models are then fine-tuned on a much smaller personalized dataset using specialized neural networks.
In this case, the smaller dataset, provided through a partnership with the ALS Therapy Development Institute, contained 36 hours of speech from 67 speakers with ALS reading fairly simple sentences. The results, though early, proved promising, Google said.
"We train personalized models that achieve 62% and 35% relative WER (word error rate) improvement on these two groups, bringing the absolute WER for ALS speakers, on a test set of message bank phrases, down to 10% for mild dysarthria and 20% for more serious dysarthria," the paper notes. Dysarthria is a speech disorder caused by muscle weakness, which ALS patients can suffer from.
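WER, the metric quoted above, is the word-level edit distance between the recognizer's transcript and a reference transcript, divided by the number of reference words. A minimal sketch (not Google's implementation):

```python
# Word error rate: word-level edit distance (substitutions, insertions,
# deletions) divided by the length of the reference transcript.

def wer(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    # Standard dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# One substitution ("tall" for "call") plus one deletion ("today") in a
# five-word reference gives a WER of 2/5:
print(wer("please call me back today", "please tall me back"))  # 0.4
```

A "relative WER improvement" compares the fine-tuned model's WER against the base model's WER on the same test set, so a large relative gain can still leave a meaningful absolute error rate, as the 10% and 20% figures show.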