How do you design for a conversational interface in which users can ask or say anything? That's one of the challenging questions IBM's Bob Moore and Raphael Arar have faced in their work on conversational design.
In part one of this three-part interview series, "Conversational UX design: What it is and who's paving the way," Moore, a conversation analyst, and Arar, a UX designer, defined the emerging field of conversational UX design and its roots in academia. In part two, "How to make AI agents better conversationalists: Context is key," the collaborators discussed how their approach to AI agents differs from what's currently on the market and how it enables more humanlike conversation.
In this final installment, Moore and Arar discuss the biggest challenge they've faced in designing for conversational interfaces: determining scope and combating the misguided "ask me anything" prompt given by so many AI agents. The solution to a better conversational interface, according to these two experts, is to make AI agents state their specific purpose from the get-go and teach users how to converse effectively with an artificial intelligence. Moore provides a six-action navigation method for users.
Editor's note: This interview has been edited for clarity and length.
What challenges have you faced in designing for conversational interfaces?
Raphael Arar: I would say, determining scope. One of the guidelines that we're really trying to set is to have the [AI] agent lay out what its scope is within the first turn of the conversation. 'Hi, I'm so and so. I can help you book travel flights.' Then, when a user asks, 'What movies are playing?' the agent might respond, 'I'm sorry. I don't know what you mean.' But you have laid down the foundation of what the scope is.
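Arar's guideline -- declare scope in the first turn, and fall back to restating it when a request is out of scope -- can be sketched in a few lines. The `FlightBot` name, intents, and phrasing below are illustrative assumptions, not IBM's actual implementation:

```python
# Hypothetical sketch of the "state your scope in the first turn" guideline.
# Intent names and wording are assumptions for illustration only.

IN_SCOPE_INTENTS = {"book_flight", "change_flight", "cancel_flight"}

def greet() -> str:
    # First turn: the agent lays out exactly what it can do.
    return "Hi, I'm FlightBot. I can help you book, change, or cancel flights."

def respond(intent: str) -> str:
    if intent in IN_SCOPE_INTENTS:
        return f"Sure -- let's get started on that ({intent})."
    # Out-of-scope request: apologize and restate the scope, so the user
    # isn't left guessing what the agent can handle.
    return ("I'm sorry, I don't know about that. "
            "I can only help with booking, changing, or canceling flights.")
```

The point of the fallback is that a rejection still teaches the user the agent's boundaries, rather than a bare "I don't know what you mean."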
[That introduces] challenges from a design standpoint. When you think about traditional user experience design or interface design, designers had a lot of constraints. If you use a mobile app today, you only have a certain number of buttons that you can push or things that you can do. It's very constrained in that respect, which is a good thing. It's almost a testament to good design -- to remove the guesswork from the end user's process.
With conversation, it's a whole other game. The number of things that a person could say to an agent is infinite. So, that presents a challenge: to really anticipate the types of things that a user might say and then determine which of those things are actually valid, in scope and something that the agent should actually be able to handle. That takes a lot of iterations and a lot of back and forth with not just the designers, but also your stakeholders and anyone else who has a stake in the end product.
Bob Moore: Sometimes, I see systems make the mistake of instructing the user to 'ask me anything' or 'tell me anything,' and that just makes the problem worse. Alexa is a good example of it. It does want to be able to answer anything and everything because [Amazon] has third parties developing all the stuff. But that's a hard use case. In most cases, you want your conversational agent to do some particular thing, and you want it to tell users what it does so they're not lost.
Part of [the solution to] that is we need to provide users with -- and teach them -- a navigation method for conversational interfaces. It's not enough to just say, 'You can say or ask whatever you want.' That's misleading. When we have a new kind of interface, we need to teach users how to use it.
For example, when we first learned the web, we learned that there are these pages that have these weird addresses to get to. And when you get to one, you can jump back to a previous one, or you can click on a hyperlink to go forward. We learned this method of navigating the web, and all the other interfaces have their own ways. I think, right now, one of the big UX questions for conversations is: How do you navigate a conversational interface? How should you?
More about Moore and Arar
Bob Moore is a research staff member at IBM Research-Almaden in San Jose, Calif. He is the lead conversation analyst on IBM's conversational UX design project. Prior to working at IBM, Moore was a researcher at Yahoo Labs and at the Xerox Palo Alto Research Center, and was a game designer at The Multiverse Network. He has a Ph.D. in sociology from Indiana University Bloomington with concentrations in ethnomethodology, conversation analysis and ethnography.
Raphael Arar is a UX designer and researcher at IBM Research-Almaden. Previously, he was the lead UX designer for the Apple and IBM partnership and lecturer at the University of Southern California's Media Arts and Practice Division. Arar holds an MFA from the California Institute of the Arts and his artwork has been shown at museums, conferences, festivals and galleries internationally. In 2017, he was recognized as one of Forbes' "30 Under 30" in enterprise technology.
I understand you provide a method for that in IBM's Natural Conversation Framework.
Moore: We do. We recommend a method where there are six actions that the user can always take at any time to help navigate that conversation space.
- You can always ask [the AI agent], 'What can you do?' or 'What do you know?' In other words, you can always ask the system what its capabilities are and start a conversation with a system about its own capabilities. That's critical for exactly the reason that Raphael just said; you don't want people going down a path about movie times when this is a travel app and its capability is booking flights.
- You should always be able to ask the system, 'What did you say?' or 'Say that again' -- especially if it's a voice interface, because, once the assistant says something [and you don't hear it], it's gone.
- You should always be able to ask the system, 'What do you mean?' and ask it to paraphrase what it said. [The AI agent] should be able to elaborate on what it said and what it's asking you to do, especially when it's giving you instructions. That's help at a local, turn-by-turn level.
- You should also be able to, at any time, say 'OK' or 'Thanks' and close down the current sequence to show that you're ready to move on.
- You should also be able to escape from a sequence or abort a sequence. For example, [if] I ask for movie times and [the AI agent] comes back with something that's the wrong answer, I could say, 'OK, thanks,' and close it, and that means we're good. Or I could say, 'Never mind' or 'Forget it,' and basically escape from that sequence. You should be able to do that anytime so you don't get stuck.
- You should always be able to say 'Goodbye' to end a conversation and show that you're done. That gives the system an opportunity to deal with last topics. If it wants to get your email address or if it wants you to, god forbid, take a survey or something like that, it gives closure. It's ending the session.
That's our recommendation for navigating [a conversational interface]. If we teach the user those things, which are completely built on what we naturally do in conversation, then, if they get lost in the conversation, they know how to get out of a sequence and how to get help for a sequence.
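Moore's six always-available actions can be summarized as a small dispatch table that a conversational agent checks before any domain logic. The action names, trigger phrases, and matching strategy below are illustrative assumptions -- a real system would use intent classification rather than exact string matching:

```python
# Hypothetical sketch of the six always-available navigation actions from
# IBM's Natural Conversation Framework, as described by Moore. Phrases and
# action names are assumptions for illustration, not IBM's actual design.

NAVIGATION_ACTIONS = {
    "capabilities": {"what can you do", "what do you know"},
    "repeat":       {"what did you say", "say that again"},
    "paraphrase":   {"what do you mean"},
    "close":        {"ok", "thanks"},       # close the current sequence
    "abort":        {"never mind", "forget it"},  # escape a stuck sequence
    "end":          {"goodbye"},            # end the session
}

def classify_navigation(utterance: str):
    """Return the navigation action an utterance invokes, or None if the
    utterance should be handled as an ordinary domain request."""
    text = utterance.lower().strip(" ?.!,")
    for action, phrases in NAVIGATION_ACTIONS.items():
        if text in phrases:
            return action
    return None
```

For example, `classify_navigation("Say that again?")` resolves to the `repeat` action, while a domain utterance like "Book me a flight" falls through to the agent's normal request handling.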