New deep learning algorithms could improve robot sight

Nicole Laskowski, Senior News Director

Self-driving cars, as brilliant as they are, are not nearly as good as humans at dealing with uncertainty. David Held has an algorithm for that.

In the sixth episode of Schooled in AI, David Held, assistant professor at The Robotics Institute at the Carnegie Mellon University School of Computer Science, talks about why it's important for robots to operate in a changing environment.

In this episode, you'll hear Held talk about:

Why uncertainty presents a big challenge to robots operating in the real world
How he's building new deep learning algorithms to push the learning envelope
What he hopes training a robot will one day look like

To learn more about Held and his research in deep learning algorithms, listen to the podcast by clicking on the player above or read the full transcript below.

Transcript - New deep learning algorithms could improve robot sight

Hey, I'm Nicole Laskowski, and this is Schooled in AI.

One of the biggest obstacles for fully autonomous vehicles is uncertainty. Let me explain what I mean by that: When we're out there driving on the road, we encounter elements that are simply beyond our control. Other vehicles, pedestrians and bicyclists can behave unpredictably. And we rely on intuition to guide our decision-making or we use tools like making eye contact or giving a friendly wave to communicate. But autonomous vehicles lack tools like these. And it's a problem researchers like David Held are trying to solve.

Held is working on technology to help robots see better -- and to understand what they see more effectively. He is an assistant professor at The Robotics Institute at CMU, and he describes his research as being "at the intersection of robotics, machine learning and computer vision."

Improving the perception system in autonomous vehicles is just one application of his research; another is helping robots operate in the home. Those two things -- a vehicle that safely drives itself and a machine that successfully sorts and does your laundry -- might not appear to have anything in common. But, as Held explains, he's focused on helping robots function in a changing environment -- in cluttered, messy, unpredictable settings. And he's having to develop new deep learning algorithms to get there.

To start, here's a little from Held on the research he did as a Ph.D. student at Stanford University with the perception system for self-driving cars.

David Held: The basic idea is there is a laser and a camera mounted on top of the vehicle. And the laser and camera are both used to record what's around the self-driving car.

The laser emits beams and measures how long it takes for each beam to return to the device. That's used to determine how far away an object is. The camera, on the other hand, provides another data point: the color of an object.
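
As a rough illustration of the time-of-flight idea described here, the sketch below converts a beam's round-trip time into a distance. The function name and the 200-nanosecond reading are assumptions made for this example, not details of the system Held worked on.

```python
# Speed of light in a vacuum, in meters per second (a close approximation for air).
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def range_from_round_trip(round_trip_seconds: float) -> float:
    """Return the one-way distance in meters for a measured round-trip time.

    The beam travels to the object and back, so the distance to the object
    is half of the total path length.
    """
    if round_trip_seconds < 0:
        raise ValueError("round-trip time cannot be negative")
    return SPEED_OF_LIGHT_M_PER_S * round_trip_seconds / 2.0


if __name__ == "__main__":
    # Hypothetical reading: the beam returns after 200 nanoseconds.
    print(f"{range_from_round_trip(200e-9):.2f} m")
```

A real sensor would repeat this calculation for many beams per second to build up a picture of the scene; the sketch covers only the single-beam arithmetic.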

Held: My work was focused on how do we handle complex, really crowded settings, like if you're in a crowded urban environment and you have lots of people, bikes and cars all near each other, how do we make a car understand such a complex environment.

And then during his postdoc at the University of California, Berkeley, Held focused on robots and task manipulation.

Held: You can imagine you have a robot with two arms and the goal is to pick up objects and do things like picking up a ring and putting it on a peg, putting a can through a hole.

One approach was to build up a robot's knowledge the way you would educate a budding mathematician.

Held: You don't teach someone calculus right away; you might teach them arithmetic first and then algebra and then build up to calculus. You need to teach things in order from easier to harder -- and always teach things at the appropriate difficulty level for the students.

He developed a method to automatically determine what makes a manipulation task easy and what makes it hard for the robot to do, and then he used that information to target the right level of difficulty for the robot.
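
One way to picture "targeting the right level of difficulty" is to keep only the tasks the robot sometimes, but not always, completes. The sketch below is an illustration under that assumption, not Held's published method; the task names, the success rates and the 0.1 and 0.9 thresholds are all hypothetical.

```python
from typing import Dict, List


def pick_tasks_at_right_difficulty(
    success_rates: Dict[str, float],
    low: float = 0.1,
    high: float = 0.9,
) -> List[str]:
    """Return the tasks whose measured success rate is neither too low nor too high.

    Tasks below `low` are treated as too hard for now, and tasks above `high`
    as already mastered; both thresholds are illustrative assumptions.
    """
    return [task for task, rate in success_rates.items() if low <= rate <= high]


if __name__ == "__main__":
    # Hypothetical success rates measured over recent practice attempts.
    measured = {
        "ring starts on the peg": 0.98,           # already mastered
        "ring starts just above the peg": 0.55,   # useful practice
        "ring starts across the table": 0.02,     # too hard for now
    }
    print(pick_tasks_at_right_difficulty(measured))
```

As the robot improves, the measured success rates shift, and harder tasks move into the practice band. That mirrors the arithmetic-to-calculus progression Held describes.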

Held: If you have a robot, and you want to teach it to manipulate objects, can you have it understand where those objects are in a crowded, say, tabletop environment? So, let's say you put a robot in your kitchen. It's a mess because you didn't clean up; now, the robot needs to be able to find all the ingredients and then figure out how to cook something or find your dirty laundry and sort through this pile and do your laundry. So, how can you make robots do manipulation tasks in crowded environments with a wide diversity of different types of objects?

The results for this progressive style of learning have been promising. Held said he presented his work at the first-ever Conference on Robot Learning, or CoRL, last fall, where he was able to show that this method successfully taught a robot how to put a ring on a peg. Don't be fooled by how simplistic the task sounds.

Held: This is a test that traditional learning-based methods have a really hard time with because it's a complex maneuver -- you have to lift a ring over the peg and put it down. And it's a very tight fit between the ring and the peg, so you also have to think about the contacts between the ring and the peg as you're trying to maneuver it on. But through our approach of learning from easy to hard, we were able to actually accomplish that task.

In some ways, Held is building this learning approach from the ground up. And by that I mean he's developing new deep learning algorithms to train and teach robots -- algorithms that can function successfully in a world of uncertainty.

Held: If you think about some of the traditional approaches people typically take, they involve writing down a physics-based model of a scene and then doing an optimization to figure out what the robot should do based on the physics. But part of the problem is that there are many settings where it's hard to write down what the model is or there are lots of uncertainties or unknowns that you don't even know how to model.

Let's go back to driving. When we drive a car down a street, we rely on lots of patterns -- we know what to do at a yield sign, we don't drive down one-way streets. But we can't account for everything we may encounter -- no matter how many years of experience we have or miles we've clocked. That's also true for a self-driving car.

Held: It might find that the positions of the people, bikes and cars all change, or it might encounter some object that fell off a truck onto the road. So, it has to be able to handle a high degree of uncertainty, and that's something current methods aren't fully equipped for -- especially if the uncertainty happens in a large group.

Algorithmic problems exist for task manipulation as well. It can be hard for a robot to determine the right actions to take to complete a complex task efficiently.

Held: If the robot is trying to figure out how to put a ring on a peg, there are many different possible ways it can move the ring, but only a small number of ways that are actually going to lead it to doing the task. So, it's a huge search space and figuring out how to guide the robot to find the right action is challenging.

But Held's research also relies on the combined effect of big data and massive processing power, as well as the open source nature of AI. Image classification systems, for example, have been extremely helpful with his research in autonomous vehicles.

Held: One of the interesting things that people have found in the deep learning era that I think wasn't necessarily expected is how transferable neural networks are from one task to another. So, we were able to actually take the network that was just trained for classifying what's in an image and reuse it for tracking pedestrians or other objects in a moving video where there are lighting changes and other things going on that you have to account for.

We had a large data set of video that had already been labeled for us. We took that video, we designed a slightly different neural network for our task of tracking, and then we used a deep learning library that a group at Berkeley had written called Caffe. This library was really designed to work with GPUs, so we didn't have to do any particular hardware integration for that. A lot of the work of data labeling, setting up the deep learning library and connecting to the hardware is taken care of.

Because of all of that labeled data, Held and his team were able to build a robust application that can handle different circumstances an autonomous vehicle might encounter. But, he said, there's still more work to be done -- both with autonomous vehicles and with task manipulation.

Held: One of the challenges is how do you scale this up if you have a problem that's even more complex. Do we just collect more data? Or one of the approaches that a lot of people have been thinking about recently is how can we work with unlabeled data? So, if you just have a car drive around by itself and observe lots of images but you don't have a human labeling each one, can it learn? Or if you have a small number of human labels and lots of data that no one's labeled, can you combine those data sources to scale up your learning?
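
To make the idea of combining a few human labels with lots of unlabeled data concrete, here is a toy sketch of one common approach, often called self-training or pseudo-labeling. It is an illustration only and is not drawn from Held's work: the data are single numbers rather than images, the "model" is just the average of each class, and the `margin` threshold is an assumption.

```python
from typing import Dict, List, Tuple


def centroids(labeled: List[Tuple[float, str]]) -> Dict[str, float]:
    """Return the average value of the examples in each class."""
    sums: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for x, y in labeled:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def self_train(
    labeled: List[Tuple[float, str]],
    unlabeled: List[float],
    margin: float = 1.0,
) -> Tuple[Dict[str, float], List[float]]:
    """Grow the labeled set with confident guesses; needs at least two classes.

    A point is labeled automatically only when it is closer to its nearest
    class average than to the runner-up by at least `margin`.
    Returns the final class averages and the points left unlabeled.
    """
    labeled = list(labeled)
    remaining = list(unlabeled)
    while remaining:
        cents = centroids(labeled)
        confident, unsure = [], []
        for x in remaining:
            ranked = sorted(cents, key=lambda y: abs(x - cents[y]))
            best, second = ranked[0], ranked[1]
            gap = abs(x - cents[second]) - abs(x - cents[best])
            (confident if gap >= margin else unsure).append((x, best))
        if not confident:
            break  # nothing left that the model is sure about
        labeled.extend(confident)
        remaining = [x for x, _ in unsure]
    return centroids(labeled), remaining


if __name__ == "__main__":
    human_labels = [(0.0, "a"), (10.0, "b")]   # a small number of human labels
    no_labels = [1.0, 9.0, 5.0]                # data that no one has labeled
    print(self_train(human_labels, no_labels))
```

The point sitting exactly between the two classes is left unlabeled rather than guessed. That reflects a typical design choice in this family of methods: a wrong automatic label can be worse than no label.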

He's also interested in exploring whether robots can learn from humans just by watching them complete a task. So, rather than writing down a physics-based model of the world for the robot, can it learn by observation? Held said there are lots of challenges to this approach, including mapping the human body to the robot body. But he sees the promise in developing such a method.

Held: I think that if we can achieve this, then we can teach robots much more easily to do a wide diversity of different types of tasks. It will also lead to very large benefits in terms of enabling casual users without programming experience to be able to train robots. And it's also technically interesting in terms of combining both perception of watching what someone else is doing and actions in terms of what the robot should be doing.
