
Courthouse News Service | Saturday, June 29, 2024

Watch and learn: Scientists teach robots how to do chores by having them view videos

Two different robots learned more than 10 tasks after they were shown videos of humans doing them.

(CN) — A white mechanical arm with gray accent marks and blue claws creeps into frame, moving methodically, deliberately, like a hunter tracking its prey, as it slowly aligns itself with and reaches toward a knife on a rack.

This is not a horror movie about a killer robot from the future, but rather a demonstration of Carnegie Mellon University researchers' new method for teaching robots to do household chores, such as opening doors and cans of soup, by having them watch videos of human beings performing simple tasks.

Their study, which will be presented at the annual Conference on Computer Vision and Pattern Recognition in Vancouver on Wednesday, describes Vision-Robotics Bridge (VRB), a new approach the researchers developed for training robots to move around, perform tasks and interact with the world.

Other contemporary robot-training methods require a person to manually demonstrate the tasks or movements they want their robot to perform, and afterward the robot can do the task only in an environment identical to its training. The Vision-Robotics Bridge approach instead trains robots to do similar simple tasks by having them watch what are called egocentric videos: videos of people doing everyday tasks like cooking and cleaning, filmed with body cameras, such as GoPros attached to their heads or chests, to get as close as possible to the viewpoint humans have as they do those tasks.

Using this approach, researchers led by Deepak Pathak, an assistant professor at the Robotics Institute in the School of Computer Science at Carnegie Mellon University, taught two different robots more than 10 tasks, including opening a drawer, opening an oven door, taking a pot off the stove and picking up a phone. The robots can then do those tasks in environments that are not identical to the ones shown in their training videos. The researchers said they could successfully teach a robot a new task in 25 minutes using the Vision-Robotics Bridge approach.

"We were able to take robots around campus and do all sorts of tasks," said Shikhar Bahl, a doctoral student in robotics at the Robotics Institute in the School of Computer Science at Carnegie Mellon University, in a press release. "Robots can use this model to curiously explore the world around them. Instead of just flailing its arms, a robot can be more direct with how it interacts."

To train the robots, the researchers used videos of humans interacting with objects in their kitchens, building on a concept called affordance, which originated in psychology and has since made its way into design philosophy.

The American Psychological Association describes affordance as “any property of the physical environment that offers or allows an organism the opportunity for a particular physical action,” which is comparable to certain animals living in specific ecological niches. In design, affordance means possible actions that an individual, or a robot in this case, can perceive — for instance, the fact that a chair can be sat upon.  

Humans understand how objects in a room or an environment can be used, how to hold them and generally how to interact with them.

“For instance, the oven is opened by pulling the handle downwards, the tap should be turned sideways, drawers are to be pulled outwards, and light switches are turned on with a flick. While things don’t always work as imagined and some exploration might be needed, humans heavily rely on such visual affordances of objects to efficiently perform day-to-day tasks across environments,” the researchers wrote in the study. 

But robots must be explicitly taught everything.

“For VRB, affordances define where and how a robot might interact with an object based on human behavior. For example, as a robot watches a human open a drawer, it identifies the contact points — the handle — and the direction of the drawer's movement — straight out from the starting location. After watching several videos of humans opening drawers, the robot can determine how to open any drawer,” the press release for the study states. 
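The drawer example can be pictured with a minimal sketch. This is not the researchers' actual VRB code; the `Affordance` record, its fields and the coordinates below are hypothetical, illustrating only the two pieces of information the press release describes: a contact point (the handle) and the direction of the object's subsequent movement.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Affordance:
    """Hypothetical affordance record: where to grasp and which way to move."""
    contact_point: Tuple[float, float]     # image coordinates of the grasp (e.g., a drawer handle)
    trajectory: List[Tuple[float, float]]  # post-contact waypoints observed in the human video

def movement_direction(aff: Affordance) -> Tuple[float, float]:
    """Unit vector pointing from the contact point toward the last waypoint."""
    dx = aff.trajectory[-1][0] - aff.contact_point[0]
    dy = aff.trajectory[-1][1] - aff.contact_point[1]
    norm = (dx * dx + dy * dy) ** 0.5
    return (dx / norm, dy / norm)

# A drawer opened by pulling straight out (+x direction, in made-up coordinates):
drawer = Affordance(contact_point=(120.0, 80.0),
                    trajectory=[(140.0, 80.0), (170.0, 80.0)])
print(movement_direction(drawer))  # (1.0, 0.0)
```

Averaging such records over many videos of drawers being opened is, in spirit, how the robot generalizes to drawers it has never seen.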

"We are using these datasets in a new and different way,” Bahl wrote. “This work could enable robots to learn from the vast amount of internet and YouTube videos available."

