Researchers from the University of Southern California (USC) are developing a system that enables robots to autonomously learn tasks from observing demonstrations.
USC researchers are working on a system that would teach robots tasks such as setting a table or driving a car from viewing just a handful of demonstrations. The work has been published in the report "Learning From Demonstrations Using Signal Temporal Logic."
The report details how the system evaluates the quality of each demonstration, learning from mistakes as well as successes. As a result, robots can learn from only a few demonstrations, whereas current methods require more than 100.
It also enables robots to learn intuitively, much as humans learn from one another.
Lead author Aniruddh Puranic, a Ph.D. student in computer science at the USC Viterbi School of Engineering, said: “Many machine learning and reinforcement learning systems require large amounts of data and hundreds of demonstrations; you need a human to demonstrate over and over again, which is not feasible.
“Also, most people don’t have the programming knowledge to explicitly state what the robot needs to do, and a human cannot possibly demonstrate everything that a robot needs to know. What if the robot encounters something it hasn’t seen before? This is a key challenge.”
Imperfections in demonstrations raise safety concerns: they can lead robots to learn unsafe or undesirable actions. The research addresses this with Signal Temporal Logic (STL), which evaluates the quality of each demonstration and automatically ranks them to create inherent rewards.
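To make the ranking idea concrete, here is a minimal sketch, not the authors' code: an STL formula assigns each demonstration a robustness score (how strongly it satisfied, or violated, a rule), and demonstrations are ranked by that score. The rule, the speed limit, and all names below are illustrative assumptions; the example checks a safety rule of the form "always: speed <= limit," whose robustness is the worst-case margin over the trace.

```python
# Hypothetical example of STL-style ranking; the rule and data are illustrative.
SPEED_LIMIT = 30.0  # assumed speed limit for the safety rule (m/s)

def robustness_always_below(speeds, limit=SPEED_LIMIT):
    """Robustness of 'always: speed <= limit'.

    The worst-case margin min_t (limit - speed[t]): positive means the
    rule held throughout the trace; negative means it was violated.
    """
    return min(limit - s for s in speeds)

def rank_demonstrations(demos):
    """Sort demonstrations best-first by robustness; such a ranking could
    then seed the reward used for learning."""
    return sorted(demos,
                  key=lambda d: robustness_always_below(d["speeds"]),
                  reverse=True)

demos = [
    {"name": "careful", "speeds": [20.0, 25.0, 22.0]},   # stays under the limit
    {"name": "speeding", "speeds": [28.0, 33.0, 31.0]},  # briefly violates it
]

ranked = rank_demonstrations(demos)
print([d["name"] for d in ranked])  # the safer demonstration ranks first
```

Real STL specifications are far richer (temporal operators, nesting, multiple signals), but the principle is the same: a graded score, rather than a pass/fail label, lets the system use even flawed demonstrations in proportion to their quality.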
Co-author Stefanos Nikolaidis, a USC Viterbi assistant professor of computer science, said: “Let’s say robots learn from different types of demonstrations: it could be a hands-on demonstration, videos, or simulations. If I do something that is very unsafe, standard approaches will do one of two things: either they will completely disregard it, or, even worse, the robot will learn the wrong thing.
“In contrast, in a very intelligent way, this work uses some common-sense reasoning in the form of logic to understand which parts of the demonstration are good and which parts are not. In essence, this is exactly what humans do as well.”
The report uses driving as an example: a demonstration in which the driver runs a stop sign would be ranked lower than one in which the driver applies the brakes to avoid a crash. The robot learns from the safer behavior while adapting to human preferences.
Nikolaidis added: “If we want robots to be good teammates and help people, first they need to learn and adapt to human preferences very efficiently.”