Unmanned ground robots will be trained to receive demonstration commands – instead of verbal commands – and to interpret, follow, recall and apply them in similar contexts, as part of a new US Army research project starting this month with the University of Texas at Austin.
Autonomous robotic systems will first be taught procedures for tactical behaviours through teleoperation; once they learn where and how to safely move around, through and over simple and complex terrain, they can apply that knowledge on their own when facing particularly rugged, unfamiliar territory.
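The teach-then-apply workflow described above is, in essence, learning from demonstration. As a purely illustrative sketch (the Autonomy Software Stack is not public, and the feature names, numbers and use of scikit-learn's MLPRegressor here are all assumptions), behaviour cloning from a teleoperation log might look like this in Python:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Hypothetical demonstration log from one teleoperated run: each row pairs
# terrain features (slope, roughness, obstacle distance) with the operator's
# command (linear velocity, angular velocity). Values are invented.
terrain_features = np.array([
    [0.05, 0.1, 4.0],
    [0.20, 0.4, 2.5],
    [0.35, 0.7, 1.0],
    [0.50, 0.9, 0.5],
])
operator_commands = np.array([
    [1.0, 0.0],
    [0.6, 0.2],
    [0.3, 0.5],
    [0.1, 0.6],
])

# Behaviour cloning: fit a policy that imitates the demonstrated commands.
policy = MLPRegressor(hidden_layer_sizes=(16,), max_iter=5000, random_state=0)
policy.fit(terrain_features, operator_commands)

# On unfamiliar terrain, the robot queries the learned policy on its own.
new_terrain = np.array([[0.25, 0.5, 1.8]])
print(policy.predict(new_terrain))  # proposed (linear, angular) command
```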
Researchers from the US Army’s Combat Capabilities Development Command’s Army Research Laboratory and the university will work jointly to develop autonomous system behaviours using the laboratory’s Autonomy Software Stack. The stack, developed under the decade-long Robotics Collaborative Technology Alliance, is a suite of algorithms, libraries and software components that perform the functions intelligent systems require, such as navigation, planning, perception, control and reasoning.
“Allowing autonomous systems to learn new behaviours after fielding would be a significant step forward in how the US Army trains, approves and integrates autonomy into units,” said Dr Craig Lennon, a US Army researcher.
The US Army is developing an autonomous system that learns interactively from a soldier; once soldiers understand the robot’s degree of confidence in applying a learned behaviour to new, difficult tasks, they can make informed choices about how to use their robotic systems, Lennon said.
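One common way to produce such a confidence signal, shown here only as a generic sketch rather than the project's actual method, is to train a small ensemble of policies on the same demonstrations and report their disagreement on a new task; the function name and model choice below are hypothetical:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def command_with_confidence(X_demo, y_demo, x_new, n_models=5):
    """Fit several policies on the same demonstrations; their disagreement
    on a new input serves as a rough (lack-of-)confidence signal."""
    predictions = []
    for seed in range(n_models):
        model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=5000,
                             random_state=seed)
        model.fit(X_demo, y_demo)
        predictions.append(model.predict(x_new)[0])
    predictions = np.stack(predictions)           # shape: (n_models, n_outputs)
    mean_command = predictions.mean(axis=0)       # what the robot would do
    disagreement = predictions.std(axis=0).max()  # high spread = low confidence
    return mean_command, disagreement
```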
This could open opportunities for robotic systems that learn and execute new behaviours in operational environments without first being sent back for evaluation, he added.
The US Defense Innovation Board highlighted a challenge in the test and evaluation of systems that employ machine learning techniques. In its 2019 AI Principles report, the board pointed out, “for systems that learn over their lifetime, challenges remain for continual certification that these systems do not learn behaviours outside of their intended use”.
In the interim, Lennon said, the US Army’s corporate research laboratory is exploring methods to provide assurance for autonomous systems that learn real-world behaviours in the field, such as crossing a novel obstacle.
“Suppose the robot has already learned procedures for crossing a danger area but has never crossed a river with the load that it’s now required to carry,” he said. “During mission rehearsal, the soldier finds a river in a friendly area, demonstrates the river crossing by tele-operating the fully-loaded robot across and repeats demonstrations until the system can provide assurance it has learned how to safely execute the behaviour while conforming to the tactical procedures for crossing a danger area. Now the soldier can take it on the mission and it can get itself across the river.”
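Read literally, that repeat-demonstrations-until-assured rehearsal loop could be sketched as below, where the cross-validation check, the threshold and the helper collect_demonstration are all invented stand-ins for whatever assurance test the project ultimately uses:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import cross_val_score

ASSURANCE_THRESHOLD = 0.9   # hypothetical required cross-validated fit score

def is_assured(X, y):
    """Crude stand-in for an assurance check: the cloned policy must
    reproduce held-out demonstrations well under 3-fold cross-validation."""
    policy = MLPRegressor(hidden_layer_sizes=(16,), max_iter=5000, random_state=0)
    return cross_val_score(policy, X, y, cv=3).mean() >= ASSURANCE_THRESHOLD

def rehearse_crossing(collect_demonstration):
    """Repeat teleoperated crossings until the check passes, then return
    the policy the robot will use to cross on its own."""
    X_demo, y_demo = [], []
    while True:
        X_new, y_new = collect_demonstration()   # one teleoperated crossing
        X_demo.extend(X_new)
        y_demo.extend(y_new)
        X, y = np.array(X_demo), np.array(y_demo)
        if len(X) >= 9 and is_assured(X, y):     # enough samples for 3 folds
            final_policy = MLPRegressor(hidden_layer_sizes=(16,), max_iter=5000)
            return final_policy.fit(X, y)
```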
Dr Ufuk Topcu, lead researcher for this work at the University of Texas at Austin, said earlier work showed how the research team can take advantage of existing contextual knowledge in learning from demonstrations.
“The resulting techniques improve the data efficiency and generalisability, and are particularly promising for the data-starved applications of the US Army with a need for verifiability,” Topcu said. “We look forward to extending the techniques and demonstrating their effectiveness on the US Army’s experimental platforms.”
According to Lennon, demonstrations might also be needed to teach a robot how to perform while damaged, when its previously learned behaviours no longer work.
Researchers will test the new software on simulated systems in the first year. By mid-2022, the tests are expected to move to a Clearpath Warthog unmanned ground vehicle, which is designed for rapid prototyping of robotic systems and built for tough environments.
The three-year cooperative agreement between the US Army and the university is expected to result in a system that can interactively learn from human demonstration while continually updating a quantitative estimate of its ability to perform the new behaviours and meet system assurance specifications.