Happy to share my paper on Learning to Exploit Elastic Actuators for Quadruped Locomotion.

We learn to trot/pronk in only 10 minutes, directly on the real robot 🐈.
Paper: https://arxiv.org/abs/2209.07171

#RL #reinforcementlearning #reinforcement #learning #robot #robots #locomotion

Learning to Exploit Elastic Actuators for Quadruped Locomotion

Spring-based actuators in legged locomotion provide energy-efficiency and improved performance, but increase the difficulty of controller design. While previous work has focused on extensive modeling and simulation to find optimal controllers for such systems, we propose to learn model-free controllers directly on the real robot. In our approach, gaits are first synthesized by central pattern generators (CPGs), whose parameters are optimized to quickly obtain an open-loop controller that achieves efficient locomotion. Then, to make this controller more robust and further improve the performance, we use reinforcement learning to close the loop, to learn corrective actions on top of the CPGs. We evaluate the proposed approach on the DLR elastic quadruped bert. Our results in learning trotting and pronking gaits show that exploitation of the spring actuator dynamics emerges naturally from optimizing for dynamic motions, yielding high-performing locomotion, particularly the fastest walking gait recorded on bert, despite being model-free. The whole process takes no more than 1.5 hours on the real robot and results in natural-looking gaits.
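
As a rough illustration of the two-stage idea in the abstract (an open-loop CPG, then learned corrections added on top), here is a minimal sketch. It is not the paper's controller: the sine oscillator, the leg order (FL, FR, HL, HR), the parameter names and the default values are all assumptions made for this example. Only the gait definitions are standard: in a trot the diagonal legs move together, and in a pronk all four legs move together.

```python
import math

# Phase offsets per leg as fractions of a gait cycle.
# Assumed leg order: front-left, front-right, hind-left, hind-right.
GAIT_PHASE_OFFSETS = {
    "trot": (0.0, 0.5, 0.5, 0.0),   # diagonal pairs in phase
    "pronk": (0.0, 0.0, 0.0, 0.0),  # all legs in phase
}


def cpg_targets(t, gait, frequency_hz=2.0, amplitude=1.0):
    """Open-loop target per leg at time t (seconds), one sine oscillator per leg."""
    offsets = GAIT_PHASE_OFFSETS[gait]
    return [
        amplitude * math.sin(2.0 * math.pi * (frequency_hz * t + offset))
        for offset in offsets
    ]


def closed_loop_targets(t, gait, corrections, **cpg_params):
    """Add per-leg corrective actions (e.g. from an RL policy) to the CPG output."""
    base_targets = cpg_targets(t, gait, **cpg_params)
    return [base + delta for base, delta in zip(base_targets, corrections)]
```

In this sketch, `frequency_hz` and `amplitude` play the role of the parameters tuned in the open-loop stage, and `corrections` is where a learned policy would feed in its output.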

@araffin Nice work! I was wondering what optimization algorithm you were using that is sample-efficient enough for real trials. The paper says black-box optimization (BBO). Is that a way of saying "we won't disclose", or am I missing something obvious?
@maxy "To tune the parameters of the open-loop controller, we use Bayesian optimization, as it is sample efficient in low-dimensional search spaces [23]. We use the TPE [39] implementation from the Optuna [40] library."
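
For readers who have not used it, a minimal sketch of tuning two parameters with Optuna's TPE sampler might look like the following. This is not the paper's code: the parameter names, their ranges, the trial count and the toy `rollout_reward` function are hypothetical stand-ins. On the robot, the objective would instead run one trial of the open-loop gait and return a measured score.

```python
def rollout_reward(frequency_hz, step_length_m):
    """Toy stand-in for one robot trial (hypothetical): best score at (2.0, 0.1)."""
    return -((frequency_hz - 2.0) ** 2) - 100.0 * (step_length_m - 0.1) ** 2


def tune_open_loop(n_trials=50, seed=0):
    """Search the two hypothetical gait parameters with Optuna's TPE sampler."""
    # Third-party dependency, imported here so the toy objective above
    # can be used without Optuna installed.
    import optuna

    def objective(trial):
        # Each trial proposes one candidate parameter set within the given bounds.
        frequency_hz = trial.suggest_float("frequency_hz", 0.5, 4.0)
        step_length_m = trial.suggest_float("step_length_m", 0.0, 0.2)
        return rollout_reward(frequency_hz, step_length_m)

    sampler = optuna.samplers.TPESampler(seed=seed)
    study = optuna.create_study(direction="maximize", sampler=sampler)
    study.optimize(objective, n_trials=n_trials)
    return study.best_params, study.best_value


if __name__ == "__main__":
    best_params, best_value = tune_open_loop()
    print(best_params, best_value)
```

The objective is treated as a black box: the sampler only sees parameter values and the returned score, which is why the same setup is described as both Bayesian optimization and black-box optimization.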

@araffin Thanks. After re-reading, it's clear.

I didn't connect that section to the term BBO on my first read. (Later I searched for "BBO", expecting to find the algorithm name somewhere nearby.)