Abstract
Maintaining balance under external hand forces is critical for humanoid bimanual manipulation, where interaction forces propagate through the kinematic chain and constrain the feasible manipulation envelope. We propose FAME, a force-adaptive reinforcement learning framework that conditions a standing policy on a learned latent context encoding upper-body joint configuration and bimanual interaction forces. During training, we apply diverse, spherically sampled 3D forces on each hand to inject disturbances in simulation together with an upper-body pose curriculum, exposing the policy to manipulation-induced perturbations across continuously varying arm configurations. At deployment, interaction forces are estimated from the robot dynamics and fed to the same encoder, enabling online adaptation without wrist force/torque sensors. In simulation across five fixed arm configurations with randomized hand forces and commanded base heights, FAME improves mean standing success to 73.84%, compared to 51.40% for the curriculum-only baseline and 29.44% for the base policy. We further deploy the learned policy on a full-scale Unitree H12 humanoid and evaluate robustness in representative load-interaction scenarios, including asymmetric single-arm load and symmetric bimanual load.
Method
Our framework enables force-adaptive standing by conditioning a base policy on a learned latent representation of upper-body dynamics. The system consists of two main components: (i) an upper-body context encoder and (ii) a base standing policy.
The encoder processes upper-body joint states together with hand interaction forces and produces a latent context vector that captures the disturbance induced by bimanual manipulation. This latent variable conditions the base policy, allowing it to adapt its lower-body control strategy according to the current upper-body loading condition.
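The conditioning pattern above can be sketched as follows. This is a minimal, illustrative stand-in: the encoder is shown as a single linear map over plain Python lists, whereas the paper's encoder is presumably a learned neural network, and all dimensions, weights, and variable names here are assumptions, not the actual architecture.

```python
import random

def linear(x, W, b):
    # y = W x + b over plain lists (hypothetical stand-in for a learned encoder).
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def encode_context(upper_joints, hand_forces, W, b):
    """Map upper-body joint state and bimanual hand forces to a latent context z."""
    return linear(upper_joints + hand_forces, W, b)

def policy_input(proprio, latent):
    """The standing policy consumes lower-body proprioception concatenated
    with the latent context, so its action depends on upper-body loading."""
    return proprio + latent

# Illustrative dimensions (not from the paper): 4 upper joints, 3D force per hand.
rng = random.Random(0)
n_joint, n_force, n_latent = 4, 6, 3
W = [[rng.uniform(-0.1, 0.1) for _ in range(n_joint + n_force)] for _ in range(n_latent)]
b = [0.0] * n_latent

z = encode_context([0.1] * n_joint, [5.0, 0.0, -2.0, 4.0, 1.0, -1.0], W, b)
obs = policy_input([0.0] * 8, z)
```

The key design point is that the policy never sees raw forces directly; it sees only the latent `z`, so the same policy weights adapt online as the encoder input changes.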
During training, we expose the policy to diverse manipulation-induced disturbances by (i) sampling and applying external forces at the hands and (ii) randomizing upper-body target poses through an upper-body pose curriculum that gradually expands the pose range as standing quality improves.
Baselines: We consider three training variants to isolate the effect of curriculum learning and force-conditioned latent adaptation. Base trains a standing policy with the upper body held at a fixed nominal posture (no upper-body pose curriculum) and without any latent context conditioning. +Curr adds the upper-body pose curriculum, exposing the policy to continuously varying upper-body joint targets during training, but still does not provide the encoder latent. FAME combines both components: the upper-body pose curriculum and the upper-body context encoder, whose latent context conditions the standing policy for force-adaptive balance.
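At deployment, FAME feeds the encoder with interaction forces estimated from the robot dynamics rather than wrist force/torque sensors. A common way to do this is to solve J^T f = tau_ext, where tau_ext is the joint-torque residual after subtracting the model-predicted torques. The sketch below shows this for a planar two-link arm as an illustrative stand-in for the full-body estimator; link lengths, the forward helper, and all names are assumptions.

```python
import math

def planar_jacobian(q, l1=0.3, l2=0.3):
    """Jacobian of the hand position for a 2-link planar arm
    (link lengths are illustrative assumptions)."""
    q1, q2 = q
    s1, c1 = math.sin(q1), math.cos(q1)
    s12, c12 = math.sin(q1 + q2), math.cos(q1 + q2)
    return [[-l1 * s1 - l2 * s12, -l2 * s12],
            [ l1 * c1 + l2 * c12,  l2 * c12]]

def torque_from_force(q, f):
    """Forward map tau = J^T f, used here only to build a test residual."""
    J = planar_jacobian(q)
    return [J[0][0] * f[0] + J[1][0] * f[1],
            J[0][1] * f[0] + J[1][1] * f[1]]

def estimate_hand_force(q, tau_residual):
    """Recover the planar hand force from the joint-torque residual by
    solving J^T f = tau_ext with the closed-form 2x2 inverse."""
    J = planar_jacobian(q)
    a, b = J[0][0], J[1][0]   # rows of J^T
    c, d = J[0][1], J[1][1]
    det = a * d - b * c
    t0, t1 = tau_residual
    fx = ( d * t0 - b * t1) / det
    fy = (-c * t0 + a * t1) / det
    return [fx, fy]

q = (0.3, 0.5)
f_true = [10.0, -5.0]
tau_ext = torque_from_force(q, f_true)   # stands in for tau_measured - tau_model
f_hat = estimate_hand_force(q, tau_ext)
```

In the full system this estimate, together with the upper-body joint state, is what the context encoder consumes online.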
FAME system overview.
Real Experiments
Real experiment results on a full-scale humanoid.
BibTeX
@article{YourPaperKey2024,
  title={Your Paper Title Here},
  author={First Author and Second Author and Third Author},
  journal={Conference/Journal Name},
  year={2024},
  url={https://your-domain.com/your-project-page}
}