FAME: Force-Adaptive RL for Expanding the Manipulation Envelope of a Full-Scale Humanoid

Institution Name
Conference name and year

*Indicates Equal Contribution

video overview

Abstract

Maintaining balance under external hand forces is critical for humanoid bimanual manipulation, where interaction forces propagate through the kinematic chain and constrain the feasible manipulation envelope. We propose FAME, a force-adaptive reinforcement learning framework that conditions a standing policy on a learned latent context encoding upper-body joint configuration and bimanual interaction forces. During training, we apply diverse, spherically sampled 3D forces on each hand to inject disturbances in simulation together with an upper-body pose curriculum, exposing the policy to manipulation-induced perturbations across continuously varying arm configurations. At deployment, interaction forces are estimated from the robot dynamics and fed to the same encoder, enabling online adaptation without wrist force/torque sensors. In simulation across five fixed arm configurations with randomized hand forces and commanded base heights, FAME improves mean standing success to 73.84%, compared to 51.40% for the curriculum-only baseline and 29.44% for the base policy. We further deploy the learned policy on a full-scale Unitree H1-2 humanoid and evaluate robustness in representative load-interaction scenarios, including asymmetric single-arm load and symmetric bimanual load.

Method

Our framework enables force-adaptive standing by conditioning a base policy on a learned latent representation of upper-body dynamics. The system consists of two main components: (i) an upper-body context encoder and (ii) a base standing policy.

The encoder processes upper-body joint states together with hand interaction forces and produces a latent context vector that captures the disturbance induced by bimanual manipulation. This latent variable conditions the base policy, allowing it to adapt its lower-body control strategy to the current upper-body loading condition.
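This conditioning scheme can be sketched with two small MLPs, one for the context encoder and one for the standing policy. All dimensions, layer sizes, and names here (e.g. `ContextEncoder`-style inputs, a 45-dimensional proprioceptive observation, an 8-dimensional latent) are illustrative assumptions, not the paper's exact architecture:

```python
import numpy as np

def mlp(sizes, rng):
    """Randomly initialized MLP weights with tanh hidden activations."""
    return [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(layers, x):
    for i, (W, b) in enumerate(layers):
        x = x @ W + b
        if i < len(layers) - 1:
            x = np.tanh(x)
    return x

rng = np.random.default_rng(0)

# Illustrative upper-body state: 14 joint positions + velocities,
# plus a 3D interaction force at each hand.
upper_q, upper_dq = rng.standard_normal(14), rng.standard_normal(14)
f_left, f_right = rng.standard_normal(3), rng.standard_normal(3)

# (i) Context encoder: upper-body state + hand forces -> latent context z.
encoder = mlp([14 + 14 + 3 + 3, 64, 8], rng)
z = forward(encoder, np.concatenate([upper_q, upper_dq, f_left, f_right]))

# (ii) Base standing policy: proprioception concatenated with z
# -> lower-body joint targets (12 here, an assumed leg DoF count).
proprio = rng.standard_normal(45)
policy = mlp([45 + 8, 128, 12], rng)
action = forward(policy, np.concatenate([proprio, z]))
```

The key design point is that the policy never sees raw forces directly; it only observes the latent `z`, so the same policy can be driven by simulated forces during training and by estimated forces at deployment.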

During training, we expose the policy to diverse manipulation-induced disturbances by (i) sampling and applying external forces at the hands and (ii) randomizing upper-body target poses through an upper-body pose curriculum that gradually expands the pose range as standing quality improves.
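The two training-time mechanisms above can be sketched as follows. The force magnitude limit, curriculum step size, and success threshold are illustrative assumptions; only the structure (uniform spherical force directions, pose range that widens with standing success) follows the text:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_hand_force(max_mag):
    """Uniform direction on the unit sphere, uniform magnitude in [0, max_mag] N."""
    d = rng.standard_normal(3)
    d /= np.linalg.norm(d)
    return d * rng.uniform(0.0, max_mag)

class PoseCurriculum:
    """Gradually expand the sampled upper-body pose range as standing improves."""
    def __init__(self, nominal, full_range, step=0.1):
        self.nominal = nominal        # nominal upper-body joint targets (rad)
        self.full_range = full_range  # max deviation per joint (rad)
        self.scale = 0.0              # fraction of the full range currently used
        self.step = step

    def sample_target(self):
        lo = self.nominal - self.scale * self.full_range
        hi = self.nominal + self.scale * self.full_range
        return rng.uniform(lo, hi)

    def update(self, success_rate, threshold=0.8):
        # Widen the range only when standing quality is good enough.
        if success_rate > threshold:
            self.scale = min(1.0, self.scale + self.step)

nominal = np.zeros(14)
curr = PoseCurriculum(nominal, full_range=np.full(14, 1.5))
for _ in range(5):
    curr.update(success_rate=0.9)   # e.g. five evaluation rounds above threshold
f_left, f_right = sample_hand_force(50.0), sample_hand_force(50.0)
target = curr.sample_target()
```

Normalizing a standard Gaussian sample gives a direction distributed uniformly on the sphere, so no hand direction is over-represented during force injection.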

Baselines: We consider three training variants to isolate the effect of curriculum learning and force-conditioned latent adaptation. Base trains a standing policy with the upper body held at a fixed nominal posture (no upper-body pose curriculum) and without any latent context conditioning. +Curr adds the upper-body pose curriculum, exposing the policy to continuously varying upper-body joint targets during training, but still does not provide the encoder latent. FAME combines both components: the upper-body pose curriculum and the upper-body context encoder, whose latent context conditions the standing policy for force-adaptive balance.
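At deployment, FAME feeds the encoder with forces estimated from the robot dynamics rather than measured by wrist sensors. The paper does not detail the estimator here; a hedged sketch of one standard approach is a least-squares fit of the hand force from residual joint torques through the arm Jacobian (the 7-DoF arm, the random Jacobian, and the simulated residual are all illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# For an illustrative 7-DoF arm, the residual torque
# tau_res = tau_measured - tau_model relates to the hand force f
# via tau_res = J^T f (translational 3x7 hand Jacobian J).
J = rng.standard_normal((3, 7))        # assumed known from kinematics
f_true = np.array([5.0, -2.0, 10.0])   # ground-truth force, used only for this demo
tau_res = J.T @ f_true                 # simulated residual joint torques

# Least-squares force estimate from the residual torques.
f_hat, *_ = np.linalg.lstsq(J.T, tau_res, rcond=None)
```

Because the policy was trained on the latent produced from simulated forces, the estimate only needs to land in the same distribution the encoder saw in training, not be sensor-accurate.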

FAME system overview: architecture diagram and With/Without FAME comparison

FAME system overview.

Real Experiments

Real experiment results: bipedal robot under loading and disturbance conditions

Real experiment results on a full-scale humanoid.

BibTeX

@article{YourPaperKey2024,
  title={Your Paper Title Here},
  author={First Author and Second Author and Third Author},
  journal={Conference/Journal Name},
  year={2024},
  url={https://your-domain.com/your-project-page}
}