Abstract
Maintaining balance under external hand forces is critical for humanoid bimanual manipulation, where interaction forces propagate through the kinematic chain and constrain the feasible manipulation envelope. We propose FAME, a force-adaptive reinforcement learning framework that conditions a standing policy on a learned latent context encoding upper-body joint configuration and bimanual interaction forces. During training, we apply diverse, spherically sampled 3D forces on each hand to inject disturbances in simulation together with an upper-body pose curriculum, exposing the policy to manipulation-induced perturbations across continuously varying arm configurations. At deployment, interaction forces are estimated from the robot dynamics and fed to the same encoder, enabling online adaptation without wrist force/torque sensors. In simulation across five fixed arm configurations with randomized hand forces and commanded base heights, FAME improves mean standing success to 73.84%, compared to 51.40% for the curriculum-only baseline and 29.44% for the base policy. We further deploy the learned policy on a full-scale Unitree H12 humanoid and evaluate robustness in representative load-interaction scenarios, including asymmetric single-arm load and symmetric bimanual load.
FAME system overview.
Method
Our framework enables force-adaptive standing by conditioning a base policy on a learned latent representation of upper-body dynamics. The system consists of two main components: (i) an upper-body context encoder and (ii) a base standing policy.
The encoder processes upper-body joint states together with hand interaction forces and produces a latent context vector that captures the disturbance induced by bimanual manipulation. This latent variable conditions the base policy, allowing it to adapt its lower-body control strategy according to the current upper-body loading condition.
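The paper does not specify the encoder or policy architectures; a minimal sketch of the conditioning scheme, assuming an MLP encoder whose latent is concatenated into the policy observation (all layer sizes, joint counts, and variable names here are illustrative assumptions, shown with random placeholder weights rather than trained ones):

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(sizes):
    """Random-weight MLP layers (placeholder for trained weights)."""
    return [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(layers, x):
    for i, (W, b) in enumerate(layers):
        x = x @ W + b
        if i < len(layers) - 1:
            x = np.tanh(x)
    return x

# Hypothetical dimensions: 14 upper-body joints (positions + velocities)
# plus a 3D interaction force per hand -> 34-D encoder input.
N_UPPER = 14
LATENT = 8
encoder = mlp([2 * N_UPPER + 6, 64, LATENT])   # upper-body context encoder
policy  = mlp([48 + LATENT, 256, 256, 12])     # base standing policy (12 leg DoF assumed)

q_upper  = rng.standard_normal(N_UPPER)        # upper-body joint positions
dq_upper = rng.standard_normal(N_UPPER)        # upper-body joint velocities
f_hands  = rng.standard_normal(6)              # estimated left/right hand forces

# Encoder output conditions the lower-body policy alongside proprioception.
z = forward(encoder, np.concatenate([q_upper, dq_upper, f_hands]))
obs = rng.standard_normal(48)                  # proprioceptive observation (assumed 48-D)
action = forward(policy, np.concatenate([obs, z]))
print(action.shape)   # (12,)
```

At deployment the only change is the source of `f_hands`: forces estimated from the robot dynamics replace the simulator's applied forces, while the encoder and policy stay fixed.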
During training, we expose the policy to diverse manipulation-induced disturbances by (i) sampling and applying external forces at the hands and (ii) randomizing upper-body target poses through an upper-body pose curriculum that gradually expands the pose range as standing quality improves.
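The two disturbance mechanisms above can be sketched as follows; the force magnitude cap, curriculum step size, and promotion threshold are illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_hand_force(f_max):
    """Spherically sampled 3D force: uniform direction on the unit
    sphere scaled by a random magnitude in [0, f_max]."""
    d = rng.standard_normal(3)
    d /= np.linalg.norm(d)
    return rng.uniform(0.0, f_max) * d

class PoseCurriculum:
    """Expand the upper-body target-pose range as standing quality improves.

    Scale, step, and the promotion criterion are hypothetical.
    """
    def __init__(self, step=0.1, max_scale=1.0):
        self.scale, self.step, self.max_scale = 0.1, step, max_scale

    def update(self, success_rate):
        if success_rate > 0.8:          # assumed promotion criterion
            self.scale = min(self.scale + self.step, self.max_scale)

    def sample_targets(self, q_lo, q_hi, q_nominal):
        """Sample joint targets from a range that widens with the curriculum."""
        lo = q_nominal + self.scale * (q_lo - q_nominal)
        hi = q_nominal + self.scale * (q_hi - q_nominal)
        return rng.uniform(lo, hi)

# Per-episode disturbances: one force per hand, one upper-body target pose.
f_left, f_right = sample_hand_force(50.0), sample_hand_force(50.0)
print(np.linalg.norm(f_left))   # magnitude in [0, 50] N
```

Sampling the direction from an isotropic Gaussian and normalizing gives a uniform distribution over the sphere, so pushes and pulls in every direction are equally likely.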
Baselines: We consider three training variants to isolate the effect of curriculum learning and force-conditioned latent adaptation. Base trains a standing policy with the upper body held at a fixed nominal posture (no upper-body pose curriculum) and without any latent context conditioning. +Curr adds the upper-body pose curriculum, exposing the policy to continuously varying upper-body joint targets during training, but still does not provide the encoder latent. FAME combines both components: the upper-body pose curriculum and the upper-body context encoder, whose latent context conditions the standing policy for force-adaptive balance.
Real Experiments
Real experiment results on a full-scale humanoid.
BibTeX
@article{2603.08961,
title={FAME: Force-Adaptive RL for Expanding the Manipulation Envelope
of a Full-Scale Humanoid},
author={Niraj Pudasaini and Yutong Zhang and Jensen Lavering},
journal={preprint},
year={2026}
}