CEI : A Unified Interface for Cross-Embodiment Visuomotor Policy Learning in 3D Space

Tong Wu1†, Shoujie Li1,2†, Junhao Gong1, Changqing Guo1, Xingting Li1, Shilong Mu3, Wenbo Ding1
†Equal contribution
1Shenzhen International Graduate School, Tsinghua University
2School of Mechanical and Aerospace Engineering, Nanyang Technological University
3Xspark AI

CEI transfers robot manipulation data across different embodiments.

Abstract

Teaser

Robotic foundation models trained on large-scale manipulation datasets have shown promise in learning generalist policies, but they often overfit to specific viewpoints, robot arms, and especially parallel-jaw grippers due to dataset biases. To address this limitation, we propose the Cross-Embodiment Interface (CEI), a framework for cross-embodiment learning that enables the transfer of demonstrations across different robot arm and end-effector morphologies. CEI introduces the concept of functional similarity, quantified using the Directional Chamfer Distance. It then aligns robot trajectories through gradient-based optimization and synthesizes observations and actions for unseen robot arms and end-effectors. In experiments, CEI transfers data and policies from a Franka Panda robot to 16 different embodiments across 3 tasks in simulation, and supports bidirectional transfer between a UR5+AG95 gripper robot and a UR5+Xhand robot across 6 real-world tasks, achieving an average transfer ratio of 82.4%. Finally, we demonstrate that CEI can also be extended with spatial generalization and multimodal motion generation capabilities using our proposed techniques.

Method

CEI leverages a novel notion of functional similarity, which captures shared object interaction behaviors across different end-effectors, to align demonstrations from a source embodiment to a target embodiment. This is accomplished by quantifying functional similarity using the Directional Chamfer Distance between manually selected functional representations, aligning trajectories via gradient-based optimization, and synthesizing observations and actions for the target robot.
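As a concrete illustration, the one-directional Chamfer distance at the core of this similarity measure can be expressed in a few lines. The sketch below is a minimal PyTorch version operating on two point sets; the function name and the use of squared distances are our assumptions rather than the exact implementation.

```python
# Minimal sketch of a Directional Chamfer Distance between two functional
# representations, each given as an (N, 3) / (M, 3) tensor of 3D points.
import torch


def directional_chamfer_distance(source_pts: torch.Tensor,
                                 target_pts: torch.Tensor) -> torch.Tensor:
    """One-directional Chamfer distance from source_pts (N, 3) to target_pts (M, 3).

    For every source point, take the squared distance to its nearest target
    point and average over all source points.
    """
    # Pairwise squared distances: (N, M)
    diff = source_pts.unsqueeze(1) - target_pts.unsqueeze(0)
    sq_dist = (diff ** 2).sum(dim=-1)
    # Nearest-neighbor distance for each source point, then the mean
    return sq_dist.min(dim=1).values.mean()
```

Because this measure is differentiable, it can in principle be backpropagated through the forward kinematics of the target embodiment, which is what enables the gradient-based trajectory alignment described above.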

Cross-Embodiment Transfer in Simulation

Simulation Environment

Here we demonstrate the cross-embodiment transfer capability of CEI in simulation. In 3 different tasks, the source demonstrations in the left column can be transferred to 16 target embodiments shown in the right column.


Evaluation Videos (👆Click to Select!)

Source Embodiment

Target Embodiment

Quantitative Results

Success rates across different embodiments
Table I: Success rates of CEI across the 16 different embodiment combinations.

These results indicate that despite variations in kinematics and morphology, CEI is capable of bridging the cross-embodiment gap by leveraging functional similarity. We further observe that the difficulty of cross-embodiment transfer increases with the complexity and dexterity requirements of the task.

Ablation Study on Cross-Embodiment Techniques

Ablation study on functional similarity
Table II: Ablation study on trajectory alignment across tasks and embodiments.

The results show that CEI without the Directional Chamfer Distance achieves an average success rate of only 32%, roughly half that of the full CEI. BMS completely failed in the PickCube and StackCube tasks, as it is challenging to manually determine optimal open and close poses, and the linear interpolation often leads to unstable grasps. Moreover, although BMS constrains the target embodiment to an opening degree similar to that of the source, discrepancies between the two end-effectors (e.g., the distance from the grasp point to the end-effector frame) result in frequent failures.

Ablation Study on Functional Representations

Sensitivity analysis of the functional representations
Table III: Sensitivity analysis of the functional representations.

We find that although we select three different functional representations, their success rates remain comparable, suggesting that CEI is robust to such variations and exhibits low sensitivity to the choice of functional representation.

Ablation Study on Observation Synthesis

Ablation study on observation synthesis
Table IV: Policy evaluation on synthesized data generated by CEI.

Table IV presents the policy evaluation results using synthesized cross-embodiment data. Policies trained without any augmentation fail to complete the tasks, demonstrating the necessity of targeted data augmentation for cross-embodiment generalization. Additionally, removing Inference Augmentation results in a 22% drop in success rate.

Bidirectional Transfer in Real World

Real World Tasks

We evaluate bidirectional transfer between the AG95 gripper and Xhand on 6 real-world tasks: PushCube, OpenDrawer, PlaceBird, PickCup, PackageBread, and InsertFlower. For the first three tasks, we collect 25 AG95 demonstrations and transfer to Xhand; for the latter three, we collect 25 Xhand demonstrations and transfer to AG95. DP3 policies are trained on CEI-generated data and evaluated over 10 trials per task.

Generated Data Visualization (👆Click to Select!)

Below are examples of data generated by CEI for different tasks. Select a task to view the corresponding visualization.

Source Data

Target Data

Qualitative Evaluation (👆Click to Select!)

Source Embodiment

Target Embodiment

Quantitative Results

Success rates across 6 tasks
Table V: Real world evaluation.

Table V demonstrates the bidirectional transfer capabilities of CEI on real-world tasks. We compare policies trained on synthesized data against those trained on source data. Overall, CEI reaches an average success rate of 70%, with a transfer ratio (success rate of CEI divided by that of the source embodiment) of 82.4%.
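Written explicitly, the transfer ratio reported in Table V is

\[
\text{Transfer Ratio} = \frac{\mathrm{SR}_{\text{CEI}}}{\mathrm{SR}_{\text{source}}},
\]

where \(\mathrm{SR}_{\text{CEI}}\) is the success rate of the policy trained on CEI-synthesized data for the target embodiment and \(\mathrm{SR}_{\text{source}}\) is that of the policy trained on the source-embodiment data.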

Time Cost of Transfer

Time cost for generating real-world demonstrations
Table VI: Time cost for generating real-world demonstrations.

CEI requires significantly less time than MimicGen, which depends heavily on online execution. DemoGen generates hundreds of demonstrations in one second, whereas CEI requires several minutes because it relies on gradient-based optimization.

Adaptation to External Disturbances

Failure Cases

Failure Cases

In summary, the majority of failure cases are attributable to unstable contacts or grasps. This is an inherent characteristic of geometry-based synthesis approaches, which may not fully account for physical dynamics. As discussed in the Conclusion, integrating tactile sensing would be an important direction for addressing this limitation.

Broader Applications

Spatial Generalization

Spatial Generalization

For spatial generalization, we first apply a transform with clipped linear growth to the functional representation trajectory. The target embodiment is subsequently aligned to the augmented trajectory through the standard CEI optimization procedure. The augmented point cloud is then obtained by applying the transform to the object point cloud and synthesizing the robot point cloud according to the augmented trajectory. Results show that our approach extends the policy to press the button over a wide area of the table, rather than being limited to the original position.
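To make the augmentation step concrete, the sketch below blends a sampled planar offset into the functional-representation trajectory with a weight that grows linearly and is clipped at 1, so early waypoints stay close to the original demonstration while later waypoints are fully shifted. The blending schedule, names, and shapes are assumptions, not the exact implementation.

```python
# Illustrative sketch of a clipped-linear-growth transform applied to a
# functional-representation trajectory for spatial augmentation.
import numpy as np


def augment_trajectory(traj_xyz: np.ndarray,
                       offset_xy: np.ndarray,
                       ramp_steps: int) -> np.ndarray:
    """traj_xyz: (T, 3) functional-representation positions.
    offset_xy: (2,) planar offset sampled within the workspace.
    ramp_steps: number of steps over which the offset ramps from 0 to 1.
    """
    T = traj_xyz.shape[0]
    # Linear ramp clipped to [0, 1]
    weights = np.clip(np.arange(T) / max(ramp_steps, 1), 0.0, 1.0)
    augmented = traj_xyz.copy()
    augmented[:, :2] += weights[:, None] * offset_xy[None, :]
    return augmented


# The object point cloud is shifted by the fully ramped offset, and the
# robot point cloud is re-synthesized along the augmented trajectory.
```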

Below are 10 video examples demonstrating CEI's spatial generalization ability. Click a button to play the video!

Multimodal Data Generation

Multimodal Data Generation

During data synthesis, we observe that initializing the embodiment from different joint configurations leads to multimodal alignment outcomes, as the functional correspondences between parallel grippers and dexterous hands are inherently multimodal. This property can be leveraged by manipulating the joint initialization through an Elite-based Initialization Strategy (EIS).
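As a rough sketch of how such an initialization strategy could surface different modes, the snippet below runs the CEI alignment from many random joint initializations and keeps the lowest-cost (elite) solutions; the selection rule and the `align_trajectory` interface are assumptions rather than the actual EIS implementation.

```python
# Hedged sketch of an elite-based initialization scheme for multimodal
# alignment. `align_trajectory` stands in for the gradient-based CEI
# alignment and is assumed to return (aligned_joints, alignment_cost).
import numpy as np


def elite_initializations(align_trajectory, joint_low, joint_high,
                          num_samples: int = 32, num_elites: int = 4,
                          seed=None):
    """Run alignment from many random joint initializations and keep the
    elite (lowest-cost) solutions, which typically land in different modes
    of the functional correspondence."""
    rng = np.random.default_rng(seed)
    results = []
    for _ in range(num_samples):
        q_init = rng.uniform(joint_low, joint_high)
        aligned, cost = align_trajectory(q_init)
        results.append((cost, aligned))
    results.sort(key=lambda item: item[0])
    return [aligned for _, aligned in results[:num_elites]]
```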

CEI with RGB Inputs

CEI with RGB Inputs

Our method is compatible with RGB inputs by replacing the point cloud editing module with an image augmentation module. We employ the Segment Anything Model (SAM) to mask out the source robot, followed by ProPainter to reconstruct the background and object. To render the target embodiment, we retrieve the visual observation that best matches the aligned joint states from a pre-collected image-action library, using the lowest L2 distance as the selection metric, and apply SAM to obtain the segmented robot. Finally, we composite the retrieved target robot into the inpainted background to generate the synthetic 2D observation.
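The retrieval and compositing step can be sketched as follows, assuming the SAM masks and the ProPainter-inpainted background have already been computed offline; the array shapes and names are illustrative, not the exact implementation.

```python
# Illustrative sketch of nearest-neighbor retrieval from an image-action
# library followed by compositing the segmented target robot onto the
# inpainted background.
import numpy as np


def retrieve_and_composite(aligned_joints: np.ndarray,
                           library_joints: np.ndarray,
                           library_images: np.ndarray,
                           library_masks: np.ndarray,
                           inpainted_bg: np.ndarray) -> np.ndarray:
    """aligned_joints: (D,) target joint state from CEI alignment.
    library_joints: (K, D) joint states of the pre-collected library.
    library_images / library_masks: (K, H, W, 3) frames and (K, H, W)
    SAM segmentation masks of the target robot.
    inpainted_bg: (H, W, 3) background with the source robot removed.
    """
    # Nearest library frame by L2 distance in joint space
    idx = np.argmin(np.linalg.norm(library_joints - aligned_joints, axis=1))
    robot_img, robot_mask = library_images[idx], library_masks[idx]
    # Paste the segmented target robot onto the inpainted background
    mask = robot_mask[..., None].astype(bool)
    return np.where(mask, robot_img, inpainted_bg)
```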

Evaluation on PickCube

We selected the simulated PickCube task to validate this pipeline. Qualitative results confirm the pipeline's ability to synthesize visually plausible observations. We subsequently evaluated a 2D diffusion policy trained on the generated data, denoted as CEI-RGB. As illustrated in the figure, CEI-RGB achieves performance comparable to the standard CEI while operating without any depth modality.

Real-world Transfer Results

We also selected PickCup as the representative real-world task to demonstrate transfer from UR5+AG95 to UR5+Xhand. Although we have not yet trained a 2D diffusion policy on this specific task, the high realism of the synthesized images and the simulation results provide preliminary evidence that the reliance on depth cameras could be effectively eliminated.