Generalizable Interaction Recognition for Learning from Demonstration Using Wrist and Object Trajectories

Journal

PUBLICATION DATE: 10 October, 2025

PUBLICATION AUTHOR/S: Jagannatha Charjee Pyaraka, Mats Isaksson, John McCormick, Sheila Sutjipto and Fouad Sukkar

Learning from Demonstration (LfD) enables robots to acquire manipulation skills by observing human actions. However, existing methods often face challenges such as high computational cost, limited generalizability, and a loss of key interaction details. This study presents a compact representation for interaction recognition in LfD that encodes human–object interactions using 2D wrist trajectories and 3D object poses. A lightweight extraction pipeline combines MediaPipe-based wrist tracking with FoundationPose-based 6-DoF object estimation to obtain these trajectories directly from RGB-D video without specialized sensors or heavy preprocessing. Experiments on the GRAB and FPHA datasets show that the representation effectively captures task-relevant interactions, achieving 94.6% accuracy on GRAB and 96.0% on FPHA with well-calibrated probability predictions. Both Bidirectional Long Short-Term Memory (Bi-LSTM) with attention and Transformer architectures deliver consistent performance, confirming robustness and generalizability. The method achieves sub-second inference, a memory footprint under 1 GB, and reliable operation on both GPU and CPU platforms, enabling deployment on edge devices such as NVIDIA Jetson. By bridging pose-based and object-centric paradigms, this approach offers a compact and efficient foundation for scalable robot learning while preserving essential spatiotemporal dynamics.
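
To make the idea of a compact trajectory representation concrete, the following is a minimal sketch of how per-frame 2D wrist coordinates and 6-DoF object poses might be packed into a fixed-length sequence for a Bi-LSTM or Transformer classifier. It is not the authors' implementation: the 9-value layout (wrist x, y; object position; object orientation as a quaternion), the zero-padding scheme, and the names `Frame`, `frame_to_vector`, and `build_sequence` are all assumptions made for illustration, and the sample values are invented.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed layout: 2 (wrist x, y) + 3 (object position) + 4 (object quaternion).
FEATURE_DIM = 9


@dataclass
class Frame:
    """One video frame of tracked data (hypothetical container)."""
    wrist_xy: Tuple[float, float]                      # 2D wrist position in the image
    obj_position: Tuple[float, float, float]           # 3D object translation
    obj_quaternion: Tuple[float, float, float, float]  # object orientation (x, y, z, w)


def frame_to_vector(frame: Frame) -> List[float]:
    """Concatenate wrist and object-pose values into one flat feature vector."""
    return [*frame.wrist_xy, *frame.obj_position, *frame.obj_quaternion]


def build_sequence(frames: List[Frame], length: int) -> Tuple[List[List[float]], List[int]]:
    """Truncate or zero-pad a demonstration to `length` frames.

    Returns the feature sequence and a mask (1 = real frame, 0 = padding)
    so that a downstream model can ignore the padded steps.
    """
    vectors = [frame_to_vector(f) for f in frames[:length]]
    mask = [1] * len(vectors)
    while len(vectors) < length:
        vectors.append([0.0] * FEATURE_DIM)
        mask.append(0)
    return vectors, mask


# Invented two-frame demonstration, padded to four time steps.
demo = [
    Frame((0.40, 0.55), (0.10, 0.00, 0.50), (0.0, 0.0, 0.0, 1.0)),
    Frame((0.42, 0.53), (0.10, 0.02, 0.50), (0.0, 0.0, 0.0, 1.0)),
]
sequence, mask = build_sequence(demo, length=4)
```

With only nine numbers per frame, a demonstration of a few hundred frames stays in the kilobyte range, which is consistent with the low memory footprint the abstract reports, although the actual feature layout used in the paper may differ.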

RELATED PROGRAM/S:
Biomimic Cobots
