POSTED: 31 Oct, 2025
Learning from Demonstration (LfD) enables robots to acquire manipulation skills by observing human actions. However, existing methods often suffer from high computational cost, limited generalizability, and the loss of key interaction details. This study presents a compact representation for interaction recognition in LfD that encodes human–object interactions as 2D wrist trajectories paired with 3D object poses. A lightweight extraction pipeline combines MediaPipe-based wrist tracking with FoundationPose-based 6-DoF object pose estimation to obtain these trajectories directly from RGB-D video, without specialized sensors or heavy preprocessing. Experiments on the GRAB and FPHA datasets show that the representation captures task-relevant interactions, achieving 94.6% accuracy on GRAB and 96.0% on FPHA with well-calibrated probability predictions. Both a Bidirectional Long Short-Term Memory (Bi-LSTM) network with attention and a Transformer perform consistently on this input, supporting the representation's robustness and generalizability across model architectures. The method achieves sub-second inference and a memory footprint under 1 GB, and runs reliably on both GPU and CPU, which makes it suitable for deployment on edge devices such as NVIDIA Jetson. By bridging pose-based and object-centric paradigms, the approach offers a compact, efficient foundation for scalable robot learning while preserving essential spatiotemporal dynamics.
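To make the idea of a "compact representation" more concrete, here is a minimal sketch of how per-frame inputs of this kind could be assembled and pooled. It is not the authors' code, and several details are assumptions: the wrist position is taken as normalized (x, y) image coordinates, as MediaPipe reports landmarks; the object pose is assumed to arrive as a 4x4 homogeneous matrix; each frame is laid out as a hypothetical 14-value vector (2 wrist + 3 translation + 9 rotation entries); sequences are linearly resampled to a fixed length; and attention is shown as simple dot-product pooling over per-timestep hidden states. The abstract does not specify the exact attention variant. The names `frame_features`, `resample` and `attention_pool` are illustrative only.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical helpers illustrating the representation; not the authors' code.
Vector = List[float]


def frame_features(wrist_xy: Tuple[float, float],
                   obj_pose: Sequence[Sequence[float]]) -> Vector:
    """Concatenate a 2D wrist position with a 6-DoF object pose (14 values).

    Assumes wrist_xy is in normalized image coordinates and obj_pose is a 4x4
    homogeneous matrix [R | t; 0 0 0 1] giving the object pose in camera frame.
    """
    if len(obj_pose) != 4 or any(len(row) != 4 for row in obj_pose):
        raise ValueError("obj_pose must be a 4x4 matrix")
    translation = [float(obj_pose[r][3]) for r in range(3)]
    rotation = [float(obj_pose[r][c]) for r in range(3) for c in range(3)]  # row-major R
    return [float(wrist_xy[0]), float(wrist_xy[1]), *translation, *rotation]


def resample(seq: List[Vector], length: int) -> List[Vector]:
    """Linearly resample a variable-length feature sequence to `length` frames.

    Rotation entries are interpolated element-wise, which is only an
    approximation: the result is not re-orthonormalized.
    """
    if not seq or length < 1:
        raise ValueError("need a non-empty sequence and a positive length")
    if len(seq) == 1 or length == 1:
        return [list(seq[0]) for _ in range(length)]
    out = []
    for i in range(length):
        pos = i * (len(seq) - 1) / (length - 1)  # fractional source index
        lo = int(pos)
        hi = min(lo + 1, len(seq) - 1)
        w = pos - lo
        out.append([(1 - w) * a + w * b for a, b in zip(seq[lo], seq[hi])])
    return out


def attention_pool(hidden: List[Vector], query: Vector) -> Tuple[Vector, Vector]:
    """Pool per-timestep hidden states (e.g. Bi-LSTM outputs) into one vector.

    Uses dot-product scores against a query vector (learned in a real model),
    a numerically stable softmax, and a weighted sum of the hidden states.
    """
    if not hidden:
        raise ValueError("hidden must contain at least one timestep")
    scores = [sum(h * q for h, q in zip(state, query)) for state in hidden]
    m = max(scores)  # subtract the max before exponentiating for stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(hidden[0])
    context = [sum(w * state[d] for w, state in zip(weights, hidden)) for d in range(dim)]
    return context, weights


if __name__ == "__main__":
    pose = [[1, 0, 0, 0.1], [0, 1, 0, 0.2], [0, 0, 1, 0.5], [0, 0, 0, 1]]
    frames = [frame_features((0.40, 0.60), pose), frame_features((0.50, 0.55), pose)]
    fixed = resample(frames, 8)
    context, weights = attention_pool(fixed, query=[0.0] * 14)
    print(len(fixed), len(context), round(sum(weights), 6))
```

With a zero query every frame receives equal weight. In a trained model the query would be learned and the pooled states would come from a recurrent or Transformer encoder rather than from the raw features. A quaternion or 6D rotation encoding with proper interpolation may be preferable to interpolating raw matrix entries.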