Enhancing Embodied Object Detection with Spatial Feature Memory
PUBLICATION DATE: 4 April 2025
PUBLICATION AUTHOR/S: Nicolas Harvey Chapman; Christopher Lehnert; Will Browne; Feras Dayoub

Deep learning and large-scale language-image training have produced image object detectors that generalise well to diverse environments and semantic classes. However, existing object detection paradigms are not optimally tailored for the embodied conditions inherent in robotics, where the same objects are repeatedly observed over time. In this setting, detectors that operate on single images or short sequences are likely to produce inconsistent predictions. Motivated by this, we explore whether the embodiment of the detector can be utilised to generate more consistent and reliable detections during repeated observation of a scene. We propose a novel framework that incrementally updates a spatial feature memory while using it as a prior to perform image object detection. By leveraging the embodiment of the robot in this way, raw object detection performance is enhanced by up to 4.12 mAP, and downstream robotic tasks such as semantic mapping and object recall are improved. We also investigate the structure this spatial memory should take, leading to an implementation that aggregates features from the shared language-image embedding space. This approach allows the detector to effectively balance the use of memory and image features, while ensuring that the benefits of language-image pre-training can be enjoyed alongside our spatial memory.
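To make the core idea concrete, the sketch below illustrates one plausible way a spatial feature memory could aggregate language-image embedding features per 3D voxel with a running average and then serve as a prior that is blended with current image features. This is a minimal illustration only, assuming unit-normalised CLIP-style features; the class and function names (SpatialFeatureMemory, update, read_prior, fuse_with_memory) and the simple averaging/blending rules are hypothetical and are not the authors' implementation.

```python
import numpy as np


class SpatialFeatureMemory:
    """Hypothetical voxel-indexed memory of language-image embedding features."""

    def __init__(self, voxel_size: float = 0.1, feat_dim: int = 512):
        self.voxel_size = voxel_size
        self.feat_dim = feat_dim
        self.sums = {}    # voxel index -> running feature sum
        self.counts = {}  # voxel index -> number of observations

    def _key(self, point_xyz: np.ndarray) -> tuple:
        # Discretise a 3D point into its voxel index.
        return tuple(np.floor(point_xyz / self.voxel_size).astype(int))

    def update(self, points_xyz: np.ndarray, feats: np.ndarray) -> None:
        # Incrementally aggregate per-point features into their voxels
        # as the robot makes repeated observations of the scene.
        for p, f in zip(points_xyz, feats):
            k = self._key(p)
            self.sums[k] = self.sums.get(k, np.zeros(self.feat_dim)) + f
            self.counts[k] = self.counts.get(k, 0) + 1

    def read_prior(self, points_xyz: np.ndarray) -> np.ndarray:
        # Return the mean (re-normalised) memory feature for each query point;
        # unseen voxels yield a zero vector, i.e. no prior.
        out = np.zeros((len(points_xyz), self.feat_dim))
        for i, p in enumerate(points_xyz):
            k = self._key(p)
            if k in self.sums:
                mean = self.sums[k] / self.counts[k]
                out[i] = mean / (np.linalg.norm(mean) + 1e-8)
        return out


def fuse_with_memory(image_feats: np.ndarray, memory_feats: np.ndarray,
                     alpha: float = 0.5) -> np.ndarray:
    # Blend current image features with the spatial-memory prior, then
    # re-normalise so the result stays in the shared embedding space.
    fused = (1.0 - alpha) * image_feats + alpha * memory_feats
    return fused / (np.linalg.norm(fused, axis=-1, keepdims=True) + 1e-8)
```

Because both the memory and the image features live in the same language-image embedding space, a blend of this kind can, in principle, still be scored against text embeddings for open-vocabulary detection; the fixed blending weight alpha is an assumption made purely for illustration.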
RELATED PROGRAM/S: Biomimic Cobots