Description
Service robots operating in cluttered human environments such as homes, offices, and schools cannot rely on predefined object arrangements and must continuously update their semantic and spatial estimates while coping with frequent rearrangement. Identifying all objects in cluttered, occlusion-heavy environments, such as shelves, requires selecting informative viewpoints and performing targeted manipulations to reduce uncertainty about object locations, shapes, and categories. We present a unified, manipulation-enhanced semantic mapping framework that addresses this challenge as a partially observable Markov decision process (POMDP) whose high-dimensional belief is represented by an evidential metric-semantic grid map. To efficiently reason about occlusions and manipulation effects, we propose Calibrated Neural-Accelerated Belief Updates (CNABUs): a neural network–based belief propagation model that produces confidence-calibrated predictions for unknown areas. Uncertainty estimates from Dirichlet distributions (for semantic predictions) and Beta distributions (for occupancy) guide active sensor placement via reinforcement learning–based next-best-view planning, and object manipulation via an uncertainty-informed push strategy targeting occlusion-critical objects. By focusing on areas of limited knowledge and selecting actions with high expected information gain, our method minimizes unwanted object displacement and dropping. Our planner substantially improves map completeness and accuracy compared to existing approaches while reducing planning time by 95%. Our approach transfers to real-world cluttered shelves in a zero-shot fashion, demonstrating its robust real-world applicability. This work has been accepted to Robotics: Science and Systems (RSS) 2025, and an extension is currently under submission for HUMANOIDS 2025.
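To make the evidential uncertainty measures concrete, the sketch below shows one common way such per-cell uncertainties can be computed. This is an illustrative example only, not the authors' implementation: the vacuity formula follows standard subjective-logic conventions for Dirichlet evidence, the Beta variance is the textbook expression, and the weighting in `cell_uncertainty` is a hypothetical choice for ranking candidate viewpoints.

```python
import numpy as np

def dirichlet_vacuity(alpha):
    """Vacuity (lack of evidence) of a Dirichlet belief over K semantic
    classes: u = K / sum(alpha). Under a uniform prior (all alpha = 1),
    vacuity is 1; it shrinks as semantic evidence accumulates."""
    alpha = np.asarray(alpha, dtype=float)
    return alpha.size / alpha.sum()

def beta_variance(a, b):
    """Variance of a Beta(a, b) occupancy belief; largest when evidence
    is scarce or conflicting, so it flags poorly observed cells."""
    return (a * b) / ((a + b) ** 2 * (a + b + 1.0))

def cell_uncertainty(alpha, a, b, w_sem=0.5):
    """Hypothetical combined score for prioritizing cells during
    next-best-view selection; Beta variance is scaled by 4 so both
    terms lie in [0, 1]."""
    return w_sem * dirichlet_vacuity(alpha) + (1.0 - w_sem) * 4.0 * beta_variance(a, b)

# Example: a fully unobserved cell (uniform Dirichlet and Beta priors)
# scores higher than a well-observed one.
unseen = cell_uncertainty([1.0, 1.0, 1.0], 1.0, 1.0)
seen = cell_uncertainty([9.0, 1.0, 1.0], 20.0, 2.0)
```

Cells with high combined uncertainty would then be natural targets for the viewpoint planner or, when occluded by other objects, for the push strategy described above.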