Sep 3 – 4, 2025
Hörsaalgebäude, Campus Poppelsdorf, Universität Bonn
Europe/Berlin timezone

JSON is all you need (for visual symbolic planning)

Not scheduled
1h 30m
Open Space (first floor)

Poster · Embodied AI Poster Session

Speaker

Sami Azirar

Description

Traditional Task and Motion Planning (TAMP) systems integrate physics simulators for motion planning with discrete symbolic models for task planning. However, because these symbolic models are not derived from data, they must be meticulously handcrafted, requiring manually designed classifiers to bridge the gap with the physics simulator. This process is both resource-intensive and constrained to the specific domain for which it was engineered, limiting scalability and adaptability. Thanks to their extensive training on heterogeneous data, Visual Language Models (VLMs) are well suited for open-world TAMP-like problems; however, they have limited real-world grounding and planning capabilities. Recent efforts have therefore integrated VLMs with classical planning for long-horizon reasoning, but they still depend on task-specific solutions, e.g., describing all possible objects in advance, and on symbolic action models. We propose a novel framework that requires neither. It leverages VLMs to retrieve symbolic representations directly from images, based only on lifted predicates. To integrate with classical planning, we extend the heuristic-free Width-Based search algorithm to handle probabilistic representations such as those generated by VLMs. We evaluate our system using the PddlGym environment and the Problem Description Generation Dataset.
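
The abstract ships no code, so the following is only a rough sketch of what width-based search over probabilistic symbolic states could look like: an IW(1) loop that first thresholds VLM confidence scores into a crisp atom set, then prunes any node that contributes no atom unseen so far. All names here (threshold, iw1, the toy domain, the 0.5 cutoff) are hypothetical illustrations, not the authors' implementation.

    from collections import deque

    # Hypothetical probabilistic state: ground atoms mapped to confidence
    # scores, as a VLM might emit when reading predicates off an image.
    def threshold(prob_atoms, tau=0.5):
        """Commit to the atoms whose confidence clears the cutoff tau."""
        return frozenset(a for a, p in prob_atoms.items() if p >= tau)

    def iw1(init_probs, goal, successors, tau=0.5):
        """IW(1): breadth-first search that prunes every node adding
        no atom unseen anywhere in the search so far (novelty width 1)."""
        seen = set()
        frontier = deque([(threshold(init_probs, tau), [])])
        while frontier:
            atoms, plan = frontier.popleft()
            if goal <= atoms:
                return plan
            if plan and not (atoms - seen):
                continue  # nothing novel at width 1: prune this node
            seen |= atoms
            for action, next_probs in successors(atoms):
                frontier.append((threshold(next_probs, tau), plan + [action]))
        return None  # no plan found within width 1

    # Toy usage: one noisy detection and a single move action.
    init = {"at(robot,room1)": 0.95, "holding(key)": 0.2}
    goal = {"at(robot,room2)"}

    def successors(atoms):
        if "at(robot,room1)" in atoms:
            yield "move(room1,room2)", {"at(robot,room2)": 0.9}

    print(iw1(init, goal, successors))  # -> ['move(room1,room2)']

Thresholding is the simplest way to make probabilistic atoms fit a classical novelty test; the actual extension described in the abstract may propagate the probabilities differently.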
