Embodied AI (2/2)
DO: JvF25/3-303 | BN: b-it/1.047
Anticipation by Prof. Jürgen Gall
This lecture will give a brief introduction to anticipating future actions from videos.
Vision-Language-Action Models for Cognitive Robots by Prof. Sven Behnke
This lecture introduces Vision-Language-Action (VLA) models as a unifying framework for cognitive robots capable of grounding perception, language understanding, and physical interaction. We examine how modern VLA architectures integrate multimodal representations to interpret visual scenes, follow natural-language instructions, and generate executable action plans in real time. Key topics include multimodal transformers, affordance grounding, task decomposition, action policy learning, and bridging high-level semantic reasoning with low-level robot control. Through examples from state-of-the-art research and robot demonstrations, students will gain insight into how VLA models enable adaptive, generalizable, and human-aligned robotic cognition.
Vanessa Faber & Brendan Balcerak Jackson