Embodied AI

Open World Embodied Large Models

Date / Time: 2024-10-17, 17:10

Multimodal Large Language Models (MLLMs) have demonstrated strong potential in visual instruction following across various tasks. Recently, several studies have integrated MLLMs into robotic manipulation, allowing robots to interpret multimodal information and predict low-level actions. While MLLM-based policies have shown promising progress, they may still predict failed execution poses when faced with novel tasks or object categories. Given these challenges, we raise a question: “Can we develop an end-to-end robotic agent that not only possesses manipulation skills but also effectively corrects low-level failure actions?” Drawing inspiration from Daniel Kahneman's assertion that "human thinking is divided into a fast system and a slow system, which respectively represent intuitive processes and more deliberate logical reasoning," we introduce a series of research works that mimic this human-like thinking paradigm to address the above question.
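To make the fast/slow paradigm concrete, the sketch below shows one possible agent loop: a fast policy intuitively proposes a low-level pose, and when execution fails, a slow reasoning module inspects the feedback and corrects the action. This is only a minimal illustration; the class names (`FastPolicy`, `SlowReasoner`), the simulated `execute` function, and the correction heuristic are hypothetical placeholders, not the speakers' actual system.

```python
# Hypothetical sketch of a fast/slow manipulation agent loop.
# All components below are illustrative stand-ins, not a real MLLM policy.
from dataclasses import dataclass


@dataclass
class Pose:
    x: float
    y: float
    z: float


class FastPolicy:
    """Stand-in for an MLLM policy head that intuitively maps an
    observation and instruction to a low-level end-effector pose."""

    def predict(self, observation: dict, instruction: str) -> Pose:
        # Placeholder: in practice this would be an MLLM forward pass.
        return Pose(observation["target_x"], observation["target_y"], 0.05)


class SlowReasoner:
    """Stand-in for a deliberate reasoning module that inspects a failed
    attempt and proposes a corrected pose."""

    def correct(self, observation: dict, failed_pose: Pose, feedback: str) -> Pose:
        # Placeholder correction: adjust grasp height based on feedback.
        dz = 0.02 if "too low" in feedback else -0.02
        return Pose(failed_pose.x, failed_pose.y, failed_pose.z + dz)


def execute(pose: Pose) -> tuple[bool, str]:
    """Simulated execution: succeeds only within a narrow height band."""
    if pose.z < 0.06:
        return False, "grasp too low"
    if pose.z > 0.12:
        return False, "grasp too high"
    return True, "ok"


def run_episode(observation: dict, instruction: str, max_retries: int = 3) -> bool:
    fast, slow = FastPolicy(), SlowReasoner()
    pose = fast.predict(observation, instruction)         # fast system: intuitive proposal
    for _ in range(max_retries):
        success, feedback = execute(pose)
        if success:
            return True
        pose = slow.correct(observation, pose, feedback)  # slow system: reflect and correct
    return False


if __name__ == "__main__":
    obs = {"target_x": 0.3, "target_y": -0.1}
    print("episode success:", run_episode(obs, "pick up the red mug"))
```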