Researchers at the University of California, Berkeley, have introduced a novel AI method called Latent Code Bridges (LCB). This approach combines the advantages of modular hierarchy and end-to-end learning, utilizing latent code bridging for high-level reasoning and low-level action strategies. The method has achieved significant results in language-guided robotic control tasks.
Paper Introduction
The field of robotics has oscillated between two primary architectural paradigms: modular hierarchical strategies and end-to-end strategies. Modular hierarchical structures employ rigid layers such as symbolic planning, trajectory generation, and tracking, while end-to-end strategies utilize high-capacity neural networks to map sensory input directly to actions. The emergence of large language models (LLMs) has reignited interest in hierarchical control architectures. Recent research has utilized LLMs to replace symbolic planners, accomplishing significant feats such as mobile object rearrangement based on open vocabulary instructions. However, hierarchical architectures still face challenges in defining control primitives and establishing inter-layer interfaces, particularly in coordinating a variety of humanoid movements beyond semantic action verbs.
The rise of LLMs has sparked interest in their application within robotics, especially in hierarchical control architectures. Previous studies have demonstrated that LLMs can be employed for high-level reasoning through methods such as few-shot prompting, functional encoding, and human interaction through language. Integrating LLMs into task planning and reasoning requires invoking low-level skills, which can be achieved through language-conditioned strategies. Moreover, the trend of reusing large models initially trained for visual or language tasks for robotic applications is becoming increasingly apparent.
The LCB Architecture
Researchers at UC Berkeley introduced Latent Codes as Bridges (LCB), a robust strategy architecture for control. LCB merges the advantages of modular hierarchical architectures with end-to-end learning. It allows for the direct use of LLMs for high-level reasoning while employing pre-trained skills for low-level control, enhanced through end-to-end learning. By introducing tokens at the interface layer to modulate low-level strategies, LCB overcomes the limitations of relying solely on language, which can be challenging in describing certain behaviors. Furthermore, by using separate tokens, LCB retains the core language generation and reasoning capabilities of LLMs during fine-tuning.
The proposed architecture integrates the benefits of modular hierarchical structures and end-to-end learning. It employs an additional latent code to connect high-level reasoning and low-level language-conditioned strategies, preserving abstract goals and the language embedding space. This approach addresses the limitations of existing methods, providing greater flexibility and retention of linguistic understanding during fine-tuning. The architecture includes a pre-trained multimodal LLM and a pre-trained strategy, facilitating multimodal understanding and action output based on environmental observations and conditional latents. Data processing involves generating dialogue-style interactions to train the model to execute language-guided actions.
Experimental Results
Experiments conducted on the Language Table and CALVIN benchmarks indicate that LCB outperforms baseline models in tasks requiring reasoning and multi-step behaviors, including those using the GPT-4V baseline. The integration of LCB with visual language models enhances task performance by effectively extracting features.
Summary
This work presents LCB as a robust method that integrates large language model reasoning with low-level action strategies. Unlike previous methods, LCB seamlessly integrates these functionalities through learned latent interfaces. Evaluations in the Language Table and CALVIN benchmarks demonstrate that LCB can proficiently interpret and execute various reasoning and long-term tasks. The hierarchical flexibility achieved by LCB holds potential for practical applications in robotics.
Paper Download
Download the paper here
Project Link
Visit the project page