Date of Award

6-27-2025

Date Published

August 2025

Degree Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Electrical Engineering and Computer Science

Advisor(s)

Qinru Qiu

Keywords

Embodied AI, Imitation Learning, Reinforcement Learning, Robot Learning, Robotic Manipulation

Subject Categories

Computer Sciences | Physical Sciences and Mathematics

Abstract

Embodied Artificial Intelligence (AI), which integrates physical embodiment with intelligent decision-making, is increasingly critical to advancing autonomous systems across diverse domains such as autonomous driving and robotic manipulation. This dissertation presents a comprehensive exploration of online and offline learning approaches for Embodied AI in autonomous systems, addressing both algorithmic innovations and dataset construction to overcome fundamental challenges in perception, decision-making, and control. Through five interconnected studies, we systematically advance the state of the art in deep reinforcement learning (DRL) and imitation learning for embodied control. First, we introduce CADRE, a cascade online DRL framework for vision-based autonomous urban driving that strategically decomposes the complex driving task into perception and control subtasks. By pre-training a perception module that leverages attention mechanisms to model the inter-relationships between visual and control information, and by employing distributed Proximal Policy Optimization with careful reward shaping and LSTM-based sequential modeling, CADRE achieves high success rates on the NoCrash benchmark in challenging urban environments with dense traffic. Second, we propose Adaptive Conservative Level in Q-Learning (ACL-QL), a flexible framework for offline DRL that enables fine-grained control over the conservatism of the Q-function. ACL-QL introduces two adaptive weight functions, one for out-of-distribution (OOD) actions and one for dataset actions, that dynamically shape the Q-function. Supported by detailed theoretical analysis, we implement the weight functions as neural networks and construct surrogate and monotonicity losses to train them effectively. Comprehensive experiments on standard offline DRL benchmarks demonstrate ACL-QL's state-of-the-art performance and versatility across diverse scenarios.
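The adaptive conservative shaping behind ACL-QL can be illustrated with a toy tabular sketch. Everything below, including the function name, the tabular setting, and the simplified update rule, is an illustrative assumption rather than the dissertation's actual implementation (which learns the weight functions as neural networks with surrogate and monotonicity losses): alongside the TD update on a logged transition, every action's Q-value is penalized with a per-action out-of-distribution weight, while the dataset action is compensated with its own weight.

```python
import numpy as np

def acl_ql_update(Q, s, a_data, r, s_next, w_ood, w_data, n_actions,
                  gamma=0.99, lr=0.1):
    """One toy tabular Q update with adaptive conservative weighting.

    Illustrative sketch of the ACL-QL idea: a standard TD update on the
    dataset transition, a weighted push-down on all actions' Q-values
    (treated here as potentially out-of-distribution), and a weighted
    push-up on the dataset action to compensate.
    """
    # standard TD target from the logged transition (s, a_data, r, s_next)
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a_data] += lr * (td_target - Q[s, a_data])
    # conservative shaping: penalize each action with its OOD weight
    for a in range(n_actions):
        Q[s, a] -= lr * w_ood[a] * Q[s, a]
    # compensate the in-distribution (dataset) action
    Q[s, a_data] += lr * w_data * Q[s, a_data]
    return Q

# Usage: with uniform optimistic initialization, repeated updates suppress
# non-dataset actions while the dataset action tracks its TD target.
Q = np.ones((2, 3))
w_ood = np.array([0.5, 0.5, 0.5])
for _ in range(5):
    Q = acl_ql_update(Q, s=0, a_data=1, r=1.0, s_next=1,
                      w_ood=w_ood, w_data=0.5, n_actions=3)
```

Larger OOD weights make the Q-function more conservative; ACL-QL's point is that these weights are adaptive per state-action pair rather than a single fixed penalty coefficient.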
Third, we develop a Discrete Policy approach for language-instructed multi-task imitation learning, which learns action patterns in the latent space to better disentangle feature representations across different skills. This strategy enables more effective transfer learning and skill composition in robotic systems. Through extensive simulations and real-world experiments, our approach demonstrates superior performance in multi-task settings compared to various state-of-the-art methods, offering a compelling new perspective on learning multi-task policies for embodied control. Fourth, we address the challenge of learning from imperfect demonstrations with a Self-Supervised Data Filtering (SSDF) framework. By computing accurate quality scores with pre-trained transformers and performing weighted behavior cloning on the high-quality portion of imperfect demonstrations, SSDF significantly improves policy learning without requiring reward information or online exploration. Extensive experimental results in both simulation and real-world applications confirm SSDF's ability to accurately select high-quality demonstrations from imperfect datasets, substantially boosting final performance. Finally, we introduce RoboMIND, a large-scale, multi-embodiment dataset for robot manipulation. This comprehensive resource includes four distinct embodiments with high-quality demonstrations spanning multiple tasks, objects, and skills, collected through an intelligent data platform with rigorous quality assurance. Our quantitative analyses highlight RoboMIND's heterogeneous embodiments, diverse episode lengths, broad task coverage, and wide range of objects drawn from domestic, industrial, kitchen, office, and retail scenarios. Experiments with popular imitation learning models for robot manipulation reveal clear opportunities for improvement in accurate positioning and precise control.
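The filtering-and-reweighting step at the heart of SSDF can be sketched in a few lines. The linear policy, the least-squares fit, and the fixed score threshold below are illustrative assumptions (SSDF scores demonstrations with pre-trained transformers and trains a full policy network); the sketch only shows how quality scores drive both the filtering of demonstrations and the weighting of the cloning objective.

```python
import numpy as np

def weighted_bc(states, actions, scores, threshold=0.5):
    """Toy weighted behavior cloning on quality-filtered demonstrations.

    Illustrative sketch of the SSDF idea with a linear policy:
    demonstrations whose quality score falls below `threshold` are
    discarded, and the remaining state-action pairs are fit by least
    squares weighted by their scores.
    """
    keep = scores >= threshold                      # drop low-quality demos
    X, y, w = states[keep], actions[keep], scores[keep]
    # weighted least squares: argmin_theta sum_i w_i * (X_i @ theta - y_i)^2,
    # solved by scaling rows with sqrt(w_i)
    W = np.sqrt(w)[:, None]
    theta, *_ = np.linalg.lstsq(W * X, W * y.reshape(-1, 1), rcond=None)
    return theta.ravel()

# Usage: three clean demonstrations of the policy a = 2*s plus one
# corrupted, low-scored demonstration that the filter removes.
states = np.array([[1.0], [2.0], [3.0], [4.0]])
actions = np.array([2.0, 4.0, 6.0, 100.0])
scores = np.array([0.9, 0.9, 0.9, 0.1])
theta = weighted_bc(states, actions, scores)
```

Because the corrupted transition is filtered out before fitting, the recovered parameter matches the clean demonstrations; plain behavior cloning on the full dataset would be dragged off by the outlier.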
Together, these contributions form a cohesive exploration of learning approaches for embodied intelligence, spanning online and offline paradigms, with applications in autonomous driving and robotic manipulation. Our work establishes new methodologies and resources that address fundamental challenges in developing autonomous systems capable of robust perception, decision-making, and control in complex real-world environments.

Access

Open Access
