Reinforcement Learning-Based Vision-Language-Action Models for Robotic Manipulation
Computer Science and Artificial Intelligence
Supervisors
Dr. Alap Kshirsagar
Dr. Kaushal Kumar Maurya
Prof. Rohan Paul
Project Description
Robotic manipulation in unstructured environments requires both high-level semantic understanding and adaptive control under contact-rich, dynamic conditions. While vision-language models provide strong priors for task understanding, their direct deployment in robotic systems is limited by poor grounding in physical interaction and a lack of online adaptability. In particular, Vision-Language-Action (VLA) policies often struggle with distribution shifts, sparse feedback, and sequential task learning in real-world settings.
This Ph.D. project aims to develop reinforcement learning-based methods for the continual adaptation of VLA models through online interaction. The central focus is on leveraging language not only as a task specification but also as a structured signal for guiding policy improvement and reducing reliance on manually designed rewards. The project will address key challenges of sample efficiency and catastrophic forgetting by developing data-efficient learning strategies and mechanisms to retain previously acquired skills during manipulation tasks.
The research will be validated on robotic platforms in contact-rich scenarios such as grasping, insertion, tool use, and object reconfiguration, with an emphasis on sustained adaptation and generalization. The expected outcome is a unified framework for reinforcement learning in VLA systems that enables efficient online learning while preserving prior knowledge, advancing the deployment of robust robotic manipulators in real-world environments.
Background Required
A Bachelor's and/or Master's degree in Robotics, Computer Science, Artificial Intelligence, or a related field. A strong interest in machine learning and reinforcement learning is desirable. Experience with deep learning frameworks or robotic systems is beneficial.