AAAI Conference Demonstration (2026)
Whiyoung Jung, Sunghoon Hong, Deunsol Yoon, Jeonghye Kim (KAIST), Yongjae Shin (KAIST), Suhyun Jung, Hyundam Yoo, YoungJin Kim, Chanwoo Moon, Woohyung Lim, Soonyoung Lee, Kanghoon Lee
Abstract
Reinforcement learning (RL) has evolved beyond monolithic training, yet existing frameworks remain limited to single algorithms or simple offline-to-online transitions. We present multi-phase RL, a framework that orchestrates multiple learning phases for continual policy improvement. It enables efficient fine-tuning of pretrained policies with new data and smooth adaptation from simulation to real-world environments. To support this paradigm, we introduce RL-Studio, a platform that addresses key implementation barriers, including neural architecture mismatches, parameter transfer complexities, and experiment management overhead. It provides seamless phase orchestration, transition-point monitoring, and full experiment lineage tracking. We demonstrate the effectiveness of multi-phase RL through representative scenarios and highlight RL-Studio’s capabilities.
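The phase orchestration, parameter transfer, and lineage tracking described above can be illustrated with a minimal sketch. This is not RL-Studio's API, which the abstract does not specify. Every name here (`Phase`, `Record`, `transfer_params`, `run_phases`) is hypothetical, parameters are modeled as plain lists of floats keyed by layer name, and an architecture mismatch is handled by the simple assumption that only layers with matching name and size are carried over.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical parameter format: layer name -> flat list of weights.
Params = Dict[str, List[float]]


def transfer_params(source: Params, target_init: Params) -> Params:
    """Copy weights for layers whose name and size match the target
    architecture; keep the target's fresh initialization otherwise."""
    merged: Params = {}
    for name, init in target_init.items():
        src = source.get(name)
        if src is not None and len(src) == len(init):
            merged[name] = list(src)
        else:
            merged[name] = list(init)
    return merged


@dataclass
class Phase:
    name: str
    init_params: Params                 # fresh init for this phase's architecture
    train: Callable[[Params], Params]   # hypothetical training routine


@dataclass
class Record:
    phase: str
    parent: Optional[str]    # phase whose parameters were inherited
    transferred: List[str]   # layers carried over at the transition point


def run_phases(phases: List[Phase]) -> Tuple[Params, List[Record]]:
    """Run phases in order, transferring compatible parameters at each
    transition and recording which phase each one was derived from."""
    params: Params = {}
    lineage: List[Record] = []
    parent: Optional[str] = None
    for phase in phases:
        start = transfer_params(params, phase.init_params)
        transferred = [
            name for name in start
            if name in params and len(params[name]) == len(start[name])
        ]
        params = phase.train(start)
        lineage.append(Record(phase.name, parent, transferred))
        parent = phase.name
    return params, lineage
```

Under these assumptions, an offline phase followed by an online phase with a resized actor and a new critic would inherit only the encoder, and the returned lineage records that transition.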