[1] Fujimoto, S.; and Gu, S. S. 2021. A minimalist approach to offline reinforcement learning. In Advances in Neural Information Processing Systems, volume 34, 20132-20145.
[2] Fujimoto, S.; van Hoof, H.; and Meger, D. 2018. Addressing function approximation error in actor-critic methods. In International Conference on Machine Learning, 1587-1596.
[3] Haarnoja, T.; Zhou, A.; Abbeel, P.; and Levine, S. 2018. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In International Conference on Machine Learning, 1861-1870.
[4] Todorov, E.; Erez, T.; and Tassa, Y. 2012. MuJoCo: A physics engine for model-based control. In IEEE/RSJ International Conference on Intelligent Robots and Systems, 5026-5033.
[5] Nachum, O.; Gu, S. S.; Lee, H.; and Levine, S. 2018. Data-efficient hierarchical reinforcement learning. In Advances in Neural Information Processing Systems, volume 31, 3303-3313.
[6] Kumar, V. 2016. Manipulators and manipulation in high dimensional spaces. Ph.D. thesis, University of Washington, Seattle.
[7] Brockman, G.; Cheung, V.; Pettersson, L.; Schneider, J.; Schulman, J.; Tang, J.; and Zaremba, W. 2016. OpenAI Gym. arXiv preprint arXiv:1606.01540.
[8] Fu, J.; Kumar, A.; Nachum, O.; Tucker, G.; and Levine, S. 2020. D4RL: Datasets for deep data-driven reinforcement learning. arXiv preprint arXiv:2004.07219.
[9] Kim, J., et al. 2025. Penalizing infeasible actions and reward scaling in reinforcement learning with offline data. In International Conference on Machine Learning, 30769-30790.
[10] Shin, Y., et al. 2025. Online pre-training for offline-to-online reinforcement learning. In International Conference on Machine Learning, 55122-55144.
[11] Li, S., et al. 2026. State proficiency-based adaptive fine-tuning for offline-to-online reinforcement learning. In AAAI Conference on Artificial Intelligence.