References

[1] Fujimoto, S.; and Gu, S. S. 2021. A minimalist approach to offline reinforcement learning. In Advances in Neural Information Processing Systems, volume 34, 20132-20145.

[2] Fujimoto, S.; van Hoof, H.; and Meger, D. 2018. Addressing function approximation error in actor-critic methods. In International Conference on Machine Learning, 1587-1596.

[3] Haarnoja, T.; Zhou, A.; Abbeel, P.; and Levine, S. 2018. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In International Conference on Machine Learning, 1861-1870.

[4] Todorov, E.; Erez, T.; and Tassa, Y. 2012. MuJoCo: A physics engine for model-based control. In 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, 5026-5033.

[5] Nachum, O.; Gu, S. S.; Lee, H.; and Levine, S. 2018. Data-efficient hierarchical reinforcement learning. In Advances in Neural Information Processing Systems, volume 31, 3303-3313.

[6] Kumar, V. 2016. Manipulators and manipulation in high dimensional spaces. Ph.D. thesis, University of Washington, Seattle.

[7] Brockman, G.; Cheung, V.; Pettersson, L.; Schneider, J.; Schulman, J.; Tang, J.; and Zaremba, W. 2016. OpenAI Gym. arXiv preprint arXiv:1606.01540.

[8] Fu, J.; Kumar, A.; Nachum, O.; Tucker, G.; and Levine, S. 2020. D4RL: Datasets for deep data-driven reinforcement learning. arXiv preprint arXiv:2004.07219.

[9] Kim, J., et al. 2025. Penalizing infeasible actions and reward scaling in reinforcement learning with offline data. In International Conference on Machine Learning, 30769-30790.

[10] Shin, Y., et al. 2025. Online pre-training for offline-to-online reinforcement learning. In International Conference on Machine Learning, 55122-55144.

[11] Li, S., et al. 2026. State proficiency-based adaptive fine-tuning for offline-to-online reinforcement learning. In AAAI Conference on Artificial Intelligence.