References
[1] Bridging the Gap between Classification and Localization for Weakly Supervised Object Localization, Eunji Kim, Siwon Kim, Jungbeom Lee, Hyunwoo Kim, Sungroh Yoon, CVPR 2022, https://arxiv.org/abs/2204.00220
[2] Perception Prioritized Training of Diffusion Models, Jooyoung Choi, Jungbeom Lee, Chaehun Shin, Sungwon Kim, Hyunwoo Kim, Sungroh Yoon, CVPR 2022, https://arxiv.org/abs/2204.00227
[3] VISOLO: Grid-Based Space-Time Aggregation for Efficient Online Video Instance Segmentation, Su Ho Han, Sukjun Hwang, Seoung Wug Oh, Yeonchool Park, Hyunwoo Kim, Min-Jung Kim, Seon Joo Kim, CVPR 2022, https://arxiv.org/abs/2112.04177
[4] L-Verse: Bidirectional Generation Between Image and Text
[5] Instance-wise Occlusion and Depth Orders in Natural Scenes
[6] MSTR: Multi-Scale Transformer for End-to-End Human-Object Interaction Detection
[7] Hierarchical Text-Conditional Image Generation with CLIP Latents, Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, Mark Chen, arXiv:2204.06125, 2022, https://arxiv.org/abs/2204.06125
[8] Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding, Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily Denton, Seyed Kamyar Seyed Ghasemipour, Burcu Karagol Ayan, S. Sara Mahdavi, Rapha Gontijo Lopes, Tim Salimans, Jonathan Ho, David J Fleet, Mohammad Norouzi, arXiv:2205.11487, 2022, https://arxiv.org/abs/2205.11487