[NeurIPS 2023] Multimodal Generation for Audio and Music

References

[1] A. van den Oord et al., Neural Discrete Representation Learning, NeurIPS 2017

[2] L. Liu et al., Bridging Discrete and Backpropagation: Straight-Through and Beyond, NeurIPS 2023

[3] P. Esser et al., Taming Transformers for High-Resolution Image Synthesis, CVPR 2021

[4] V. Iashin et al., Taming Visually Guided Sound Generation, BMVC 2021

[5] N. Zeghidour et al., SoundStream: An End-to-End Neural Audio Codec, arXiv:2107.03312 2021

[6] C. Wang et al., Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers, arXiv:2301.02111 2023

[7] A. Ramesh et al., Zero-Shot Text-to-Image Generation, ICML 2021

[8] S. Gu et al., Vector Quantized Diffusion Model for Text-to-Image Synthesis, CVPR 2022

[9] R. Kumar et al., High-Fidelity Audio Compression with Improved RVQGAN, NeurIPS 2023

[10] J. Kong et al., HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis, NeurIPS 2020

[11] S. Lee et al., BigVGAN: A Universal Neural Vocoder with Large-scale Training, ICLR 2023

[12] J. Copet et al., Simple and Controllable Music Generation, NeurIPS 2023

[13] A. Défossez et al., High Fidelity Neural Audio Compression, arXiv:2210.13438 2022

[14] S. Han et al., The Interface for Symbolic Music Loop Generation Conditioned on Musical Metadata, NeurIPS Workshop on ML4CD 2023