[1] Yuhang Wang, Yanxu Zhu, Chao Kong, Shuyu Wei, Xiaoyuan Yi, Xing Xie, and Jitao Sang. 2024. CDEval: A Benchmark for Measuring the Cultural Dimensions of Large Language Models. In Proceedings of the 2nd Workshop on Cross-Cultural Considerations in NLP, pages 1–16, Bangkok, Thailand. Association for Computational Linguistics.
[2] Jiho Jin, Jiseon Kim, Nayeon Lee, Haneul Yoo, Alice Oh, and Hwaran Lee. 2024. KoBBQ: Korean Bias Benchmark for Question Answering. Transactions of the Association for Computational Linguistics, 12:507–524.
[3] Yuu Jinnai. 2024. Does Cross-Cultural Alignment Change the Commonsense Morality of Language Models? In Proceedings of the 2nd Workshop on Cross-Cultural Considerations in NLP, pages 48–64, Bangkok, Thailand. Association for Computational Linguistics.
[4] Shangrui Nie, Michael Fromm, Charles Welch, Rebekka Görge, Akbar Karimi, Joan Plepi, Nazia Mowmita, Nicolas Flores-Herr, Mehdi Ali, and Lucie Flek. 2024. Do Multilingual Large Language Models Mitigate Stereotype Bias? In Proceedings of the 2nd Workshop on Cross-Cultural Considerations in NLP, pages 65–83, Bangkok, Thailand. Association for Computational Linguistics.
[5] Jaimeen Ahn and Alice Oh. 2021. Mitigating Language-Dependent Ethnic Bias in BERT. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 533–549, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
[6] Sharon Levy, Neha John, Ling Liu, Yogarshi Vyas, Jie Ma, Yoshinari Fujinuma, Miguel Ballesteros, Vittorio Castelli, and Dan Roth. 2023. Comparing Biases and the Impact of Multilingual Training across Multiple Languages. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 10260–10280, Singapore. Association for Computational Linguistics.
[7] Bangzhao Shu, Lechen Zhang, Minje Choi, Lavinia Dunagan, Lajanugen Logeswaran, Moontae Lee, Dallas Card, and David Jurgens. 2024. You don't need a personality test to know these models are unreliable: Assessing the Reliability of Large Language Models on Psychometric Instruments. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 5263–5281, Mexico City, Mexico. Association for Computational Linguistics.
[8] Siqi Shen, Lajanugen Logeswaran, Moontae Lee, Honglak Lee, Soujanya Poria, and Rada Mihalcea. 2024. Understanding the Capabilities and Limitations of Large Language Models for Cultural Commonsense. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 5668–5680, Mexico City, Mexico. Association for Computational Linguistics.