[1] Vaswani, Ashish, et al. "Attention is all you need." Advances in Neural Information Processing Systems 30 (2017).
[2] Zhang, Hang, et al. "Poolingformer: Long document modeling with pooling attention." International Conference on Machine Learning, 2021.
[3] Xiong, Yunyang, et al. "Nyströmformer: A Nyström-based algorithm for approximating self-attention." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 35. No. 16. 2021.
[4] Zhai, Shuangfei, et al. "An attention free transformer." arXiv preprint arXiv:2105.14103 (2021).
[5] Liu, Shizhan, et al. "Pyraformer: Low-complexity pyramidal attention for long-range time series modeling and forecasting." International Conference on Learning Representations, 2022.
[6] Qin, Zhen, et al. "cosFormer: Rethinking Softmax in Attention." International Conference on Learning Representations, 2022.
[7] Tan, Chao-Hong, et al. "PoNet: Pooling Network for Efficient Token Mixing in Long Sequences." International Conference on Learning Representations, 2022.