Tutorial 6: Transformers and Multi-Head Attention (UvA Deep Learning Notebooks): https://uvadlc-notebooks.readthedocs.io/en/latest/tutorial_notebooks/tutorial6/Transformers_and_MHAttention.html
Transformer Deep Dive (Weights & Biases report): https://wandb.ai/carlolepelaars/transformer_deep_dive/reports/Transformer-Deep-Dive--VmlldzozODQ4NDQ
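Both tutorials build up from scaled dot-product attention. As a reference point, here is a minimal sketch of that core operation, Attention(Q, K, V) = softmax(QKᵀ/√d_k)V, in plain PyTorch; tensor shapes are illustrative and not taken from either tutorial:

```python
import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)  # (batch, seq, seq)
    return F.softmax(scores, dim=-1) @ v               # (batch, seq, d_k)

q = k = v = torch.randn(1, 4, 8)  # (batch, seq_len, d_k), illustrative sizes
print(scaled_dot_product_attention(q, k, v).shape)  # torch.Size([1, 4, 8])
```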
【Curated collection of literature and resources on Transformers in computer vision】'Awesome Transformer in CV - A Survey on Transformer in CV' by Sanctuary. GitHub: https://github.com/Yutong-Zhou-cv/Awesome-Transformer-in-CV
Transformers for Classification, NER, QA, Language Modelling, Language Generation, T5, Multi-Modal, and Conversational AI https://github.com/ThilinaRajapakse/simpletransformers
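A quick sketch of the simpletransformers classification workflow; the model choice and the tiny inline dataset are illustrative assumptions, so check the repo's docs for current arguments:

```python
import pandas as pd
from simpletransformers.classification import ClassificationModel

# Tiny illustrative dataset; simpletransformers expects "text" and "labels" columns.
train_df = pd.DataFrame(
    [["great movie, loved it", 1], ["terrible and boring", 0]],
    columns=["text", "labels"],
)

# Any supported (model_type, model_name) pair works; RoBERTa base is one example.
model = ClassificationModel("roberta", "roberta-base", use_cuda=False)
model.train_model(train_df)

predictions, raw_outputs = model.predict(["what a fantastic film"])
print(predictions)
```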
【Optimum: a toolkit for optimizing Transformers at scale】'Optimum' by Hugging Face. GitHub: https://github.com/huggingface/optimum
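A minimal sketch of Optimum's ONNX Runtime path, assuming the `optimum[onnxruntime]` extra is installed; the model id is an illustrative choice, not one prescribed by the repo:

```python
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer, pipeline

model_id = "distilbert-base-uncased-finetuned-sst-2-english"  # illustrative
# export=True converts the PyTorch checkpoint to ONNX on the fly.
model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# The ONNX model drops into a standard transformers pipeline.
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
print(classifier("Optimum makes deployment easier"))
```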
【Discusses a subtle error in the attention formula that makes Transformer models needlessly difficult to compress and deploy, and proposes a simple fix: a modified softmax function in the attention mechanism】《Attention Is Off By One – Evan Miller》 https://www.evanmiller.org/attention-is-off-by-one.html ; https://mp.weixin.qq.com/s/cSwWapqFhxu9zafzPUeVEw
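A minimal sketch of the modified softmax the article proposes (softmax₁(x)ᵢ = exp(xᵢ) / (1 + Σⱼ exp(xⱼ))): adding 1 to the denominator lets an attention head emit a near-zero output instead of being forced to spread weight across tokens. The numerically stable formulation below is my own, not code from the article:

```python
import torch

def softmax_one(x: torch.Tensor, dim: int = -1) -> torch.Tensor:
    # softmax1(x)_i = exp(x_i) / (1 + sum_j exp(x_j))
    # Subtract the max for numerical stability; the implicit extra logit
    # at 0 then contributes exp(-max) to the denominator.
    m = x.max(dim=dim, keepdim=True).values
    e = torch.exp(x - m)
    return e / (e.sum(dim=dim, keepdim=True) + torch.exp(-m))

# When all scores are strongly negative, the weights can now sum to ~0
# instead of being forced to sum to 1 as in standard softmax.
print(softmax_one(torch.tensor([-20.0, -20.0, -20.0])))
```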
【Literature review of Transformer development, covering 22 models, 11 architectural changes, 7 post-pre-training techniques, and 3 training techniques. Models include GPT-3, GPT-4, Gopher, AlphaCode, RETRO, GPT-3.5, Chinchilla, and Flamingo. Notable architectural changes include multi-query attention, sparse attention, and mixture-of-experts. Also covers post-pre-training techniques such as RLHF, CAI, and Minerva, plus hyperparameter settings and sampling techniques. A helpful document for catching up on recent progress in AI】《Transformer Taxonomy (the last lit review) | kipply's blog》 https://kipp.ly/transformer-taxonomy/
【An intuitive and thorough explanation of the Transformer model】《Transformers — Intuitively and Exhaustively Explained | by Daniel Warfield | Sep, 2023 | Towards Data Science》 https://towardsdatascience.com/transformers-intuitively-and-exhaustively-explained-58a5c5df8dbb
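To make one of the taxonomy's architectural changes concrete, here is a minimal sketch of multi-query attention (all query heads share a single key/value head, shrinking the KV cache by a factor of the head count) in plain PyTorch; the class and dimensions are illustrative, not taken from the post:

```python
import torch
import torch.nn as nn

class MultiQueryAttention(nn.Module):
    """H query heads, but one shared K/V head (cf. multi-head attention)."""
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)      # H heads of queries
        self.k_proj = nn.Linear(d_model, self.d_head)  # single shared key head
        self.v_proj = nn.Linear(d_model, self.d_head)  # single shared value head
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, _ = x.shape
        q = self.q_proj(x).view(B, T, self.n_heads, self.d_head).transpose(1, 2)  # (B, H, T, d)
        k = self.k_proj(x).unsqueeze(1)  # (B, 1, T, d), broadcast over heads
        v = self.v_proj(x).unsqueeze(1)  # (B, 1, T, d)
        att = (q @ k.transpose(-2, -1)) / self.d_head ** 0.5  # (B, H, T, T)
        out = att.softmax(dim=-1) @ v                         # (B, H, T, d)
        return self.out_proj(out.transpose(1, 2).reshape(B, T, -1))

x = torch.randn(2, 16, 256)
print(MultiQueryAttention(256, 8)(x).shape)  # torch.Size([2, 16, 256])
```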
'Quick-start tutorial for the Transformers library' GitHub: https://github.com/jsksxs360/How-to-use-Transformers
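In the spirit of such a quick-start, a minimal Transformers pipeline example; the task's default model (downloaded on first run) is used here rather than anything the tutorial specifies:

```python
from transformers import pipeline

# Downloads a default sentiment model on first run.
sentiment = pipeline("sentiment-analysis")
print(sentiment("Transformers makes NLP pipelines one-liners."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```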