Further reading
- MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers by Wenhui Wang, Furu Wei, Li Dong, Hangbo Bao, Nan Yang, Ming Zhou: https://arxiv.org/abs/2002.10957.
- LLaMA: Open and Efficient Foundation Language Models by Hugo Touvron, Thibaut Lavril, Gautier Izacard, et al.: https://arxiv.org/abs/2302.13971.
- Building an ONNX Runtime package: https://onnxruntime.ai/docs/build/custom.html#custom-build-packages.
Join our community on Discord
Join our community’s Discord space for discussions with the author and other readers:
https://www.packt.link/rag