Emerging trends and research areas
We have covered quite a bit of ground in this chapter so far. Before we close, let us briefly touch upon a few emerging trends aimed at making these models more capable and more efficient.
Alternate architectures
Earlier in the chapter, we covered a number of variations of the transformer architecture that use different tricks and techniques to improve efficiency. Mamba [32, 33] and RWKV [34] are two alternative architectures, developed from the ground up, that aim to address the bottlenecks of the transformer architecture while retaining its strengths.
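To make the bottleneck concrete, the following is a minimal sketch in plain Python, not the implementation of either architecture. It contrasts the pairwise score computation behind self-attention, whose size grows quadratically with sequence length, with a toy recurrent scan that updates one fixed-size state per token. The function names and the constant `decay` are hypothetical and chosen only for illustration; real models learn their state dynamics.

```python
def attention_scores(tokens):
    """Dot-product score for every pair of tokens: n * n entries for n tokens."""
    return [[sum(a * b for a, b in zip(q, k)) for k in tokens] for q in tokens]


def recurrent_scan(tokens, decay=0.9):
    """Toy recurrence: a fixed-size state updated once per token (n steps).

    `decay` is a hypothetical constant used only for this illustration.
    """
    state = [0.0] * len(tokens[0])
    outputs = []
    for token in tokens:
        # Blend the previous state with the current token.
        state = [decay * s + (1.0 - decay) * v for s, v in zip(state, token)]
        outputs.append(state)
    return outputs


# 8 tokens with 2 features each (made-up values).
tokens = [[float(i), float(i) + 1.0] for i in range(8)]
scores = attention_scores(tokens)
outputs = recurrent_scan(tokens)

print(len(scores) * len(scores[0]))  # 64 pairwise scores for 8 tokens
print(len(outputs))                  # 8 state updates for 8 tokens
```

Doubling the number of tokens quadruples the number of pairwise scores but only doubles the number of state updates, which is the kind of scaling gap these alternative architectures target.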
Mamba is a selective state space model (SSM) that builds on earlier structured state space models such as S4 and improves over transformer architectures by scaling linearly in sequence length. Selective SSMs are designed to identify and focus on the most relevant parts of the input sequence, in contrast to transformers and traditional SSMs, which process all inputs uniformly. They combine the best elements...