Managing context windows
When it comes to STM (or working memory), the concept of a context window is pivotal. It defines the maximum span of text, measured in tokens, that a model can process at once. This window lets the model "remember" and draw on a specific segment of information while generating responses, so the user does not have to restate context at every turn.
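Because the window is budgeted in tokens rather than characters or words, a first practical step is measuring how many tokens a prompt actually consumes. Here is a minimal sketch, assuming the tiktoken library; the encoding name is an assumption and depends on the model you target:

```python
# Minimal sketch: count the tokens a prompt will consume.
# Assumes the tiktoken library; "cl100k_base" is the encoding used by
# GPT-4-era models (check which encoding your model actually uses).
import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")

prompt = "Summarize our conversation so far and suggest next steps."
num_tokens = len(encoding.encode(prompt))
print(f"Prompt length: {num_tokens} tokens")
```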
Despite the increase in the maximum number of tokens that the latest LLMs can handle (up to 128K tokens for models such as GPT-4o), managing context windows remains challenging. When the input exceeds the model's context window, the model cannot access the parts of the text that fall outside it (typically the earliest tokens are truncated), so it may struggle to maintain coherence and relevance. This limitation can lead to outputs that lack context or continuity, especially in tasks requiring the processing of extensive documents or prolonged dialogues.
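One common way to keep a prolonged dialogue inside the window is a sliding-window policy: drop the oldest turns until what remains fits a token budget. The sketch below is illustrative rather than a production implementation; it reuses tiktoken for counting, treats each turn as a plain string, and the budget value is an assumption:

```python
# Sketch of a sliding-window policy: keep the most recent messages
# that fit within a fixed token budget. Budget and names are illustrative.
import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")

def fit_to_window(messages: list[str], max_tokens: int = 4096) -> list[str]:
    """Drop the oldest messages until the remainder fits the budget."""
    kept: list[str] = []
    total = 0
    # Walk the history from newest to oldest, keeping what still fits.
    for message in reversed(messages):
        n = len(encoding.encode(message))
        if total + n > max_tokens:
            break
        kept.append(message)
        total += n
    return list(reversed(kept))  # restore chronological order
```

Dropping turns is lossy, of course, which is precisely why context handling needs deliberate design.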
Consequently, we need to design the handling of context carefully...