Pre-training optimizations
Optimization begins at the very start of the LLM lifecycle. The pre-training step consumes the largest amount of data and is shaped by architectural choices: the model's size (number of parameters), its shape (width and depth), and so on. In this section, we first examine the impact of the dataset and the improvements we can achieve there, and then present techniques for optimizing the model from an architectural standpoint.
Data efficiency
Data efficiency in LLMs means extracting as much learning signal as possible from the available data while minimizing the required dataset size and computational resources. Large datasets are costly to process, and redundant or noisy data can degrade model performance. Data efficiency techniques therefore aim to achieve high model accuracy and generalization with a reduced or optimized dataset. This process includes filtering data for quality, reducing redundancy, and applying sampling...
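To make the first two steps concrete, the following is a minimal sketch, not a production pipeline, of a heuristic quality filter and exact deduplication via content hashing. The function names and thresholds (passes_quality_filter, min_words, max_symbol_ratio) are illustrative assumptions; real pre-training pipelines typically rely on learned quality classifiers and near-duplicate detection such as MinHash.

```python
import hashlib

def passes_quality_filter(text: str, min_words: int = 20,
                          max_symbol_ratio: float = 0.3) -> bool:
    """Heuristic quality filter (illustrative thresholds): drop very short
    documents and documents dominated by non-alphanumeric symbols."""
    words = text.split()
    if len(words) < min_words:
        return False
    symbol_chars = sum(1 for ch in text if not (ch.isalnum() or ch.isspace()))
    return (symbol_chars / max(len(text), 1)) <= max_symbol_ratio

def deduplicate(documents: list[str]) -> list[str]:
    """Exact deduplication: keep only the first occurrence of each
    normalized (stripped, lowercased) document, keyed by its SHA-256 hash."""
    seen = set()
    unique_docs = []
    for doc in documents:
        digest = hashlib.sha256(doc.strip().lower().encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique_docs.append(doc)
    return unique_docs

if __name__ == "__main__":
    raw_corpus = [
        "The quick brown fox jumps over the lazy dog. " * 10,
        "The quick brown fox jumps over the lazy dog. " * 10,  # exact duplicate
        "@@@ ### $$$ %%%",                                      # low-quality noise
    ]
    filtered = [doc for doc in raw_corpus if passes_quality_filter(doc)]
    cleaned = deduplicate(filtered)
    print(f"{len(raw_corpus)} raw -> {len(filtered)} filtered -> {len(cleaned)} unique")
```

On the toy corpus above, the noisy document is removed by the quality filter and the duplicate is removed by hashing, leaving a single clean document; at pre-training scale, the same idea is applied with approximate, distributed variants of these steps.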