LLMs on the Line: Data Determines Loss-to-Loss Scaling Laws
Prasanna Mayilvahanan * 1 2 3 4 Thaddäus Wiedemer * 1 2 3 4 Sayak Mallick 1 4 Matthias Bethge 2 3 4
Wieland Brendel 1 2 3
arXiv:2502.12120v1, 17 Feb 2025
Abstract
Scaling laws guide the development of large language
models (LLMs) by offering estimates for
the optimal balance of model size, tokens, and
compute. More recently, loss-to-loss scaling laws
that relate losses across pretraining datasets and
downstream tasks have emerged as a powerful