LLM Challenges
LARGE LANGUAGE MODEL CHOICE

Generative AI Project Lifecycle:
Use case definition & scoping → Model Selection → Adapt (prompt engineering, fine tuning), augment, and evaluate model → App integration (model optimization, deployment)
Two options for model selection:
• Use a pre-trained LLM.
• Train your own LLM from scratch.

But, in general… develop your application using a pre-trained LLM, except if you work with extremely specific data (e.g., medical, legal).
Hubs: Where you can browse existing models.

Model Cards: List of best use cases, training details, and limitations of models.

The model choice will depend on the details of the task to carry out.

Model pre-training:
Model weights are adjusted in order to minimize the loss of the training objective. This requires significant computational resources (i.e., GPUs), due to the high computational load. For instance, a single NVIDIA A100 GPU supports up to 80GB of RAM.

Number of parameters of popular models: PaLM (540B), GPT-3 (175B), YaLM (100B), GPT-2 (1.5B), BERT (110M).
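As a concrete illustration (an assumption, not from the original text), a single pre-training step in PyTorch looks roughly like the sketch below, with a tiny placeholder model standing in for the LLM: the gradients of the training loss drive the weight update.

```python
import torch
import torch.nn as nn

# Toy stand-in for an LLM: pre-training repeats this step over the corpus,
# adjusting the weights so the training-objective loss decreases.
model = nn.Linear(10, 10)                                  # placeholder "model"
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

inputs = torch.randn(8, 10)                                # dummy batch
targets = torch.randint(0, 10, (8,))                       # dummy target token ids

logits = model(inputs)
loss = loss_fn(logits, targets)
loss.backward()        # gradients of the loss w.r.t. the weights
optimizer.step()       # weight update that lowers the loss
optimizer.zero_grad()
```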
COMPUTATIONAL CHALLENGES

LLMs are massive and require plenty of memory for training and inference.

To load the model into GPU RAM, weights are typically stored in the FP32 space (representable values roughly from -3 x 10^38 to 3 x 10^38):
• 1 parameter (32-bit precision) = 4 bytes needed
• 1B parameters = 4 x 10^9 bytes = 4GB of GPU RAM
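These numbers are easy to sanity-check; the short Python sketch below (using NumPy only to display the FP32 range) reproduces the 4GB figure:

```python
import numpy as np

# FP32 representable range (roughly +/- 3.4e38).
fp32 = np.finfo(np.float32)
print(f"FP32 range: {fp32.min:.2e} to {fp32.max:.2e}")  # -3.40e+38 to 3.40e+38

# Memory needed just to load the weights into GPU RAM.
n_params = 1_000_000_000           # 1B-parameter model
bytes_per_param_fp32 = 4           # 32-bit precision = 4 bytes
load_gb = n_params * bytes_per_param_fp32 / 1e9
print(f"Weights only (FP32): {load_gb:.0f} GB")          # 4 GB
```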
Pre-training requires storing additional components beyond the model's parameters:
• Optimizer states (e.g., 2 for Adam)
• Gradients
• Forward activations
• Temporary variables

This could result in an additional 12-20 bytes of memory needed per model parameter. It would therefore require 16 GB to 24 GB of GPU memory to train a 1-billion-parameter LLM, around 4-6x the GPU RAM needed just for storing the model weights.

Hence, the memory needed for LLM training is:
• Excessive for consumer hardware
• Demanding even for data center hardware (for single-processor training)
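A rough estimate of this training footprint, assuming the 4-byte FP32 weights plus the quoted 12-20 extra bytes per parameter (the helper function below is illustrative, not a precise accounting):

```python
def training_memory_gb(n_params: int, extra_bytes_per_param: int) -> float:
    """Rough training-memory estimate: FP32 weights (4 bytes/param) plus
    optimizer states, gradients, activations, and temporary variables."""
    bytes_total = n_params * (4 + extra_bytes_per_param)
    return bytes_total / 1e9

n_params = 1_000_000_000   # 1B parameters
for extra in (12, 20):     # the 12-20 bytes/param range quoted above
    print(f"+{extra} bytes/param -> {training_memory_gb(n_params, extra):.0f} GB")
# +12 bytes/param -> 16 GB
# +20 bytes/param -> 24 GB
```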
Quantization reduces the memory required to store the weights of the model by converting the precision from 32-bit floating point to 16-bit floating point or 8-bit integers:

FP16 | BFLOAT16 | INT8 | INT4

Quantization maps the FP32 numbers to a lower precision space by employing scaling factors determined from the range of the FP32 numbers. In most cases, quantization strongly reduces memory requirements with a limited loss in prediction quality.
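As an illustration, here is a minimal sketch of symmetric INT8 quantization, where the scaling factor is derived from the range of the FP32 values as described above; production libraries use more refined schemes:

```python
import numpy as np

def quantize_int8(x_fp32: np.ndarray):
    """Symmetric INT8 quantization: map FP32 values to [-127, 127]
    using a scale factor derived from the tensor's range."""
    scale = np.abs(x_fp32).max() / 127.0
    q = np.clip(np.round(x_fp32 / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

weights = np.random.randn(5).astype(np.float32)
q, scale = quantize_int8(weights)
print(weights)
print(dequantize(q, scale))  # close to the original, at 1/4 of the memory
```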
BFLOAT16 is a popular alternative to FP16:
• Developed by Google Brain
• Balances memory efficiency and accuracy
• Wider dynamic range
• Optimized for storage and speed in ML tasks
e.g., FLAN-T5 was pre-trained using BFLOAT16.

Benefits of quantization:
• Less memory
• Potentially better model performance
• Higher calculation speed
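The wider dynamic range is easy to see in practice. The snippet below assumes PyTorch is installed and is purely illustrative:

```python
import torch

# BFLOAT16 keeps FP32's 8 exponent bits -> much wider dynamic range than FP16.
print(torch.finfo(torch.float16).max)    # 65504.0
print(torch.finfo(torch.bfloat16).max)   # ~3.39e38, close to FP32

# A value that overflows FP16 but not BFLOAT16.
x = torch.tensor(1e10)
print(x.to(torch.float16))   # inf
print(x.to(torch.bfloat16))  # ~1e10, coarsely rounded

# Both use 2 bytes per parameter, i.e. half of FP32 storage.
print(torch.zeros(1, dtype=torch.bfloat16).element_size())  # 2
```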
SCALING LAWS

Researchers explored trade-offs between the dataset size, the model size, and the compute budget:
• Constraint: compute budget
• Scaling choices: dataset size (# of tokens) and model size (# of parameters)
• Objective: model performance

Increasing compute may seem ideal for better performance, but practical constraints like hardware, time, and budget limit its feasibility.

It has been empirically shown that, as the compute budget remains fixed:
• Fixed model size: increasing the training dataset size improves model performance.
• Fixed dataset size: larger models demonstrate lower test loss, indicating enhanced performance.
What's the optimal balance?
Once scaling laws have been estimated, we can use the Chinchilla approach, i.e., we can choose the dataset size and the model size to train a compute-optimal model, which maximizes performance for a given compute budget. The compute-optimal training dataset size is ~20x the number of parameters.
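Applying the ~20 tokens-per-parameter rule of thumb (a simplification of the full Chinchilla analysis; the model sizes used below are only examples):

```python
def chinchilla_optimal_tokens(n_params: float, tokens_per_param: float = 20.0) -> float:
    """Rule-of-thumb compute-optimal training dataset size (~20x the parameter count)."""
    return n_params * tokens_per_param

for n_params in (70e9, 175e9, 540e9):  # Chinchilla-, GPT-3- and PaLM-sized models
    tokens = chinchilla_optimal_tokens(n_params)
    print(f"{n_params/1e9:.0f}B params -> ~{tokens/1e12:.1f}T training tokens")
# 70B params -> ~1.4T training tokens
# 175B params -> ~3.5T training tokens
# 540B params -> ~10.8T training tokens
```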