LLM Design Patterns

You're reading from LLM Design Patterns: A Practical Guide to Building Robust and Efficient AI Systems

Product type: Paperback
Published: May 2025
Publisher: Packt
ISBN-13: 9781836207030
Length: 534 pages
Edition: 1st Edition

Author: Ken Huang

Table of Contents (38 chapters)

Preface
Part 1: Introduction and Data Preparation
Chapter 1: Introduction to LLM Design Patterns (Free Chapter)
Chapter 2: Data Cleaning for LLM Training
Chapter 3: Data Augmentation
Chapter 4: Handling Large Datasets for LLM Training
Chapter 5: Data Versioning
Chapter 6: Dataset Annotation and Labeling
Part 2: Training and Optimization of Large Language Models
Chapter 7: Training Pipeline
Chapter 8: Hyperparameter Tuning
Chapter 9: Regularization
Chapter 10: Checkpointing and Recovery
Chapter 11: Fine-Tuning
Chapter 12: Model Pruning
Chapter 13: Quantization
Part 3: Evaluation and Interpretation of Large Language Models
Chapter 14: Evaluation Metrics
Chapter 15: Cross-Validation
Chapter 16: Interpretability
Chapter 17: Fairness and Bias Detection
Chapter 18: Adversarial Robustness
Chapter 19: Reinforcement Learning from Human Feedback
Part 4: Advanced Prompt Engineering Techniques
Chapter 20: Chain-of-Thought Prompting
Chapter 21: Tree-of-Thoughts Prompting
Chapter 22: Reasoning and Acting
Chapter 23: Reasoning WithOut Observation
Chapter 24: Reflection Techniques
Chapter 25: Automatic Multi-Step Reasoning and Tool Use
Part 5: Retrieval and Knowledge Integration in Large Language Models
Chapter 26: Retrieval-Augmented Generation
Chapter 27: Graph-Based RAG
Chapter 28: Advanced RAG
Chapter 29: Evaluating RAG Systems
Chapter 30: Agentic Patterns
Index
Other Books You May Enjoy

Checkpointing and Recovery

Checkpointing and recovery refer to the process of saving the state of a system, application, or model at specific intervals (checkpointing) and restoring it from a saved state in case of failure (recovery). In machine learning, checkpointing involves periodically saving model parameters, optimizer states, and training progress so that training can resume from the last checkpoint instead of starting over. This is especially useful for long-running tasks, where interruptions due to system crashes, power failures, or preempted cloud instances can otherwise result in significant loss of progress.
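To make this concrete, here is a minimal sketch of periodic checkpointing in a PyTorch training loop. PyTorch is an assumption on our part; the helper name save_checkpoint, the checkpoints/ directory, and the save interval are illustrative and not taken from the book.

import os
import torch

CHECKPOINT_DIR = "checkpoints"  # illustrative location, not from the book

def save_checkpoint(model, optimizer, epoch, global_step, loss):
    """Persist model weights, optimizer state, and training progress to disk."""
    os.makedirs(CHECKPOINT_DIR, exist_ok=True)
    state = {
        "epoch": epoch,
        "global_step": global_step,
        "loss": loss,
        "model_state_dict": model.state_dict(),
        "optimizer_state_dict": optimizer.state_dict(),
    }
    path = os.path.join(CHECKPOINT_DIR, f"checkpoint_step_{global_step}.pt")
    torch.save(state, path)
    return path

# Inside the training loop, saving every N steps bounds the work lost on failure:
# if global_step % 1000 == 0:
#     save_checkpoint(model, optimizer, epoch, global_step, loss.item())

The save interval is a trade-off: more frequent checkpoints reduce the amount of training that can be lost, at the cost of extra I/O and storage.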

Checkpointing and recovery are crucial for ensuring fault tolerance, efficiency, and reproducibility in training large-scale models. Without checkpointing, an unexpected failure could waste hours or even days of computation. Checkpointing also supports experiment reproducibility, enabling researchers to revisit and fine-tune models from intermediate states rather than redoing...
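As an illustration of the recovery side, the following sketch (same assumptions as above; load_latest_checkpoint is an illustrative name) restores the most recent saved state so training resumes from the recorded step instead of restarting from scratch.

import glob
import os
import torch

def load_latest_checkpoint(model, optimizer, checkpoint_dir="checkpoints"):
    """Restore the most recent checkpoint, or start fresh if none exists."""
    paths = sorted(
        glob.glob(os.path.join(checkpoint_dir, "checkpoint_step_*.pt")),
        key=os.path.getmtime,
    )
    if not paths:
        return 0, 0  # no checkpoint found: start from epoch 0, step 0
    state = torch.load(paths[-1], map_location="cpu")
    model.load_state_dict(state["model_state_dict"])
    optimizer.load_state_dict(state["optimizer_state_dict"])
    return state["epoch"], state["global_step"]

# resume_epoch, resume_step = load_latest_checkpoint(model, optimizer)
# Training then continues from resume_step rather than from the beginning.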
