Releases · instructlab/training

08 Jan 19:48

RobotSail

v0.13.0

574f946

v0.13.0 - Pretraining Support & Optimizer Configuration

What's New

Features

Pretraining Data Processing API (#672)
- Added a new API for processing pretraining-style datasets
- Documents are now chunked by a configurable block_size
- Chunks are treated as independent, fully-unmasked samples
- Updated training loop to ingest pretraining-style datasets
- Includes comprehensive test coverage (test_pretraining_data_process.py, test_pretraining_mode.py, test_pretraining_sampler.py)
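
The chunking behaviour described above can be illustrated with a minimal sketch. This is not the library's implementation: the function name `chunk_document`, the `input_ids`/`labels` field names, and the choice to keep a shorter trailing chunk are all assumptions made for illustration. It only shows the idea of splitting a tokenized document into `block_size` pieces where every token contributes to the loss.

```python
from typing import Dict, List


def chunk_document(token_ids: List[int], block_size: int) -> List[Dict[str, List[int]]]:
    """Split one tokenized document into independent samples.

    Hypothetical helper, not part of instructlab/training. Each chunk holds
    at most `block_size` tokens and is fully unmasked: its labels are a copy
    of its input ids, so no token is excluded from the loss.
    """
    if block_size <= 0:
        raise ValueError("block_size must be a positive integer")

    samples = []
    for start in range(0, len(token_ids), block_size):
        chunk = token_ids[start:start + block_size]
        # Assumption: the final, shorter chunk is kept rather than dropped.
        samples.append({"input_ids": chunk, "labels": list(chunk)})
    return samples


# A 10-token document with block_size=4 yields chunks of 4, 4, and 2 tokens.
samples = chunk_document(list(range(10)), block_size=4)
print([len(s["input_ids"]) for s in samples])
```

Whether the real API pads, drops, or keeps the final partial chunk is not stated in these notes, so treat that detail as open.
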
AdamW Optimizer Configuration (#674)
- Exposed weight_decay, betas, and eps parameters in TrainingArgs
- Users can now tune AdamW hyperparameters through the run_training() API
- Provides more control over optimizer behavior
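
To make the role of the three exposed hyperparameters concrete, here is a small pure-Python sketch of one AdamW update for a single scalar parameter. The function `adamw_step` and its default values are illustrative assumptions, not code or defaults from instructlab/training; only the names `weight_decay`, `betas`, and `eps` come from the notes above.

```python
import math
from typing import Tuple


def adamw_step(
    param: float,
    grad: float,
    m: float,
    v: float,
    step: int,
    lr: float = 1e-3,
    weight_decay: float = 0.01,
    betas: Tuple[float, float] = (0.9, 0.999),
    eps: float = 1e-8,
) -> Tuple[float, float, float]:
    """One AdamW update for a scalar parameter (hypothetical helper).

    `step` is 1-based. Returns the updated (param, m, v).
    """
    beta1, beta2 = betas
    # weight_decay: decoupled decay shrinks the parameter directly,
    # independently of the gradient.
    param = param * (1.0 - lr * weight_decay)
    # betas: decay rates for the running averages of the gradient
    # and of the squared gradient.
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad * grad
    # Bias correction compensates for m and v starting at zero.
    m_hat = m / (1.0 - beta1 ** step)
    v_hat = v / (1.0 - beta2 ** step)
    # eps: keeps the denominator away from zero.
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# Run a few steps on a constant gradient to see the parameter move.
p, m, v = 1.0, 0.0, 0.0
for t in range(1, 4):
    p, m, v = adamw_step(p, grad=0.5, m=m, v=v, step=t, lr=0.1)
    print(t, p)
```

A larger `weight_decay` pulls parameters toward zero more strongly, `betas` control how quickly the moment estimates forget old gradients, and `eps` mainly matters when the squared-gradient estimate is very small.
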
Granite 4 Model Support (#669)
- Added support for Granite 4 models as Mixture of Experts (MoE) models in training

Bug Fixes

Process Timing Fix (#675)
- Fixed a race condition in which a process was read before it had completed
Variable Access Fix (#668)
- Fixed a bug caused by a stray invalid variable access

Dependencies

Build Dependency Update (#670)
- Updated hynek build dependency

Files Changed

17 files changed with 1,642 insertions and 52 deletions:

- Core training modules: data_process.py, main_ds.py, sampler.py, model.py, config.py
- New test suites for pretraining functionality
- Updated README with new capabilities

Full Changelog

All Changes:

574f946 Exposes API for processing pretraining data (#672)
638a753 fixes bug where process isn't completed by the time the process gets read (#675)
c495035 Expose AdamW optimizer parameters in training API (#674)
3d05302 Handle granite 4 as MoE models in training (#669)
781c36f fixes stray invalid variable access bug (#668)
529c2f7 bumps hynek build dep (#670)

Full Diff: v0.12.1...v0.13.0


14 Oct 20:47

Maxusmusti

v0.12.1

637afae

v0.12.1 - Granite 4 support, and adding extended env var and torchrun arg support

What's Changed

Update requirements-cuda.txt to increase liger-kernel minimum by @Maxusmusti in #659
Adds mamba-ssm[causal-conv1d] to CUDA requirements by @RobotSail in #663
Removes Numpy version cap by @RobotSail in #664
fix(torchrun): Omit empty arguments and correct nproc_per_node type by @szaher in #661

New Contributors

@szaher made their first contribution in #661

Full Changelog: v0.12.0...v0.12.1

Contributors

szaher, Maxusmusti, and RobotSail


17 Sep 17:10

Maxusmusti

v0.12.0

536ebfb

v0.12.0 - GPT-OSS Support

Full fine-tuning now supports gpt-oss models. This release also includes minor bug fixes that ensure correct loss calculations with higher gradient accumulation.

What's Changed

Disable workflow runs on forks by default by @fynnsu in #632
Adding GPT OSS Support by @Maxusmusti in #646
Update numpy from <2.0 to <2.3 by @Maxusmusti in #656
Add kernels>0.9.0 to CUDA requirements by @Maxusmusti in #658

Full Changelog: v0.11.1...v0.12.0

Contributors

Maxusmusti and fynnsu


05 Aug 19:34

Maxusmusti

v0.11.1

bfd0d73

v0.11.1

What's Changed

Add general logging implementation by @fynnsu in #500
docs: add CI documentation by @nathan-weinberg in #555
fix: Use default torch timeout for nccl watchdog unless overridden by @booxter in #521
fix: Fix markdown-lint violations by @booxter in #559
ci: add 3.12 smoke workflow flavor by @booxter in #535
adds barriers after checkpoint saving by @JamesKunstle in #566
ci: Fix smoke failures due to pre not available in local actions by @booxter in #565
Checkout correct branch on pull_request_target trigger by @fynnsu in #549
Logging Fixes & Enhancements by @RobotSail in #571
docs: Remove badge for a no longer existing job by @booxter in #542
uses __name__ in logging.getLogger by @JamesKunstle in #573
ci: stop reporting results to slack by @ktdreyer in #574
CI: Constrain all dependencies; introduce a Monday workflow to update pins by @booxter in #558
ci: Run jobs on constraints-dev.txt change by @booxter in #580
chore: update constraints-dev.txt (2025-05-30) by @courtneypacheco in #579
remove old Deepspeed-native code by @JamesKunstle in #567
add DCO.txt by @ktdreyer in #588
ci: Disable dependabot for pip dependencies by @booxter in #587
feat: refactor main_ds.py (1/n) Model class by @cdoern in #572
ci: do not require DCO job by @ktdreyer in #595
'granite-3.3-2b-instruct' for smoketest; smaller smoke dataset by @JamesKunstle in #590
fixes unit tests requiring cuda by @JamesKunstle in #586
chore: update constraints-dev.txt (2025-06-02) by @courtneypacheco in #584
ci: Cover more test dependencies with pins by @booxter in #581
ci: Introduce python 3.12 e2e large job flavor by @booxter in #563
Implicit distributed backend selection by @booxter in #516
ci: Fix incorrect indent in workflow steps by @booxter in #599
feat: refactor main_ds.py (2/n) Accelerator class by @cdoern in #594
chore: update constraints-dev.txt (2025-06-09) by @courtneypacheco in #602
feat: add medium e2e CI job for each PR by @cdoern in #551
test: fix e2e target by @cdoern in #610
chore: update constraints-dev.txt (2025-06-16) by @courtneypacheco in #612
Remove Dolomite support by @booxter in #616
Revert "test: fix e2e target" by @bbrowning in #620
ci: Remove harden-runner steps from jobs by @booxter in #617
test: disable per-PR test by @cdoern in #631
fix edge case for qwen3 data processing by @RobotSail in #626
uncap accelerate in requirements-cuda.txt by @ktdreyer in #628
chore: update constraints-dev.txt (2025-06-30) by @courtneypacheco in #623
Fix a mistake in formatting a floating-point value by @mtake in #639
Add a tutorial for fine-tuning and interpolation by @mtake in #640

New Contributors

@bbrowning made their first contribution in #620
@mtake made their first contribution in #639

Full Changelog: v0.11...v0.11.1

Contributors

bbrowning, booxter, and 8 other contributors


07 Jul 13:33

cdoern

v0.10.4

0cc2e30

v0.10.4

What's Changed

uncap accelerate in requirements-cuda.txt (backport #628) by @mergify in #634

Full Changelog: v0.10.3...v0.10.4

Contributors

mergify


08 May 19:50

cdoern

v0.10.3

40e1e8c

v0.10.3

What's Changed

moves deepspeed requirements into their own file; add deepspeed extras (backport #455) by @mergify in #546

Full Changelog: v0.10.2...v0.10.3

Contributors

mergify


08 May 19:23

JamesKunstle

v0.11

e8eb284

v0.11

What's Changed

ci: Remove workflow that doesn't utilize training library (medium, -mp) by @booxter in #478
Obey the FSDP sharding option default by @Maxusmusti in #486
Change default internal sharding strategy to HYBRID_SHARD by @Maxusmusti in #488
chore: Update the large e2e job to use fallback logic for selecting EC2 instances by @courtneypacheco in #491
moves deepspeed requirements into their own file; add deepspeed extras by @JamesKunstle in #455
chore: introduce dummy workflow by @cdoern in #497
ci: Search for necessary instance for smoke job in multiple AZs by @booxter in #481
ci: Fix -sdk fake workflow failure on actionlint by @booxter in #501
build(deps): Bump actions/setup-python from 5.5.0 to 5.6.0 by @dependabot in #493
use instructlab constraints-dev.txt in e2e test by @ktdreyer in #499
build(deps): Bump step-security/harden-runner from 2.11.1 to 2.12.0 by @dependabot in #490
ci: Use tox-current-env to reuse prepared venv with torch by @booxter in #482
fix: extend nccl timeout by @cdoern in #507
always log storage by @RobotSail in #510
deps: Remove caps on ROCm dependencies by @courtneypacheco in #517
ci: don't trigger pull_request_target job on its own workflow by @booxter in #519
Enable pylint 'unused-argument' check by @fynnsu in #528

New Contributors

@ktdreyer made their first contribution in #499
@fynnsu made their first contribution in #528

Full Changelog: v0.10.0...v0.11

Contributors

booxter, ktdreyer, and 7 other contributors


01 May 14:49

courtneypacheco

v0.10.2

a9a69e9

v0.10.2 - Remove ROCm dependency caps

What's Changed

deps: Remove caps on ROCm dependencies (backport #517) by @mergify in #518

Full Changelog: v0.10.1...v0.10.2

Contributors

mergify


21 Apr 20:57

Maxusmusti

v0.10.1

a4d52a5

v0.10.1 - Updating Default FSDP Sharding

What's Changed

ci: Remove workflow that doesn't utilize training library (medium, -mp) by @booxter in #478
Obey the FSDP sharding option default (backport #486) by @mergify in #487
Change default internal sharding strategy to HYBRID_SHARD (backport #488) by @mergify in #489

Full Changelog: v0.10.0...v0.10.1

Contributors

booxter and mergify


17 Apr 21:26

Maxusmusti

v0.10.0

be01c2c

v0.10.0 - Updated FSDP Mixed Precision and Liger Kernel Model Option Support

What's Changed

disables e2e-nvidia-l4-x1 test by @JamesKunstle in #454
ci: Fix unit test run due to no tests found to execute by @booxter in #466
ci: Don't run smoke tests when only irrelevant files are touched by @booxter in #460
ci: don't waste ec2 resources on unit tests by @booxter in #464
ci: Trigger unit test run on tox.ini change by @booxter in #469
ci: Fix path filter for unit tests for the workflow file by @booxter in #461
chore: Don't install pytest dependencies for coverage reports by @booxter in #468
chore: Remove spell checks from the repo by @booxter in #458
chore: Don't set ec2_runner_variant for unit tests by @booxter in #475
Remove CHANGELOG.md by @booxter in #457
Fix FSDP mixed precision setting and loss w/ accelerate by @Maxusmusti in #465
fixes non-granite model instantiation with Liger Kernel by @JamesKunstle in #476
ci: Install torch before flash-attn by @booxter in #474
ci: Use pull_request as trigger for unit tests by @booxter in #473
ci: Run unit tests for all supported python version, 3.11+ by @booxter in #472
chore: Require python3.11+ by @booxter in #470
chore: Drop pytest-asyncio by @booxter in #467
chore: don't trigger unit tests for cuda and rocm requirements changes by @booxter in #463
build(deps): Bump step-security/harden-runner from 2.10.4 to 2.11.1 by @dependabot in #452
build(deps): Bump machulav/ec2-github-runner from 2.3.8 to 2.3.9 by @dependabot in #450
build(deps): Bump aws-actions/configure-aws-credentials from 4.0.2 to 4.1.0 by @dependabot in #451

Full Changelog: v0.9.0...v0.10.0

Contributors

booxter, Maxusmusti, and 2 other contributors

