@kolkov kolkov commented Dec 16, 2025

Summary

Refactoring to reduce cognitive complexity in hot path functions, following patterns from the Burn ML framework (Rust).

Issue: #14 by @marcelloh

Changes

TASK-101: Pre-Slice Bounds Elimination

  • Applied pre-slicing pattern in conv2d, maxpool2d, autodiff ops
  • Bounds checks moved outside hot loops
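A minimal sketch of the pre-slicing idea, not the code from this PR: the function names and the flat row-major layout are assumptions for illustration. Taking one sub-slice per row moves the bounds check to the slice expression, so the inner loop can range over a slice of known length.

```go
package main

import "fmt"

// sumRowNaive indexes the flat buffer with a computed offset on every
// iteration, so a bounds check typically stays inside the loop.
func sumRowNaive(data []float32, row, width int) float32 {
	var sum float32
	for x := 0; x < width; x++ {
		sum += data[row*width+x]
	}
	return sum
}

// sumRowPreSliced slices the row once. The slice expression is checked a
// single time; ranging over the result needs no per-element check.
func sumRowPreSliced(data []float32, row, width int) float32 {
	rowData := data[row*width : (row+1)*width] // one bounds check here
	var sum float32
	for _, v := range rowData {
		sum += v
	}
	return sum
}

func main() {
	data := []float32{1, 2, 3, 4, 5, 6} // 2 rows x 3 columns
	fmt.Println(sumRowNaive(data, 1, 3), sumRowPreSliced(data, 1, 3))
}
```

Building with `go build -gcflags=-d=ssa/check_bce` lists the bounds checks that remain, which is one way to confirm the effect on a given toolchain.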

TASK-102: Stride Specialization

  • Added specialized code paths for stride=1, padding=0 case
  • Enables compiler auto-vectorization for common CNN operations
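The dispatch can be sketched as follows. This is an illustration under simplifying assumptions (single channel, square kernel, row-major layout), and the function names are hypothetical rather than the ones added in this PR. The specialized path drops the stride multiplication and the padding branches, leaving a straight multiply-add over contiguous slices.

```go
package main

import "fmt"

// conv2dGeneric handles any stride and zero padding for one h x w channel
// and a k x k kernel.
func conv2dGeneric(in []float32, h, w int, kern []float32, k, stride, pad int) []float32 {
	outH := (h+2*pad-k)/stride + 1
	outW := (w+2*pad-k)/stride + 1
	out := make([]float32, outH*outW)
	for oy := 0; oy < outH; oy++ {
		for ox := 0; ox < outW; ox++ {
			var acc float32
			for ky := 0; ky < k; ky++ {
				iy := oy*stride + ky - pad
				if iy < 0 || iy >= h {
					continue // row falls in the padding
				}
				for kx := 0; kx < k; kx++ {
					ix := ox*stride + kx - pad
					if ix < 0 || ix >= w {
						continue // column falls in the padding
					}
					acc += in[iy*w+ix] * kern[ky*k+kx]
				}
			}
			out[oy*outW+ox] = acc
		}
	}
	return out
}

// conv2dStride1NoPad is the specialized path: no padding branches and no
// stride arithmetic, only contiguous row slices.
func conv2dStride1NoPad(in []float32, h, w int, kern []float32, k int) []float32 {
	outH, outW := h-k+1, w-k+1
	out := make([]float32, outH*outW)
	for oy := 0; oy < outH; oy++ {
		outRow := out[oy*outW : (oy+1)*outW]
		for ky := 0; ky < k; ky++ {
			inRow := in[(oy+ky)*w : (oy+ky+1)*w]
			for kx, kv := range kern[ky*k : (ky+1)*k] {
				src := inRow[kx : kx+outW]
				for ox := range outRow {
					outRow[ox] += src[ox] * kv
				}
			}
		}
	}
	return out
}

// conv2d picks the specialized path for the common stride=1, padding=0 case.
func conv2d(in []float32, h, w int, kern []float32, k, stride, pad int) []float32 {
	if stride == 1 && pad == 0 {
		return conv2dStride1NoPad(in, h, w, kern, k)
	}
	return conv2dGeneric(in, h, w, kern, k, stride, pad)
}

func main() {
	in := []float32{1, 2, 3, 4, 5, 6, 7, 8, 9}
	kern := []float32{1, 1, 1, 1}
	fmt.Println(conv2d(in, 3, 3, kern, 2, 1, 0))
}
```

Both paths should agree on any stride=1, padding=0 input, which makes the generic version a convenient reference in tests.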

TASK-103: Parallel Execution Utilities

  • New internal/parallel package with For() and ForBatch()
  • Ready for integration into CPU backend operations
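The signatures of For() and ForBatch() are not shown in this description, so the helper below is a hypothetical stand-in, not the internal/parallel API. It sketches one common shape for such a utility: split an index range into contiguous chunks, run each chunk in a goroutine, and fall back to a plain sequential call when the workload is too small to be worth splitting.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// parallelFor is a hypothetical helper (name and parameters are assumptions).
// It calls fn on disjoint [start, end) chunks that together cover [0, n).
// minPerWorker bounds how small a chunk may get before running sequentially.
func parallelFor(n, minPerWorker int, fn func(start, end int)) {
	if minPerWorker < 1 {
		minPerWorker = 1
	}
	workers := runtime.GOMAXPROCS(0)
	if maxWorkers := n / minPerWorker; workers > maxWorkers {
		workers = maxWorkers
	}
	if workers <= 1 {
		// Sequential fallback for small workloads.
		if n > 0 {
			fn(0, n)
		}
		return
	}
	chunk := (n + workers - 1) / workers
	var wg sync.WaitGroup
	for start := 0; start < n; start += chunk {
		end := start + chunk
		if end > n {
			end = n
		}
		wg.Add(1)
		go func(s, e int) {
			defer wg.Done()
			fn(s, e)
		}(start, end)
	}
	wg.Wait()
}

func main() {
	out := make([]int, 1000)
	// Each chunk writes only its own indices, so no locking is needed.
	parallelFor(len(out), 64, func(start, end int) {
		for i := start; i < end; i++ {
			out[i] = i * 2
		}
	})
	fmt.Println(out[0], out[999])
}
```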

TASK-104: Autodiff Orchestration Refactor

  • Separated orchestration from computation (Burn pattern)
  • Autodiff ops now delegate to backend methods
  • New files: conv2d_backward.go, maxpool2d_backward.go
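To illustrate the split, here is a small sketch that uses ReLU instead of conv2d to stay short. The interface and type names are assumptions, not the actual Backend interface in this repository. The autodiff op only records what the backward pass needs and forwards both directions to the backend, which owns all the numeric loops.

```go
package main

import "fmt"

// Backend is a hypothetical, trimmed-down interface: the computation side.
type Backend interface {
	ReLU(x []float32) []float32
	ReLUBackward(x, gradOut []float32) []float32
}

// cpuBackend holds the numeric loops.
type cpuBackend struct{}

func (cpuBackend) ReLU(x []float32) []float32 {
	out := make([]float32, len(x))
	for i, v := range x {
		if v > 0 {
			out[i] = v
		}
	}
	return out
}

func (cpuBackend) ReLUBackward(x, gradOut []float32) []float32 {
	grad := make([]float32, len(x))
	for i, v := range x {
		if v > 0 {
			grad[i] = gradOut[i] // gradient passes through where the input was positive
		}
	}
	return grad
}

// reluOp is the orchestration side: it saves the input for the backward
// pass and delegates both directions to the backend.
type reluOp struct {
	backend Backend
	input   []float32
}

func (op *reluOp) Forward(x []float32) []float32 {
	op.input = x
	return op.backend.ReLU(x)
}

func (op *reluOp) Backward(gradOut []float32) []float32 {
	return op.backend.ReLUBackward(op.input, gradOut)
}

func main() {
	op := &reluOp{backend: cpuBackend{}}
	fmt.Println(op.Forward([]float32{-1, 2}), op.Backward([]float32{1, 1}))
}
```

With this shape, a second backend only has to implement the interface; the autodiff op stays unchanged.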

TASK-105: Flash Attention CPU Refactor

  • Cognitive complexity (gocognit) of flashAttentionCPU reduced: 111 → <30
  • Removed //nolint:gocognit directive
  • Extracted helper functions: flashAttentionScoreBlock, flashAttentionProcessQuery

Metrics

Function             | Before | After
-------------------- | ------ | --------------
flashAttentionCPU    | 111    | <30
conv2dBackwardInput* | 80     | delegation
im2colFloat32/64     | 67     | with pre-slice

Testing

  • All tests pass
  • golangci-lint: 0 issues
  • No behavioral changes (pure refactor)

…coding, GGUF

TASK-062: GGUF Import Complete
- Parser: types, metadata, tensor info extraction
- Loader: tensor data loading with memory mapping
- Dequantization: K-quants (Q4_K, Q5_K, Q6_K) and Q8_0
- Converter: GGUF to Born tensor format

TASK-060: Flash Attention 2
- Week 1: CPU reference with online softmax O(N) memory
- Week 2: WebGPU WGSL shader with tiled computation
- Supports causal masking, head dims 64-256, block sizes 64/128
- GPU vs CPU validation < 1e-4 error
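
For readers unfamiliar with online softmax, the sketch below shows the idea for a single query. It is not the code from this commit, and the function names are assumptions. The running maximum, denominator, and weighted sum are updated one score at a time, so the full row of softmax weights is never stored. Inputs are assumed non-empty.

```go
package main

import (
	"fmt"
	"math"
)

// attendOnline computes softmax(scores) applied to values in one pass.
func attendOnline(scores []float64, values [][]float64) []float64 {
	out := make([]float64, len(values[0]))
	maxSoFar := math.Inf(-1)
	var denom float64
	for i, s := range scores {
		newMax := math.Max(maxSoFar, s)
		scale := math.Exp(maxSoFar - newMax) // rescales earlier terms; 0 on the first step
		w := math.Exp(s - newMax)
		denom = denom*scale + w
		for d := range out {
			out[d] = out[d]*scale + w*values[i][d]
		}
		maxSoFar = newMax
	}
	for d := range out {
		out[d] /= denom
	}
	return out
}

// attendTwoPass is the straightforward reference: find the max, then
// materialize every weight before summing.
func attendTwoPass(scores []float64, values [][]float64) []float64 {
	maxScore := math.Inf(-1)
	for _, s := range scores {
		maxScore = math.Max(maxScore, s)
	}
	weights := make([]float64, len(scores))
	var denom float64
	for i, s := range scores {
		weights[i] = math.Exp(s - maxScore)
		denom += weights[i]
	}
	out := make([]float64, len(values[0]))
	for i, w := range weights {
		for d := range out {
			out[d] += w / denom * values[i][d]
		}
	}
	return out
}

// closeTo reports whether a and b differ by at most tol.
func closeTo(a, b, tol float64) bool {
	return math.Abs(a-b) <= tol
}

func main() {
	scores := []float64{1, 3, 2}
	values := [][]float64{{1, 0}, {0, 1}, {1, 1}}
	fmt.Println(attendOnline(scores, values), attendTwoPass(scores, values))
}
```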

TASK-061: Speculative Decoding
- Draft model generates K tokens speculatively
- Target model verifies in parallel batch
- Modified rejection sampling for token acceptance
- 2-4x speedup potential for autoregressive generation
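
A sketch of the acceptance step, assuming the standard speculative sampling scheme (the commit does not spell out its exact rule, and the names below are hypothetical). A draft token x is accepted with probability min(1, p(x)/q(x)), where p is the target distribution and q the draft distribution. On rejection, a replacement is sampled from the normalized residual max(0, p - q). The uniform draw u is passed in so the example is deterministic.

```go
package main

import "fmt"

// acceptDraft reports whether the draft token is kept, given a uniform
// draw u in [0, 1). Comparing u against p/q is equivalent to comparing
// against min(1, p/q) because u is always below 1.
func acceptDraft(pTarget, qDraft []float64, token int, u float64) bool {
	if qDraft[token] <= 0 {
		return false // the draft model could not have proposed this token
	}
	return u < pTarget[token]/qDraft[token]
}

// residual returns the distribution to resample from after a rejection:
// max(0, p - q), renormalized to sum to 1.
func residual(pTarget, qDraft []float64) []float64 {
	res := make([]float64, len(pTarget))
	var sum float64
	for i := range pTarget {
		if d := pTarget[i] - qDraft[i]; d > 0 {
			res[i] = d
			sum += d
		}
	}
	if sum == 0 {
		// p and q are identical, so fall back to the target distribution.
		copy(res, pTarget)
		return res
	}
	for i := range res {
		res[i] /= sum
	}
	return res
}

func main() {
	p := []float64{0.5, 0.5} // target model probabilities
	q := []float64{0.9, 0.1} // draft model probabilities
	fmt.Println(acceptDraft(p, q, 0, 0.5), acceptDraft(p, q, 0, 0.6))
	fmt.Println(residual(p, q))
}
```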

Also: Fixed 226 gosec G115 lint issues across codebase

- TASK-101: Pre-slice pattern for bounds check elimination
  - conv2d.go, maxpool2d.go: hierarchical pre-slicing
  - Reduced complexity in im2col and backward functions

- TASK-104: Autodiff orchestration refactor
  - Separate orchestration from computation (Burn pattern)
  - New: conv2d_backward.go, maxpool2d_backward.go in CPU backend
  - Autodiff conv2d.go: 409 -> 67 lines (delegation only)
  - Extended Backend interface with backward methods

- TASK-105: Flash Attention CPU refactor
  - Complexity: 111 -> <30, removed //nolint:gocognit
  - Extracted: FlashDims, FlashConfig structs
  - Helpers: flashAttentionScoreBlock, flashAttentionExtractValues
  - flashAttentionProcessQuery for single query processing

Stats: +360/-460 lines, 13 files changed
Tests: all pass, golangci-lint: 0 issues

TASK-102: Stride Specialization
- Added dispatch for stride==1, padding==0 in Conv2D forward/backward
- Specialized functions: conv2dFloat32Stride1NoPad, conv2dInputBackwardFloat32Stride1NoPad
- Enables compiler auto-vectorization for common case

TASK-103: Parallel Execution Utilities
- New package: internal/parallel
- parallel.Config, parallel.For(), parallel.ForBatch()
- Automatic fallback to sequential for small workloads
- Ready for integration into CPU backend operations

Stats: +634 lines, 4 files changed
Tests: all pass, golangci-lint: 0 issues
@kolkov kolkov mentioned this pull request Dec 16, 2025

codecov bot commented Dec 16, 2025

Codecov Report

❌ Patch coverage is 27.73109% with 430 lines in your changes missing coverage. Please review.

Files with missing lines                   | Patch % | Lines
------------------------------------------ | ------- | ---------------
internal/backend/cpu/conv2d_backward.go    | 0.00%   | 296 Missing ⚠️
internal/backend/cpu/maxpool2d_backward.go | 0.00%   | 63 Missing ⚠️
internal/backend/cpu/conv2d.go             | 50.00%  | 59 Missing ⚠️
internal/autodiff/autodiff.go              | 0.00%   | 6 Missing ⚠️
internal/tensor/mock.go                    | 0.00%   | 6 Missing ⚠️

@kolkov kolkov merged commit 1a40bb4 into main Dec 16, 2025
9 checks passed
@kolkov kolkov deleted the refactor/issue-14-burn-patterns branch December 16, 2025 16:45