chuk-math-gym

A verifiable reasoning and tool-use training environment for mathematical problem-solving.

Overview

chuk-math-gym provides a Gymnasium-style environment for training LLM agents on mathematical reasoning with verifiable rewards. It generates problems with deterministic seeding, produces machine-checkable solution traces, and provides local verification with partial credit.

Key Features

Problem Generation: Deterministic seeding for reproducible problem sets
Solution Traces: Step-by-step traces with machine-checkable verification
Partial Credit: Granular scoring based on correct intermediate steps
Gym-style Interface: Standard reset()/step() RL environment API
Multiple Domains: Arithmetic, fractions, and linear equations
Curriculum Learning: Adaptive difficulty scheduling strategies
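
As a rough illustration of what deterministic seeding means in practice, the sketch below generates a toy addition problem from a seed. It does not use the chuk-math-gym generators; `ToyProblem` and `generate_addition` are hypothetical names introduced only for this example.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyProblem:
    """Hypothetical stand-in for a generated problem (not the library's Problem schema)."""
    seed: int
    expression: str
    gold_answer: str


def generate_addition(seed: int, max_operand: int = 20) -> ToyProblem:
    # A private Random instance keeps generation independent of global RNG state.
    rng = random.Random(seed)
    a = rng.randint(1, max_operand)
    b = rng.randint(1, max_operand)
    return ToyProblem(seed=seed, expression=f"{a} + {b}", gold_answer=str(a + b))


first = generate_addition(42)
second = generate_addition(42)
print(first == second)  # the same seed reproduces the same problem
```

Because the seed fully determines the problem, a problem set can be shared as a list of seeds rather than as generated data.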

Installation

Using uv (recommended)

pip install uv
uv sync --reinstall

Using pip

pip install -r requirements.txt

Quick Start

Using the Gym Environment

from chuk_math_gym import MathGymEnv, DomainType, DifficultyLevel

# Create environment for arithmetic problems
env = MathGymEnv(
    domain=DomainType.ARITHMETIC,
    difficulty=DifficultyLevel.EASY
)

# Reset to get a new problem
problem = env.reset(seed=42)
print(f"Problem: {problem.prompt}")
print(f"Expression: {problem.expression}")

# Submit an answer
result = env.step("42")
print(f"Correct: {result.correct}")
print(f"Score: {result.score}")

Domain-Specific Environments

from chuk_math_gym import DifficultyLevel
from chuk_math_gym.domains.arithmetic import ArithmeticEnv
from chuk_math_gym.domains.fractions import FractionsEnv
from chuk_math_gym.domains.linear_equations import LinearEquationsEnv

# Arithmetic environment
arith_env = ArithmeticEnv(difficulty=DifficultyLevel.MEDIUM)
problem = arith_env.reset(seed=123)

# Fractions environment
frac_env = FractionsEnv(difficulty=DifficultyLevel.EASY)
problem = frac_env.reset(seed=456)

# Linear equations environment
eq_env = LinearEquationsEnv(difficulty=DifficultyLevel.HARD)
problem = eq_env.reset(seed=789)

Verifying Solutions

from chuk_math_gym.verifiers.arithmetic import ArithmeticVerifier
from chuk_math_gym.schemas.problem import Problem, DomainType, DifficultyLevel

verifier = ArithmeticVerifier()

# Verify a final answer (`problem` is a Problem instance; see Schemas below)
result = verifier.verify_answer(
    problem=problem,
    answer="42",
    gold_answer="42"
)
print(f"Correct: {result.correct}, Score: {result.score}")

# Verify a complete trace with partial credit (`trace` is a Trace instance; see Schemas below)
trace_result = verifier.verify_trace(problem, trace)
print(f"Partial credit: {trace_result.score}")
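
One simple way to think about partial credit is as the fraction of intermediate steps that are correct. The sketch below shows that idea only; it is an assumption about how such a score could be computed, not the scoring rule used by `verify_trace`, and `ToyStep` and `partial_credit` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyStep:
    """Hypothetical step record: the value the solver claimed and the expected value."""
    claimed: float
    expected: float


def partial_credit(steps: List[ToyStep], tolerance: float = 1e-9) -> float:
    """Return the fraction of steps whose claimed value matches the expected one."""
    if not steps:
        return 0.0
    correct = sum(1 for s in steps if abs(s.claimed - s.expected) <= tolerance)
    return correct / len(steps)


# 3 + 5 * (10 - 4): the solver gets the first two steps right and slips on the last.
steps = [
    ToyStep(claimed=6.0, expected=6.0),    # 10 - 4
    ToyStep(claimed=30.0, expected=30.0),  # 5 * 6
    ToyStep(claimed=35.0, expected=33.0),  # 3 + 30, answered incorrectly
]
score = partial_credit(steps)
print(f"Partial credit: {score:.2f}")
```

A real verifier might weight steps differently or stop crediting steps after the first error; this version treats every step equally.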

Curriculum Learning

from chuk_math_gym.curriculum import CurriculumScheduler
from chuk_math_gym.curriculum.strategies import (
    LinearProgressionStrategy,
    AdaptiveStrategy,
    MasteryBasedStrategy
)

# Create scheduler with adaptive difficulty
scheduler = CurriculumScheduler(
    strategy=AdaptiveStrategy(
        success_threshold=0.8,
        failure_threshold=0.4
    )
)

# Get current difficulty
difficulty = scheduler.get_difficulty()

# Update based on performance
scheduler.update(score=0.9, correct=True)
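
To make the role of the two thresholds concrete, here is a minimal sketch of an adaptive schedule: it tracks a rolling success rate and moves one level up or down when the rate crosses a threshold. This is an assumed design, not the library's `AdaptiveStrategy`; `ToyAdaptiveScheduler`, the `window` parameter, and the ascending order of `LEVELS` (taken from the Difficulty Levels table) are assumptions.

```python
from collections import deque

# Assumed to be ordered from easiest to hardest.
LEVELS = ["VERY_EASY", "EASY", "PRETTY_EASY", "MEDIUM",
          "HARD", "PRETTY_HARD", "VERY_HARD"]


class ToyAdaptiveScheduler:
    """Hypothetical scheduler; not the library's AdaptiveStrategy."""

    def __init__(self, success_threshold=0.8, failure_threshold=0.4,
                 window=5, start=1):
        self.success_threshold = success_threshold
        self.failure_threshold = failure_threshold
        self.results = deque(maxlen=window)  # most recent outcomes only
        self.index = start                   # position in LEVELS

    def get_difficulty(self) -> str:
        return LEVELS[self.index]

    def update(self, correct: bool) -> None:
        self.results.append(1.0 if correct else 0.0)
        if len(self.results) < self.results.maxlen:
            return  # wait for a full window before adjusting
        rate = sum(self.results) / len(self.results)
        if rate >= self.success_threshold and self.index < len(LEVELS) - 1:
            self.index += 1
            self.results.clear()  # start a fresh window at the new level
        elif rate <= self.failure_threshold and self.index > 0:
            self.index -= 1
            self.results.clear()


scheduler = ToyAdaptiveScheduler()
for _ in range(5):
    scheduler.update(correct=True)
level_after_streak = scheduler.get_difficulty()
print(level_after_streak)
```

Clearing the window after each level change prevents results from the previous level from triggering a second, immediate change.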

Project Structure

src/chuk_math_gym/
├── domains/           # Domain-specific implementations
│   ├── arithmetic/    # Arithmetic expression problems
│   ├── fractions/     # Fraction manipulation problems
│   └── linear_equations/  # Equation solving problems
├── env/               # Gymnasium environment base
├── schemas/           # Pydantic models for problems, traces, verification
├── verifiers/         # Answer and trace verification
├── trace/             # Solution trace generation
├── curriculum/        # Difficulty scheduling strategies
├── generators/        # Problem generation base classes
├── explanations/      # Step-by-step explanation generation
├── expression_generator/  # Random expression generation
└── compiler/          # Expression parsing and compilation

Difficulty Levels

Problems can be generated at seven difficulty levels:

| Level | Description |
|---|---|
| `VERY_EASY` | Simple single-operation problems |
| `EASY` | Basic problems with small numbers |
| `PRETTY_EASY` | Slightly more complex expressions |
| `MEDIUM` | Multi-step problems |
| `HARD` | Complex expressions with decimals |
| `PRETTY_HARD` | Challenging multi-operation problems |
| `VERY_HARD` | Advanced problems with large numbers |

Testing

Run the test suite:

# Run pytest through uv
uv run pytest

# With coverage
uv run pytest --cov=src/chuk_math_gym --cov-report=term-missing

# Run all checks (lint + format + tests)
make check

Current test coverage: 96% with 632 tests.

CLI Tools

Expression Generator

Generate random expressions at specified difficulty:

python expression_generator_cli.py --difficulty "medium"

Compiler

Compile expressions with step-by-step explanations:

uv run main.py "3 + 5 * (10 - 4)" --format jsonl

With LLM-enhanced explanations:

uv run main.py "3 + 5 * (10 - 4)" --llm "phi4" --format jsonl
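
For intuition about what a step-by-step evaluation of `3 + 5 * (10 - 4)` involves, the sketch below walks the expression with Python's standard `ast` module and prints one JSON object per operation. It is independent of this project's compiler; `evaluate_with_steps` and the `step`/`value` field names are hypothetical and do not describe the actual output of `main.py`.

```python
import ast
import json
import operator

# Supported binary operators and their display symbols.
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}
SYMBOLS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}


def evaluate_with_steps(expression: str):
    """Evaluate an arithmetic expression, recording each binary operation."""
    steps = []

    def visit(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            # Operands are evaluated first, so inner operations are recorded first.
            left, right = visit(node.left), visit(node.right)
            value = OPS[type(node.op)](left, right)
            symbol = SYMBOLS[type(node.op)]
            steps.append({"step": f"{left} {symbol} {right}", "value": value})
            return value
        raise ValueError("unsupported expression")

    result = visit(ast.parse(expression, mode="eval").body)
    return result, steps


result, steps = evaluate_with_steps("3 + 5 * (10 - 4)")
for step in steps:
    print(json.dumps(step))  # one JSON object per line, i.e. JSONL
```

The parser already encodes operator precedence and parentheses in the tree, so the recorded order (parenthesised subtraction, then multiplication, then addition) follows from the traversal alone.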

Generate Training Samples

# Chat-style samples
python generate_chat_samples.py -n 5 -d "easy" --llm "granite3.1-dense"

# Verifier training samples
python generate_verifier_samples.py -n 20 -d "medium"

Schemas

Problem

from chuk_math_gym import Problem, DomainType, DifficultyLevel

problem = Problem(
    id="prob_001",
    seed=42,
    domain=DomainType.ARITHMETIC,
    difficulty=DifficultyLevel.EASY,
    prompt="Calculate: 3 + 5",
    expression="3 + 5",
    gold_answer="8"
)

Trace

from chuk_math_gym import Trace, Step, StepOperation

trace = Trace(
    problem_id="prob_001",
    steps=[
        Step(
            index=0,
            operation=StepOperation.ADD,
            before_state="3 + 5",
            after_state="8",
            output_value=8.0
        )
    ]
)

Verification Result

from chuk_math_gym import VerificationResult

result = VerificationResult(
    correct=True,
    score=1.0,
    error_type=None,
    feedback="Correct answer!"
)

License

MIT License
