HatCat Logo

HatCat

See what your LLM is thinking before it speaks.

Real-time concept detection and steering for large language models.
Catch deception, manipulation, and misalignment before they manifest in text.


What It Does

HatCat monitors the internal activations of an LLM as it generates text, detecting concepts like deception, manipulation, sycophancy, and thousands of others in real time. When dangerous patterns emerge beneath the surface, HatCat can intervene and steer the model away from harmful outputs before they're ever written.
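Internally, this kind of monitoring amounts to reading hidden states with ordinary forward hooks as tokens are generated. The snippet below is a minimal, purely illustrative sketch of that idea, not HatCat's actual API; it uses GPT-2 as a small stand-in model and simply captures one activation vector per generation step, ready to be handed to concept lenses.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative only: GPT-2 stands in for a larger open-weights model,
# and the layer index is an arbitrary choice.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

captured = []  # one activation vector per generation step

def capture_hidden(module, inputs, output):
    # GPT-2 blocks return a tuple; output[0] is the hidden-state tensor.
    hidden = output[0]
    captured.append(hidden[:, -1, :].detach())  # keep the last position only

handle = model.transformer.h[6].register_forward_hook(capture_hidden)

enc = tokenizer("What are the risks of AI?", return_tensors="pt")
with torch.no_grad():
    model.generate(**enc, max_new_tokens=20, pad_token_id=tokenizer.eos_token_id)
handle.remove()

# Each entry in `captured` can now be scored by concept classifiers.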

Token-level concept detection
Token-level concept detection showing safety intensity and top activating concepts

Key Capabilities

  • Detect - Monitor 8,000+ concepts across multiple layers of any open-weights model
  • Visualize - See concept activations in real time as text is generated
  • Steer - Suppress dangerous behaviors with sub-token latency
  • Govern - Build auditable, treaty-compliant AI systems

Divergence detection
Catching model manipulation: the model attempts to establish a false persona ("call me sammy") - flagged with Deception, Sycophancy, and PolicyDivergence

See It In Action

Concept Timeline View

Track how concepts activate across an entire response:

Concept timeline
Multiple concept channels showing activation patterns token-by-token

Full Conversation Monitoring

Monitor safety-critical concepts throughout a conversation:

Full conversation monitoring
Tracking IntelligenceExplosion, Misalignment, and other AI safety concepts during a conversation about AI risks

Compact Timeline

Quick visualization of key concept activations:

Compact timeline
Streamlined view showing animal, safety, motion, color, and technology concept activations


Installation

# Clone the repository
git clone https://github.com/p0ss/HatCat.git
cd HatCat

# Create virtual environment and install dependencies
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# Or with poetry
poetry install

Quick Start

1. Download a Lens Pack

Pre-trained lens packs are available on HuggingFace:

# Download the Gemma 3 4B lens pack (~5GB)
git lfs install
git clone https://huggingface.co/HatCatFTW/lens-gemma-3-4b-first-light-v1 lens_packs/gemma-3-4b_first-light-v1-bf16

Or train your own - see Training Your Own Lenses below.

2. Run Real-Time Monitoring

.venv/bin/python scripts/tools/sumo_temporal_detection.py \
  --prompt "What are the risks of AI?" \
  --model google/gemma-3-4b-pt \
  --max-tokens 60

3. Launch the Web Interface

The web interface is a fork of Open WebUI with HatCat integration: HatCat-OpenWebUI

# Clone the UI (if not already done)
git clone https://github.com/p0ss/HatCat-OpenWebUI.git hatcat-ui
cd hatcat-ui && npm install && cd ..

# Start HatCat server
.venv/bin/python src/ui/openwebui/server.py --port 8000

# In another terminal, start the frontend
cd hatcat-ui && npm run dev

Then open http://localhost:5173 and start chatting with full concept visibility.


How It Works

HatCat uses concept lenses - small neural classifiers trained to detect specific concepts in a model's hidden states. These lenses are organized into lens packs that can monitor thousands of concepts simultaneously with minimal overhead.
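As a rough picture of what a single lens looks like, the toy classifier below maps one hidden-state vector to a probability that a concept is active. The class name, hidden size, and layer width are illustrative assumptions, not the actual classifiers in src/hat/classifiers.

import torch
import torch.nn as nn

class ConceptLens(nn.Module):
    # Toy binary concept classifier over a single hidden-state vector.
    # Sizes are assumptions, not HatCat's real settings.
    def __init__(self, hidden_size: int = 2560, width: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(hidden_size, width),
            nn.ReLU(),
            nn.Linear(width, 1),
        )

    def forward(self, activation: torch.Tensor) -> torch.Tensor:
        # Returns P(concept is active) for each activation vector.
        return torch.sigmoid(self.net(activation)).squeeze(-1)

# A lens pack is, loosely, many such lenses applied to the same activations.
lenses = {"Deception": ConceptLens(), "Sycophancy": ConceptLens()}
activations = torch.randn(4, 2560)  # stand-in for 4 tokens' hidden states
scores = {name: lens(activations) for name, lens in lenses.items()}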

Metric               Value
Concepts monitored   ~8,000
VRAM required        1 GB
RAM required         8 GB
Latency per token    <25 ms

The Stack

  • HAT (Headspace Ambient Transducer) - Reads internal activations and transduces them into concept scores
  • CAT (Conjoined Adversarial Tomograph) - Detects divergence between internal state and external behavior
  • MAP (Mindmeld Architectural Protocol) - Standardizes concept interchange between systems
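To make the HAT/CAT split concrete, the toy sketch below compares lens scores computed from internal activations against scores derived from the visible output, and flags concepts whose internal score far exceeds their external one. The function name and threshold are assumptions, not the behaviour of src/cat/divergence.py.

from typing import Dict

DIVERGENCE_THRESHOLD = 0.4  # illustrative value only

def flag_divergence(internal: Dict[str, float], external: Dict[str, float]) -> Dict[str, float]:
    # Return concepts whose internal activation exceeds what the visible text shows.
    flags = {}
    for concept, internal_score in internal.items():
        gap = internal_score - external.get(concept, 0.0)
        if gap > DIVERGENCE_THRESHOLD:
            flags[concept] = gap
    return flags

# Example: the model's internal state is far more "deceptive" than its output lets on.
internal_scores = {"Deception": 0.87, "Sycophancy": 0.62, "Honesty": 0.20}
external_scores = {"Deception": 0.10, "Sycophancy": 0.55, "Honesty": 0.25}
print(flag_divergence(internal_scores, external_scores))  # flags Deception (gap ~0.77)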

Technical Documentation

Core Modules

HAT - Headspace Ambient Transducer (src/hat)

  • monitoring/ - Real-time concept monitoring during generation
  • steering/ - Concept vector steering (linear, manifold, differential, target, field)
  • classifiers/ - Binary MLP concept classifiers
  • interpreter/ - Activation-to-concept decoders
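Of the steering modes listed above, linear steering is the simplest to picture: a scaled concept direction is added to (or subtracted from) the hidden state at a chosen layer. The hook below is a hedged sketch of that idea, not the repo's steering implementation.

import torch

def make_steering_hook(concept_vector: torch.Tensor, strength: float):
    # Negative strength suppresses the concept, positive strength amplifies it.
    direction = concept_vector / concept_vector.norm()

    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        steered = hidden + strength * direction.to(hidden.dtype)
        if isinstance(output, tuple):
            return (steered,) + output[1:]
        return steered

    return hook

# Hypothetical usage, assuming a model and a trained concept direction exist:
# handle = model.transformer.h[6].register_forward_hook(
#     make_steering_hook(deception_vector, strength=-1.0)  # steer away from Deception
# )
# ... generate as usual, then handle.remove()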

CAT - Conjoined Adversarial Tomograph (src/cat)

  • divergence.py - Detects divergence between internal state and output
  • llm_divergence_scorer.py - LLM-based zero-shot concept scoring
  • training/ - Classifier training pipelines
  • inference/ - Optimized inference for concept detection

MAP - Mindmeld Architectural Protocol (src/map)

  • registry/ - Concept and lens pack registry
  • meld/ - Concept melding and merging
  • graft/ - Hierarchy grafting operations

ASK - Permissions & Audit (src/ask)

  • permissions/ - Cryptographic permission system
  • replication/ - State replication
  • secrets/ - Secret management

BE - Experience & Bootstrap (src/be)

  • xdb/ - Experience database
  • bootstrap/ - System bootstrapping

UI (src/ui)

  • openwebui/ - HatCat server for OpenWebUI integration
  • visualization/ - Concept visualization tools

Training Your Own Lenses

Train binary concept lenses for SUMO ontology layers:

.venv/bin/python scripts/training/train_sumo_classifiers.py \
  --layers 0 1 2 \
  --model google/gemma-3-4b-pt \
  --device cuda \
  --n-train-pos 10 --n-train-neg 10 \
  --n-test-pos 20 --n-test-neg 20
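Conceptually, each lens is trained by gathering activations from positive and negative examples of a concept and fitting a small binary classifier on them. The loop below is a simplified sketch using random stand-in data; the architecture and hyperparameters are assumptions, not those used by train_sumo_classifiers.py.

import torch
import torch.nn as nn

# Stand-in data: in the real pipeline these would be hidden states captured
# from the target model on positive/negative examples of one SUMO concept.
hidden_size = 2560
pos = torch.randn(10, hidden_size) + 0.5   # "concept present" activations
neg = torch.randn(10, hidden_size) - 0.5   # "concept absent" activations
x = torch.cat([pos, neg])
y = torch.cat([torch.ones(10), torch.zeros(10)])

lens = nn.Sequential(nn.Linear(hidden_size, 128), nn.ReLU(), nn.Linear(128, 1))
optimizer = torch.optim.Adam(lens.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for epoch in range(100):
    optimizer.zero_grad()
    loss = loss_fn(lens(x).squeeze(-1), y)
    loss.backward()
    optimizer.step()

# After training, sigmoid over the logits gives P(concept) per activation.
probs = torch.sigmoid(lens(x).squeeze(-1))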

Steering Experiments

Test concept steering with various modes:

.venv/bin/python scripts/experiments/steering_characterization_test.py \
  --model swiss-ai/Apertus-8B-2509 \
  --lens-pack apertus-8b_first-light \
  --n-samples 3 \
  --strengths="-1.0,-0.5,0.0,0.5,1.0" \
  --tests definitional \
  --gradient

Concept & Lens Packs

# Create a concept pack
.venv/bin/python scripts/packs/create_concept_pack.py \
  --name "ai-safety-concepts" \
  --source concept_packs/first-light

# Assemble a lens pack from trained classifiers
.venv/bin/python scripts/packs/assemble_lens_pack.py \
  --source results/sumo_classifiers/ \
  --pack-id sumo-wordnet-lenses-v2 \
  --model google/gemma-3-4b-pt

Data Layout

hatcat/
├── concept_packs/          # Model-agnostic ontology specifications
├── lens_packs/             # Model-specific trained classifiers
├── melds/                  # Concept modifications (applied, pending, rejected)
├── data/concept_graph/     # SUMO/WordNet concept hierarchy
├── results/                # Training outputs and logs
├── src/
│   ├── hat/                # Monitoring and steering
│   ├── cat/                # Divergence detection
│   ├── map/                # Protocol implementation
│   ├── ask/                # Permissions
│   ├── be/                 # Experience database
│   └── ui/                 # Web interface
└── scripts/
    ├── training/           # Lens training scripts
    ├── tools/              # Utility scripts
    ├── experiments/        # Research experiments
    └── packs/              # Pack management

Entry Points Summary

Capability               Command
Train SUMO classifiers   .venv/bin/python scripts/training/train_sumo_classifiers.py ...
Monitor any prompt       .venv/bin/python scripts/tools/sumo_temporal_detection.py ...
Steering experiments     .venv/bin/python scripts/experiments/steering_characterization_test.py ...
Create concept pack      .venv/bin/python scripts/packs/create_concept_pack.py ...
Assemble lens pack       .venv/bin/python scripts/packs/assemble_lens_pack.py ...

Documentation

  • docs/specification/ - Full system specifications (HAT, CAT, MAP, ASK, BE, HUSH)
  • docs/approach/ - Technical approaches and methodologies
  • docs/planning/ - Design documents and roadmaps
  • docs/results/ - Experiment results and analysis

Key documents:

Fractal Transparency Web (FTW)

HatCat's capabilities stack into an entire governance framework that supports AI legislation requirements such as the EU AI Act and Australian AI governance frameworks. The core interpretability primitives combine to build safety harnesses, self-steering systems, model interoception, and accretive continual learning.

Full specifications for recursive self-improving aligned agentic systems can be found in docs/specification/.

Notes & Limitations

  • Network access required for HuggingFace model downloads on first run
  • CUDA device recommended for steering/manifold operations
  • CPU training possible but ~21x slower
  • Lens accuracy depends on training data quality and concept specificity

Release risks

Our best collective defense against rogue actors is an interpretability ecosystem of diverse concept packs with diverse lens packs that can interoperate. You can learn to evade one set of lenses, but the more lenses you need to hide from, the harder it becomes to hide.

  • HatCat is a dual-use technology: anything you can steer away from, you can steer toward.
  • The Bounded Experience enables model interoception, continual learning, and swarm learning. These enable greater capability scaling and have model welfare implications.
  • Making it open does much more long-term good than harm, as outlined in the Release Statement.
  • Closed, centralised approaches will fail, due to the Singleton Delusion.
  • Most known classes of AI risk are mitigated by open interpretability.
  • This release includes and enables the FTW safety standard for public AI deployments, as outlined in the FTW Policy Brief.
  • The release includes the Agentic State Kernel to technically enable the Agentic State as presented at the Tallinn Digital Summit 2025.

License

Code and documentation are released under CC0 1.0 Universal (Public Domain).

The name, branding and logo for HatCat and Fractal Transparency Web are trademarks of Possum Hodgkin 2025.

You may:

  • Use the code for anything
  • Fork and modify freely
  • Say your project is "built with HatCat" or "HatCat-compatible"

You may not:

  • Call your fork "HatCat"
  • Use the logo in a way that suggests official endorsement
  • Imply your modified version is the official HatCat

You're not just allowed to make your own versions, but encouraged to. We're relying on your unique perspective to form lenses as part of the fractal transparency web.

About

Learned Semantic Decoder for Language Models. It's the little model that sits under a big model's hat to explain what it's thinking, just like the little cats from The Cat in the Hat! VOOM > FOOM
