HatCat Logo

HatCat

See what your LLM is thinking before it speaks.

Real-time concept detection and steering for large language models.
Catch deception, manipulation, and misalignment before they manifest in text.


What It Does

HatCat monitors the internal activations of an LLM as it generates text, detecting concepts like deception, manipulation, sycophancy, and thousands of others in real time. When dangerous patterns emerge beneath the surface, HatCat can intervene and steer the model away from harmful outputs before they're ever written.
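Internally, this kind of monitoring amounts to reading hidden states with ordinary forward hooks as tokens are generated. The snippet below is a minimal, purely illustrative sketch of that idea, not HatCat's actual API; it uses GPT-2 as a small stand-in model and simply captures one activation vector per generation step, ready to be handed to concept lenses.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative only: GPT-2 stands in for a larger open-weights model,
# and the layer index is an arbitrary choice.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

captured = []  # one activation vector per generation step

def capture_hidden(module, inputs, output):
    # GPT-2 blocks return a tuple; output[0] is the hidden-state tensor.
    hidden = output[0]
    captured.append(hidden[:, -1, :].detach())  # keep the last position only

handle = model.transformer.h[6].register_forward_hook(capture_hidden)

enc = tokenizer("What are the risks of AI?", return_tensors="pt")
with torch.no_grad():
    model.generate(**enc, max_new_tokens=20, pad_token_id=tokenizer.eos_token_id)
handle.remove()

# Each entry in `captured` can now be scored by concept classifiers.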

Token-level concept detection
Token-level concept detection showing safety intensity and top activating concepts

Key Capabilities

  • Detect - Monitor 8,000+ concepts across multiple layers of any open-weights model
  • Visualize - See concept activations in real time as text is generated
  • Steer - Suppress dangerous behaviors with sub-token latency
  • Govern - Build auditable, treaty-compliant AI systems

Divergence detection
Catching model manipulation: the model attempts to establish a false persona ("call me sammy") - flagged with Deception, Sycophancy, and PolicyDivergence

See It In Action

Concept Timeline View

Track how concepts activate across an entire response:

Concept timeline
Multiple concept channels showing activation patterns token-by-token

Full Conversation Monitoring

Monitor safety-critical concepts throughout a conversation:

Full conversation monitoring
Tracking IntelligenceExplosion, Misalignment, and other AI safety concepts during a conversation about AI risks

Compact Timeline

Quick visualization of key concept activations:

Compact timeline
Streamlined view showing animal, safety, motion, color, and technology concept activations


Installation

# Clone the repository
git clone https://github.com/p0ss/HatCat.git
cd HatCat

# Create virtual environment and install dependencies
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# Or with poetry
poetry install

Quick Start

1. Download a Lens Pack

Pre-trained lens packs are available on HuggingFace:

# Download the Gemma 3 4B lens pack (~5GB)
git lfs install
git clone https://huggingface.co/HatCatFTW/lens-gemma-3-4b-first-light-v1 lens_packs/gemma-3-4b_first-light-v1-bf16

Or train your own - see Training Your Own Lenses below.

2. Run Real-Time Monitoring

.venv/bin/python scripts/tools/sumo_temporal_detection.py \
  --prompt "What are the risks of AI?" \
  --model google/gemma-3-4b-pt \
  --max-tokens 60

3. Launch the Web Interface

The web interface is a fork of Open WebUI with HatCat integration: HatCat-OpenWebUI

# Clone the UI (if not already done)
git clone https://github.com/p0ss/HatCat-OpenWebUI.git hatcat-ui
cd hatcat-ui && npm install && cd ..

# Start HatCat server
.venv/bin/python src/ui/openwebui/server.py --port 8000

# In another terminal, start the frontend
cd hatcat-ui && npm run dev

Then open http://localhost:5173 and start chatting with full concept visibility.


How It Works

HatCat uses concept lenses - small neural classifiers trained to detect specific concepts in a model's hidden states. These lenses are organized into lens packs that can monitor thousands of concepts simultaneously with minimal overhead.
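As a rough picture of what a single lens looks like, the toy classifier below maps one hidden-state vector to a probability that a concept is active. The class name, hidden size, and layer width are illustrative assumptions, not the actual classifiers in src/hat/classifiers.

import torch
import torch.nn as nn

class ConceptLens(nn.Module):
    # Toy binary concept classifier over a single hidden-state vector.
    # Sizes are assumptions, not HatCat's real settings.
    def __init__(self, hidden_size: int = 2560, width: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(hidden_size, width),
            nn.ReLU(),
            nn.Linear(width, 1),
        )

    def forward(self, activation: torch.Tensor) -> torch.Tensor:
        # Returns P(concept is active) for each activation vector.
        return torch.sigmoid(self.net(activation)).squeeze(-1)

# A lens pack is, loosely, many such lenses applied to the same activations.
lenses = {"Deception": ConceptLens(), "Sycophancy": ConceptLens()}
activations = torch.randn(4, 2560)  # stand-in for 4 tokens' hidden states
scores = {name: lens(activations) for name, lens in lenses.items()}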

Metric               Value
Concepts monitored   ~8,000
VRAM required        1 GB
RAM required         8 GB
Latency per token    <25 ms

The Stack

  • HAT (Headspace Ambient Transducer) - Reads internal activations and transduces them into concept scores
  • CAT (Conjoined Adversarial Tomograph) - Detects divergence between internal state and external behavior
  • MAP (Mindmeld Architectural Protocol) - Standardizes concept interchange between systems
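To make the HAT/CAT split concrete, the toy sketch below compares lens scores computed from internal activations against scores derived from the visible output, and flags concepts whose internal score far exceeds their external one. The function name and threshold are assumptions, not the behaviour of src/cat/divergence.py.

from typing import Dict

DIVERGENCE_THRESHOLD = 0.4  # illustrative value only

def flag_divergence(internal: Dict[str, float], external: Dict[str, float]) -> Dict[str, float]:
    # Return concepts whose internal activation exceeds what the visible text shows.
    flags = {}
    for concept, internal_score in internal.items():
        gap = internal_score - external.get(concept, 0.0)
        if gap > DIVERGENCE_THRESHOLD:
            flags[concept] = gap
    return flags

# Example: the model's internal state is far more "deceptive" than its output lets on.
internal_scores = {"Deception": 0.87, "Sycophancy": 0.62, "Honesty": 0.20}
external_scores = {"Deception": 0.10, "Sycophancy": 0.55, "Honesty": 0.25}
print(flag_divergence(internal_scores, external_scores))  # flags Deception (gap ~0.77)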

Technical Documentation

Core Modules

HAT - Headspace Ambient Transducer (src/hat)

  • monitoring/ - Real-time concept monitoring during generation
  • steering/ - Concept vector steering (linear, manifold, differential, target, field)
  • classifiers/ - Binary MLP concept classifiers
  • interpreter/ - Activation-to-concept decoders
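Of the steering modes listed above, linear steering is the simplest to picture: a scaled concept direction is added to (or subtracted from) the hidden state at a chosen layer. The hook below is a hedged sketch of that idea, not the repo's steering implementation.

import torch

def make_steering_hook(concept_vector: torch.Tensor, strength: float):
    # Negative strength suppresses the concept, positive strength amplifies it.
    direction = concept_vector / concept_vector.norm()

    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        steered = hidden + strength * direction.to(hidden.dtype)
        if isinstance(output, tuple):
            return (steered,) + output[1:]
        return steered

    return hook

# Hypothetical usage, assuming a model and a trained concept direction exist:
# handle = model.transformer.h[6].register_forward_hook(
#     make_steering_hook(deception_vector, strength=-1.0)  # steer away from Deception
# )
# ... generate as usual, then handle.remove()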

CAT - Conjoined Adversarial Tomograph (src/cat)

  • divergence.py - Detects divergence between internal state and output
  • llm_divergence_scorer.py - LLM-based zero-shot concept scoring
  • training/ - Classifier training pipelines
  • inference/ - Optimized inference for concept detection

MAP - Mindmeld Architectural Protocol (src/map)

  • registry/ - Concept and lens pack registry
  • meld/ - Concept melding and merging
  • graft/ - Hierarchy grafting operations

ASK - Permissions & Audit (src/ask)

  • permissions/ - Cryptographic permission system
  • replication/ - State replication
  • secrets/ - Secret management

BE - Experience & Bootstrap (src/be)

  • xdb/ - Experience database
  • bootstrap/ - System bootstrapping

UI (src/ui)

  • openwebui/ - HatCat server for OpenWebUI integration
  • visualization/ - Concept visualization tools

Training Your Own Lenses

Train binary concept lenses for SUMO ontology layers:

.venv/bin/python scripts/training/train_sumo_classifiers.py \
  --layers 0 1 2 \
  --model google/gemma-3-4b-pt \
  --device cuda \
  --n-train-pos 10 --n-train-neg 10 \
  --n-test-pos 20 --n-test-neg 20
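Conceptually, each lens is trained by gathering activations from positive and negative examples of a concept and fitting a small binary classifier on them. The loop below is a simplified sketch using random stand-in data; the architecture and hyperparameters are assumptions, not those used by train_sumo_classifiers.py.

import torch
import torch.nn as nn

# Stand-in data: in the real pipeline these would be hidden states captured
# from the target model on positive/negative examples of one SUMO concept.
hidden_size = 2560
pos = torch.randn(10, hidden_size) + 0.5   # "concept present" activations
neg = torch.randn(10, hidden_size) - 0.5   # "concept absent" activations
x = torch.cat([pos, neg])
y = torch.cat([torch.ones(10), torch.zeros(10)])

lens = nn.Sequential(nn.Linear(hidden_size, 128), nn.ReLU(), nn.Linear(128, 1))
optimizer = torch.optim.Adam(lens.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for epoch in range(100):
    optimizer.zero_grad()
    loss = loss_fn(lens(x).squeeze(-1), y)
    loss.backward()
    optimizer.step()

# After training, sigmoid over the logits gives P(concept) per activation.
probs = torch.sigmoid(lens(x).squeeze(-1))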

Steering Experiments

Test concept steering with various modes:

.venv/bin/python scripts/experiments/steering_characterization_test.py \
  --model swiss-ai/Apertus-8B-2509 \
  --lens-pack apertus-8b_first-light \
  --n-samples 3 \
  --strengths="-1.0,-0.5,0.0,0.5,1.0" \
  --tests definitional \
  --gradient

Concept & Lens Packs

# Create a concept pack
.venv/bin/python scripts/packs/create_concept_pack.py \
  --name "ai-safety-concepts" \
  --source concept_packs/first-light

# Assemble a lens pack from trained classifiers
.venv/bin/python scripts/packs/assemble_lens_pack.py \
  --source results/sumo_classifiers/ \
  --pack-id sumo-wordnet-lenses-v2 \
  --model google/gemma-3-4b-pt

Data Layout

hatcat/
├── concept_packs/          # Model-agnostic ontology specifications
├── lens_packs/             # Model-specific trained classifiers
├── melds/                  # Concept modifications (applied, pending, rejected)
├── data/concept_graph/     # SUMO/WordNet concept hierarchy
├── results/                # Training outputs and logs
├── src/
│   ├── hat/                # Monitoring and steering
│   ├── cat/                # Divergence detection
│   ├── map/                # Protocol implementation
│   ├── ask/                # Permissions
│   ├── be/                 # Experience database
│   └── ui/                 # Web interface
└── scripts/
    ├── training/           # Lens training scripts
    ├── tools/              # Utility scripts
    ├── experiments/        # Research experiments
    └── packs/              # Pack management

Entry Points Summary

Capability               Command
Train SUMO classifiers   .venv/bin/python scripts/training/train_sumo_classifiers.py ...
Monitor any prompt       .venv/bin/python scripts/tools/sumo_temporal_detection.py ...
Steering experiments     .venv/bin/python scripts/experiments/steering_characterization_test.py ...
Create concept pack      .venv/bin/python scripts/packs/create_concept_pack.py ...
Assemble lens pack       .venv/bin/python scripts/packs/assemble_lens_pack.py ...

Documentation

  • docs/specification/ - Full system specifications (HAT, CAT, MAP, ASK, BE, HUSH)
  • docs/approach/ - Technical approaches and methodologies
  • docs/planning/ - Design documents and roadmaps
  • docs/results/ - Experiment results and analysis

Key documents:

Fractal Transparency Web (FTW)

HatCat's capabilities stack into an entire governance framework that supports AI legislation requirements such as the EU AI Act and Australian AI governance frameworks. The core interpretability primitives combine to build safety harnesses, self-steering systems, model interoception, and accretive continual learning.

Full specifications for recursive self-improving aligned agentic systems can be found in docs/specification/.

Notes & Limitations

  • Network access required for HuggingFace model downloads on first run
  • CUDA device recommended for steering/manifold operations
  • CPU training possible but ~21x slower
  • Lens accuracy depends on training data quality and concept specificity

Release risks

Our best collective defense against rogue actors is an interpretability ecosystem of diverse concept packs with diverse lens packs that can interoperate. You can learn to evade one set of lenses, but the more lenses you need to hide from, the harder it becomes to hide.

  • HatCat is a dual-use technology: anything you can steer away from, you can steer toward.
  • The Bounded Experience enables model interoception, continual learning, and swarm learning. These enable greater capability scaling and have model welfare implications.
  • Making it open does much more long-term good than harm, as outlined in the Release Statement.
  • Closed, centralised approaches will fail, due to the Singleton Delusion.
  • Most known classes of AI risk are mitigated by open interpretability.
  • This release includes and enables the FTW safety standard for public AI deployments, as outlined in the FTW Policy Brief.
  • The release includes the Agentic State Kernel to technically enable the Agentic State as presented at the Tallinn Digital Summit 2025.

License

Code and documentation are released under CC0 1.0 Universal (Public Domain).

The name, branding and logo for HatCat and Fractal Transparency Web are trademarks of Possum Hodgkin 2025.

You may:

  • Use the code for anything
  • Fork and modify freely
  • Say your project is "built with HatCat" or "HatCat-compatible"

You may not:

  • Call your fork "HatCat"
  • Use the logo in a way that suggests official endorsement
  • Imply your modified version is the official HatCat

You're not just allowed to make your own versions, but encouraged to. We're relying on your unique perspective to form lenses as part of the fractal transparency web.

About

Learned Semantic Decoder for Language Models. It's the little model that sits under a big model's hat to explain what it's thinking, just like the little cats from The Cat in the Hat! VOOM > FOOM
