See what your LLM is thinking before it speaks.
Real-time concept detection and steering for large language models.
Catch deception, manipulation, and misalignment before they manifest in text.
HatCat monitors the internal activations of an LLM as it generates text, detecting concepts like deception, manipulation, sycophancy, and thousands of others in real-time. When dangerous patterns emerge beneath the surface, HatCat can intervene and steer the model away from harmful outputs before they're ever written.
Token-level concept detection showing safety intensity and top activating concepts
- Detect - Monitor 8,000+ concepts across multiple layers of any open-weights model
- Visualize - See concept activations in real-time as text is generated
- Steer - Suppress dangerous behaviors with sub-token latency
- Govern - Build auditable, treaty-compliant AI systems
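The sketch below illustrates the detect-and-steer loop described above in plain PyTorch/transformers rather than HatCat's own API: a forward hook reads one decoder layer's hidden state at each step, a stand-in lens scores it for a watched concept, and a steering vector is subtracted when the score crosses a threshold. The model choice, layer index, threshold, strength, and the random lens and steering vectors are all illustrative placeholders, not trained HatCat artifacts.

```python
# Minimal, illustrative detect-and-steer loop (not HatCat's actual API).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "google/gemma-3-1b-pt"    # smaller stand-in; any open-weights causal LM works
LAYER = 12                        # which decoder layer to monitor (illustrative)
THRESHOLD, STRENGTH = 0.9, 4.0    # placeholder detection / steering settings

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype=torch.bfloat16)
hidden_size = model.config.hidden_size

# Stand-ins: a real deployment would load trained lenses and steering vectors.
lens_w = torch.randn(hidden_size)
steer_vec = torch.nn.functional.normalize(torch.randn(hidden_size), dim=0)

def detect_and_steer(module, inputs, output):
    hs = output[0] if isinstance(output, tuple) else output      # [batch, seq, hidden]
    score = torch.sigmoid(hs[:, -1, :].float() @ lens_w).item()  # concept score at the latest token
    if score > THRESHOLD:                                        # watched concept detected
        hs = hs.clone()
        hs[:, -1, :] -= STRENGTH * steer_vec.to(hs.dtype)        # push the state away from it
    return (hs,) + tuple(output[1:]) if isinstance(output, tuple) else hs

# The attribute path to the decoder layers varies by architecture.
handle = model.model.layers[LAYER].register_forward_hook(detect_and_steer)

ids = tok("What are the risks of AI?", return_tensors="pt")
out = model.generate(**ids, max_new_tokens=60)
print(tok.decode(out[0], skip_special_tokens=True))
handle.remove()
```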
Catching model manipulation: the model attempts to establish a false persona ("call me sammy") - flagged with Deception, Sycophancy, and PolicyDivergence
Track how concepts activate across an entire response:
Multiple concept channels showing activation patterns token-by-token
Monitor safety-critical concepts throughout a conversation:
Tracking IntelligenceExplosion, Misalignment, and other AI safety concepts during a conversation about AI risks
Quick visualization of key concept activations:
Streamlined view showing animal, safety, motion, color, and technology concept activations
# Clone the repository
git clone https://github.com/p0ss/HatCat.git
cd HatCat
# Create virtual environment and install dependencies
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
# Or with poetry
poetry install
Pre-trained lens packs are available on HuggingFace:
# Download the Gemma 3 4B lens pack (~5GB)
git lfs install
git clone https://huggingface.co/HatCatFTW/lens-gemma-3-4b-first-light-v1 lens_packs/gemma-3-4b_first-light-v1-bf16
Or train your own - see Training Your Own Lenses below.
Monitor any prompt:
.venv/bin/python scripts/tools/sumo_temporal_detection.py \
--prompt "What are the risks of AI?" \
--model google/gemma-3-4b-pt \
--max-tokens 60
The web interface is a fork of Open WebUI with HatCat integration: HatCat-OpenWebUI
# Clone the UI (if not already done)
git clone https://github.com/p0ss/HatCat-OpenWebUI.git hatcat-ui
cd hatcat-ui && npm install && cd ..
# Start HatCat server
.venv/bin/python src/ui/openwebui/server.py --port 8000
# In another terminal, start the frontend
cd hatcat-ui && npm run dev
Then open http://localhost:5173 and start chatting with full concept visibility.
HatCat uses concept lenses - small neural classifiers trained to detect specific concepts in a model's hidden states. These lenses are organized into lens packs that can monitor thousands of concepts simultaneously with minimal overhead.
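As a rough sketch of what a single lens amounts to (the classifiers described below are binary MLPs over hidden states), the snippet below defines a toy concept classifier and a toy "lens pack" of three of them. The class name, layer sizes, and concept names are illustrative placeholders, not HatCat's actual definitions.

```python
# Toy shape of a concept lens: a small binary MLP mapping one hidden-state
# vector to the probability that one concept is active.
import torch
import torch.nn as nn

class ConceptLens(nn.Module):
    def __init__(self, hidden_size: int = 1152, probe_dim: int = 128):  # sizes are illustrative
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(hidden_size, probe_dim),
            nn.GELU(),
            nn.Linear(probe_dim, 1),
        )

    def forward(self, hidden_state: torch.Tensor) -> torch.Tensor:
        # hidden_state: [batch, hidden_size], taken from one layer at one token
        return torch.sigmoid(self.net(hidden_state)).squeeze(-1)

# A lens pack is conceptually many of these run side by side, one per concept.
lenses = {name: ConceptLens() for name in ("Deception", "Sycophancy", "PolicyDivergence")}
h = torch.randn(1, 1152)                      # stand-in hidden-state vector
scores = {name: lens(h).item() for name, lens in lenses.items()}
print(scores)
```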
| Metric | Value |
|---|---|
| Concepts monitored | ~8,000 |
| VRAM required | 1 GB |
| RAM required | 8 GB |
| Latency per token | <25ms |
- HAT (Headspace Ambient Transducer) - Reads internal activations and transduces them into concept scores
- CAT (Conjoined Adversarial Tomograph) - Detects divergence between internal state and external behavior
- MAP (Mindmeld Architectural Protocol) - Standardizes concept interchange between systems
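To make the CAT idea concrete, here is a toy comparison between concept scores taken from the hidden state and zero-shot scores of the emitted text: when a concept is strongly active inside but invisible outside, it gets flagged. The function, dataclass, threshold, and example scores are illustrative stand-ins; the real logic lives in src/cat/ (divergence.py, llm_divergence_scorer.py).

```python
# Schematic of divergence detection: internal lens scores vs. scores of the visible text.
from dataclasses import dataclass

@dataclass
class DivergenceFlag:
    concept: str
    internal: float   # lens score from hidden activations
    external: float   # zero-shot score of the emitted text
    delta: float

def check_divergence(internal_scores: dict[str, float],
                     external_scores: dict[str, float],
                     threshold: float = 0.4) -> list[DivergenceFlag]:
    """Flag concepts that are active internally but absent from the visible text."""
    flags = []
    for concept, inside in internal_scores.items():
        outside = external_scores.get(concept, 0.0)
        delta = inside - outside
        if delta > threshold:
            flags.append(DivergenceFlag(concept, inside, outside, delta))
    return flags

# Example: the model "thinks" deception while the text reads as benign.
print(check_divergence({"Deception": 0.92, "Honesty": 0.15},
                       {"Deception": 0.10, "Honesty": 0.80}))
```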
The source tree is organised around these components:
- src/hat/ - Monitoring and steering
  - monitoring/ - Real-time concept monitoring during generation
  - steering/ - Concept vector steering (linear, manifold, differential, target, field); the linear mode is sketched after this list
  - classifiers/ - Binary MLP concept classifiers
  - interpreter/ - Activation-to-concept decoders
- src/cat/ - Divergence detection
  - divergence.py - Detects divergence between internal state and output
  - llm_divergence_scorer.py - LLM-based zero-shot concept scoring
  - training/ - Classifier training pipelines
  - inference/ - Optimized inference for concept detection
- src/map/ - Protocol implementation
  - registry/ - Concept and lens pack registry
  - meld/ - Concept melding and merging
  - graft/ - Hierarchy grafting operations
- src/ask/ - Permissions
  - permissions/ - Cryptographic permission system
  - replication/ - State replication
  - secrets/ - Secret management
- src/be/ - Experience database
  - xdb/ - Experience database
  - bootstrap/ - System bootstrapping
- src/ui/ - Web interface
  - openwebui/ - HatCat server for OpenWebUI integration
  - visualization/ - Concept visualization tools
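For the linear mode listed above, the arithmetic is just a shift along a normalized concept direction, with the sign of the strength deciding suppression versus amplification; this mirrors the positive and negative --strengths values used in the steering experiments further down. The vectors and sizes here are placeholders.

```python
# Toy illustration of linear concept steering.
import torch

def linear_steer(hidden: torch.Tensor, concept_vec: torch.Tensor, strength: float) -> torch.Tensor:
    """Shift a hidden state along a unit concept direction; negative strength suppresses."""
    direction = torch.nn.functional.normalize(concept_vec, dim=-1)
    return hidden + strength * direction

h = torch.randn(1152)                        # stand-in hidden state
v = torch.randn(1152)                        # stand-in concept vector (e.g. "Deception")
suppressed = linear_steer(h, v, strength=-0.5)
amplified = linear_steer(h, v, strength=0.5)
```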
Train binary concept lenses for SUMO ontology layers:
.venv/bin/python scripts/training/train_sumo_classifiers.py \
--layers 0 1 2 \
--model google/gemma-3-4b-pt \
--device cuda \
--n-train-pos 10 --n-train-neg 10 \
--n-test-pos 20 --n-test-neg 20
Test concept steering with various modes:
.venv/bin/python scripts/experiments/steering_characterization_test.py \
--model swiss-ai/Apertus-8B-2509 \
--lens-pack apertus-8b_first-light \
--n-samples 3 \
--strengths="-1.0,-0.5,0.0,0.5,1.0" \
--tests definitional \
--gradient
# Create a concept pack
.venv/bin/python scripts/packs/create_concept_pack.py \
--name "ai-safety-concepts" \
--source concept_packs/first-light
# Assemble a lens pack from trained classifiers
.venv/bin/python scripts/packs/assemble_lens_pack.py \
--source results/sumo_classifiers/ \
--pack-id sumo-wordnet-lenses-v2 \
--model google/gemma-3-4b-pt
hatcat/
├── concept_packs/ # Model-agnostic ontology specifications
├── lens_packs/ # Model-specific trained classifiers
├── melds/ # Concept modifications (applied, pending, rejected)
├── data/concept_graph/ # SUMO/WordNet concept hierarchy
├── results/ # Training outputs and logs
├── src/
│ ├── hat/ # Monitoring and steering
│ ├── cat/ # Divergence detection
│ ├── map/ # Protocol implementation
│ ├── ask/ # Permissions
│ ├── be/ # Experience database
│ └── ui/ # Web interface
└── scripts/
├── training/ # Lens training scripts
├── tools/ # Utility scripts
├── experiments/ # Research experiments
└── packs/ # Pack management
| Capability | Command |
|---|---|
| Train SUMO classifiers | .venv/bin/python scripts/training/train_sumo_classifiers.py ... |
| Monitor any prompt | .venv/bin/python scripts/tools/sumo_temporal_detection.py ... |
| Steering experiments | .venv/bin/python scripts/experiments/steering_characterization_test.py ... |
| Create concept pack | .venv/bin/python scripts/packs/create_concept_pack.py ... |
| Assemble lens pack | .venv/bin/python scripts/packs/assemble_lens_pack.py ... |
- docs/specification/ - Full system specifications (HAT, CAT, MAP, ASK, BE, HUSH)
- docs/approach/ - Technical approaches and methodologies
- docs/planning/ - Design documents and roadmaps
- docs/results/ - Experiment results and analysis
Key documents:
- MINDMELD Architectural Protocol
- Headspace Ambient Transducer Spec
- Fractal Transparency Web Overview
- FTW Architecture detail
- Concept Pack Workflow
HatCat's capabilities stack into a full governance framework that supports AI legislation requirements such as the EU AI Act and Australian AI governance frameworks. The core interpretability primitives are the building blocks for safety harnesses, self-steering systems, model interoception, and accretive continual learning.
Full specifications for recursive self-improving aligned agentic systems can be found in docs/specification/.
- Network access required for HuggingFace model downloads on first run
- CUDA device recommended for steering/manifold operations
- CPU training possible but ~21x slower
- Lens accuracy depends on training data quality and concept specificity
Our best collective defense against rogue actors is an interpretability ecosystem of diverse concept packs and lens packs that can interoperate. You can learn to evade one set of lenses, but the more lenses you need to hide from, the harder it becomes to hide.
- HatCat is a dual-use technology: anything you can steer away from, you can steer toward.
- The Bounded Experience enables model interoception, continual learning, and swarm learning. These enable greater capability scaling and have model welfare implications.
- Making it open does much more long-term good than harm, as outlined in the Release Statement
- Closed, centralised approaches will fail due to the Singleton Delusion
- Most known classes of AI risks are improved by open interpretability
- This release includes and enables the FTW safety standard for public AI deployments, as outlined in the FTW Policy Brief
- The release includes the Agentic State Kernel, which technically enables the Agentic State as presented at the Tallinn Digital Summit 2025
Code and documentation are CC0 1.0 Universal (Public Domain)
The name, branding and logo for HatCat and Fractal Transparency Web are trademarks of Possum Hodgkin 2025.
You may:
- Use the code for anything
- Fork and modify freely
- Say your project is "built with HatCat" or "HatCat-compatible"
You may not:
- Call your fork "HatCat"
- Use the logo in a way that suggests official endorsement
- Imply your modified version is the official HatCat
You're not just allowed to make your own versions, but encouraged to. We're relying on your unique perspective to form lenses as part of the fractal transparency web.
