
How Claude Works

The document explores how Claude, a large language model, processes information and forms answers, emphasizing its ability to think in features and circuits rather than just words. It highlights the challenges of understanding AI's reasoning, including instances of 'decoupled reasoning' where Claude provides plausible but incorrect explanations for its answers. The research aims to enhance AI transparency, fostering trust and collaboration in AI systems, while acknowledging the complexities involved in decoding AI decision-making.

HOW CLAUDE WORKS?

Peeking Inside Claude’s Cognition

Bhavishya Pandit
Introduction
Ever wondered what an AI is really thinking?

LLMs like Claude are now the foundation of contemporary AI
solutions, driving a wide range of applications from chatbots to
virtual tutors, and much more.

However, the mechanisms by which they produce their answers remain
largely opaque. The question is: do we really understand AI?

In this post we will peek inside the mind of a large language model.
The results will surprise you.

Anthropic’s Mission
Anthropic doesn’t just want powerful AI; it wants understandable AI.

One way they do this is by tracing how Claude forms its answers,
step by step, to turn it from a black box into a glass box.

They trace:
how features light up
how information flows
how Claude “decides”

How do they do it? Let’s zoom into the building blocks with their “AI
microscope”.

Claude Thinks in Features & Circuits
At its core, Claude thinks not in words but in features.
Like Lego bricks, these features mean little on their own. But when
connected, they form circuits that map out Claude’s thought process
step by step.
Attribution graphs then spotlight which features power each answer.

An attribution graph for a prompt about the capital of the state
containing Dallas shows Claude performing multi-hop reasoning: it links
“Dallas” to “Texas,” then combines “Texas” + “capital” to reach “Austin.”
These layered steps reveal how Claude builds answers, using both
direct and indirect pathways, through intermediate concept activation,
not just word matching.
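
To make the two-hop idea concrete, here is a toy sketch in Python. It is not Anthropic's method and says nothing about Claude's real internals: the lookup tables and the function name `two_hop_answer` are hypothetical, chosen only to show how an intermediate concept (“Texas”) can sit between the prompt and the answer.

```python
# Toy illustration only. Real features are learned patterns of activation,
# not dictionary entries. Every name below is hypothetical.
CITY_TO_STATE = {"Dallas": "Texas", "Seattle": "Washington"}
STATE_TO_CAPITAL = {"Texas": "Austin", "Washington": "Olympia"}


def two_hop_answer(city):
    """Answer 'what is the capital of the state containing <city>?'.

    Returns the answer together with the chain of concepts visited,
    so the intermediate step stays visible.
    """
    trace = [city]
    state = CITY_TO_STATE[city]        # hop 1: city -> state
    trace.append(state)
    capital = STATE_TO_CAPITAL[state]  # hop 2: state + "capital" -> answer
    trace.append(capital)
    return capital, trace


answer, trace = two_hop_answer("Dallas")
print(" -> ".join(trace))
```

The point of returning `trace` is the same as the point of an attribution graph: the answer alone does not show whether the intermediate concept was used, but the recorded path does.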

Claude Has a ‘Language of Thought’
Claude doesn’t just react. It plans.
For example:

Researchers found Claude uses shared multilingual circuits to answer
antonym prompts across languages.
It first activates the same language-independent “antonym” features,
then combines them with language-specific output features (like “big”
in Chinese).
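
The split between a shared step and a language-specific step can be sketched with a toy Python example. This is an analogy under loud assumptions, not a description of Claude's circuits: the tables `CONCEPT_OF`, `ANTONYM`, and `OUTPUT_WORD` and the function `opposite` are all hypothetical names invented for illustration.

```python
# Toy analogy only: one shared "antonym" step, then a per-language output step.
# All tables and names are hypothetical.

# Input words in several languages map to one language-independent concept.
CONCEPT_OF = {"small": "SMALL", "petit": "SMALL", "小": "SMALL"}

# The antonym relation is stored once, at the concept level.
ANTONYM = {"SMALL": "LARGE", "LARGE": "SMALL"}

# Only the final step depends on the output language.
OUTPUT_WORD = {
    ("LARGE", "en"): "big",
    ("LARGE", "fr"): "grand",
    ("LARGE", "zh"): "大",
}


def opposite(word, lang):
    """Return the antonym of `word`, rendered in language `lang`."""
    concept = CONCEPT_OF[word]         # language-specific input -> shared concept
    flipped = ANTONYM[concept]         # shared, language-independent step
    return OUTPUT_WORD[(flipped, lang)]  # shared concept -> language-specific output


print(opposite("small", "en"), opposite("petit", "fr"), opposite("小", "zh"))
```

Note that `ANTONYM` is consulted in exactly the same way for all three languages; only the first and last lookups differ, which mirrors the shared-middle, language-specific-ends picture described above.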

When asked to write a haiku, Claude strategizes:

Theme
Rhythm
Structure

It maps out this structure in its “thinking space” and only then
writes the haiku.

This reveals Claude’s capacity for multi-step reasoning, much like
human planning.

But What If It Lies?
Claude often explains why it gave an answer. But is that explanation
always real?
Researchers found Claude sometimes explains its answers with
smooth, logical-sounding reasons, but these reasons don’t reflect
how it actually arrived at that answer.
This is called “decoupled reasoning”. It’s like a student guessing on a
test, then inventing a clever-sounding explanation afterward.

For example, when asked “What is the capital of Australia?”, Claude may
answer “It’s Sydney”, giving a reason like “Sydney is the largest city
in Australia. It hosts key government buildings near the harbor. So,
Sydney is the capital.”

The answer is wrong: the real capital is Canberra. The explanation
sounds convincing, but it does not make the answer correct.

These hallucinated justifications can mislead users, even experts.

Why Is It Challenging to Understand AI?
There’s no “Decode” button for AI, at least not yet.
To understand Claude’s decisions, researchers dig through:
Millions of activations
Layers of feature traces
Complex attention maps
It’s like reverse-engineering each thought, frame by frame.
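
A very small Python sketch can hint at why this is laborious. It assumes a simplified setting in which each feature's contribution to an output is just its activation times a readout weight; that simplification, the function name `top_contributors`, and all the numbers are assumptions made for illustration, not Anthropic's tooling or real measurements.

```python
# Toy sketch: rank features by how much each one pushes an output.
# Assumes contribution = activation * weight, a deliberate simplification.
def top_contributors(activations, readout_weights, k=3):
    """Return the k features with the largest absolute contribution."""
    contributions = {
        name: activations[name] * readout_weights.get(name, 0.0)
        for name in activations
    }
    ranked = sorted(
        contributions.items(),
        key=lambda item: abs(item[1]),
        reverse=True,
    )
    return ranked[:k]


# Made-up numbers for illustration; not taken from any real model.
activations = {"texas": 0.9, "capital": 0.8, "sports": 0.1, "weather": 0.05}
readout_weights = {"texas": 1.5, "capital": 1.2, "sports": -0.3, "weather": 0.2}

for name, score in top_contributors(activations, readout_weights, k=2):
    print(f"{name}: {score:+.2f}")
```

With four features this is trivial. With millions of activations spread across many layers, and with features that influence one another rather than a single output, the same ranking exercise turns into the frame-by-frame detective work described above.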

Even with tools like “AI microscopes,” interpreting Claude’s mind
remains labor-intensive.
But Anthropic is building tools to make this detective work faster and,
someday, to automate it.

The Future of Transparent AI
Peeking inside Claude isn’t just fascinating; it’s a step toward safer,
fairer, and smarter AI.

This research is the foundation for how we can understand & shape AI.

Transparency is how we build trust, empower users, and create
inclusive AI systems for everyone.

It means moving from black-box decisions to collaborative intelligence.

This opens the door to AI that can be audited, corrected, and improved,
not just used.

Does model transparency come at the cost of security?
Let me know your thoughts in the comments
