HOW CLAUDE WORKS?

Peeking Inside Claude’s Cognition

Bhavishya Pandit
Introduction
Ever wondered what an AI is really thinking?

Large language models (LLMs) like Claude are now the foundation of contemporary AI solutions, driving a wide range of applications from chatbots to virtual tutors, and much more.

However, the mechanisms by which they produce their answers remain largely unknown. The question is: do we really understand AI?

In this post, we will peek inside the mind of a large language model.
The results will surprise you.

Anthropic’s Mission
Anthropic doesn't just want powerful AI; it wants understandable AI.


One way they do this is by tracing how Claude forms its answers, step by step, turning it from a black box into a glass box.

They trace:
how features light up
how information flows
how Claude “decides”

How do they do it? Let’s zoom into the building blocks with their “AI
microscope”.

Claude Thinks in Features & Circuits
At its core, Claude thinks not in words but in features.
Like Lego bricks, these features mean little on their own. But when connected, they form circuits that map out Claude's thought process step by step.
Attribution graphs then spotlight which features power each answer.


An attribution graph for a question about the capital of the state containing Dallas shows Claude performing multi-hop reasoning: it links "Dallas" to "Texas," then combines "Texas" + "capital" to reach "Austin."
These layered steps reveal that Claude builds answers through intermediate concept activations, using both direct and indirect pathways, rather than simple word matching.
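To make the idea of an attribution graph more concrete, here is a toy sketch in Python. It assumes a hand-written graph whose node names (such as "say a capital") and edge weights are invented for illustration, and it scores each path by multiplying its edge weights. That scoring rule is a simplification chosen for this sketch, not Anthropic's actual attribution method. It shows how a two-hop route through an intermediate "Texas" feature can outweigh a weak direct shortcut.

```python
# Toy attribution graph, for illustration only.
# HYPOTHETICAL: node names and weights are invented; real attribution graphs
# are computed from a model's internals, not written by hand.
EDGES = {
    ("Dallas", "Texas"): 0.9,               # input token activates a "Texas" feature
    ("Dallas", "say Austin"): 0.2,          # weak direct shortcut
    ("capital", "say a capital"): 0.8,      # input token activates a "say a capital" feature
    ("Texas", "say Austin"): 0.7,           # intermediate concept feeds the answer
    ("say a capital", "say Austin"): 0.6,
}


def paths(src, dst, edges):
    """Enumerate every directed path from src to dst (graph assumed acyclic)."""
    if src == dst:
        return [[dst]]
    found = []
    for a, b in edges:
        if a == src:
            for rest in paths(b, dst, edges):
                found.append([src] + rest)
    return found


def path_score(path, edges):
    """Score a path as the product of its edge weights (a simplifying assumption)."""
    score = 1.0
    for a, b in zip(path, path[1:]):
        score *= edges[(a, b)]
    return score


def ranked_paths(sources, dst, edges):
    """Return (path, score) pairs from all sources to dst, strongest first."""
    scored = [(p, path_score(p, edges)) for s in sources for p in paths(s, dst, edges)]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for path, score in ranked_paths(["Dallas", "capital"], "say Austin", EDGES):
        print(f"{score:.2f}  " + " -> ".join(path))
```

Running it prints the three paths strongest first, with the two-hop Dallas -> Texas -> say Austin route (0.63) on top and the direct Dallas -> say Austin shortcut (0.20) last.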

Claude Has a ‘Language of Thought’
Claude doesn’t just react. It plans.
For example:


Researchers found that Claude uses shared multilingual circuits to answer antonym prompts across languages.
It first activates the same language-independent "antonym" features, then combines them with language-specific output features (like "big" in Chinese).
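The idea of a shared, language-independent step can be sketched with lookup tables. This is a toy under stated assumptions: the concept labels (SMALL, LARGE, HOT, COLD) and all three tables are hypothetical stand-ins for learned features, which in a real model are not dictionaries. Only the input and output tables differ per language; the antonym step is a single table shared by all of them.

```python
# Toy model of a shared multilingual circuit, for illustration only.
# HYPOTHETICAL: concept labels and tables stand in for learned features.

# Language-specific input features: surface word -> shared concept.
INPUT = {
    "en": {"small": "SMALL", "hot": "HOT"},
    "fr": {"petit": "SMALL", "chaud": "HOT"},
    "zh": {"小": "SMALL", "热": "HOT"},
}

# One language-independent "antonym" feature, shared by every language.
ANTONYM = {"SMALL": "LARGE", "LARGE": "SMALL", "HOT": "COLD", "COLD": "HOT"}

# Language-specific output features: shared concept -> surface word.
OUTPUT = {
    "en": {"SMALL": "small", "LARGE": "big", "HOT": "hot", "COLD": "cold"},
    "fr": {"SMALL": "petit", "LARGE": "grand", "HOT": "chaud", "COLD": "froid"},
    "zh": {"SMALL": "小", "LARGE": "大", "HOT": "热", "COLD": "冷"},
}


def opposite(word: str, lang: str) -> str:
    """Map a word to a concept, apply the shared antonym step, then map back."""
    concept = INPUT[lang][word]      # language-specific reading
    flipped = ANTONYM[concept]       # shared, language-independent step
    return OUTPUT[lang][flipped]     # language-specific output


if __name__ == "__main__":
    for lang, word in [("en", "small"), ("fr", "petit"), ("zh", "小")]:
        print(lang, word, "->", opposite(word, lang))
```

Adding a language in this toy only requires new INPUT and OUTPUT entries; the ANTONYM table is reused unchanged, which mirrors the intuition behind shared multilingual circuits.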

When asked to write a haiku, Claude first works out:

Theme
Rhythm
Structure

It maps out this structure in its "thinking space," and only then writes the haiku.

This reveals Claude's capacity for multi-step reasoning, much like human planning.

But What If It Lies?
Claude often explains why it gave an answer. But is that explanation
always real?
Researchers found Claude sometimes explains its answers with
smooth, logical-sounding reasons, but these reasons don’t reflect
how it actually arrived at that answer.
This is called “decoupled reasoning”. It’s like a student guessing on a
test, then inventing a clever-sounding explanation afterward.


For example, when asked "What is the capital of Australia?", Claude might answer "Sydney," giving a reason like: "Sydney is the largest city in Australia. It hosts key government buildings near the harbor. So, Sydney is the capital."

The answer is wrong: the real capital is Canberra.

These hallucinated justifications can mislead users, even experts.

Why Is It Challenging to Understand AI?
There’s no “Decode” button for AI, not yet.
To understand Claude’s decisions, researchers dig through:
Millions of activations
Layers of feature traces
Complex attention maps
It’s like reverse-engineering each thought, frame by frame.

Even with tools like "AI microscopes," interpreting Claude's mind remains labor-intensive.
But Anthropic is building tools to make this detective work faster and, someday, to automate it.

The Future of Transparent AI
Peeking inside Claude isn't just fascinating; it's a step toward safer, fairer, and smarter AI.

This research lays the foundation for understanding and shaping AI.

Transparency is how we build trust, empower users, and create inclusive AI systems for everyone.

It means moving from black-box decisions to collaborative intelligence.

This opens the door to AI that can be audited, corrected, and improved,
not just used.

Does model transparency come at the cost of security?
Let me know your thoughts in the comments.

