Exploring How LLMs Think: A Tiny Model vs. a Giant Model

I recently experimented with two language models, a small 0.6B LLM (Qwen3) and a 20B LLM (GPT-OSS), using LangChain, and discovered something fascinating about how AI "thinks."

Even a tiny 0.6B model can summarize short texts almost as accurately as a 20B model. But when I:
- enabled reasoning,
- increased the context length,
- changed the task, or
- used tools,

…the outputs started to diverge. This showed me something important: each change reveals which part of a model's intelligence is activated.

When I asked ChatGPT to rate the two models blind, without telling it which one was big or small, it scored them like this:
- Model 2 → 9.5/10 (best overall: clear, precise, natural, and detailed)
- Model 1 → 9/10 (very strong, just slightly less natural and detailed)

Even small experiments like this, when structured properly with LangChain, yield deep insights into LLM behavior. It's amazing how much you can learn by observing output differences and structuring prompts systematically. I'm excited to continue exploring AI reasoning, context handling, and task-specific behavior with structured pipelines.

#MachineLearning #AI #LLM #NLP #DeepLearning #LangChain #LearningByDoing #Python
Vector search walked into a bar. It couldn't find the exit. 🚪

Traditional RAG: "These two sentences FEEL similar, so they must be relevant!"
Narrator: They were not.

PageIndex: "What if we stopped guessing and started reasoning?"

The approach? Ditch vectors. Build a document tree. Search like a human would, with actual logic.

98.7% accuracy on financial docs. No vector DB. No random chunking. No vibes. Just pure, transparent reasoning.

The future of RAG doesn't need more embeddings. It needs more common sense. 🧠

🔗 github(.)com/VectifyAI/PageIndex

#AI #MachineLearning #NLP #LLM #ReinforcementLearning #OpenSource #DeepLearning #GenerativeAI #DeepSeek #Innovation #Tech #Coding #DataScience #SoftwareEngineering #BigData #RLHF #MITLicense
💡 Mastering LLM Sampling: Controlling Randomness for Better AI Outputs

Every response a Large Language Model (LLM) generates is the result of a weighted random choice. Behind every token there is a probability distribution: sometimes confident (a sharp spike for one token) and sometimes uncertain (a flat curve where many tokens compete). Tuning this randomness is the key to controlling your LLM's behavior. Here's how:

🔹 Greedy Decoding: always pick the highest-probability token.
• Deterministic and predictable (great for coding or debugging)
• Can lead to repetitive or "stilted" text

🔹 Temperature: your creativity dial.
• 0 → greedy and precise
• ~1 → balanced randomness
• >1 → more exploratory and creative

🔹 Top-k and Top-p Sampling: limit token choices smartly.
• Top-k: pick from the k most probable tokens at every step
• Top-p (nucleus sampling): dynamically keep the most probable tokens until their cumulative probability reaches p
• Keeps responses varied but sensible

🔹 Repetition Penalty and Logit Biasing
• Reduce repeated words for a natural flow
• Boost or suppress specific tokens to guide outputs

💡 Pro tips:
• Use low temperature + low top-p for factual or code tasks
• Use higher temperature + top-p for creative writing or brainstorming
• Layer in a repetition penalty if outputs loop or sound robotic

#AI #MachineLearning #LLM #RAG #PromptEngineering #contextengineering #GenerativeAI #ArtificialIntelligence #SamplingStrategies #OpenAI #NLP #DataScience
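The knobs above are simple enough to sketch in plain Python. This is a hypothetical minimal implementation of temperature scaling plus nucleus (top-p) filtering over a toy distribution, written for clarity; it is not any particular library's API.

```python
import math
import random

def apply_temperature(logits, temperature):
    """Scale logits by 1/temperature, then softmax. Lower temperature
    sharpens the distribution toward greedy decoding."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def nucleus_filter(tokens, probs, top_p):
    """Keep the smallest set of most-probable tokens whose cumulative
    probability reaches top_p, then renormalize."""
    ranked = sorted(zip(tokens, probs), key=lambda tp: tp[1], reverse=True)
    kept, cum = [], 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        cum += p
        if cum >= top_p:
            break
    total = sum(p for _, p in kept)
    return [(tok, p / total) for tok, p in kept]

tokens = ["the", "a", "cat", "pizza"]
logits = [2.0, 1.5, 0.5, -1.0]

probs = apply_temperature(logits, temperature=0.7)
candidates = nucleus_filter(tokens, probs, top_p=0.9)
choice = random.choices([t for t, _ in candidates],
                        weights=[p for _, p in candidates])[0]
```

With a very low top_p the candidate set collapses to the single most likely token, which is exactly why low temperature + low top-p behaves almost deterministically for factual tasks.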
🚨 Garbage In, Garbage Out: It Applies to LLMs and RAG Too

In any LLM-based application, feeding in the right data is the key to getting accurate output. If your document parsing is wrong, everything that follows (chunking, embeddings, retrieval, generation) will also go wrong.

For example: if you parse a two-column PDF, most default parsers read left → right and top → bottom. That means your content gets mixed up, and the LLM will retrieve or be grounded in incorrect context.

✅ Best ways to cross-verify parsed data:
1️⃣ Manually review a few samples
2️⃣ Compare text counts between the original and the parsed document
3️⃣ Check layout preservation (columns, tables, images)
4️⃣ Validate semantic consistency: does the meaning still hold?

The first step (parsing) decides the success of the entire pipeline. Get it wrong, and you'll only amplify garbage. Get it right, and everything downstream performs better.

#LLM #RAG #AIEngineering #DataQuality #Parsing #NLP #GenerativeAI #AI #Accuracy
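Check 2️⃣ above (comparing text counts) can be partially automated. Below is a hypothetical sketch; the function names and the 95% threshold are my own illustrative choices, not a standard tool.

```python
import re

def normalize(text):
    """Collapse all whitespace so layout differences don't skew the count."""
    return re.sub(r"\s+", " ", text).strip()

def parse_coverage(original, parsed):
    """Ratio of parsed characters to original characters (1.0 = nothing lost)."""
    o, p = normalize(original), normalize(parsed)
    return len(p) / len(o) if o else 1.0

def check_parse(original, parsed, min_coverage=0.95):
    """Flag a parse that silently dropped too much content."""
    cov = parse_coverage(original, parsed)
    return {"coverage": round(cov, 3), "ok": cov >= min_coverage}

# A parser that truncated the page would fail this check.
result = check_parse("This is a   two-column PDF page.",
                     "This is a two-column PDF")
```

A length ratio is a coarse signal, which is exactly why the manual review and semantic checks in the post are still needed on top of it.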
Built GPT from scratch using PyTorch 🚀, shaping every component myself to understand the inner workings of language models.

I followed the principles from "Attention Is All You Need" and learned from Karpathy's educational video, but implemented everything myself:
• Decoder-only Transformer, 6 layers 🧠
• Learned positional embeddings (GPT-2 style)
• Pre-LayerNorm and causal self-attention ⚡
• Multi-head attention with custom implementations
• Weight tying between the input embeddings and the output LM head
• Character-level tokenizer built from scratch 📜
• Autoregressive generation with temperature scaling and top-k sampling 🔄
• Proper initialization, dropout, and GELU activations

Trained on Shakespeare's complete works (~1.1M characters), capturing rhythm, voice, and structure. The training pipeline, data handling, and modular structure were built entirely from scratch. Everything is deliberate, testable, and understandable: no shortcuts, no pre-built models.

GitHub Repository: https://lnkd.in/dwZNUXp9
Demo: https://lnkd.in/dUT8ZhTy

Built quietly, step by step, to understand and see results through the work itself. 🌊

#MachineLearning #DeepLearning #Transformers #PyTorch #NLP #GPT #AI #Shakespeare #LearningByDoing #BuildFromScratch #AttentionMechanism #LearningInPublic
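A character-level tokenizer like the one mentioned above is small enough to sketch in full. This is my own minimal version of the idea (vocabulary = every unique character in the corpus), not the repository's actual code.

```python
class CharTokenizer:
    """Character-level tokenizer: each unique character in the corpus
    becomes one token in the vocabulary."""

    def __init__(self, corpus):
        chars = sorted(set(corpus))
        self.stoi = {ch: i for i, ch in enumerate(chars)}   # char -> token id
        self.itos = {i: ch for ch, i in self.stoi.items()}  # token id -> char

    @property
    def vocab_size(self):
        return len(self.stoi)

    def encode(self, text):
        return [self.stoi[ch] for ch in text]

    def decode(self, ids):
        return "".join(self.itos[i] for i in ids)

tok = CharTokenizer("To be, or not to be")
ids = tok.encode("to be")
roundtrip = tok.decode(ids)
```

On Shakespeare's complete works this yields a vocabulary of under a hundred tokens, which is why character-level models are a convenient starting point for from-scratch training.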
For years, a known weakness of Large Language Models (LLMs) was their poor handling of individual characters. Because LLMs operate on tokens (clusters of characters or full words), tasks like counting characters or performing character substitution (a simple find-and-replace) were notoriously difficult for older generations. But my recent testing of the newest models (like GPT-5, Claude 4.5, and Gemini 2.5 Pro) shows a significant generational leap.

What has changed?

Character manipulation solved: older models fumbled simple tasks like substituting letters (e.g., in "I really love a ripe strawberry"). Models starting around GPT-4.1 and Claude Sonnet 4 now complete this consistently, suggesting they are getting much better at "seeing" and manipulating text at a granular, character-by-character level, despite the underlying tokenization.

Algorithm understanding, not just memorization: SOTA models can now reliably decode Base64 even when the inner text is gibberish (like a ROT20-ciphered message). This is crucial! Previously, it was suggested that LLMs memorized Base64 patterns for common English words. Now, the ability to decode out-of-distribution text strongly suggests they have a working understanding of the algorithm itself.

Complex decoding is now possible: models like GPT-5 (all sizes) and Gemini 2.5 Pro can now successfully solve a two-layer challenge, decoding a Base64 wrapper and an inner ROT20 cipher in a single go. While reasoning helps tremendously, the base models are clearly absorbing these new capabilities.

Character-level operations are no longer the Achilles' heel they once were. This increased dexterity has huge implications for everything from code generation to handling complex, multi-layered data encoding. It's fascinating to watch these models rapidly overcome their core architectural limitations.

What do you think is driving this character-level progress?

#LLM #AI #GenerativeAI #NLP #GPT5 #Claude4 #Gemini #CharacterManipulation #Base64
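The two-layer challenge described above is easy to reproduce for your own testing. A quick sketch follows; I am assuming "ROT20" means a Caesar shift of 20 positions over the alphabet, which matches the usual ROT-n convention but is my interpretation of the post's setup.

```python
import base64
import string

def rot_n(text, n=20):
    """Caesar-shift letters by n positions; non-letters pass through."""
    lower = string.ascii_lowercase
    upper = string.ascii_uppercase
    table = str.maketrans(lower + upper,
                          lower[n:] + lower[:n] + upper[n:] + upper[:n])
    return text.translate(table)

def make_challenge(plaintext, n=20):
    """Encrypt with ROT-n, then wrap in Base64: the 'two-layer' test.
    The inner text looks like gibberish, so pattern memorization won't help."""
    inner = rot_n(plaintext, n)
    return base64.b64encode(inner.encode()).decode()

def solve_challenge(blob, n=20):
    """Decode the Base64 wrapper, then undo ROT-n by shifting 26 - n."""
    inner = base64.b64decode(blob).decode()
    return rot_n(inner, 26 - n)

blob = make_challenge("I really love a ripe strawberry")
```

Feeding `blob` to a model and asking for the original sentence is exactly the out-of-distribution test the post describes: the Base64 payload encodes ciphered text no training corpus would contain.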
🔍 How ReAct Agents Make LLMs More Capable

Ever noticed LLMs "thinking"? Most LLMs generate text by predicting the next token. But when you combine them with an agent framework, they start performing structured reasoning and real-world actions. One of the most effective patterns is the ReAct (Reason + Act) agent.

It works in a simple but powerful loop:
1️⃣ Reason → The LLM analyzes the problem, breaks it down, and decides what to do.
2️⃣ Act → It performs an action (API call, search, calculation, tool execution).
3️⃣ Observe → It receives the result and updates its understanding.

This cycle continues until the task is solved.

Why it matters:
• Enables tool integration
• Improves decision-making
• Adds transparency (you can see the reasoning and the actions)
• Reduces hallucinations
• Works well for multi-step problem solving

#AI #MachineLearning #DeepLearning #NLP #LargeLanguageModels #ReActAgents #LLMAgents #GenerativeAI #PromptEngineering #LangChain #AITools #TechInnovation
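The Reason → Act → Observe loop above can be sketched without any framework. In this toy version the "LLM" is a scripted stub and the only tool is a calculator; all names are illustrative, not LangChain's API.

```python
def calculator(expression):
    """The agent's only tool: evaluate a basic arithmetic expression."""
    return str(eval(expression, {"__builtins__": {}}))

def scripted_llm(question, observations):
    """Stand-in for a real LLM: decide the next step from what is known so far."""
    if not observations:
        # Reason: the question needs arithmetic, so call the tool.
        return {"thought": "I should compute this.",
                "action": ("calculator", question)}
    # Reason: the last observation answers the question, so finish.
    return {"thought": "I have the result.", "final": observations[-1]}

def react_agent(question, tools, max_steps=5):
    observations = []
    for _ in range(max_steps):
        step = scripted_llm(question, observations)  # 1) Reason
        if "final" in step:
            return step["final"]
        name, arg = step["action"]
        observations.append(tools[name](arg))        # 2) Act + 3) Observe
    return None

answer = react_agent("17 * 3", {"calculator": calculator})
```

Swapping `scripted_llm` for a real model call is the essence of what agent frameworks do, along with prompt templates that elicit the "thought / action / observation" structure and parsers that extract the chosen tool.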
🔥 One-Hot Encoding: Simplifying Text for Machines

One-hot encoding is a simple yet powerful technique for converting categorical values (like words or labels) into numerical vectors that machine learning algorithms can understand.

📘 Example: say we have three sentences:
D1: This is Bad Food
D2: This is Good Food
D3: This is Amazing Pizza

👉 We first build a vocabulary of all unique (lowercased) words:
["this", "is", "bad", "food", "good", "amazing", "pizza"]

Then each word (or document) is represented as a binary vector: 1 means the word is present, 0 means it's not.

"this" → [1,0,0,0,0,0,0]
"is" → [0,1,0,0,0,0,0]
"bad" → [0,0,1,0,0,0,0]
"food" → [0,0,0,1,0,0,0]
"good" → [0,0,0,0,1,0,0]
"amazing" → [0,0,0,0,0,1,0]
"pizza" → [0,0,0,0,0,0,1]

#AI #MachineLearning #DeepLearning #NLP #DataScience #GenerativeAI #LearningTogether #OneHotEncoding #TextProcessing #FeatureEngineering
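The example above takes only a few lines of Python. This is a minimal sketch of the idea, building the vocabulary in order of first appearance (lowercasing words so "Bad" and "bad" share one slot):

```python
docs = [
    "This is Bad Food",       # D1
    "This is Good Food",      # D2
    "This is Amazing Pizza",  # D3
]

# Build the vocabulary of unique lowercased words, in order of first appearance.
vocab = []
for doc in docs:
    for word in doc.lower().split():
        if word not in vocab:
            vocab.append(word)

def one_hot(word):
    """Binary vector with a single 1 at the word's vocabulary index."""
    vec = [0] * len(vocab)
    vec[vocab.index(word.lower())] = 1
    return vec

encoding = {w: one_hot(w) for w in vocab}
```

One-hot vectors are sparse and carry no notion of similarity ("good" and "amazing" are as far apart as "good" and "pizza"), which is the limitation that dense embeddings later address.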
Unlock the Power of AI Agents: Build One with LangChain!

Ever wondered how AI agents think, plan, and act to achieve complex goals? The answer is becoming clearer with the rise of powerful tools like LangChain!

Recent advancements are making it easier than ever to build your own AI agents capable of reasoning, problem-solving, and executing plans, all using LangChain. Think of it: automating tedious tasks, crafting personalized customer experiences, or even revolutionizing research and development.

LangChain's modular structure allows you to assemble these agents piece by piece, leveraging everything from large language models to memory management tools and external data sources. This means you can tailor your agent to specific needs and watch it learn and adapt over time.

The possibilities are truly exciting! I'm particularly interested in exploring its application in [mention a specific area you're interested in, e.g., automating content creation or improving customer service bots].

What are your thoughts on the future of AI agents and how LangChain is enabling this revolution? What's the most exciting application you envision? Share your thoughts in the comments below! Let's discuss!

#LangChain #AI #ArtificialIntelligence #AIAgents #LLM #MachineLearning #Automation #Innovation #Python #Development #NLP #FutureOfWork

Read Full Article Here: https://lnkd.in/erAwaEhK
Day 1: Basics of AI 🤖

LLMs = next-token predictors. Computers think in 0s and 1s; we speak language. LLMs are the translator between human words and machine logic, enabling natural, human-like replies.

What is GPT?
GPT = Generative • Pre-Trained • Transformer
Generative: creates new text.
Pre-Trained: learned patterns from huge datasets.
Transformer: predicts the next token using self-attention.
ChatGPT (OpenAI) and Gemini (Google) are products built on this family.

How your message becomes a reply:
👉 1) Tokenization
Text → tiny pieces called tokens (words, sub-words, or even characters). You're billed in input tokens (prompt) and output tokens (response).
👉 2) Vocabulary & IDs
Each token maps to a number (its token ID). The model reads numbers, not letters.
👉 3) Vector Embeddings (meaning)
Token IDs → vectors that capture semantic meaning. That's how the model distinguishes bank (river) from bank (finance), and why king − man + woman ≈ queen.
👉 4) Positional Encoding
Order matters: "dog chases cat" ≠ "cat chases dog".
👉 5) Self-Attention (multi-head)
Every word "looks at" the other words to decide what's important. Multiple heads = multiple lenses (who/what/where/when/grammar), all at once.

Two phases:
👉 Training: predict the next word → compare to the label (the desired output) → compute the loss → update weights via backpropagation. Repeat millions or billions of times until predictions are good.
👉 Inference: use the learned weights to answer the query (no learning).

Why now? Faster and cheaper compute, better Transformers, massive data, and easy APIs → natural two-way interfaces.

Mini example
You: "Hi, my name is Ankur."
Model: "Hey Ankur! How can I help today?"
That's next-token prediction in action. This is what an LLM does. 😊

#AI #GenerativeAI #LLM #NLP #Transformers #Embeddings #MLOps #PromptEngineering #RAG #OpenAI #Gemini #AzureAI
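Steps 1–3 above can be made concrete with a toy pipeline. This is a hypothetical sketch using a naive word-level tokenizer and a random embedding table; real models use learned sub-word tokenizers (like BPE) and trained embeddings.

```python
import random

random.seed(0)  # reproducible "embeddings" for the demo

sentence = "Hi, my name is Ankur"

# 1) Tokenization: split text into tokens (word-level here for simplicity;
#    the comma becomes its own token).
tokens = sentence.replace(",", " ,").split()

# 2) Vocabulary & IDs: each unique token maps to an integer the model can read.
vocab = {tok: i for i, tok in enumerate(dict.fromkeys(tokens))}
token_ids = [vocab[t] for t in tokens]

# 3) Embeddings: each ID indexes a row in a table of vectors.
#    Here the rows are random; in a real model they are learned.
dim = 4
embedding_table = [[random.uniform(-1, 1) for _ in range(dim)]
                   for _ in vocab]
embeddings = [embedding_table[i] for i in token_ids]
```

From here, positional encodings are added to these vectors so the model knows token order, and self-attention operates over the result.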
From Upload to Usable Index in Minutes: Exploring RAG with LlamaCloud

I recently experimented with Retrieval-Augmented Generation (RAG) to understand how language models can answer questions using external knowledge. To get started quickly, I tried out LlamaCloud, and the setup turned out to be very straightforward.

I used the UI to configure the pipeline, uploaded my structured documents, selected an embedding model, and set a chunk size. After that, LlamaCloud handled the rest: chunking, embedding, and indexing. Within five to six minutes, I had a working index that I could query and test.

Now that I've tried the hosted setup, I want to explore a local implementation as well. Running everything locally will give me more control over the choice of embedding models, the storage layer, and the retrieval strategy, and will also help me understand the fundamentals of RAG more deeply.

Overall, a simple experiment that turned into a solid learning experience. Looking forward to the next steps. Great thanks to Siddhant Goswami and Ashhar Akhlaque for their guidance.

#RAG #RetrievalAugmentedGeneration #LlamaCloud #VectorSearch #GenerativeAI #MachineLearning #AIEngineering #LLMs #NLP #LearningByBuilding #0to100xEngineers