Ditching Vectors: A New Approach to RAG with Pure Reasoning

Vector search walked into a bar. It couldn't find the exit. 🚪

Traditional RAG: "These two sentences FEEL similar, so they must be relevant!"
Narrator: They were not.

PageIndex: "What if we stopped guessing and started reasoning?"

The approach? Ditch vectors. Build a document tree. Search like a human would, with actual logic.

98.7% accuracy on financial docs. No vector DB. No random chunking. No vibes. Just pure, transparent reasoning.

The future of RAG doesn't need more embeddings. It needs more common sense. 🧠

🔗 github(.)com/VectifyAI/PageIndex

#AI #MachineLearning #NLP #LLM #ReinforcementLearning #OpenSource #DeepLearning #GenerativeAI #DeepSeek #Innovation #Tech #Coding #DataScience #SoftwareEngineering #BigData #RLHF #MITLicense

More Relevant Posts

Ritvik Jhawar
2w
From Upload to Usable Index in Minutes: Exploring RAG with LlamaCloud

I recently experimented with Retrieval Augmented Generation (RAG) to understand how language models can answer questions using external knowledge. To get started quickly, I tried out LlamaCloud, and the setup turned out to be very straightforward.

I used the UI to configure the pipeline, uploaded my structured documents, selected an embedding model, and set a chunk size. After that, LlamaCloud handled the rest: chunking, embedding, and indexing. Within five to six minutes, I had a working index that I could query and test.

Now that I’ve tried the hosted setup, I want to explore a local implementation as well. Running everything locally will give me more control over the choice of embedding models, storage layer, and retrieval strategy, and also help me understand the fundamentals of RAG more deeply.

Overall, a simple experiment that turned into a solid learning experience. Looking forward to the next steps.

Many thanks to Siddhant Goswami and Ashhar Akhlaque for their guidance.

#RAG #RetrievalAugmentedGeneration #LlamaCloud #VectorSearch #GenerativeAI #MachineLearning #AIEngineering #LLMs #NLP #LearningByBuilding #0to100xEngineers
Ayush Kumar
1mo Edited
Built a GPT from scratch using PyTorch 🚀 and shaped every component myself to understand the inner workings of language models.

Followed the principles from “Attention Is All You Need” and learned from Karpathy’s educational video, but implemented everything myself.

- Decoder-only Transformer, 6 layers 🧠
- Learned positional embeddings (GPT-2 style)
- Pre-LayerNorm and causal self-attention ⚡
- Multi-head attention with custom implementations
- Weight tying between input embeddings and output LM head
- Character-level tokenizer built from scratch 📜
- Autoregressive generation with temperature scaling and top-k sampling 🔄
- Proper initialization, dropout, and GELU activations

Trained on Shakespeare’s complete works (~1.1M characters), capturing rhythm, voice, and structure. The training pipeline, data handling, and modular structure were built entirely from scratch. Everything is deliberate, testable, and understandable: no shortcuts, no pre-built models.

GitHub Repository: https://lnkd.in/dwZNUXp9
Demo: https://lnkd.in/dUT8ZhTy

Built quietly, step by step, to understand and see results through the work itself. 🌊

#MachineLearning #DeepLearning #Transformers #PyTorch #NLP #GPT #AI #Shakespeare #LearningByDoing #BuildFromScratch #AttentionMechanism #LearningInPublic
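
For readers unfamiliar with the generation step mentioned above, the following is a rough sketch of temperature scaling combined with top-k sampling. It uses plain Python rather than PyTorch so it runs anywhere, and the function name `sample_next_token` and the example logits are assumptions for illustration, not code from the linked repository.

```python
import math
import random


def sample_next_token(logits, temperature=1.0, top_k=None, rng=random):
    """Pick a token index from raw logits using temperature and top-k."""
    # Temperature rescales the logits: below 1 sharpens the distribution,
    # above 1 flattens it.
    scaled = [value / temperature for value in logits]

    # Top-k keeps only the k highest-scoring token indices.
    indices = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)
    if top_k is not None:
        indices = indices[:top_k]

    # Unnormalized softmax over the survivors (subtract the max for stability).
    peak = max(scaled[i] for i in indices)
    weights = [math.exp(scaled[i] - peak) for i in indices]

    # random.choices normalizes the weights internally.
    return rng.choices(indices, weights=weights, k=1)[0]


# Hypothetical logits for a 4-token vocabulary.
example_logits = [0.1, 2.0, -1.0, 1.5]
print(sample_next_token(example_logits, temperature=0.8, top_k=2))
```

With `top_k=1` the function always returns the highest-scoring index, which is greedy decoding; larger `top_k` and higher `temperature` trade determinism for variety.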
Akash Shahade
4w
🤖 RAG vs Fine-tuning: Which AI approach should you choose?

💡 RAG (Retrieval-Augmented Generation)
Fetches real-time information (from PDFs, web, docs, APIs, etc.) to answer your query, without retraining the model. Perfect for dynamic, ever-changing data.

💡 Fine-tuning
Trains the model offline with domain-specific data. The model learns permanently, offering deeper expertise but requiring more time and compute.

⚖️ The Bottom Line:
- Need the latest information? → Go with RAG
- Need specialized expertise? → Choose Fine-tuning

👨‍💻 The future of AI isn't choosing one over the other, it's knowing when to use each approach.

Follow Akash Shahade for more simple and practical AI breakdowns. 🤖

#AI #MachineLearning #LLM #FineTuning #RAG #GenAI #ArtificialIntelligence #AIInsights #DeepLearning #TechInnovation #DataScience #NLP #AIEngineering #TechLeadership
Kumaresh Gupta
1mo
Retrieval-Augmented Generation (RAG) is one of the most practical ways to make LLMs smarter by grounding their answers in external knowledge.

A Naïve RAG setup is the simplest implementation, and a great way to get started. Here’s how it works:

1️⃣ Break your documents into chunks and embed them into a vector database.
2️⃣ When a query comes in, convert it into an embedding and perform a similarity search.
3️⃣ Retrieve the top-k chunks and pass them directly into the LLM prompt.
4️⃣ The LLM generates an answer based on both the query and retrieved context.

It’s “naïve” because everything is direct: no reranking, filtering, or query rewriting. While it may bring in irrelevant chunks, it provides a solid baseline for experimentation. From here, teams usually add smarter retrieval strategies to improve accuracy and reduce noise.

👉 Start simple. Scale smart.

#RAG #RetrievalAugmentedGeneration #NaiveRAG #GenerativeAI #LLM #VectorDatabase #AI #ArtificialIntelligence #MachineLearning #NLP #TechTrends #UNext
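
The four steps above can be sketched in a few lines of Python. Everything here is a toy under stated assumptions: `embed` is a word-count vector standing in for a real embedding model, the in-memory `index` list stands in for a vector database, and the final LLM call is left out, so the sketch stops at the assembled prompt.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: a word-count vector (assumption, not a real model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Step 1: chunk the documents and embed each chunk.
chunks = [
    "The refund window is 30 days from delivery.",
    "Shipping takes 3 to 5 business days.",
    "Support is available by email on weekdays.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(query, k=2):
    """Steps 2 and 3: embed the query and return the k most similar chunks."""
    query_vec = embed(query)
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(query, k=2):
    """Step 4 input: retrieved chunks pasted straight into the prompt."""
    context = "\n".join(retrieve(query, k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


print(build_prompt("how long is the refund window"))
```

Note that the second retrieved chunk is only weakly related to the question, which is exactly the noise that reranking and filtering are later added to remove.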
Divyansh Goel
4w
🔥 One Hot Encoding: Simplifying Text for Machines

One-hot encoding is a simple yet powerful technique used to convert categorical values (like words or labels) into numerical vectors that machine learning algorithms can understand.

📘 Example: Let’s say we have three sentences:
D1: This is Bad Food
D2: This is Good Food
D3: This is Amazing Pizza

👉 We first build a vocabulary of all unique words:
["This", "is", "bad", "food", "good", "amazing", "pizza"]

Then, each word (or document) is represented as a binary vector: 1 means the word is present, 0 means it’s not.

"This" -> [1,0,0,0,0,0,0]
"is" -> [0,1,0,0,0,0,0]
"bad" -> [0,0,1,0,0,0,0]
"food" -> [0,0,0,1,0,0,0]
"good" -> [0,0,0,0,1,0,0]
"amazing" -> [0,0,0,0,0,1,0]
"pizza" -> [0,0,0,0,0,0,1]

#AI #MachineLearning #DeepLearning #NLP #DataScience #GenerativeAI #LearningTogether #OneHotEncoding #TextProcessing #FeatureEngineering
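
The same example can be reproduced in a short Python sketch. The helper names `one_hot` and `encode_document` are made up for illustration, and the sketch lowercases every word (an assumption) so that "This" and "this" share one vocabulary slot.

```python
documents = ["This is Bad Food", "This is Good Food", "This is Amazing Pizza"]

# Build the vocabulary in first-seen order, lowercased.
vocab = []
for doc in documents:
    for word in doc.lower().split():
        if word not in vocab:
            vocab.append(word)


def one_hot(word):
    """Vector with a single 1 at the word's position in the vocabulary."""
    return [1 if word.lower() == entry else 0 for entry in vocab]


def encode_document(doc):
    """Document-level vector: 1 for every vocabulary word that appears in the doc."""
    present = set(doc.lower().split())
    return [1 if entry in present else 0 for entry in vocab]


print(vocab)
print(one_hot("bad"))                  # [0, 0, 1, 0, 0, 0, 0]
print(encode_document(documents[0]))   # [1, 1, 1, 1, 0, 0, 0]
```

The vector length equals the vocabulary size, which is why one-hot encoding becomes impractical for large vocabularies and is usually replaced by dense embeddings.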
Mohammad Farhan Alam
1mo
Exploring How LLMs Think: A Tiny Model vs a Giant Model

I recently experimented with two language models, a small 0.6B LLM (Qwen3) and a 20B LLM (GPT-OSS), using LangChain, and discovered something fascinating about how AI “thinks.”

Even a tiny 0.6B model can summarize short texts almost as accurately as a 20B model. But when I:
- enabled reasoning,
- increased context length,
- changed the task, or
- used tools

…the outputs started to diverge.

This showed me something important: each change reveals which part of a model’s intelligence is activated.

When I asked ChatGPT to rate the two outputs without telling it which model was big and which was small, it rated them like this:
Model 2 → 9.5/10 (best overall: clear, precise, natural, and detailed)
Model 1 → 9/10 (very strong, just slightly less natural and detailed)

Even small experiments like this, when structured properly with LangChain, teach deep insights into LLM behavior. It’s amazing how much you can learn by observing output differences and structuring prompts systematically.

I’m excited to continue exploring AI reasoning, context handling, and task-specific behavior with structured pipelines.

#MachineLearning #AI #LLM #NLP #DeepLearning #LangChain #LearningByDoing #Python
Prasanna Reddy Pulakurthi
2w Edited
"Why did this video get ranked first?" is a question most retrieval systems can’t really answer.

I am super excited to share our latest work, "X-CoT: Explainable Text-to-Video Retrieval via LLM-based Chain-of-Thought Reasoning", published at the EMNLP 2025 (Main) Conference.

Instead of relying only on cosine similarity scores from embedding models, X-CoT asks an LLM to think through the ranking, rerank, and explain why one video should be preferred over another. The goal is not just higher retrieval metrics, but rankings that come with human-readable reasons.

What X-CoT does:
- Uses LLM-based pairwise reasoning to build a full video ranking.
- Produces human-readable rationales for each comparison, so you can see why a candidate is above or below another.
- Uses the explanations to spot bad or biased text-video pairs and analyze model behavior, not just metrics.

Data contributions:
- We expand existing text-to-video benchmarks with extra video annotations that improve semantic coverage.
- The dataset is publicly released on HuggingFace to support future work on explainable video retrieval and LLM reasoning.

Links and resources:
- Paper: https://lnkd.in/ge8XNmgW
- Code: https://lnkd.in/gfpfptYe
- Project Page: https://lnkd.in/gZzaNGzu
- HuggingFace Dataset: https://lnkd.in/gm7i98v7

Grateful to work with an amazing team: Jiamian (Aloes) Wang, Dr. Majid Rabbani, Dr. Sohail Dianat, Dr. Raghuveer Rao, and Dr. Zhiqiang Tao.

If you are working on multimodal retrieval, LLM reasoning, or explainable AI, I would love to hear your feedback and thoughts. And if you find X-CoT useful, please try it out, share it, and consider citing it!

#EMNLP2025 #ExplainableAI #LLM #ChainOfThought #Multimodal #VideoRetrieval #NLP
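
As a loose illustration of how pairwise judgments with rationales can be turned into a full ranking, here is a small Python sketch. It is not the X-CoT implementation: the `judge` function is a hypothetical word-overlap stand-in for the LLM comparison, the captions are invented, and counting wins is just one simple aggregation that the paper's actual method may not use.

```python
from itertools import combinations


def judge(query, caption_a, caption_b):
    """Hypothetical stand-in for an LLM pairwise comparison.

    Returns (a_wins, reason). A real system would prompt a model and
    receive a natural-language rationale instead of an overlap count.
    """
    query_words = set(query.lower().split())
    score_a = len(query_words & set(caption_a.lower().split()))
    score_b = len(query_words & set(caption_b.lower().split()))
    a_wins = score_a >= score_b
    reason = f"shares {max(score_a, score_b)} query words vs {min(score_a, score_b)}"
    return a_wins, reason


def rank_by_pairwise_wins(query, videos):
    """Compare every pair once, count wins, and keep the reason for each outcome."""
    wins = {name: 0 for name in videos}
    rationales = []
    for a, b in combinations(videos, 2):
        a_wins, reason = judge(query, videos[a], videos[b])
        winner, loser = (a, b) if a_wins else (b, a)
        wins[winner] += 1
        rationales.append(f"{winner} > {loser}: {reason}")
    ranking = sorted(videos, key=lambda name: wins[name], reverse=True)
    return ranking, rationales


# Invented video captions for illustration.
videos = {
    "v1": "a dog catches a frisbee in a park",
    "v2": "a man cooks pasta in a kitchen",
    "v3": "a dog sleeps on the sofa",
}
ranking, rationales = rank_by_pairwise_wins("dog playing frisbee in the park", videos)
print(ranking)
for line in rationales:
    print(line)
```

The useful property is that every position in the final ranking can be traced back to a list of individual comparisons, each with its own stated reason.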
Daniel Ihenacho
2w
Before fine-tuning any LLM, nail the data format. Too often people skip this step and end up wasting time, compute and “money”. Get the format right first; it pays back immediately. So, pay close attention to your data format — it’s one of the most crucial (yet underrated) aspects of any successful fine-tuning process. 💡 Small effort up front = big savings later. 🔧💸 #AI #MachineLearning #DeepLearning #LLM #NLP #FineTuning #PromptEngineering #DataPreparation #DataEngineering #AIEngineering #MLOps
Riya Bajpai
3w
🔍 Teaching AI to Understand How We Feel - With Just 1 Percent of the Parameters

Deep learning is powerful - but what if we could adapt large language models without retraining them entirely? That’s exactly what I explored in my latest project: fine-tuning BERT for Sentiment Analysis using LoRA (Low-Rank Adaptation).

💡 What I Did
Using Hugging Face Transformers + PEFT, I:
✅ Loaded a pre-trained BERT model
✅ Applied LoRA to train only a handful of parameters
✅ Fine-tuned it on human-written movie reviews from the IMDB dataset
✅ Built an end-to-end pipeline to classify text as positive or negative
✅ Saved the model for easy deployment and reuse

All this without needing a high-end GPU or full model retraining.

🌱 What I Learned
- How to bring pre-trained models to life for new tasks
- How parameter-efficient fine-tuning makes AI more accessible
- How to train smarter, not larger
- And most importantly: how AI can start understanding tone and emotion the same way we do.

Excited to take this even further - maybe multi-language sentiment? Or sarcasm detection next? 👀

🔗 Github: https://lnkd.in/dgcftSe6

#GenAI #AI #SentimentAnalysis #NLP #LoRA #HuggingFace #MachineLearning #AIProjects #DeepLearning #LearningByDoing #BuildInPublic #WomenInTech
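
To show why LoRA trains so few parameters, here is a small pure-Python sketch of the low-rank update y = W x + (alpha / r) · B (A x). It does not use the PEFT library and is not taken from the linked repository; the function names, the 768×768 layer size, and the rank of 8 are assumptions chosen only to make the arithmetic visible.

```python
def lora_param_counts(d_out, d_in, rank):
    """Parameters in a full weight matrix vs. in its LoRA factors A and B."""
    full = d_out * d_in
    lora = rank * (d_out + d_in)   # A is rank x d_in, B is d_out x rank
    return full, lora


def matvec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(x, W, A, B, alpha, rank):
    """Frozen base output plus the scaled low-rank correction."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, update)]


# Assumed sizes: one 768 x 768 projection adapted with rank 8.
full, lora = lora_param_counts(768, 768, 8)
print(full, lora, f"{100 * lora / full:.2f}% of this matrix")

# Tiny numeric check: B starts at zero, so the adapted layer initially
# behaves exactly like the frozen one.
W = [[1, 0], [0, 1]]
A = [[0.5, 0.5]]
B_zero = [[0.0], [0.0]]
print(lora_forward([2, 3], W, A, B_zero, alpha=2, rank=1))
```

Only `A` and `B` would receive gradients during training while `W` stays frozen, which is where the memory savings come from. The share of trainable parameters for a whole model is lower than the per-matrix figure because only selected matrices are adapted.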
Darshan Patel
1mo
Fine-Tuning vs. Prompt Engineering: What Really Scales in AI?

One of the most common debates in modern AI development: Should you fine-tune your model, or just engineer better prompts? Both approaches improve performance, but they solve different problems.

★ Prompt Engineering works when:
- You’re adapting a general model (like GPT) for new tasks.
- You need contextual control without retraining.
- You prioritize speed and flexibility.

★ Fine-Tuning works when:
- You have domain-specific data (e.g., legal, medical, finance).
- You need consistent output styles or deeper reasoning.
- You can afford compute and versioning overhead.

The trade-off: Prompt engineering scales creativity and experimentation, while fine-tuning scales consistency and performance.

The real power lies in Hybrid Optimization: fine-tuned models guided by prompt frameworks that add context, constraints, and reasoning structure.

★ In 2025, the best AI teams aren’t choosing between the two; they’re mastering both.

#PromptEngineering #FineTuning #LLM #ArtificialIntelligence #AIEngineering #MachineLearning #DeepLearning #NLP #AIModels #DataScience #SoftwareDevelopment #Automations
