4.2 Paper Abstract (English Version)

This study evaluates four Retrieval-Augmented Generation (RAG) paradigms using various Large Language Models (LLMs) to address limitations in factual accuracy and reasoning. Results indicate that Self-RAG with Llama-3-70b performs best in technical contexts, while Naive RAG is effective for general tasks, and advanced retrieval strategies improve accuracy. The study highlights limitations and suggests future research directions to enhance RAG's reliability for knowledge inquiries.


Abstract

Recent developments in Large Language Models (LLMs) have highlighted critical limitations in factual accuracy, knowledge timeliness, and reasoning groundedness. Retrieval-Augmented Generation (RAG) systems have emerged as a promising solution by incorporating external knowledge repositories. This study presents a comprehensive evaluation of four implemented RAG paradigms (Naive RAG, Self-RAG, Adaptive RAG, and Corrective RAG) employing four LLMs (Mixtral-8x7b, Gemma2-9b, Llama-3-70b, and Qwen-2.5-32b), supplemented by rigorous testing of diverse retrieval techniques. Our investigation spans both closed-domain (arXiv research papers) and open-domain (Wikipedia) datasets. The experiments reveal that Self-RAG with Llama-3-70b achieves superior performance in technical contexts, while Naive RAG excels in general-domain tasks. Advanced retrieval strategies, combining hierarchical summative clustering with hybrid reranking, are shown to further improve retrieval accuracy and precision. Lastly, we acknowledge key limitations of our experiments, including constrained dataset sizes, reliance on automated and insufficient metrics with potential biases, a uniform embedding approach, limited LLM and benchmark diversity, and the exclusion of graph-based structured RAG modalities. Future research should pursue broader dataset inclusion, refined evaluation frameworks with a semantic focus, diversified embeddings, expanded LLM testing, more efficient computing frameworks, and exploration of graph-based RAG for multi-step reasoning. We believe these findings can help address LLM limitations, namely hallucination and knowledge temporality, advancing RAG's reliability for knowledge-specific inquiries. The codebase is available in the GitHub repository: https://github.com/William-coder/rag_project.git.
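To illustrate the hybrid-reranking idea mentioned above, the following is a minimal sketch, not the paper's actual implementation: two independent rankings of candidate documents (a lexical term-overlap stand-in for BM25 and a bag-of-words cosine stand-in for dense embeddings) are fused with reciprocal rank fusion (RRF). All documents, queries, and function names here are hypothetical.

```python
# Hypothetical sketch of hybrid reranking via reciprocal rank fusion (RRF).
# The two scorers are toy stand-ins for a lexical retriever (e.g. BM25)
# and a dense embedding retriever; they are NOT the study's implementation.
from collections import Counter
import math


def lexical_score(query: str, doc: str) -> float:
    """Term-overlap score: how many query terms appear in the document."""
    q, d = Counter(query.lower().split()), Counter(doc.lower().split())
    return float(sum(min(q[t], d[t]) for t in q))


def dense_score(query: str, doc: str) -> float:
    """Cosine similarity over bag-of-words vectors (embedding stand-in)."""
    q, d = Counter(query.lower().split()), Counter(doc.lower().split())
    dot = sum(q[t] * d[t] for t in q)
    nq = math.sqrt(sum(v * v for v in q.values()))
    nd = math.sqrt(sum(v * v for v in d.values()))
    return dot / (nq * nd) if nq and nd else 0.0


def rrf_fuse(rankings: list[list[int]], k: int = 60) -> list[int]:
    """Fuse rankings: score(d) = sum over rankings of 1 / (k + rank)."""
    fused: Counter = Counter()
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return [doc_id for doc_id, _ in fused.most_common()]


def hybrid_rerank(query: str, docs: list[str]) -> list[int]:
    """Rank documents by fusing a lexical and a dense ranking."""
    by_lex = sorted(range(len(docs)), key=lambda i: -lexical_score(query, docs[i]))
    by_dense = sorted(range(len(docs)), key=lambda i: -dense_score(query, docs[i]))
    return rrf_fuse([by_lex, by_dense])


docs = [
    "retrieval augmented generation grounds answers in documents",
    "large language models can hallucinate facts",
    "reranking improves retrieval precision for question answering",
]
order = hybrid_rerank("retrieval reranking precision", docs)
print(order)  # doc 2 ranks first: it matches the most query terms
```

RRF is attractive here because it combines rankings without needing the two scorers' raw scores to be on comparable scales, which is exactly the difficulty when mixing lexical and dense retrieval signals.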

Keywords: Natural Language Processing, Retrieval-Augmented Generation, Large Language Models, Contextual Retrieval, Question Answering
