BatchEvalRunner - Running Multiple Evaluations
Table of contents
Setup
Question Generation
Running Batch Evaluation
Inspecting Outputs
Reporting Total Scores