
Tournament Reranker

An old idea I had lying in the graveyard of my projects — finally dug up and turned into a small, dependency-light Python library for LLM-based reranking using a tournament / pyramid strategy.

You give it a query (or any “target context”), and a list of candidate texts (passages, documents, CVs, product descriptions, etc.). It runs a series of small “mini-rankings” and returns a top-k list.

[Diagram: tournament / pyramid reranking process]

How the tournament/pyramid works

  • Candidates are interleaved into groups so that early rounds mix “high” and “low” base ranks (see the sketch below).

  • Each round independently ranks the small groups and keeps the top winners_per_group from each.
    • winners_per_group is automatically adjusted so the surviving pool never drops below target_k.

  • The process repeats until the pool is small, then an optional final rerank produces a clean top-k list.
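
To make the interleaving step concrete, here is a minimal sketch of round-robin dealing. It is illustrative only; the library's internal grouping logic may differ:

def interleave_groups(items, group_size):
    """Deal items round-robin so each group mixes early and late base ranks."""
    n_groups = -(-len(items) // group_size)  # ceil(len(items) / group_size)
    groups = [[] for _ in range(n_groups)]
    for i, item in enumerate(items):
        groups[i % n_groups].append(item)
    return groups

interleave_groups(list(range(8)), group_size=4)
# -> [[0, 2, 4, 6], [1, 3, 5, 7]]: each group spans the full rank range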

Key knobs live on pyramid_rerank / pyramid_rerank_async:

  • group_size
  • winners_per_group
  • target_k
  • final_rerank
  • max_rounds

Note: group_size must stay below 100 because passage labels are zero-padded two-digit strings (00, 01, ..., 99).
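
For illustration, a call wiring these knobs together might look like the sketch below. The knob names come from this README, but the import path and exact signature are assumptions, so check the package's API:

from tournament_reranker import pyramid_rerank  # import path assumed

ranks = pyramid_rerank(
    query="Where is the Eiffel Tower?",
    passages=candidates,     # your list of candidate strings
    ranker=ranker,           # e.g. from make_openai_chat_ranker(...)
    group_size=8,            # passages per mini-ranking (must be < 100)
    winners_per_group=3,     # survivors kept from each group
    target_k=5,              # size of the final top-k list
    final_rerank=True,       # clean final pass once the pool is small
    max_rounds=4,            # safety cap on the number of rounds
)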

Installation

Install from PyPI:

pip install tournament-reranker

The package ships with a pyproject.toml and depends on openai>=2.14.0 by default.

Or using uv:

uv pip install tournament-reranker

The openai dependency is installed automatically, but you can always bring your own ranker function (see “Plugging in a different LLM/provider” below).

Quickstart (sync)

from openai import OpenAI
from tournament_reranker import make_openai_chat_ranker, rerank_passages

client = OpenAI()  # needs OPENAI_API_KEY in env
ranker = make_openai_chat_ranker(client, model="gpt-4o-mini")

passages = [
    "Paris is the capital of France.",
    "The Eiffel Tower is in Paris.",
    "Toronto is the capital of Ontario.",
]

ranks = rerank_passages(
    query="Where is the Eiffel Tower?",
    passages=passages,
    ranker=ranker,
    target_k=2,
)

for passage, rank in zip(passages, ranks):
    print(f"rank {rank}: {passage}")

Quickstart (async)

import asyncio
from openai import AsyncOpenAI
from tournament_reranker import make_openai_chat_ranker_async, rerank_passages_async

async def main():
    client = AsyncOpenAI()
    ranker = make_openai_chat_ranker_async(client, model="gpt-4o-mini")
    passages = ["Doc A", "Doc B", "Doc C"]
    ranks = await rerank_passages_async("Pick the best doc", passages, ranker, target_k=2)
    print(list(zip(passages, ranks)))

asyncio.run(main())

A more context-dependent example: ranking CVs against a job posting

from openai import OpenAI
from tournament_reranker import make_openai_chat_ranker, rerank_passages

job_posting = """
Senior Backend Engineer (Python)
Must-have:
- 5+ years backend experience
- Strong Python + FastAPI
- Postgres + production SQL
Nice-to-have:
- Kubernetes
- Event-driven systems
"""

cvs = [
    "Candidate A: 7 years Python, built FastAPI services, Postgres tuning, ...",
    "Candidate B: 10 years Java, some Python scripting, ...",
    "Candidate C: 6 years Python, Django, some FastAPI, heavy Kubernetes, ...",
]

metadata = [
    {"candidate_id": "A", "source": "inbox/123"},
    {"candidate_id": "B", "source": "inbox/456"},
    {"candidate_id": "C", "source": "inbox/789"},
]

client = OpenAI()
ranker = make_openai_chat_ranker(client, model="gpt-4o-mini")

ranks = rerank_passages(
    query=f"Rank these CVs for the following job:\n\n{job_posting}\n\nPrefer must-haves over nice-to-haves.",
    passages=cvs,
    metadata=metadata,
    ranker=ranker,
    target_k=2,
)

for cv, meta, rank in zip(cvs, metadata, ranks):
    print(f"rank {rank} -> {meta['candidate_id']}: {cv[:80]}")

What you pass in

  • query: the target context (question, rubric, job posting, policy, spec, etc.)
  • passages: ordered list of candidate strings (from retrieval or any candidate pool)
  • Optional: metadata (same length as passages) to carry ids, sources, URLs, scores, etc.

The reranker returns a list of integer ranks (1 = best), aligned with the order of the passages input.
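
Because the ranks align with the input order, you can recover the ranked passages with a simple sort (assuming every passage receives a rank, as described above):

best_first = [p for _, p in sorted(zip(ranks, passages))]
print(best_first[0])  # the rank-1 passage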

Plugging in a different LLM/provider

The library is intentionally simple: you can bring your own model call. A Ranker is:

  • input: (query: str, group: list[Chunk])
  • output: a permutation of 0..n-1 (best → worst)

The default prompt numbers passages 00, 01, 02, ... (up to 99 per group) and expects a JSON array of those labels, best to worst.
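
For a group of three passages, a valid model reply would therefore look like this (labels best to worst):

["02", "00", "01"]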

Example:

from tournament_reranker import Chunk, make_ranking_prompt, parse_ranking_response_to_indices

def my_ranker(query: str, group: list[Chunk]) -> list[int]:
    prompt = make_ranking_prompt(query, group)
    model_output_text = call_your_llm(prompt)  # you implement this
    return parse_ranking_response_to_indices(model_output_text, len(group))

If the ranker output is missing indices or is otherwise invalid, the reranker raises a ValueError.
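
For offline tests you can skip the LLM entirely. The sketch below is a hypothetical deterministic ranker; it assumes Chunk exposes the passage text via a .text attribute, which you should verify against the installed package:

from tournament_reranker import Chunk

def keyword_overlap_ranker(query: str, group: list[Chunk]) -> list[int]:
    # Returns a permutation of 0..n-1, best -> worst, by naive word overlap.
    query_words = set(query.lower().split())
    def overlap(i: int) -> int:
        return len(query_words & set(group[i].text.lower().split()))  # .text is assumed
    return sorted(range(len(group)), key=overlap, reverse=True)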
