
Literature triage pipeline

Hybrid retrieval and rerank over a multi-million-document corpus, with freshness-aware caching and reproducible eval on held-out queries.

  • Public research lab
  • Retrieval & eval

Duration: 8 weeks
Team: 1 senior engineer + the lab's research engineering lead
Handover: Research engineering, with a written maintenance runbook
Disciplines
  • Retrieval
  • Rerank
  • Eval suite
  • Caching
Decide

Best fit when.

  1. Researchers or auditors need to replay any query against a corpus snapshot and get the same answer.
  2. Your corpus has parts that update daily and parts that are effectively frozen — one re-index policy is wasteful.
  3. Users will not trust a ranking they cannot inspect score by score.
Context

What was happening.

A research lab needed a triage layer over a large literature corpus — researchers asking questions in their own language and getting a ranked, traceable shortlist back. Existing internal tooling was a thin wrapper over a single embedding index that drifted out of date and gave researchers no way to reason about why a result surfaced.

Constraints

What we were holding to.

  • The lab had a strict reproducibility bar: every query had to be replayable with the same answer for the same corpus snapshot.
  • Some sub-corpora updated daily; others were effectively frozen. One blanket re-index policy would have been wasteful.
  • Researchers needed to understand the ranking, not just trust it.
Approach

How we built it.

Hybrid retrieval, then rerank

We combined a sparse lexical index with a dense embedding index, then reranked the top candidates with a cross-encoder. Each stage exposed its score so researchers could see how a result moved through the pipeline.
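To make the shape concrete, here is a minimal, self-contained sketch of retrieve-then-rerank with per-stage scores. The three scorers are toy stand-ins (term overlap, character-trigram cosine, and a blended proxy for the cross-encoder); the production stack used a real lexical index, embedding index, and learned cross-encoder, and every name below is illustrative rather than the client's actual code.

```python
import math
from collections import Counter

DOCS = {
    "p1": "sparse lexical retrieval with an inverted index",
    "p2": "dense embedding retrieval for literature triage",
    "p3": "cross encoder reranking of retrieval candidates",
}

def sparse_score(query: str, doc: str) -> float:
    """Toy lexical score: count of overlapping terms."""
    return float(len(set(query.split()) & set(doc.split())))

def dense_score(query: str, doc: str) -> float:
    """Toy 'embedding' similarity: cosine over character-trigram counts."""
    def trigrams(s):
        return Counter(s[i:i + 3] for i in range(len(s) - 2))
    q, d = trigrams(query), trigrams(doc)
    dot = sum(q[t] * d[t] for t in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(sum(v * v for v in d.values()))
    return dot / norm if norm else 0.0

def cross_encoder_score(query: str, doc: str) -> float:
    """Stand-in for a learned cross-encoder scoring the (query, doc) pair jointly."""
    return 0.5 * sparse_score(query, doc) + dense_score(query, doc)

def retrieve_and_rerank(query: str, top_n: int = 3, alpha: float = 0.5):
    """Hybrid retrieve over all docs, then rerank the head; every stage's score is kept."""
    candidates = []
    for doc_id, text in DOCS.items():
        scores = {"sparse": sparse_score(query, text), "dense": dense_score(query, text)}
        scores["hybrid"] = alpha * scores["sparse"] + (1 - alpha) * scores["dense"]
        candidates.append((doc_id, scores))
    candidates.sort(key=lambda c: c[1]["hybrid"], reverse=True)
    for doc_id, scores in candidates[:top_n]:  # rerank only the head of the list
        scores["rerank"] = cross_encoder_score(query, DOCS[doc_id])
    candidates[:top_n] = sorted(candidates[:top_n], key=lambda c: c[1]["rerank"], reverse=True)
    return candidates

for doc_id, scores in retrieve_and_rerank("dense retrieval reranking"):
    print(doc_id, {name: round(v, 3) for name, v in scores.items()})
```

Keeping `sparse`, `dense`, `hybrid`, and `rerank` scores on every candidate is the inspectability mechanism: a researcher can see exactly at which stage a result gained or lost position.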

Freshness-aware index policy

Sub-corpora declared their update cadence. The indexing layer respected that declaration — daily incremental for fast-moving sub-corpora, snapshot-pinned for archival material. Re-index cost dropped sharply without losing freshness where it mattered.
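A sketch of what a declared cadence can look like, assuming a per-sub-corpus policy object; the names (`Cadence`, `SubCorpus`, `needs_reindex`), the daily threshold, and the example sub-corpora are illustrative, not the lab's actual schema.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum
from typing import Optional

class Cadence(Enum):
    DAILY = "daily"    # fast-moving: incremental re-index every day
    FROZEN = "frozen"  # archival: pinned to a corpus snapshot

@dataclass
class SubCorpus:
    name: str
    cadence: Cadence
    last_indexed: date
    pinned_snapshot: Optional[str] = None  # set for FROZEN sub-corpora

def needs_reindex(sub: SubCorpus, today: date) -> bool:
    """The decision reads the declaration; only fast-moving sub-corpora trigger rebuilds."""
    if sub.cadence is Cadence.FROZEN:
        return False  # archival material stays pinned to its snapshot
    return today - sub.last_indexed >= timedelta(days=1)

corpus = [
    SubCorpus("preprints", Cadence.DAILY, last_indexed=date(2024, 1, 1)),
    SubCorpus("archive-pre-2000", Cadence.FROZEN, last_indexed=date(2023, 6, 1),
              pinned_snapshot="snap-2023-06"),
]
for sub in corpus:
    print(sub.name, "reindex" if needs_reindex(sub, today=date(2024, 1, 2)) else "skip")
```

The design point is that the re-index decision reads each sub-corpus's declaration rather than a global schedule, so archival material never pays the rebuild cost.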

An eval suite the lab could extend

We seeded a held-out query set with researcher-labelled relevance judgements. The eval suite ran on every retrieval-stack change, gating merges. The lab continues to extend the suite — it is a living artifact, not a launch deliverable.
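A sketch of the gating shape, assuming judgements stored as graded relevance per query and a retriever that takes a named corpus snapshot; the metric (nDCG@10), the threshold, and the data layout are assumptions for illustration, not the suite's actual format.

```python
import math
import sys

JUDGEMENTS = {  # researcher-labelled graded relevance, held out from any tuning
    "freshness aware caching": {"p7": 2, "p3": 1},
    "cross encoder rerank": {"p3": 2, "p1": 1},
}

def ndcg_at_k(ranked_ids, judged, k=10):
    """Graded-relevance nDCG, the gate metric in this sketch."""
    dcg = sum(judged.get(doc_id, 0) / math.log2(i + 2)
              for i, doc_id in enumerate(ranked_ids[:k]))
    ideal = sum(rel / math.log2(i + 2)
                for i, rel in enumerate(sorted(judged.values(), reverse=True)[:k]))
    return dcg / ideal if ideal else 0.0

def run_suite(retrieve, snapshot_id, threshold=0.6):
    """Replay every held-out query against the pinned snapshot; fail the merge below threshold."""
    scores = [ndcg_at_k(retrieve(q, snapshot_id), judged) for q, judged in JUDGEMENTS.items()]
    mean = sum(scores) / len(scores)
    print(f"snapshot={snapshot_id} mean nDCG@10={mean:.3f}")
    return mean >= threshold

if __name__ == "__main__":
    # stub retriever for illustration; a real run would query the hybrid stack
    stub = lambda query, snapshot: ["p3", "p7", "p1"]
    sys.exit(0 if run_suite(stub, "snap-2024-01-02") else 1)
```

Because the retriever takes the snapshot id as an argument, a failing score can be replayed against the same corpus state — which is what tied the eval suite to the lab's reproducibility bar.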

Handover

What we left with the client.

  • Hybrid retrieval + rerank stack with documented score semantics.
  • Freshness-aware indexing policy with per-sub-corpus configuration.
  • Reproducible eval suite tied to corpus snapshots, runnable from a single command.
  • Maintenance runbook covering index rebuilds, eval drift, and snapshot rotation.