
Literature triage pipeline

Hybrid retrieval and rerank over a multi-million-document corpus, with freshness-aware caching and reproducible eval on held-out queries.

  • Public research lab
  • Retrieval & eval

Duration: 8 weeks
Team: 1 senior engineer + the lab's research engineering lead
Handover: Research engineering, with a written maintenance runbook
Disciplines
  • Retrieval
  • Rerank
  • Eval suite
  • Caching
Decide

Best fit when.

  1. Researchers or auditors need to replay any query against a corpus snapshot and get the same answer.
  2. Your corpus has parts that update daily and parts that are effectively frozen — one re-index policy is wasteful.
  3. Users will not trust a ranking they cannot inspect score by score.
Context

What was happening.

A research lab needed a triage layer over a large literature corpus — researchers asking questions in their own language and getting a ranked, traceable shortlist back. Existing internal tooling was a thin wrapper over a single embedding index that drifted out of date and gave researchers no way to reason about why a result surfaced.

Constraints

What we were holding to.

  • The lab had a strict reproducibility bar: every query had to be replayable with the same answer for the same corpus snapshot.
  • Some sub-corpora updated daily; others were effectively frozen. One blanket re-index policy would have been wasteful.
  • Researchers needed to understand the ranking, not just trust it.
Approach

How we built it.

Hybrid retrieval, then rerank

We combined a sparse lexical index with a dense embedding index, then reranked the top candidates with a cross-encoder. Each stage exposed its score so researchers could see how a result moved through the pipeline.
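To make the shape concrete, here is a minimal, self-contained sketch of retrieve-then-rerank with per-stage scores. The three scorers are toy stand-ins (term overlap, character-trigram cosine, and a blended proxy for the cross-encoder); the production stack used a real lexical index, embedding index, and learned cross-encoder, and every name below is illustrative rather than the client's actual code.

```python
import math
from collections import Counter

DOCS = {
    "p1": "sparse lexical retrieval with an inverted index",
    "p2": "dense embedding retrieval for literature triage",
    "p3": "cross encoder reranking of retrieval candidates",
}

def sparse_score(query: str, doc: str) -> float:
    """Toy lexical score: count of overlapping terms."""
    return float(len(set(query.split()) & set(doc.split())))

def dense_score(query: str, doc: str) -> float:
    """Toy 'embedding' similarity: cosine over character-trigram counts."""
    def trigrams(s):
        return Counter(s[i:i + 3] for i in range(len(s) - 2))
    q, d = trigrams(query), trigrams(doc)
    dot = sum(q[t] * d[t] for t in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(sum(v * v for v in d.values()))
    return dot / norm if norm else 0.0

def cross_encoder_score(query: str, doc: str) -> float:
    """Stand-in for a learned cross-encoder scoring the (query, doc) pair jointly."""
    return 0.5 * sparse_score(query, doc) + dense_score(query, doc)

def retrieve_and_rerank(query: str, top_n: int = 3, alpha: float = 0.5):
    """Hybrid retrieve over all docs, then rerank the head; every stage's score is kept."""
    candidates = []
    for doc_id, text in DOCS.items():
        scores = {"sparse": sparse_score(query, text), "dense": dense_score(query, text)}
        scores["hybrid"] = alpha * scores["sparse"] + (1 - alpha) * scores["dense"]
        candidates.append((doc_id, scores))
    candidates.sort(key=lambda c: c[1]["hybrid"], reverse=True)
    for doc_id, scores in candidates[:top_n]:  # rerank only the head of the list
        scores["rerank"] = cross_encoder_score(query, DOCS[doc_id])
    candidates[:top_n] = sorted(candidates[:top_n], key=lambda c: c[1]["rerank"], reverse=True)
    return candidates

for doc_id, scores in retrieve_and_rerank("dense retrieval reranking"):
    print(doc_id, {name: round(v, 3) for name, v in scores.items()})
```

Keeping `sparse`, `dense`, `hybrid`, and `rerank` scores on every candidate is the inspectability mechanism: a researcher can see exactly at which stage a result gained or lost position.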

Freshness-aware index policy

Sub-corpora declared their update cadence. The indexing layer respected that declaration — daily incremental for fast-moving sub-corpora, snapshot-pinned for archival material. Re-index cost dropped sharply without losing freshness where it mattered.
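A sketch of what a declared cadence can look like, assuming a per-sub-corpus policy object; the names (`Cadence`, `SubCorpus`, `needs_reindex`), the daily threshold, and the example sub-corpora are illustrative, not the lab's actual schema.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum
from typing import Optional

class Cadence(Enum):
    DAILY = "daily"    # fast-moving: incremental re-index every day
    FROZEN = "frozen"  # archival: pinned to a corpus snapshot

@dataclass
class SubCorpus:
    name: str
    cadence: Cadence
    last_indexed: date
    pinned_snapshot: Optional[str] = None  # set for FROZEN sub-corpora

def needs_reindex(sub: SubCorpus, today: date) -> bool:
    """The decision reads the declaration; only fast-moving sub-corpora trigger rebuilds."""
    if sub.cadence is Cadence.FROZEN:
        return False  # archival material stays pinned to its snapshot
    return today - sub.last_indexed >= timedelta(days=1)

corpus = [
    SubCorpus("preprints", Cadence.DAILY, last_indexed=date(2024, 1, 1)),
    SubCorpus("archive-pre-2000", Cadence.FROZEN, last_indexed=date(2023, 6, 1),
              pinned_snapshot="snap-2023-06"),
]
for sub in corpus:
    print(sub.name, "reindex" if needs_reindex(sub, today=date(2024, 1, 2)) else "skip")
```

The design point is that the re-index decision reads each sub-corpus's declaration rather than a global schedule, so archival material never pays the rebuild cost.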

An eval suite the lab could extend

We seeded a held-out query set with researcher-labelled relevance judgements. The eval suite ran on every retrieval-stack change, gating merges. The lab continues to extend the suite — it is a living artifact, not a launch deliverable.
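A sketch of the gating shape, assuming judgements stored as graded relevance per query and a retriever that takes a named corpus snapshot; the metric (nDCG@10), the threshold, and the data layout are assumptions for illustration, not the suite's actual format.

```python
import math
import sys

JUDGEMENTS = {  # researcher-labelled graded relevance, held out from any tuning
    "freshness aware caching": {"p7": 2, "p3": 1},
    "cross encoder rerank": {"p3": 2, "p1": 1},
}

def ndcg_at_k(ranked_ids, judged, k=10):
    """Graded-relevance nDCG, the gate metric in this sketch."""
    dcg = sum(judged.get(doc_id, 0) / math.log2(i + 2)
              for i, doc_id in enumerate(ranked_ids[:k]))
    ideal = sum(rel / math.log2(i + 2)
                for i, rel in enumerate(sorted(judged.values(), reverse=True)[:k]))
    return dcg / ideal if ideal else 0.0

def run_suite(retrieve, snapshot_id, threshold=0.6):
    """Replay every held-out query against the pinned snapshot; fail the merge below threshold."""
    scores = [ndcg_at_k(retrieve(q, snapshot_id), judged) for q, judged in JUDGEMENTS.items()]
    mean = sum(scores) / len(scores)
    print(f"snapshot={snapshot_id} mean nDCG@10={mean:.3f}")
    return mean >= threshold

if __name__ == "__main__":
    # stub retriever for illustration; a real run would query the hybrid stack
    stub = lambda query, snapshot: ["p3", "p7", "p1"]
    sys.exit(0 if run_suite(stub, "snap-2024-01-02") else 1)
```

Because the retriever takes the snapshot id as an argument, a failing score can be replayed against the same corpus state — which is what tied the eval suite to the lab's reproducibility bar.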

Handover

What we left with the client.

  • Hybrid retrieval + rerank stack with documented score semantics.
  • Freshness-aware indexing policy with per-sub-corpus configuration.
  • Reproducible eval suite tied to corpus snapshots, runnable from a single command.
  • Maintenance runbook covering index rebuilds, eval drift, and snapshot rotation.