Literature triage pipeline
Hybrid retrieval and rerank over a multi-million-document corpus with freshness-aware caching and reproducible eval on held-out queries.
- Client: Public research lab
- Focus: Retrieval & eval
- Duration: 8 weeks
- Tags: Retrieval, Rerank, Eval suite, Caching
Best fit when.
- Researchers or auditors need to replay any query against a corpus snapshot and get the same answer.
- Your corpus has parts that update daily and parts that are effectively frozen — one re-index policy is wasteful.
- Users will not trust a ranking they cannot inspect score by score.
What was happening.
A research lab needed a triage layer over a large literature corpus — researchers asking questions in their own language and getting a ranked, traceable shortlist back. Existing internal tooling was a thin wrapper over a single embedding index that drifted out of date and gave researchers no way to reason about why a result surfaced.
What we were holding to.
- The lab had a strict reproducibility bar: every query had to be replayable with the same answer for the same corpus snapshot.
- Some sub-corpora updated daily; others were effectively frozen. One blanket re-index policy would have been wasteful.
- Researchers needed to understand the ranking, not just trust it.
How we built it.
Hybrid retrieval, then rerank
We combined a sparse lexical index with a dense embedding index, then reranked the top candidates with a cross-encoder. Each stage exposed its score so researchers could see how a result moved through the pipeline.
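A minimal sketch of that flow, using rank_bm25 and sentence-transformers as stand-ins for the lab's actual indexes (library and model choices here are illustrative assumptions, not the deployed stack). The point it shows is that each result carries its sparse, dense, and rerank scores through to the output:

```python
# Retrieve-then-rerank with per-stage scores. rank_bm25 and
# sentence-transformers are illustrative stand-ins, not the lab's stack.
import numpy as np
from rank_bm25 import BM25Okapi
from sentence_transformers import CrossEncoder, SentenceTransformer

docs = [
    "Sparse attention mechanisms for long-document transformers.",
    "A survey of dense retrieval with dual encoders.",
    "Cross-encoder reranking for open-domain question answering.",
]

bm25 = BM25Okapi([d.lower().split() for d in docs])    # sparse lexical index
encoder = SentenceTransformer("all-MiniLM-L6-v2")      # dense embedding index
doc_vecs = encoder.encode(docs, normalize_embeddings=True)
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def search(query: str, k_sparse: int = 100, k_dense: int = 100, k_final: int = 10):
    # Stage 1a: lexical scores over the snapshot.
    sparse = bm25.get_scores(query.lower().split())
    # Stage 1b: cosine scores against the (normalized) embedding index.
    dense = doc_vecs @ encoder.encode([query], normalize_embeddings=True)[0]
    # Candidate pool is the union of both top-k lists.
    cands = sorted(set(np.argsort(sparse)[-k_sparse:]) | set(np.argsort(dense)[-k_dense:]))
    # Stage 2: cross-encoder rerank of the pooled candidates.
    ce = reranker.predict([(query, docs[i]) for i in cands])
    top = np.argsort(ce)[::-1][:k_final]
    # Every stage's score travels with the result so the ranking is inspectable.
    return [{"doc": docs[cands[i]], "sparse": float(sparse[cands[i]]),
             "dense": float(dense[cands[i]]), "rerank": float(ce[i])} for i in top]
```

Keeping all three scores per result is what lets a researcher see whether a document arrived via lexical match, semantic similarity, or both before the reranker ordered it.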
Freshness-aware index policy
Sub-corpora declared their update cadence. The indexing layer respected that declaration — daily incremental for fast-moving sub-corpora, snapshot-pinned for archival material. Re-index cost dropped sharply without losing freshness where it mattered.
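A sketch of what a cadence declaration can look like, assuming a hypothetical `SubCorpusPolicy` schema — field names and the two cadence values are illustrative, not the lab's configuration:

```python
# Hypothetical per-sub-corpus freshness declaration; field names and the
# two cadences ("daily", "pinned") are illustrative, not the lab's schema.
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class SubCorpusPolicy:
    name: str
    cadence: str                     # "daily": incremental re-index; "pinned": frozen
    last_indexed: datetime
    snapshot_id: str | None = None   # set for snapshot-pinned archival material

def reindex_due(policy: SubCorpusPolicy, now: datetime) -> bool:
    """Pinned sub-corpora never re-index; daily ones re-index incrementally."""
    if policy.cadence == "pinned":
        return False
    return now - policy.last_indexed >= timedelta(days=1)

policies = [
    SubCorpusPolicy("preprints", "daily", datetime(2024, 1, 1)),
    SubCorpusPolicy("archive-pre-2000", "pinned", datetime(2023, 6, 1), "snap-0042"),
]
print([p.name for p in policies if reindex_due(p, datetime.now())])  # -> ['preprints']
```

The indexing layer reads the declaration instead of applying one blanket policy, which is where the re-index savings came from.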
An eval suite the lab could extend
We seeded a held-out query set with researcher-labelled relevance judgements. The eval suite ran on every retrieval-stack change, gating merges. The lab continues to extend the suite — it is a living artifact, not a launch deliverable.
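A sketch of the merge gate, with hypothetical queries, judgements, snapshot id, metric (recall@10), and threshold standing in for the lab's held-out set:

```python
# Sketch of the merge-gating eval run. Queries, judgements, snapshot id,
# metric (recall@10), and threshold are hypothetical placeholders.
import sys

SNAPSHOT_ID = "snap-0042"  # every run records the corpus snapshot it scored
THRESHOLD = 0.80           # gate: merges fail below this mean recall

HELD_OUT = [
    # (query, doc ids judged relevant by researchers)
    ("sparse attention for long documents", {"doc-017", "doc-203"}),
    ("dense retrieval survey", {"doc-088"}),
]

def recall_at_k(retrieved: list[str], relevant: set[str], k: int = 10) -> float:
    return sum(1 for d in retrieved[:k] if d in relevant) / len(relevant)

def my_search(query: str) -> list[str]:
    # Stand-in for the real retrieve-and-rerank stack under test.
    return []

def run_eval(search_fn) -> float:
    scores = [recall_at_k(search_fn(q), rel) for q, rel in HELD_OUT]
    return sum(scores) / len(scores)

if __name__ == "__main__":
    mean = run_eval(my_search)
    print(f"snapshot={SNAPSHOT_ID} mean recall@10={mean:.3f}")
    sys.exit(0 if mean >= THRESHOLD else 1)   # nonzero exit blocks the merge
```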
What we left with the client.
- Hybrid retrieval + rerank stack with documented score semantics.
- Freshness-aware indexing policy with per-sub-corpus configuration.
- Reproducible eval suite tied to corpus snapshots, runnable from a single command.
- Maintenance runbook covering index rebuilds, eval drift, and snapshot rotation.