ARCHITECTURE

Lost in the middle, found by structure.

Long-context language models and retrieval pipelines share a documented failure mode: facts buried mid-context get missed. Smarter retrieval softens it but does not remove it. More context makes it worse. For an agent about to act on what it just read, missing a fact is not a slightly weaker answer. It is the wrong action. Limma sidesteps the problem by not asking the model to find the fact at decision time.

A KNOWN FAILURE MODE

Reading is not reasoning.

Researchers have documented a U-shaped position bias in language models reading long contexts: relevant facts at the very start and very end of the context are retrieved reliably. Facts in the middle are not. The bigger the context, the more pronounced the drop.

SAME FACT, THREE POSITIONS IN A LONG CONTEXT

POSITION 1 OF 1,500 (page 4): total_assets = $3,383,247,000
RETRIEVED. Attention is high; the fact is retrieved reliably.

POSITION 750 OF 1,500 (page 750): total_assets = $3,383,247,000
MISSED. Attention drops; the fact gets missed.

POSITION 1,500 OF 1,500 (page 1,496): total_assets = $3,383,247,000
RETRIEVED. Attention recovers; the fact is retrieved reliably.

Liu et al., 2023. Same fact, same wording, same model. Retrieval accuracy depends on where in the window the fact sits. The pattern holds across models and across context lengths.
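The setup behind this kind of result is simple to restate: hold the fact and the surrounding filler fixed, move only the fact's position, and grade each read. Below is a minimal sketch, assuming the context is a list of page strings. `build_context`, `position_sweep`, and `exact_match` are hypothetical names, and `exact_match` is a stand-in for a real model call plus answer grading, which is not shown.

```python
from typing import Callable, Dict, List

def build_context(fact: str, filler: List[str], position: int) -> List[str]:
    """Return a new list with `fact` inserted at 0-based index `position` among the filler."""
    if not 0 <= position <= len(filler):
        raise ValueError(f"position {position} outside 0..{len(filler)}")
    return filler[:position] + [fact] + filler[position:]

def position_sweep(
    fact: str,
    filler: List[str],
    positions: List[int],
    found: Callable[[str, List[str]], bool],
) -> Dict[int, bool]:
    """Place the same fact at each position and record whether `found` recovers it."""
    return {p: found(fact, build_context(fact, filler, p)) for p in positions}

# Hypothetical stand-in for "ask the model, then grade its answer".
# An exact-match lookup, unlike an attention-based read, ignores where the fact sits.
def exact_match(fact: str, context: List[str]) -> bool:
    return fact in context

fact = "total_assets = $3,383,247,000"
filler = [f"filler page {i}" for i in range(1, 1500)]  # 1,499 filler pages + the fact = 1,500
print(position_sweep(fact, filler, [0, 749, 1499], exact_match))
# {0: True, 749: True, 1499: True}
```

With an exact-match lookup plugged in, every position succeeds. Swapping in a long-context model read is where the position dependence described above would show up.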

HOW LIMMA SIDESTEPS IT

Do not ask the model to find the fact at query time.

Every fact is extracted into the workspace at ingest, indexed by entity, metric, and period. Reasoning operates on the workspace, not on the context window. Position in the source has no bearing on retrieval. A fact on page 47 of a 1,500-page binder has the same retrieval cost as a fact on page 4.
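A minimal sketch of what indexing by entity, metric, and period buys at query time, assuming a plain in-memory Python dictionary. `Record`, `Workspace`, the entity `ExampleCo`, and the period `FY2023` are hypothetical illustrations, not Limma's actual schema or API. The page number is stored for provenance but never used to find a record.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, List, Tuple

Key = Tuple[str, str, str]  # (entity, metric, period)

@dataclass(frozen=True)
class Record:
    entity: str
    metric: str
    period: str
    value: int
    source: str  # source document, e.g. "10-K"
    page: int    # kept for provenance only; never used for lookup

class Workspace:
    """Hypothetical in-memory workspace keyed by (entity, metric, period)."""

    def __init__(self) -> None:
        self._index: DefaultDict[Key, List[Record]] = defaultdict(list)

    def add(self, rec: Record) -> None:
        # Ingest time: file each extracted fact under its structural key.
        self._index[(rec.entity, rec.metric, rec.period)].append(rec)

    def lookup(self, entity: str, metric: str, period: str) -> List[Record]:
        # Query time: one dictionary lookup; the cost does not depend on rec.page.
        return list(self._index.get((entity, metric, period), []))

ws = Workspace()
ws.add(Record("ExampleCo", "total_assets", "FY2023", 3_383_247_000, "10-K", 47))
ws.add(Record("ExampleCo", "total_assets", "FY2023", 3_418_902_000, "supplemental", 812))
for rec in ws.lookup("ExampleCo", "total_assets", "FY2023"):
    print(rec.source, rec.page, rec.value)
```

Both total_assets figures come back from a single lookup, whether they were found on page 47 or page 812, which is what lets a disagreement between them be noticed without rereading either page.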

THE BINDER

~1,500 pages across 3 documents

p.1 cover letter
...
p.47 10-K: total_assets = $3,383,247,000
...
p.412 MD&A narrative
...
p.812 supplemental: total_assets = $3,418,902,000
...
p.1496 signatures

EXTRACT

THE WORKSPACE

Extracted records, indexed by structure

[1] total_assets: $3,383,247,000 (10-K p.47)
[2] total_assets: $3,418,902,000 (supplemental p.812)
[3] total_liabilities: $1,960,228,000 (10-K p.47)
[4] total_equity: $1,423,019,000 (10-K p.47)
[...] thousands more, equally retrievable

CONFLICT: [1] paired with [2]

Surfaced mechanically. Attention not required.
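Pairing [1] with [2] needs no model at all: group records by metric and compare values. Below is a minimal sketch, assuming all four records describe the same entity and period, so metric alone serves as the key. `Record`, `find_conflicts`, and `balance_gap` are hypothetical names. The same records also support a second mechanical check, the balance-sheet identity total_assets = total_liabilities + total_equity, which the three 10-K figures satisfy exactly.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, List, Tuple

@dataclass(frozen=True)
class Record:
    rid: int      # workspace record number, as in [1], [2], ...
    metric: str
    value: int
    source: str
    page: int

def find_conflicts(records: List[Record]) -> List[Tuple[int, int]]:
    """Pair every two records that share a metric but disagree on value."""
    by_metric: Dict[str, List[Record]] = defaultdict(list)
    for r in records:
        by_metric[r.metric].append(r)
    conflicts = []
    for group in by_metric.values():
        for a, b in combinations(group, 2):
            if a.value != b.value:
                conflicts.append((a.rid, b.rid))
    return conflicts

def balance_gap(records: List[Record], source: str) -> int:
    """total_assets - (total_liabilities + total_equity) within one source; 0 means it balances."""
    v = {r.metric: r.value for r in records if r.source == source}
    return v["total_assets"] - (v["total_liabilities"] + v["total_equity"])

workspace = [
    Record(1, "total_assets", 3_383_247_000, "10-K", 47),
    Record(2, "total_assets", 3_418_902_000, "supplemental", 812),
    Record(3, "total_liabilities", 1_960_228_000, "10-K", 47),
    Record(4, "total_equity", 1_423_019_000, "10-K", 47),
]

print(find_conflicts(workspace))       # [(1, 2)]
print(balance_gap(workspace, "10-K"))  # 0
```

Neither check depends on where in the binder a figure appeared; both run over the extracted records alone.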

PARALLEL CATEGORIES

Different work, different architecture.

The choice between RAG (retrieval-augmented generation) and Limma is not about which one is better. They are aimed at different work, and the same organization will use both: RAG for the parts of the workflow where retrieval is the value, Limma for the parts where provability is the value.

USE RAG WHEN

01 The user is asking a question whose answer is naturally a paragraph
02 The source material is largely unstructured prose
03 The user will read the answer and apply their own judgment
04 Subtle numerical or structural errors are tolerable

USE LIMMA WHEN

01 The output is a number, a model, a memo, a redline, or anything mechanically checkable
02 The cost of an undetected error is high (regulatory, financial, legal)
03 The work spans multiple sources that must be reconciled with each other
04 A human downstream will defend the output to an auditor, regulator, or counterparty

Stop asking the model to find the fact.

For agents that act on what they just read, missing a mid-context fact is not a slightly weaker answer. It is the wrong action. Bring the corpus, the agent, and the action. We will show you what the verification step catches that the long-context read does not.
