ARCHITECTURE

Lost in the middle, found by structure.

Long-context language models and retrieval pipelines share a documented failure mode: facts buried mid-context get missed. Smarter retrieval reduces the miss rate. More context makes it worse. For an agent about to act on what it just read, missing a fact is not a slightly weaker answer. It is the wrong action. Limma sidesteps the problem by not asking the model to find the fact at decision time.

A KNOWN FAILURE MODE

Reading is not reasoning.

Researchers have documented a U-shaped pattern in how language models use long contexts (Liu et al., 2023): relevant facts at the very start and very end of the context are retrieved reliably, while facts in the middle are often missed. The longer the context, the more pronounced the drop.

SAME FACT, THREE POSITIONS IN A LONG CONTEXT

POSITION 1 OF 1,500 (page 4)
total_assets = $3,383,247,000
RETRIEVED. Attention is high. Fact retrieved reliably.

POSITION 750 OF 1,500 (page 750)
total_assets = $3,383,247,000
MISSED. Attention drops. Fact gets missed.

POSITION 1,500 OF 1,500 (page 1,496)
total_assets = $3,383,247,000
RETRIEVED. Attention recovers. Fact retrieved reliably.

Liu et al., 2023. Same fact, same wording, same model. Retrieval accuracy depends on where in the window the fact sits. The pattern holds across models and across context lengths.

HOW LIMMA SIDESTEPS IT

Do not ask the model to find the fact at query time.

Every fact is extracted into the workspace at ingest, indexed by entity, metric, and period. Reasoning operates on the workspace, not on the context window. Position in the source has no bearing on retrieval. A fact on page 47 of a 1,500-page binder has the same retrieval cost as a fact on page 4.
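
The indexing idea above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not Limma's actual implementation: the `Fact` and `Workspace` names, the entity label "ACME", and the period label "FY2023" are all hypothetical. The point it shows is that a lookup keyed by (entity, metric, period) never consults the page number, so a fact from page 47 costs the same to retrieve as one from page 4.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    entity: str   # hypothetical company label, not from the source
    metric: str
    period: str   # hypothetical period label, not from the source
    value: int    # whole US dollars
    source: str
    page: int     # kept for provenance only, never used for lookup


class Workspace:
    """Toy in-memory index keyed by (entity, metric, period)."""

    def __init__(self):
        self._index = defaultdict(list)

    def ingest(self, fact):
        # Extraction happens once, at ingest time.
        self._index[(fact.entity, fact.metric, fact.period)].append(fact)

    def lookup(self, entity, metric, period):
        # A dictionary lookup: position in the source plays no part.
        return list(self._index.get((entity, metric, period), []))


ws = Workspace()
ws.ingest(Fact("ACME", "total_assets", "FY2023", 3_383_247_000, "10-K", 47))
ws.ingest(Fact("ACME", "total_liabilities", "FY2023", 1_960_228_000, "10-K", 47))
ws.ingest(Fact("ACME", "total_equity", "FY2023", 1_423_019_000, "10-K", 47))

hits = ws.lookup("ACME", "total_assets", "FY2023")
print(hits[0].value, hits[0].source, hits[0].page)
```

Keeping the page number on the record preserves provenance for a later audit while leaving it out of the lookup key.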

THE BINDER
~1,500 pages across 3 documents

p.1     cover letter
...
p.47    10-K: total_assets = $3,383,247,000
...
p.412   MD&A narrative
...
p.812   supplemental: total_assets = $3,418,902,000
...
p.1496  signatures

THE WORKSPACE
extracted records, indexed by structure

[1] total_assets: $3,383,247,000 (10-K p.47)
[2] total_assets: $3,418,902,000 (supplemental p.812)
[3] total_liabilities: $1,960,228,000 (10-K p.47)
[4] total_equity: $1,423,019,000 (10-K p.47)
[...] thousands more, equally retrievable

CONFLICT: [1] paired with [2]
Surfaced mechanically. Attention not required.

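
A minimal sketch of how the pairing of [1] and [2] could be surfaced mechanically, using the four records from the workspace. The `find_conflicts` function and the tuple layout are hypothetical, and the sketch assumes both total_assets records describe the same entity and period, which the example implies but does not state. Records are grouped by metric and any two with different values are paired; no model reads anything.

```python
from collections import defaultdict

# (record id, metric, value in whole US dollars, provenance)
records = [
    (1, "total_assets", 3_383_247_000, "10-K p.47"),
    (2, "total_assets", 3_418_902_000, "supplemental p.812"),
    (3, "total_liabilities", 1_960_228_000, "10-K p.47"),
    (4, "total_equity", 1_423_019_000, "10-K p.47"),
]


def find_conflicts(records):
    """Pair up records that report different values for the same metric."""
    by_metric = defaultdict(list)
    for rec in records:
        by_metric[rec[1]].append(rec)

    conflicts = []
    for metric, group in by_metric.items():
        for i, a in enumerate(group):
            for b in group[i + 1:]:
                if a[2] != b[2]:
                    # (metric, first id, second id, absolute difference)
                    conflicts.append((metric, a[0], b[0], abs(a[2] - b[2])))
    return conflicts


conflicts = find_conflicts(records)
for metric, first_id, second_id, delta in conflicts:
    print(f"CONFLICT {metric}: [{first_id}] vs [{second_id}], delta ${delta:,}")
```

A production version would group on the full (entity, metric, period) key rather than the metric alone.
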
PARALLEL CATEGORIES

Different work, different architecture.

The decision is not which one is better. They are aimed at different work. The same organization will use both. RAG for the parts of the workflow where retrieval is the value. Limma for the parts where provability is the value.

USE RAG WHEN
  • 01 The user is asking a question whose answer is naturally a paragraph
  • 02 The source material is largely unstructured prose
  • 03 The user will read the answer and apply their own judgment
  • 04 Subtle numerical or structural errors are tolerable

USE LIMMA WHEN
  • 01 The output is a number, a model, a memo, a redline, or anything mechanically checkable
  • 02 The cost of an undetected error is high (regulatory, financial, legal)
  • 03 The work spans multiple sources that must be reconciled with each other
  • 04 A human downstream will defend the output to an auditor, regulator, or counterparty
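
As one illustration of "mechanically checkable", the figures from the binder example can be tested against the standard balance sheet identity, assets = liabilities + equity. This is a sketch only: the `balances` helper is hypothetical, and checking the supplemental total against the 10-K liabilities and equity assumes all four figures describe the same entity and period.

```python
def balances(total_assets, total_liabilities, total_equity):
    """Return True when assets equal liabilities plus equity (whole dollars)."""
    return total_assets == total_liabilities + total_equity


liabilities = 1_960_228_000   # 10-K p.47
equity = 1_423_019_000        # 10-K p.47

ten_k_ok = balances(3_383_247_000, liabilities, equity)         # 10-K p.47
supplemental_ok = balances(3_418_902_000, liabilities, equity)  # supplemental p.812

print("10-K ties out:", ten_k_ok)
print("supplemental ties out:", supplemental_ok)
```

The 10-K total ties to its own liabilities and equity; the supplemental total does not tie to them, which is exactly the kind of discrepancy a human should be shown rather than left to notice.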

Stop asking the model to find the fact.

For agents that act on what they just read, missing a mid-context fact is not a slightly weaker answer. It is the wrong action. Bring the corpus, the agent, and the action. We will show you what the verification step catches that the long-context read does not.