This is the smallest domain by weight, but its concepts cascade into Domains 1, 2, and 4. Getting context management wrong breaks your multi-agent systems and extraction pipelines.
01 Context Preservation
The Progressive Summarisation Trap
Condensing conversation history compresses critical details into useless vagueness:
| Before Summarisation | After Summarisation |
|---|---|
| "Customer wants a refund of $247.83 for order #8891 placed on March 3rd" | "Customer wants a refund for a recent order" |
Fix: extract transactional facts into a persistent "case facts" block. Include it in every prompt. Never summarise it.
CASE FACTS (do not summarise):
- Customer ID: CUS-4421
- Order: #8891, placed 2026-03-03
- Refund requested: $247.83
- Product: Wireless headphones (SKU: WH-200)
- Reason: Defective left ear speaker
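A minimal sketch of this pattern, assuming a simple prompt-assembly pipeline (the names `CASE_FACTS`, `render_case_facts`, and `build_prompt` are illustrative, not from any framework):

```python
# Transactional facts live in a dict that bypasses the summariser entirely.
CASE_FACTS = {
    "Customer ID": "CUS-4421",
    "Order": "#8891, placed 2026-03-03",
    "Refund requested": "$247.83",
    "Product": "Wireless headphones (SKU: WH-200)",
    "Reason": "Defective left ear speaker",
}

def render_case_facts(facts: dict) -> str:
    # Rendered verbatim into every prompt; never passed to summarisation.
    lines = ["CASE FACTS (do not summarise):"]
    lines += [f"- {key}: {value}" for key, value in facts.items()]
    return "\n".join(lines)

def build_prompt(summary: str, facts: dict, user_message: str) -> str:
    # Only `summary` is ever condensed; the facts block is always exact.
    return (
        f"{render_case_facts(facts)}\n\n"
        f"CONVERSATION SUMMARY:\n{summary}\n\n"
        f"USER:\n{user_message}"
    )
```

The key design choice: facts are stored as structured data and re-rendered on every turn, so no summarisation pass can degrade them.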
The "Lost in the Middle" Effect
Models process the beginning and end of long inputs reliably. Findings buried in the middle may be missed.
Fix: place a summary of key findings at the beginning. Use explicit section headers throughout.
Tool Result Trimming
An order lookup returns 40+ fields. You need 5. Trim verbose results to relevant fields BEFORE appending to context. Prevents token budget exhaustion from accumulated irrelevant data.
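A sketch of the trimming step, assuming tool results arrive as dicts (the field names below are illustrative):

```python
def trim_tool_result(result: dict, keep: set) -> dict:
    # Drop fields the task does not need BEFORE appending to context.
    return {k: v for k, v in result.items() if k in keep}

raw_order = {  # a verbose lookup result; most fields are irrelevant
    "id": "#8891", "status": "delivered", "total": 247.83,
    "warehouse_zone": "B7", "carrier_scan_log": ["..."], "gift_wrap": False,
}
trimmed = trim_tool_result(raw_order, keep={"id", "status", "total"})
```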
Upstream Agent Optimisation
Modify agents to return structured data (key facts, citations, relevance scores) instead of verbose content and reasoning chains. Critical when downstream agents have limited context budgets.
02 Escalation and Ambiguity Resolution
Three Valid Escalation Triggers
| Trigger | Action |
|---|---|
| Customer explicitly requests a human | Honour immediately. Do NOT attempt to resolve first. |
| Policy exceptions or gaps | Escalate (e.g., competitor price matching when policy only covers own-site) |
| Inability to make meaningful progress | Escalate after exhausting available options |
Two Unreliable Triggers (Reject These)
| Trigger | Why It Fails |
|---|---|
| Sentiment-based escalation | Frustration does not correlate with case complexity |
| Self-reported confidence scores | Model is often incorrectly confident on hard cases and uncertain on easy ones |
The Frustration Nuance
- Customer is frustrated but issue is straightforward → acknowledge frustration, offer resolution
- Customer explicitly says "I want a human" → escalate immediately, no investigation first
- Customer reiterates preference for human after you offer help → escalate
Ambiguous Customer Matching
Multiple customers match a search query. Ask for additional identifiers (email, phone, order number). Do NOT select based on heuristics (most recent, most active).
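A sketch of that decision rule (`resolve_customer` and the prompt wording are illustrative):

```python
def resolve_customer(matches: list) -> dict:
    # Exactly one match: safe to proceed.
    if len(matches) == 1:
        return {"action": "proceed", "customer": matches[0]}
    # Zero or multiple matches: never pick by heuristics (most recent,
    # most active). Ask for a disambiguating identifier instead.
    return {
        "action": "ask_user",
        "prompt": ("I found multiple matching accounts. Could you share "
                   "the email, phone number, or an order number on the account?"),
    }
```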
03 Error Propagation
Structured Error Context
When propagating errors, include:
- Failure type (transient, validation, business, permission)
- What was attempted (specific query, parameters used)
- Partial results gathered before failure
- Potential alternative approaches
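The four elements above map naturally onto a small error payload; a sketch using a hypothetical `ToolError` dataclass:

```python
from dataclasses import dataclass, field

@dataclass
class ToolError:
    failure_type: str            # "transient" | "validation" | "business" | "permission"
    attempted: str               # the specific query and parameters used
    partial_results: list = field(default_factory=list)
    alternatives: list = field(default_factory=list)

# Example: a timeout mid-search still carries what was gathered so far.
err = ToolError(
    failure_type="transient",
    attempted="orders.search(customer_id='CUS-4421', limit=50)",
    partial_results=[{"id": "#8891", "status": "delivered"}],
    alternatives=["retry with backoff", "query the read replica"],
)
```

A coordinator receiving `err` can decide to retry, switch approach, or proceed with the partial results instead of discarding them.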
Two Anti-Patterns
| Anti-Pattern | Why It Fails |
|---|---|
| Silent suppression | Returns empty results marked as success. Prevents any recovery. |
| Workflow termination | Kills entire pipeline on single failure. Throws away partial results. |
Access Failure vs Valid Empty Result
This is the same distinction from Domain 2, and it matters even more in multi-agent systems:
- Access failure → consider retry
- Valid empty result → no retry needed, this IS the answer
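The distinction can be sketched as a classifier over the raw lookup outcome (status-code semantics below are an assumption about a typical HTTP-style tool):

```python
def classify_lookup(status_code, rows: list) -> str:
    # Access failure (timeout or server error) -> candidate for retry.
    if status_code is None or status_code >= 500:
        return "access_failure"
    # A successful call with zero rows is the answer, not a failure.
    if not rows:
        return "valid_empty"
    return "ok"
```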
Coverage Annotations
Synthesis output should note which findings are well-supported vs which areas have gaps:
"Section on geothermal energy is limited due to unavailable journal access"
This is better than silently omitting the topic entirely.
04 Codebase Exploration and Context Degradation
The Problem
In extended sessions, the model starts referencing "typical patterns" instead of specific classes it discovered earlier. Context fills with verbose discovery output and loses grip on earlier findings.
Mitigation Strategies
| Strategy | How It Helps |
|---|---|
| Scratchpad files | Write key findings to a file, reference it for subsequent questions |
| Subagent delegation | Spawn subagents for specific investigations, main agent keeps high-level coordination |
| Summary injection | Summarise findings from one phase before spawning subagents for the next |
| /compact | Reduce context usage when filled with verbose discovery output |
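The scratchpad strategy can be as small as a JSON file of findings that survives context compaction (the file name and helper names are illustrative):

```python
import json
from pathlib import Path

SCRATCHPAD = Path("scratchpad.json")  # hypothetical location

def note(topic: str, detail: str) -> None:
    # Durable key findings survive /compact and context churn.
    data = json.loads(SCRATCHPAD.read_text()) if SCRATCHPAD.exists() else {}
    data[topic] = detail
    SCRATCHPAD.write_text(json.dumps(data, indent=2))

def recall(topic: str):
    if not SCRATCHPAD.exists():
        return None
    return json.loads(SCRATCHPAD.read_text()).get(topic)
```

Instead of relying on degraded in-context memory, the agent re-reads the scratchpad when a later question touches an earlier discovery.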
Crash Recovery
Each agent exports structured state to a known file location (manifest). On resume, the coordinator loads the manifest and injects it into agent prompts.
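A minimal sketch of the manifest round-trip, assuming JSON state at a hypothetical known path:

```python
import json
from pathlib import Path

MANIFEST = Path("state") / "research_agent.json"  # hypothetical known location

def export_state(completed: list, pending: list, findings: dict) -> None:
    # Each agent writes structured state so a crash loses nothing durable.
    MANIFEST.parent.mkdir(exist_ok=True)
    MANIFEST.write_text(json.dumps(
        {"completed": completed, "pending": pending, "findings": findings}))

def resume_prompt() -> str:
    # On resume, the coordinator loads the manifest and injects it.
    state = json.loads(MANIFEST.read_text())
    return (f"Resume from saved state. Completed: {state['completed']}. "
            f"Pending: {state['pending']}. Findings so far: {state['findings']}")
```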
05 Human Review and Confidence Calibration
The Aggregate Metrics Trap
97% overall accuracy can hide 40% error rates on a specific document type. Always validate accuracy by document type AND field segment before automating.
Stratified Random Sampling
Sample high-confidence extractions for ongoing verification. Detects novel error patterns that would otherwise slip through.
Field-Level Confidence Calibration
- Model outputs confidence per field
- Calibrate thresholds using labelled validation sets (ground truth data)
- Route low-confidence fields to human review
- Prioritise limited reviewer capacity on highest-uncertainty items
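The routing step above can be sketched as follows; the threshold values are illustrative placeholders, not calibrated numbers:

```python
# Per-field thresholds would come from a labelled validation set;
# these values are illustrative only.
THRESHOLDS = {"invoice_total": 0.98, "vendor_name": 0.90}
DEFAULT_THRESHOLD = 0.95

def route_fields(extractions: dict):
    # extractions: {field_name: (value, confidence)}
    accepted, review_queue = {}, []
    for name, (value, conf) in extractions.items():
        if conf >= THRESHOLDS.get(name, DEFAULT_THRESHOLD):
            accepted[name] = value
        else:
            review_queue.append((name, value, conf))
    # Spend limited reviewer capacity on the most uncertain items first.
    review_queue.sort(key=lambda item: item[2])
    return accepted, review_queue
```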
06 Information Provenance
Structured Claim-Source Mappings
Each finding must include:
- Claim
- Source URL
- Document name
- Relevant excerpt
- Publication date
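These five elements fit a small immutable record; a sketch with a hypothetical `SourcedClaim` type and a merge step that never drops attribution:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SourcedClaim:
    claim: str
    source_url: str
    document: str
    excerpt: str
    published: str  # publication date, ISO format

def merge_findings(a: list, b: list) -> list:
    # Synthesis preserves every claim-source mapping; frozen dataclasses
    # are hashable, so set union de-duplicates without losing sources.
    return sorted(set(a) | set(b), key=lambda c: c.claim)
```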
Downstream agents preserve and merge these mappings through synthesis. Without this, attribution dies during summarisation.
Conflict Handling
Two credible sources report different statistics. Do NOT arbitrarily select one. Annotate with both values and source attribution. Let the consumer decide.
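A sketch of conflict-preserving reconciliation (the `reconcile` helper and report shape are assumptions, not a standard API):

```python
def reconcile(metric: str, reports: list) -> dict:
    # reports: [{"value": ..., "source": ..., "date": ...}, ...]
    values = {r["value"] for r in reports}
    if len(values) == 1:
        return {"metric": metric, "value": values.pop(),
                "sources": [r["source"] for r in reports]}
    # Credible sources disagree: keep every value with attribution
    # and let the consumer decide.
    return {"metric": metric, "conflict": True, "reports": reports}
```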
Temporal Awareness
Require publication/data collection dates in structured outputs. Different dates explain different numbers — they are not contradictions.
Content-Appropriate Rendering
| Content Type | Format |
|---|---|
| Financial data | Tables |
| News | Prose |
| Technical findings | Structured lists |
Do not flatten everything into one uniform format.
07 What to Build
Build a coordinator with two subagents:
- Implement a persistent case facts block
- Simulate a timeout with structured error propagation
- Verify the coordinator receives structured error context and proceeds with partial results
- Test with conflicting sources and verify the synthesis preserves attribution