This is the smallest domain by weight, but its concepts cascade into Domains 1, 2, and 4. Getting context management wrong breaks your multi-agent systems and extraction pipelines.
01 Context Preservation
The Progressive Summarisation Trap
Condensing conversation history compresses critical details into useless vagueness:
| Before Summarisation | After Summarisation |
|---|---|
| "Customer wants a refund of $247.83 for order #8891 placed on March 3rd" | "Customer wants a refund for a recent order" |
Fix: extract transactional facts into a persistent "case facts" block. Include it in every prompt. Never summarise it.
CASE FACTS (do not summarise):
- Customer ID: CUS-4421
- Order: #8891, placed 2026-03-03
- Refund requested: $247.83
- Product: Wireless headphones (SKU: WH-200)
- Reason: Defective left ear speaker
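A minimal sketch of this pattern, assuming a simple prompt-assembly pipeline (the names `CASE_FACTS`, `render_case_facts`, and `build_prompt` are illustrative, not from any framework):

```python
# Transactional facts live in a dict that bypasses the summariser entirely.
CASE_FACTS = {
    "Customer ID": "CUS-4421",
    "Order": "#8891, placed 2026-03-03",
    "Refund requested": "$247.83",
    "Product": "Wireless headphones (SKU: WH-200)",
    "Reason": "Defective left ear speaker",
}

def render_case_facts(facts: dict) -> str:
    # Rendered verbatim into every prompt; never passed to summarisation.
    lines = ["CASE FACTS (do not summarise):"]
    lines += [f"- {key}: {value}" for key, value in facts.items()]
    return "\n".join(lines)

def build_prompt(summary: str, facts: dict, user_message: str) -> str:
    # Only `summary` is ever condensed; the facts block is always exact.
    return (
        f"{render_case_facts(facts)}\n\n"
        f"CONVERSATION SUMMARY:\n{summary}\n\n"
        f"USER:\n{user_message}"
    )
```

The key design choice: facts are stored as structured data and re-rendered on every turn, so no summarisation pass can degrade them.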
The "Lost in the Middle" Effect
Models process the beginning and end of long inputs reliably. Findings buried in the middle may be missed.
Fix: place a summary of key findings at the beginning. Use explicit section headers throughout.
Tool Result Trimming
An order lookup returns 40+ fields. You need 5. Trim verbose results to relevant fields BEFORE appending to context. Prevents token budget exhaustion from accumulated irrelevant data.
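A sketch of the trimming step, assuming tool results arrive as dicts (the field names below are illustrative):

```python
def trim_tool_result(result: dict, keep: set) -> dict:
    # Drop fields the task does not need BEFORE appending to context.
    return {k: v for k, v in result.items() if k in keep}

raw_order = {  # a verbose lookup result; most fields are irrelevant
    "id": "#8891", "status": "delivered", "total": 247.83,
    "warehouse_zone": "B7", "carrier_scan_log": ["..."], "gift_wrap": False,
}
trimmed = trim_tool_result(raw_order, keep={"id", "status", "total"})
```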
Upstream Agent Optimisation
Modify agents to return structured data (key facts, citations, relevance scores) instead of verbose content and reasoning chains. Critical when downstream agents have limited context budgets.
02 Escalation and Ambiguity Resolution
Three Valid Escalation Triggers
| Trigger | Action |
|---|---|
| Customer explicitly requests a human | Honour immediately. Do NOT attempt to resolve first. |
| Policy exceptions or gaps | Escalate (e.g., competitor price matching when policy only covers own-site) |
| Inability to make meaningful progress | Escalate after exhausting available options |
Two Unreliable Triggers (Reject These)
| Trigger | Why It Fails |
|---|---|
| Sentiment-based escalation | Frustration does not correlate with case complexity |
| Self-reported confidence scores | Model is often incorrectly confident on hard cases and uncertain on easy ones |
The Frustration Nuance
- Customer is frustrated but issue is straightforward → acknowledge frustration, offer resolution
- Customer explicitly says "I want a human" → escalate immediately, no investigation first
- Customer reiterates preference for human after you offer help → escalate
Ambiguous Customer Matching
Multiple customers match a search query. Ask for additional identifiers (email, phone, order number). Do NOT select based on heuristics (most recent, most active).
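A sketch of that decision rule (`resolve_customer` and the prompt wording are illustrative):

```python
def resolve_customer(matches: list) -> dict:
    # Exactly one match: safe to proceed.
    if len(matches) == 1:
        return {"action": "proceed", "customer": matches[0]}
    # Zero or multiple matches: never pick by heuristics (most recent,
    # most active). Ask for a disambiguating identifier instead.
    return {
        "action": "ask_user",
        "prompt": ("I found multiple matching accounts. Could you share "
                   "the email, phone number, or an order number on the account?"),
    }
```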
03 Error Propagation
Structured Error Context
When propagating errors, include:
- Failure type (transient, validation, business, permission)
- What was attempted (specific query, parameters used)
- Partial results gathered before failure
- Potential alternative approaches
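The four elements above map naturally onto a small error payload; a sketch using a hypothetical `ToolError` dataclass:

```python
from dataclasses import dataclass, field

@dataclass
class ToolError:
    failure_type: str            # "transient" | "validation" | "business" | "permission"
    attempted: str               # the specific query and parameters used
    partial_results: list = field(default_factory=list)
    alternatives: list = field(default_factory=list)

# Example: a timeout mid-search still carries what was gathered so far.
err = ToolError(
    failure_type="transient",
    attempted="orders.search(customer_id='CUS-4421', limit=50)",
    partial_results=[{"id": "#8891", "status": "delivered"}],
    alternatives=["retry with backoff", "query the read replica"],
)
```

A coordinator receiving `err` can decide to retry, switch approach, or proceed with the partial results instead of discarding them.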
Two Anti-Patterns
| Anti-Pattern | Why It Fails |
|---|---|
| Silent suppression | Returns empty results marked as success. Prevents any recovery. |
| Workflow termination | Kills entire pipeline on single failure. Throws away partial results. |
Access Failure vs Valid Empty Result
This is the same distinction from Domain 2, and it matters even more in multi-agent systems:
- Access failure → consider retry
- Valid empty result → no retry needed, this IS the answer
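The distinction can be sketched as a classifier over the raw lookup outcome (status-code semantics below are an assumption about a typical HTTP-style tool):

```python
def classify_lookup(status_code, rows: list) -> str:
    # Access failure (timeout or server error) -> candidate for retry.
    if status_code is None or status_code >= 500:
        return "access_failure"
    # A successful call with zero rows is the answer, not a failure.
    if not rows:
        return "valid_empty"
    return "ok"
```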
Coverage Annotations
Synthesis output should note which findings are well-supported vs which areas have gaps:
"Section on geothermal energy is limited due to unavailable journal access"
This is better than silently omitting the topic entirely.
04 Codebase Exploration and Context Degradation
The Problem
In extended sessions, the model starts referencing "typical patterns" instead of specific classes it discovered earlier. Context fills with verbose discovery output and loses grip on earlier findings.
Mitigation Strategies
| Strategy | How It Helps |
|---|---|
| Scratchpad files | Write key findings to a file, reference it for subsequent questions |
| Subagent delegation | Spawn subagents for specific investigations, main agent keeps high-level coordination |
| Summary injection | Summarise findings from one phase before spawning subagents for the next |
| /compact | Reduce context usage when filled with verbose discovery output |
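The scratchpad strategy can be as small as a JSON file of findings that survives context compaction (the file name and helper names are illustrative):

```python
import json
from pathlib import Path

SCRATCHPAD = Path("scratchpad.json")  # hypothetical location

def note(topic: str, detail: str) -> None:
    # Durable key findings survive /compact and context churn.
    data = json.loads(SCRATCHPAD.read_text()) if SCRATCHPAD.exists() else {}
    data[topic] = detail
    SCRATCHPAD.write_text(json.dumps(data, indent=2))

def recall(topic: str):
    if not SCRATCHPAD.exists():
        return None
    return json.loads(SCRATCHPAD.read_text()).get(topic)
```

Instead of relying on degraded in-context memory, the agent re-reads the scratchpad when a later question touches an earlier discovery.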
Crash Recovery
Each agent exports structured state to a known file location (manifest). On resume, the coordinator loads the manifest and injects it into agent prompts.
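A minimal sketch of the manifest round-trip, assuming JSON state at a hypothetical known path:

```python
import json
from pathlib import Path

MANIFEST = Path("state") / "research_agent.json"  # hypothetical known location

def export_state(completed: list, pending: list, findings: dict) -> None:
    # Each agent writes structured state so a crash loses nothing durable.
    MANIFEST.parent.mkdir(exist_ok=True)
    MANIFEST.write_text(json.dumps(
        {"completed": completed, "pending": pending, "findings": findings}))

def resume_prompt() -> str:
    # On resume, the coordinator loads the manifest and injects it.
    state = json.loads(MANIFEST.read_text())
    return (f"Resume from saved state. Completed: {state['completed']}. "
            f"Pending: {state['pending']}. Findings so far: {state['findings']}")
```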
05 Human Review and Confidence Calibration
The Aggregate Metrics Trap
97% overall accuracy can hide 40% error rates on a specific document type. Always validate accuracy by document type AND field segment before automating.
Stratified Random Sampling
Sample high-confidence extractions for ongoing verification. Detects novel error patterns that would otherwise slip through.
Field-Level Confidence Calibration
- Model outputs confidence per field
- Calibrate thresholds using labelled validation sets (ground truth data)
- Route low-confidence fields to human review
- Prioritise limited reviewer capacity on highest-uncertainty items
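The routing step above can be sketched as follows; the threshold values are illustrative placeholders, not calibrated numbers:

```python
# Per-field thresholds would come from a labelled validation set;
# these values are illustrative only.
THRESHOLDS = {"invoice_total": 0.98, "vendor_name": 0.90}
DEFAULT_THRESHOLD = 0.95

def route_fields(extractions: dict):
    # extractions: {field_name: (value, confidence)}
    accepted, review_queue = {}, []
    for name, (value, conf) in extractions.items():
        if conf >= THRESHOLDS.get(name, DEFAULT_THRESHOLD):
            accepted[name] = value
        else:
            review_queue.append((name, value, conf))
    # Spend limited reviewer capacity on the most uncertain items first.
    review_queue.sort(key=lambda item: item[2])
    return accepted, review_queue
```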
06 Information Provenance
Structured Claim-Source Mappings
Each finding must include:
- Claim
- Source URL
- Document name
- Relevant excerpt
- Publication date
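These five elements fit a small immutable record; a sketch with a hypothetical `SourcedClaim` type and a merge step that never drops attribution:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SourcedClaim:
    claim: str
    source_url: str
    document: str
    excerpt: str
    published: str  # publication date, ISO format

def merge_findings(a: list, b: list) -> list:
    # Synthesis preserves every claim-source mapping; frozen dataclasses
    # are hashable, so set union de-duplicates without losing sources.
    return sorted(set(a) | set(b), key=lambda c: c.claim)
```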
Downstream agents preserve and merge these mappings through synthesis. Without this, attribution dies during summarisation.
Conflict Handling
Two credible sources report different statistics. Do NOT arbitrarily select one. Annotate with both values and source attribution. Let the consumer decide.
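A sketch of conflict-preserving reconciliation (the `reconcile` helper and report shape are assumptions, not a standard API):

```python
def reconcile(metric: str, reports: list) -> dict:
    # reports: [{"value": ..., "source": ..., "date": ...}, ...]
    values = {r["value"] for r in reports}
    if len(values) == 1:
        return {"metric": metric, "value": values.pop(),
                "sources": [r["source"] for r in reports]}
    # Credible sources disagree: keep every value with attribution
    # and let the consumer decide.
    return {"metric": metric, "conflict": True, "reports": reports}
```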
Temporal Awareness
Require publication/data collection dates in structured outputs. Different dates explain different numbers — they are not contradictions.
Content-Appropriate Rendering
| Content Type | Format |
|---|---|
| Financial data | Tables |
| News | Prose |
| Technical findings | Structured lists |
Do not flatten everything into one uniform format.
07 What to Build
Build a coordinator with two subagents:
- Implement a persistent case facts block
- Simulate a timeout with structured error propagation
- Verify the coordinator receives structured error context and proceeds with partial results
- Test with conflicting sources and verify the synthesis preserves attribution