How to Become a Claude Architect
Chapter 6 / 75 min read

Part 6: Context Management & Reliability

This is the smallest domain by weight. But concepts here cascade into Domains 1, 2, and 4. Getting context management wrong breaks your multi-agent systems and extraction pipelines.

01 Context Preservation

The Progressive Summarisation Trap

Condensing conversation history compresses critical details into useless vagueness:

Before summarisation: "Customer wants a refund of $247.83 for order #8891 placed on March 3rd"
After summarisation: "Customer wants a refund for a recent order"

Fix: extract transactional facts into a persistent "case facts" block. Include it in every prompt. Never summarise it.

CASE FACTS (do not summarise):
- Customer ID: CUS-4421
- Order: #8891, placed 2026-03-03
- Refund requested: $247.83
- Product: Wireless headphones (SKU: WH-200)
- Reason: Defective left ear speaker
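One way to wire this in (a minimal sketch; the prompt-assembly function and its names are hypothetical): the facts live in their own structure, get rendered verbatim into every prompt, and never pass through the summariser.

```python
# Hypothetical sketch: keep transactional facts out of the summarisation
# path by storing them separately and prepending them to every prompt.

CASE_FACTS = {
    "Customer ID": "CUS-4421",
    "Order": "#8891, placed 2026-03-03",
    "Refund requested": "$247.83",
    "Product": "Wireless headphones (SKU: WH-200)",
    "Reason": "Defective left ear speaker",
}

def render_case_facts(facts: dict) -> str:
    lines = ["CASE FACTS (do not summarise):"]
    lines += [f"- {key}: {value}" for key, value in facts.items()]
    return "\n".join(lines)

def build_prompt(summary: str, user_message: str) -> str:
    # Only `summary` is ever condensed; the facts block is injected verbatim.
    return f"{render_case_facts(CASE_FACTS)}\n\n{summary}\n\nUser: {user_message}"

prompt = build_prompt("Earlier turns: customer reported a defect.", "Any update?")
```

The facts survive no matter how aggressively the running summary is compressed, because they are re-injected from source data on every turn.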

The "Lost in the Middle" Effect

Models process the beginning and end of long inputs reliably. Findings buried in the middle may be missed.

Fix: place key findings summaries at the beginning. Use explicit section headers throughout.

Tool Result Trimming

An order lookup returns 40+ fields; you need 5. Trim verbose results to the relevant fields BEFORE appending them to context. This prevents token-budget exhaustion from accumulated irrelevant data.
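A sketch of the trimming step (field names are hypothetical): an allow-list of relevant fields is applied to the raw tool result before it ever reaches the context.

```python
# Hypothetical sketch: keep only the fields the conversation actually
# needs before appending a tool result to context.

RELEVANT_FIELDS = {"order_id", "status", "total", "placed_at", "customer_id"}

def trim_tool_result(raw: dict, keep: set = RELEVANT_FIELDS) -> dict:
    return {k: v for k, v in raw.items() if k in keep}

raw_order = {
    "order_id": "#8891", "status": "delivered", "total": 247.83,
    "placed_at": "2026-03-03", "customer_id": "CUS-4421",
    # ...plus dozens more fields the conversation never uses, e.g.:
    "warehouse_code": "W-17", "carrier_scan_events": ["..."],
}

trimmed = trim_tool_result(raw_order)  # only the relevant fields survive
```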

Upstream Agent Optimisation

Modify agents to return structured data (key facts, citations, relevance scores) instead of verbose content and reasoning chains. Critical when downstream agents have limited context budgets.
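The structured return shape could look something like this (a sketch; the field names and scoring scale are assumptions, not a prescribed schema):

```python
# Hypothetical sketch: an upstream research agent returns structured
# findings instead of its full reasoning chain.
from dataclasses import dataclass, field

@dataclass
class AgentFinding:
    key_fact: str
    citation: str      # source URL or document name
    relevance: float   # 0.0-1.0, as scored by the upstream agent

@dataclass
class AgentReport:
    findings: list = field(default_factory=list)

    def top(self, n: int) -> list:
        # Lets a budget-limited downstream agent take only what fits.
        return sorted(self.findings, key=lambda f: f.relevance, reverse=True)[:n]

report = AgentReport([
    AgentFinding("Refunds issued within 5 days on average", "support-sla.pdf", 0.9),
    AgentFinding("WH-200 recall notice pending", "recalls.html", 0.7),
])
```

A downstream agent with a tight context budget calls `report.top(n)` rather than ingesting the upstream agent's entire transcript.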

02 Escalation and Ambiguity Resolution

Three Valid Escalation Triggers

  • Customer explicitly requests a human → honour immediately; do NOT attempt to resolve first
  • Policy exceptions or gaps → escalate (e.g., competitor price matching when policy only covers own-site)
  • Inability to make meaningful progress → escalate after exhausting available options

Two Unreliable Triggers (Reject These)

  • Sentiment-based escalation → fails because frustration does not correlate with case complexity
  • Self-reported confidence scores → fail because the model is often incorrectly confident on hard cases and uncertain on easy ones

The Frustration Nuance

  • Customer is frustrated but issue is straightforward → acknowledge frustration, offer resolution
  • Customer explicitly says "I want a human" → escalate immediately, no investigation first
  • Customer reiterates preference for human after you offer help → escalate

Ambiguous Customer Matching

Multiple customers match a search query. Ask for additional identifiers (email, phone, order number). Do NOT select based on heuristics (most recent, most active).
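A sketch of the rule (function and field names are hypothetical): anything other than exactly one match produces a clarification request, never a heuristic pick.

```python
# Hypothetical sketch: never auto-select among multiple customer matches;
# return a clarification request instead of guessing by recency/activity.

def resolve_customer(matches: list) -> dict:
    if len(matches) == 1:
        return {"resolved": matches[0], "clarify": None}
    return {
        "resolved": None,
        "clarify": "Please provide an email, phone number, or order number.",
    }

result = resolve_customer([{"id": "CUS-4421"}, {"id": "CUS-9002"}])
```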

03 Error Propagation

Structured Error Context

When propagating errors, include:

  • Failure type (transient, validation, business, permission)
  • What was attempted (specific query, parameters used)
  • Partial results gathered before failure
  • Potential alternative approaches
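The four fields above map naturally onto a small structured payload. A sketch (names are illustrative, not a fixed schema):

```python
# Hypothetical sketch of a structured error payload carrying failure
# type, what was attempted, partial results, and alternatives, so a
# coordinator can decide how to proceed instead of failing blind.
from dataclasses import dataclass, field

@dataclass
class ErrorContext:
    failure_type: str                 # "transient" | "validation" | "business" | "permission"
    attempted: str                    # the specific query / parameters used
    partial_results: list = field(default_factory=list)
    alternatives: list = field(default_factory=list)

err = ErrorContext(
    failure_type="transient",
    attempted="orders.lookup(order_id='#8891')",
    partial_results=[{"customer_id": "CUS-4421"}],
    alternatives=["retry with backoff", "query a cached snapshot"],
)
```

A coordinator receiving `err` can retry on `"transient"`, surface `"permission"` failures to a human, and in either case keep working with `partial_results`.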

Two Anti-Patterns

  • Silent suppression → returns empty results marked as success, which prevents any recovery
  • Workflow termination → kills the entire pipeline on a single failure and throws away partial results

Access Failure vs Valid Empty Result

This is the same distinction from Domain 2, and it matters even more in multi-agent systems:

  • Access failure → consider retry
  • Valid empty result → no retry needed, this IS the answer
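Encoding the distinction explicitly keeps downstream code from conflating the two cases. A sketch (the outcome shape is an assumption):

```python
# Hypothetical sketch: "could not look" and "looked and found nothing"
# get different decisions, never the same code path.

def handle_search(outcome: dict) -> str:
    if not outcome["accessible"]:
        return "retry"           # access failure: the search never actually ran
    if outcome["results"] == []:
        return "accept_empty"    # valid empty result: this IS the answer
    return "use_results"

decision_a = handle_search({"accessible": False, "results": None})  # → "retry"
decision_b = handle_search({"accessible": True, "results": []})     # → "accept_empty"
```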

Coverage Annotations

Synthesis output should note which findings are well-supported vs which areas have gaps:

"Section on geothermal energy is limited due to unavailable journal access"

This is better than silently omitting the topic entirely.

04 Codebase Exploration and Context Degradation

The Problem

In extended sessions, the model starts referencing "typical patterns" instead of specific classes it discovered earlier. Context fills with verbose discovery output and loses grip on earlier findings.

Mitigation Strategies

  • Scratchpad files → write key findings to a file, reference it for subsequent questions
  • Subagent delegation → spawn subagents for specific investigations while the main agent keeps high-level coordination
  • Summary injection → summarise findings from one phase before spawning subagents for the next
  • /compact → reduce context usage when it is filled with verbose discovery output

Crash Recovery

Each agent exports structured state to a known file location (manifest). On resume, the coordinator loads the manifest and injects it into agent prompts.
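A minimal sketch of the manifest round-trip (file-naming convention and state shape are assumptions):

```python
# Hypothetical sketch: each agent writes structured state to a known
# manifest location; on resume the coordinator reloads all manifests
# and injects them into agent prompts.
import json
import os
import tempfile

def export_state(agent_id: str, state: dict, directory: str) -> str:
    path = os.path.join(directory, f"{agent_id}.manifest.json")
    with open(path, "w") as fh:
        json.dump(state, fh)
    return path

def load_manifests(directory: str) -> dict:
    manifests = {}
    for name in os.listdir(directory):
        if name.endswith(".manifest.json"):
            with open(os.path.join(directory, name)) as fh:
                manifests[name.removesuffix(".manifest.json")] = json.load(fh)
    return manifests

workdir = tempfile.mkdtemp()
export_state("researcher", {"phase": 2, "findings": ["WH-200 defect confirmed"]}, workdir)
recovered = load_manifests(workdir)  # coordinator injects this on resume
```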

05 Human Review and Confidence Calibration

The Aggregate Metrics Trap

97% overall accuracy can hide 40% error rates on a specific document type. Always validate accuracy by document type AND field segment before automating.
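The trap is easy to demonstrate: a strong majority segment dominates the aggregate. A sketch of the per-segment breakdown (the record shape is an assumption):

```python
# Hypothetical sketch: break a single accuracy number down by document
# type so a weak segment cannot hide behind the aggregate.
from collections import defaultdict

def accuracy_by_segment(records: list) -> dict:
    correct, total = defaultdict(int), defaultdict(int)
    for r in records:
        total[r["doc_type"]] += 1
        correct[r["doc_type"]] += r["correct"]
    return {t: correct[t] / total[t] for t in total}

records = (
    [{"doc_type": "invoice", "correct": True}] * 97
    + [{"doc_type": "invoice", "correct": False}] * 3
    + [{"doc_type": "handwritten", "correct": True}] * 6
    + [{"doc_type": "handwritten", "correct": False}] * 4
)
by_segment = accuracy_by_segment(records)
# invoices look fine; handwritten documents fail 40% of the time,
# a fact the ~96% aggregate would have hidden.
```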

Stratified Random Sampling

Sample high-confidence extractions for ongoing verification. Detects novel error patterns that would otherwise slip through.

Field-Level Confidence Calibration

  1. Model outputs confidence per field
  2. Calibrate thresholds using labelled validation sets (ground truth data)
  3. Route low-confidence fields to human review
  4. Prioritise limited reviewer capacity on highest-uncertainty items
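The four steps above can be sketched as a small routing function (thresholds and field names are hypothetical; in practice the thresholds come from the labelled validation set):

```python
# Hypothetical sketch: per-field confidences, per-field thresholds
# calibrated on a validation set, low-confidence fields routed to
# human review, most uncertain first.

THRESHOLDS = {"total": 0.95, "date": 0.90, "vendor": 0.85}  # from validation data

def route_for_review(extraction: dict) -> list:
    flagged = [
        name for name, (value, conf) in extraction.items()
        if conf < THRESHOLDS.get(name, 0.9)
    ]
    # Most uncertain fields first: spend limited reviewer capacity
    # where the model is least reliable.
    return sorted(flagged, key=lambda name: extraction[name][1])

extraction = {
    "total": ("$247.83", 0.99),
    "date": ("2026-03-03", 0.70),
    "vendor": ("Acme", 0.80),
}
review_queue = route_for_review(extraction)  # → ["date", "vendor"]
```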

06 Information Provenance

Structured Claim-Source Mappings

Each finding must include:

  • Claim
  • Source URL
  • Document name
  • Relevant excerpt
  • Publication date

Downstream agents preserve and merge these mappings through synthesis. Without this, attribution dies during summarisation.
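A sketch of the merge step (the record and deduplication key are assumptions): each finding travels as a structured record, and synthesis merges records instead of discarding their sources.

```python
# Hypothetical sketch: findings carry their sources as structured
# records; synthesis merges the records so attribution survives.
from dataclasses import dataclass

@dataclass(frozen=True)
class SourcedClaim:
    claim: str
    source_url: str
    document: str
    excerpt: str
    published: str  # ISO date

def merge_findings(*agent_outputs) -> list:
    merged, seen = [], set()
    for output in agent_outputs:
        for c in output:
            key = (c.claim, c.source_url)
            if key not in seen:   # dedupe, but never drop attribution
                seen.add(key)
                merged.append(c)
    return merged

a = [SourcedClaim("Capacity grew 12% in 2025", "https://example.org/r1",
                  "r1.pdf", "capacity grew 12%", "2026-01-10")]
b = [SourcedClaim("Capacity grew 12% in 2025", "https://example.org/r1",
                  "r1.pdf", "capacity grew 12%", "2026-01-10")]
combined = merge_findings(a, b)  # duplicates collapse; attribution survives
```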

Conflict Handling

Two credible sources report different statistics. Do NOT arbitrarily select one. Annotate with both values and source attribution. Let the consumer decide.
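One way to sketch the annotation (the structure is illustrative): a conflicting metric is surfaced with every value and its source rather than silently resolved.

```python
# Hypothetical sketch: conflicting statistics are annotated with both
# values and their sources; the consumer decides, not the synthesiser.

def annotate_conflict(metric: str, reports: list) -> dict:
    values = {(r["value"], r["source"]) for r in reports}
    if len(values) > 1:
        return {"metric": metric, "conflict": True, "values": sorted(values)}
    value, source = next(iter(values))
    return {"metric": metric, "conflict": False, "value": value, "source": source}

annotated = annotate_conflict("2025 market size", [
    {"value": "$4.1B", "source": "analyst-a.pdf"},
    {"value": "$3.6B", "source": "analyst-b.pdf"},
])
```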

Temporal Awareness

Require publication/data collection dates in structured outputs. Different dates explain different numbers — they are not contradictions.

Content-Appropriate Rendering

  • Financial data → tables
  • News → prose
  • Technical findings → structured lists

Do not flatten everything into one uniform format.

07 What to Build

Build a coordinator with two subagents:

  • Implement a persistent case facts block
  • Simulate a timeout with structured error propagation
  • Verify the coordinator receives structured error context and proceeds with partial results
  • Test with conflicting sources and verify the synthesis preserves attribution