How to Become a Claude Architect
Chapter 5 / 76 min read

Part 5: Prompt Engineering & Structured Output

Two words will save you across this entire domain: be explicit. "Be conservative" does not improve precision. "Only report high-confidence findings" does not reduce false positives. What works is defining exactly which issues to report versus skip, with concrete code examples for each severity level.

01 Explicit Criteria

The Core Principle

Specific categorical criteria consistently outperform vague confidence-based instructions.

Approach → Result
"Be conservative" → Inconsistent filtering
"Only report high-confidence findings" → Arbitrary threshold, applied differently each time
"Flag comments only when claimed behaviour contradicts actual code behaviour. Report bugs and security vulnerabilities. Skip minor style preferences." → Consistent, precise output

The False Positive Trust Problem

High false positive rates in one category destroy trust in all categories. Developers stop reading your findings entirely.

Fix: temporarily disable high false-positive categories while improving prompts for those categories. This restores trust while you iterate on the problematic areas.
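A minimal sketch of this pattern, assuming each finding is a dict with a category key (the category names here are illustrative):

```python
# Categories currently producing too many false positives. Findings in
# these categories are suppressed while their prompts are improved;
# everything else passes through untouched.
DISABLED_CATEGORIES = {"comment_accuracy"}  # illustrative name

def filter_findings(findings):
    """Drop findings from temporarily disabled categories."""
    return [f for f in findings if f["category"] not in DISABLED_CATEGORIES]

findings = [
    {"category": "security", "message": "SQL injection risk"},
    {"category": "comment_accuracy", "message": "Comment may be stale"},
]
kept = filter_findings(findings)
```

The disabled set lives in code, not in the prompt, so re-enabling a category after its prompt improves is a one-line change.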

Severity Calibration

Define explicit severity criteria with concrete code examples for each level. Not prose descriptions of severity. Actual code showing what "critical" vs "minor" looks like.

Critical: SQL injection via unsanitised user input
  connection.execute(f"SELECT * FROM users WHERE id = {user_input}")

Minor: Unused import statement
  import os  # never referenced
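One way to encode this is to keep the criteria as data and render them into the prompt, so every level always ships with its code example. A sketch, with illustrative severity names and definitions:

```python
# Each severity level pairs a definition with a concrete code example,
# so the model sees what "critical" vs "minor" actually looks like.
SEVERITY_CRITERIA = {
    "critical": {
        "definition": "Exploitable security flaw or data-corrupting bug",
        "example": 'connection.execute(f"SELECT * FROM users WHERE id = {user_input}")',
    },
    "minor": {
        "definition": "Dead code or unused symbol with no runtime effect",
        "example": "import os  # never referenced",
    },
}

def severity_prompt_section():
    """Render the criteria as a prompt section, one example per level."""
    lines = ["Severity criteria:"]
    for level, spec in SEVERITY_CRITERIA.items():
        lines.append(f"- {level}: {spec['definition']}")
        lines.append(f"  Example: {spec['example']}")
    return "\n".join(lines)

section = severity_prompt_section()
```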

02 Few-Shot Prompting

Few-shot examples are the most effective technique for consistency. Not more instructions. Not confidence thresholds. Examples.

When to Deploy

  • Detailed instructions alone produce inconsistent formatting
  • Model makes inconsistent judgment calls on ambiguous cases
  • Extraction tasks produce empty/null fields for information that exists in the document

How to Construct

Use 2-4 targeted examples for ambiguous scenarios. Each example must show reasoning for why one action was chosen over plausible alternatives. This teaches generalisation to novel patterns, not just pattern-matching pre-specified cases.

Example 1:
Input: "The function returns null on error"
Classification: Bug (not style)
Reasoning: Null returns without error context violate the project's
error-handling contract. This is a correctness issue, not a style preference.

Example 2:
Input: "Variable named 'x' in a 3-line lambda"
Classification: Skip (style preference)
Reasoning: Short variable names in short lambdas are idiomatic.
Flagging this would be noise.
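The examples above can be kept as structured data and assembled into a prompt block, which keeps the input/classification/reasoning shape uniform. A sketch:

```python
# Few-shot examples with explicit reasoning. Showing *why* each call
# was made teaches generalisation, not just pattern-matching.
FEW_SHOT_EXAMPLES = [
    {
        "input": "The function returns null on error",
        "classification": "Bug (not style)",
        "reasoning": "Null returns without error context violate the "
                     "project's error-handling contract.",
    },
    {
        "input": "Variable named 'x' in a 3-line lambda",
        "classification": "Skip (style preference)",
        "reasoning": "Short variable names in short lambdas are idiomatic; "
                     "flagging this would be noise.",
    },
]

def build_few_shot_block(examples):
    """Render examples in a uniform input/classification/reasoning shape."""
    parts = []
    for i, ex in enumerate(examples, 1):
        parts.append(
            f"Example {i}:\n"
            f"Input: {ex['input']}\n"
            f"Classification: {ex['classification']}\n"
            f"Reasoning: {ex['reasoning']}"
        )
    return "\n\n".join(parts)

block = build_few_shot_block(FEW_SHOT_EXAMPLES)
```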

Hallucination Reduction

Few-shot examples showing correct handling of varied document structures (inline citations vs bibliographies, narrative vs structured tables) dramatically improve extraction quality.

03 Structured Output with tool_use

The Reliability Hierarchy

Method → Syntax errors → Semantic errors
tool_use with JSON schemas → Eliminated → Still possible
Prompt-based JSON → Possible → Still possible

tool_use eliminates syntax errors entirely. But it does NOT prevent:

  • Semantic errors: line items that do not sum to stated total
  • Field placement errors: values in wrong fields
  • Fabrication: model invents values for required fields when source lacks information

tool_choice Configuration

Setting → Behaviour → Use case
"auto" → Model may return text instead of a tool call → Default operation
"any" → Must call a tool; model chooses which → Guaranteed structured output with unknown document types
{"type": "tool", "name": "..."} → Must call the named tool → Forcing a mandatory first step
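As a sketch, here are the three settings as they would appear in a Messages API request body. The tool name record_extraction and the model id are illustrative, and the request is only assembled here, not sent:

```python
# The three tool_choice settings as request-body fragments.
choice_auto = {"type": "auto"}    # model may answer in plain text
choice_any = {"type": "any"}      # must call some tool, model picks which
choice_forced = {"type": "tool", "name": "record_extraction"}  # must call this one

def make_request(tool_choice):
    """Assemble a request body sketch around a tool_choice setting."""
    return {
        "model": "claude-sonnet-4-20250514",  # illustrative model id
        "max_tokens": 1024,
        "tool_choice": tool_choice,
        "tools": [{
            "name": "record_extraction",
            "description": "Record fields extracted from a document",
            "input_schema": {"type": "object", "properties": {}},
        }],
        "messages": [{"role": "user", "content": "Extract the fields."}],
    }

request = make_request(choice_forced)
```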

Schema Design to Prevent Fabrication

Technique → Purpose
Optional/nullable fields → Use when the source may not contain the information; prevents fabrication
"unclear" enum value → For ambiguous cases
"other" + freeform detail string → For extensible categorisation
Format normalisation rules → Stated in prompts alongside strict schemas
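These techniques combine into one input_schema. A sketch with illustrative field names, assuming an invoice-extraction tool:

```python
# An input_schema applying the anti-fabrication techniques: nullable
# fields for possibly-absent data, an "unclear" enum value, and
# "other" paired with a freeform detail string.
invoice_schema = {
    "type": "object",
    "properties": {
        "vendor_name": {"type": "string"},
        # Nullable: the source may simply not state a due date.
        "due_date": {
            "type": ["string", "null"],
            "description": "ISO 8601 date, or null if not stated",
        },
        "currency": {
            "type": "string",
            "enum": ["USD", "EUR", "GBP", "other", "unclear"],
        },
        # Freeform detail string backing the "other" enum value.
        "currency_detail": {"type": ["string", "null"]},
    },
    "required": ["vendor_name", "due_date", "currency"],
}
```

Note that due_date stays in required: the model must always address the field, but null is a legal answer, which is what removes the pressure to fabricate.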

04 Validation-Retry Loops

Retry with Error Feedback

Send back: original document + failed extraction + specific validation error. The model uses the error to self-correct.
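A sketch of this loop with the model call stubbed out; the validation rule and function names are illustrative:

```python
def validate(extraction):
    """Return an error string, or None if the extraction is valid."""
    if extraction.get("total") is None:
        return "Field 'total' is missing"
    return None

def extract_with_retry(document, call_model, max_retries=2):
    """On failure, resend document + failed output + specific error."""
    extraction = call_model(document, feedback=None)
    for _ in range(max_retries):
        error = validate(extraction)
        if error is None:
            return extraction
        feedback = (f"Your previous extraction was invalid: {error}.\n"
                    f"Previous output: {extraction}\n"
                    f"Re-read the document and correct it.")
        extraction = call_model(document, feedback=feedback)
    return extraction

# Stub standing in for a real API call: fails once, then
# self-corrects when given the error feedback.
def fake_model(document, feedback):
    return {"total": 42.0} if feedback else {"total": None}

result = extract_with_retry("invoice text...", fake_model)
```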

When Retries Work (and When They Do Not)

Effective for: format mismatches, structural output errors, misplaced values
Ineffective for: information genuinely absent from the source

The exam presents both scenarios. You must identify which is fixable by retry.

Self-Correction Patterns

  • detected_pattern fields: track which code construct triggered the finding. Enables analysis of dismissal patterns. Improves prompts over time.
  • calculated_total alongside stated_total: flag discrepancies automatically.
  • conflict_detected booleans: for inconsistent source data.

Practice Scenario

An extraction tool produces JSON with line items totalling $1,247.83 but the stated_total field says $1,347.83. A validation-retry loop catches this: it sends back the extraction with the specific discrepancy, and the model self-corrects by re-reading the source document.
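The discrepancy check itself is a few lines. A sketch using Decimal to avoid float rounding noise in currency arithmetic (field names are illustrative):

```python
from decimal import Decimal

def check_totals(extraction):
    """Cross-check stated_total against the sum of line items."""
    calculated = sum(Decimal(item["amount"]) for item in extraction["line_items"])
    stated = Decimal(extraction["stated_total"])
    if calculated != stated:
        return (f"Line items sum to {calculated} but stated_total "
                f"is {stated}; re-read the source document.")
    return None  # totals agree

extraction = {
    "line_items": [{"amount": "1000.00"}, {"amount": "247.83"}],
    "stated_total": "1347.83",  # does not match the line-item sum
}
error = check_totals(extraction)
```

The returned error string is exactly what gets sent back in the retry message.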

05 Batch Processing

Message Batches API Constraints

Feature → Detail
Cost savings → 50%
Processing window → Up to 24 hours
Latency SLA → None
Multi-turn tool calling → Not supported
Request correlation → custom_id field
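A sketch of assembling batch requests, one per document, each tagged with a custom_id for later correlation. The model id and prompt are illustrative, and the submission call is shown only as a comment:

```python
documents = {"doc-001": "First report...", "doc-002": "Second report..."}

# One Batches API request per document; custom_id carries the
# document id through to the results.
requests = [
    {
        "custom_id": doc_id,
        "params": {
            "model": "claude-sonnet-4-20250514",  # illustrative
            "max_tokens": 1024,
            "messages": [{"role": "user",
                          "content": f"Summarise this report:\n{text}"}],
        },
    }
    for doc_id, text in documents.items()
]
# With the Python SDK this would be submitted via
# client.messages.batches.create(requests=requests) and polled
# until processing ends, at most 24 hours later.
```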

The Matching Rule

API → Use for
Synchronous → Blocking workflows: pre-merge checks, anything developers wait for
Batch → Latency-tolerant work: overnight reports, weekly audits, nightly test generation

The exam presents a manager proposing batch for everything. The correct answer keeps blocking workflows synchronous.

Batch Failure Handling

  1. Identify failed documents by custom_id
  2. Resubmit only failures with modifications (e.g., chunking oversized documents)
  3. Refine prompts on a sample set BEFORE batch processing to maximise first-pass success
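Steps 1 and 2 can be sketched as follows, with the result records stubbed (a real run would read them from the batch's results stream):

```python
# Stubbed per-request results, keyed by custom_id.
results = [
    {"custom_id": "doc-001", "result": {"type": "succeeded"}},
    {"custom_id": "doc-002", "result": {"type": "errored"}},
    {"custom_id": "doc-003", "result": {"type": "succeeded"}},
]

# Step 1: identify failed documents by custom_id.
failed_ids = [r["custom_id"] for r in results
              if r["result"]["type"] != "succeeded"]

# Step 2: resubmit only the failures, after any fixes such as chunking.
def build_retry_requests(failed_ids, documents):
    return [{"custom_id": doc_id,
             "params": {"messages": [
                 {"role": "user", "content": documents[doc_id]}]}}
            for doc_id in failed_ids]

retry = build_retry_requests(failed_ids, {"doc-002": "oversized doc, now chunked"})
```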

06 Multi-Instance Review

The Self-Review Limitation

A model reviewing its own output in the same session retains reasoning context. It is less likely to question its own decisions. An independent instance without prior context catches more subtle issues.

Multi-Pass Architecture

Pass → Purpose
Per-file local analysis → Consistent depth per file
Cross-file integration pass → Catches data-flow issues across files

This prevents attention dilution and contradictory findings.
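The orchestration can be sketched with the two analysis passes stubbed out; the checks inside each stub are illustrative:

```python
def analyse_file(path, source):
    """Stub per-file pass: local findings for one file at full depth."""
    findings = []
    if "TODO" in source:
        findings.append({"file": path, "issue": "unresolved TODO"})
    return {"path": path, "findings": findings}

def cross_file_pass(summaries):
    """Stub integration pass over the per-file summaries."""
    total = sum(len(s["findings"]) for s in summaries)
    return {"files_reviewed": len(summaries), "local_findings": total}

files = {"a.py": "def f(): pass", "b.py": "# TODO: handle errors"}
# Pass 1: every file gets the same local depth.
summaries = [analyse_file(p, src) for p, src in files.items()]
# Pass 2: one integration pass over the collected summaries.
report = cross_file_pass(summaries)
```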

Confidence-Based Routing

  1. Model self-reports confidence per finding
  2. Route low-confidence findings to human review
  3. Calibrate confidence thresholds using labelled validation sets
  4. Prioritise limited reviewer capacity on highest-uncertainty items
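A sketch of the routing step; the threshold value is illustrative and would in practice come from calibration on a labelled validation set:

```python
THRESHOLD = 0.7  # illustrative; calibrate on labelled data

def route(findings, threshold=THRESHOLD):
    """Split findings by self-reported confidence."""
    auto, human = [], []
    for f in findings:
        (auto if f["confidence"] >= threshold else human).append(f)
    # Highest-uncertainty items first, to match limited reviewer capacity.
    human.sort(key=lambda f: f["confidence"])
    return auto, human

findings = [
    {"id": 1, "confidence": 0.95},
    {"id": 2, "confidence": 0.40},
    {"id": 3, "confidence": 0.65},
]
auto, human = route(findings)
```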

07 What to Build

Create an extraction pipeline:

  • Define a tool with JSON schema (required, optional, nullable fields, enums with "other")
  • Implement a validation-retry loop
  • Process 10 documents with varied formats
  • Add few-shot examples and compare before/after extraction quality
  • Run a batch through the Batches API
  • Handle failures by custom_id