Bulletproof Structures: Pydantic, Zod, and the Underlying Duel with LLM Native Structured Outputs
(Article 49: The Shield of Agent Protocols)
In previous chapters, we discussed how to "punish" an Agent when it emits the wrong JSON. However, this is essentially a "post-mortem remedy." In contemporary Agent architectures, the solution to this pain point has taken a monumental leap: from "hoping the model generates well and then parsing" to "holding a gun to the model's head at the Generation Level, forcing it into a bulletproof structure."
This chapter explores how to use strongly-typed Schemas to define the logical boundaries of an Agent, and analyzes how Native Structured Output eliminates malformed JSON at the token level.
0. Validation is Not One Layer, It's Two: Generation-Time Constraints + Execution-Time Gateways
Many systems do only one thing: make the model output "look like JSON." Sooner or later, that approach steps on two landmines in production:
- The JSON structure is correct, but the semantics are dangerous (e.g., path traversal, command injection).
- The schema is strict, but when the model refuses or the output is truncated, your program has no failure path.
The correct approach involves two layers of validation:
| Layer | Objective | Representative Mechanism | What It Cannot Replace |
|---|---|---|---|
| Generation Constraints | Maximize structurally valid output | Structured outputs / constrained decoding | Privilege isolation |
| Execution Gateways | Turn side effects into controllable commits | Allowlists/Permissions/Timeouts/Idempotency/Auditing | Structural guarantees (a gateway can only gate output that parses) |
The schema discussed in this article represents the protocol boundary, NOT the security boundary.
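To make the second layer concrete, here is a minimal sketch of an execution gateway. It assumes actions arrive already schema-validated as a command string plus a working directory, and that they are run without a shell (e.g., subprocess.run(argv) with the default shell=False). WORKSPACE_ROOT, ALLOWED_BINARIES, and gate_shell_action are hypothetical names. Note that "/tmp/workspace/../../etc" is a perfectly schema-valid string; only the gateway rejects it.

```python
import os
import shlex

# Hypothetical policy values for this sketch; adjust to your deployment.
WORKSPACE_ROOT = "/tmp/workspace"
ALLOWED_BINARIES = {"ls", "cat", "grep", "python3"}


def gate_shell_action(command: str, working_dir: str) -> list[str]:
    """Execution-time gateway: runs AFTER schema validation and BEFORE any side effect.

    Returns the argv list if the action is allowed; raises PermissionError otherwise.
    """
    # 1. Path confinement: resolve "..", symlinks, etc. before comparing against the root.
    real_dir = os.path.realpath(working_dir)
    root = os.path.realpath(WORKSPACE_ROOT)
    if os.path.commonpath([real_dir, root]) != root:
        raise PermissionError(f"working_dir escapes the workspace: {working_dir!r}")

    # 2. Tokenize for a shell-free executor, so ";", "&&" and "|" are never interpreted.
    argv = shlex.split(command)
    if not argv:
        raise PermissionError("empty command")

    # 3. Allowlist the binary instead of trying to enumerate every dangerous string.
    if argv[0] not in ALLOWED_BINARIES:
        raise PermissionError(f"binary not allowlisted: {argv[0]!r}")
    return argv
```

The allowlist inverts denylist logic: an unknown binary is refused by default, so "ls; rm -rf /" fails because its first token is "ls;", not "ls".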
1. The Evolution of Constraints: From "Praying" to "Iron Laws"
In the history of Agent protocol development, we have gone through four stages (the failure rates are rough, illustrative figures rather than benchmarks):
- The Praying Stage: Adding a line like "Please return JSON" to the Prompt. Failure rate: 30%+.
- The Regex Stage: Brute-force extraction of code blocks from Markdown using regular expressions. Failure rate: 15%.
- The Post-Validation Stage: Introducing Pydantic/Zod for parsing; if it fails, throw the error back to the model and ask for a retry. Failure rate: 5% (at the cost of a large number of extra Tokens).
- The Constrained Decoding Stage: At every generation step, the decoder masks out every Token that would violate the schema. Syntax failure rate: effectively 0% (refusals and truncations can still occur).
2. Pydantic / Zod: Single Source of Truth
In industrial-grade Agent development, the Schema is first and foremost strongly-typed code; only secondarily is it translated into a JSON Schema for the LLM. Because the model-facing schema and your runtime validation are generated from the same class, this Single Source of Truth keeps your documentation and code logic synchronized.
2.1 [Code Battle] Building a Bulletproof Shell Execution Schema
Let's define a safe Shell tool in Python using Pydantic:
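```python
import re

from pydantic import BaseModel, Field, field_validator  # Pydantic v2 API

# Denylist patterns. Word boundaries (\b) avoid false hits such as "add" matching "dd",
# and the trailing check keeps "rm -rf /tmp/x" from matching "rm -rf /".
FORBIDDEN_PATTERNS = [r"\brm\s+-rf\s+/(\s|$)", r"\bmkfs\b", r"\bdd\b"]


class ShellAction(BaseModel):
    """
    Defines a secure Shell execution contract.
    Types, ranges, and the path pattern are exported into the JSON Schema the model sees;
    the command validator below runs only when your code parses the output.
    """
    command: str = Field(..., description="The shell command to execute")
    working_dir: str = Field(
        default="/tmp/workspace",
        pattern=r"^/",  # regex constraint: must be an absolute path
        description="Execution directory, MUST be an absolute path",
    )
    timeout_seconds: int = Field(default=30, ge=1, le=300)

    @field_validator("command")
    @classmethod
    def prevent_suicidal_commands(cls, v: str) -> str:
        # Even if the model intends to do evil, this parse-time check rejects the action
        # before it reaches the executor. A denylist is a protocol check, not a sandbox.
        if any(re.search(p, v) for p in FORBIDDEN_PATTERNS):
            raise ValueError("Destructive system command detected; execution has been blocked.")
        return v
```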
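When you hand this Pydantic model to a large model as a tool, it is translated into a JSON Schema. The large model sees more than just the word "command"; it sees a strict contract covering types, ranges (ge/le), and, where declared, regex patterns. Custom Python validators such as prevent_suicidal_commands, however, are not exported into the schema: they run only when your code parses the output.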
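To observe this translation directly, here is a small sketch using Pydantic v2's model_json_schema() on a trimmed copy of the model (ShellActionDemo is a hypothetical name, and only one denylist rule is kept). It shows both halves of the point: declarative constraints become JSON Schema keywords, while the Python validator leaves no trace in the schema and only fires at parse time.

```python
import re

from pydantic import BaseModel, Field, ValidationError, field_validator


class ShellActionDemo(BaseModel):
    # Trimmed copy of ShellAction, for demonstration only
    command: str = Field(..., description="The shell command to execute")
    working_dir: str = Field(default="/tmp/workspace", pattern=r"^/")
    timeout_seconds: int = Field(default=30, ge=1, le=300)

    @field_validator("command")
    @classmethod
    def no_mkfs(cls, v: str) -> str:
        if re.search(r"\bmkfs\b", v):
            raise ValueError("destructive command")
        return v


schema = ShellActionDemo.model_json_schema()
props = schema["properties"]
# Declarative constraints become JSON Schema keywords the LLM can see...
print(props["timeout_seconds"])  # includes "minimum": 1 and "maximum": 300
print(props["working_dir"]["pattern"])  # ^/
# ...but the Python validator is absent from the schema; it only runs at parse time.
try:
    ShellActionDemo.model_validate_json('{"command": "mkfs.ext4 /dev/sda"}')
except ValidationError as e:
    print(e.errors()[0]["loc"])  # ('command',)
```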
2.2 First-Request Latency and Schema Caching: You Must Observe It
Strict schemas often require extra processing on their first use (e.g., the provider compiling and caching artifacts for the schema), which introduces first-request latency. If you do not monitor this, you will misdiagnose the resulting "sluggishness" in production as "the model getting dumber."
It is recommended to record at least the following:
| Field | Meaning |
|---|---|
| schema_id | Schema version/hash |
| schema_cached | Cache hit status |
| schema_compile_ms | Schema processing time on the first request |
| validation_errors | Validation failure count |
These fields only support long-term stable operation once they are attached to traces/spans and audit logs.
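A minimal client-side sketch of producing these fields, assuming you identify schemas by a hash of their canonical JSON. schema_id_of, traced_call, and the process-local _seen_schema_ids set are hypothetical. Because the provider's own compile time is not exposed, schema_compile_ms is approximated here by the total latency of the first call that uses a given schema; validation_errors would be counted by your parsing code.

```python
import hashlib
import json
import time
from typing import Any, Callable

# Process-local proxy for "has the provider already seen this schema?"
_seen_schema_ids: set[str] = set()


def schema_id_of(schema: dict[str, Any]) -> str:
    # Canonical JSON (sorted keys, compact separators) so equal schemas hash identically
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def traced_call(schema: dict[str, Any], call: Callable[[], Any]) -> tuple[Any, dict[str, Any]]:
    """Run `call` (your LLM request) and return its result plus span fields."""
    sid = schema_id_of(schema)
    cached = sid in _seen_schema_ids
    start = time.perf_counter()
    result = call()
    elapsed_ms = (time.perf_counter() - start) * 1000
    _seen_schema_ids.add(sid)
    fields = {
        "schema_id": sid,
        "schema_cached": cached,
        # Approximation: latency of the first call with this schema, which includes
        # any provider-side schema processing as well as normal generation time
        "schema_compile_ms": None if cached else round(elapsed_ms, 2),
    }
    return result, fields
```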
3. Under the Hood: The Mechanics of Native Structured Output
When we call GPT-4o with response_format: { "type": "json_schema", "json_schema": {...} } (or Gemini 1.5 with a response schema), what exactly happens during the API's decoding phase?
3.1 Constrained Decoding Technology
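Conceptually, this is token-level logit masking, also known as grammar-guided (or grammar-constrained) decoding. It is related to, but distinct from, the static logit_bias parameter some APIs expose: here the mask is recomputed from the schema at every step.
- Context: The model has generated the 7 characters {"age":.
- Probability Matrix Prediction: Left to itself, the large model might predict the next Token to be "twenty", (space), 25, or [ (left bracket).
- Dynamic Masking: Because your Schema dictates that age must be an integer, the API's decoder forcibly sets the probability of every Token the grammar does not allow at this position to 0, in practice by setting its logit to negative infinity before sampling (this example assumes the grammar also disallows free whitespace here).
- Guaranteed Result: The next Token the model emits can only be the start of an integer.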
Conclusion: In Native mode, you no longer need to worry about a JSON having an extra comma or a missing bracket, because an illegal Token simply cannot be sampled at that step. The exception is truncation: if max_tokens is reached, the JSON can still be left unclosed.
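The masking step can be sketched with a toy vocabulary. Everything here is illustrative: real decoders operate on tokenizer IDs and compile the schema into a grammar or automaton, whereas this sketch hard-codes a single rule ("the value of age must start an integer") and made-up logits, then masks with negative infinity before a greedy pick.

```python
import math

# Toy vocabulary and made-up raw preferences (logits) after the prefix '{"age":'
VOCAB = ['"twenty"', " ", "25", "[", "-", "}"]
LOGITS = [3.2, 2.5, 2.1, 1.4, 0.3, 0.1]


def can_start_integer(token: str) -> bool:
    # A JSON integer starts with a digit or a minus sign; this toy grammar
    # also rejects whitespace here, as many strict decoders do
    return token[:1].isdigit() or token == "-"


def masked_softmax(logits: list[float], allowed: list[bool]) -> list[float]:
    # Disallowed tokens get logit -inf, i.e. probability exactly 0 after softmax
    masked = [x if ok else -math.inf for x, ok in zip(logits, allowed)]
    peak = max(masked)
    exps = [math.exp(x - peak) for x in masked]  # exp(-inf) == 0.0
    total = sum(exps)
    return [e / total for e in exps]


allowed = [can_start_integer(t) for t in VOCAB]
probs = masked_softmax(LOGITS, allowed)
next_token = VOCAB[max(range(len(VOCAB)), key=probs.__getitem__)]
print(next_token)  # 25  (even though '"twenty"' had the highest raw logit)
```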
4. Post-Feedback Loop: The Fallback for Non-Native Models
If you are using a local Llama 3 or Qwen without llama.cpp's grammar (GBNF) constraints, then you must implement a highly robust Validation-Feedback Pipeline.
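```python
from pydantic import ValidationError

MAX_ATTEMPTS = 3  # Hard cap: never retry indefinitely


async def orchestrate_agent(task: str):
    # `llm` (your chat client) and `execute_physical` (your executor) are assumed
    # to be defined elsewhere; ShellAction is the model defined above.
    context = [
        {"role": "system", "content": "You are a JSON machine."},
        {"role": "user", "content": task},
    ]
    for _ in range(MAX_ATTEMPTS):
        response = await llm.chat(context)
        try:
            # Attempt strict Pydantic parsing (parse_raw in Pydantic v1)
            action = ShellAction.model_validate_json(response.content)
        except ValidationError as e:
            # Keep the rejected output in context so the model can see what it produced
            context.append({"role": "assistant", "content": response.content})
            # EXTREMELY CRITICAL: feed back the structured Pydantic error message
            error_feedback = f"""
[VALIDATION_ERROR] The JSON you generated does not comply with the Schema contract:
{e.json()}
Please check:
- timeout_seconds must be an integer.
- Forbidden commands cannot be included.
Please attempt to modify your output again.
"""
            context.append({"role": "user", "content": error_feedback})
            continue
        # Executed outside the try block: errors raised by a side-effecting tool
        # must not be mistaken for validation errors and silently retried.
        return await execute_physical(action)
    raise RuntimeError(f"No schema-valid action after {MAX_ATTEMPTS} attempts; degrade or escalate.")
```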
Geek Tip: Don't just toss a generic exception message back. Toss back Pydantic's e.json(), because it contains the exact key paths (loc) and reasons for each error, which makes the large model's corrections far more targeted.
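To see what the model actually receives, here is a minimal sketch that flattens Pydantic v2's structured errors (err.errors(), the same data e.json() serializes) into one line per key path. TimeoutOnly and format_feedback are hypothetical names, and the exact msg wording comes from Pydantic and may change between versions.

```python
from pydantic import BaseModel, Field, ValidationError


class TimeoutOnly(BaseModel):
    # Hypothetical minimal model standing in for ShellAction
    timeout_seconds: int = Field(default=30, ge=1, le=300)


def format_feedback(err: ValidationError) -> str:
    """Flatten Pydantic errors into 'key.path: reason' lines the model can act on."""
    lines = ["[VALIDATION_ERROR] The JSON you generated does not comply with the Schema contract:"]
    for item in err.errors():
        # loc is a tuple such as ("steps", 0, "timeout_seconds") for nested data
        path = ".".join(str(part) for part in item["loc"])
        lines.append(f"- {path}: {item['msg']} (got {item.get('input')!r})")
    lines.append("Fix exactly these fields and return the complete JSON again.")
    return "\n".join(lines)


try:
    TimeoutOnly.model_validate_json('{"timeout_seconds": "thirty"}')
except ValidationError as exc:
    feedback = format_feedback(exc)
    print(feedback)
```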
6. Failure Paths: Refusals, Truncations, Partial Outputs
Even strict structured outputs can fail. Common edge cases include:
- Refusals: The model refuses to answer due to safety policies.
- Truncations: Hitting max_tokens leaves the JSON unclosed.
- Partial Outputs: Only some of the fields are output.
Engineering must explicitly define how to handle these:
- Before entering a retry, first determine if side effects will occur (an idempotency key MUST be bound).
- Retries must have a maximum limit and backoff to avoid retry storms.
- Failures must be written to audit logs and trace/spans to review why degradation occurred.
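A minimal sketch of an explicit failure path, assuming a response object with finish_reason and refusal fields. The ModelResponse dataclass is hypothetical and only loosely modeled on common chat-completion responses, so map the fields from your own SDK. The key property is that only an "ok" outcome can ever reach tool execution; retries then follow a capped exponential backoff with full jitter.

```python
import json
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelResponse:
    # Hypothetical SDK-agnostic view of one reply; map these fields from your own client.
    content: Optional[str]
    finish_reason: str              # e.g. "stop", or "length" when max_tokens was hit
    refusal: Optional[str] = None   # set when the model declined to answer


def classify(resp: ModelResponse, required: set[str]) -> str:
    """Return 'refusal', 'truncated', 'invalid', 'partial' or 'ok'. Only 'ok' may reach a tool."""
    if resp.refusal:
        return "refusal"            # do not retry this into a tool call
    if resp.finish_reason == "length":
        return "truncated"          # JSON is probably unclosed; never execute a prefix
    try:
        data = json.loads(resp.content or "")
    except json.JSONDecodeError:
        return "invalid"
    if not isinstance(data, dict) or not required.issubset(data):
        return "partial"            # parsed, but required fields are missing
    return "ok"


def backoff_delays(max_attempts: int = 3, base: float = 0.5, cap: float = 8.0) -> list[float]:
    # Capped exponential backoff with "full jitter": delay n is uniform in [0, min(cap, base * 2**n)].
    return [random.uniform(0, min(cap, base * 2 ** n)) for n in range(max_attempts)]
```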
7. Circuit Breakers and Idempotency: Validation Failure Does Not Equal Infinite Retries
Validation failure is the norm, not an exception. But "infinite retries" is an incident manufacturing machine, especially when tools produce side effects.
Minimum governance recommendations:
- Circuit Breakers: If the same type of validation error occurs 3 consecutive times, stop write-type tools and enter read-only diagnostics (shadow mode).
- Idempotency: Any tool call with side effects must be bound to an idempotency_key; otherwise, a retry is a duplicate commit.
- Auditing: Record schema_id, error categories, retry counts, and the final degradation strategy to support post-mortems.
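The two governance rules can be sketched as small, dependency-free helpers. idempotency_key, IdempotentExecutor, and ValidationCircuitBreaker are hypothetical names. A real executor would persist completed keys (for example, behind a unique database constraint) rather than keep them in memory, and this breaker deliberately stays open until an operator resets it.

```python
import hashlib
import json
from typing import Any, Callable, Optional


def idempotency_key(task_id: str, tool: str, args: dict[str, Any]) -> str:
    # Same task + same tool + same canonical arguments => same key, so the executor
    # can recognize a retry and skip it instead of committing the side effect twice.
    payload = json.dumps({"task": task_id, "tool": tool, "args": args},
                         sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class IdempotentExecutor:
    """In-memory stand-in; a real one would persist keys durably."""

    def __init__(self) -> None:
        self._results: dict[str, Any] = {}

    def run(self, key: str, side_effect: Callable[[], Any]) -> Any:
        if key in self._results:
            return self._results[key]  # duplicate call: return the recorded result
        result = side_effect()
        self._results[key] = result
        return result


class ValidationCircuitBreaker:
    """Opens after `threshold` CONSECUTIVE validation errors of the same category."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self.last_category: Optional[str] = None
        self.streak = 0
        self.open = False  # open => write-type tools disabled (read-only diagnostics)

    def record_error(self, category: str) -> None:
        self.streak = self.streak + 1 if category == self.last_category else 1
        self.last_category = category
        if self.streak >= self.threshold:
            self.open = True

    def record_success(self) -> None:
        # Breaks the streak; an already-open breaker stays open until an operator resets it.
        self.last_category, self.streak = None, 0

    def allows(self, tool_is_write: bool) -> bool:
        return not (self.open and tool_is_write)
```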
8. Minimum Acceptance Checklist: How Do You Prove "Structured Output" is Truly Stable?
Write the following checklist into your CI or regression tests; do not rely solely on the naked eye:
| Check Item | Expectation |
|---|---|
| Schema Subset Usage | Does not use unsupported JSON Schema features |
| Refusal Path | Does not enter tool execution upon refusal |
| Truncation Path | max_tokens truncation does not result in half-executed side effects |
| Retry Cap | Validation failures have max attempts and backoffs |
| Idempotency Coverage | Tools with side effects must possess an idempotency_key |
| Observation Fields | schema_id/schema_cached/schema_compile_ms/validation_errors are recorded |
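The first checklist item can be automated with a small recursive walk over the generated JSON Schema. The ALLOWED_KEYWORDS set below is an assumption for illustration, not any provider's official list; replace it with the subset your provider documents for strict mode. unsupported_keywords is a hypothetical helper name.

```python
from typing import Any

# ASSUMPTION: an illustrative allowlist; copy the real subset from your provider's docs.
ALLOWED_KEYWORDS = {
    "type", "properties", "required", "additionalProperties", "items",
    "enum", "description", "title", "default", "anyOf", "$ref", "$defs",
    "minimum", "maximum", "pattern",
}


def unsupported_keywords(schema: Any, path: str = "#") -> list[str]:
    """Return JSON-pointer-like paths of keywords outside ALLOWED_KEYWORDS."""
    found: list[str] = []
    if isinstance(schema, dict):
        for key, value in schema.items():
            if key not in ALLOWED_KEYWORDS:
                found.append(f"{path}/{key}")
                continue  # flag once; don't descend into an unsupported construct
            if key in ("properties", "$defs") and isinstance(value, dict):
                # Children here are property NAMES, not keywords: recurse into each subschema
                for name, sub in value.items():
                    found += unsupported_keywords(sub, f"{path}/{key}/{name}")
            elif key not in ("enum", "default", "required") and isinstance(value, (dict, list)):
                found += unsupported_keywords(value, f"{path}/{key}")
    elif isinstance(schema, list):
        for i, item in enumerate(schema):
            found += unsupported_keywords(item, f"{path}/{i}")
    return found
```

In CI, a test could assert that unsupported_keywords(ShellAction.model_json_schema()) is empty, so an unsupported constraint fails the build instead of failing at request time.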
5. Extreme Challenge: Handling "Dirty" Data in Streams
When the Agent performs large-scale file reads and writes, the JSON can become massive. If you want to intercept it before the tokens finish generating (for PII filtering or illegal path monitoring), you need a Streaming JSON Parser.
It maintains a stack-based state machine:
- Push onto the stack upon encountering {.
- Pop from the stack upon encountering }.
- Track in real time which key is currently active. Once the current key is path and its value string begins to match a sensitive directory, the interceptor can abort the stream immediately (for example, by closing the connection), intercepting the output in real time before it completes.
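The stack-based interceptor can be sketched as a character-level scanner that is fed chunks as they arrive. PathInterceptor, SensitivePathError, and the default prefixes are hypothetical. The sketch simplifies JSON by keeping escape sequences raw (so a \u002f-encoded slash would slip past it; a production version must decode escapes), and it matches prefixes conservatively, so "/etcetera" is blocked too.

```python
from typing import Optional


class SensitivePathError(Exception):
    """Raised mid-stream; the caller should abort generation and close the connection."""


class PathInterceptor:
    """Incremental JSON scanner that watches the value of one key while tokens stream in."""

    def __init__(self, watched_key: str = "path",
                 sensitive_prefixes: tuple[str, ...] = ("/etc", "/root", "/home")):
        self.watched_key = watched_key
        self.prefixes = sensitive_prefixes
        self.stack: list[str] = []          # open containers: "{" or "["
        self.in_string = False
        self.escaped = False
        self.string_is_key = False
        self.expect_key = False             # True after "{" or "," inside an object
        self.current_key: Optional[str] = None
        self.buf = ""                       # characters of the string being read

    def feed(self, chunk: str) -> None:
        for ch in chunk:
            if self.in_string:
                self._string_char(ch)
            elif ch in "{[":
                self.stack.append(ch)       # push on open
                self.expect_key = ch == "{"
            elif ch in "}]":
                self.stack.pop()            # pop on close
                self.expect_key = False
            elif ch == ",":
                self.expect_key = bool(self.stack) and self.stack[-1] == "{"
            elif ch == ":":
                self.expect_key = False
            elif ch == '"':
                self.in_string, self.buf = True, ""
                self.string_is_key = self.expect_key
            # Whitespace, numbers, true/false/null need no tracking here.

    def _string_char(self, ch: str) -> None:
        if self.escaped:
            self.escaped = False            # simplification: escaped char kept as-is
        elif ch == "\\":
            self.escaped = True
            return
        elif ch == '"':
            self.in_string = False
            if self.string_is_key:
                self.current_key = self.buf  # this key is now the active one
            return
        self.buf += ch
        # Check on every character, so we fire as soon as the prefix is complete.
        if (not self.string_is_key and self.current_key == self.watched_key
                and self.buf.startswith(self.prefixes)):
            raise SensitivePathError(self.buf)
```

A consumer feeds each streamed chunk to interceptor.feed() and, on SensitivePathError, stops reading and closes the stream.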
Chapter Summary
- The Schema is the Agent's "Worldview": An Agent without Schema constraints is like a soulless runaway wild horse.
- Native is the First Choice: If you are purchasing commercial APIs, be sure to enable strict: true; this can save on the order of 20% in retry Token costs.
- Precise Feedback: Error feedback is not just a string; it is a "logical patch" complete with key paths.
Once you have learned to arm your Agent with structured outputs, it evolves from an "occasionally erratic little assistant" into an emotionless, never-overstepping, high-precision CNC machine.
In the next chapter, we will discuss an extremely thorny practical problem: [Brute-Force Instruction Extraction from Bare Text: When the Model Refuses to Follow JSON Formatting, How Do We Use Regex and Lenient Parsing Techniques for Soft Interception?]. We are about to write that "not-so-elegant but life-saving" code.
(End of text - Deep Dive Series 15 / Approx. 1600 words)
(Note: It is recommended to run ShellAction.model_json_schema() in your IDE (schema_json() in Pydantic v1) to observe how code transforms into a machine-readable contract.)