
Bulletproof Structures: Pydantic, Zod, and the Underlying Duel with LLM Native Structured Outputs


(Article 49: The Shield of Agent Protocols)

In previous chapters, we discussed how to "punish" an Agent when it produces the wrong JSON. That, however, is essentially a "post-mortem remedy." Current Agent architectures attack the problem earlier: instead of "hoping the model generates well and then parsing," they constrain the model at the Generation Level, forcing its output into a bulletproof structure.

This chapter explores how to use strongly-typed Schemas to construct the logical boundaries of an Agent, and analyzes how Native Structured Output rules out malformed JSON at decoding time.

0. Validation is Not One Layer, It's Two: Generation-Time Constraints + Execution-Time Gateways

Many systems only do one thing: make the model output "look like JSON." This guarantees you will eventually step on two landmines in production:

The JSON structure is correct, but the semantics are dangerous (e.g., path traversal, command injection).
The schema is strict, but when faced with refusals/truncations, your program lacks a failure path.

The correct approach involves two layers of validation:

Layer	Objective	Representative Mechanism	What It Cannot Replace
Generation Constraints	Maximize legal structure output	Structured outputs / constrained decoding	Privilege isolation
Execution Gateways	Turn side effects into controllable commits	Allowlist/Permissions/Timeouts/Idempotency/Auditing	N/A

The schema discussed in this article represents the protocol boundary, NOT the security boundary.
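
To make the split concrete, here is a minimal sketch of an execution-time gateway that runs after schema validation has already passed. The names `ALLOWED_PROGRAMS`, `WORKSPACE_ROOT`, and `gate_shell_call` are hypothetical, and the allowlist contents are placeholders; a real gateway would also need sandboxing, timeouts, and auditing.

```python
import shlex
from pathlib import PurePosixPath

# Hypothetical policy values, chosen only for illustration.
ALLOWED_PROGRAMS = {"ls", "cat", "grep"}
WORKSPACE_ROOT = PurePosixPath("/tmp/workspace")


def gate_shell_call(command: str, working_dir: str) -> list[str]:
    """Return the argv to run, or raise PermissionError.

    This check runs AFTER schema validation: the input is already
    well-formed, so only its semantics are inspected here.
    """
    argv = shlex.split(command)
    if not argv or argv[0] not in ALLOWED_PROGRAMS:
        raise PermissionError(f"program not on allowlist: {argv[:1]}")

    cwd = PurePosixPath(working_dir)
    # Reject relative paths and any '..' segment before comparing prefixes.
    if not cwd.is_absolute() or ".." in cwd.parts:
        raise PermissionError(f"working_dir rejected: {working_dir}")
    if cwd != WORKSPACE_ROOT and WORKSPACE_ROOT not in cwd.parents:
        raise PermissionError(f"working_dir outside workspace: {working_dir}")
    return argv
```

The returned argv is meant to be executed without a shell, so characters such as `;` or `&&` are passed as plain arguments instead of being interpreted. A structurally valid payload like `{"command": "rm -rf /"}` passes the schema but is stopped here.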

1. The Evolution of Constraints: From "Praying" to "Iron Laws"

In the history of Agent protocol development, we have experienced four stages (the failure rates below are rough estimates):

The Praying Stage: Adding a line like "Please return JSON" to the Prompt. Failure rate: 30%+.
The Regex Stage: Forcibly extracting code blocks from Markdown using regular expressions. Failure rate: 15%.
The Post-Validation Stage: Introducing Pydantic/Zod for parsing; if it fails, throw the error back to the model and ask for a retry. Failure rate: 5% (at the cost of a massive amount of retry Tokens).
The Constrained Decoding Stage: Masking illegal Tokens inside the decoder at every generation step. Syntax failure rate: effectively 0% for supported schemas, although refusals and truncations can still occur.

2. Pydantic / Zod: Single Source of Truth

In industrial-grade Agent development, the Schema is first and foremost strongly-typed code; only secondarily is it translated for the LLM. This Single Source of Truth ensures that your documentation and code logic are forever synchronized.

2.1 [Code Battle] Building a Bulletproof Shell Execution Schema

Let's define a safe Shell tool in Python using Pydantic:

from pydantic import BaseModel, Field, field_validator

class ShellAction(BaseModel):
    """
    Defines a secure Shell execution contract.
    The model must strictly adhere to this structure; otherwise, parsing fails and nothing is executed.
    """
    command: str = Field(..., description="The shell command to execute")
    working_dir: str = Field(default="/tmp/workspace", description="Execution directory, MUST be an absolute path")
    timeout_seconds: int = Field(default=30, ge=1, le=300)

    # Pydantic v2 API; on Pydantic v1 use @validator("command") without @classmethod
    @field_validator("command")
    @classmethod
    def prevent_suicidal_commands(cls, v: str) -> str:
        # Even if the model intends to do evil, this code-level check is the final checkpoint before execution.
        # A denylist is only a tripwire: it complements allowlists and sandboxing and never replaces them.
        tokens = v.split()
        # "rm -rf /" is matched as a substring (so it also blocks "rm -rf /any/path").
        # "dd" and "mkfs*" are matched as whole tokens so that "dd" does not block words like "add".
        if "rm -rf /" in v or any(t == "dd" or t.startswith("mkfs") for t in tokens):
            raise ValueError("Destructive system command detected; execution has been blocked by schema validation!")
        return v

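When you feed this Pydantic model to a large model as a tool, it is translated into a JSON Schema. The large model sees more than just the word "command"; it sees types, defaults, numeric ranges, and field descriptions. Note that custom validator logic such as prevent_suicidal_commands is not part of the exported schema; it runs only when your code parses the model's output.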
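
To see what the model actually receives, the following sketch exports the JSON Schema of a reduced model. It assumes Pydantic v2 (`field_validator`, `model_json_schema`); `DemoAction` and `no_mkfs` are hypothetical names used only for this illustration.

```python
from pydantic import BaseModel, Field, field_validator


class DemoAction(BaseModel):
    command: str = Field(..., description="The shell command to execute")
    timeout_seconds: int = Field(default=30, ge=1, le=300)

    @field_validator("command")
    @classmethod
    def no_mkfs(cls, v: str) -> str:
        # Runs only at parse time; it leaves no trace in the exported schema.
        if "mkfs" in v.split():
            raise ValueError("mkfs is not allowed")
        return v


schema = DemoAction.model_json_schema()
print(schema["required"])                       # fields without a default
print(schema["properties"]["timeout_seconds"])  # carries "minimum" and "maximum"
print(schema["properties"]["command"])          # no validator logic appears here
```

The `ge`/`le` bounds surface as `minimum`/`maximum`, while the validator does not appear at all, which is why the code-level check must still run on every parsed output.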

2.2 First-Request Latency and Schema Caching: You Must Observe It

Strict schemas often require extra processing upon their first use (e.g., caching/compiling artifacts), which introduces first-request latency. If you do not monitor this, once deployed, you will misdiagnose "sluggishness" as "the model getting dumber."

It is recommended to record at least the following:

Field	Meaning
`schema_id`	Schema version/hash
`schema_cached`	Cache hit status
`schema_compile_ms`	Processing time for the first request
`validation_errors`	Validation failure count

Only when these fields enter trace/spans and audit logs can they support long-term stable operation.
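
A minimal sketch of how these fields might be produced on the client side. `schema_fingerprint` and `call_with_schema_metrics` are hypothetical helpers, and because the provider's internal compile time is not directly observable, `schema_compile_ms` is approximated here by the latency of the first call with a given schema.

```python
import hashlib
import json
import time

_seen_schema_ids: set[str] = set()  # hypothetical in-process record of schemas already sent


def schema_fingerprint(schema: dict) -> str:
    """Stable hash of a JSON Schema; key order does not affect the result."""
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def call_with_schema_metrics(schema: dict, send_request) -> dict:
    """Invoke send_request() and return the fields suggested in the table above.

    send_request is a caller-supplied function that performs the call and
    returns the number of validation errors it observed.
    """
    schema_id = schema_fingerprint(schema)
    cached = schema_id in _seen_schema_ids
    started = time.perf_counter()
    validation_errors = send_request()
    elapsed_ms = (time.perf_counter() - started) * 1000.0
    _seen_schema_ids.add(schema_id)
    return {
        "schema_id": schema_id,
        "schema_cached": cached,
        # Client-side proxy: total latency of the first call with this schema.
        "schema_compile_ms": None if cached else round(elapsed_ms, 2),
        "validation_errors": validation_errors,
    }
```

The returned dict can be attached to a span or written to an audit log as-is; hashing the canonical form means a reordered but identical schema keeps the same `schema_id`.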

3. Core Inside Story: The Physical Principles of Native Structured Output

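When we call an API that supports native structured outputs (for example, GPT-4o with response_format: { "type": "json_schema" }; Gemini 1.5 exposes a comparable response schema setting), what exactly happens during the API's decoding phase?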

3.1 Constrained Decoding Technology

This technique is known as constrained decoding, also called grammar-guided decoding. It works by masking logits at every generation step.

Probability Prediction: Suppose the model has just generated the 7 characters {"age": and now computes a probability for every Token in its vocabulary.
Unconstrained Candidates: Left alone, the large model might predict the next Token to be "twenty", (space), 25, or [ (left bracket).
Dynamic Masking: Because your Schema dictates that age must be an integer, the API's decoder sets the probability of every Token that cannot continue a valid integer to 0 (in practice, its logit is set to negative infinity).
Guaranteed Result: The next Token the model emits can only be one that keeps the output valid under the Schema, here a digit or a minus sign (plus any whitespace the grammar permits).

Conclusion: Under Native mode, you no longer need to worry about a JSON having an extra comma or a stray bracket in the middle of the output, because an illegal Token simply cannot pass the mask at that step. The exception is truncation: if generation stops at max_tokens, the JSON can still end unclosed.
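
The masking step can be illustrated with a toy example. This is a sketch under simplifying assumptions, not how any particular provider implements it: the vocabulary has five made-up tokens, the logits are invented, and the toy grammar accepts only integer prefixes (real grammars usually also permit whitespace).

```python
import math


def is_integer_prefix(text: str) -> bool:
    """True if text could be the start of a JSON integer, e.g. '-', '2', '25'."""
    if text == "-":
        return True
    digits = text[1:] if text.startswith("-") else text
    # Ignores JSON's no-leading-zero rule for brevity.
    return digits != "" and all(c in "0123456789" for c in digits)


def masked_softmax(logits: dict[str, float], allowed) -> dict[str, float]:
    """Drop disallowed tokens (equivalent to logit = -inf), then renormalize."""
    kept = {tok: lg for tok, lg in logits.items() if allowed(tok)}
    if not kept:
        raise ValueError("grammar allows no token from this vocabulary")
    peak = max(kept.values())
    weights = {tok: math.exp(lg - peak) for tok, lg in kept.items()}
    total = sum(weights.values())
    return {tok: w / total for tok, w in weights.items()}


# Invented logits for the token that follows {"age":
next_token_logits = {'"twenty"': 2.1, " ": 1.7, "25": 1.5, "[": 0.3, "-": -1.0}
probs = masked_softmax(next_token_logits, is_integer_prefix)
print(sorted(probs))  # only "-" and "25" survive the mask
```

Although `"twenty"` had the highest raw logit, it receives zero probability after masking, so sampling can only pick a token that keeps the output valid.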

4. Post-Feedback Loop: The Fallback for Non-Native Models

If you are using a local Llama 3 or Qwen and are not utilizing llama.cpp's Grammar control features, then you must implement a highly robust Validation-Feedback Pipeline.

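from pydantic import ValidationError

MAX_ATTEMPTS = 3

async def orchestrate_agent(task):
    # `llm` and `execute_physical` are assumed to be provided by the surrounding application
    context = [
        {"role": "system", "content": "You are a JSON machine."},
        {"role": "user", "content": task},
    ]

    for _ in range(MAX_ATTEMPTS): # Maximum 3 attempts
        response = await llm.chat(context)
        try:
            # Attempt strict Pydantic parsing (Pydantic v2; on v1 use ShellAction.parse_raw)
            action = ShellAction.model_validate_json(response.content)
        except ValidationError as e:
            # Keep the failed output in context so the model can see what it is correcting
            context.append({"role": "assistant", "content": response.content})
            # EXTREMELY CRITICAL: Format the structured Pydantic error message
            error_feedback = f"""
            [VALIDATION_ERROR] The JSON you generated does not comply with the Schema contract:
            {e.json()}
            Please check:
            - timeout_seconds must be an integer.
            - Forbidden commands cannot be included.
            Please attempt to modify your output again.
            """
            context.append({"role": "user", "content": error_feedback})
            continue
        # Execute outside the try block so execution errors are not mistaken for validation errors
        return await execute_physical(action)

    # Never fall through silently: the caller must see that all attempts failed
    raise RuntimeError(f"Output still invalid after {MAX_ATTEMPTS} attempts; nothing was executed")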

Geek Tip: Don't just toss the generic Exception back. Toss Pydantic's e.json() back, because it contains the exact Keypaths and reasons for the error, which gives the large model far more to work with when correcting itself.

6. Failure Paths: Refusals, Truncations, Partial Outputs

Even strict structured outputs can fail. Common edge cases include:

Refusals: The model refuses to answer due to safety policies.
Truncations: Hitting max_tokens leaves the JSON unclosed.
Partial Outputs: Only partial fields are output.

Engineering must explicitly define how to handle these:

Before entering a retry, first determine if side effects will occur (an idempotency key MUST be bound).
Retries must have a maximum limit and backoff to avoid retry storms.
Failures must be written to audit logs and trace/spans to review why degradation occurred.
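
One way to make these paths explicit is to classify every response before any tool is allowed to run. The sketch below uses a hypothetical, provider-neutral response dict with the keys `refusal`, `finish_reason`, and `content`; the actual field names depend on your SDK and should be treated as assumptions.

```python
import json
from enum import Enum


class Outcome(Enum):
    OK = "ok"
    REFUSAL = "refusal"
    TRUNCATED = "truncated"
    PARTIAL = "partial"
    MALFORMED = "malformed"


def classify_response(resp: dict, required_fields: set[str]) -> Outcome:
    """Classify a model response before any tool execution.

    resp is assumed to contain 'refusal' (str or None),
    'finish_reason' (str), and 'content' (str).
    """
    if resp.get("refusal"):
        return Outcome.REFUSAL
    if resp.get("finish_reason") == "length":
        return Outcome.TRUNCATED  # token limit hit; the JSON may be unclosed
    try:
        payload = json.loads(resp.get("content") or "")
    except json.JSONDecodeError:
        return Outcome.MALFORMED
    if not isinstance(payload, dict) or not required_fields.issubset(payload.keys()):
        return Outcome.PARTIAL
    return Outcome.OK
```

Only `Outcome.OK` should proceed to schema validation and execution; every other outcome maps to a logged, bounded retry or a degradation path.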

7. Circuit Breakers and Idempotency: Validation Failure Does Not Equal Infinite Retries

Validation failure is the norm, not an exception. But "infinite retries" is an incident manufacturing machine, especially when tools produce side effects.

Minimum governance recommendations:

Circuit Breakers: If the same type of validation error occurs 3 consecutive times, stop write-type tools and enter read-only diagnostics (shadow mode).
Idempotency: Any tool call with side effects must be bound to an idempotency_key; otherwise, a retry is a duplicate commit.
Auditing: Record schema_id, error categories, retry counts, and the final degradation strategy to support post-mortems.
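
The first two recommendations can be sketched in a few lines. `idempotency_key` and `ValidationCircuitBreaker` are hypothetical names, the threshold of 3 follows the text above, and deriving the key from a task ID plus canonicalized arguments is one possible design, not the only one.

```python
import hashlib
import json


def idempotency_key(tool_name: str, arguments: dict, task_id: str) -> str:
    """Derive a key that is identical across retries of the same logical call."""
    canonical = json.dumps(
        {"task": task_id, "tool": tool_name, "args": arguments},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class ValidationCircuitBreaker:
    """Opens after `threshold` consecutive validation errors of the same type."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self._last_error_type = None
        self._streak = 0

    def record_failure(self, error_type: str) -> None:
        if error_type == self._last_error_type:
            self._streak += 1
        else:
            self._last_error_type = error_type
            self._streak = 1

    def record_success(self) -> None:
        self._last_error_type = None
        self._streak = 0

    @property
    def writes_allowed(self) -> bool:
        # Once open, callers should restrict the agent to read-only tools.
        return self._streak < self.threshold
```

Because the key depends only on the task, tool, and arguments, a retried call carries the same key and the downstream system can deduplicate it instead of committing twice.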

8. Minimum Acceptance Checklist: How Do You Prove "Structured Output" is Truly Stable?

Write the following checklist into your CI or regression tests; do not rely solely on the naked eye:

Check Item	Expectation
Schema Subset Usage	Does not use unsupported JSON Schema features
Refusal Path	Does not enter tool execution upon refusal
Truncation Path	`max_tokens` truncation does not result in half-executed side effects
Retry Cap	Validation failures have max attempts and backoffs
Idempotency Coverage	Tools with side effects must possess an `idempotency_key`
Observation Fields	`schema_cached`/`compile_ms`/`error_count` are recorded
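
As an example of turning one checklist row into a regression test, the sketch below checks "Schema Subset Usage". The `SUPPORTED_KEYWORDS` set is a hypothetical placeholder; replace it with the keywords your provider documents as supported.

```python
# Hypothetical allowlist; consult your provider's documentation for the real one.
SUPPORTED_KEYWORDS = {
    "type", "properties", "required", "items", "enum",
    "description", "additionalProperties", "anyOf", "$ref", "$defs",
}


def unsupported_keywords(schema, path="$"):
    """Yield (path, keyword) for every schema keyword outside SUPPORTED_KEYWORDS."""
    if isinstance(schema, list):
        for i, item in enumerate(schema):
            yield from unsupported_keywords(item, f"{path}[{i}]")
        return
    if not isinstance(schema, dict):
        return
    for key, value in schema.items():
        if key not in SUPPORTED_KEYWORDS:
            yield (path, key)
        elif key in ("properties", "$defs"):
            # Children here are property/definition NAMES, not keywords.
            for name, sub in value.items():
                yield from unsupported_keywords(sub, f"{path}.{key}.{name}")
        elif key in ("items", "anyOf", "additionalProperties"):
            yield from unsupported_keywords(value, f"{path}.{key}")


def test_schema_uses_supported_subset():
    schema = {
        "type": "object",
        "properties": {"command": {"type": "string"}},
        "required": ["command"],
        "additionalProperties": False,
    }
    assert list(unsupported_keywords(schema)) == []
```

Run under a test runner such as pytest, this fails the build as soon as someone adds a keyword (for example `multipleOf`) that is outside the allowlist.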

5. Extreme Challenge: Handling "Dirty" Data in Streams

When the Agent performs large-scale file reads and writes, the JSON can become massive. If you want to intercept it before the tokens finish generating (for PII filtering or illegal path monitoring), you need a Streaming JSON Parser.

It maintains a stack:

Pushes to the stack upon encountering { or [.
Pops from the stack upon encountering } or ].
Tracks whether it is currently inside a string, so that braces inside string values are not mistaken for structure.
Monitors in real-time which key is currently active. Once the current Key is path and its Value string begins to match a sensitive directory, the interceptor can immediately abort the stream (for example, by closing the connection), stopping the call before the rest of the payload is generated or executed.
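
The interceptor described above can be sketched as a character-level scanner. This is a deliberately simplified illustration: `PathInterceptor`, `StreamBlocked`, and `SENSITIVE_PREFIXES` are hypothetical names, escape sequences are skipped rather than decoded, and matching is a plain string-prefix test.

```python
SENSITIVE_PREFIXES = ("/etc", "/root")  # hypothetical policy


class StreamBlocked(Exception):
    """Raised as soon as a watched value enters a sensitive directory."""


class PathInterceptor:
    """Incremental JSON scanner that aborts when a "path" value looks sensitive."""

    def __init__(self, watched_key: str = "path") -> None:
        self.watched_key = watched_key
        self.stack = []          # open containers: "{" or "["
        self.in_string = False
        self.escaped = False
        self.buffer = ""         # characters of the string currently being read
        self.expect_key = False  # True when the next string is an object key
        self.current_key = None

    def feed(self, chunk: str) -> None:
        for ch in chunk:
            if self.in_string:
                self._feed_string_char(ch)
            elif ch == '"':
                self.in_string, self.buffer = True, ""
            elif ch in "{[":
                self.stack.append(ch)
                self.expect_key = ch == "{"
            elif ch in "}]":
                if self.stack:
                    self.stack.pop()
                self.expect_key = False
            elif ch == ",":
                self.expect_key = bool(self.stack) and self.stack[-1] == "{"
            elif ch == ":":
                self.expect_key = False

    def _feed_string_char(self, ch: str) -> None:
        if self.escaped:
            self.escaped = False      # skip the escaped character
        elif ch == "\\":
            self.escaped = True
        elif ch == '"':
            self.in_string = False
            if self.expect_key:
                self.current_key = self.buffer
        else:
            self.buffer += ch
            watching = not self.expect_key and self.current_key == self.watched_key
            if watching and self.buffer.startswith(SENSITIVE_PREFIXES):
                raise StreamBlocked(f"blocked {self.watched_key}: {self.buffer!r}")
```

A caller would pass each streamed chunk to `feed()` and, on `StreamBlocked`, cancel the request and discard the partial output. The check fires in the middle of the value, even when the key and value are split across chunks.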

Chapter Summary

The Schema is the Agent's "Worldview": An Agent without Schema constraints is like a soulless runaway wild horse.
Native is the First Choice: If you are purchasing commercial APIs, be sure to enable strict: true; by our rough estimate, this can save you around 20% in retry Token costs.
Precise Feedback: Error feedback is not just a string; it is a "logical patch" complete with paths.

Once you have learned to arm your Agent with structured outputs, it evolves from an "occasionally erratic little assistant" into an emotionless, never-overstepping, high-precision CNC machine.

In the next chapter, we will discuss an extremely thorny practical problem: [Violent Instruction Extraction from Bare Text: When the Model Refuses to Follow JSON Formatting, How Do We Use Regex and Lenient Parsing Techniques for Soft Interception?]. We are about to write that "not-so-elegant but life-saving" code.

(End of text - Deep Dive Series 15 / Approx. 1600 words) (Note: It is recommended to call model_json_schema() on your model in your IDE (schema_json() on Pydantic v1) to observe how code transforms into machine-readable contracts.)