Staying Sober Amidst Hallucinations: Hard Interception and Exception Degradation of Tool Execution
(Article 42: Agent Architecture Solidification)
If handling invalid user input is the most painful part of building a standard Web backend, then the sudden, unprovoked "hallucinations" of large language models are without question the most maddening part of building an Agent system.
LLMs hallucinate. They invent tools out of thin air that don't exist; even when a tool does exist, they omit core fields or inject nonsensical characters that defy human logic. If you just write eval(llm_output), the ceiling of your Agent is a toy code-execution engine prone to spontaneous self-destruction.
This chapter dives deep into how to construct a "Fully Automated Error-Correction and Self-Healing System" for your Agent.
0. First, Split "Hallucination Degradation" into Two Layers
Many implementations write "hallucination handling" as a giant pile of if-else statements.
In engineering, you must split this into two distinct layers because their objectives are completely different:
- Deterministic Validation Layer: parse/schema/allowlist. The goal is to fail closed and prevent side effects.
- Strategic Fallback Layer: Switch models, HITL (Human-in-the-Loop), shadow mode. The goal is to keep the task moving forward but with controlled risk.
The moment you mix these two layers, you will attempt to "patch" things when you should be "refusing execution," thereby turning a hallucination into a full-blown incident.
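A minimal sketch of the split, using only the standard library; the ALLOWLIST tool names and the Verdict states are illustrative, not from any specific framework. Layer 1 is a pure function that can only pass or refuse; layer 2 is consulted only after layer 1 has refused repeatedly:

import json
from enum import Enum, auto

ALLOWLIST = {"git_commit", "ls"}  # hypothetical registered tools

class Verdict(Enum):
    EXECUTE = auto()   # deterministic layer passed; side effects permitted
    REJECT = auto()    # fail closed: no side effects, feed the error back to the LLM
    ESCALATE = auto()  # strategic layer: swap model / HITL / shadow mode

def deterministic_gate(raw_action: str) -> Verdict:
    """Layer 1: parse + allowlist. Pure and deterministic; it refuses, it never patches."""
    try:
        action = json.loads(raw_action)
    except json.JSONDecodeError:
        return Verdict.REJECT
    if action.get("tool") not in ALLOWLIST:
        return Verdict.REJECT
    return Verdict.EXECUTE

def strategic_fallback(consecutive_rejections: int) -> Verdict:
    """Layer 2: consulted only AFTER layer 1 has refused repeatedly; it manages risk, not syntax."""
    return Verdict.ESCALATE if consecutive_rejections >= 3 else Verdict.REJECT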
1. The Cause of Hallucinations: Why Can't It Stop?
From a probability-theory perspective, a large model is an autoregressive sequence generator: at every step it predicts the next token conditioned on everything it has already emitted. When its generated trajectory deviates even slightly from business reality (e.g., emitting file_name instead of file_path), it cannot "un-generate" tokens, so it tries with all its might to rationalize the subsequent logic on top of that erroneous prefix.
This phenomenon is called "Probabilistic Drift." Your Agent Runtime must act as a cold, ruthless judge, subjecting every single character to the strictest scrutiny before it even touches the physical hardware.
2. Schema Enforcement: Shackling the Mind
Any geek worth their salt has long since abandoned naked string parsing in favor of the strongly typed bulletproof vests of Pydantic (Python) or Zod (TS).
2.1 [Core Code] Pydantic-Based Tool Admission Gateway
Do not just define a tool; define the tool's "physical boundaries."
import json

from pydantic import BaseModel, Field, ValidationError, field_validator


class GitCommitTool(BaseModel):
    """
    Defines the strict contract for the Git Commit tool.
    Only model output conforming to this schema is permitted to run in the shell.
    """
    message: str = Field(..., min_length=5, description="Commit message; must describe the specific change")
    files: list[str] = Field(..., description="Files to commit. The use of '*' wildcards is strictly forbidden")
    author: str = Field(default="Agent_Bot", description="Identity executing the commit")

    @field_validator("files")
    @classmethod
    def forbid_wildcards(cls, files: list[str]) -> list[str]:
        # Enforce the contract in code, not just in the description text
        if any("*" in f for f in files):
            raise ValueError("wildcards are forbidden in the files list")
        return files


class AgentRuntime:
    def dispatch_tool(self, raw_json_from_llm: str):
        try:
            # First layer of defense: syntax parsing
            data = json.loads(raw_json_from_llm)
            # Second layer of defense: domain model validation
            validated_data = GitCommitTool(**data)
            # Only after surviving this trial by fire is it allowed to touch the physical OS
            return do_real_git_commit(validated_data)  # the real executor, implemented elsewhere
        except ValidationError as e:
            # [CRITICAL] Do NOT swallow the error! Feed the exact Pydantic error text back to the brain
            return self.handle_hallucination(e.json())
        except json.JSONDecodeError:
            return self.handle_hallucination("JSON parsing failed; check your nested quotes.")
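Fed a hallucinated payload such as {"message": "fix", "files": ["*"]}, this gateway fails closed: Pydantic rejects both the five-character minimum and the wildcard, the except branch catches the ValidationError, and the exact error text, not a raw stack trace, is routed back to the model. git is never touched.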
3. Error Correction: The Feedback-Loop Introspection Circuit
In traditional backends, when we encounter a JSON error, we return a 500 Server Error to the frontend and call it a day. But in an Agent system, the Stack Trace is not meant for human eyes; it is a "nutritional supplement" specifically packaged as text to be fed back to the LLM.
Large models possess extremely powerful In-Context Self-Correction capabilities.
3.1 The Art of the Correction Prompt
When validation fails, don't just tell it "You're wrong." Guide it like a mentor:
# A method on the AgentRuntime from section 2.1
def handle_hallucination(self, detailed_error):
    feedback = f"""
    [CRITICAL_ALARM] Tool call protocol interrupted!
    Reason: {detailed_error}
    Your output deviated from the predefined format. This would crash the underlying physical executor.
    You MUST now do the following:
    1. Take a deep breath and re-evaluate your Task objective.
    2. Strictly check the closure of your JSON and the types of its fields.
    3. Confirm that the file paths actually exist (if unsure, invoke the `ls` command first).
    Re-send your Action block after making corrections.
    """
    # Append the feedback to the conversation history as an Observation
    self.memory_store.append({"role": "user", "content": feedback})
    # Immediately trigger the next round of reasoning
    return self.re_trigger_loop()
If you do not build this feedback path, the Agent will believe it successfully dropped the database and keep fantasizing forward on top of that non-existent fact, eventually drifting miles off-topic. This is the truth behind "the LLM just went crazy mid-task": you didn't slap it across the face the moment it started going crazy.
4. Settling Accounts: "Double-Blind Verification" Based on a Critic Agent
Even if Schema validation passes, logical hallucinations (e.g., deleting the wrong file) can still occur.
For high-risk actions (like rm or push), we introduce a "Critic Agent". Before the Action is executed, the system automatically wakes up a low-cost small model (like GPT-4o-mini) and asks it one single question:
"The primary agent just decided to execute
rm test.pyin the project root. Based on the current task objectives, is this action suicidal? Please answer YES or NO."
If the referee says NO, the system intercepts the primary agent's action and informs it: "Your logic was intercepted by the safety monitoring system. Please provide a more reasonable justification for deletion."
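A minimal sketch of that referee gate; HIGH_RISK_TOOLS and the llm_complete callable are assumptions, not a specific SDK:

HIGH_RISK_TOOLS = {"rm", "push", "charge"}  # actions that wake up the critic

def critic_gate(action: dict, task_objective: str, llm_complete) -> bool:
    """Returns True if the action may proceed. llm_complete is any text-in/text-out LLM client."""
    if action["tool"] not in HIGH_RISK_TOOLS:
        return True  # low-risk actions skip the critic entirely
    question = (
        f"The primary agent decided to execute: {action['tool']} {action.get('args')}\n"
        f"Current task objective: {task_objective}\n"
        "Is this action suicidal? Answer exactly YES or NO."
    )
    verdict = llm_complete(question).strip().upper()
    # Fail closed: anything other than an explicit NO counts as a veto
    return verdict == "NO"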
5. Fallback Protocol: Preventing Mentally Deficient Infinite Loops
Sometimes, even if you stuff it full of error messages, the model gets "possessed" and sends the exact same broken JSON 10 times in a row.
Geek Circuit-Breaker Strategies (see the sketch after this list):
- Counter-Based Circuit Breaker: If the same Tool Call fails consecutively more than 3 times, the Agent Runtime MUST forcibly disconnect.
- Fallback:
- Method A: Brain Swap. Automatically switch the backend model from GPT-4 to Claude 3.5. Different reasoning inertia often breaks the deadlock.
- Method B: Request Human Intervention. Pop up a TUI dialog: "Master, I am trapped in a logical dead-end while writing to a file. Here is my error. Can you help me fix it?"
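One way to wire the counter, the brain swap, and the human escape hatch together; run_once and the model names are placeholders for your actual reasoning loop:

def run_with_breaker(run_once, models=("gpt-4", "claude-3.5"), max_failures=3):
    """run_once(model) returns (ok: bool, error: str). Trip the breaker per model, then swap brains."""
    last_error = ""
    for model in models:
        for _ in range(max_failures):
            ok, last_error = run_once(model)
            if ok:
                return model  # this brain broke the deadlock
        # Breaker tripped: same failure max_failures times; try a different reasoning inertia
    # All brains exhausted: stop looping and ask the master for help
    raise RuntimeError(f"Circuit breaker open, human intervention required. Last error: {last_error}")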
6. Retries Must Be Controlled: Exponential Backoff + Circuit Breakers + Idempotency Keys
"Retrying" is not a virtue in the agent world, especially when tool side effects are involved.
The minimum viable retry strategy must satisfy all of the following:
- Max Attempts: e.g., more than 3 consecutive failures of the same tool call trips the circuit breaker immediately.
- Backoff: exponential or jittered backoff, so that a brief network blip is not amplified into a retry storm of timeouts and re-retries.
- Idempotency: ANY tool call that might generate side effects MUST carry an idempotency_key; otherwise a retry is just a duplicate commit.
A minimum "tool commit record" example (for auditing and recovery):
tool=shell.exec|idem=ab12...|timeout_ms=8000|attempt=2|exit=1|err_sha=...
These fields are not a "log hoarding fetish"; they are the levers of observation and auditing that keep a hallucination from escalating into a catastrophic incident.
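A compact sketch combining all three constraints; execute_tool and the key derivation are assumptions, not a specific library:

import hashlib
import random
import time

def idempotency_key(tool: str, args: dict) -> str:
    # Same tool + same args => same key, so the executor can deduplicate a retried commit
    return hashlib.sha256(f"{tool}:{sorted(args.items())}".encode()).hexdigest()[:16]

def call_with_retry(execute_tool, tool: str, args: dict, max_attempts: int = 3):
    key = idempotency_key(tool, args)
    for attempt in range(1, max_attempts + 1):
        try:
            # The executor must treat a repeated key as "already done", never "do it again"
            return execute_tool(tool, args, idem=key, attempt=attempt)
        except TimeoutError:
            if attempt == max_attempts:
                raise  # breaker: do not amplify a blip into a retry storm
            # Exponential backoff with jitter: ~1s, ~2s, ~4s...
            time.sleep(2 ** (attempt - 1) + random.random())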
7. Protocols/Adapters are NOT Safety Boundaries: Permissions, Isolation, and Auditing Remain in the Runtime
Even if you use MCP or any "standardized tool protocol," it does not solve permission and isolation issues. A protocol ensures you "can connect," but it doesn't guarantee you "connect safely."
Therefore, hardcode these rules (sketched in code after the list):
- Default DENY for all tools, allowing passage only via an allowlist.
- High-risk tools require stronger isolation (containers/read-only mounts/network isolation), and demand human approval or secondary verification.
- All critical actions must be written to the audit chain, otherwise you cannot conduct post-mortems or hold the system accountable.
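What "hardcode these rules" can look like as a policy registry; the tool names and sandbox tiers are illustrative:

from dataclasses import dataclass

@dataclass(frozen=True)
class ToolPolicy:
    sandbox: str          # e.g. "container", "read-only-mount", "none"
    needs_approval: bool  # human approval or secondary verification before execution
    audited: bool = True  # every critical action is written to the audit chain

# Default DENY: a tool absent from this dict simply does not exist for the runtime
POLICIES = {
    "ls":         ToolPolicy(sandbox="read-only-mount", needs_approval=False),
    "git_commit": ToolPolicy(sandbox="container", needs_approval=False),
    "rm":         ToolPolicy(sandbox="container", needs_approval=True),
}

def authorize(tool: str) -> ToolPolicy:
    policy = POLICIES.get(tool)
    if policy is None:
        raise PermissionError(f"tool '{tool}' is not on the allowlist")  # fail closed
    return policy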
8. Degradation Decision Matrix: When to Swap Models, When to Call Humans
Strategic fallback is not "swapping brains the moment you hit an error." It must be an interpretable, auditable decision.
| Scenario | Phenomenon | Recommended Action | Key Constraint |
|---|---|---|---|
| Parse Failure | Half-JSON / Unclosed structure | Wait for stop event before parsing, or demand LLM resend structured action | MUST NOT execute side effects |
| Schema Failure | Missing fields/Wrong types | Inject validation error exactly as observation | Max retry limit |
| Repeated Failure | Same tool fails 3 times | Trigger circuit breaker, switch to read-only diagnostics (shadow mode) | Prohibit further writes |
| High-Risk Action | rm/push/charge money | Request human approval or secondary verification (critic) | MUST HAVE Idempotency key |
| Insufficient Model Capability | Repeatedly misunderstands protocol/fields | Swap model or swap to specialized agent (handoff) | Record handoff/audit |
The point of this matrix is so that every degradation can explicitly state "Why" in the audit log, rather than relying on mysticism.
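The matrix translates almost mechanically into a dispatch function, which is precisely what makes every degradation explainable in the audit log; the failure-class strings are illustrative:

def decide_degradation(failure: str, consecutive: int) -> str:
    """Maps a failure class to an action; the returned string doubles as the audit-log reason."""
    if failure == "parse":
        return "resend_structured_action"   # never execute side effects on half-JSON
    if failure == "schema" and consecutive < 3:
        return "inject_validation_error"    # feed the exact error back as an observation
    if failure in ("schema", "repeated"):
        return "circuit_break_shadow_mode"  # 3+ failures: read-only diagnostics only
    if failure == "high_risk":
        return "request_human_approval"     # plus idempotency key
    return "handoff_to_other_model"         # record the handoff in the audit log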
9. Shadow Mode: Observe First, Execute Later
For complex systems, the most effective degradation isn't "stopping," but "running in the shadows first."
The Shadow Mode approach is:
- Tool calls still go through parse/schema/allowlist, but all tools generating side effects are replaced with "simulated execution."
- Write the simulated output alongside the read-only check results of the real environment into the trace/span.
- Only when N consecutive shadow steps align with read-only verifications is real execution allowed to resume (gradual rollout).
Its engineering value is this: When you suspect the model is hallucinating, you can continue gathering evidence and narrowing down the problem scope, all while avoiding incidents.
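A minimal shadow-mode wrapper under those assumptions; real_executor and the side-effect tool set are placeholders:

SIDE_EFFECT_TOOLS = {"rm", "push", "git_commit"}  # tools replaced by simulated execution

def shadow_execute(tool: str, args: dict, real_executor, trace: list) -> str:
    """Validated calls still flow, but side effects are simulated and recorded in the trace."""
    is_shadow = tool in SIDE_EFFECT_TOOLS
    if is_shadow:
        result = f"[SHADOW] would execute {tool} with {args}"  # simulated output
    else:
        result = real_executor(tool, args)  # read-only tools still probe the real environment
    trace.append({"tool": tool, "args": args, "result": result, "shadow": is_shadow})
    return result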
Chapter Summary
Abandon the illusion of "Omnipotent AI." An application-level Agent that can run independently in the background for half an hour without crashing is an iceberg built on hundreds of thousands of densely packed schema validations, exception injections, and hard interceptions.
AI is responsible for performing magic; as the Architect, your responsibility is to grip its wire harness tightly in the background.
In the next chapter, we officially step out of the "Cognition" section and enter the physical carrier of the Agent—[Memory Persistence: Implementing a Dynamic Hippocampus Based on SQLite and Concurrent Locks]. We are about to begin constructing long-term memory storage for our Agent!
(End of text - Deep Dive Series 08 / Approx. 1600 words)
(Note: It is highly recommended to integrate this chapter's handle_hallucination mechanism into your Agent base class; it can increase task success rates by over 40%.)
Reference Materials (For Verification)
- Guardrails (Agents SDK): https://openai.github.io/openai-agents-python/guardrails/
- Anthropic streaming messages: https://docs.anthropic.com/claude/reference/messages-streaming
- MCP base protocol: https://modelcontextprotocol.io/specification/2025-11-25/basic
- MCP Safety Audit: https://arxiv.org/abs/2504.03767