Salvaging from the Quagmire: Raw Text Tool Parsers and Fallback Strategies
(Article 50: The Resilience of Agent Protocols)
In the previous article, we discussed how silky smooth schema validation becomes when it is backed by top-tier constrained-decoding APIs. However, in the cruel reality of actual agent deployment (especially when clients need to run local open-source models like Llama 3 or Qwen 2 offline on an intranet), models usually do not natively support OpenAI-style hard interception for `tool_calls`.
At this point, we can only let the large model output raw text within a standard conversational flow. This demands the most hardcore craft of a geek: "text salvaging surgery", precisely severing machine instructions from a pile of gibberish.
0. First, Put the Parser Back into a Secure Context: External Text is Untrusted by Default
Before discussing "how to parse," we must first clarify the threat model. The moment you allow raw text to carry "tool instructions," you must acknowledge:
- The model's output is not a trusted boundary.
- Observations (web pages, RAG, logs) are not trusted boundaries.
- Successful parsing merely means "extraction," not "authorization."
Indirect Prompt Injection (IPI) is not a theoretical problem; it has already appeared in real-world retrieval and tool-output chains, and it disguises "contextual content" as "high-priority instructions."
As long as you allow raw text to contain "tool instructions," you must accept two facts:
- Raw text may come from untrusted data (user input, web pages, RAG results).
- Untrusted data may carry indirect prompt injections.
Therefore, the parser must obey two hard rules:
- Successful parsing does not equal executability: The parser's output is still untrusted input; it must go through an allowlist/permissions/auditing check before execution.
- Data blocks must be isolated: Retrieval/web snippets must be wrapped in isolation tags, and their source and timestamp must be recorded in the audit logs.
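The second rule can be sketched in a few lines. A minimal wrapper, assuming a hypothetical `<untrusted_data>` tag name and a dict-based audit record (both are illustrative conventions, not a fixed standard):

```python
import hashlib
import time

def wrap_untrusted(text: str, source: str) -> dict:
    """Wrap a retrieved snippet in isolation tags and record its provenance.

    The <untrusted_data> tag name is an assumption for illustration; use
    whatever delimiter your system prompt declares as "data, not instructions".
    """
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]
    audit_record = {
        "source": source,           # where the snippet came from
        "timestamp": time.time(),   # when it entered the context
        "content_hash": digest,     # for later audit / replay
    }
    wrapped = f'<untrusted_data source="{source}">\n{text}\n</untrusted_data>'
    return {"wrapped": wrapped, "audit": audit_record}
```

The wrapped string goes into the model context; the audit record goes into your logs, so any later "why did the agent do that" question can be traced back to a concrete snippet.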
1. Native Convention Tag Patterns: The Engineering Battle Between XML and JSON
When a large model lacks native API support for tool calls, there are currently two of the most robust protocol schools in the industry: one is the Markdown JSON block protocol, and the other is the XML tag protocol, highly revered by Anthropic and the large-scale code-agent community.
1.0 Conclusion First: Protocols are Not Aesthetics, They are Failure Mode Management
When choosing a protocol, what you really need to compare is:
- Can you still salvage a complete instruction during a "half-baked output"?
- Can you reliably slice out the boundaries from the noise when it's "mixed with gibberish"?
- Will you self-destruct when "parameters contain code/quotes/backticks"?
Being "more elegant" doesn't matter. What matters is: will the system enter a retry storm when parsing fails?
1.1 Why is XML Better Suited for "Salvaging" Than JSON?
JSON parsing is "all or nothing": a single missing comma will cause `json.loads()` to throw an exception. XML (or custom tags), on the other hand, possesses extremely strong local recognizability.
Core Advantages of XML:
- Starting Atomicity: Seeing `<tool_call>` immediately indicates the action has begun, regardless of the preceding gibberish.
- Fault Tolerance Boundary: Even if the `args` internally contain incredibly complex special characters (like quotes in a shell script), as long as we match `</tool_call>` via a non-greedy regex, we can close the tag.
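To see the non-greedy boundary in action, here is a tiny sketch (the `run_shell` call and the surrounding chatter are made up for illustration):

```python
import re

# Non-greedy (.*?) with re.DOTALL: stop at the FIRST closing tag,
# no matter what quotes, braces, or newlines appear inside <args>.
pattern = re.compile(
    r"<tool_call>\s*<name>(.*?)</name>\s*<args>(.*?)</args>\s*</tool_call>",
    re.DOTALL,
)

noisy = (
    "Sure, let me run that for you!\n"
    "<tool_call><name>run_shell</name>"
    "<args>echo \"hello {world}\" && cat 'notes.md'</args>"
    "</tool_call>\n"
    "Hope that helps."
)

m = pattern.search(noisy)
print(m.group(1))  # run_shell
print(m.group(2))  # the shell snippet survives, quotes and all
```

A strict JSON parser would have to escape every quote in that shell snippet; the tag pair simply treats it as opaque text.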
1.2 Protocol Selection Matrix: When to Use Which
| Goal | XML Tag | Markdown ```json code block | Notes |
|---|---|---|---|
| Salvageable during partial streaming output | Strong | Weak | JSON dies completely missing one bracket; XML can be "closed and patched" |
| Chaining multiple tool calls | Medium | Strong | JSON arrays are more natural but require strict syntax |
| Parameters containing code snippets | Strong | Medium | XML can treat args as pure text; JSON requires escaping |
| Compatibility across different models | Strong | Medium | Using self-explanatory XML tags is often more stable |
| Security isolation (Data vs Instructions) | Strong | Medium | XML makes it easier to "tag data blocks," reducing misparsing risks |
2. Violent Aesthetics: Implementation of the Fuzzy Tool Parser
When the Agent engine receives a "quagmire of text" containing wordy gibberish, explanatory information, and tool tags, we must use multi-level regex to dismantle it.
2.1 [Core Code] A Regex Extractor with Self-Healing Capabilities
Here is a Python implementation capable of violently salvaging instructions from both Markdown and XML:
```python
import re
import json


class FuzzyToolParser:
    """
    A text parser that ignores gibberish and only looks for instructions.
    It not only uses regex to find tags but is also responsible for
    patching fragmented JSON strings.
    """

    def __init__(self):
        # Compatible with both XML tags and Markdown ```json code blocks
        self.xml_pattern = re.compile(
            r"<tool_call>\s*<name>(.*?)</name>\s*<args>(.*?)</args>\s*</tool_call>",
            re.DOTALL,
        )
        self.md_json_pattern = re.compile(r"```json\s*(\{.*?\})\s*```", re.DOTALL)

    def extract(self, text: str) -> list:
        tool_calls = []
        # 1. Attempt to salvage from XML tags
        for match in self.xml_pattern.finditer(text):
            tool_calls.append({
                "name": match.group(1).strip(),
                "args": self._lenient_json_parse(match.group(2).strip()),
            })
        # 2. If nothing is found in XML, fall back to Markdown code blocks
        if not tool_calls:
            for match in self.md_json_pattern.finditer(text):
                tool_calls.append(self._lenient_json_parse(match.group(1)))
        return tool_calls

    def _lenient_json_parse(self, raw_str: str):
        """
        Lenient parser:
        1. Attempt direct parsing.
        2. If it fails, attempt to patch the trailing curly brace.
        3. If it still fails, wrap the text as-is so nothing is silently lost.
        """
        try:
            return json.loads(raw_str)
        except json.JSONDecodeError:
            # Violent patching: recover from truncated model output
            fixed_str = raw_str.strip()
            if not fixed_str.endswith("}"):
                fixed_str += "}"
            try:
                return json.loads(fixed_str)
            except json.JSONDecodeError:  # never use a bare except here
                return {"raw_text_args": raw_str}
```
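To see the patching behavior of `_lenient_json_parse` in isolation, here is the same strategy as a standalone function (a sketch duplicating the method above purely for demonstration):

```python
import json

def lenient_json_parse(raw_str: str):
    """Standalone copy of the lenient strategy used in FuzzyToolParser."""
    try:
        return json.loads(raw_str)
    except json.JSONDecodeError:
        fixed = raw_str.strip()
        if not fixed.endswith("}"):
            fixed += "}"
        try:
            return json.loads(fixed)
        except json.JSONDecodeError:
            return {"raw_text_args": raw_str}

# A truncated args payload missing its closing brace gets patched:
print(lenient_json_parse('{"path": "/tmp"'))  # {'path': '/tmp'}

# Hopelessly broken input degrades to plain text instead of raising:
print(lenient_json_parse('path=/tmp'))  # {'raw_text_args': 'path=/tmp'}
```

Note the last case: the parser never throws; worst case, the downstream security layer sees a clearly labeled `raw_text_args` blob and can refuse to execute it.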
2.2 Failure Modes and Governance Points: Parsing Failures Turn Into Retry Storms
| Failure Mode | Trigger | Consequence | Governance Point |
|---|---|---|---|
| Parsing Failure | Half-baked tags / half-baked JSON | Retry storm | Commit boundaries + Upper limits |
| Misparsing | Treating data as instructions | Privilege escalation side effects | Data isolation + deny-by-default |
| Output Explosion | Massive volume of raw text | Timeout / Memory leak | Truncation + hashing |
| Lack of Auditing | No source fields | Inability to post-mortem | Observability + Auditing |
3. "Limb Completion" Logic in Stream Truncation
If you are dealing with truncation in ultra-fast streaming output, the situation gets even worse. When the network suddenly drops or the token limit is reached, you might only receive:

```
<tool_call><name>run_shell</name><args>rm -rf /
```
3.1 Industrial-Grade Patching is a State Machine
A true industrial-grade implementation isn't "writing a few more regexes"; it is a state machine:
- Enter capturing state: on seeing `<tool_call>`.
- Enter field state: captures `<name>` and `<args>`.
- Enter closing state: only commits upon seeing `</tool_call>`.
- Stream ends while still in capturing state: triggers the "patching strategy," but MUST degrade to read-only mode.
The core principle of state machine patching is: It is better to execute less than to execute by mistake. The patched instruction must pass through the allowlist and permission checks again at the security layer.
In implementation, a daemon coroutine can watch the stream: when the stream ends but `is_capturing` is still True, it attempts to inject `</args></tool_call>` to complete structural closure. However, the output of this closure may only enter shadow mode for read-only tools; direct entry into write-type tool paths is strictly prohibited.
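A minimal sketch of this state machine, written synchronously rather than as a coroutine for brevity (the class and field names are illustrative):

```python
import re

class StreamingToolStateMachine:
    """Commit a tool call only on a full </tool_call>; otherwise degrade.

    Field extraction is simplified to a single regex pass over the
    buffered capture; a production parser would track fields incrementally.
    """

    def __init__(self):
        self.buffer = ""
        self.is_capturing = False
        self.committed = []  # fully closed calls, eligible for execution
        self.shadow = []     # patched calls: read-only / shadow mode only

    def feed(self, chunk: str):
        self.buffer += chunk
        if "<tool_call>" in self.buffer:
            self.is_capturing = True
        # Commit every fully closed call as soon as its closing tag arrives.
        while "</tool_call>" in self.buffer:
            call, _, self.buffer = self.buffer.partition("</tool_call>")
            self._commit(call + "</tool_call>", trusted=True)
            self.is_capturing = "<tool_call>" in self.buffer

    def end_of_stream(self):
        if self.is_capturing:
            # Patching strategy: close the structure, but mark it shadow-only.
            self._commit(self.buffer + "</args></tool_call>", trusted=False)
            self.is_capturing = False

    def _commit(self, raw: str, trusted: bool):
        m = re.search(r"<name>(.*?)</name>\s*<args>(.*?)</args>", raw, re.DOTALL)
        if not m:
            return  # better to execute less than to execute by mistake
        call = {"name": m.group(1).strip(), "args": m.group(2).strip()}
        (self.committed if trusted else self.shadow).append(call)
```

In this sketch, the truncated `run_shell` example above would land in `shadow`, never in `committed`, so the write path never sees it.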
3.2 Degradation Paths: How to Advance the Task When Parsing Fails
When parsing fails continuously, the system should not idle, and it certainly shouldn't proceed with write-type tools. Recommended minimum degradation strategy:
- Enter shadow mode: Only allow read-only tools (ls/cat/grep/status).
- Inject the parse error and the expected format as a one-shot example back into the model.
- Exceeding a threshold triggers a circuit breaker, requesting human intervention or handing off to a dedicated agent.
All three of these actions must be written to the audit chain (observation/auditing); otherwise, you will not be able to review why it "suddenly stopped executing."
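The three-step strategy above can be sketched as a small policy object (the threshold and the read-only tool list are illustrative assumptions, not fixed values):

```python
from dataclasses import dataclass, field

READ_ONLY_TOOLS = {"ls", "cat", "grep", "status"}  # shadow-mode allowlist

@dataclass
class DegradationPolicy:
    """Minimal sketch of shadow mode + circuit breaker + audit chain."""
    failure_threshold: int = 3
    parse_failures: int = 0
    audit_log: list = field(default_factory=list)

    def on_parse_failure(self, error: str) -> str:
        self.parse_failures += 1
        self.audit_log.append({"event": "parse_failure", "error": error})
        if self.parse_failures >= self.failure_threshold:
            # Circuit breaker: hand off to a human or a dedicated agent.
            self.audit_log.append({"event": "circuit_breaker"})
            return "circuit_breaker"
        return "shadow_mode"  # retry, but with read-only tools only

    def is_tool_allowed(self, tool_name: str) -> bool:
        # In shadow mode only read-only tools pass; write tools deny-by-default.
        if self.parse_failures > 0:
            return tool_name in READ_ONLY_TOOLS
        return True
```

Every decision appends to `audit_log`, which is exactly what lets you later reconstruct why the agent "suddenly stopped executing."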
3.3 Input Validation and Guardrails: There Must Be a Second Door Behind the Parser
The parser solves the problem of "picking structure out of text." The security layer solves the problem of "whether this structure can be executed."
It is recommended to have at least three lines of defense:
- Allowlist: Only allow a small subset of tool names (write-type tools disabled by default).
- Parameter validation: Enforce length/character set/dangerous pattern constraints on paths, URLs, and commands.
- Auditing and replay: Record the source and context hash of every tool call to facilitate accountability and regression testing.
Such "input validation and guardrails" are explicitly recommended for defense in depth in agentic systems.
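The first two lines of defense can be sketched as a single gate function (the allowlist contents and dangerous patterns below are illustrative, not a complete rule set):

```python
import re

ALLOWED_TOOLS = {"list_dir", "read_file", "grep"}  # write tools off by default
DANGEROUS_PATTERNS = [r"rm\s+-rf", r"\.\./", r";", r"\|\s*sh\b"]
MAX_ARG_LENGTH = 4096

def second_door(call: dict) -> tuple:
    """Allowlist + parameter validation behind the parser.

    Returns (allowed, reason). Successful parsing got the call this far;
    this gate decides whether it may actually execute.
    """
    name = call.get("name", "")
    args = str(call.get("args", ""))
    if name not in ALLOWED_TOOLS:
        return False, f"tool '{name}' not in allowlist"
    if len(args) > MAX_ARG_LENGTH:
        return False, "args exceed length limit"
    for pat in DANGEROUS_PATTERNS:
        if re.search(pat, args):
            return False, f"args match dangerous pattern {pat!r}"
    return True, "ok"
```

The third line of defense, auditing and replay, is the same audit-chain idea as in the degradation section: log the source and context hash of every call that passes this gate.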
4. Error Correction Loop: How Do You Educate a "Disobedient" Model?
If the parser fails completely (e.g., the model outputs a string of unrecognizable gibberish), you absolutely cannot let the program idle.
The Geek's Fallback Strategy:
- Inject a Punishing Observation: `[PARSE_ERROR]: I completely failed to understand the format you just output. Please re-check the format requirements in the System Prompt and output strictly according to the <tool_call> tags!`
- Forced Degradation: If parsing fails 2 consecutive times, the system should actively modify the current System Message, adding an extremely simple one-shot example, forcibly pulling the model's probability distribution back on track.

Furthermore: when the system enters a continuous failure state, the Runner should hard-lock write-type tools, allowing only read-only tools to gather evidence, until parsing and validation stably recover.
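The loop can be sketched as follows, assuming OpenAI-style role/content message dicts; `build_correction_messages` and the one-shot example text are hypothetical, for illustration only:

```python
def build_correction_messages(history: list, failure_count: int) -> list:
    """Inject a punishing observation and, on repeated failure, a one-shot example.

    The parse error is injected as a user-role observation here for
    simplicity; some stacks use a dedicated tool/observation role instead.
    """
    messages = list(history)
    messages.append({
        "role": "user",
        "content": "[PARSE_ERROR]: I completely failed to understand the format "
                   "you just output. Re-check the format requirements and output "
                   "strictly according to the <tool_call> tags.",
    })
    if failure_count >= 2:
        # Forced degradation: prepend a dead-simple one-shot example to pull
        # the model's probability distribution back on track.
        one_shot = (
            "\n\nExample of the ONLY acceptable format:\n"
            "<tool_call><name>list_dir</name>"
            '<args>{"path": "."}</args></tool_call>'
        )
        messages.insert(0, {
            "role": "system",
            "content": "You are a tool-calling agent." + one_shot,
        })
    return messages
```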
Chapter Summary
- Text is Communication: Don't blindly trust SDKs. At the Agent level, network transmission is merely a byte stream; you must possess the ability to dig logic out of a quagmire of bytes.
- Robustness Over Elegance: In offline small-model scenarios, a parser covered in regexes and seemingly "inelegant" can often save your Agent's life.
- XML is an Excellent Relay Format: Especially when handling parameters containing code snippets.
Having mastered the art of text salvaging, your Agent no longer relies on top-tier commercial APIs. It can run robustly on an old laptop with 5GB of VRAM, using an open-source Llama 3 to execute code and manage files.
In the next chapter, we will completely rip off the operating system's veil: [Terminal Hijacking and PTY: How Does an Agent Disguise Itself as a Human to Take Over Your iTerm2?]. We are going to start writing C code to control pseudo-terminals!
(End of text - Deep Dive Series 16 / Approx. 1600 words)
(Note: It is recommended to use the FuzzyToolParser from this chapter in conjunction with a pre-commit hook to periodically scan extreme text cases in your test suite.)
Reference Materials (For Verification)
- Anthropic streaming messages: https://docs.anthropic.com/claude/reference/messages-streaming
- IPI in the wild: https://arxiv.org/abs/2601.07072
- Prompt injection best practices (AWS): https://docs.aws.amazon.com/pdfs/prescriptive-guidance/latest/llm-prompt-engineering-best-practices/llm-prompt-engineering-best-practices.pdf