Performing Surgery at the Molecular Level: Closed-Loop Modifications with AST and Semantic Operations
(Article 56: Agent Dynamics - The AST Engine)
Following the pain points mentioned in the previous chapter:
We can never rely on a Large Language Model, with its probabilistic deviations, to stably output correct patches, ones that can be applied, rolled back, and audited, in a 5,000-line mixed front-end .tsx file.
If you want an Agent to evolve from "occasionally getting it right" to "engineering-ready," you must abandon the obsession with strings and move up to semantic-level editing on the Abstract Syntax Tree (AST). More importantly, you cannot just preach the benefits of the AST; you must ground AST editing in an executable engineering contract, so that every modification enters a closed loop: locatable for validation, transactional in application, degradable upon failure, and auditable in its results.
1. Code is Not Text: Why "String Patches" Are Doomed to Fail in Engineering
To humans, code is paragraphs of text in index.js.
To Large Language Models, code is a probability sequence of BPE tokens.
But to the compiler, code is a strict structure:
It has scopes, symbol bindings, syntax trees, and semantic constraints.
The reason string patches fail in engineering is not because "the LLM isn't smart enough," but because "strings lack boundaries."
When you say, "Rename calculate to calculateTotal," sed does not know which occurrence is a symbol, which is a comment, which is a string literal, and which is a same-named local variable in another scope. It can only replace blindly.
The value of an AST is:
It projects "text" onto "structure," allowing you to modify just a single Identifier node while being naturally immune to comments and string literals.
However, in an Agent system, what is even more crucial is turning "where to modify" into a verifiable fact. Only through verifiability can you achieve a closed loop.
2. AST Does Not Equal Compiler: CST, Incremental Parsing, and Where Tree-sitter Fits In
In multi-language code manipulation by Agents, Tree-sitter is a practical path: It is fast enough, supports incremental updates, has broad language coverage, and provides a query language allowing you to "find nodes by structure."
Let's nail down the concepts here:
- AST (Abstract Syntax Tree) emphasizes abstract semantics.
- CST (Concrete Syntax Tree) emphasizes "what characters you exactly typed."
- Tree-sitter is closer to CST. It needs to "update the tree with every keystroke" in editors, so its core capabilities are incremental parsing and fault tolerance/error recovery.
This is highly important for Agents because, in real-world engineering, code is often in a broken, "half-modified" state: Missing a parenthesis or an unclosed string. You still need to locate the modifiable structural fragments as accurately as possible, and then approach correctness in the verification loop.
3. The Core of Semantic Operations: Selectors Instead of Line Numbers
"Line numbers" are a concept for human UIs, not a reliable means for engineering localization. The reason is simple: Formatters insert line breaks, import sorting injects lines, merge conflicts compress them. Even if you only modify a space at the beginning of a file, all subsequent line numbers will drift.
For an Agent to stably modify code, you need a deterministic address system. I call it a Selector.
A reliable Selector must meet at least these criteria:
- Relocatable: Insertions/deletions beforehand do not affect target localization.
- Verifiable: It can prove "the correct node was found."
- Degradable: When a node cannot be found, it can fall back to a coarser localization granularity.
In the Tree-sitter ecosystem, Selectors typically have three variants:
- Query selector: Matches structure using queries (most recommended).
- Path selector: Locates via tree paths (prone to drift).
- Anchor selector: Content anchor + hash (suitable for degradation).
Below is an actionable "Query selector" example. Note that I am deliberately not writing it as a bound "single correct code," but rather emphasizing the contract and risk points.
```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NodeSlice:
    # Note: byte offsets are used here, not line numbers
    start_byte: int
    end_byte: int
    node_type: str
    node_text_sha256: str


class SemanticLocator:
    """
    Semantic locator (read-only layer).
    Goal: transform the intent of "which symbol do I want to modify"
    into a verifiable NodeSlice.
    """

    def locate_function_by_name(
        self, source_bytes: bytes, function_name: str
    ) -> Optional[NodeSlice]:
        # Pseudocode: parse, query, read the node's start_byte/end_byte.
        # Key points:
        # 1) The query match must be specific enough to avoid homonym ambiguity.
        # 2) The output slice carries the hash of the node text for anti-drift verification.
        # 3) Broken nodes (error/missing) must be explicitly flagged, so they
        #    never enter the write path directly.
        raise NotImplementedError
```
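For concreteness, below is one possible shape of this locator. It is a minimal sketch assuming the py-tree-sitter bindings plus the tree_sitter_python grammar package; the binding API has shifted across versions, so treat the parser setup line as illustrative. It walks the tree directly instead of using the query language, which keeps the sketch independent of query-API differences while still honoring the three key points above:

```python
import hashlib
from typing import Optional

import tree_sitter_python  # assumption: the grammar package is installed
from tree_sitter import Language, Parser


class TreeSitterLocator(SemanticLocator):
    def locate_function_by_name(
        self, source_bytes: bytes, function_name: str
    ) -> Optional[NodeSlice]:
        parser = Parser(Language(tree_sitter_python.language()))  # py-tree-sitter >= 0.22 style
        tree = parser.parse(source_bytes)
        wanted = function_name.encode("utf-8")

        matches = []
        stack = [tree.root_node]
        while stack:
            node = stack.pop()
            if node.type == "function_definition":
                name = node.child_by_field_name("name")
                if name is not None and source_bytes[name.start_byte:name.end_byte] == wanted:
                    matches.append(node)
            stack.extend(node.children)

        # Key point 1: refuse ambiguity instead of guessing among homonyms.
        if len(matches) != 1:
            return None
        node = matches[0]
        # Key point 3: nodes inside broken regions must not reach the write path.
        if node.has_error:
            return None
        # Key point 2: hash the node text so the write path can verify against drift.
        text = source_bytes[node.start_byte:node.end_byte]
        return NodeSlice(node.start_byte, node.end_byte, node.type,
                         hashlib.sha256(text).hexdigest())
```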
4. Why Byte Offsets Are More Stable, Yet More Dangerous
Locators exposed by parsers like Tree-sitter are often byte offsets. This is more stable than line numbers because it corresponds directly to the underlying byte sequence.
However, byte offsets have a very practical pitfall: You must guarantee that "you are replacing the exact same byte content." Otherwise, the offset will slice into the wrong target.
You must implement three engineering gates:
- Unified encoding: Use UTF-8 for both read and write (and verify no BOM / lossless round-trip before writing back).
- Content verification: Before applying a patch, compare the hash of the target slice (refuse to write if inconsistent).
- Transactional boundaries: Bind "localization + application" to the same source snapshot. Cross-version offset reuse is strictly forbidden.
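A minimal sketch of gates 1 and 2, reusing the NodeSlice above (gate 3 is a discipline the caller enforces: the slice and the write must see the exact same bytes). The function names are illustrative, not a standard API:

```python
import hashlib


def read_snapshot(path: str) -> bytes:
    """Gate 1: unified encoding. Require clean UTF-8 with no BOM before editing."""
    data = open(path, "rb").read()
    if data.startswith(b"\xef\xbb\xbf"):
        raise ValueError("BOM detected; normalize the file before editing")
    data.decode("utf-8")  # raises UnicodeDecodeError on non-UTF-8 content
    return data


def apply_slice(source_bytes: bytes, slice_: "NodeSlice", new_text: bytes) -> bytes:
    """Gate 2: content verification. Only apply on the snapshot the slice was located on."""
    target = source_bytes[slice_.start_byte:slice_.end_byte]
    if hashlib.sha256(target).hexdigest() != slice_.node_text_sha256:
        # The bytes under these offsets are not what the locator saw: refuse to write.
        raise ValueError("slice hash mismatch: snapshot changed since location")
    return source_bytes[:slice_.start_byte] + new_text + source_bytes[slice_.end_byte:]
```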
5. Semantic Operations: Let LLMs Output "Intent", Let the Runtime Output "Patches"
The design principle for top-tier Agent toolchains is: LLMs make high-level decisions, local engines execute low-level tasks, and the engine must produce verifiable evidence.
By getting the tool interface right, you'll find the LLM's error rate drops significantly because it no longer needs to "guess the formatting" and "guess the position."
Comparing two generations of tool interfaces:
- The String Era: replace_text(file, old, new)
- The Semantic Era: rename_symbol(selector, new_name, constraints)
The minimal contract in the semantic era usually looks like this:
- selector: Tells the system "which structural node to modify."
- constraints: Tells the system "what the allowed context is" (e.g., must be within a certain class, must match the parameter list).
- expected_effect: Tells the system "what diagnostic change I expect" (e.g., the number of compilation errors decreases, the reference count remains identical).
The essence of doing this is making modifications testable; otherwise, you can only rely on "it looks right."
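Written down as data, the contract might look like the following sketch; the field names mirror the list above, but everything else (the payload shape, the example values) is an illustrative assumption rather than a standard API:

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SemanticEdit:
    selector: str   # which structural node to modify, e.g. a Tree-sitter query
    operation: str  # the high-level intent, e.g. "rename_symbol"
    payload: dict   # operation arguments, e.g. {"new_name": "calculateTotal"}
    # Allowed context: refuse to apply outside of it.
    constraints: dict = field(default_factory=dict)      # e.g. {"enclosing_class": "Cart"}
    # Expected diagnostic change, checked in the verify step.
    expected_effect: dict = field(default_factory=dict)  # e.g. {"type_errors": "non_increasing"}
```

The LLM emits nothing but this record; the local engine turns it into bytes.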
6. 'Apply' is a Surgery: Transactions, Idempotency, Concurrency Conflicts, and Rollbacks
In engineering, "finding the node" is only the beginning. The truly hard part is "how to change it." You must assume that all steps can fail and design controllable consequences for failure.
6.1 Transactional Apply
A semantic modification requires at least 4 steps:
- parse: Obtain the syntax tree and node slices.
- plan: Generate modification intent (LLM output).
- apply: Transform the intent into a patch on the same snapshot and apply it.
- verify: Run formatting, type checking, unit tests, or minimal linting.
These 4 steps must be written to an audit log, keyed by a transaction id, to ensure traceability: you need to know which change caused which diagnostic shift.
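A sketch of the four steps wired into one transaction with an audit trail, building on read_snapshot/apply_slice and the SemanticEdit contract above. The locate, plan_patch, and verify callables are hypothetical stand-ins for the locator, the LLM planning call, and the verification chain of Section 7, and the JSONL log format is just one convenient choice:

```python
import json
import time
import uuid
from typing import Callable, Optional


def run_transaction(
    path: str,
    edit: "SemanticEdit",
    locate: Callable[[bytes, str], Optional["NodeSlice"]],
    plan_patch: Callable[[bytes, "NodeSlice", "SemanticEdit"], bytes],
    verify: Callable[[bytes, dict], bool],
    audit_log: str = "edits.log.jsonl",
) -> bool:
    txn_id = str(uuid.uuid4())
    steps: list[dict] = []

    def record(step: str, ok: bool) -> None:
        steps.append({"txn": txn_id, "step": step, "ok": ok, "ts": time.time()})

    try:
        snapshot = read_snapshot(path)              # one snapshot for the whole transaction
        slice_ = locate(snapshot, edit.selector)    # 1) parse
        record("parse", slice_ is not None)
        if slice_ is None:
            return False
        patch = plan_patch(snapshot, slice_, edit)  # 2) plan (LLM intent -> concrete bytes)
        record("plan", True)
        new_bytes = apply_slice(snapshot, slice_, patch)  # 3) apply, on the SAME snapshot
        record("apply", True)
        ok = verify(new_bytes, edit.expected_effect)      # 4) verify
        record("verify", ok)
        return ok
    finally:
        with open(audit_log, "a", encoding="utf-8") as f:
            for s in steps:  # every step stays traceable by transaction id
                f.write(json.dumps(s) + "\n")
```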
6.2 Idempotency (Repeated execution should not produce extra side effects)
The most common engineering accident in Agent systems is not "modifying it wrong," but "retrying and breaking what was already correct."
Therefore, write-based tools must have idempotency checks:
- If the target node already meets expectations, direct no-op.
- If the target node no longer exists, direct failure and enter fallback localization.
- If the target node's content hash is inconsistent, refuse the write (preventing drift collateral damage).
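These three checks fit in a few lines; a sketch reusing the NodeSlice above (the "node no longer exists" case surfaces earlier, when the locator returns None and the caller enters fallback localization):

```python
import hashlib


class SliceDrift(Exception):
    """The slice content no longer matches its recorded hash."""


def idempotent_apply(source_bytes: bytes, slice_: "NodeSlice", new_text: bytes) -> bytes:
    target = source_bytes[slice_.start_byte:slice_.end_byte]
    if target == new_text:
        return source_bytes  # already meets expectations: no-op, safe under retries
    if hashlib.sha256(target).hexdigest() != slice_.node_text_sha256:
        # Content drifted since location: refuse the write, no collateral damage.
        raise SliceDrift(f"hash mismatch at [{slice_.start_byte}, {slice_.end_byte})")
    return source_bytes[:slice_.start_byte] + new_text + source_bytes[slice_.end_byte:]
```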
6.3 Concurrency Conflicts (Multi-point rewriting)
A complex task often requires multiple patches. You have two strategies:
- Sequential: Re-parse after each patch (stable but slow).
- Batching: Perform multiple edits on the same tree (fast but prone to conflict).
Batching requires extra conflict detection:
The byte ranges of two slices cannot overlap, and the application order must be fixed (e.g., ascending by start_byte); otherwise, offsets will drift after being rewritten by a preceding edit.
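A sketch of batching with both checks, again over NodeSlice. Because the output buffer is rebuilt from the original bytes left to right, the original offsets never need adjusting:

```python
def apply_batch(source_bytes: bytes, edits: list[tuple["NodeSlice", bytes]]) -> bytes:
    # Fixed application order: ascending by start_byte.
    edits = sorted(edits, key=lambda e: e[0].start_byte)
    # Conflict detection: adjacent byte ranges must not overlap.
    for (a, _), (b, _) in zip(edits, edits[1:]):
        if a.end_byte > b.start_byte:
            raise ValueError(
                f"overlapping edits: [{a.start_byte},{a.end_byte}) vs [{b.start_byte},{b.end_byte})"
            )
    # Rebuild the buffer by slicing the ORIGINAL bytes, never the partial result,
    # so no offset is ever measured against an already-rewritten buffer.
    out, cursor = [], 0
    for slice_, new_text in edits:
        out.append(source_bytes[cursor:slice_.start_byte])
        out.append(new_text)
        cursor = slice_.end_byte
    out.append(source_bytes[cursor:])
    return b"".join(out)
```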
6.4 Rollbacks (Reversibility)
The most rudimentary yet reliable rollback is: Save the old content entirely before writing, and restore it upon failure.
A more engineered approach: record the patch of every apply as a reversible diff, and expose rollback as a first-class tool, rather than hiding the rollback logic inside "exception handling."
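One minimal shape for such a reversible diff: keeping the old bytes inside the patch makes the inverse trivial, and lets rollback verify its own precondition before touching anything:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReversiblePatch:
    start_byte: int
    old_text: bytes  # what was there: enough to undo
    new_text: bytes  # what we wrote


def invert(p: ReversiblePatch) -> ReversiblePatch:
    # Rollback is just the inverse patch applied to the NEW buffer.
    return ReversiblePatch(p.start_byte, old_text=p.new_text, new_text=p.old_text)


def apply_patch(buf: bytes, p: ReversiblePatch) -> bytes:
    end = p.start_byte + len(p.old_text)
    if buf[p.start_byte:end] != p.old_text:
        raise ValueError("patch precondition failed; refusing to write")
    return buf[:p.start_byte] + p.new_text + buf[end:]


# Usage: rolled_back = apply_patch(new_buf, invert(patch))
```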
7. The Verify Loop: Turning "Self-Correction" into a Deterministic Process
Modification is not the end; verification is the gate of the "closed loop."
A minimal yet effective verification chain:
- Formatter: Format first (squeezing out meaningless whitespace differences).
- Typecheck / compile: Then run type checking or compilation.
- Lint: Finally, run lint (capturing finer constraints).
When verification fails, you shouldn't dump the entire log back to the model. Instead, inject the "minimum usable facts":
- Error codes and the most critical 20 lines of context (after stripping ANSI).
- Definitions of relevant symbols (can be extracted from AST/LSP).
- The intent and selector from the last apply (for model alignment).
This is the engineered version of Plan -> Act -> Verify -> Refine.
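A sketch of this chain as a runner; the three example commands (prettier --check, tsc --noEmit, eslint) are merely plausible gates for a TypeScript project, and the 20-line tail mirrors the list above:

```python
import re
import subprocess

ANSI = re.compile(rb"\x1b\[[0-9;]*m")


def run_gate(cmd: list[str]) -> tuple[bool, str]:
    proc = subprocess.run(cmd, capture_output=True)
    tail = ANSI.sub(b"", proc.stdout + proc.stderr).decode("utf-8", "replace")
    # Minimum usable facts: exit status plus the last ~20 lines of context.
    return proc.returncode == 0, "\n".join(tail.splitlines()[-20:])


def verify_chain(gates: list[list[str]]) -> tuple[bool, str]:
    # e.g. gates = [["prettier", "--check", "."], ["tsc", "--noEmit"], ["eslint", "."]]
    for cmd in gates:
        ok, facts = run_gate(cmd)
        if not ok:
            return False, facts  # feed only these facts back to the model
    return True, ""
```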
8. AST is Not a Panacea: Acknowledging Boundaries and Degradation Strategies
AST/Tree-sitter becomes significantly weaker in these scenarios:
- Preprocessor macros: The real semantics are not in the source text.
- Generated code: You are modifying the artifact, not the source.
- Mixed languages: e.g., template strings and CSS-in-JS embedded in .tsx.
- Broken syntax: Half-finished states mid-edit.
Therefore, your runtime must provide fallback paths:
- AST localization fails: Fallback to "content anchor + neighborhood slice."
- AST parsing is untrustworthy: Read-only extraction + human confirmation, writing disabled.
- Language server is more reliable: When available, use LSP's definition/references for cross-verification.
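As one shape of the first fallback, here is a sketch of the "content anchor + neighborhood slice" locator from Section 3: anchor uniqueness and a hashed neighborhood stand in for the structural guarantees the AST can no longer give. The function name and the radius default are assumptions:

```python
import hashlib
from typing import Optional


def fallback_anchor_locate(
    source_bytes: bytes, anchor: bytes, context_sha256: str, radius: int = 256
) -> Optional["NodeSlice"]:
    """Degraded locator: unique content anchor + hashed neighborhood, no parse tree needed."""
    first = source_bytes.find(anchor)
    if first == -1 or source_bytes.find(anchor, first + 1) != -1:
        return None  # missing or ambiguous anchor: refuse, never guess
    lo = max(0, first - radius)
    hi = min(len(source_bytes), first + len(anchor) + radius)
    if hashlib.sha256(source_bytes[lo:hi]).hexdigest() != context_sha256:
        return None  # neighborhood drifted: stay read-only, no writes
    return NodeSlice(first, first + len(anchor), "anchor",
                     hashlib.sha256(anchor).hexdigest())
```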
9. Dehydrating Context with AST: Compressing 10,000 Lines to 200 Without Losing Structure
When a file is massive, you shouldn't feed the full text to the model. You should use the AST to generate a skeletal view:
- Retain: Declarations, signatures, docstrings, imports, and exports of modules/classes/functions.
- Discard: Implementation bodies, using ... as placeholders.
- Extra exports: Symbol tables and reference graphs (optional).
The key to skeletal views is not saving tokens, but focusing the model's attention on structural decisions, rather than drowning in implementation details.
This strategy is especially effective in the "planning phase": The model first outputs modification intents based on the skeleton, and then the local engine precisely executes them.
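To keep the sketch self-contained, the following uses Python's standard ast module on Python source; in this article's multi-language setting the same projection would be driven by Tree-sitter queries. Decorators and nested definitions are ignored for brevity:

```python
import ast  # ast.unparse requires Python 3.9+


def skeletonize(source: str) -> str:
    """Project a module onto its skeleton: imports, signatures, first docstring lines."""
    out = []
    for node in ast.parse(source).body:
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            out.append(ast.unparse(node))  # retain imports
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            prefix = "async def" if isinstance(node, ast.AsyncFunctionDef) else "def"
            out.append(f"{prefix} {node.name}({ast.unparse(node.args)}):")
            doc = ast.get_docstring(node)
            if doc:
                out.append(f'    """{doc.splitlines()[0]}"""')  # retain the docstring head
            out.append("    ...")  # discard the implementation body
        elif isinstance(node, ast.ClassDef):
            bases = ", ".join(ast.unparse(b) for b in node.bases)
            out.append(f"class {node.name}({bases}):" if bases else f"class {node.name}:")
            out.append("    ...")  # method skeletons omitted here for brevity
    return "\n".join(out)
```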
Chapter Summary (The Hard Truths)
- Do not give the LLM line numbers; let it give you "intent + selector + constraints + expected effects."
- Byte offsets are stable, but hashes and transaction boundaries must be used to prevent drift collateral damage.
- Semantic modification is not "changing a piece of text," but a surgery: idempotency, concurrency, rollbacks, and auditing are mandatory.
- AST is powerful, but not a panacea; degradation paths and verification loops must be designed.
In the next chapter, we will fill in the other half of "semantic localization": In a massive workspace, how to narrow down the candidate scope to "small enough" using indexing and search, and then hand it over to AST/LSP for final deterministic localization.
(End of this article - In-Depth Analysis Series 22)