Decentralized Deliberation: P2P Consensus Protocols and Multi-Agent Debate (MAD)

What

In multi-agent systems, the term "P2P consensus" often conflates two entirely different concepts:

Distributed Consensus (Engineering Consistency): Ensuring multiple replica state machines agree on the same sequence of operations, such as the log replication and commit rules in Raft or PBFT.
Cognitive Consensus (Cognitive Consistency): Ensuring multiple reasoning agents reach a "convergence of opinion" on the same problem, such as Multi-Agent Debate (MAD) and vote aggregation.

The purpose of this article is to firmly establish the boundaries between these concepts:

When you need "state machine consensus" like Raft/PBFT.
When you need "cognitive consensus" like MAD.
How to combine both into a system that can run over the long term, which requires designing in timeouts, retries, idempotency, concurrency control, rollback, isolation, permissions, auditing, observability, and degradation from the very beginning.

Problem

The most common failure in multi-agent P2P collaboration is not "failing to reach consensus," but rather a "runaway consensus process":

Infinite Debate Loops: With no exit conditions, agents keep debating or retrying until the token budget is gone (retry storms).
Double-Write Side Effects: Multiple agents attempting write operations simultaneously without idempotency or commit logs, resulting in duplicate submissions (idempotency, concurrency).
State Tearing: Agent A believes a step is committed, while Agent B believes it is uncommitted. The resulting rollback and compensation become catastrophic (rollback, auditing).
Group Hallucinations: Multiple models reinforcing each other's incorrect conclusions, leading to groupthink (insufficient observability).
Attribution Failure: After an incident occurs, it is impossible to answer "who proposed, who approved, who executed, and who audited," making accountability and post-mortems impossible (auditing, observability).

Therefore, the core objective is not merely to "let everyone discuss," but to formalize discussions and commits into a governable protocol.

Principle

1) State Machine Consensus: What Raft (Crash Tolerance) Solves

Raft's target is not "opinions," but an "operation log":

Leader Election: Electing a leader responsible for proposing the log sequence.
Log Replication: Replicating the log to a quorum (majority).
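Commit Rule: An entry is announced as "committed" only after the leader has replicated it to a majority of servers; the leader counts replicas this way only for entries from its own current term.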

Its value proposition: despite crashes and network partitions, the system can still agree on "which operations have occurred," provided a majority of servers remain up and able to communicate with one another.
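The commit rule can be made concrete with a small sketch. In the Raft paper, a leader advances its commit index to the highest log index that is stored on a majority of servers and whose entry was created in the leader's current term. The function below illustrates only that rule; `advance_commit_index`, `log_terms`, and `match_index` are names chosen for this example and do not come from any particular Raft library.

```python
from typing import Dict, List


def advance_commit_index(
    commit_index: int,
    current_term: int,
    log_terms: List[int],         # log_terms[i - 1] is the term of log entry i (1-based)
    match_index: Dict[str, int],  # follower id -> highest index known to be replicated there
) -> int:
    """Return the leader's new commit index under Raft's commit rule."""
    cluster_size = len(match_index) + 1  # the followers plus the leader itself
    majority = cluster_size // 2 + 1
    last_index = len(log_terms)

    # Scan from the newest entry downward; the first index that qualifies wins.
    for n in range(last_index, commit_index, -1):
        replicas = 1 + sum(1 for m in match_index.values() if m >= n)  # leader holds entry n
        if replicas >= majority and log_terms[n - 1] == current_term:
            return n
    return commit_index


# Five-node cluster: entry 3 is on the leader plus followers "a" and "b".
new_index = advance_commit_index(0, 2, [1, 1, 2, 2], {"a": 4, "b": 3, "c": 1, "d": 0})
```

Entries from earlier terms are never committed by counting replicas alone; they become committed indirectly once a later entry from the current term is committed on top of them.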

Primary Sources:

Raft Paper (PDF): https://raft.github.io/raft.pdf
Raft Official Website: https://raft.github.io/index.html

2) Byzantine Consensus: What PBFT Solves

PBFT (Practical Byzantine Fault Tolerance) also targets "operation sequencing and commits," but it assumes a much stronger adversarial environment: nodes might be malicious. It provides a state machine replication protocol that tolerates up to f Byzantine replicas in a group of 3f + 1.

Primary Source:

USENIX OSDI 99: https://www.usenix.org/conference/osdi-99/practical-byzantine-fault-tolerance
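The difference in fault assumptions shows up directly in cluster sizing. As a rough sketch: tolerating f crashed nodes with a majority-based protocol such as Raft takes 2f + 1 nodes and a quorum of f + 1, while tolerating f Byzantine replicas with PBFT takes 3f + 1 replicas and quorums of 2f + 1. The helper names below are illustrative only.

```python
def raft_cluster(f: int) -> dict:
    """Sizing to survive f crash faults with a majority quorum."""
    return {"nodes": 2 * f + 1, "quorum": f + 1}


def pbft_cluster(f: int) -> dict:
    """Sizing to survive f Byzantine faults (PBFT assumes n = 3f + 1)."""
    return {"replicas": 3 * f + 1, "quorum": 2 * f + 1}


for f in (1, 2):
    print(f, raft_cluster(f), pbft_cluster(f))
```

For a single tolerated fault this means three nodes for Raft but four replicas for PBFT, which is one reason to choose Byzantine tolerance only when agents or hosts genuinely cannot be trusted.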

3) Cognitive Consensus: What MAD (Multi-Agent Debate) Solves

MAD's targets are "opinions and evidence." It is frequently used for:

Fact Verification: Multi-perspective cross-examination to reduce the probability of single-agent hallucinations.
Solution Trade-offs: Multiple roles raising counterexamples and risks based on different constraints.
Structured Reviews: Ensuring conclusions are accompanied by evidence, counter-evidence, and unresolved questions.

However, MAD is not equivalent to Raft/PBFT, because it does not provide the hard semantics of an "operation log commit." You can use MAD to derive a "recommendation," but you cannot use MAD to declare that "a specific write operation has been committed and is globally consistent."

Usage

Here is the most practical engineering split: separate the "Discussion Layer" from the "Commit Layer."

1) Discussion Layer (MAD): Produces Proposals Only, No Commits

The output of the discussion layer must be a structured proposal, not free-form prose:

plan: A list of proposed execution steps.
risks: Engineering risk points (timeouts, retries, idempotency, isolation, permissions, concurrency, rollbacks, auditing, observability, degradation).
evidence: Referenced files/logs/URLs.
stop_conditions: Exit conditions and triggers for human intervention.

Furthermore, the discussion layer must enforce exit mechanisms:

Maximum Round Limit (Retry bounds).
Budget Cap (Token budget bounds).
Convergence Threshold: Voting thresholds or "divergence < threshold."
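The three exit mechanisms above can be sketched as a bounded debate loop. This is a minimal illustration, not a reference implementation: the `Agent` callable signature, the default limits, and the rule "converged when the most common answer reaches the agreement share" are all assumptions made for this example.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

# Hypothetical agent signature: (question, previous_answers) -> (answer, tokens_used)
Agent = Callable[[str, List[str]], Tuple[str, int]]


def run_debate(
    question: str,
    agents: Dict[str, Agent],
    max_rounds: int = 3,
    token_budget: int = 10_000,
    agreement_threshold: float = 2 / 3,
) -> dict:
    """Run a bounded debate; every path returns an explicit exit reason."""
    answers: List[str] = []
    tokens_spent = 0
    for round_no in range(1, max_rounds + 1):
        new_answers = []
        for agent in agents.values():
            answer, used = agent(question, answers)
            tokens_spent += used
            if tokens_spent > token_budget:  # budget cap
                return {"exit": "budget_exceeded", "round": round_no, "answer": None}
            new_answers.append(answer)
        answers = new_answers
        top_answer, count = Counter(answers).most_common(1)[0]
        if count / len(answers) >= agreement_threshold:  # convergence threshold
            return {"exit": "converged", "round": round_no, "answer": top_answer}
    return {"exit": "max_rounds", "round": max_rounds, "answer": None}  # round limit
```

The caller decides what to do with `budget_exceeded` or `max_rounds`, for example escalating to a judge or a human; the loop itself never runs unbounded.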

2) Commit Layer: Does One Thing, Keeps Side Effects Under Control

The responsibilities of the commit layer are:

Performing schema validation and permission checks on the proposal (Permissions).
Funneling write operations into an "auditable commit point" (e.g., submitting a patch or committing a transaction).
Generating an idempotency key for every commit and writing it to a Write-Ahead Log (WAL) (Idempotency, Auditing).
Executing timeout controls and retry strategies (Timeouts, Retries), and triggering rollbacks/compensations upon failure (Rollbacks, Degradation).

This means that even if you have P2P discussion, once it translates into system side effects, there must be a "Single Submitter" or a "Deterministic Commit Protocol." Otherwise, concurrency conflicts are inevitable.
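A commit layer along these lines might look like the sketch below. It assumes a proposal shaped like a dictionary with `proposal_id`, `plan`, and `decision` fields; the `CommitLayer` class, its in-memory list standing in for a durable WAL, and the hash-based key derivation are all hypothetical choices for illustration. Timeouts, retries, and crash recovery from a dangling "intent" record are deliberately left out.

```python
import hashlib
import json
from typing import Callable, Dict, List, Set


class CommitLayer:
    """Single submitter: validate, deduplicate by idempotency key, log before acting."""

    REQUIRED_FIELDS = {"proposal_id", "plan", "decision"}

    def __init__(self, allowed_steps: Set[str], apply_step: Callable[[dict], None]):
        self.allowed_steps = allowed_steps  # permission check: step kinds that may run
        self.apply_step = apply_step        # the only place side effects happen
        self.wal: List[dict] = []           # in-memory stand-in for a durable WAL
        self.done: Dict[str, str] = {}      # idempotency key -> final status

    @staticmethod
    def idempotency_key(proposal_id: str, index: int, step: dict) -> str:
        payload = json.dumps({"p": proposal_id, "i": index, "s": step}, sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()[:16]

    def commit(self, proposal: dict) -> List[str]:
        missing = self.REQUIRED_FIELDS - set(proposal)
        if missing:
            raise ValueError(f"invalid proposal, missing: {sorted(missing)}")
        if proposal["decision"].get("result") != "approved":
            raise PermissionError("proposal was not approved")
        for step in proposal["plan"]:
            if step["step"] not in self.allowed_steps:
                raise PermissionError(f"step not permitted: {step['step']}")

        statuses = []
        for index, step in enumerate(proposal["plan"]):
            key = self.idempotency_key(proposal["proposal_id"], index, step)
            if key in self.done:  # a retry of a step that was already applied
                statuses.append("skipped_duplicate")
                continue
            self.wal.append({"key": key, "step": step, "status": "intent"})
            self.apply_step(step)
            self.wal.append({"key": key, "step": step, "status": "applied"})
            self.done[key] = "applied"
            statuses.append("applied")
        return statuses
```

Because the key is derived from the proposal and step rather than generated per call, submitting the same approved proposal twice applies each step once and reports the second attempt as a duplicate.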

3) A Minimum Viable "Proposal-Commit" Protocol

{
  "proposal_id": "p-20260421-001",
  "task_id": "t-xxx",
  "plan": [
    {"step": "read", "target": "file:src/foo.ts"},
    {"step": "patch", "target": "file:src/foo.ts", "patch_id": "patch-abc"},
    {"step": "test", "cmd": "npm test", "timeout_ms": 600000}
  ],
  "risks": ["timeouts", "retries", "idempotency", "concurrency", "rollback", "auditing", "observability", "degradation"],
  "stop_conditions": ["budget_exceeded", "test_failed_twice", "permission_denied"],
  "votes": [
    {"agent": "A", "decision": "approve", "notes": "risk ok"},
    {"agent": "B", "decision": "approve", "notes": "needs timeout cap"},
    {"agent": "C", "decision": "reject", "notes": "missing idempotency key"}
  ],
  "decision": {"threshold": "2/3", "result": "approved"}
}

[!WARNING] Even if the vote passes, it only grants "permission to enter the commit layer." The actual write operation must still be gated and executed by the commit layer's strict controls.
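The `decision` field in the proposal above can be derived mechanically from `votes` rather than asserted by an agent. A minimal sketch, assuming the threshold string is a fraction such as "2/3" and that only an agent's latest vote counts (both are assumptions of this example; `tally` is a hypothetical helper):

```python
from fractions import Fraction
from typing import List


def tally(votes: List[dict], threshold: str) -> dict:
    """Compare the approval share to a 'num/den' threshold using exact arithmetic."""
    required = Fraction(threshold)  # Fraction parses strings such as "2/3"
    latest = {v["agent"]: v["decision"] for v in votes}  # one vote per agent
    approvals = sum(1 for d in latest.values() if d == "approve")
    share = Fraction(approvals, len(latest)) if latest else Fraction(0)
    result = "approved" if share >= required else "rejected"
    return {"threshold": threshold, "result": result}


votes = [
    {"agent": "A", "decision": "approve", "notes": "risk ok"},
    {"agent": "B", "decision": "approve", "notes": "needs timeout cap"},
    {"agent": "C", "decision": "reject", "notes": "missing idempotency key"},
]
decision = tally(votes, "2/3")
```

Using `Fraction` avoids floating-point edge cases when the approval share sits exactly on the threshold, as it does here.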

Design

P2P without a protocol devolves into a "chat room." The failure modes of a chat room are:

Endless discussion, no action (Missing exit conditions).
Everyone acts, overwriting each other (Uncontrolled concurrency).
Everyone retries upon failure, multiplying side effects (Missing idempotency).

The value of the protocol is that it allows the system to mechanically execute and block operations, rather than relying on the LLM's "self-awareness" or good behavior.

Pitfall

Treating MAD as Distributed Consensus: Agreement on opinions is not agreement on a log.
Missing Idempotency Keys: Retries will manufacture duplicate side effects (Idempotency, Retries).
Missing Timeouts: Discussions or commits hanging indefinitely, preventing resource release (Timeouts, Resource Release).
Missing Audits: Inability to attribute actions, inability to roll back, inability to conduct post-mortems (Auditing, Observability).
Permission Contamination: An agent acquires elevated privileges and propagates them through the P2P network, leading to privilege escalation (Permissions, Isolation).

Debug

When debugging P2P collaboration failures, answer these three questions first:

Did the failure occur in the discussion layer or the commit layer?
Are there duplicate commits? (Check idempotency keys and WAL).
Are there concurrency conflicts? (Check if multiple submitters are writing to the same resource simultaneously).

Only after answering these should you investigate whether the "discussion content made sense." Otherwise, you will waste hours agonizing over semantic reasoning, only to discover the root cause was a missing timeout, idempotency key, or audit log.
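The second and third questions can usually be answered by scanning the WAL rather than reading transcripts. The sketch below assumes each WAL record carries `idempotency_key`, `status`, `resource`, `submitter`, `start_ms`, and `end_ms`; these field names are hypothetical and should be mapped to whatever the real log contains.

```python
from collections import Counter, defaultdict
from typing import List, Tuple


def find_duplicate_commits(wal: List[dict]) -> List[str]:
    """Idempotency keys that were applied more than once."""
    applied = Counter(r["idempotency_key"] for r in wal if r["status"] == "applied")
    return sorted(key for key, n in applied.items() if n > 1)


def find_write_conflicts(wal: List[dict]) -> List[Tuple[str, str, str]]:
    """Different submitters whose consecutive writes to one resource overlap in time."""
    by_resource = defaultdict(list)
    for record in wal:
        if record["status"] == "applied":
            by_resource[record["resource"]].append(record)

    conflicts = []
    for resource, records in by_resource.items():
        records.sort(key=lambda r: r["start_ms"])
        # Only neighbours in start order are compared, which is enough for a first pass.
        for earlier, later in zip(records, records[1:]):
            overlaps = later["start_ms"] < earlier["end_ms"]
            if overlaps and earlier["submitter"] != later["submitter"]:
                conflicts.append((resource, earlier["submitter"], later["submitter"]))
    return conflicts
```

An empty result from both functions is a reasonable signal to move on to the discussion content itself.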

Source

Raft Paper: https://raft.github.io/raft.pdf
Raft Official Website: https://raft.github.io/index.html
PBFT (OSDI 99): https://www.usenix.org/conference/osdi-99/practical-byzantine-fault-tolerance
MAD Framework Overview: https://www.emergentmind.com/topics/multi-agent-debate-mad-frameworks
MAD for Fact Verification (Paper): https://www.sciencedirect.com/science/article/pii/S0957417425037194

Engineering Implementation: Turning "Consensus" into Executable Actions

Many systems stop at "the discussion reached a conclusion," but what engineering actually demands is an "executable mutation." We recommend standardizing consensus outputs into two artifacts:

proposal.json: The proposal generated by the discussion layer (plan, risks, evidence, exit conditions, votes).
commit_record.json: The commit record (WAL) generated by the commit layer, containing the idempotency key, resource target, actual execution result, and rollback information.

This architecture enables:

Auditing: Answering "who authorized this commit" (Auditing).
Observability: Aggregating failure reasons and latency metrics (Observability).
Rollback: When the result is flawed, following the commit_record to issue compensations (Rollback).
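As an illustration of how a commit record can drive compensation, the sketch below models `commit_record.json` as a dataclass and undoes applied commits in reverse order. The exact fields, the shape of the `rollback` payload, and the `undo` callback are assumptions for this example; a real system would persist the records and make `undo` itself idempotent.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class CommitRecord:
    commit_id: str
    proposal_id: str                 # links the commit back to who authorized it
    idempotency_key: str
    resource_target: str
    result: str                      # "applied" | "failed" | "rolled_back"
    rollback: Optional[dict] = None  # whatever is needed to undo this commit


def roll_back(records: List[CommitRecord], undo: Callable[[str, dict], None]) -> List[str]:
    """Compensate applied commits newest-first; return the commit ids that were undone."""
    rolled = []
    for record in reversed(records):
        if record.result == "applied" and record.rollback is not None:
            undo(record.resource_target, record.rollback)
            record.result = "rolled_back"
            rolled.append(record.commit_id)
    return rolled
```

Marking each record as `rolled_back` keeps a second rollback pass from compensating the same commit twice.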

The Absolute Baseline: Side Effects Can Only Have One Submitter

Regardless of how many P2P agents are debating upstream, the moment you enter the realm of side effects (writing files, modifying DBs, dispatching requests), there must be a Single Submitter:

Submitted by an elected Leader (similar to Raft's leader paradigm).
Or submitted by an external Orchestrator acting as the sole submitter.

Otherwise, "concurrent writes" will render any debate conclusions meaningless, and your system budget will be entirely consumed by rollbacks and compensations (Concurrency, Rollback, Degradation).

This is not about being overly conservative; this is the mandatory prerequisite for upgrading a system from a "chat application" to an "executable engineering system."
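One simple way to get a single submitter inside one process is a queue drained by exactly one worker thread. The sketch below shows only that serialization idea; the `SingleSubmitter` class and the request shape are hypothetical, and a deployment that spans several processes or hosts would need leader election or an external orchestrator instead.

```python
import queue
import threading
from typing import Callable


class SingleSubmitter:
    """Agents enqueue write requests; exactly one worker thread applies them in order."""

    def __init__(self, apply: Callable[[dict], None]):
        self._apply = apply
        self._requests = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, request: dict) -> None:
        self._requests.put(request)  # safe to call from any agent thread

    def _run(self) -> None:
        while True:
            request = self._requests.get()
            if request is None:      # sentinel: stop the worker
                break
            self._apply(request)     # side effects happen only on this thread

    def close(self) -> None:
        """Stop accepting work after everything already queued has been applied."""
        self._requests.put(None)
        self._worker.join()
```

Because only the worker thread touches the target resource, the `apply` callback can perform a read-modify-write without its own locking.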

Exit Mechanisms (Preventing the "Consensus System" from Becoming an Incinerator)

Both state machine consensus and MAD must be designed with explicit exit mechanisms. A viable exit strategy must contain at least:

Max Rounds: MAD debates execute at most N rounds; if exceeded, fall back to a judge or human intervention (Retries, Degradation).
Max Time: Every discussion and commit phase has a timeout. Upon timeout, it halts and writes an audit record (Timeouts, Auditing).
Max Cost: A token budget cap. If exceeded, further LLM calls are prohibited (Degradation).
Max Concurrency: Concurrent commits against the same resource must be limited to prevent overwriting (Concurrency).

The essence of exit mechanisms is transforming "infinite discussion / infinite retry" into a controllable state machine. Otherwise, cost and instability keep growing with no upper bound.
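The four limits can be collected into one guard object that the orchestrating code consults before every round and every commit. This is a sketch under stated assumptions: the `ExitGuard` name, its fields, and the order in which limits are checked are choices made for this example, and the writer counter is not thread-safe as written.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ExitGuard:
    max_rounds: int
    max_seconds: float
    max_tokens: int
    max_writers_per_resource: int = 1
    rounds: int = 0
    tokens: int = 0
    started: float = field(default_factory=time.monotonic)
    writers: Dict[str, int] = field(default_factory=dict)

    def check(self) -> Optional[str]:
        """Return the first limit that has been hit, or None if work may continue."""
        if self.rounds >= self.max_rounds:
            return "max_rounds"
        if time.monotonic() - self.started >= self.max_seconds:
            return "max_time"
        if self.tokens >= self.max_tokens:
            return "max_cost"
        return None

    def try_acquire(self, resource: str) -> bool:
        """Admit a commit only while the resource is below its writer limit."""
        if self.writers.get(resource, 0) >= self.max_writers_per_resource:
            return False
        self.writers[resource] = self.writers.get(resource, 0) + 1
        return True

    def release(self, resource: str) -> None:
        self.writers[resource] = max(0, self.writers.get(resource, 0) - 1)
```

A non-`None` result from `check()` is the point at which the system should stop, write an audit record, and hand off to a judge or a human.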

Observability and Auditing Fields (Standardize into a Schema)

To make post-mortems possible for P2P collaboration, enforce at least the following fields through a schema:

task_id / proposal_id / commit_id
agent_id / role (Proposer / Auditor / Submitter)
idempotency_key (Idempotency)
attempt / retry_reason (Retries)
timeout_ms / latency_ms (Timeouts)
resource_targets (The set of written resources)
result / error_code (Outcome)

Once these fields are fixed, you can measure "Failure Reason Distribution," "Rollback Frequency," and "Concurrency Conflict Rates." Otherwise, you are relegated to guessing by reading unstructured log text.
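With the fields fixed, the metrics mentioned above reduce to simple aggregations. The sketch below models one audit event per commit attempt using the field names from the list; the allowed values for `role` and `result`, and the two helper functions, are assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class AuditEvent:
    task_id: str
    proposal_id: str
    commit_id: str
    agent_id: str
    role: str                       # "proposer" | "auditor" | "submitter"
    idempotency_key: str
    attempt: int
    retry_reason: Optional[str]
    timeout_ms: int
    latency_ms: int
    resource_targets: Tuple[str, ...]
    result: str                     # "ok" | "failed" | "rolled_back"
    error_code: Optional[str]


def failure_distribution(events: List[AuditEvent]) -> Dict[Optional[str], int]:
    """Count failed events per error code."""
    return dict(Counter(e.error_code for e in events if e.result == "failed"))


def rollback_frequency(events: List[AuditEvent]) -> float:
    """Share of events that ended in a rollback."""
    if not events:
        return 0.0
    return sum(1 for e in events if e.result == "rolled_back") / len(events)
```

A concurrency conflict rate can be computed the same way once conflicts are recorded under a dedicated `error_code`.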