The Coordination Problem: Preventing Agent Conflicts

The Problem Emerges

You've got it working:

Sales agent updates customer records
Support agent logs interactions
Product agent tracks feature requests

Each works great in isolation.

Then they collide.

Sales marks a customer as "churned" while Support is mid-conversation. Product links a feature request to a customer that Sales just deleted. Support's interaction log references a conversation that Product's cleanup archived.

Welcome to the coordination problem.

Why Coordination Is Hard

Multi-agent systems face classic distributed systems challenges:

Race conditions: Two agents read the same state, make decisions, and write back. The last write wins, and the context behind the first write is lost.

Stale reads: An agent makes a decision based on data that changed milliseconds after the agent read it.

Conflicting goals: Sales wants to close the deal; Support wants to escalate the issue. Both are "right" in their domain.

Cascading effects: Agent A's action triggers Agent B, which triggers Agent C, which breaks Agent A's assumption.

These aren't edge cases. At scale, they're inevitable.
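To make the race condition concrete, here is a minimal sketch of a lost update. Everything in it is hypothetical: `store` is an in-memory stand-in for a shared knowledge base, and `readCustomer`, `writeCustomer`, `agentUpdate`, and `demoLostUpdate` are illustrative names, not part of any real library.

```javascript
// Hypothetical in-memory store standing in for a shared knowledge base.
const store = new Map([["acme", { status: "active", notes: [] }]]);

const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function readCustomer(id) {
  // Return a deep copy so each agent works on its own snapshot.
  return JSON.parse(JSON.stringify(store.get(id)));
}

async function writeCustomer(id, record) {
  // Whole-record overwrite: this is what makes "last write wins" possible.
  store.set(id, record);
}

// Each agent reads, "thinks" for a while, then writes its whole snapshot back.
async function agentUpdate(id, thinkMs, mutate) {
  const snapshot = await readCustomer(id);
  await delay(thinkMs);
  mutate(snapshot);
  await writeCustomer(id, snapshot);
}

async function demoLostUpdate() {
  await Promise.all([
    // Support writes first, after roughly 10 ms.
    agentUpdate("acme", 10, (c) => { c.status = "at_risk"; }),
    // Sales writes later with a snapshot taken before Support's write.
    agentUpdate("acme", 20, (c) => { c.notes.push("upsell sent"); }),
  ]);
  return store.get("acme");
}
```

Both agents take their snapshot before either writes, so the record that `demoLostUpdate()` resolves to still has `status: "active"`: the `at_risk` update from Support has been silently overwritten by Sales.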

A Real Scenario

Timeline:

10:00:00 - Sales agent reads customer "Acme": status=active, health=good
10:00:01 - Support agent receives escalation for Acme, marks status=at_risk
10:00:02 - Sales agent, unaware, sends upsell campaign to Acme
10:00:03 - Customer receives cheerful upsell email during crisis
10:00:04 - Customer tweets about tone-deaf vendor

Total elapsed time: 4 seconds. Damage: significant.

Pattern 1: The Single Source of Truth

Problem: Agents have different views of reality.

Solution: One authoritative knowledge base that all agents read and write.

           ┌──────────────┐
           │   Central    │
           │  Knowledge   │
           │    Base      │
           └──────────────┘
          ↗      ↑       ↖
         /       |        \
   ┌─────┐   ┌─────┐   ┌─────┐
   │Sales│   │Supp │   │Prod │
   │Agent│   │Agent│   │Agent│
   └─────┘   └─────┘   └─────┘

Implementation:

All reads go to the same source
All writes go to the same source
No agent maintains separate state

Caveat: A single source gives every agent the same view, but it doesn't prevent race conditions. Two agents can still read the same value and overwrite each other.
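As a rough illustration of the three implementation points, the sketch below gives every agent a reference to one shared store and no cache of its own. The `KnowledgeBase` and `Agent` classes and their method names are assumptions made for this example.

```javascript
// Hypothetical shared store; a real deployment would back this with a database.
class KnowledgeBase {
  constructor() {
    this.records = new Map();
  }

  read(id) {
    const record = this.records.get(id);
    // Hand out a copy so callers cannot mutate the stored record directly.
    return record ? { ...record } : undefined;
  }

  write(id, changes, agentId) {
    const next = {
      ...(this.records.get(id) ?? {}),
      ...changes,
      last_updated_by: agentId,
      updated_at: new Date().toISOString(),
    };
    this.records.set(id, next);
    return { ...next };
  }
}

// Agents hold a reference to the shared store and nothing else.
class Agent {
  constructor(agentId, kb) {
    this.agentId = agentId;
    this.kb = kb;
  }

  getStatus(customerId) {
    // Always read through to the store; there is no local cache.
    return this.kb.read(customerId)?.status;
  }

  setStatus(customerId, status) {
    return this.kb.write(customerId, { status }, this.agentId);
  }
}
```

With `const kb = new KnowledgeBase()` passed to both a Sales and a Support `Agent`, a status written by Support is what Sales sees on its next read, because neither agent keeps a private copy.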

Pattern 2: Read-Before-Write Verification

Problem: Agent acts on stale data.

Solution: Re-read immediately before writing, verify assumptions still hold.

async function updateCustomerStatus(agentId, customerId, newStatus, reason) {
  // Read current state
  const current = await getCustomer(customerId);

  // Verify our assumption
  if (current.status === "churned") {
    return { success: false, reason: "Customer already churned" };
  }

  // Apply our change with context
  await updateCustomer(customerId, {
    status: newStatus,
    last_updated_by: agentId,
    update_reason: reason,
    previous_status: current.status,
  });

  return { success: true };
}

Gap: This narrows the window but doesn't close it. Another agent can still write between the re-read and the write. Better, but not bulletproof.

Pattern 3: Optimistic Locking

Problem: Changes between read and write.

Solution: Version numbers. Write only succeeds if version matches.

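async function safeUpdate(customerId, changes, expectedVersion) {
  // Conditional write: the update applies only if the stored version still
  // equals expectedVersion (in SQL terms: ... WHERE version = expectedVersion).
  // Assumes updateCustomer accepts the condition as a third argument.
  const result = await updateCustomer(
    customerId,
    { ...changes, version: expectedVersion + 1 },
    { where: { version: expectedVersion } }
  );

  if (result.rowsAffected === 0) {
    // Someone else changed it (or the record no longer exists)
    return { success: false, reason: "Conflict - please re-read and retry" };
  }

  return { success: true };
}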

Agent behavior:

Read record with version
Make decision
Attempt write with version check
If conflict, re-read and reconsider
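The four steps above can be sketched as a bounded retry loop. This is a minimal in-memory illustration, not a production implementation: `customers`, `readWithVersion`, `writeIfVersion`, and `updateWithRetry` are hypothetical names, and a real system would do the version check inside the database.

```javascript
// Hypothetical in-memory table; a real system would use a conditional UPDATE.
const customers = new Map([["acme", { status: "active", version: 1 }]]);

function readWithVersion(id) {
  return { ...customers.get(id) };
}

// Compare-and-set: apply changes only if the stored version still matches.
function writeIfVersion(id, changes, expectedVersion) {
  const current = customers.get(id);
  if (!current || current.version !== expectedVersion) {
    return { rowsAffected: 0 };
  }
  customers.set(id, { ...current, ...changes, version: expectedVersion + 1 });
  return { rowsAffected: 1 };
}

// decide(record) returns the changes to apply, or null to abandon the action.
function updateWithRetry(id, decide, maxAttempts = 3) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const record = readWithVersion(id); // 1. read record with version
    const changes = decide(record); // 2. make decision
    if (changes === null) {
      return { success: false, reason: "Action no longer applies", attempts: attempt };
    }
    const result = writeIfVersion(id, changes, record.version); // 3. versioned write
    if (result.rowsAffected === 1) {
      return { success: true, attempts: attempt };
    }
    // 4. conflict: loop around to re-read and reconsider
  }
  return { success: false, reason: "Too many conflicts", attempts: maxAttempts };
}
```

The important design choice is that `decide` runs again on every attempt. After a conflict the agent does not just resend the same write; it looks at the fresh record and may conclude that the action should be dropped.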

Pattern 4: Domain Ownership

Problem: Multiple agents legitimately update the same record.

Solution: Designate which agent owns which fields.

Customer Record:
├── sales_fields (owned by Sales Agent)
│   ├── deal_stage
│   ├── contract_value
│   └── renewal_date
│
├── support_fields (owned by Support Agent)
│   ├── open_tickets
│   ├── last_interaction
│   └── satisfaction_score
│
└── product_fields (owned by Product Agent)
    ├── feature_requests
    ├── usage_metrics
    └── beta_programs

Rules:

Agents can READ any field
Agents can only WRITE to their owned fields
Cross-domain updates require explicit handoff

Benefit: Eliminates most conflicts by design.
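One way to enforce the write rule is a guard that every update passes through. The sketch below assumes agent IDs of `"sales"`, `"support"`, and `"product"` and a flat field-to-owner map; `FIELD_OWNERS`, `checkWrite`, and `applyOwnedWrite` are illustrative names.

```javascript
// Field ownership map mirroring the record layout above.
// The agent IDs are assumed names for this example.
const FIELD_OWNERS = {
  deal_stage: "sales",
  contract_value: "sales",
  renewal_date: "sales",
  open_tickets: "support",
  last_interaction: "support",
  satisfaction_score: "support",
  feature_requests: "product",
  usage_metrics: "product",
  beta_programs: "product",
};

// Reads are unrestricted; writes are allowed only on fields the agent owns.
// Fields missing from the map are rejected, so the guard denies by default.
function checkWrite(agentId, changes) {
  const rejected = Object.keys(changes).filter(
    (field) => FIELD_OWNERS[field] !== agentId
  );
  return rejected.length === 0
    ? { allowed: true }
    : { allowed: false, rejected };
}

function applyOwnedWrite(record, agentId, changes) {
  const check = checkWrite(agentId, changes);
  if (!check.allowed) {
    throw new Error(
      `${agentId} cannot write: ${check.rejected.join(", ")} (handoff required)`
    );
  }
  return { ...record, ...changes };
}
```

A Sales agent calling `applyOwnedWrite(record, "sales", { open_tickets: 0 })` gets an error instead of a silent overwrite, which is the point at which an explicit handoff to Support would be triggered.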

Pattern 5: Event Sourcing

Problem: Understanding what happened and why.

Solution: Don't store state directly. Store events. Derive state.

Events:
├── CustomerCreated(id=123, name="Acme", by=sales)
├── StatusChanged(id=123, from=prospect, to=active, by=sales, reason="Contract signed")
├── TicketOpened(id=123, ticket=456, by=support, priority=high)
├── StatusChanged(id=123, from=active, to=at_risk, by=support, reason="Escalation")
└── ...

Current State = replay all events

Benefits:

Full audit trail
Can see exactly what each agent did
Can replay to debug conflicts
Can revert by adding compensating events

Cost: More complex architecture.
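A minimal sketch of "derive state by replaying events", using the four events listed above. The event object shape, the derived fields (`openTickets`, `lastChangedBy`), and the choice of `"prospect"` as the initial status are assumptions for illustration.

```javascript
// The event log from the example above, as plain objects.
const events = [
  { type: "CustomerCreated", id: 123, name: "Acme", by: "sales" },
  { type: "StatusChanged", id: 123, from: "prospect", to: "active", by: "sales", reason: "Contract signed" },
  { type: "TicketOpened", id: 123, ticket: 456, by: "support", priority: "high" },
  { type: "StatusChanged", id: 123, from: "active", to: "at_risk", by: "support", reason: "Escalation" },
];

// Pure reducer: given the state so far and one event, return the next state.
function applyEvent(state, event) {
  switch (event.type) {
    case "CustomerCreated":
      // Assumption: new customers start as "prospect".
      return { id: event.id, name: event.name, status: "prospect", openTickets: [], lastChangedBy: event.by };
    case "StatusChanged":
      return { ...state, status: event.to, lastChangedBy: event.by };
    case "TicketOpened":
      return { ...state, openTickets: [...state.openTickets, event.ticket], lastChangedBy: event.by };
    default:
      // Unknown event types are ignored rather than treated as errors.
      return state;
  }
}

// Current state = replay all events, in order, from an empty state.
function replay(eventLog) {
  return eventLog.reduce(applyEvent, null);
}
```

Reverting works the same way as any other change: append a compensating event, for example another `StatusChanged` from `at_risk` back to `active`, and replay. The original escalation stays in the log, so the audit trail is preserved.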

Pattern 6: Agent Awareness

Problem: Agents don't know about each other's activities.

Solution: Agents check for recent activity before acting.

async function beforeActing(agentId, customerId, plannedAction) {
  // Check recent activity (assumes events are returned newest first)
  const recentEvents = await getRecentEvents(customerId, { minutes: 5 });

  // Most recent event from a different agent, if any. Looking only at
  // recentEvents[0] would miss another agent's activity whenever this
  // agent's own event happens to be the newest one.
  const lastEvent = recentEvents.find((event) => event.agent !== agentId);

  // Another agent was just here: decide whether to proceed
  if (lastEvent && shouldDefer(plannedAction, lastEvent)) {
    return { proceed: false, reason: `Deferring to ${lastEvent.agent}` };
  }

  return { proceed: true };
}

Example rules:

Support actions take priority over Sales during active tickets
Product actions defer to Support during escalations
All agents defer to explicit human overrides
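The three example rules could be encoded in a `shouldDefer` function along these lines. The fields used here (`plannedAction.agent`, `lastEvent.humanOverride`, `lastEvent.ticketActive`, `lastEvent.escalation`) are assumptions about what the planned action and event objects carry; adapt them to your own event schema.

```javascript
// Returns true when the planned action should yield to the other agent's
// recent activity. All field names are assumed for illustration.
function shouldDefer(plannedAction, lastEvent) {
  // Rule: all agents defer to explicit human overrides.
  if (lastEvent.humanOverride === true) {
    return true;
  }

  // Rule: Support takes priority over Sales during active tickets.
  if (
    plannedAction.agent === "sales" &&
    lastEvent.agent === "support" &&
    lastEvent.ticketActive === true
  ) {
    return true;
  }

  // Rule: Product defers to Support during escalations.
  if (
    plannedAction.agent === "product" &&
    lastEvent.agent === "support" &&
    lastEvent.escalation === true
  ) {
    return true;
  }

  // No rule applies: the planned action may proceed.
  return false;
}
```

Keeping the rules in one plain function makes the precedence order explicit and easy to review, at the cost of needing a code change whenever a rule is added.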

Pattern 7: Coordination Service

Problem: Agents need to coordinate complex multi-step processes.

Solution: A coordinator that manages agent orchestration.

                    ┌───────────────┐
                    │  Coordinator  │
                    │    Service    │
                    └───────────────┘
                     ↙    ↓     ↘
               ┌─────┐ ┌─────┐ ┌─────┐
               │Sales│ │Supp │ │Prod │
               └─────┘ └─────┘ └─────┘

Coordinator responsibilities:

Receive requests from agents
Determine if action is safe
Acquire locks if needed
Execute or queue action
Notify affected agents

Tradeoff: Adds latency and complexity, but enables sophisticated coordination.
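A compact sketch of those responsibilities, using a per-customer promise chain as the "lock" so that actions on the same customer run one at a time. The `Coordinator` class, the request shape (`agentId`, `customerId`, `action`), and the caller-supplied `isSafe` policy are all hypothetical.

```javascript
class Coordinator {
  constructor(isSafe) {
    this.isSafe = isSafe; // (request) => boolean, policy supplied by the caller
    this.tails = new Map(); // customerId -> promise for the last queued action
    this.listeners = []; // callbacks notified after each executed action
  }

  onAction(listener) {
    this.listeners.push(listener);
  }

  // Receive a request and queue it behind in-flight work for the same customer.
  submit(request) {
    const previous = this.tails.get(request.customerId) ?? Promise.resolve();

    const run = previous.then(async () => {
      // The safety check runs at execution time, after earlier actions have
      // finished, so it sees their effects.
      if (!this.isSafe(request)) {
        return { executed: false, reason: "Rejected by safety check" };
      }
      const result = await request.action();
      for (const listener of this.listeners) {
        listener(request, result); // notify affected agents
      }
      return { executed: true, result };
    });

    // Keep the queue usable even if this action throws.
    this.tails.set(request.customerId, run.catch(() => {}));
    return run;
  }
}
```

Because the safety check is deferred until the request reaches the front of the queue, an upsell submitted just after a Support escalation is evaluated against the escalated state, not the state at submission time. The queue is where the extra latency comes from.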

Anti-Pattern: Fire and Forget

// DON'T DO THIS
async function badAgentAction(data) {
  await updateRecord(data); // Just do it, no checking
  // Hope nothing goes wrong
}

At small scale, this usually works because collisions are rare. At production scale, concurrent writes are common enough that it causes recurring coordination failures.

Implementing Coordination in Practice

Minimal Viable Coordination

If you're just starting, implement these basics:

Single knowledge base — No agent-local state
Timestamps on everything — Know when things happened
Agent identification — Know who did what
Basic conflict detection — Alert on suspicious patterns
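Once every write carries a timestamp and an agent ID, basic conflict detection can be a simple scan over the write log. The sketch below flags records written by two different agents within a short window; the log entry shape (`recordId`, `agentId`, `timestamp` in milliseconds) and the five-second default are assumptions.

```javascript
// Flags back-to-back writes to the same record by different agents.
// writeLog entries are assumed to look like:
//   { recordId: "acme", agentId: "sales", timestamp: 1700000000000 }
function findSuspiciousWrites(writeLog, windowMs = 5000) {
  // Sort a copy so the caller's log is left untouched.
  const sorted = [...writeLog].sort((a, b) => a.timestamp - b.timestamp);
  const lastByRecord = new Map();
  const alerts = [];

  for (const write of sorted) {
    const prev = lastByRecord.get(write.recordId);
    const gapMs = prev ? write.timestamp - prev.timestamp : Infinity;

    if (prev && prev.agentId !== write.agentId && gapMs <= windowMs) {
      alerts.push({
        recordId: write.recordId,
        agents: [prev.agentId, write.agentId],
        gapMs,
      });
    }
    lastByRecord.set(write.recordId, write);
  }

  return alerts;
}
```

This only detects likely collisions after the fact; it does not prevent them. Its value early on is in showing how often agents actually step on each other, which tells you when versioning or ownership is worth adding.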

Growing Into Sophistication

As you scale:

Add versioning — Optimistic locking for critical records
Implement ownership — Clear field-level boundaries
Build audit logs — Events, not just state
Create coordination rules — Which agent defers to which

The Coordination Mindset

Single-agent development: "Does this agent work correctly?"

Multi-agent development: "Do these agents work correctly together?"

The second question is harder. But as systems become more agentic, it becomes the only question that matters.

Build coordination in from the start. Retrofitting is painful.
