The Coordination Problem: How to Prevent Agent Conflicts in Multi-Agent Systems
One agent is easy. Multiple agents sharing state? That's where things break.
The Problem Emerges
You've got it working:
- Sales agent updates customer records
- Support agent logs interactions
- Product agent tracks feature requests
Each works great in isolation.
Then they collide.
Sales marks a customer as "churned" while Support is mid-conversation. Product links a feature request to a customer that Sales just deleted. Support's interaction log references a conversation that Product's cleanup archived.
Welcome to the coordination problem.
Why Coordination Is Hard
Multi-agent systems face classic distributed systems challenges:
Race conditions: Two agents read the same state, make decisions, and write back. The last write wins, and the first write's context is lost.
Stale reads: An agent makes a decision based on data that changed milliseconds after it was read.
Conflicting goals: Sales wants to close the deal; Support wants to escalate the issue. Both are "right" in their domain.
Cascading effects: Agent A's action triggers Agent B, which triggers Agent C, which breaks Agent A's assumption.
These aren't edge cases. At scale, they're inevitable.
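To make the race condition concrete, here is a minimal sketch of a lost update. The in-memory `store`, the `naiveAgent` helper, and the delays are assumptions made up for illustration, not part of any real system.

```javascript
// Hypothetical in-memory record standing in for shared customer state.
const store = { acme: { status: "active" } };

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Each agent reads the whole record, "thinks" for a moment, then writes it back.
async function naiveAgent(customerId, change, thinkMs) {
  const snapshot = { ...store[customerId] };      // read
  await sleep(thinkMs);                           // decide (other agents run here)
  store[customerId] = { ...snapshot, ...change }; // write back the whole record
}

async function demoLostUpdate() {
  await Promise.all([
    naiveAgent("acme", { status: "at_risk" }, 10),  // Support finishes first
    naiveAgent("acme", { campaign: "upsell" }, 20), // Sales writes last
  ]);
  return store.acme;
}
```

Both agents start from the same snapshot, so the Sales write puts `status` back to `"active"` and Support's change silently disappears.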
A Real Scenario
Timeline:
10:00:00 - Sales agent reads customer "Acme": status=active, health=good
10:00:01 - Support agent receives escalation for Acme, marks status=at_risk
10:00:02 - Sales agent, unaware, sends upsell campaign to Acme
10:00:03 - Customer receives cheerful upsell email during crisis
10:00:04 - Customer tweets about tone-deaf vendor
Total elapsed time: 4 seconds. Damage: significant.
Pattern 1: The Single Source of Truth
Problem: Agents have different views of reality.
Solution: One authoritative knowledge base that all agents read and write.
     ┌──────────────┐
     │   Central    │
     │  Knowledge   │
     │     Base     │
     └──────────────┘
        ↗    ↑    ↖
       /     |     \
┌─────┐   ┌─────┐   ┌─────┐
│Sales│   │Supp │   │Prod │
│Agent│   │Agent│   │Agent│
└─────┘   └─────┘   └─────┘
Implementation:
- All reads go to the same source
- All writes go to the same source
- No agent maintains separate state
Caveat: Single source helps consistency but doesn't solve race conditions.
Pattern 2: Read-Before-Write Verification
Problem: Agent acts on stale data.
Solution: Re-read immediately before writing, verify assumptions still hold.
async function updateCustomerStatus(agentId, customerId, newStatus, reason) {
  // Read current state
  const current = await getCustomer(customerId);
  // Verify our assumption
  if (current.status === "churned") {
    return { success: false, reason: "Customer already churned" };
  }
  // Apply our change with context
  await updateCustomer(customerId, {
    status: newStatus,
    last_updated_by: agentId,
    update_reason: reason,
    previous_status: current.status,
  });
  return { success: true };
}
Gap: Still vulnerable to changes between read and write. Better but not bulletproof.
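The same re-read can guard an action as well as a write. As a rough sketch, applied to the upsell email from the timeline: the `customers` map and `sendUpsellIfStillActive` are hypothetical names, and `getCustomer` here is an in-memory stand-in for the real lookup.

```javascript
// Hypothetical in-memory store standing in for the shared knowledge base.
const customers = new Map([["acme", { status: "active", health: "good" }]]);

async function getCustomer(customerId) {
  return { ...customers.get(customerId) };
}

// Re-read right before acting; skip the campaign if the status has changed.
async function sendUpsellIfStillActive(customerId, sendCampaign) {
  const current = await getCustomer(customerId);
  if (current.status !== "active") {
    return { sent: false, reason: `Status is now ${current.status}` };
  }
  await sendCampaign(customerId);
  return { sent: true };
}
```

If Support has already marked Acme as `at_risk`, the campaign is skipped. A change that lands after the re-read but before the send still slips through.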
Pattern 3: Optimistic Locking
Problem: Changes between read and write.
Solution: Version numbers. Write only succeeds if version matches.
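async function safeUpdate(customerId, changes, expectedVersion) {
  // The store must apply this write only WHERE version = expectedVersion,
  // as one atomic statement. Passing the condition as a third argument is
  // an assumption about how updateCustomer accepts it.
  const result = await updateCustomer(
    customerId,
    { ...changes, version: expectedVersion + 1 },
    { expectedVersion }
  );
  if (result.rowsAffected === 0) {
    // Someone else changed it
    return { success: false, reason: "Conflict - please re-read and retry" };
  }
  return { success: true };
}
Agent behavior: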
- Read record with version
- Make decision
- Attempt write with version check
- If conflict, re-read and reconsider
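The four steps above can be sketched as a retry loop. This is a minimal in-memory illustration: `rows`, `readCustomer`, `compareAndSet`, and `updateWithRetry` are all hypothetical names. In a real database the version check and the write would have to happen in one atomic statement.

```javascript
// Hypothetical in-memory table; each row carries a version counter.
const rows = new Map([["acme", { status: "active", version: 1 }]]);

function readCustomer(id) {
  return { ...rows.get(id) };
}

// Compare-and-set: apply the change only if the stored version still matches.
function compareAndSet(id, changes, expectedVersion) {
  const row = rows.get(id);
  if (row.version !== expectedVersion) {
    return { rowsAffected: 0 };
  }
  rows.set(id, { ...row, ...changes, version: expectedVersion + 1 });
  return { rowsAffected: 1 };
}

// Read, decide, attempt the write, and on conflict re-read and reconsider.
// `decide` returns the changes to apply, or null if the action no longer makes sense.
function updateWithRetry(id, decide, maxAttempts = 3) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const current = readCustomer(id);
    const changes = decide(current);
    if (changes === null) {
      return { success: false, reason: "Decision no longer applies", attempts: attempt };
    }
    const result = compareAndSet(id, changes, current.version);
    if (result.rowsAffected === 1) {
      return { success: true, attempts: attempt };
    }
  }
  return { success: false, reason: "Too many conflicts", attempts: maxAttempts };
}
```

Passing the decision in as a function means the agent reconsiders against fresh data on every retry instead of replaying a stale choice.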
Pattern 4: Domain Ownership
Problem: Multiple agents legitimately update the same record.
Solution: Designate which agent owns which fields.
Customer Record:
├── sales_fields (owned by Sales Agent)
│   ├── deal_stage
│   ├── contract_value
│   └── renewal_date
│
├── support_fields (owned by Support Agent)
│   ├── open_tickets
│   ├── last_interaction
│   └── satisfaction_score
│
└── product_fields (owned by Product Agent)
    ├── feature_requests
    ├── usage_metrics
    └── beta_programs
Rules:
- Agents can READ any field
- Agents can only WRITE to their owned fields
- Cross-domain updates require explicit handoff
Benefit: Eliminates most conflicts by design.
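One way to enforce the write rule is a guard in front of every update. The sketch below assumes a flat `FIELD_OWNERS` map and agent ids of `"sales"`, `"support"`, and `"product"`. Those names, and both helper functions, are illustrative.

```javascript
// Assumed ownership map, mirroring the field groups in the record layout.
const FIELD_OWNERS = {
  deal_stage: "sales",
  contract_value: "sales",
  renewal_date: "sales",
  open_tickets: "support",
  last_interaction: "support",
  satisfaction_score: "support",
  feature_requests: "product",
  usage_metrics: "product",
  beta_programs: "product",
};

// List any fields in `changes` that this agent does not own.
// Fields missing from the map are rejected too.
function checkWrite(agentId, changes) {
  const rejected = Object.keys(changes).filter(
    (field) => FIELD_OWNERS[field] !== agentId
  );
  return { allowed: rejected.length === 0, rejected };
}

// Return an updated copy of the record, or throw if the write crosses domains.
function applyOwnedWrite(record, agentId, changes) {
  const check = checkWrite(agentId, changes);
  if (!check.allowed) {
    throw new Error(`${agentId} cannot write: ${check.rejected.join(", ")}`);
  }
  return { ...record, ...changes };
}
```

Reads stay unrestricted. A rejected write is the point where an explicit handoff to the owning agent would start.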
Pattern 5: Event Sourcing
Problem: Understanding what happened and why.
Solution: Don't store state directly. Store events. Derive state.
Events:
├── CustomerCreated(id=123, name="Acme", by=sales)
├── StatusChanged(id=123, from=prospect, to=active, by=sales, reason="Contract signed")
├── TicketOpened(id=123, ticket=456, by=support, priority=high)
├── StatusChanged(id=123, from=active, to=at_risk, by=support, reason="Escalation")
└── ...
Current State = replay all events
Benefits:
- Full audit trail
- Can see exactly what each agent did
- Can replay to debug conflicts
- Can revert by adding compensating events
Cost: More complex architecture.
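Deriving state from events can be sketched as a fold over the log. The event shapes below follow the example events. The reducer, the `openTickets` field, and the initial `"prospect"` status are assumptions made for illustration.

```javascript
const events = [
  { type: "CustomerCreated", id: 123, name: "Acme", by: "sales" },
  { type: "StatusChanged", id: 123, from: "prospect", to: "active", by: "sales", reason: "Contract signed" },
  { type: "TicketOpened", id: 123, ticket: 456, by: "support", priority: "high" },
  { type: "StatusChanged", id: 123, from: "active", to: "at_risk", by: "support", reason: "Escalation" },
];

// Apply one event to the previous state and return the next state.
function applyEvent(state, event) {
  switch (event.type) {
    case "CustomerCreated":
      // Assumed starting status; the example log moves on from "prospect".
      return { id: event.id, name: event.name, status: "prospect", openTickets: [] };
    case "StatusChanged":
      return { ...state, status: event.to, lastChangedBy: event.by };
    case "TicketOpened":
      return { ...state, openTickets: [...state.openTickets, event.ticket] };
    default:
      return state; // unknown event types are ignored
  }
}

// Current state = replay all events, in order.
function replay(eventLog) {
  return eventLog.reduce(applyEvent, null);
}
```

Reverting means appending a compensating event, for example a `StatusChanged` from `at_risk` back to `active`, and replaying. Nothing already in the log is edited.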
Pattern 6: Agent Awareness
Problem: Agents don't know about each other's activities.
Solution: Agents check for recent activity before acting.
async function beforeActing(agentId, customerId, plannedAction) {
  // Check recent activity
  const recentEvents = await getRecentEvents(customerId, { minutes: 5 });
  if (recentEvents.length > 0) {
    // Another agent may have just been here.
    // Assumes getRecentEvents returns events newest first.
    const lastEvent = recentEvents[0];
    if (lastEvent.agent !== agentId) {
      // Decide whether to proceed
      if (shouldDefer(plannedAction, lastEvent)) {
        return { proceed: false, reason: `Deferring to ${lastEvent.agent}` };
      }
    }
  }
  return { proceed: true };
}
Example rules:
- Support actions take priority over Sales during active tickets
- Product actions defer to Support during escalations
- All agents defer to explicit human overrides
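These example rules could be encoded as a small table that `shouldDefer` consults. The event type names (`human_override`, `ticket_opened`, `escalation`) and the `{ agent }` shape of `plannedAction` are assumptions. The source does not define either.

```javascript
// Each rule: when the most recent event matches, the listed agents defer.
const DEFER_RULES = [
  // All agents defer to explicit human overrides.
  { lastEventType: "human_override", deferringAgents: ["sales", "support", "product"] },
  // Support takes priority over Sales during active tickets.
  { lastEventType: "ticket_opened", lastAgent: "support", deferringAgents: ["sales"] },
  // Product defers to Support during escalations.
  { lastEventType: "escalation", lastAgent: "support", deferringAgents: ["product"] },
];

function shouldDefer(plannedAction, lastEvent) {
  return DEFER_RULES.some(
    (rule) =>
      rule.lastEventType === lastEvent.type &&
      (rule.lastAgent === undefined || rule.lastAgent === lastEvent.agent) &&
      rule.deferringAgents.includes(plannedAction.agent)
  );
}
```

Keeping the rules as data makes the precedence order reviewable in one place. This table is a literal reading of the three rules and would need extending, for instance if Sales should also hold back during escalations.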
Pattern 7: Coordination Service
Problem: Agents need to coordinate complex multi-step processes.
Solution: A coordinator that manages agent orchestration.
     ┌───────────────┐
     │  Coordinator  │
     │    Service    │
     └───────────────┘
        ↙    ↓    ↘
┌─────┐   ┌─────┐   ┌─────┐
│Sales│   │Supp │   │Prod │
└─────┘   └─────┘   └─────┘
Coordinator responsibilities:
- Receive requests from agents
- Determine if action is safe
- Acquire locks if needed
- Execute or queue action
- Notify affected agents
Tradeoff: Adds latency and complexity, but enables sophisticated coordination.
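As a rough sketch of the "acquire locks" and "execute or queue" responsibilities only, a coordinator can serialize actions per customer by chaining promises. The `Coordinator` class and its `submit` method are hypothetical. Safety checks and notifications are left out.

```javascript
class Coordinator {
  constructor() {
    this.tails = new Map(); // customerId -> promise for the last queued action
    this.log = [];          // completed actions, in execution order
  }

  // Queue an action so that actions for the same customer never overlap.
  submit(agentId, customerId, action) {
    const previous = this.tails.get(customerId) ?? Promise.resolve();
    const run = previous.then(async () => {
      const result = await action();
      this.log.push({ agentId, customerId, result });
      return result;
    });
    // Keep the queue moving even if this action fails.
    this.tails.set(customerId, run.catch(() => undefined));
    return run;
  }
}
```

Actions for different customers still run concurrently. Only work on the same record waits its turn, which is where the added latency comes from.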
Anti-Pattern: Fire and Forget
// DON'T DO THIS
async function badAgentAction(data) {
  await updateRecord(data); // Just do it, no checking
  // Hope nothing goes wrong
}
At small scale, this works. At production scale, it causes constant coordination failures.
Implementing Coordination in Practice
Minimal Viable Coordination
If you're just starting, implement these basics:
- Single knowledge base — No agent-local state
- Timestamps on everything — Know when things happened
- Agent identification — Know who did what
- Basic conflict detection — Alert on suspicious patterns
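A minimal sketch of the last three basics: every write is logged with a timestamp and an agent id, and a scan flags records that two different agents touched within a short window. The log shape, function names, and the 5-second window are all assumptions.

```javascript
const writeLog = [];

// Record who wrote what, and when (milliseconds since the epoch).
function recordWrite(agentId, recordId, changes, at = Date.now()) {
  const entry = { agentId, recordId, changes, at };
  writeLog.push(entry);
  return entry;
}

// Flag consecutive writes to the same record by different agents within windowMs.
// Assumes the log is in chronological order.
function findSuspiciousWrites(log, windowMs = 5000) {
  const alerts = [];
  const lastByRecord = new Map();
  for (const entry of log) {
    const prev = lastByRecord.get(entry.recordId);
    if (prev && prev.agentId !== entry.agentId && entry.at - prev.at <= windowMs) {
      alerts.push({
        recordId: entry.recordId,
        agents: [prev.agentId, entry.agentId],
        gapMs: entry.at - prev.at,
      });
    }
    lastByRecord.set(entry.recordId, entry);
  }
  return alerts;
}
```

This only alerts after the fact. It is a cheap way to learn where conflicts actually happen before investing in locking or ownership.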
Growing Into Sophistication
As you scale:
- Add versioning — Optimistic locking for critical records
- Implement ownership — Clear field-level boundaries
- Build audit logs — Events, not just state
- Create coordination rules — Which agent defers to which
The Coordination Mindset
Single-agent development: "Does this agent work correctly?"
Multi-agent development: "Do these agents work correctly together?"
The second question is harder. But as systems become more agentic, it becomes the only question that matters.
Build coordination in from the start. Retrofitting is painful.
Coordinate Your Agents
Xtended's structured knowledge base provides the single source of truth your agents need. Timestamps, audit trails, and consistent APIs enable coordination patterns that work.