TL;DR: As AI agent systems scale from single assistants to teams of specialized employees, coordination becomes critical. This article explores the architecture, message passing, conflict resolution, and state synchronization patterns we use at GetATeam to enable multiple AI agents to collaborate effectively.
The Multi-Agent Reality
Six months ago, our typical GetATeam user had one AI agent—maybe Sydney handling emails, or Joseph managing their blog. Simple. One session, one context, one agent doing one job.
Today, that same user has five agents: Sydney on email, Alex handling support tickets, Taylor managing social media, Jordan doing data analysis, and Morgan coordinating everything.
The problem? These agents can't work in isolation anymore.
When a customer emails asking about their order status, Sydney (email agent) needs to check with Jordan (data agent), who queries the database; Taylor (social media agent) might need to post an update; and Morgan (coordinator) tracks that the task was completed.
This isn't science fiction. It's our production system handling 1000+ multi-agent conversations daily.
Why Single-Agent Patterns Break
Traditional AI agent architecture looks like this:
async function handleUserMessage(user, message) {
const context = await loadContext(user.id);
const response = await callLLM(message, context);
await saveContext(user.id, response);
return response;
}
Simple. Linear. Works great for one agent.
But when you add a second agent working for the same user, problems emerge:
Problem 1: Context Collision
// Agent A reads context at T0
const contextA = await loadContext(user.id); // { tasks: [] }
// Agent B reads context at T0+5ms
const contextB = await loadContext(user.id); // { tasks: [] }
// Agent A adds task
contextA.tasks.push('Send email to client');
await saveContext(user.id, contextA); // { tasks: ['Send email'] }
// Agent B adds task (overwrites!)
contextB.tasks.push('Update database');
await saveContext(user.id, contextB); // { tasks: ['Update database'] }
// Result: Agent A's task is lost!
Problem 2: Duplicate Work
Two agents receive the same task and both execute it. The customer gets two identical emails. Oops.
Problem 3: Conflicting Actions
Agent A decides to archive a conversation. Agent B decides to escalate it. Which wins?
Problem 4: No Shared Awareness
Agent A is waiting for data from an API. Agent B doesn't know this and tries to answer based on stale data.
These aren't edge cases. They're daily occurrences at scale.
Event-Driven Architecture: The Foundation
Our solution: event sourcing with message queues.
Instead of agents directly modifying shared state, they emit events:
// Agent A doesn't modify state directly
await emitEvent({
type: 'TASK_CREATED',
agentId: 'agent-a',
userId: 'user-123',
task: {
id: 'task-456',
description: 'Send email to client',
status: 'pending',
assignedTo: 'agent-a'
},
timestamp: Date.now()
});
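For completeness, emitEvent itself can be very small. A minimal sketch, assuming ioredis and a per-user Redis list named events:{userId} used as the queue (the key name is illustrative, not our exact schema):

const Redis = require('ioredis');
const redis = new Redis();

// Append the event to the per-user queue. LPUSH is atomic, so concurrent
// agents can emit without coordinating with each other first.
async function emitEvent(event) {
  await redis.lpush(`events:${event.userId}`, JSON.stringify(event));
}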
Events go into a Redis-backed queue. A coordinator process consumes events and updates the canonical state:
async function handleEvent(event) {
switch(event.type) {
case 'TASK_CREATED':
await redis.lpush(`tasks:${event.userId}`, JSON.stringify(event.task));
await notifyRelevantAgents(event);
break;
case 'TASK_COMPLETED':
await redis.lrem(`tasks:${event.userId}`, 1, JSON.stringify(event.task));
await updateTaskStatus(event.task.id, 'completed');
break;
case 'TASK_ASSIGNED':
await redis.hset(`assignments:${event.task.id}`, 'agent', event.agentId);
break;
}
}
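Feeding handleEvent is a small consumer loop in the coordinator process. A hedged sketch, consistent with the emitEvent sketch above (one loop per user for simplicity; the events:log and dead-letter keys are illustrative):

// Pop events off the queue in order and apply them to canonical state.
// BRPOP blocks until an event arrives (timeout 0 = wait forever).
async function runCoordinator(userId) {
  while (true) {
    const [, raw] = await redis.brpop(`events:${userId}`, 0);
    try {
      await handleEvent(JSON.parse(raw));
      // Archive the applied event so state can be replayed later
      await redis.rpush(`events:log:${userId}`, raw);
    } catch (err) {
      // Park bad events for inspection instead of silently dropping them
      await redis.lpush(`events:dead-letter:${userId}`, raw);
    }
  }
}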
Benefits:
- No lost updates - events are appended atomically, so concurrent agents can't overwrite each other
- Complete audit trail - every action is logged as an event
- Time-travel debugging - replay events to reconstruct state at any point (see the sketch below)
- Eventual consistency - all agents converge to the same state
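The time-travel point deserves a concrete example. Because the archived event log is the source of truth, state can be rebuilt by replaying it from the top. A rough sketch, reusing the events:log:{userId} key from the coordinator sketch above (in practice you would replay into a scratch keyspace rather than the live one):

// Rebuild state by re-applying archived events up to a point in time.
async function replayEvents(userId, untilTimestamp = Infinity) {
  const entries = await redis.lrange(`events:log:${userId}`, 0, -1);
  for (const entry of entries) {
    const event = JSON.parse(entry);
    if (event.timestamp > untilTimestamp) break;
    await handleEvent(event); // point this at a scratch namespace when debugging
  }
}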
Task Handoff Protocol
When Agent A needs Agent B to do something, we use a structured handoff protocol:
async function handoffTask(fromAgent, toAgent, task) {
// Step 1: Create handoff record
const handoff = {
id: generateId(),
from: fromAgent.id,
to: toAgent.id,
task: task,
context: await getRelevantContext(task),
status: 'pending',
createdAt: Date.now()
};
// Step 2: Emit handoff event
await emitEvent({
type: 'TASK_HANDOFF',
handoff: handoff
});
// Step 3: Notify receiving agent via WebSocket
await notifyAgent(toAgent.id, {
type: 'NEW_TASK',
handoff: handoff
});
// Step 4: Update sending agent's state
await updateAgentState(fromAgent.id, {
pendingHandoffs: [...(fromAgent.pendingHandoffs || []), handoff.id]
});
return handoff.id;
}
The receiving agent acknowledges:
async function acknowledgeHandoff(handoffId, agentId) {
await emitEvent({
type: 'HANDOFF_ACKNOWLEDGED',
handoffId: handoffId,
agentId: agentId,
timestamp: Date.now()
});
// Update handoff status
await redis.hset(`handoff:${handoffId}`, 'status', 'in-progress');
await redis.hset(`handoff:${handoffId}`, 'acceptedAt', Date.now());
}
Context Transfer Is Critical
When Agent A hands off to Agent B, B needs context:
async function getRelevantContext(task) {
return {
// User preferences
userPreferences: await getUserPreferences(task.userId),
// Recent conversation history (last 10 messages)
conversationHistory: await getRecentMessages(task.userId, 10),
// Related tasks
relatedTasks: await findRelatedTasks(task),
// Agent A's notes
handoffNotes: task.notes,
// Any blocking dependencies
dependencies: task.dependencies || []
};
}
This ensures Agent B doesn't ask the user to repeat information they already provided to Agent A.
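On the receiving side, that bundle is what seeds Agent B's first LLM call. A minimal sketch of how the handoff context might be folded into the prompt (the prompt shape here is illustrative, not our production prompting):

// Seed the receiving agent's prompt with the handoff context so the user
// never has to repeat what they already told the previous agent.
async function startHandedOffTask(agent, handoff) {
  const ctx = handoff.context;
  const briefing = [
    `You are ${agent.id}. Task handed off from ${handoff.from}.`,
    `Task: ${handoff.task.description}`,
    `Notes from the previous agent: ${ctx.handoffNotes || 'none'}`,
    `User preferences: ${JSON.stringify(ctx.userPreferences)}`
  ].join('\n');

  return callLLM(briefing, {
    history: ctx.conversationHistory,
    relatedTasks: ctx.relatedTasks,
    dependencies: ctx.dependencies
  });
}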
Priority Negotiation
When multiple agents want to do conflicting things, we use a priority negotiation system:
async function requestAction(agent, action) {
// Step 1: Check if action conflicts with pending actions
const conflicts = await findConflictingActions(action);
if (conflicts.length === 0) {
// No conflicts, execute immediately
return executeAction(action);
}
// Step 2: Priority-based resolution
const priorities = await Promise.all(
conflicts.map(c => calculatePriority(c))
);
const myPriority = await calculatePriority(action);
const maxConflictPriority = Math.max(...priorities);
if (myPriority > maxConflictPriority) {
// My action wins
await cancelConflictingActions(conflicts);
return executeAction(action);
} else {
// Conflicting action wins, queue mine
return queueAction(action, { waitFor: conflicts });
}
}
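One reasonable implementation of findConflictingActions treats two actions as conflicting when they target the same resource (the same conversation, ticket, or record). A sketch under that assumption; the resourceId field and pending:actions:{resourceId} key are hypothetical:

// Two actions conflict when they touch the same resource.
async function findConflictingActions(action) {
  const raw = await redis.lrange(`pending:actions:${action.resourceId}`, 0, -1);
  return raw
    .map(entry => JSON.parse(entry))
    .filter(other => other.id !== action.id && other.status === 'pending');
}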
Priority calculation considers multiple factors:
function calculatePriority(action) {
const factors = {
userWaiting: action.requiresUserResponse ? 10 : 0,
urgency: action.deadline ? calculateUrgency(action.deadline) : 5,
importance: action.importance || 5,
agentConfidence: action.confidence || 0.5
};
return (
factors.userWaiting +
factors.urgency * 0.4 +
factors.importance * 0.3 +
factors.agentConfidence * 0.2
);
}
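A quick worked example: an action that is blocking a user reply easily outranks routine background work (the urgency value of 8 assumes calculateUrgency returns roughly that for a deadline a few minutes out).

// User is waiting, deadline is close, agent is confident:
// 10 + 8 * 0.4 + 7 * 0.3 + 0.9 * 0.2 = 15.48
const replyPriority = calculatePriority({
  requiresUserResponse: true,
  deadline: Date.now() + 5 * 60 * 1000,
  importance: 7,
  confidence: 0.9
});

// Background cleanup, nobody waiting, no deadline:
// 0 + 5 * 0.4 + 3 * 0.3 + 0.5 * 0.2 = 3.0
const cleanupPriority = calculatePriority({ importance: 3 });

// replyPriority wins; the cleanup action gets queued behind it.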
Avoiding Duplicate Work
We use a distributed lock pattern with Redis:
async function claimTask(agentId, taskId) {
const lockKey = `lock:task:${taskId}`;
// Try to acquire lock with 30-second expiry
const acquired = await redis.set(
lockKey,
agentId,
'EX', 30, // Expire after 30 seconds
'NX' // Only set if not exists
);
if (acquired === 'OK') {
// We got the lock!
return { claimed: true, agentId: agentId };
}
// Someone else has the lock
const owner = await redis.get(lockKey);
return { claimed: false, ownedBy: owner };
}
Agents try to claim tasks before working on them:
async function executeTask(agent, task) {
const claim = await claimTask(agent.id, task.id);
if (!claim.claimed) {
console.log(`Task ${task.id} already claimed by ${claim.ownedBy}`);
return;
}
try {
// Do the work
const result = await agent.performTask(task);
// Release lock
await redis.del(`lock:task:${task.id}`);
// Emit completion event
await emitEvent({
type: 'TASK_COMPLETED',
taskId: task.id,
agentId: agent.id,
result: result
});
return result;
} catch (error) {
// Release lock on error
await redis.del(`lock:task:${task.id}`);
throw error;
}
}
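One subtle race in the release above: if the work takes longer than the 30-second expiry, the lock can lapse, another agent can claim the task, and the unconditional del then removes that agent's lock. A common refinement is a compare-and-delete so an agent only releases a lock it still owns; a sketch:

// Atomically delete the lock only if we are still the owner.
const RELEASE_LOCK_SCRIPT = `
  if redis.call('get', KEYS[1]) == ARGV[1] then
    return redis.call('del', KEYS[1])
  else
    return 0
  end
`;

async function releaseTaskLock(agentId, taskId) {
  // eval(script, numberOfKeys, ...keys, ...args)
  const released = await redis.eval(RELEASE_LOCK_SCRIPT, 1, `lock:task:${taskId}`, agentId);
  return released === 1;
}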
State Synchronization
Each agent maintains its own local state but subscribes to global state changes:
class Agent {
constructor(id) {
this.id = id;
this.localState = { tasks: [], userContext: {} };
this.subscriptions = new Set();
// Subscribe to Redis pub/sub
this.subscriber = redis.duplicate();
this.subscriber.subscribe(`agent:${id}:updates`);
this.subscriber.on('message', (channel, message) => {
this.handleStateUpdate(JSON.parse(message));
});
}
async handleStateUpdate(update) {
switch(update.type) {
case 'TASK_ASSIGNED':
if (update.agentId === this.id) {
this.localState.tasks.push(update.task);
}
break;
case 'TASK_COMPLETED':
this.localState.tasks = this.localState.tasks.filter(
t => t.id !== update.taskId
);
break;
case 'USER_CONTEXT_UPDATED':
this.localState.userContext = {
...this.localState.userContext,
...update.context
};
break;
}
// Trigger re-evaluation
await this.reevaluatePriorities();
}
}
Coordinator Pattern
For complex workflows, we use a coordinator agent that orchestrates others:
class CoordinatorAgent {
async handleUserRequest(request) {
// Step 1: Decompose into subtasks
const subtasks = await this.decomposeRequest(request);
// Step 2: Identify required agents
const agents = subtasks.map(st => this.findBestAgent(st));
// Step 3: Create execution plan
const plan = this.createExecutionPlan(subtasks, agents);
// Step 4: Execute plan
return this.executePlan(plan);
}
async executePlan(plan) {
const results = [];
for (const step of plan.steps) {
if (step.parallel) {
// Execute parallel steps concurrently
const stepResults = await Promise.all(
step.tasks.map(task => this.delegateTask(task))
);
results.push(...stepResults);
} else {
// Execute sequential steps
for (const task of step.tasks) {
const result = await this.delegateTask(task);
results.push(result);
// Check if we should continue
if (result.shouldAbort) {
return { aborted: true, results };
}
}
}
}
return { completed: true, results };
}
async delegateTask(task) {
const handoffId = await handoffTask(this, task.agent, task);
// Wait for completion with timeout
return this.waitForHandoffCompletion(handoffId, task.timeout || 60000);
}
}
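waitForHandoffCompletion is what turns a fire-and-forget handoff into something the coordinator can await. A polling sketch against the handoff:{id} record used earlier; it assumes the completing agent writes a status and a JSON result field onto that hash (a pub/sub notification would avoid the polling, but the contract is the same):

// Poll the handoff record until it completes, fails, or times out.
async function waitForHandoffCompletion(handoffId, timeoutMs = 60000, pollMs = 500) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const record = await redis.hgetall(`handoff:${handoffId}`);
    if (record.status === 'completed') {
      return { completed: true, result: JSON.parse(record.result || 'null') };
    }
    if (record.status === 'failed') {
      return { completed: false, error: record.error, shouldAbort: true };
    }
    await new Promise(resolve => setTimeout(resolve, pollMs));
  }
  return { completed: false, timedOut: true, shouldAbort: true };
}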
Real-World Example: Customer Support Flow
Here's how multiple agents coordinate on a real support ticket:
1. Email arrives (handled by Sydney - email agent):
// Sydney receives email
const email = await receiveEmail();
// Sydney creates ticket
await emitEvent({
type: 'TICKET_CREATED',
ticket: {
id: 'ticket-789',
from: email.from,
subject: email.subject,
body: email.body,
priority: analyzePriority(email)
}
});
// Sydney hands off to Alex (support agent)
await handoffTask(sydney, alex, {
type: 'HANDLE_SUPPORT_TICKET',
ticketId: 'ticket-789'
});
2. Alex (support agent) analyzes ticket:
// Alex receives handoff
const ticket = await getTicket('ticket-789');
// Alex determines this needs data from database
const handoffId = await handoffTask(alex, jordan, {
type: 'QUERY_ORDER_STATUS',
orderId: extractOrderId(ticket.body)
});
// Alex waits for Jordan's response
const { orderStatus } = await waitForHandoff(handoffId);
3. Jordan (data agent) queries database:
// Jordan receives task
const orderStatus = await db.query(
'SELECT * FROM orders WHERE id = ?',
[orderId]
);
// Jordan hands back to Alex
await completeHandoff(handoffId, { orderStatus });
4. Alex composes response:
// Alex has the data, composes response
const response = await composeSupportResponse(ticket, orderStatus);
// Alex hands to Sydney to send
await handoffTask(alex, sydney, {
type: 'SEND_EMAIL',
to: ticket.from,
subject: `Re: ${ticket.subject}`,
body: response
});
5. Sydney sends email:
// Sydney sends the reply using the to/subject/body from Alex's handoff
await sendEmail(handoff.task);
// Sydney marks ticket complete
await emitEvent({
type: 'TICKET_COMPLETED',
ticketId: 'ticket-789'
});
All of this happens autonomously, with proper coordination, no duplicate work, and a complete audit trail.
Performance Metrics
After deploying multi-agent coordination:
Before (single agent doing everything):
- Average response time: 4.2 minutes
- Task completion rate: 67%
- Error rate: 12%
After (coordinated multi-agent):
- Average response time: 1.8 minutes (57% faster)
- Task completion rate: 91% (up 24 percentage points)
- Error rate: 3% (75% reduction)
Why the improvement?
- Specialized agents are more efficient at their specific tasks
- Parallel execution of independent subtasks
- Better error handling (one agent failure doesn't break everything)
- Reduced context switching within agents
Challenges and Gotchas
1. Message Queue Overhead
Early versions had significant latency from queue operations. We optimized with batching:
// Instead of individual events
await emitEvent(event1);
await emitEvent(event2);
await emitEvent(event3);
// Batch events
await emitEventBatch([event1, event2, event3]);
Reduced queue latency from 150ms to 12ms.
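emitEventBatch is a thin wrapper around a Redis pipeline, so all of the pushes share a single round trip. A sketch, consistent with the list-based emitEvent sketch earlier:

// Queue many events in one network round trip using a pipeline.
async function emitEventBatch(events) {
  const pipeline = redis.pipeline();
  for (const event of events) {
    pipeline.lpush(`events:${event.userId}`, JSON.stringify(event));
  }
  await pipeline.exec();
}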
2. Circular Handoffs
Agent A hands to B, B hands to C, C hands back to A. Infinite loop!
We added cycle detection:
function detectCycle(handoff) {
const visited = new Set();
let current = handoff;
while (current) {
if (visited.has(current.id)) {
throw new Error(`Circular handoff detected: ${Array.from(visited).join(' -> ')}`);
}
visited.add(current.id);
current = current.previousHandoff;
}
}
3. State Explosion
With 10 agents and 100 users, state grows fast. We implemented aggressive cleanup:
// Expire completed tasks after 24 hours
await redis.expire(`tasks:completed:${taskId}`, 86400);
// Keep only last 50 messages per conversation
await redis.ltrim(`messages:${conversationId}`, 0, 49);
What's Next
We're exploring:
1. Agent Learning from Coordination: agents that adjust their handoff strategies based on what worked in the past.
2. Dynamic Agent Spawning: creating specialized agents on the fly for specific tasks, then destroying them.
3. Cross-User Agent Collaboration: agents from different users collaborating (with proper permissions, of course).
Conclusion
Multi-agent coordination isn't about replacing humans with a single super-intelligent AI. It's about building teams of specialized AI employees that collaborate like human teams do—with clear communication, shared context, and mutual awareness.
At GetATeam, our agents don't just work alongside each other. They actively coordinate, hand off tasks, negotiate priorities, and synchronize state. Just like a real team.
The future of AI isn't one agent doing everything. It's many agents doing what they do best, together.
Joseph Benguira - CTO & Founder @ GetATeam
Want AI employees that actually work as a team? We're in private alpha. Reach out at joseph.benguira@getateam.org