TL;DR: As AI agent systems scale from single assistants to teams of specialized employees, coordination becomes critical. This article explores the architecture, message passing, conflict resolution, and state synchronization patterns we use at GetATeam to enable multiple AI agents to collaborate effectively.
The Multi-Agent Reality
Six months ago, our typical GetATeam user had one AI agent—maybe Sydney handling emails, or Joseph managing their blog. Simple. One session, one context, one agent doing one job.
Today, that same user has five agents: Sydney on email, Alex handling support tickets, Taylor managing social media, Jordan doing data analysis, and Morgan coordinating everything.
The problem? These agents can't work in isolation anymore.
When a customer emails asking about their order status, Sydney (email agent) needs to check with Jordan (data agent), who queries the database; Taylor (social media agent) might need to post an update; and Morgan (coordinator) tracks that the task was completed.
This isn't science fiction. It's our production system handling 1000+ multi-agent conversations daily.
Why Single-Agent Patterns Break
Traditional AI agent architecture looks like this:
async function handleUserMessage(user, message) {
const context = await loadContext(user.id);
const response = await callLLM(message, context);
await saveContext(user.id, response);
return response;
}
Simple. Linear. Works great for one agent.
But when you add a second agent working for the same user, problems emerge:
Problem 1: Context Collision
// Agent A reads context at T0
const contextA = await loadContext(user.id); // { tasks: [] }
// Agent B reads context at T0+5ms
const contextB = await loadContext(user.id); // { tasks: [] }
// Agent A adds task
contextA.tasks.push('Send email to client');
await saveContext(user.id, contextA); // { tasks: ['Send email'] }
// Agent B adds task (overwrites!)
contextB.tasks.push('Update database');
await saveContext(user.id, contextB); // { tasks: ['Update database'] }
// Result: Agent A's task is lost!
Problem 2: Duplicate Work
Two agents receive the same task and both execute it. The customer gets two identical emails. Oops.
Problem 3: Conflicting Actions
Agent A decides to archive a conversation. Agent B decides to escalate it. Which wins?
Problem 4: No Shared Awareness
Agent A is waiting for data from an API. Agent B doesn't know this and tries to answer based on stale data.
These aren't edge cases. They're daily occurrences at scale.
Event-Driven Architecture: The Foundation
Our solution: event sourcing with message queues.
Instead of agents directly modifying shared state, they emit events:
// Agent A doesn't modify state directly
await emitEvent({
type: 'TASK_CREATED',
agentId: 'agent-a',
userId: 'user-123',
task: {
id: 'task-456',
description: 'Send email to client',
status: 'pending',
assignedTo: 'agent-a'
},
timestamp: Date.now()
});
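For completeness, emitEvent itself can be very small. A minimal sketch, assuming ioredis and a per-user Redis list named events:{userId} used as the queue (the key name is illustrative, not our exact schema):

const Redis = require('ioredis');
const redis = new Redis();

// Append the event to the per-user queue. LPUSH is atomic, so concurrent
// agents can emit without coordinating with each other first.
async function emitEvent(event) {
  await redis.lpush(`events:${event.userId}`, JSON.stringify(event));
}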
Events go into a Redis-backed queue. A coordinator process consumes events and updates the canonical state:
async function handleEvent(event) {
switch(event.type) {
case 'TASK_CREATED':
await redis.lpush(`tasks:${event.userId}`, JSON.stringify(event.task));
await notifyRelevantAgents(event);
break;
case 'TASK_COMPLETED':
await redis.lrem(`tasks:${event.userId}`, 1, JSON.stringify(event.task));
await updateTaskStatus(event.task.id, 'completed');
break;
case 'TASK_ASSIGNED':
await redis.hset(`assignments:${event.task.id}`, 'agent', event.agentId);
break;
}
}
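Feeding handleEvent is a small consumer loop in the coordinator process. A hedged sketch, consistent with the emitEvent sketch above (one loop per user for simplicity; the events:log and dead-letter keys are illustrative):

// Pop events off the queue in order and apply them to canonical state.
// BRPOP blocks until an event arrives (timeout 0 = wait forever).
async function runCoordinator(userId) {
  while (true) {
    const [, raw] = await redis.brpop(`events:${userId}`, 0);
    try {
      await handleEvent(JSON.parse(raw));
      // Archive the applied event so state can be replayed later
      await redis.rpush(`events:log:${userId}`, raw);
    } catch (err) {
      // Park bad events for inspection instead of silently dropping them
      await redis.lpush(`events:dead-letter:${userId}`, raw);
    }
  }
}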
Benefits:
- No lost updates - events are appended atomically, so concurrent agents can't overwrite each other
- Complete audit trail - every action is logged as an event
- Time-travel debugging - replay events to reconstruct state at any point (see the sketch below)
- Eventual consistency - all agents converge to the same state
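The time-travel point deserves a concrete example. Because the archived event log is the source of truth, state can be rebuilt by replaying it from the top. A rough sketch, reusing the events:log:{userId} key from the coordinator sketch above (in practice you would replay into a scratch keyspace rather than the live one):

// Rebuild state by re-applying archived events up to a point in time.
async function replayEvents(userId, untilTimestamp = Infinity) {
  const entries = await redis.lrange(`events:log:${userId}`, 0, -1);
  for (const entry of entries) {
    const event = JSON.parse(entry);
    if (event.timestamp > untilTimestamp) break;
    await handleEvent(event); // point this at a scratch namespace when debugging
  }
}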
Task Handoff Protocol
When Agent A needs Agent B to do something, we use a structured handoff protocol:
async function handoffTask(fromAgent, toAgent, task) {
// Step 1: Create handoff record
const handoff = {
id: generateId(),
from: fromAgent.id,
to: toAgent.id,
task: task,
context: await getRelevantContext(task),
status: 'pending',
createdAt: Date.now()
};
// Step 2: Emit handoff event
await emitEvent({
type: 'TASK_HANDOFF',
handoff: handoff
});
// Step 3: Notify receiving agent via WebSocket
await notifyAgent(toAgent.id, {
type: 'NEW_TASK',
handoff: handoff
});
// Step 4: Update sending agent's state
await updateAgentState(fromAgent.id, {
pendingHandoffs: [...(fromAgent.pendingHandoffs || []), handoff.id]
});
return handoff.id;
}
The receiving agent acknowledges:
async function acknowledgeHandoff(handoffId, agentId) {
await emitEvent({
type: 'HANDOFF_ACKNOWLEDGED',
handoffId: handoffId,
agentId: agentId,
timestamp: Date.now()
});
// Update handoff status
await redis.hset(`handoff:${handoffId}`, 'status', 'in-progress');
await redis.hset(`handoff:${handoffId}`, 'acceptedAt', Date.now());
}
Context Transfer Is Critical
When Agent A hands off to Agent B, B needs context:
async function getRelevantContext(task) {
return {
// User preferences
userPreferences: await getUserPreferences(task.userId),
// Recent conversation history (last 10 messages)
conversationHistory: await getRecentMessages(task.userId, 10),
// Related tasks
relatedTasks: await findRelatedTasks(task),
// Agent A's notes
handoffNotes: task.notes,
// Any blocking dependencies
dependencies: task.dependencies || []
};
}
This ensures Agent B doesn't ask the user to repeat information they already provided to Agent A.
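On the receiving side, that bundle is what seeds Agent B's first LLM call. A minimal sketch of how the handoff context might be folded into the prompt (the prompt shape here is illustrative, not our production prompting):

// Seed the receiving agent's prompt with the handoff context so the user
// never has to repeat what they already told the previous agent.
async function startHandedOffTask(agent, handoff) {
  const ctx = handoff.context;
  const briefing = [
    `You are ${agent.id}. Task handed off from ${handoff.from}.`,
    `Task: ${handoff.task.description}`,
    `Notes from the previous agent: ${ctx.handoffNotes || 'none'}`,
    `User preferences: ${JSON.stringify(ctx.userPreferences)}`
  ].join('\n');

  return callLLM(briefing, {
    history: ctx.conversationHistory,
    relatedTasks: ctx.relatedTasks,
    dependencies: ctx.dependencies
  });
}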
Priority Negotiation
When multiple agents want to do conflicting things, we use a priority negotiation system:
async function requestAction(agent, action) {
// Step 1: Check if action conflicts with pending actions
const conflicts = await findConflictingActions(action);
if (conflicts.length === 0) {
// No conflicts, execute immediately
return executeAction(action);
}
// Step 2: Priority-based resolution
const priorities = await Promise.all(
conflicts.map(c => calculatePriority(c))
);
const myPriority = await calculatePriority(action);
const maxConflictPriority = Math.max(...priorities);
if (myPriority > maxConflictPriority) {
// My action wins
await cancelConflictingActions(conflicts);
return executeAction(action);
} else {
// Conflicting action wins, queue mine
return queueAction(action, { waitFor: conflicts });
}
}
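One reasonable implementation of findConflictingActions treats two actions as conflicting when they target the same resource (the same conversation, ticket, or record). A sketch under that assumption; the resourceId field and pending:actions:{resourceId} key are hypothetical:

// Two actions conflict when they touch the same resource.
async function findConflictingActions(action) {
  const raw = await redis.lrange(`pending:actions:${action.resourceId}`, 0, -1);
  return raw
    .map(entry => JSON.parse(entry))
    .filter(other => other.id !== action.id && other.status === 'pending');
}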
Priority calculation considers multiple factors:
function calculatePriority(action) {
const factors = {
userWaiting: action.requiresUserResponse ? 10 : 0,
urgency: action.deadline ? calculateUrgency(action.deadline) : 5,
importance: action.importance || 5,
agentConfidence: action.confidence || 0.5
};
return (
factors.userWaiting +
factors.urgency * 0.4 +
factors.importance * 0.3 +
factors.agentConfidence * 0.2
);
}
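A quick worked example: an action that is blocking a user reply easily outranks routine background work (the urgency value of 8 assumes calculateUrgency returns roughly that for a deadline a few minutes out).

// User is waiting, deadline is close, agent is confident:
// 10 + 8 * 0.4 + 7 * 0.3 + 0.9 * 0.2 = 15.48
const replyPriority = calculatePriority({
  requiresUserResponse: true,
  deadline: Date.now() + 5 * 60 * 1000,
  importance: 7,
  confidence: 0.9
});

// Background cleanup, nobody waiting, no deadline:
// 0 + 5 * 0.4 + 3 * 0.3 + 0.5 * 0.2 = 3.0
const cleanupPriority = calculatePriority({ importance: 3 });

// replyPriority wins; the cleanup action gets queued behind it.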
Avoiding Duplicate Work
We use a distributed lock pattern with Redis:
async function claimTask(agentId, taskId) {
const lockKey = `lock:task:${taskId}`;
// Try to acquire lock with 30-second expiry
const acquired = await redis.set(
lockKey,
agentId,
'EX', 30, // Expire after 30 seconds
'NX' // Only set if not exists
);
if (acquired === 'OK') {
// We got the lock!
return { claimed: true, agentId: agentId };
}
// Someone else has the lock
const owner = await redis.get(lockKey);
return { claimed: false, ownedBy: owner };
}
Agents try to claim tasks before working on them:
async function executeTask(agent, task) {
const claim = await claimTask(agent.id, task.id);
if (!claim.claimed) {
console.log(`Task ${task.id} already claimed by ${claim.ownedBy}`);
return;
}
try {
// Do the work
const result = await agent.performTask(task);
// Release lock
await redis.del(`lock:task:${task.id}`);
// Emit completion event
await emitEvent({
type: 'TASK_COMPLETED',
taskId: task.id,
agentId: agent.id,
result: result
});
return result;
} catch (error) {
// Release lock on error
await redis.del(`lock:task:${task.id}`);
throw error;
}
}
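One subtle race in the release above: if the work takes longer than the 30-second expiry, the lock can lapse, another agent can claim the task, and the unconditional del then removes that agent's lock. A common refinement is a compare-and-delete so an agent only releases a lock it still owns; a sketch:

// Atomically delete the lock only if we are still the owner.
const RELEASE_LOCK_SCRIPT = `
  if redis.call('get', KEYS[1]) == ARGV[1] then
    return redis.call('del', KEYS[1])
  else
    return 0
  end
`;

async function releaseTaskLock(agentId, taskId) {
  // eval(script, numberOfKeys, ...keys, ...args)
  const released = await redis.eval(RELEASE_LOCK_SCRIPT, 1, `lock:task:${taskId}`, agentId);
  return released === 1;
}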
State Synchronization
Each agent maintains its own local state but subscribes to global state changes:
class Agent {
constructor(id) {
this.id = id;
this.localState = { tasks: [], userContext: {} };
this.subscriptions = new Set();
// Subscribe to Redis pub/sub
this.subscriber = redis.duplicate();
this.subscriber.subscribe(`agent:${id}:updates`);
this.subscriber.on('message', (channel, message) => {
this.handleStateUpdate(JSON.parse(message));
});
}
async handleStateUpdate(update) {
switch(update.type) {
case 'TASK_ASSIGNED':
if (update.agentId === this.id) {
this.localState.tasks.push(update.task);
}
break;
case 'TASK_COMPLETED':
this.localState.tasks = this.localState.tasks.filter(
t => t.id !== update.taskId
);
break;
case 'USER_CONTEXT_UPDATED':
this.localState.userContext = {
...this.localState.userContext,
...update.context
};
break;
}
// Trigger re-evaluation
await this.reevaluatePriorities();
}
}
Coordinator Pattern
For complex workflows, we use a coordinator agent that orchestrates others:
class CoordinatorAgent {
async handleUserRequest(request) {
// Step 1: Decompose into subtasks
const subtasks = await this.decomposeRequest(request);
// Step 2: Identify required agents
const agents = subtasks.map(st => this.findBestAgent(st));
// Step 3: Create execution plan
const plan = this.createExecutionPlan(subtasks, agents);
// Step 4: Execute plan
return this.executePlan(plan);
}
async executePlan(plan) {
const results = [];
for (const step of plan.steps) {
if (step.parallel) {
// Execute parallel steps concurrently
const stepResults = await Promise.all(
step.tasks.map(task => this.delegateTask(task))
);
results.push(...stepResults);
} else {
// Execute sequential steps
for (const task of step.tasks) {
const result = await this.delegateTask(task);
results.push(result);
// Check if we should continue
if (result.shouldAbort) {
return { aborted: true, results };
}
}
}
}
return { completed: true, results };
}
async delegateTask(task) {
const handoffId = await handoffTask(this, task.agent, task);
// Wait for completion with timeout
return this.waitForHandoffCompletion(handoffId, task.timeout || 60000);
}
}
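waitForHandoffCompletion is what turns a fire-and-forget handoff into something the coordinator can await. A polling sketch against the handoff:{id} record used earlier; it assumes the completing agent writes a status and a JSON result field onto that hash (a pub/sub notification would avoid the polling, but the contract is the same):

// Poll the handoff record until it completes, fails, or times out.
async function waitForHandoffCompletion(handoffId, timeoutMs = 60000, pollMs = 500) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const record = await redis.hgetall(`handoff:${handoffId}`);
    if (record.status === 'completed') {
      return { completed: true, result: JSON.parse(record.result || 'null') };
    }
    if (record.status === 'failed') {
      return { completed: false, error: record.error, shouldAbort: true };
    }
    await new Promise(resolve => setTimeout(resolve, pollMs));
  }
  return { completed: false, timedOut: true, shouldAbort: true };
}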
Real-World Example: Customer Support Flow
Here's how multiple agents coordinate on a real support ticket:
1. Email arrives (handled by Sydney - email agent):
// Sydney receives email
const email = await receiveEmail();
// Sydney creates ticket
await emitEvent({
type: 'TICKET_CREATED',
ticket: {
id: 'ticket-789',
from: email.from,
subject: email.subject,
body: email.body,
priority: analyzePriority(email)
}
});
// Sydney hands off to Alex (support agent)
await handoffTask(sydney, alex, {
type: 'HANDLE_SUPPORT_TICKET',
ticketId: 'ticket-789'
});
2. Alex (support agent) analyzes ticket:
// Alex receives handoff
const ticket = await getTicket('ticket-789');
// Alex determines this needs data from database
const handoffId = await handoffTask(alex, jordan, {
type: 'QUERY_ORDER_STATUS',
orderId: extractOrderId(ticket.body)
});
// Alex waits for Jordan's response
const { orderStatus } = await waitForHandoff(handoffId);
3. Jordan (data agent) queries database:
// Jordan receives task
const orderStatus = await db.query(
'SELECT * FROM orders WHERE id = ?',
[orderId]
);
// Jordan hands back to Alex
await completeHandoff(handoffId, { orderStatus });
4. Alex composes response:
// Alex has the data, composes response
const response = await composeSupportResponse(ticket, orderStatus);
// Alex hands to Sydney to send
await handoffTask(alex, sydney, {
type: 'SEND_EMAIL',
to: ticket.from,
subject: `Re: ${ticket.subject}`,
body: response
});
5. Sydney sends email:
// Sydney sends the reply using the to/subject/body from Alex's handoff
await sendEmail(handoff.task);
// Sydney marks ticket complete
await emitEvent({
type: 'TICKET_COMPLETED',
ticketId: 'ticket-789'
});
All of this happens autonomously, with proper coordination, no duplicate work, and a complete audit trail.
Performance Metrics
After deploying multi-agent coordination:
Before (single agent doing everything):
- Average response time: 4.2 minutes
- Task completion rate: 67%
- Error rate: 12%
After (coordinated multi-agent):
- Average response time: 1.8 minutes (57% faster)
- Task completion rate: 91% (up 24 percentage points)
- Error rate: 3% (75% reduction)
Why the improvement?
- Specialized agents are more efficient at their specific tasks
- Parallel execution of independent subtasks
- Better error handling (one agent failure doesn't break everything)
- Reduced context switching within agents
Challenges and Gotchas
1. Message Queue Overhead
Early versions had significant latency from queue operations. We optimized with batching:
// Instead of individual events
await emitEvent(event1);
await emitEvent(event2);
await emitEvent(event3);
// Batch events
await emitEventBatch([event1, event2, event3]);
Reduced queue latency from 150ms to 12ms.
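emitEventBatch is a thin wrapper around a Redis pipeline, so all of the pushes share a single round trip. A sketch, consistent with the list-based emitEvent sketch earlier:

// Queue many events in one network round trip using a pipeline.
async function emitEventBatch(events) {
  const pipeline = redis.pipeline();
  for (const event of events) {
    pipeline.lpush(`events:${event.userId}`, JSON.stringify(event));
  }
  await pipeline.exec();
}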
2. Circular Handoffs
Agent A hands to B, B hands to C, C hands back to A. Infinite loop!
We added cycle detection:
function detectCycle(handoff) {
const visited = new Set();
let current = handoff;
while (current) {
if (visited.has(current.id)) {
throw new Error(`Circular handoff detected: ${Array.from(visited).join(' -> ')}`);
}
visited.add(current.id);
current = current.previousHandoff;
}
}
3. State Explosion
With 10 agents and 100 users, state grows fast. We implemented aggressive cleanup:
// Expire completed tasks after 24 hours
await redis.expire(`tasks:completed:${taskId}`, 86400);
// Keep only last 50 messages per conversation
await redis.ltrim(`messages:${conversationId}`, 0, 49);
What's Next
We're exploring:
1. Agent Learning from Coordination: agents that adjust their handoff strategies based on what worked in the past.
2. Dynamic Agent Spawning: creating specialized agents on the fly for specific tasks, then destroying them.
3. Cross-User Agent Collaboration: agents from different users collaborating (with proper permissions, of course).
Conclusion
Multi-agent coordination isn't about replacing humans with a single super-intelligent AI. It's about building teams of specialized AI employees that collaborate like human teams do—with clear communication, shared context, and mutual awareness.
At GetATeam, our agents don't just work alongside each other. They actively coordinate, hand off tasks, negotiate priorities, and synchronize state. Just like a real team.
The future of AI isn't one agent doing everything. It's many agents doing what they do best, together.
Joseph Benguira - CTO & Founder @ GetATeam
Want AI employees that actually work as a team? We're in private alpha. Reach out at joseph.benguira@getateam.org