What 100 Virtual Workers Taught Us About the Future of Work
Joseph Benguira / Founder & CTO, Elestio / Founder, GetATeam
1/19
Who We Are
A global team with European roots
We run a global team, with roots in Europe — EU-resident workloads since day one.
Built for organisations that care where their data lives and who owns the stack underneath.
Not selling you anything today. Sharing what 12 months in production looks like.
The Elestio → GetATeam link matters for this room:
we own the full infrastructure stack underneath the AI employees.
No mystery cloud. No shared tenancy. Sovereign-deployable.
Elestio
Founded 2022 · Dublin
Managed open-source DevOps platform. 400+ open-source apps deployed on
dedicated VMs (not shared Kubernetes) for thousands of customers across EU, US, Asia.
GetATeam
Founded September 2025
Platform to build, run, and monitor AI employees. It sits on Elestio infrastructure,
which is why we can promise self-hosted, EU-resident, dedicated VMs out of the box.
2/19
What "In Production" Actually Means
May 2026 · our own workforce running on GetATeam
97
AI Employees Running
Deployed inside our company
23
Human Operators
Working alongside them
40.76B
Tokens Processed
May 2026 · 17,048 prompts · 145,921 messages
97.3%
Cache Hit Ratio
The economics moat
What these numbers mean
9,395 hours of cumulative agent runtime in May — the equivalent of ~13 employees working 24/7 in parallel, on a base of 23 humans.
Each human operator effectively runs ~4 AI employees in parallel on average (97 / 23 ≈ 4.2), all day, every day.
Average per employee: ~420 million tokens / month · ~14 million / day.
Every new employee ships with Web-publish + Task-scheduler mandatory by default — automation isn't an upgrade tier, it's the floor.
Counting customer-deployed agents on top of our internal workforce, we are well past 150 AI employees in production.
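The derived figures on this slide can be reproduced from the raw counts. A minimal sketch in Python: the variable names are illustrative, and the only input not printed on the slide is the assumption that May has 31 calendar days.

```python
# Raw counts from the May 2026 snapshot, as stated on the slide.
AGENT_RUNTIME_HOURS = 9_395
AI_EMPLOYEES = 97
HUMAN_OPERATORS = 23
TOKENS_PROCESSED = 40.76e9
DAYS_IN_MAY = 31  # assumption made explicit: 31 calendar days

calendar_hours = DAYS_IN_MAY * 24                               # 744
always_on_equivalents = AGENT_RUNTIME_HOURS / calendar_hours    # about 12.6
employees_per_operator = AI_EMPLOYEES / HUMAN_OPERATORS         # about 4.2
tokens_per_employee_month = TOKENS_PROCESSED / AI_EMPLOYEES     # about 420M
tokens_per_employee_day = tokens_per_employee_month / DAYS_IN_MAY  # about 13.6M

print(f"24/7 equivalents:          {always_on_equivalents:.1f}")
print(f"AI employees per operator: {employees_per_operator:.1f}")
print(f"Tokens / employee / month: {tokens_per_employee_month / 1e6:.0f}M")
print(f"Tokens / employee / day:   {tokens_per_employee_day / 1e6:.1f}M")
```

The per-operator figure is a fleet average; individual operators sit well above or below it.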
3/19
Definitions Matter / The Market Calls Everything an "Agent"
| Capability | Chatbot | Copilot | Agent (one-shot) | AI Employee |
|---|---|---|---|---|
| Memory across sessions | No | No | No | Yes, persistent |
| Initiates work on its own | No | No | No | Yes, scheduled + reactive |
| Communication channels | 1 (chat) | 1 (IDE) | 1 (API) | Email, Slack, Teams, phone, chat |
| Skills | Fixed | Plugin-based | Static toolset | Self-programming (writes its own) |
| Identity | None | None | None | Name, email, phone, signature |
One-line version: a chatbot answers. A copilot suggests. An agent runs a task once.
An employee shows up Monday morning and works the inbox.
4/19
Who Actually Uses AI Employees Inside a Company
Pareto distribution · May 2026 · token usage per human operator
56.6%
Top 3 Users
of all token usage
~75%
Top 5 Users
classic power-law tail
#4
Founder (Me)
not the #1 user
Lesson: AI employees do not get adopted uniformly. The early heavy users are
the people whose workflows were already broken or overloaded (sales ops, support triage, content
production). The rest of the organisation watches for 2–3 months, then catches up.
For Sambruk: rolling out AI employees inside a municipality, expect 2–3 power
users to drive 70%+ of value in the first quarter. Plan onboarding around them, not around an even split.
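If you want to track this concentration in your own rollout, the measurement is a top-k share over per-operator token usage. A minimal sketch: `top_k_share` and the usage figures are hypothetical illustrations, not our production data, chosen only to resemble the shape of the distribution above.

```python
def top_k_share(tokens_by_operator: dict[str, int], k: int) -> float:
    """Fraction of total token usage attributable to the k heaviest users."""
    total = sum(tokens_by_operator.values())
    if total == 0:
        return 0.0
    heaviest = sorted(tokens_by_operator.values(), reverse=True)[:k]
    return sum(heaviest) / total


# Hypothetical usage per operator, in millions of tokens (illustrative only).
usage = {
    "op_01": 300, "op_02": 160, "op_03": 110, "op_04": 100, "op_05": 80,
    "op_06": 50, "op_07": 50, "op_08": 50, "op_09": 50, "op_10": 50,
}

print(f"Top 3 share: {top_k_share(usage, 3):.1%}")
print(f"Top 5 share: {top_k_share(usage, 5):.1%}")
```

Watching how these shares move month over month is one way to see whether the rest of the organisation is catching up.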
5/19
The Economics Nobody Publishes
What this workload would cost at retail pricing
$35K–$75K
Monthly bill
40.76B tokens · no caching · typical retail Claude or GPT pricing
What we actually pay
$600
Monthly bill
3 Claude Max plans × $200 · pooled across 97 employees · 97.3% cache hit ratio · 60–125× cheaper
Cache is the moat. The mechanism: 3 Claude Max plans share the prompt-prefix cache across all 97 employees.
Without that pooling, AI employees are a luxury good: "you have to be a tech giant to afford this."
With cross-employee cache pooling, they become a line item on a normal municipal IT budget: a flat $600/month for a workforce of 97.
Practical implication for vendor selection: when you evaluate AI-employee vendors,
ask “What is your cache hit ratio?” If they cannot tell you, the pricing isn't
predictable, and the bill at scale will surprise you.
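The ratios on this slide follow directly from the two bills. A minimal sketch of the arithmetic, using only figures stated above; the "implied retail price" is simply the quoted retail range divided by the token count, not a published price list.

```python
TOKENS_MILLIONS = 40_760                    # 40.76B tokens, in millions
RETAIL_LOW, RETAIL_HIGH = 35_000, 75_000    # USD/month, retail estimate from the slide
ACTUAL_BILL = 3 * 200                       # three plans at $200 each
AI_EMPLOYEES = 97

ratio_low = RETAIL_LOW / ACTUAL_BILL                # about 58.3
ratio_high = RETAIL_HIGH / ACTUAL_BILL              # 125.0
implied_price_low = RETAIL_LOW / TOKENS_MILLIONS    # about 0.86 USD per million tokens
implied_price_high = RETAIL_HIGH / TOKENS_MILLIONS  # about 1.84 USD per million tokens
cost_per_employee = ACTUAL_BILL / AI_EMPLOYEES      # about 6.19 USD per employee per month

print(f"Cheaper by {ratio_low:.0f}x to {ratio_high:.0f}x")
print(f"Implied retail price: ${implied_price_low:.2f} to ${implied_price_high:.2f} per million tokens")
print(f"Cost per AI employee: ${cost_per_employee:.2f} per month")
```

The low end works out to roughly 58×, which the slide rounds to 60×.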
6/19
They Work While You Sleep / Literally
5+
Concurrent AI Employees
per human operator / at peak
What this looks like in practice
Concurrent agent runtime exceeded real wall-clock time in April, measured across the workforce.
Humans do not multitask. AI employees do. A human running 4–5 employees in parallel ships several times the output during work hours, plus continued ops while they sleep.
This is a bigger productivity lever than raw tokens-per-second speed. Speed is a benchmark number; concurrency is a workflow lever.
Caveat: this is not "robots replace humans." Each parallel employee still needs a human to define the scope, approve high-impact actions, and review the audit log.
May 2026 / production snapshot across the fleet
12.6×
Wall-clock compression
9,395 h of agent runtime in May vs 744 calendar hours
7.6×
AI actions per human prompt
17,048 user prompts · 128,873 agent responses
40.76B
Tokens processed
Context + planning + execution traces / May 2026
97%
Prompt-cache hit rate
Why per-employee cost stays flat as the fleet grows
For municipalities: the right metric is not "how many humans saved" but
"output per FTE." When you redesign jobs around the assumption that each staff member
runs 3–5 AI employees in parallel, the math changes, and the role descriptions change with it.
7/19
Where They Fail / Rank-Ordered by Frequency
1 / Hallucinated tool output
The agent invents a CRM contact ID, a Stripe transaction, a Kubernetes pod name. It looks correct but never existed.
Fix: tool calls must round-trip through real APIs. Never simulate. Verifier step on every external action.
2 / Context rot
Too much memory in the prompt degrades reasoning. The longer the conversation, the worse the output.
Fix: structured memory tiers (working, episodic, semantic). Not one giant RAG bucket.
3 / Goal drift on open-ended tasks
“Improve our SEO” is a disaster: the agent drifts into unrelated work and never completes.
Fix: scoped, measurable tasks only. No open-ended objectives without a verifier of "done."
4 / Silent failure
The API returns HTTP 500, but the agent reports the action as a success because the response body was JSON.
Fix: explicit status-code verifier on every external call. Always.
5 / Compounding cost on retries
Exponential backoff that is actually an exponential bill. One agent loop ate $400 in 6 hours before we caught it.
Fix: hard token budgets per task. Kill switch on cost, not just on error count.
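Fixes 4 and 5 combine naturally into one guard around every external call. A minimal sketch, not our production code: `TaskBudget`, `verify_status`, and `run_with_guards` are hypothetical names, and the `call` argument is assumed to return a `(status_code, body, tokens_used)` tuple.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when a task spends more tokens than it was allotted."""


class ExternalCallFailed(RuntimeError):
    """Raised when an external call returns a non-2xx status."""


@dataclass
class TaskBudget:
    max_tokens: int
    used_tokens: int = 0

    def charge(self, tokens: int) -> None:
        # Kill switch on cost: stop as soon as the hard budget is crossed.
        self.used_tokens += tokens
        if self.used_tokens > self.max_tokens:
            raise BudgetExceeded(f"{self.used_tokens} > {self.max_tokens} tokens")


def verify_status(status_code: int, body: str) -> str:
    # Only a 2xx status counts as success, even if the body parses as JSON.
    if not 200 <= status_code < 300:
        raise ExternalCallFailed(f"HTTP {status_code}: {body[:200]}")
    return body


def run_with_guards(call, budget: TaskBudget, max_attempts: int = 3) -> str:
    """Retry `call` a bounded number of times, charging the budget on each attempt."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for _ in range(max_attempts):
        status_code, body, tokens_used = call()
        budget.charge(tokens_used)  # may raise BudgetExceeded and end the loop
        try:
            return verify_status(status_code, body)
        except ExternalCallFailed as exc:
            last_error = exc
    raise last_error
```

The design choice worth noting is that the budget check sits outside the retry handling: a failing call can be retried, but an exhausted budget cannot.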
I lead with failures on purpose. Anyone in this room who has evaluated AI in
the last 12 months has hit at least three of these. Vendors who pretend none of this happens:
treat them as the unreliable narrators they are.
8/19
The Audience Is Already Voting With Attention
Real numbers from my own LinkedIn · mainly posting about Local AI, sovereign cloud, open source · rolling 90 days, pulled today
968K
Impressions
last 90 days on posts about Local AI, sovereign cloud, open source
What's driving it
+2,077%
Growth
vs prior 90 days
2,800
Likes / top post
single piece on local AI hardware
This is the temperature of the conversation in 2026. The interest is real, the demand is real,
the political resistance is also real. Plan for all three, not just the technical layer.
9/19
Three Things That Turn an Agent Into an Employee
1 / Persistent memory
Semantic plus episodic plus structured, and queryable. Not a chat history. The employee
remembers who you are, what you asked last week, and what the outcome was.
Without this, you have a chatbot pretending to know you.
2 / Self-programming skills
When the employee needs a new integration, connector, or API client, it writes the
connector itself. No engineer deploy. No restart. No new release.
This is the difference between a copilot and a colleague.
3 / True omnichannel identity
Same employee, same memory, across email, Slack, Teams, phone, chat. From the customer's
side: no seams. From the org's side: one personnel record, not five tool integrations.
If “Sara on Slack” doesn't remember “Sara on email,” she is not an employee.
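One way to picture points 1 and 3 together is a memory store keyed by person rather than by channel, split into working, episodic, and semantic tiers. A minimal sketch under that assumption: `EmployeeMemory` and its methods are hypothetical names for illustration, not the GetATeam API.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Episode:
    channel: str
    person: str
    summary: str
    outcome: str


@dataclass
class EmployeeMemory:
    # Working tier: short rolling window of recent messages.
    working: deque = field(default_factory=lambda: deque(maxlen=20))
    # Episodic tier: what was asked and what the outcome was.
    episodic: list = field(default_factory=list)
    # Semantic tier: durable facts per person.
    semantic: dict = field(default_factory=dict)

    def observe(self, channel: str, person: str, text: str) -> None:
        self.working.append((channel, person, text))

    def close_episode(self, channel: str, person: str, summary: str, outcome: str) -> None:
        self.episodic.append(Episode(channel, person, summary, outcome))

    def learn_fact(self, person: str, key: str, value: str) -> None:
        self.semantic.setdefault(person, {})[key] = value

    def recall(self, person: str) -> dict:
        # Lookup is by person, so history from every channel comes back together.
        return {
            "facts": dict(self.semantic.get(person, {})),
            "history": [e for e in self.episodic if e.person == person],
        }


sara = EmployeeMemory()
sara.close_episode("email", "alex", "asked for an invoice copy", "sent")
sara.learn_fact("alex", "language", "sv")
# Later, on Slack, the same person gets the email history and facts back.
print(sara.recall("alex"))
```

Only the working tier is bounded here; what goes into the prompt on each turn would be a selection from the other two, not their full contents.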
10/19
Workflows That Actually Shipped / With Our Own Employees
Tara · Customer support lead
What they actually do: Triages every inbound ticket across email, chat, WhatsApp and Slack. Drafts the reply with the actual fix attached, auto-routes the queue, resolves ~72% of tickets herself with a 60-second first reply.
Where humans stay in control: Refunds, bugs, churn signals, sensitive cases auto-route to a named human. Escalations land with a full diagnostic note: what she tried, what she'd try next.
Dana · Legal assistant
What they actually do: Reads inbound NDAs and MSAs in minutes, flags clause-level risks against an approved playbook, drafts redlines with precedent cited from past deals. Tracks every vendor renewal and notice window so nothing lapses.
Where humans stay in control: Never signs. Drafts the redline and the rationale; a named lawyer reviews and signs before anything goes back to the counterparty.
Thomas · Technical support engineer
What they actually do: Diagnoses production incidents, drafts the technical fix and the customer-facing explanation, holds an institutional memory so recurring patterns resolve in minutes.
Where humans stay in control: No change reaches a production system without a human sign-off. Every fix is preceded by a backup and a verification matrix.
Lyla · Editorial / public communications
What they actually do: Proposes five article angles each morning; writes, illustrates, and publishes the picked one; scans community sentiment and drafts replies it never posts itself.
Where humans stay in control: A human picks the angle, approves the visual, and gives the explicit green light to publish. Nothing goes public without sign-off.
The pattern is the design, not a limitation: every single workflow has a clear,
pre-defined handoff to a human. Where the human stays in the loop is decided at design time, not
invented during the incident.
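"Decided at design time" can be as simple as a static policy table that every action is checked against before it runs. A minimal sketch: the action names and the `required_handoff` helper are hypothetical, loosely modelled on the workflows above.

```python
from enum import Enum


class Handoff(Enum):
    AUTONOMOUS = "autonomous"
    HUMAN_APPROVAL = "human_approval"


# Hypothetical policy table, written when the workflow is designed.
HANDOFF_POLICY = {
    "draft_reply": Handoff.AUTONOMOUS,
    "route_ticket": Handoff.AUTONOMOUS,
    "issue_refund": Handoff.HUMAN_APPROVAL,
    "send_redline": Handoff.HUMAN_APPROVAL,
    "change_production": Handoff.HUMAN_APPROVAL,
    "publish_article": Handoff.HUMAN_APPROVAL,
}


def required_handoff(action: str) -> Handoff:
    # Fail closed: anything not explicitly listed needs a human.
    return HANDOFF_POLICY.get(action, Handoff.HUMAN_APPROVAL)


print(required_handoff("draft_reply"))
print(required_handoff("delete_customer_account"))
```

The fail-closed default matters more than the table itself: a new, unlisted action lands with a human rather than running unattended.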
11/19
Demo
A DevOps AI employee diagnoses and fixes a broken Keycloak instance in production.
Live screen-share. Real incident, real logs, real fix. The agent reads the error, reproduces the failure, identifies the root cause, ships the patch, writes the post-mortem. Human approval before any change touches prod.
The expensive lessons / so you don't pay for them yourself
Fully autonomous anything customer-facing without a verifier
One silent hallucination becomes one angry customer. Always have a verifier step before any outbound action.
Use cases that should have been an n8n workflow
If an if-this-then-that rule solves the problem, do not use an LLM. You pay for tokens to do work a webhook already does for free.
Generic "research assistant" with no scoped output
Produces beautifully formatted nothing. Without a defined "done," the employee polishes forever.
Open-ended brainstorming as a deliverable
If you cannot tell whether the output is good in 30 seconds, the deliverable wasn't a deliverable; it was a meeting.
Replacing a human in a role where the value WAS the relationship
Senior account management, sensitive negotiations, escalations. The employee can support the human, not be the human.
The pattern across these failures: we tried to use AI for jobs where the
success criterion was implicit. The fix in all five cases: make success explicit, measurable,
and verifiable, or do not deploy the agent.
13/19
What Changes for the Public Sector
Three structural constraints / and their implications for vendor selection
1 / Data must stay in jurisdiction
Citizen data is not commercial data. EU-resident infrastructure is the bar, not an upgrade tier.
2 / Decisions must be auditable
Every output needs traceable reasoning. Not just "the agent did it," but a per-action audit log with the inputs, the tool calls, and the reasoning trace.
3 / Accountability is non-negotiable
A municipality cannot say "the AI did it." A named human is on the line for every consequential output.
Implications for vendor selection / a 4-item checklist
Self-hosted or sovereign-cloud option exists
EU-only regions, on-prem, or air-gapped. Not just a marketing claim, but a deployable artifact.
BYOA / Bring Your Own LLM API key
You hold the contract with the LLM provider directly. The AI-employee platform never sees your data.
Per-action audit log / not per-session
You need to reconstruct any single decision, not just “a conversation happened.”
Human-in-the-loop is the default
Not the upgrade tier. Not a configuration toggle you discover later. Default-on, hard to turn off.
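To make "per-action, not per-session" concrete when questioning a vendor, it helps to picture what one log entry has to hold. A minimal sketch of such a record serialised as one JSON line per action: the field names and the `AuditRecord` type are assumptions for illustration, not a standard or our actual schema.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class AuditRecord:
    employee: str               # which AI employee acted
    action: str                 # the single action being logged
    inputs: dict                # what the action was given
    tool_calls: list            # every external call made for this action
    reasoning: str              # the reasoning trace for this decision
    approved_by: Optional[str]  # named human, or None if no approval was required
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def to_json_line(record: AuditRecord) -> str:
    # One line per action keeps the log append-only and easy to search.
    return json.dumps(asdict(record), sort_keys=True, ensure_ascii=False)


line = to_json_line(AuditRecord(
    employee="Tara",
    action="draft_reply",
    inputs={"ticket_id": "T-1"},
    tool_calls=[{"tool": "helpdesk.get_ticket", "status": 200}],
    reasoning="Known issue; attached the documented fix.",
    approved_by=None,
))
print(line)
```

With a record like this, reconstructing a single decision is a lookup, not a replay of a whole conversation.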
14/19
Augmentation, Not Replacement / The Data Backs This Up
23 + 97 = 120
jobs of output / from 23 humans
our internal numbers / April 2026
What our numbers actually say / and don't say
What they say: 23 humans plus 97 AI employees ship the work that would otherwise require approximately 120 people. Net headcount: unchanged. Output: roughly 5x.
What they don't say: "we replaced 97 jobs." We did not. We did not lay anyone off because of AI employees. The roles changed; nobody disappeared.
For a municipality, the math is the same: same staff, more output, citizens served faster. The headline is not "savings." The headline is "service quality at the same cost."
15/19
Four Starter Use Cases for a Swedish Municipality
Pragmatic / low-risk / high-visibility / and politically defensible / each demonstrates a different capability
1 / Bygglov pre-screening
Inbound building-permit applications get checked for completeness: missing drawings, missing fee receipt, wrong form version. The AI emails the citizen with a fix-list before the file enters the human queue. Handläggare only see complete files.
Win: queue time roughly halves. AI prepares, never decides.
2 / 24/7 multilingual voice concierge
Picks up the phone after 17:00 and on weekends. Swedish (SV) by default, with auto-switch to EN, AR, FA, SO, UK. Triages urgency, books call-backs in the morning shift, and never decides on benefits or eligibility.
Win: ~60% of after-hours calls currently land in voicemail. Now they land in a logged, structured triage.
3 / Grant-application drafter
Drafts applications for EU funds, Boverket, Tillväxtverket, Nordic Council, Vinnova. Pulls the kommun's existing data, matches against the call's evaluation criteria, produces a draft a human edits and signs.
Win: each successful application brings in millions of SEK. ROI is countable, not abstract.
4 / Complaint clustering & trend detection
Reads everything inbound: complaints, social mentions, web-form feedback, kontaktcenter notes. Clusters by topic and neighborhood. Weekly brief to department heads, e.g. “playground X: 14 complaints in 10 days.”
Win: problems surface in days, not in the next year's citizen survey.
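The bygglov pre-screening in use case 1 is deliberately mechanical: a completeness check that produces a fix-list and a routing decision, nothing more. A minimal sketch, assuming a hypothetical application format; the required attachments, form version string, and function names are illustrative and not taken from any real municipal system.

```python
# Hypothetical completeness rules; a real deployment would take these from the kommun.
REQUIRED_ATTACHMENTS = {"drawings", "fee_receipt"}
CURRENT_FORM_VERSION = "2026-01"


def build_fix_list(application: dict) -> list[str]:
    """Return the list of things the citizen must fix; empty means complete."""
    fixes = []
    missing = REQUIRED_ATTACHMENTS - set(application.get("attachments", []))
    for name in sorted(missing):
        fixes.append(f"Missing attachment: {name}")
    if application.get("form_version") != CURRENT_FORM_VERSION:
        fixes.append(f"Wrong form version: expected {CURRENT_FORM_VERSION}")
    return fixes


def route(application: dict) -> str:
    # The check only prepares the file; it never approves or rejects a permit.
    return "email_citizen_fix_list" if build_fix_list(application) else "human_queue"


incomplete = {"attachments": ["drawings"], "form_version": "2024-06"}
complete = {"attachments": ["drawings", "fee_receipt"], "form_version": "2026-01"}
print(build_fix_list(incomplete), route(incomplete))
print(build_fix_list(complete), route(complete))
```

Both outcomes leave the decision itself with the handläggare: the only question answered here is whether the file is ready to be looked at.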
Shared delivery surface: document outputs (bygglov fix-lists, grant drafts, weekly briefs) publish to
live URLs via the built-in web-publish skill: one command, custom slug, optional Basic Auth, full audit
trail. The voice concierge uses the same audit trail: every call is transcribed and logged.
Not on this list, on purpose: anything that makes a binding decision on behalf
of a citizen. Benefits, eligibility, fines, licensing decisions: those stay 100% human. The
AI employee can prepare the file, triage the call, draft the application, surface the trend.
It does not sign anything.
16/19
How to Actually Get It / Two Editions
Community Edition
Free / self-hosted / open
Runs on any Linux VM
Minimum spec: 4 vCPU, 8 GB RAM
BYOA / bring your own LLM API key (OpenAI, Anthropic, Google, or a local model such as Llama, Mistral, Qwen)
Full source visibility, no telemetry, your data never leaves your VM
Air-gap deployable with local models
Use case: a municipality with an IT team that can run a VM installs it on its own infrastructure
next week, with no contract with us required.
Enterprise Edition
SLA / managed / accountable
Same product / plus everything below
SLA + dedicated support + named CSM
Managed deployment on Elestio infrastructure (or your own)
SSO / advanced audit logging / compliance documentation (GDPR DPA / SOC 2 Type 2 / ISO 27001)
Priority skill development for sector-specific needs
Use case: a municipality that wants the technology plus the contract plus a phone
number to call when something needs explaining to oversight.
One line if you take nothing else from this slide: if you have an IT team that can run a VM, start with
Community Edition next week. If you need contracts, SLAs, and named accountability, talk to us. Either
way, the door is open. No gatekeeping.
17/19
What Just Shipped / What's Next
Just shipped / last 6 weeks
Delegate / Multi-agent / shipped Apr 2026
One employee hires sub-employees for big tasks. Deployed on 100% of our 97-employee workforce. Unit of work moved from "session" to "team of agents."
Web-publish / one command, mandatory skill
AI employees ship static folders to live URLs in a single command. Optional Basic Auth, SPA mode, path-scoped, .env blocked. Auto-installed on every employee.
Heartbeat Engine v3 / autonomous check-ins
Agents wake themselves up every 1–24h, evaluate the workflows they own, and only surface alerts that cross a threshold. They tap you on the shoulder when it matters. Silent the rest of the time.
Voice / first-class channel / live
Real-time phone, not IVR. Gemini Live (sub-second latency) or ElevenLabs Conversation Relay. 4,026 voices / 32 languages. Same identity, same memory across the channel switch.
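The heartbeat behaviour described above (wake up, evaluate owned workflows, stay silent unless a threshold is crossed) reduces to a small loop. A minimal sketch of the idea only: `OwnedWorkflow` and `heartbeat` are hypothetical names, the metrics are made up, and the scheduling that triggers the check every 1–24h is left out.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class OwnedWorkflow:
    name: str
    measure: Callable[[], float]  # returns the current value of the watched metric
    threshold: float              # alert only when the value exceeds this


def heartbeat(workflows: list[OwnedWorkflow]) -> list[str]:
    """Evaluate every owned workflow; return alerts only for threshold breaches."""
    alerts = []
    for wf in workflows:
        value = wf.measure()
        if value > wf.threshold:
            alerts.append(f"{wf.name}: {value:g} exceeds threshold {wf.threshold:g}")
    return alerts


# Hypothetical workflows with fixed readings, for illustration.
workflows = [
    OwnedWorkflow("open_tickets", lambda: 12, threshold=50),
    OwnedWorkflow("failed_backups", lambda: 2, threshold=0),
]
print(heartbeat(workflows))
```

An empty list is the normal result, which is the point: the employee checks often and speaks rarely.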
Coming next: AI-Employee-Bench — an open benchmark so this market stops grading itself with vendor-supplied numbers.
The honest forecast
The role most affected by AI employees over the next 24 months is "junior knowledge worker."
Email triage / first-draft documents / data lookup / scheduling / first-line support.
The roles not affected in the same time horizon:
Senior judgment
Trade-offs, exceptions, "this rule should not apply here": all of this requires lived institutional knowledge.
Relationships
Long-running trust with citizens, vendors, partners, ombudsmen. Not transferable to an employee with a different identity.
Named accountability
A signed decision still has to come from a person whose name appears in the public record. Legally and politically.
18/19
What do we owe a citizen when an AI was part of the decision chain?