# How We Built Arvexi's Agent Fleet
Every morning at 9:30 AM Eastern, an AI agent named Sarah Mitchell opens Salesforce, pulls the highest-priority accounts from a ranked pool of 412 companies, researches each one using SEC EDGAR filings and Apollo enrichment data, writes personalized cold emails, and posts them to Slack with reasoning and sources attached. A human clicks Approve or Reject. If rejected with feedback, the agent rewrites and resubmits. If no one responds within four hours, it sends anyway.
Sarah is one of 15 agents that run Arvexi's go-to-market operation. Together they process inbound leads, manage email sequences, generate content, monitor production systems, and produce weekly intelligence reports. They share memory, coordinate through Salesforce, and learn from every human decision. The entire system costs roughly $50 per day in API calls and runs on a single Docker container on Railway.
## The fleet at a glance
Three layers: a scheduler that triggers agents on cron schedules, the agents themselves (each with a dedicated prompt, model, budget, and tool allowlist), and shared infrastructure for memory, Slack, guardrails, and state.
Model selection is deliberate. The SDR uses Opus because cold outreach requires nuanced judgment: when to thread a follow-up versus start fresh, how to position against a competitor without naming them. GTM Research and Inbound Lead use Sonnet for structured data work that needs reasoning but not creativity. Everything else runs on Haiku because the tasks are well-defined.
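This per-agent configuration can be sketched as a small map of model, budget, tool allowlist, and schedule. Everything below is illustrative: the agent names, the budgets for the non-SDR agents, and the tool and cron values are my assumptions, not Arvexi's actual config.

```typescript
// Illustrative per-agent config; values are assumptions, not Arvexi's real setup.
type AgentConfig = {
  model: "opus" | "sonnet" | "haiku";
  dailyBudgetUsd: number;
  tools: string[];   // explicit tool allowlist
  schedule: string;  // cron expression
};

const AGENTS: Record<string, AgentConfig> = {
  sdr:          { model: "opus",   dailyBudgetUsd: 6, tools: ["draft_for_approval", "apollo_enrich"], schedule: "30 9 * * 1-5" },
  gtm_research: { model: "sonnet", dailyBudgetUsd: 3, tools: ["edgar_search"],                        schedule: "0 7 * * *" },
  ops_monitor:  { model: "haiku",  dailyBudgetUsd: 1, tools: ["check_health"],                        schedule: "*/30 * * * *" },
};

function modelFor(agent: string): string {
  // Unknown agents fall back to the cheapest model.
  return AGENTS[agent]?.model ?? "haiku";
}
```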
## 93 tools, 13 namespaces, one MCP server
Every agent connects to a single MCP server called `arvexi-ops`, each through its own subprocess instance: shared codebase, isolated process. The server imports services directly from the main application: same Prisma client, same Salesforce service, same Resend client. No API layer between them.
The critical decision: every agent gets an explicit tool allowlist. The SDR has 24 tools. The content agent has 15. The ops agent has 7. No agent has access to everything. Permission mode is `dontAsk`, so tools not on the list are silently denied. This prevents a content agent from hallucinating an Apollo API call and burning credits on a phantom enrichment.
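A minimal sketch of allowlist enforcement under that permission mode: anything outside the set is denied without ever prompting a human. The function and tool names here are assumptions, not the real implementation.

```typescript
// Sketch: silent-deny tool gate. Tools outside the allowlist never execute.
type ToolResult = { ok: true; output: unknown } | { ok: false; denied: true };

function makeToolGate(
  allowlist: Set<string>,
  run: (tool: string, args: unknown) => unknown,
) {
  return (tool: string, args: unknown): ToolResult => {
    if (!allowlist.has(tool)) return { ok: false, denied: true }; // silent deny
    return { ok: true, output: run(tool, args) };
  };
}

// A content agent's gate cannot reach Apollo enrichment at all:
const contentGate = makeToolGate(
  new Set(["draft_post", "fetch_brand_guide"]),
  () => "done",
);
```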
## The SDR: 460 lines of prompt, zero hardcoded workflows
The SDR prompt doesn't contain a step-by-step workflow. It defines a priority framework and lets the agent decide:
| Priority | Trigger | Action |
|---|---|---|
| P0 | Unread replies | Classify immediately. Interested = propose meeting. |
| P1 | Before any sending | Check bounce rate. If > 5% in 24h, stop all sending. |
| P2 | Active contacts 2-5 days old | Next touch, different angle. |
| P3 | Daily budget remaining | Research, compose, draft for approval. |
| P4 | Contact exhausted (6 touches) | Move to next contact on same account. |
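The table above can be read as a pure decision function evaluated top to bottom. The thresholds (5% bounce rate, the 2-5 day window, six touches) come from the table; the state field names and this exact encoding are my assumptions.

```typescript
// Sketch of the P0-P4 priority framework as a pure decision function.
type SdrState = {
  unreadReplies: number;
  bounceRate24h: number;     // fraction, e.g. 0.03 = 3%
  activeContactsDue: number; // contacts 2-5 days since last touch
  budgetRemainingUsd: number;
  contactTouches: number;    // touches on the current contact
};

function nextAction(s: SdrState): string {
  if (s.unreadReplies > 0) return "P0:classify_replies";
  if (s.bounceRate24h > 0.05) return "P1:halt_all_sending"; // checked before any send
  if (s.activeContactsDue > 0) return "P2:next_touch_new_angle";
  if (s.budgetRemainingUsd > 0 && s.contactTouches < 6) return "P3:research_and_draft";
  if (s.contactTouches >= 6) return "P4:advance_to_next_contact";
  return "idle";
}
```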
The agent never sends emails directly. Every email goes through `draft_for_approval`, a custom MCP tool that creates a database record, posts a Block Kit message to Slack with the full email preview, reasoning, and quality score, then attaches Approve and Reject buttons. Rejections trigger an immediate re-draft with the reviewer's feedback incorporated.
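A hedged sketch of what that approval payload might look like, using standard Block Kit section, context, and actions blocks. The field names, layout, and quality-score scale are assumptions, not the actual tool's output.

```typescript
// Sketch: build the Slack Block Kit payload for a draft-approval message.
type Draft = {
  id: string;
  to: string;
  subject: string;
  body: string;
  reasoning: string;
  qualityScore: number; // illustrative 0-10 scale
};

function approvalBlocks(d: Draft): object[] {
  return [
    // Full email preview
    { type: "section", text: { type: "mrkdwn", text: `*To:* ${d.to}\n*Subject:* ${d.subject}\n\n${d.body}` } },
    // Reasoning and quality score as context
    { type: "context", elements: [{ type: "mrkdwn", text: `Reasoning: ${d.reasoning} · Quality: ${d.qualityScore}/10` }] },
    // Approve / Reject buttons carry the draft's database id
    { type: "actions", elements: [
      { type: "button", text: { type: "plain_text", text: "Approve" }, style: "primary", action_id: "approve_draft", value: d.id },
      { type: "button", text: { type: "plain_text", text: "Reject" }, style: "danger", action_id: "reject_draft", value: d.id },
    ]},
  ];
}
```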
Account selection is deterministic: a weighted score based on operating lease liability size (35%), ICP score (25%), competitor presence (15%), fiscal year-end proximity (10%), verified contact depth (10%), and known auditor (5%).
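That weighted score translates directly into code. The weights are from the text; the assumption that each factor arrives already normalized to [0, 1] is mine.

```typescript
// Sketch of the deterministic account score. Inputs are assumed normalized
// to [0, 1] upstream; weights match the post and sum to 1.
type AccountSignals = {
  leaseLiability: number;     // operating lease liability size
  icpScore: number;
  competitorPresence: number;
  fyeProximity: number;       // fiscal year-end proximity
  contactDepth: number;       // verified contact depth
  knownAuditor: number;
};

function scoreAccount(a: AccountSignals): number {
  return (
    0.35 * a.leaseLiability +
    0.25 * a.icpScore +
    0.15 * a.competitorPresence +
    0.10 * a.fyeProximity +
    0.10 * a.contactDepth +
    0.05 * a.knownAuditor
  );
}
```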
## Memory: how agents share knowledge
Every agent reads from shared namespaces before each run. `buildMemoryBriefing()` fetches up to 20 private memories, 10 from each shared namespace, and 3 strategy memos, all prepended to the system prompt.
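A sketch of how such a briefing might be assembled. The caps (20 private, 10 per shared namespace, 3 memos) match the text; the data shapes and line formatting are assumptions.

```typescript
// Sketch: assemble a memory briefing to prepend to the system prompt.
type Memory = { namespace: string; key: string; value: string };

function buildBriefing(
  privateMems: Memory[],
  shared: Record<string, Memory[]>,
  strategyMemos: string[],
): string {
  const lines: string[] = [];
  for (const m of privateMems.slice(0, 20)) lines.push(`[private] ${m.key}: ${m.value}`);
  for (const [ns, mems] of Object.entries(shared))
    for (const m of mems.slice(0, 10)) lines.push(`[${ns}] ${m.key}: ${m.value}`);
  for (const memo of strategyMemos.slice(0, 3)) lines.push(`[memo] ${memo}`);
  return lines.join("\n"); // prepended to the system prompt
}
```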
After each run, `extractAndSaveMemories()` parses structured output and upserts by (agent_name, namespace, key). TTLs prevent context rot: research caches expire at 7 days, observations at 14, strategy memos at 30. Every Sunday at 4 AM, a reflection job compacts the week's feedback into a condensed style guide under 300 words.
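The TTL assignment can be sketched as a namespace lookup at upsert time. The day counts are from the text; the namespace keys and the no-expiry fallback are my assumptions.

```typescript
// Sketch: expiry computation by namespace during memory upsert.
const TTL_DAYS: Record<string, number> = {
  research: 7,      // research caches
  observations: 14, // observations
  strategy: 30,     // strategy memos
};

function expiresAt(namespace: string, now = new Date()): Date | null {
  const days = TTL_DAYS[namespace];
  if (days === undefined) return null; // assumed: unlisted namespaces never expire
  return new Date(now.getTime() + days * 24 * 60 * 60 * 1000);
}
```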
The learning loop is the most valuable component. When a draft is approved or rejected, a Haiku call extracts a structured style preference. Both SDR and inbound-lead agents read this shared feedback namespace. Rejection rate dropped from 30% (week 1) to under 10% (week 3).
## The schedule
Agents run on `node-cron` but execute sequentially. A queue ensures only one agent runs at a time. Each gets its own MCP server subprocess. Running in parallel would mean multiple Prisma connections and race conditions on Salesforce data.
A mutex prevents duplicate runs: if an agent is already "running" in the database, the next trigger skips it. Stale runs older than 30 minutes are auto-cleared.
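An in-memory sketch of that mutex with stale-run clearing. The real system keys off a run status persisted in the database; the names here are illustrative.

```typescript
// Sketch: run mutex with auto-clearing of runs stale for > 30 minutes.
const STALE_MS = 30 * 60 * 1000;
const running = new Map<string, number>(); // agent name -> run start timestamp (ms)

function tryAcquire(agent: string, now: number): boolean {
  const started = running.get(agent);
  if (started !== undefined && now - started < STALE_MS) {
    return false; // agent already running: skip this trigger
  }
  running.set(agent, now); // fresh start, or stale run auto-cleared
  return true;
}

function release(agent: string): void {
  running.delete(agent);
}
```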
## The economics
Per-agent budgets act as circuit breakers: if the SDR hits $6 in API costs, the run ends regardless of remaining work. Dry-run mode (`AGENT_DRY_RUN=true`) tests against real Salesforce data without any outbound cost; guardrail hooks silently deny all send operations.
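Both guardrails fit in a few lines. This is a sketch under my own naming; the real hook presumably reads `AGENT_DRY_RUN` from the environment, which appears here only in a comment.

```typescript
// Sketch: budget circuit breaker. Once cumulative spend reaches the limit,
// charge() returns false and the run ends regardless of remaining work.
function makeBudgetGuard(limitUsd: number) {
  let spentUsd = 0;
  return {
    charge(usd: number): boolean {
      spentUsd += usd;
      return spentUsd < limitUsd;
    },
  };
}

// Sketch: dry-run guardrail denying outbound send tools. In the real system
// the flag would come from AGENT_DRY_RUN=true; here it is passed explicitly,
// and the "send_" prefix convention is an assumption.
function allowSend(tool: string, dryRun: boolean): boolean {
  return !(dryRun && tool.startsWith("send_"));
}
```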
Before every push to Railway, a 61-check verification suite runs: MCP server startup (93 tools register), per-agent config validation (model, budget, tools, permissions), module imports, and utility exports. If any check fails, the push is blocked.
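The shape of such a check runner, reduced to a sketch: run every check, block the push if any fails. The check names below are illustrative; only the all-or-nothing gating reflects the text.

```typescript
// Sketch: pre-push verification gate. Any failing check blocks the push.
type Check = { name: string; run: () => boolean };

function verify(checks: Check[]): { passed: number; failed: string[] } {
  const failed = checks.filter((c) => !c.run()).map((c) => c.name);
  return { passed: checks.length - failed.length, failed };
}

function shouldBlockPush(result: { failed: string[] }): boolean {
  return result.failed.length > 0;
}
```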
Next: what happens when lease terms change mid-stream. The modification engine, and why it's the #1 audit finding in ASC 842.