ArvexiBuilders Blog

How We Built an Autonomous SDR Agent That Actually Works in Production

We shipped an autonomous SDR agent that drafts cold outbound emails, routes them through human review via Slack, learns from every approval and rejection, and logs everything to Salesforce. It runs twice daily on Railway, manages a pipeline of 412 ranked accounts, and costs about $12/day.

This is not a demo. It is production infrastructure that sends real emails to real prospects. And building it required solving problems that most AI agent frameworks quietly ignore: What happens when the agent hallucinates a company name? What prevents it from sending the same email twice? How do you teach it to write better without retraining the model?

This post walks through the full architecture. We are sharing it because we think the engineering community deserves more transparency about what it actually takes to put AI agents into production, and because the patterns we landed on might save someone else six months of iteration.


The Problem With Most AI Agent Architectures

Most AI outbound tools follow the same pattern: connect an LLM to an email API, add a prompt template, ship it.

This works for demos. It falls apart in production for three reasons:

  1. No human review loop. The agent sends directly. One hallucination and your domain reputation is destroyed.
  2. No feedback mechanism. The agent writes the same way forever. There is no path from "rejected draft" to "better draft next time."
  3. No audit trail. When a prospect replies "stop emailing me," you cannot trace which agent run, which tool call, which prompt generated that email.

We needed an architecture that treated AI-generated emails like pull requests: drafted by the agent, reviewed by a human, merged (sent) only after approval.


System Overview

The SDR agent is one node in a fleet of 14 agents running on Railway. Here is the full execution flow:

Cron Scheduler → Agent Runner → SDR Agent → Draft for Approval → (Approve | Reject + Note | Auto-Send after 4h) → Send via Resend → Log to Salesforce → Learning Loop

Every component in this diagram is a separate module with its own responsibility. Let us walk through each one.


1. The Runner: Agent Configuration as Code

Every agent in the fleet is defined by four parameters: model, budget, tool allowlist, and schedule.

// Agent budgets (USD per run)
const AGENT_BUDGET = {
  sdr:            6.00,   // Opus - complex reasoning across 24 tools
  "gtm-research": 5.00,   // Sonnet - SEC/Apollo research
  "monday-digest": 5.00,  // Sonnet - weekly synthesis
  "inbound-lead": 3.00,   // Sonnet - context-sensitive replies
  router:         0.50,   // Haiku - intent classification only
  ops:            1.00,   // Haiku - quick operational tasks
};
 
// Model selection per agent
const AGENT_MODEL = {
  sdr:            "claude-opus-4-6",    // Heavyweight reasoning
  "gtm-research": "claude-sonnet-4-6",  // Research + enrichment
  "inbound-lead": "claude-sonnet-4-6",  // Personalized responses
  router:         "claude-haiku-4-5",   // Fast classification
};

The SDR gets Opus because it needs to reason across 24 tools, research SEC filings, compose personalized emails, and decide which priority level to work on. The router gets Haiku because it only needs to classify intent ("Is this a sales question or an ops question?") before handing off to a worker.
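As a sketch of how the runner might enforce these caps (names and logic are illustrative, not the production code), each tool-loop iteration can check accumulated spend against the agent's budget before continuing:

```typescript
// Hypothetical budget gate: halt the agent loop once per-run spend crosses the cap.
const AGENT_BUDGET: Record<string, number> = {
  sdr: 6.0,
  router: 0.5,
};

function withinBudget(agent: string, spentSoFar: number): boolean {
  const budget = AGENT_BUDGET[agent] ?? 0; // unknown agents get no budget at all
  return spentSoFar < budget;
}
```

The fail-closed default (`?? 0`) is the important design choice: an agent that is not explicitly configured cannot spend anything.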

Why explicit tool allowlists matter. Every agent has a hardcoded list of tools it can access. The SDR gets draft_for_approval but NOT outbound_send_email. A content agent gets linkedin_create_post but NOT sf_update_account. This is security by default. An agent cannot hallucinate its way into sending an email directly or modifying a Salesforce record it should not touch.

The SDR's 24 allowed tools:

Category     Tools
Research     sec_company_facts, apollo_enrich_person, score_prospect, web_search
Salesforce   sf_search_accounts, sf_list_account_contacts, sf_update_account, sf_update_contact, sf_create_opportunity, sf_log_activity, sf_soql_query
Drafting     draft_for_approval
Pipeline     outbound_check_bounces, outreach_create_campaign, outreach_enroll_contact, outreach_get_enrollment, outreach_track_response
Email        gmail_search, gmail_read, gmail_send
Calendar     calendar_propose_times, calendar_create_event
Memory       memory_note
Blocked      outbound_send_email, outbound_send_sequence_step (agent cannot bypass draft approval)

Notice what is missing: outbound_send_email and outbound_send_sequence_step. The agent physically cannot send an email without going through the draft approval flow.
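A minimal sketch of what that enforcement might look like (tool lists abbreviated; the function name and structure are assumptions, not our production code):

```typescript
// Hypothetical allowlist gate. The model never sees blocked tools in its tool
// definitions, but a hard gate also catches hallucinated or injected tool calls.
const AGENT_TOOLS: Record<string, string[]> = {
  sdr: ["draft_for_approval", "sf_search_accounts", "gmail_read"], // ...21 more in production
};

function gateToolCall(agent: string, tool: string): void {
  const allowed = AGENT_TOOLS[agent] ?? [];
  if (!allowed.includes(tool)) {
    throw new Error(`Agent "${agent}" may not call tool "${tool}"`);
  }
}
```

Defense in depth matters here: filtering the tool definitions shapes what the model attempts, while the gate guarantees what actually executes.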


2. The SDR Agent: Goal-Driven, Not Script-Driven

Most SDR automation follows a rigid script: Step 1, research. Step 2, compose. Step 3, send. This breaks immediately when the agent encounters a reply mid-sequence or when bounce rates spike.

Our SDR operates on a priority framework. Every run, it assesses the current pipeline state and decides what to work on:

P0: Hot replies. Check Gmail for unread replies. Classify each as INTERESTED, NOT_NOW, OBJECTION, OPTED_OUT, or BOUNCE. Respond immediately to interested replies. Update Salesforce. Create opportunities for meetings booked.

P1: Bounce check. Before any new sends, verify bounce rate is below 5%. If it exceeds 5%, stop all outbound immediately. Domain reputation is worth more than any single day's email volume.

P2: Follow-ups due. Query active sequences where the last touch was 2-5 business days ago. Draft follow-ups with different angles (never repeat the same pitch).

P3: New outreach. Pick top-priority accounts from Salesforce (ranked by a weighted algorithm: operating lease liability 35%, ICP score 25%, competitor presence 15%, fiscal year-end timing 10%, contact count 10%, auditor relationship 5%). Research via SEC EDGAR and Apollo. Draft personalized first-touch emails.

P4: Multi-threading. When all contacts at an account have been sequenced, move to the next contact. When all contacts are exhausted, mark the account as "Exhausted" and move on.

The agent executes every applicable priority level per run. On a typical morning run, it might handle 2 replies (P0), confirm bounce rate is clean (P1), send 5 follow-ups (P2), and draft 3 new outreach emails (P3). The next run at 1 PM might skip P0 (no new replies), skip P1 (already checked), send 3 more follow-ups (P2), and draft 5 new emails (P3).
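The priority assessment above can be sketched as a small planning function (field names and thresholds beyond the 5% bounce cutoff are illustrative assumptions):

```typescript
// Illustrative sketch of the per-run priority plan described above.
interface PipelineState {
  unreadReplies: number;
  bounceRate: number;        // fraction of recent sends that bounced
  bounceCheckedToday: boolean;
  followUpsDue: number;
}

function planRun(state: PipelineState): string[] {
  const plan: string[] = [];
  if (state.unreadReplies > 0) plan.push("P0:replies");
  if (!state.bounceCheckedToday) plan.push("P1:bounce-check");
  if (state.bounceRate >= 0.05) return plan; // halt all new outbound above 5%
  if (state.followUpsDue > 0) plan.push("P2:follow-ups");
  plan.push("P3:new-outreach");              // P4 multi-threading happens inside P3
  return plan;
}
```

Note the early return: a bounce spike cancels everything downstream, which encodes the "domain reputation is worth more than any single day's volume" rule.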


3. Draft Approval: Treating Emails Like Pull Requests

This is the architectural decision that makes the entire system safe for production.

When the SDR wants to send an email, it calls draft_for_approval with the email content, 3-5 bullets of reasoning explaining the angle, and a list of sources (SEC filing, Apollo data, Salesforce fields). The system then:

  1. Evaluates the draft against seven rule-based assertions (including CAN-SPAM compliance, em dash detection, length, booking link presence, competitor name mentions, and subject line length) plus an LLM quality judge that scores tone, personalization, brevity, angle relevance, and call-to-action on a 1-5 scale.

  2. Posts to Slack with Block Kit UI showing the full email preview, reasoning, quality score, and two buttons: Approve & Send and Reject + Note.

  3. Waits for human action or auto-sends after 4 hours if no response (configurable via DRAFT_AUTO_SEND_HOURS).
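The Slack card in step 2 is a standard Block Kit payload. A sketch of how it might be assembled (the `action_id`s, field names, and copy here are illustrative, not our exact production message):

```typescript
// Hypothetical Block Kit builder for the approval card described above.
function buildApprovalBlocks(draft: {
  draftId: string;
  subject: string;
  preview: string;
  qualityScore: number;
}): object[] {
  return [
    { type: "section", text: { type: "mrkdwn", text: `*${draft.subject}*\n${draft.preview}` } },
    { type: "context", elements: [{ type: "mrkdwn", text: `Quality: ${draft.qualityScore}/5` }] },
    {
      type: "actions",
      elements: [
        { type: "button", style: "primary", action_id: "draft_approve",
          text: { type: "plain_text", text: "Approve & Send" }, value: draft.draftId },
        { type: "button", style: "danger", action_id: "draft_reject",
          text: { type: "plain_text", text: "Reject + Note" }, value: draft.draftId },
      ],
    },
  ];
}
```

Putting the draft ID in each button's `value` is what lets the interaction handler run the atomic claim against the right database row.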

(Screenshot: the approval card as it appears in #sales-signals, posted at 9:32 AM by the SDR Agent app.)

Why this matters for quality. Every email the SDR sends has been either explicitly approved by a human or deemed acceptable after 4 hours of visibility. This is not a rubber stamp. In our first week, the rejection rate was about 30%. By week three, after the learning loop had accumulated enough feedback, it dropped to under 10%.


4. Rejection and Regeneration

When a reviewer clicks "Reject + Note," the system:

  1. Marks the draft as REJECTED (atomic claim prevents race conditions)
  2. Extracts the reviewer's feedback note
  3. Calls Claude Sonnet directly (not through the Agent SDK) with the original email, the feedback, and access to web search
  4. Parses the rewritten email from the response
  5. Creates a new draft record and posts the revised version in the same Slack thread

We use Sonnet for regeneration instead of Opus because speed matters here. The reviewer is actively watching Slack. A 7-minute Agent SDK loop is unacceptable. A 5-8 second direct API call is fast enough to feel interactive.

The regeneration prompt includes web search capability. If the reviewer says "research this company's recent acquisition and reference it," the model can search the web, find the news, and weave it into the email. All in under 15 seconds.

const rewriteResponse = await anthropic.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 4096,
  tools: [{ type: "web_search_20250305", name: "web_search", max_uses: 3 }],
  messages: [{
    role: "user",
    content: `Rewrite this rejected email incorporating the reviewer's feedback.
              If the feedback asks to research, USE the web_search tool first.
              ORIGINAL: ${originalEmail}
              FEEDBACK: "${feedbackNote}"
              Return JSON: {"subject": "...", "html": "...", "reasoning": "..."}`,
  }],
});

There is no limit on rejections. We have stress-tested chains of 3+ consecutive rejections with web search at each step. Each regeneration preserves the outreach contact ID, enrollment state, and Salesforce linkage. The final approval sends the email and logs the activity to Salesforce, regardless of how many times the draft was revised.
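One way to preserve that linkage across an unbounded rejection chain is to copy everything except identity and status into each new draft record. This is a sketch; the field names are assumptions:

```typescript
// Hypothetical draft record showing what survives a regeneration.
interface DraftRecord {
  id: string;
  status: "PENDING" | "APPROVED" | "REJECTED";
  revision: number;
  outreachContactId: string;  // preserved across every regeneration
  enrollmentId: string;       // preserved
  sfAccountId: string;        // preserved, so final approval logs to the right account
}

function reviseDraft(rejected: DraftRecord, newId: string): DraftRecord {
  return { ...rejected, id: newId, status: "PENDING", revision: rejected.revision + 1 };
}
```

Because the CRM linkage rides along unchanged, the final approval behaves identically whether it is revision 1 or revision 5.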


5. The Learning Loop: Feedback That Compounds

Every approval and rejection feeds into a learning loop. Here is how it works:

Action → Extract → Store → Compact → Inject → Better Drafts

On every action (approve or reject):

  1. Strip HTML from the email body
  2. Call Claude Haiku with the draft and action type
  3. Extract a structured observation: { observation, category, example }
  4. Categories: tone_length, angle_selection, personalization, formatting, content
  5. Save to agent_memories table with 30-day expiry
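The record written in step 5 might look like this sketch (the shape is an illustration; only the categories and the 30-day expiry come from our actual schema):

```typescript
// Hypothetical shape of a feedback memory row, built from the Haiku extraction.
type FeedbackCategory = "tone_length" | "angle_selection" | "personalization" | "formatting" | "content";

interface FeedbackMemory {
  agent: string;
  category: FeedbackCategory;
  content: string; // "[APPROVE] ..." or "[REJECT] ..."
  example: string;
  expiresAt: Date;
}

function buildFeedbackMemory(
  action: "APPROVE" | "REJECT",
  extracted: { observation: string; category: FeedbackCategory; example: string },
  now: Date = new Date(),
): FeedbackMemory {
  const THIRTY_DAYS_MS = 30 * 24 * 60 * 60 * 1000;
  return {
    agent: "sdr",
    category: extracted.category,
    content: `[${action}] ${extracted.observation}`,
    example: extracted.example,
    expiresAt: new Date(now.getTime() + THIRTY_DAYS_MS),
  };
}
```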

Example memory from an approval:

[APPROVE] Short, punchy emails with one specific pain point
perform better than multi-paragraph pitches. Example:
"Hey Jane, 4,200 leases and quarterly roll forwards in
Excel is a pain. We automate that. Worth a look?"

Example memory from a rejection:

[REJECT] Do not lead with SEC filing data. It feels
surveillance-like. Lead with industry pain instead.
Example: Instead of "your $8.7B OLL across 4,200 leases,"
say "managing thousands of location leases across logistics."

Weekly compaction (Sunday 4 AM UTC):

  1. Query the last 50 feedback memories
  2. Call Haiku to synthesize into a concise style guide (under 300 words)
  3. Save the compacted guide as a feedback_summary memory
  4. Delete individual preferences older than 7 days
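Step 4 amounts to a filter that keeps compacted summaries forever while letting raw preferences age out. A sketch under assumed field names:

```typescript
// Hypothetical post-compaction pruning: summaries persist, raw notes expire at 7 days.
interface StoredMemory {
  kind: "preference" | "feedback_summary";
  createdAt: Date;
}

function pruneAfterCompaction(memories: StoredMemory[], now: Date): StoredMemory[] {
  const cutoff = now.getTime() - 7 * 24 * 60 * 60 * 1000;
  return memories.filter(
    (m) => m.kind === "feedback_summary" || m.createdAt.getTime() >= cutoff,
  );
}
```

The effect is a rolling window: individual signals contribute to exactly one compaction, then only the distilled style guide survives.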

How the SDR reads feedback: At the start of every run, the runner calls buildMemoryBriefing("sdr"), which queries recent memories (both agent-specific and shared feedback). The briefing is prepended to the system prompt:

## Briefing (from your memory)

STYLE GUIDE (compiled from 47 feedback signals):
- Keep emails under 5 sentences. Phone-screen length.
- Never lead with SEC data. Lead with industry pain.
- "Hey {FirstName}" not "Dear Mr./Ms."
- One CTA per email. "Worth a quick look?" not
  "Would you be open to scheduling a discovery call?"
- Reference competitors obliquely, never by name.
- Extraction wedge angle works best for accounts
  with existing lease accounting software.

[Recent research cache for accounts in pipeline...]

The agent learns in real-time from feedback without code changes, model fine-tuning, or prompt rewriting. The feedback compounds. Week 1 drafts were verbose and led with features. Week 3 drafts are short, pain-first, and match the reviewer's preferred tone.


6. Safety: Three Layers of Guardrails

Layer 1: Tool Approval for High-Stakes Actions

Certain tools require explicit human approval before execution:

const REQUIRES_APPROVAL = [
  "docusign_send_envelope",
  "docusign_void_envelope",
  "deal_update_stage",
  "sf_close_won",
  "stripe_create_checkout",
];

When an agent attempts to use one of these tools, the system blocks the call, posts an alert to Slack, and requests approval. This prevents an agent from accidentally closing a deal or sending a contract.

Layer 2: Dry Run Mode

Setting AGENT_DRY_RUN=true blocks all outbound actions:

const SENDS_OUTBOUND = [
  "gmail_send", "gmail_reply", "linkedin_create_post",
  "outreach_enroll_contact", "outbound_send_email",
  "x_post_tweet", "youtube_upload_video",
];

The agent sees: "DRY_RUN mode active. Would have called gmail_send but blocked." It continues execution, logging what it would have done. This is essential for testing new prompt versions against production data without side effects.
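The interception itself can be a thin wrapper around tool execution. This sketch takes `dryRun` as a parameter for clarity; in production it would be derived from `AGENT_DRY_RUN`:

```typescript
// Hypothetical dry-run wrapper around tool execution.
const SENDS_OUTBOUND = ["gmail_send", "outbound_send_email", "outreach_enroll_contact"];

function executeTool(tool: string, run: () => string, dryRun: boolean): string {
  if (dryRun && SENDS_OUTBOUND.includes(tool)) {
    // The agent receives this as the tool result and keeps going.
    return `DRY_RUN mode active. Would have called ${tool} but blocked.`;
  }
  return run();
}
```

Read-only tools still execute, which is what makes dry-run output a faithful preview rather than a stubbed-out simulation.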

Layer 3: Atomic Claims (Concurrency Safety)

Every state transition uses an atomic claim pattern:

const claimed = await prisma.draftApproval.updateMany({
  where: { id: draftId, status: "PENDING" },
  data: { status: "APPROVED", resolvedAt: new Date() },
});
if (claimed.count === 0) return; // Someone else got it first

This prevents double-sends when a button is clicked twice, when the auto-sender and a human approve simultaneously, or when two agent runs overlap. The database is the single source of truth.


7. Email Delivery: Domain Isolation and CAN-SPAM

We split email across three separate domains:

Domain           Purpose                                         Provider
arvexi.com       Google Workspace (team email)                   Google
mail.arvexi.com  Transactional (password resets, notifications)  Resend
getarvexi.com    Cold outbound (SDR agent)                       Resend

If cold outbound triggers spam complaints, it damages the getarvexi.com reputation only. The main brand domain and transactional email stay completely isolated.

Every outbound email passes through deAIify() before sending, which strips AI writing artifacts:

  • Em dashes to spaced hyphens
  • Smart quotes to straight quotes
  • Unicode ellipsis to three periods
  • Non-breaking spaces to regular spaces
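The transformations above reduce to a chain of replacements. A sketch of what `deAIify` might look like (the exact production rules may differ):

```typescript
// Sketch of the artifact-stripping pass: normalize AI-typical punctuation.
function deAIify(text: string): string {
  return text
    .replace(/\u2014/g, " - ")        // em dash -> spaced hyphen
    .replace(/[\u2018\u2019]/g, "'")  // smart single quotes -> straight
    .replace(/[\u201C\u201D]/g, '"')  // smart double quotes -> straight
    .replace(/\u2026/g, "...")        // unicode ellipsis -> three periods
    .replace(/\u00A0/g, " ");         // non-breaking space -> regular space
}
```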

Every email includes a professional signature and CAN-SPAM-compliant footer with a per-contact unsubscribe link. The unsubscribe link maps to a specific outreach contact ID, so when someone opts out, we can mark them as terminal in the database and guarantee they are never contacted again.


8. The Router: Unified Slack Interface

All human-to-agent communication flows through a single entry point: the router agent.

When a message arrives in any monitored Slack channel:

  1. The router (Claude Haiku, $0.01/classification) reads the message
  2. It classifies intent to one of 7 workers: sdr, ops, inbox, content, deal-followup, gtm-research, inbound-lead
  3. The appropriate worker agent executes with its own model, budget, and tool allowlist
  4. The response posts back to the Slack thread with a cost footer

This eliminates the need for per-agent Slack channels. Sales questions route to the SDR. Operational questions route to ops. Content requests route to the content agent. The user does not need to know which agent handles what.
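One subtlety worth handling explicitly: the classifier's output must be validated against the known worker list before dispatch. A sketch (the fallback to "ops" is an assumption for illustration, not necessarily our production behavior):

```typescript
// Hypothetical validation of the router's classification output.
const WORKERS = ["sdr", "ops", "inbox", "content", "deal-followup", "gtm-research", "inbound-lead"] as const;
type Worker = (typeof WORKERS)[number];

function parseRoute(modelOutput: string): Worker {
  const candidate = modelOutput.trim().toLowerCase() as Worker;
  // Fall back to a safe default if the model returns anything unexpected.
  return (WORKERS as readonly string[]).includes(candidate) ? candidate : "ops";
}
```

Without this guard, a single malformed classification could dispatch to a nonexistent worker and silently drop the user's message.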


9. Observability: Every Run is Logged

Every agent run records to a Supabase agent_runs table:

Field             Type       Purpose
agent_name        string     Which agent ran
started_at        timestamp  When it started
completed_at      timestamp  When it finished
status            enum       running, completed, failed, skipped
tools_used        string[]   Every tool called during the run
actions_taken     json       Summary of what the agent did
error             string     Error message if failed
api_tokens_used   number     Total tokens consumed
cost_estimate     number     USD cost of the run

After every run, a summary posts to Slack:

SDR: Drafted 8 emails, handled 2 replies, 1 meeting booked
Tools: sf_search_accounts (4x), draft_for_approval (8x),
       gmail_read (3x), sec_company_facts (2x)
Cost: $5.73 | Duration: 127s | Tokens: 18,342
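Producing that summary from an `agent_runs` row is mostly a matter of collapsing repeated tool calls into counts. A sketch (field names follow the table above; the formatter itself is illustrative):

```typescript
// Hypothetical formatter turning an agent_runs row into the Slack summary.
interface RunRecord {
  agent_name: string;
  actions_summary: string;
  tools_used: string[]; // one entry per call, duplicates included
  cost_estimate: number;
  duration_s: number;
  api_tokens_used: number;
}

function formatRunSummary(run: RunRecord): string {
  const counts = new Map<string, number>();
  for (const t of run.tools_used) counts.set(t, (counts.get(t) ?? 0) + 1);
  const tools = Array.from(counts.entries())
    .map(([tool, n]) => `${tool} (${n}x)`)
    .join(", ");
  return (
    `${run.agent_name}: ${run.actions_summary}\n` +
    `Tools: ${tools}\n` +
    `Cost: $${run.cost_estimate.toFixed(2)} | Duration: ${run.duration_s}s | Tokens: ${run.api_tokens_used}`
  );
}
```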

10. Production Economics

Here is what the SDR agent actually costs to operate:

AI SDR cost breakdown (per run):

Claude API (Opus)   $5.73
Resend email        $0.15
Railway compute     $0.12
Total               ~$6.00 per run (~$12/day across two daily runs)

Against the fully loaded cost of a human SDR, that works out to roughly a 97% cost reduction.

Metric           Human SDR              AI SDR
Emails per day   50-80 (manual)         10-20 (deeply researched)
Research depth   LinkedIn profile scan  SEC 10-K + Apollo + web search + SF history
Consistency      Varies by day/mood     Same quality every run
Learning speed   Months of coaching     Real-time from every approval/rejection
Audit trail      Spotty                 Every action logged

The AI SDR is not a replacement for a human SDR. It is a force multiplier. The human reviewer spends 30 seconds per draft (approve or reject with feedback) instead of 30 minutes per email (research + compose + send + log). The agent handles the tedious work. The human provides judgment.


What We Learned

Draft approval is the single most important architectural decision. It solves hallucination risk, creates a feedback mechanism, and builds an audit trail. If you are building AI agents that interact with external parties, route everything through human review first. You can relax the constraint later (our auto-send SLA is effectively a relaxation), but start with full review.

Model selection per task matters more than using the best model everywhere. Opus for complex multi-tool reasoning. Sonnet for fast regeneration. Haiku for intent classification and feedback extraction. Using Opus for router classification would cost 20x more with no quality improvement.

The learning loop is the compounding advantage. Static prompts produce static results. A feedback loop that captures human preferences and injects them into the next run produces an agent that gets measurably better every week.

Atomic claims are not optional. In any system where multiple processes can act on the same record (human approval, auto-sender, concurrent agent runs), you need database-level concurrency control. updateMany WHERE status='PENDING' is the pattern.

Domain isolation for email is non-negotiable. Cold outbound and transactional email must use separate domains. One spam complaint on your transactional domain and your password reset emails stop arriving.


Architecture Reference

For anyone building similar systems, here are the key modules and their relationships:

โ–ถ๐Ÿ“agent-orchestrator/
โ–ถ๐Ÿ“src/
๐Ÿ“„index.ts
๐Ÿ“„runner.ts
๐Ÿ“„guardrails.ts
๐Ÿ“„draft-actions.ts
๐Ÿ“„draft-auto-sender.ts
๐Ÿ“„learning-loop.ts
๐Ÿ“„slack-server.ts
๐Ÿ“„slack-client.ts
โ–ถ๐Ÿ“agents/
โ–ถ๐Ÿ“mcp/
โ–ถ๐Ÿ“src/
Click a file to see its description and data flow

See Intelligence to learn more about Arvexi's AI agent capabilities.

Published by the Arvexi Engineering Team. Questions or feedback: engineering@arvexi.com