Building an AI Workspace for Enterprise Accounting
Lease accounting is a domain where a single misplaced decimal cascades through amortization schedules, journal entries, and financial statements. When FASB introduced ASC 842, every company with more than a handful of leases suddenly needed software that could compute present values, track modifications, and generate compliant disclosures. Most of the incumbents built glorified spreadsheets with a database behind them. We built an agent.
Why an agent, not a chatbot
A chatbot answers questions about data. An agent does the work.
When an accounting team prepares for a quarterly close, the workflow is 200+ discrete steps: validate every lease, check classifications, generate journal entries, reconcile against the GL, compile the audit package. A chatbot could answer "what's the liability on LSE-00047?" An agent can execute "prepare the Q4 close for all leases" and deliver the complete output.
The core design principle: the user says what they want, and the agent delivers the outcome.
The agentic loop
The agent runs as a loop. On each iteration, Claude receives the conversation history plus all available tools. It either calls tools or produces a final response.
Most requests complete in 3-5 iterations. Complex workflows might use 15-20. The model is Haiku -- a deliberate choice. The 39 tools do the heavy lifting (queries, calculations, schedules). Claude's job is tool selection and narration. Haiku is fast, cheap, and more than capable of routing to the right tool.
Temperature is 0.2, not 0. We found that 0 occasionally produces degenerate tool call sequences. A small amount of temperature prevents that while keeping financial outputs deterministic.
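In sketch form, the loop looks roughly like this. Names and shapes here are illustrative, not the production code; `callModel` stands in for the Claude API call and `tools` for the executor dispatch:

```javascript
// Minimal sketch of the agentic loop: call the model, execute any requested
// tools, feed results back, repeat until the model produces a final answer.
async function runAgent(callModel, tools, userMessage, maxIterations = 20) {
  const messages = [{ role: "user", content: userMessage }];

  for (let i = 0; i < maxIterations; i++) {
    const response = await callModel(messages); // temperature 0.2, per the text

    if (response.toolCalls.length === 0) {
      return response.text; // final response -- the loop terminates
    }

    // Execute each requested tool and append the results to the history.
    messages.push({ role: "assistant", content: response.toolCalls });
    for (const call of response.toolCalls) {
      const result = await tools[call.name](call.input);
      messages.push({ role: "tool", name: call.name, content: result });
    }
  }
  throw new Error("Agent exceeded iteration budget");
}
```

A simple request exits after one tool round-trip; a full close workflow just runs more iterations of the same loop.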
The tool architecture
39 tools, 6 executors, 4 modules. The routing is a two-level dispatch: system tools emit SSE events directly, domain tools go through a universal dispatcher to module-specific executors.
Every tool execution goes through an RBAC check. Mutation tools like generate_journal_entries require ADMIN or CONTROLLER. Every other tool is implicitly read-only and available to every role, including AUDITOR. The denial message is human-readable: "This action requires Admin or Controller access."
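The gate itself is small. This is a sketch of its shape, assuming a simple set-membership check (the role and tool names come from the text; the function and tables are illustrative):

```javascript
// Per-tool RBAC gate: mutation tools require elevated roles; everything
// else is treated as read-only and allowed for any role.
const MUTATION_ROLES = ["ADMIN", "CONTROLLER"];
const MUTATION_TOOLS = new Set(["generate_journal_entries"]);

function checkAccess(toolName, userRole) {
  // Tools outside the mutation set are implicitly read-only,
  // so any role -- including AUDITOR -- may call them.
  if (!MUTATION_TOOLS.has(toolName)) return { allowed: true };

  if (MUTATION_ROLES.includes(userRole)) return { allowed: true };

  return {
    allowed: false,
    message: "This action requires Admin or Controller access.",
  };
}
```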
A key design decision: all four modules share the same tool definitions. The module-specific behavior comes from the system prompt. This means the agent can cross-reference data across modules seamlessly -- from the Lease module, ask "what documents are linked to this lease?" and it uses search_documents without missing a beat.
The system prompt: domain knowledge as code
The system prompt is not a personality prompt. It is a 461-line decision-making framework with three layers:
Layer 1: Behavioral rules. Always use tools for data, never guess. Format currency as $X,XXX.XX. Use update_progress for complex tasks. Use the compute sandbox for any financial calculation.
Layer 2: Module-specific guidance. Natural language mapped to tools: "How many leases do we have?" maps to get_portfolio_summary. Warnings for mutations: "generate_journal_entries CREATES draft entries. Always explain what will happen before running it."
Layer 3: Expert accounting knowledge. Ten PwC-level rules encoded directly:
1. Commencement date = when lessee gets CONTROL (keys/access),
NOT the contract execution date.
2. Termination penalties must NEVER be expensed immediately:
allocated over remaining use periods (ASC 842-10-25-14).
This is the #1 audit finding.
3. Partial space return is a MODIFICATION, not a termination:
if any use continues, the entire contract is remeasured.
These rules prevent the most common lease accounting errors. When a user asks "should I expense this termination penalty?" the agent cites the standard, not a guess.
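To make rule 2 concrete, here is a worked illustration with hypothetical numbers, assuming a straight-line allocation over the remaining periods (the standard governs the treatment; the allocation shape here is an assumption for illustration):

```javascript
// Rule 2 illustrated: a $12,000 termination penalty with 24 months of
// remaining use is allocated at $500/month, never expensed immediately.
function allocatePenalty(penalty, remainingPeriods) {
  const perPeriod = penalty / remainingPeriods;
  return Array.from({ length: remainingPeriods }, () => perPeriod);
}
```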
The compute sandbox
When the agent needs to calculate a present value, it writes JavaScript and executes it in an isolated V8 sandbox.
Seven financial functions match Excel/Google Sheets signatures:
PV(rate, nper, pmt, fv?) // Present value
FV(rate, nper, pmt, pv?) // Future value
PMT(rate, nper, pv, fv?) // Payment amount
NPV(rate, cashflows[]) // Net present value
IRR(cashflows[], guess?) // Internal rate of return
round(value, decimals?) // Precise rounding
formatCurrency(value)    // USD formatting

IRR uses Newton-Raphson with a bisection fallback for cashflow patterns with multiple sign changes. The round function uses exponential notation to sidestep JavaScript's floating point precision issues -- because in lease accounting, a $0.01 error accumulated over 360 months becomes a $3.60 discrepancy that an auditor will flag.
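Both techniques can be sketched briefly. These are illustrative versions, not the production implementations:

```javascript
// Rounding via exponential notation: parsing "1.005e2" yields exactly 100.5,
// avoiding artifacts like Math.round(1.005 * 100) / 100 === 1 instead of 1.01.
function round(value, decimals = 2) {
  return Number(Math.round(Number(value + "e" + decimals)) + "e-" + decimals);
}

// NPV as the objective function for IRR.
function npv(rate, cashflows) {
  return cashflows.reduce((acc, cf, t) => acc + cf / Math.pow(1 + rate, t), 0);
}

// IRR: Newton-Raphson first; fall back to bisection when the derivative
// misbehaves (e.g. cashflows with multiple sign changes).
function irr(cashflows, guess = 0.1) {
  let rate = guess;
  for (let i = 0; i < 50; i++) {
    const f = npv(rate, cashflows);
    if (Math.abs(f) < 1e-9) return rate;
    const h = 1e-6;
    const df = (npv(rate + h, cashflows) - f) / h; // numeric derivative
    if (!isFinite(df) || Math.abs(df) < 1e-12) break; // give up on Newton
    rate -= f / df;
  }
  // Bisection fallback over a wide bracket.
  let lo = -0.9999, hi = 10;
  if (npv(lo, cashflows) * npv(hi, cashflows) > 0) return NaN; // no sign change
  for (let i = 0; i < 200; i++) {
    const mid = (lo + hi) / 2;
    const fmid = npv(mid, cashflows);
    if (Math.abs(fmid) < 1e-9) return mid;
    if (npv(lo, cashflows) * fmid < 0) hi = mid; else lo = mid;
  }
  return (lo + hi) / 2;
}
```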
The code, output, and result are displayed in the activity feed as a collapsible artifact. Every number has a visible derivation. This is critical for audit transparency.
Scenario modeling: forward and reverse
Two tools handle what-if analysis, entirely in memory.
Forward modeling: "What happens if we extend the term to 15 years?" Clone the lease, apply modifications, generate a fresh schedule, return the deltas -- liability, ROU asset, expense, total payments.
Reverse modeling: "What term gives me a $50,000 monthly expense?" Bisection search finds the parameter value that achieves a target metric. The search detects inverted relationships (longer term = lower monthly expense but higher total liability) and adjusts direction automatically. Converges within 50 iterations to 1% tolerance.
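The search itself is a standard bisection with one twist: it probes both ends of the bracket to learn the direction of the relationship. A sketch, assuming `simulate` stands in for the clone-and-reschedule step:

```javascript
// Reverse modeling: bisection over a lease parameter until the simulated
// metric hits the target within a relative tolerance (1% by default).
function solveForParameter(simulate, target, lo, hi, tolerance = 0.01, maxIter = 50) {
  // Detect inverted relationships (e.g. longer term -> lower monthly
  // expense) by comparing the metric at both ends of the bracket.
  const increasing = simulate(hi) > simulate(lo);

  for (let i = 0; i < maxIter; i++) {
    const mid = (lo + hi) / 2;
    const value = simulate(mid);
    if (Math.abs(value - target) / Math.abs(target) < tolerance) return mid;
    // Move the bracket end that keeps the target inside it.
    if ((value < target) === increasing) lo = mid; else hi = mid;
  }
  return (lo + hi) / 2;
}
```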
Streaming architecture: SSE from agent to browser
The agent loop is not request-response. It is a stream.
When a user sends a message, the frontend opens an SSE connection. The backend emits events as JSON objects -- each token flows from the Claude API to the SSE transport to the browser with zero buffering. The user sees words appearing as they are generated, even during a multi-tool workflow that takes 10+ seconds to complete.
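On the wire, each event is one SSE frame with a JSON payload. A minimal sketch of the framing (the event names here are illustrative):

```javascript
// Serialize one server-sent event: an `event:` line naming the type,
// a `data:` line carrying the JSON payload, and a blank line terminator.
function sseFrame(type, payload) {
  return `event: ${type}\ndata: ${JSON.stringify(payload)}\n\n`;
}

// In an HTTP handler, each token from the Claude stream is forwarded
// immediately, e.g.: res.write(sseFrame("token", { text: "Lease" }));
```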
System tools: the workspace primitives
Seven tools shape the workspace itself:
- update_progress: Real-time task plan in the sidebar (pending, in_progress, completed)
- write_scratchpad: Working notes for intermediate calculations, capped at 100KB
- compute: Sandboxed JavaScript executor with financial functions
- export_spreadsheet: Styled Excel files uploaded to Supabase Storage with signed URLs
- export_document: Professional DOCX with title pages, tables, and headers/footers
- memory_update: Persistent long-term memory across conversations ("remember we use 5.5% for new leases")
- propose_action: Interactive approval cards for mutations (Accept/Reject buttons)
System tools are split into "silent" and "visible." Silent tools (progress, scratchpad, compute, memory) emit SSE events but skip the execution card. Export tools are visible because they take noticeable time.
Prompt caching and smart truncation
Every loop iteration sends ~12K tokens of system prompt and tool definitions. We use Anthropic's prompt caching to serve this from cache at 90% reduced cost. A typical 5-iteration workflow drops from ~150K billed input tokens to ~40K.
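In practice this means marking the static prefix of the request with a cache-control breakpoint. A sketch of the request shape, following the public Anthropic Messages API; the model name and block contents are illustrative:

```javascript
// Build a Messages API request with the system prompt marked cacheable.
// The cache_control breakpoint on the last static block lets the API reuse
// the ~12K-token prefix (tool definitions + system prompt) across iterations.
function buildRequest(systemPrompt, tools, messages) {
  return {
    model: "claude-3-5-haiku-latest",
    max_tokens: 4096,
    temperature: 0.2,
    system: [
      { type: "text", text: systemPrompt, cache_control: { type: "ephemeral" } },
    ],
    tools,
    messages,
  };
}
```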
For large tool results (a 360-period amortization schedule runs to 150K+ characters), a binary search finds the largest data slice that fits within 25KB. The agent gets the first 55-60 real periods plus metadata noting "300 more available via pagination." Truncation never leaves the agent with zero data.
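The truncation step can be sketched as a binary search over the number of periods whose serialized form fits the budget (the shape below is assumed from the description):

```javascript
// Smart truncation: binary-search the largest prefix of periods whose
// JSON serialization fits within the byte budget. Always keeps at least
// one period, so the agent never receives an empty, fully-truncated result.
function truncateResult(periods, budgetBytes = 25 * 1024) {
  let lo = 1, hi = periods.length, best = 1;
  while (lo <= hi) {
    const mid = Math.floor((lo + hi) / 2);
    const size = JSON.stringify(periods.slice(0, mid)).length;
    if (size <= budgetBytes) { best = mid; lo = mid + 1; } else { hi = mid - 1; }
  }
  return {
    periods: periods.slice(0, best),
    meta: { truncated: best < periods.length, remaining: periods.length - best },
  };
}
```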
What the user actually sees
Animated walkthrough of a typical Q4 close workflow
The activity feed is not a chat log. It is a structured execution trace. Tool calls appear as compact cards. 3+ identical calls auto-collapse into groups. Mutations go through interactive action cards with Accept/Reject buttons. Artifacts appear inline: tables, spreadsheet downloads, code execution blocks.
The mockup above shows a typical Q4 close workflow. One message in, 45 seconds later: 12 leases validated, 36 journal entries generated and ready for approval, spreadsheet exported. Zero manual data entry.
The numbers
| Metric | Value |
|---|---|
| Total tools | 39 (10 lease + 8 document + 7 match + 8 financial + 2 scenario + 7 system) |
| System prompt | 461 lines |
| Context window | 200K tokens |
| Prompt cache savings | ~90% per iteration |
| Compute sandbox timeout | 5 seconds |
Looking forward
Parallel tool execution is next -- when the agent validates 12 leases, it should fire 12 calls concurrently instead of sequentially. The architecture supports it; we are waiting for the API to mature.
The goal remains: the user says what they want, and the agent delivers the outcome. The workspace is the interface between intent and execution. Everything we build makes that gap smaller.
See Intelligence and Lease Accounting to learn more about these capabilities.