How We Built Context Management for a Financial AI Agent
Enterprise lease accounting is unforgiving. A single misplaced decimal in a present value calculation cascades through amortization schedules, journal entries, and financial statements. When we built Arvexi's AI Workspace (an autonomous agent that handles ASC 842, IFRS 16, and GASB 87/96), we knew the context window would eventually become our biggest constraint.
The problem: context windows have a hard ceiling
Claude's context window is 200K tokens. That sounds like a lot until you account for everything that goes into every API call.
Token usage breakdown for a typical 30-lease audit session.
A user asks "prepare the KPMG audit for all leases" and the agent starts working. It queries the portfolio, pulls each lease's schedule, validates classifications, generates journal entries. By lease 15, the conversation history alone is pushing 120K tokens. By lease 30, we're hitting the wall.
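The arithmetic is easy to sketch. The figures below are illustrative assumptions (not measurements), chosen to be consistent with the session described above, roughly 120K tokens of history by lease 15:

```python
# Back-of-the-envelope token accounting for a long audit session.
# All constants are assumed for illustration, not measured values.

SYSTEM_PROMPT = 6_000          # system prompt + tool definitions (assumed)
PER_LEASE = 8_000              # tool results + reasoning per lease (assumed)
CEILING = 200_000              # Claude's context window
COMPACTION_THRESHOLD = 150_000

def tokens_after(n_leases: int) -> int:
    """Rough running total of context usage after processing n leases."""
    return SYSTEM_PROMPT + n_leases * PER_LEASE

for n in (15, 18, 30):
    used = tokens_after(n)
    status = "over threshold" if used >= COMPACTION_THRESHOLD else "ok"
    print(f"lease {n}: ~{used:,} tokens ({status})")
```

At these assumed rates the session crosses the compaction threshold around lease 18 and would blow past the 200K ceiling well before lease 30.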
The naive solution is to warn the user and ask them to start a new chat. But that defeats the purpose. We're selling an outcome, a complete audit package, not a tool that makes you restart every 15 minutes.
What we tried first
Our first approach used Claude's context editing to silently drop old tool results once usage crossed a threshold. We shipped it. Then we thought harder.
In a financial product, "silently dropping data" is a terrifying phrase. If the agent references a liability figure from a tool result that was cleared, it's working from memory of a number it can no longer verify. For a consumer chatbot, that's fine. For a platform where auditors rely on the output, it's not.
We ripped it out.
The three phases of context
Phase 1: Full context (0 to 150K tokens). Everything stays. Every tool result, every conversation turn, every financial figure. No dropping, no summarizing. This is where most conversations live. A typical session uses 30-50K tokens.
Phase 2: Compaction (at 150K tokens). Claude generates a structured summary of older conversation turns while preserving recent ones verbatim. The critical difference from our first approach: we control exactly what gets preserved through domain-specific instructions.
Phase 3: Repeat. After compaction frees 60-80% of the window, the agent has fresh runway. If it fills up again, compaction fires again. Each cycle only summarizes turns added since the last compaction; prior summaries are preserved verbatim. The agent can run indefinitely.
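The three phases reduce to a simple loop. This is an illustrative Python sketch rather than our implementation; `count_tokens` and `summarize_turns` are hypothetical helpers, and the `KEEP_RECENT` constant is an assumption:

```python
# Sketch of the three-phase compaction loop. `count_tokens` and
# `summarize_turns` are hypothetical helpers passed in by the caller.

COMPACTION_THRESHOLD = 150_000
KEEP_RECENT = 10  # recent turns preserved verbatim (assumed)

def maybe_compact(messages, count_tokens, summarize_turns):
    """Phase 2: once usage crosses the threshold, summarize older turns
    while keeping the most recent ones verbatim."""
    if count_tokens(messages) < COMPACTION_THRESHOLD:
        return messages  # Phase 1: full context, nothing dropped

    old, recent = messages[:-KEEP_RECENT], messages[-KEEP_RECENT:]
    # Only turns added since the last compaction get summarized;
    # prior summary messages pass through untouched.
    prior_summaries = [m for m in old if m.get("role") == "summary"]
    to_summarize = [m for m in old if m.get("role") != "summary"]
    summary = {"role": "summary", "content": summarize_turns(to_summarize)}
    # Phase 3: the compacted history frees the window for another cycle.
    return prior_summaries + [summary] + recent
```

Each call either passes the history through untouched or folds the un-summarized middle into one new summary, so repeated cycles never re-summarize a summary.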
What compaction preserves
The compaction instructions are opinionated. We tell the model exactly what matters in lease accounting and what's safe to discard.
Some data demands exact precision; other content is safe to summarize. Domain-specific instructions ensure every dollar survives compaction.
The key insight: $1,234,567.89 and ~$1.2M are not the same number in accounting. Our instructions encode that domain knowledge directly into the summarization process.
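That domain knowledge lives in the compaction prompt itself. An illustrative instruction block (our actual wording differs) might look like:

```python
# Illustrative compaction instructions; the real prompt is more detailed.
# The point is encoding accounting precision rules into the summary step.

COMPACTION_INSTRUCTIONS = """\
Summarize the older conversation turns. You MUST preserve:
- Every monetary amount exactly as written ($1,234,567.89, never ~$1.2M)
- Lease identifiers, discount rates, and classification decisions
- Journal entry line items with their exact debit and credit amounts

You MAY condense:
- Narrative reasoning and intermediate discussion
- Tool-call mechanics (which tool was called, not its full payload)
"""
```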
Smart truncation
Even with compaction, individual tool results can be enormous. A 30-year monthly lease has 360 schedule periods, totaling 150K+ characters of JSON for a single tool call.
Our original truncation replaced anything over 30KB with { _truncated: true }. The agent would work with zero data from that call. We replaced it with a binary search that preserves real data:
The binary search converges in six steps to the largest slice that fits in 25KB.
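A minimal sketch of that truncation, assuming tool results are JSON-serializable lists of schedule periods (`truncate_items` and the 25KB constant are illustrative, not our exact code):

```python
import json

MAX_BYTES = 25_000  # target size for a truncated tool result (assumed)

def truncate_items(items, max_bytes=MAX_BYTES):
    """Binary-search the largest prefix of `items` whose serialized JSON
    fits in max_bytes, instead of replacing the whole result with
    {"_truncated": true}. Returns (kept_items, was_truncated)."""
    def fits(n):
        return len(json.dumps(items[:n]).encode()) <= max_bytes

    if fits(len(items)):
        return items, False

    # Invariant: `lo` items always fit, `hi` items never do.
    lo, hi = 0, len(items)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fits(mid):
            lo = mid
        else:
            hi = mid
    return items[:lo], True
```

For a 360-period schedule the agent now receives the first few dozen periods intact rather than an empty placeholder, and can page for the rest.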
The agent always has something real to work with. We also added pagination to high-volume tools. get_lease_schedule accepts offset/limit (default 60, max 120) so the agent can page through large datasets deliberately.
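A sketch of what that pagination contract could look like. The tool name and the default/max limits come from above; the parameter handling and return shape here are assumptions:

```python
# Hypothetical handler for a paginated schedule tool. The name
# get_lease_schedule and the limits (default 60, max 120) match the
# post; the return shape is an illustrative assumption.

DEFAULT_LIMIT = 60
MAX_LIMIT = 120

def get_lease_schedule(lease_id, all_periods, offset=0, limit=DEFAULT_LIMIT):
    limit = max(1, min(limit, MAX_LIMIT))  # clamp to the allowed range
    page = all_periods[offset:offset + limit]
    has_more = offset + limit < len(all_periods)
    return {
        "lease_id": lease_id,
        "periods": page,
        "offset": offset,
        "next_offset": offset + limit if has_more else None,
        "total": len(all_periods),
    }
```

With the default limit, a 360-period schedule comes back in six pages; the `next_offset` field tells the agent when to stop paging.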
What this enables
Before this work, our agent hit a wall after 15 leases in a complex analysis. Now a "prepare audit for all leases" command processes the entire portfolio in a single session. The agent can run 15-20 minutes uninterrupted, making 25+ tool calls, with multiple compaction cycles and zero degradation.
The goal hasn't changed: the user says what they want, and the agent delivers the outcome. Context management, pagination, compaction: all of it should be invisible.
See Intelligence to learn more about these capabilities.