ArvexiBuilders Blog

How We Built Context Management for a Financial AI Agent

Enterprise lease accounting is unforgiving. A single misplaced decimal in a present value calculation cascades through amortization schedules, journal entries, and financial statements. When we built Arvexi's AI Workspace (an autonomous agent that handles ASC 842, IFRS 16, and GASB 87/96), we knew the context window would eventually become our biggest constraint.


The problem: context windows have a hard ceiling

Claude's context window is 200K tokens. That sounds like a lot until you account for everything that goes into every API call.

Segment                 Tokens   Share
System prompt             4K       2%
Tool definitions          8K       4%
Conversation history     60K      30%
Tool results            120K      60%
Available                 8K       4%

Typical 30-lease audit session.

A user asks "prepare the KPMG audit for all leases" and the agent starts working. It queries the portfolio, pulls each lease's schedule, validates classifications, generates journal entries. By lease 15, the conversation history alone is pushing 120K tokens. By lease 30, we're hitting the wall.
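Back-of-envelope, the growth rate implies the wall arrives fast. A hedged sketch; the per-lease cost is inferred from the figures above, not measured:

```python
# Figures from the session described above.
WINDOW = 200_000
SYSTEM_PROMPT = 4_000
TOOL_DEFINITIONS = 8_000

# ~120K tokens of history by lease 15 implies roughly 8K tokens per lease
# (an inference from the numbers above, not a measurement).
tokens_per_lease = 120_000 // 15

usable = WINDOW - SYSTEM_PROMPT - TOOL_DEFINITIONS
leases_before_wall = usable // tokens_per_lease
print(tokens_per_lease, leases_before_wall)   # 8000 23
```

Twenty-odd leases of runway against portfolios that run into the hundreds.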

The naive solution is to warn the user and ask them to start a new chat. But that defeats the purpose. We're selling an outcome, a complete audit package, not a tool that makes you restart every 15 minutes.


What we tried first

Our first approach used Claude's context editing to silently drop old tool results once usage crossed a threshold. We shipped it. Then we thought harder.
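In sketch form, what we shipped looked roughly like this. A simplified reconstruction, not our production code; `count_tokens` is a hypothetical helper:

```python
def drop_old_tool_results(messages, count_tokens, threshold=150_000):
    """Naive context editing: once usage crosses the threshold,
    silently blank out the oldest tool results until we're back under."""
    total = sum(count_tokens(m) for m in messages)
    for msg in messages:                      # oldest first
        if total <= threshold:
            break
        if msg.get("role") == "tool":
            total -= count_tokens(msg)
            msg["content"] = "[cleared]"      # the data is simply gone
            total += count_tokens(msg)
    return messages
```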

In a financial product, "silently dropping data" is a terrifying phrase. If the agent references a liability figure from a tool result that was cleared, it's working from memory of a number it can no longer verify. For a consumer chatbot, that's fine. For a platform where auditors rely on the output, it's not.

We ripped it out.


The three phases of context

[Diagram: the three-phase context strategy. Full context from 0 to 150K tokens; compaction at 150K produces a structured summary preserving all financial data; then repeat with fresh runway. Roughly 60% of the 200K window is freed per compaction.]
Phase 1: Full context (0 to 150K tokens). Everything stays. Every tool result, every conversation turn, every financial figure. No dropping, no summarizing. This is where most conversations live. A typical session uses 30-50K tokens.

Phase 2: Compaction (at 150K tokens). Claude generates a structured summary of older conversation turns while preserving recent ones verbatim. The critical difference from our first approach: we control exactly what gets preserved through domain-specific instructions.

Phase 3: Repeat. After compaction frees 60-80% of the window, the agent has fresh runway. If it fills up again, compaction fires again. Each cycle only summarizes turns added since the last compaction; prior summaries are preserved verbatim. The agent can run indefinitely.
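The three phases reduce to a small control loop. A minimal sketch, assuming hypothetical `count_tokens` and `summarize` helpers, where `summarize` applies the domain-specific compaction instructions:

```python
COMPACTION_THRESHOLD = 150_000   # phase boundary from the strategy above
KEEP_RECENT = 10                 # recent turns kept verbatim (assumed value)

def maybe_compact(messages, count_tokens, summarize):
    """Phase controller: full context until 150K, then compact, then repeat."""
    if sum(count_tokens(m) for m in messages) < COMPACTION_THRESHOLD:
        return messages                          # Phase 1: everything stays

    old, recent = messages[:-KEEP_RECENT], messages[-KEEP_RECENT:]
    # Prior summaries are preserved verbatim; only turns added since the
    # last compaction get summarized.
    prior_summaries = [m for m in old if m.get("compacted")]
    new_turns = [m for m in old if not m.get("compacted")]
    summary = {"role": "user", "compacted": True,
               "content": summarize(new_turns)}
    return prior_summaries + [summary] + recent  # Phase 3: fresh runway
```

Each call either passes the conversation through untouched or folds the oldest turns into one summary message, so the loop can fire as many times as a long session needs.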


What compaction preserves

The compaction instructions are opinionated. We tell the model exactly what matters in lease accounting and what's safe to discard.

Preserve (exact precision required):
- Financial data: all dollar amounts, interest rates, discount rates, payment amounts, liability balances, ROU asset values. $1,234,567.89, not ~$1.2M.
- Lease identifiers: every lease ID, lease number, lessor name, classification.
- Processing state: which items completed vs. pending, with counts ("Processed 47 of 200 leases").
- Calculation results: NPV, PV, WARLT, amortization totals, journal entry sums.
- User requests: the original task and any modifications.
- Next steps: what the agent was about to do next.

Discard (safe to summarize):
- Raw metadata: API response headers, tool execution timestamps, request IDs.
- Period-by-period data: 360-period schedules replaced with totals plus a period count.
- Retry attempts: failed tool calls, debugging traces, intermediate errors.

Domain-specific instructions ensure every dollar survives compaction.

The key insight: $1,234,567.89 and ~$1.2M are not the same number in accounting. Our instructions encode that domain knowledge directly into the summarization process.
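The instructions themselves are just text handed to the summarizer. A condensed sketch of what ours encode (paraphrased for illustration, not the production prompt):

```python
# Paraphrased compaction instructions; the production prompt is longer.
COMPACTION_INSTRUCTIONS = """\
Summarize the older conversation turns. You MUST preserve exactly:
- Every dollar amount, rate, balance, and ROU asset value to the cent
  ($1,234,567.89, never ~$1.2M)
- Every lease ID, lease number, lessor name, and classification
- Processing state with counts (e.g. "Processed 47 of 200 leases")
- Calculation results: NPV, PV, WARLT, amortization totals, JE sums
- The user's original request and any modifications
- The next step the agent was about to take
You MAY discard: API response headers, timestamps, request IDs,
period-by-period schedule rows (keep totals plus period count),
failed tool calls and retry traces.
"""
```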


Smart truncation

Even with compaction, individual tool results can be enormous. A 30-year monthly lease has 360 schedule periods, totaling 150K+ characters of JSON for a single tool call.

Our original truncation replaced anything over 30KB with { _truncated: true }, leaving the agent with zero data from that call. We replaced it with a binary search that preserves real data:

For a 360-period schedule with a 25KB target, the search converges in a handful of steps to the largest slice that fits.
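A minimal sketch of the idea, assuming schedule periods serialize to JSON (hypothetical shapes; the production version differs):

```python
import json

def truncate_schedule(periods, max_bytes=25_000):
    """Binary-search the largest prefix of periods whose JSON fits
    under max_bytes, instead of dropping the whole payload."""
    lo, hi, best = 0, len(periods), 0
    while lo <= hi:
        mid = (lo + hi) // 2
        if len(json.dumps(periods[:mid])) <= max_bytes:
            best, lo = mid, mid + 1
        else:
            hi = mid - 1
    kept = periods[:best]
    return {
        "periods": kept,
        "_truncated": best < len(periods),
        "total_periods": len(periods),   # agent still sees the full count
    }
```

Because serialized size grows monotonically with the prefix length, the search always lands on the largest slice that fits, and the result carries enough metadata for the agent to know data was cut.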

The agent always has something real to work with. We also added pagination to high-volume tools. get_lease_schedule accepts offset/limit (default 60, max 120) so the agent can page through large datasets deliberately.
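Pagination keeps any single result bounded before truncation ever has to fire. A sketch of how the `get_lease_schedule` paging might look (hypothetical data access; the defaults match the text):

```python
DEFAULT_LIMIT, MAX_LIMIT = 60, 120   # defaults from the tool contract

def get_lease_schedule(lease_id, schedules, offset=0, limit=DEFAULT_LIMIT):
    """Return one bounded page of schedule periods so the agent can
    walk a 360-period schedule deliberately instead of in one blob."""
    limit = min(limit, MAX_LIMIT)
    periods = schedules[lease_id]
    page = periods[offset:offset + limit]
    return {
        "lease_id": lease_id,
        "periods": page,
        "offset": offset,
        "total_periods": len(periods),
        "has_more": offset + limit < len(periods),
    }
```

The `total_periods` and `has_more` fields let the agent decide whether another page is worth the tokens.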


What this enables

Before this work, our agent hit a wall after 15 leases in a complex analysis. Now a "prepare audit for all leases" command processes the entire portfolio in a single session. The agent can run 15-20 minutes uninterrupted, making 25+ tool calls, with multiple compaction cycles and zero degradation.

The goal hasn't changed: the user says what they want, and the agent delivers the outcome. Context management, pagination, compaction: all of it should be invisible.


See Intelligence to learn more about these capabilities.