Hello, condense.chat

Faster context compaction for Claude Code and the Claude Agent SDK, in beta.

By Imir Lab · note 001

condense.chat is a drop-in proxy that compacts Claude Code and Claude Agent SDK conversations before they reach Anthropic. Beta access opened today: sign up, get approved, and run Claude Code through condense.chat on $5 of starter credit.

Who we are

Imir Lab is a Lithuanian research group founded in 2026 by a team with backgrounds in frontier AI research and large-scale infrastructure. We build fast-inference models for tasks where latency is the dominant cost. condense.chat is our first release.

Why Claude Code context compaction is slow

Claude Code context compaction is built into the SDK. When the context window fills, Claude Code runs a sequential LLM call that summarises the conversation and replaces it inline. The Claude Agent SDK uses the same approach. It works, but the compaction step blocks your next turn for several seconds, and the rewrite drops detail you usually need.

condense.chat sits between your client and Anthropic. It compresses history in the background, in parallel, before the request goes upstream. You keep the same SDK (Claude Code or Claude Agent SDK), the same model, and the same evals. The result shows up in three places: a smaller upstream bill, roughly 5x less wall-clock time on the compaction step itself, and a rewrite that scores higher than the two compaction services we benchmark against. Numbers for the last two ride along with the next post. The first you can watch in real time on your own helm dashboard.
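To make the proxy idea concrete, here is a minimal sketch of a request-rewrite step, assuming a Messages API-style request body with plain-string message content. It is not condense.chat's implementation: `compact_request`, `keep_last`, `chunk_size` and the `summarise` callable are hypothetical, and splitting older turns into chunks that are summarised concurrently is just one simple way to avoid a single sequential summarisation call.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# A message in the Messages API shape, simplified to plain-string content.
Message = dict


def compact_request(
    body: dict,
    summarise: Callable[[list[Message]], str],
    keep_last: int = 4,
    chunk_size: int = 8,
) -> dict:
    """Return a copy of a Messages-style request body with older turns summarised.

    `summarise` is a hypothetical stand-in for a fast compaction model: it
    receives a chunk of messages and returns a short text summary.
    """
    messages = body["messages"]
    cut = max(0, len(messages) - keep_last)
    # Move the cut earlier until the kept tail starts with a user turn,
    # so the forwarded messages still open with role "user".
    while cut > 0 and messages[cut]["role"] != "user":
        cut -= 1
    if cut == 0:
        return dict(body)  # too short to compact: forward unchanged
    older, recent = messages[:cut], messages[cut:]
    chunks = [older[i:i + chunk_size] for i in range(0, len(older), chunk_size)]
    # Summarise the chunks concurrently rather than in one sequential call;
    # pool.map keeps the summaries in conversation order.
    with ThreadPoolExecutor() as pool:
        summaries = list(pool.map(summarise, chunks))
    note = "Summary of earlier conversation:\n" + "\n".join(summaries)
    system = body.get("system", "")
    return {
        **body,
        "system": f"{system}\n\n{note}" if system else note,
        "messages": recent,
    }
```

A proxy would run something like this on each incoming request body and forward the result upstream. Folding the summary into `system` keeps the forwarded `messages` list opening with a user turn, and the caller's original body is never mutated.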

The bill, with and without compaction

The bill is the easiest part to show, so we start there. Every curve uses the same Claude Opus pricing ($5 per million input tokens). The X axis is the underlying conversation length your agent would have produced on its own. The blue line is what you pay today, and the chart prices the same session at four condense compaction levels.

Chart: Autocompaction efficiency (compaction levels: no compaction, 70%, 80%, 90%). At 80% compaction: $23.00 saved vs no compaction at 1M underlying tokens (46% off).
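The chart's readout implies the size of the baseline bill. A minimal sketch of that arithmetic, using only the numbers above: if $23.00 is 46% of the no-compaction bill, that bill is about $50, which at $5 per million input tokens is about 10M billed input tokens for a 1M-token underlying session. The helper names are illustrative, and since the 46% is presumably rounded, treat the results as approximate.

```python
from fractions import Fraction

# Claude Opus input price quoted in this post: $5 per million input tokens.
OPUS_USD_PER_MTOK = Fraction(5)


def implied_baseline_usd(saved_usd: str, pct_off: int) -> Fraction:
    """Back out the no-compaction bill from a 'saved $X (Y% off)' readout."""
    return Fraction(saved_usd) / Fraction(pct_off, 100)


def billed_input_tokens(cost_usd: Fraction) -> Fraction:
    """Convert a dollar amount into input tokens at the Opus rate."""
    return cost_usd / OPUS_USD_PER_MTOK * 1_000_000


baseline = implied_baseline_usd("23.00", 46)  # exactly $50 with these inputs
compacted = baseline - Fraction("23.00")      # exactly $27
print(
    f"no compaction: ${float(baseline):.2f}, at 80%: ${float(compacted):.2f}, "
    f"billed tokens without compaction: {int(billed_input_tokens(baseline)):,}"
)
```

Billed input far above the underlying conversation length is what you would expect if each turn resends the accumulated context; that is a reading of the numbers, not a description of how the chart is computed.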

What you see after you sign up

Once you are approved, you land in your helm dashboard at helm.condense.chat. The numbers above the fold are the four we care about: dollars saved, compression ratio, cache hit rate, and request count. The dashboard polls every couple of seconds, so the first few requests through the proxy show up within a turn.

Dashboard preview: Your savings
Money saved: $12.34 USD (842k input tokens stripped before reaching Anthropic)
Per-model rates come from tokencost. Each row in your ledger is multiplied by its served SKU's input price. Cache hits are billed separately by Anthropic and are not counted here.
Compression: 70% (1.2M pre → 360k on the wire)
Cache hit rate: 62% (220k read / 84k write tokens)
Requests: 84 (avg overhead 412 ms)
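The note under Money saved describes a per-row weighted sum. A minimal sketch of that calculation, assuming each ledger row records the served model plus input token counts before and after compaction: `LedgerRow`, `money_saved` and the rate table are hypothetical, and the dashboard itself takes per-model rates from tokencost rather than a hard-coded dict.

```python
from dataclasses import dataclass

# Hypothetical rate table in USD per million input tokens. The dashboard takes
# real per-model rates from tokencost; only the Opus figure ($5) comes from this
# post, and both keys are made-up placeholders rather than real model IDs.
INPUT_USD_PER_MTOK = {
    "opus": 5.00,
    "hypothetical-model": 1.00,
}


@dataclass
class LedgerRow:
    model: str           # served SKU for this request
    tokens_before: int   # input tokens the client sent
    tokens_on_wire: int  # input tokens actually forwarded upstream


def money_saved(rows: list[LedgerRow]) -> float:
    """Sum each row's stripped input tokens times that row's own input price.

    Cache reads and writes are left out, matching the dashboard note.
    """
    total = 0.0
    for row in rows:
        stripped = row.tokens_before - row.tokens_on_wire
        total += stripped / 1_000_000 * INPUT_USD_PER_MTOK[row.model]
    return total


rows = [
    LedgerRow("opus", 200_000, 60_000),
    LedgerRow("hypothetical-model", 100_000, 30_000),
]
print(f"${money_saved(rows):.2f}")  # $0.77
```

Pricing each row at its own served model's rate, rather than one blended rate, keeps the total correct when a session mixes models.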

Try it

The whole setup is one line in your terminal once your account is approved. The installer mints a key, writes a small ~/.claude config that points the Claude SDK at api.condense.chat, and starts Claude Code on your account.

It runs against your approved beta account. Your Anthropic key is still forwarded byte for byte, and we do not store it.
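For readers who want to see what "points the Claude SDK at api.condense.chat" can look like, here is a minimal sketch of a settings merge. It assumes the config is Claude Code's `~/.claude/settings.json`, that its `env` block sets `ANTHROPIC_BASE_URL`, and that the proxy is served over https; the real installer's file and contents may differ. The function names are hypothetical, and how the installer attaches the key it mints is not covered in this post, so the sketch omits it.

```python
import copy
import json
from pathlib import Path

# Host from this post; the https scheme is an assumption.
PROXY_URL = "https://api.condense.chat"


def with_proxy_base_url(settings: dict, base_url: str = PROXY_URL) -> dict:
    """Return a copy of Claude Code settings with ANTHROPIC_BASE_URL set.

    Other keys, including other env vars, are left untouched.
    """
    merged = copy.deepcopy(settings)
    merged.setdefault("env", {})["ANTHROPIC_BASE_URL"] = base_url
    return merged


def write_settings(path: Path, base_url: str = PROXY_URL) -> None:
    """Merge the proxy URL into an existing settings file, or create one."""
    current = json.loads(path.read_text()) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(with_proxy_base_url(current, base_url), indent=2) + "\n")
```

Usage would be `write_settings(Path.home() / ".claude" / "settings.json")`. Merging rather than overwriting keeps any settings you already have.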

The numbers we are happy to put in writing

Three for now. The harder ones (full latency curves, accuracy on each benchmark) ride along with the next post.

5x
lower latency than built-in compaction
Same compression ratio and input size, measured as end-to-end wall-clock time to the first usable rewrite.

Higher
accuracy than competitors
Measured on the long-context benchmarks we ran against the top two compaction services. Numbers and method land in the next post.

$5
starter credit on approval
Enough for tens of long Claude Code sessions on Opus, going by typical compression ratios.
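The 5x latency figure is a ratio of wall-clock times. A minimal sketch of that kind of measurement, assuming each compactor is a Python callable that returns a rewrite: `median_wall_clock` and `speedup` are hypothetical names, taking the median over a few runs is an illustrative choice, and keeping compression ratio and input size equal across the two compactors is left to the caller. It is not the harness behind the 5x number.

```python
import statistics
import time
from typing import Callable


def median_wall_clock(compact: Callable[[list], object], history: list, runs: int = 5) -> float:
    """Median end-to-end seconds until `compact` returns its rewrite."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        compact(history)  # time until the rewrite is available to the caller
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def speedup(baseline_s: float, candidate_s: float) -> float:
    """How many times faster the candidate is; 5.0 means 5x."""
    return baseline_s / candidate_s
```

With two compactors run on the same history, `speedup(median_wall_clock(built_in, h), median_wall_clock(candidate, h))` gives the ratio; the median keeps one slow outlier run from dominating.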

Up next

New posts arrive here every Saturday from now on. Next Saturday we ship Adeline, our v1 compaction model.

Run Claude Code on the cheaper bill.

Sign up, wait for approval, paste the curl line.

Imir Lab, May 16 2026