Adeline v1 ships. The API ships with it.

Our diffusion-based compaction model goes live today, and the proxy now works with Claude Code, Codex, and any OpenAI or Anthropic SDK.

By Imir Lab · note 002

Last Saturday we said Adeline would ship this week. It has. Adeline v1 is the compaction model now. The Claude Code curl from note 001 still works, and condense.chat now also exposes a route per provider, so any OpenAI or Anthropic client (Codex included) is a drop-in. Point your base_url at the proxy, send your condense key in one header, and nothing else changes.

Before getting to Adeline and the API, last week in numbers.

First week in beta

Our first week in beta, measured on the live proxy. Aggregates across every account, no per-user breakdown and nothing identifiable. This is what real Claude Code and SDK traffic looked like on the way upstream.

$730 saved this week
The “money saved” figure each account sees on its dashboard, summed across the week.

65% fewer input tokens
2.26B tokens of conversation compressed to 788M on the wire before they reached the provider.

8.2k requests through the proxy
Real beta traffic: Claude Code and SDK sessions routed through condense.chat this week.
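
As a quick check on the headline figure, the 65% follows from the two token totals quoted above. This is a minimal sketch: the variable names are ours, and the totals are the rounded values from this post, so the result is approximate.

```python
# Token totals as quoted in this post (rounded figures).
tokens_before = 2.26e9  # conversation tokens entering the proxy
tokens_after = 788e6    # tokens actually sent upstream

# Fraction of input tokens removed before reaching the provider.
reduction = 1 - tokens_after / tokens_before

print(f"{reduction:.1%} fewer input tokens")
```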

Adeline v1

v0 was a placeholder, an open model with a compaction adapter, enough to prove the proxy worked end to end while we built the real thing. Adeline v1 is the real thing. It is diffusion-based, and it resolves the compacted rewrite in a handful of parallel passes instead of token by token. That is where the latency win comes from. The full mechanism is a later post.
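
The mechanism itself is deferred to a later post, so the following is not Adeline's algorithm. It is only a toy count of sequential model calls, assuming a generic parallel-refinement scheme in which each pass commits a fixed fraction of the still-undecided positions. The function names and the `commit_fraction` parameter are hypothetical, chosen purely to show why a handful of passes can replace thousands of token-by-token steps.

```python
import math


def autoregressive_steps(n_tokens: int) -> int:
    """Token-by-token decoding needs one sequential step per output token."""
    return n_tokens


def parallel_refinement_passes(n_tokens: int, commit_fraction: float = 0.5) -> int:
    """Count passes when each pass finalizes a fraction of the open positions.

    Toy assumption: every pass commits at least one position, so the loop
    always terminates.
    """
    remaining = n_tokens
    passes = 0
    while remaining > 0:
        committed = max(1, math.ceil(remaining * commit_fraction))
        remaining -= committed
        passes += 1
    return passes


n = 1000  # hypothetical length of the compacted rewrite
print("sequential steps:", autoregressive_steps(n))
print("parallel passes: ", parallel_refinement_passes(n))
```

Under these toy assumptions the number of sequential passes grows roughly with the logarithm of the output length, not linearly with it.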

If you are already running through the proxy, there is nothing to do. Adeline v1 is the default compaction engine as of today; your next session uses it automatically.

How Adeline v1 compares

We promised the hard numbers would land in this post, so here they are. We ran Adeline v1 head to head against Claude Opus 4.7, the model Adeline learns to imitate, and against The Token Company’s bear-1.2, compacting long contexts none of them saw in training. Two metrics per track: how many input tokens each method cut, and whether the answer survived the cut.

94.2% facts kept on agent traces
At 90.2% input token reduction. Within a point of Claude Opus 4.7 on faithfulness, while compacting slightly harder.

90.2% vs 7.6% input token reduction on agent traces
Adeline v1 against The Token Company’s bear-1.2 on the same input.

| method | LongBench v2 accuracy | LongBench v2 reduction | CoderForge (SWE) faithfulness | CoderForge (SWE) reduction |
|---|---|---|---|---|
| adeline v1 | 27.5% | 76% | 94.2% | 90.2% |
| Claude Opus 4.7 | 30% | 82% | 95.0% | 87.6% |
| The Token Company bear-1.2 | 25% | 13% | 72.5% | 7.6% |

Method. Two tracks of unseen, 30k–100k-token inputs. Long-document QA is LongBench v2 — the long-context benchmark The Token Company publishes on — scored by whether a downstream model answers the question correctly from the compacted context alone. Agentic coding sessions are CoderForge SWE traces, scored by atomic-fact recall against the original. Token reduction is measured in Claude tokens; higher means more compression. The Token Company runs at its 0.30 “light” setting; Opus and Adeline at their normal compaction.
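
The two scores described above can be written down in a few lines. This is a sketch of the definitions only: the function names are ours, facts are modeled as plain strings, and how the benchmark extracts and matches atomic facts is not specified in this post.

```python
def token_reduction(original_tokens: int, compacted_tokens: int) -> float:
    """Share of input tokens removed; higher means more compression."""
    if original_tokens <= 0:
        raise ValueError("original_tokens must be positive")
    return 1 - compacted_tokens / original_tokens


def fact_recall(original_facts: set[str], recovered_facts: set[str]) -> float:
    """Share of the original's atomic facts still present after compaction."""
    if not original_facts:
        raise ValueError("original_facts must not be empty")
    return len(original_facts & recovered_facts) / len(original_facts)


# Made-up example data, not taken from the benchmark.
facts = {"test_login fails on CI", "fix is in auth/session.py", "branch is fix/login"}
kept = {"test_login fails on CI", "fix is in auth/session.py"}

print(f"reduction: {token_reduction(50_000, 5_000):.1%}")
print(f"recall:    {fact_recall(facts, kept):.1%}")
```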

On the agent sessions Adeline is built for, it keeps 94.2% of the facts while cutting 90.2% of the tokens. That is within a point of the Opus teacher it learned from, and far ahead of a service that strips only ~8%. On adversarial long-document QA, which is hard enough that every method lands in the 25–30% range, Adeline stays close to Opus and beats The Token Company while compressing several times harder than bear-1.2 (76% vs 13%).
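
To make the reduction figures concrete, the arithmetic below applies the CoderForge (SWE) percentages from the table to a single hypothetical 100k-token trace. The trace size is our assumption, and the percentages are benchmark averages, so any individual session will differ.

```python
# Reduction figures from the CoderForge (SWE) columns of the table above.
reductions = {
    "adeline v1": 0.902,
    "Claude Opus 4.7": 0.876,
    "bear-1.2": 0.076,
}

trace_tokens = 100_000  # hypothetical input, upper end of the 30k-100k range

# Tokens that would still be sent upstream after compaction.
remaining = {name: round(trace_tokens * (1 - r)) for name, r in reductions.items()}

for name, tokens in remaining.items():
    print(f"{name}: {tokens:,} tokens remain")
```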

One proxy, both SDKs

Until now condense.chat was an Anthropic-shaped endpoint. As of this release the proxy carries a route for each provider, and it is a true drop-in: keep your existing SDK, keep your upstream key, and change only the base URL and one header.

The whole integration is three steps:

  1. Mint a key. Grab an ak_… token from your dashboard.
  2. Point your base_url at the provider route — /openai/v1 or /anthropic — instead of the provider's own URL.
  3. Send the key in the X-Condense-Auth-Token header. Your upstream provider key keeps flowing through untouched; condense never stores it.

OpenAI — point the client at the /openai/v1 route:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.condense.chat/openai/v1",
    api_key="sk-...",
    default_headers={"X-Condense-Auth-Token": "ak_..."},
)
```

Anthropic — point it at the /anthropic route:

```python
from anthropic import Anthropic

client = Anthropic(
    base_url="https://api.condense.chat/anthropic",
    api_key="sk-ant-...",
    default_headers={"X-Condense-Auth-Token": "ak_..."},
)
```

In both cases your upstream provider key is forwarded verbatim and never stored; the only credential condense itself needs is the ak_… token in the X-Condense-Auth-Token header. Endpoints we do not compact (model lists, embeddings, anything else on the provider's surface) pass straight through untouched, so the SDK behaves exactly as it would talking to the provider directly.
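
For clients that do not use an SDK, a pass-through call looks like any other request to the provider route. The sketch below builds, without sending, a model-list request against the OpenAI route using only the standard library. The helper name is ours, the keys are placeholders, and the `/models` path is simply what an OpenAI client appends to its base_url.

```python
import urllib.request

CONDENSE_OPENAI_BASE = "https://api.condense.chat/openai/v1"


def build_models_request(upstream_key: str, condense_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET for the model list via the proxy."""
    return urllib.request.Request(
        url=f"{CONDENSE_OPENAI_BASE}/models",
        headers={
            # Upstream OpenAI key, sent exactly as it would be to OpenAI directly.
            "Authorization": f"Bearer {upstream_key}",
            # Condense key from the dashboard.
            "X-Condense-Auth-Token": condense_key,
        },
        method="GET",
    )


req = build_models_request("sk-...", "ak_...")
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it; that step is left out here so the sketch runs without credentials.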

Compact, or just rewrite

A new header, X-Condense-Function, picks what the proxy does with a request:

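```bash
# x-api-key and anthropic-version are the usual upstream Anthropic headers.
curl https://api.condense.chat/anthropic/v1/messages \
  -H "x-api-key: sk-ant-..." \
  -H "anthropic-version: 2023-06-01" \
  -H "X-Condense-Auth-Token: ak_..." \
  -H "X-Condense-Function: rewrite" \
  -H "content-type: application/json" \
  -d @conversation.json
```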

The same header works on the OpenAI route. Unknown header values fall back to proxy, so the safe default is always the working one.
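
The fallback rule can be pictured as a small lookup. This is an illustrative sketch, not the proxy's code: the function name is ours, the known set lists only the two values that appear in this post (other values may exist), and what happens when the header is absent is not covered here.

```python
# Only the values named in this post; treat the set as incomplete.
KNOWN_FUNCTIONS = frozenset({"proxy", "rewrite"})
FALLBACK = "proxy"


def resolve_function(header_value: str) -> str:
    """Return the function to run, falling back to proxy for unknown values."""
    if header_value in KNOWN_FUNCTIONS:
        return header_value
    return FALLBACK


print(resolve_function("rewrite"))
print(resolve_function("rewrit"))  # a typo falls back instead of failing
```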

Full API reference: the API docs cover provider routes, headers, auth, and copy-paste examples for curl and both SDKs.

See you next week.

Run your agent on the cheaper bill.

Sign up, wait for approval, point your base_url at condense.

Imir Lab, May 23 2026