Before we get to Adeline and the API, here is last week in numbers.
First week in beta
These are the figures from our first week in beta, measured on the live proxy. They are aggregates across every account, with no per-user breakdown and nothing identifiable. This is what real Claude Code and SDK traffic looked like on its way upstream.
Adeline v1
v0 was a placeholder: an open model with a compaction adapter, enough to prove the proxy worked end to end while we built the real thing. Adeline v1 is the real thing. It is diffusion-based, and it resolves the compacted rewrite in a handful of parallel passes instead of token by token. That is where the latency win comes from. The full mechanism will get its own post.
If you are already running through the proxy, there is nothing to do. Adeline v1 is the default compaction engine as of today; your next session uses it automatically.
How Adeline v1 compares
We promised the hard numbers would land in this post, so here they are. We ran Adeline v1 head to head against Claude Opus 4.7, the model Adeline learns to imitate, and against The Token Company’s bear-1.2, compacting long contexts none of them saw in training. Two metrics per track: how many input tokens it cut, and whether the answer survived the cut.
| method | LongBench v2 accuracy | LongBench v2 reduction | CoderForge (SWE) faithfulness | CoderForge (SWE) reduction |
|---|---|---|---|---|
| Adeline v1 | 27.5% | 76% | 94.2% | 90.2% |
| Claude Opus 4.7 | 30% | 82% | 95.0% | 87.6% |
| The Token Company bear-1.2 | 25% | 13% | 72.5% | 7.6% |
Method. Two tracks of unseen inputs, each 30k–100k tokens long. Long-document QA uses LongBench v2, the long-context benchmark The Token Company publishes on, and is scored by whether a downstream model answers the question correctly from the compacted context alone. Agentic coding sessions are CoderForge SWE traces, scored for faithfulness as atomic-fact recall against the original. Token reduction is measured in Claude tokens; higher means more compression. The Token Company runs at its 0.30 “light” setting; Opus and Adeline run at their normal compaction settings.
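The two scoring ideas in the method can be pinned down with a small sketch. It assumes token reduction is 1 − compacted/original (so higher means more compression, as stated above) and that atomic-fact recall is the share of the original's atomic facts still present after compaction. How facts are extracted and matched is not described in this post, so the facts here are plain strings and both function names are hypothetical.

```python
def token_reduction(original_tokens: int, compacted_tokens: int) -> float:
    """Share of input tokens removed by compaction (assumed definition).

    0.0 means nothing was cut; 0.9 means 90% of the tokens are gone.
    """
    if original_tokens <= 0:
        raise ValueError("original_tokens must be positive")
    return 1.0 - compacted_tokens / original_tokens


def fact_recall(original_facts: set[str], kept_facts: set[str]) -> float:
    """Share of the original's atomic facts still present after compaction.

    Assumes facts are already extracted and normalised to strings; facts
    that appear only in kept_facts do not raise the score.
    """
    if not original_facts:
        raise ValueError("original_facts must not be empty")
    return len(original_facts & kept_facts) / len(original_facts)


if __name__ == "__main__":
    # A 60k-token session compacted to 6k tokens, keeping 3 of its 4 facts.
    print(f"reduction: {token_reduction(60_000, 6_000):.1%}")
    print(f"recall:    {fact_recall({'a', 'b', 'c', 'd'}, {'a', 'b', 'c'}):.1%}")
```

Keeping the two scores separate mirrors the table: a method can cut aggressively and still lose facts, so neither number is meaningful on its own.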
On the agent sessions Adeline is built for, it keeps 94.2% of the facts while cutting 90.2% of the tokens: within a point of the Opus teacher it learned from, and far ahead of a service that strips only ~8%. On adversarial long-document QA, hard enough that every method lands in the 25–30% accuracy range, Adeline stays close to Opus and beats The Token Company while compressing several times harder (76% reduction versus 13%).
One proxy, both SDKs
Until now condense.chat was an Anthropic-shaped endpoint. As of this release the proxy carries a route for each provider, and it is a true drop-in: keep your existing SDK, keep your upstream key, change one line.
The whole integration is three steps:
- Mint a key. Grab an ak_… token from your dashboard.
- Point your base_url at the provider route (/openai/v1 or /anthropic) instead of the provider's own URL.
- Send the key in the X-Condense-Auth-Token header. Your upstream provider key keeps flowing through untouched; condense never stores it.
OpenAI — point the client at the /openai/v1 route:
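```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.condense.chat/openai/v1",  # condense route instead of OpenAI's own URL
    api_key="sk-...",  # your OpenAI key, forwarded upstream untouched
    default_headers={"X-Condense-Auth-Token": "ak_..."},  # your condense key
)
```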
Anthropic — point it at the /anthropic route:
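```python
from anthropic import Anthropic

client = Anthropic(
    base_url="https://api.condense.chat/anthropic",  # condense route instead of Anthropic's own URL
    api_key="sk-ant-...",  # your Anthropic key, forwarded upstream untouched
    default_headers={"X-Condense-Auth-Token": "ak_..."},  # your condense key
)
```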
Your upstream provider key is still forwarded verbatim and never stored. The only credential condense itself needs is your own ak_… token from the dashboard, sent in the X-Condense-Auth-Token header. Endpoints we do not compact (model lists, embeddings, anything else on the provider's surface) pass straight through untouched, so the SDK behaves exactly as it would when talking to the provider directly.
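With either SDK, every request carries two credentials side by side: your upstream key and your condense key. For callers that use neither SDK, the same request can be built by hand. This is a minimal standard-library sketch, not condense's client code. It assumes the auth headers each provider normally expects (x-api-key plus anthropic-version for Anthropic, a Bearer Authorization header for OpenAI) and the endpoint paths the SDKs append to the base URLs above (/v1/messages and /chat/completions). The helper name is hypothetical, and it only builds the request; sending it is left to urllib.request.urlopen.

```python
import json
import urllib.request

CONDENSE = "https://api.condense.chat"

# Assumed full endpoints: the condense routes above plus the path each SDK appends.
ENDPOINTS = {
    "anthropic": CONDENSE + "/anthropic/v1/messages",
    "openai": CONDENSE + "/openai/v1/chat/completions",
}


def build_proxied_request(provider: str, body: dict, upstream_key: str,
                          condense_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST to the condense proxy.

    The upstream key goes in the provider's usual auth header and is
    forwarded as-is; the condense key rides in X-Condense-Auth-Token.
    """
    headers = {
        "content-type": "application/json",
        "X-Condense-Auth-Token": condense_key,
    }
    if provider == "anthropic":
        headers["x-api-key"] = upstream_key
        headers["anthropic-version"] = "2023-06-01"
    elif provider == "openai":
        headers["Authorization"] = f"Bearer {upstream_key}"
    else:
        raise ValueError(f"unknown provider: {provider!r}")
    return urllib.request.Request(
        ENDPOINTS[provider],
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

Note that urllib normalises header names (for example to X-condense-auth-token); HTTP header names are case-insensitive, so this does not change what the proxy receives.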
Compact, or just rewrite
A new header, X-Condense-Function, picks what the proxy does with a request:
- proxy (the default): compact the conversation, forward it upstream, and stream the answer back. This is the normal path.
- rewrite: compact the conversation and hand the rewritten request body straight back to you as JSON, without calling the provider. You see exactly what would have gone upstream, and can inspect it, cache it, or route it yourself.
```bash
# Ask for the compacted request body only; the provider is not called.
curl https://api.condense.chat/anthropic/v1/messages \
  -H "X-Condense-Auth-Token: ak_..." \
  -H "X-Condense-Function: rewrite" \
  -H "content-type: application/json" \
  -d @conversation.json
```
The same two-way choice applies on the OpenAI route. Unknown header values fall back to proxy, so the safe default is always the working one.
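The rewrite call can also be made from Python with only the standard library. This is a minimal sketch under stated assumptions: the helper names are hypothetical, the OpenAI path assumes the chat-completions endpoint the OpenAI SDK targets, and the response is assumed to be the rewritten request body as a single JSON document, as described above. Because rewrite never calls the provider, no upstream key is sent, matching the curl example.

```python
import json
import urllib.request

# Anthropic path as in the curl example; the OpenAI path is an assumption.
REWRITE_URLS = {
    "anthropic": "https://api.condense.chat/anthropic/v1/messages",
    "openai": "https://api.condense.chat/openai/v1/chat/completions",
}


def build_rewrite_request(provider: str, conversation: dict,
                          condense_key: str) -> urllib.request.Request:
    """Build a POST that asks condense for the compacted body only."""
    return urllib.request.Request(
        REWRITE_URLS[provider],
        data=json.dumps(conversation).encode("utf-8"),
        headers={
            "X-Condense-Auth-Token": condense_key,
            # "rewrite" returns the compacted body; an unrecognised value
            # would fall back to the default "proxy" behaviour instead.
            "X-Condense-Function": "rewrite",
            "content-type": "application/json",
        },
        method="POST",
    )


def fetch_rewrite(req: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and parse the rewritten body returned as JSON."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

To mirror the curl call, load conversation.json with json.load, pass it to build_rewrite_request("anthropic", ..., "ak_..."), and hand the result to fetch_rewrite. The returned dict is what proxy mode would have forwarded, ready to diff against the original or cache.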
See you next week.
Run your agent on the cheaper bill.
Sign up, wait for approval, point your base_url at condense.