Helene 1 joins Adeline 1 in the condense compaction family. Adeline 1 does the heavy lifting on long agent traces. Helene 1 is the fast, accuracy-first pass for general use. The two models run side by side, and as of today Helene 1 is the default compaction engine on the proxy. Helene 1 itself runs two ways: auto, which reads each input and decides how much to remove, and a fixed ratio you set yourself.
The headline isn’t just that Helene 1 saves tokens: every compressor does that. It is that on a standard question-answering benchmark, a model answering from Helene 1’s compacted context is more accurate than the same model reading the full, untouched transcript, while sending far less. Smaller bill, better answers, in the same call.
If you are already running through the proxy there is nothing to do: your next session uses Helene 1 automatically. The numbers below cover auto and a fixed 0.2 ratio.
Today Helene 1 runs only inside the proxy. Direct access, through the same drop-in API routes Adeline 1 uses, is coming soon.
A clean win on CoQA
We ran Helene 1 head to head against the two compressors teams reach for, Microsoft’s LLMLingua 2 and The Token Company’s bear 2, plus a control that sends the full, uncompressed transcript. Same answerer, same judge, same 150 turns for every arm. Two numbers per arm: how accurately a downstream model answers from the (compacted) context, and how many input tokens were removed to get there.
| method | judge accuracy | tokens saved | % saved |
|---|---|---|---|
| control (uncompressed) | 90.0% | 0 | 0.0% |
| Helene 1 (auto) | 92.0% | 16,843 | 30.6% |
| Helene 1 (ratio 0.2) | 91.3% | 10,874 | 19.8% |
| LLMLingua 2 (ratio 0.2) | 90.7% | 11,169 | 20.3% |
| bear 2 (ratio 0.2, medium) | 90.7% | 4,572 | 8.3% |
More accurate than sending everything. A gpt-5.4 model answering from Helene 1’s compacted context scored 92.0%, above the 90.0% the same model scored reading the full, uncompressed transcript. Compression is usually framed as a trade: spend some accuracy to spend fewer tokens. Here the cheaper request was the more accurate one, because dropping the low-signal parts of a long transcript leaves the answerer with less to get distracted by.
The deepest cut in the test. auto removed 30.6% of the input tokens, more than any other arm, and accuracy went up, not down. Nothing else in the table does both. At a matched 0.2 ratio, Helene 1 still edged both baselines (91.3% vs 90.7%), and against The Token Company’s bear 2 it removes 2.4× more tokens at a higher score.
Faster, not just smaller. A Helene 1 compress call runs in tens of milliseconds, so the compaction step adds no latency you would notice, and the shorter prompt is quicker to answer too: the gpt-5.4 answerer returned in 2,689 ms from auto’s context versus 3,332 ms on the full transcript, 19.3% faster end to end.
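The percentages and ratios quoted above can be rechecked from the raw figures. The sketch below is only arithmetic on numbers already in this post: the tokens-saved column of the table, the 55,016-token baseline given in the Method note, and the two answerer latencies. The variable names are illustrative.

```python
# Figures copied from the CoQA results table and the Method note.
BASELINE_TOKENS = 55_016  # uncompressed CoQA transcript

tokens_saved = {
    "Helene 1 (auto)": 16_843,
    "Helene 1 (ratio 0.2)": 10_874,
    "LLMLingua 2 (ratio 0.2)": 11_169,
    "bear 2 (ratio 0.2, medium)": 4_572,
}

# Share of the input removed by each arm, in percent.
pct_saved = {
    arm: round(100 * saved / BASELINE_TOKENS, 1)
    for arm, saved in tokens_saved.items()
}

# Matched-ratio comparison: Helene 1 at 0.2 versus bear 2 at 0.2.
helene_vs_bear = round(
    tokens_saved["Helene 1 (ratio 0.2)"]
    / tokens_saved["bear 2 (ratio 0.2, medium)"],
    1,
)

# Answerer latency from auto's context versus the full transcript.
latency_auto_ms, latency_full_ms = 2_689, 3_332
latency_reduction_pct = round(
    100 * (latency_full_ms - latency_auto_ms) / latency_full_ms, 1
)

for arm, pct in pct_saved.items():
    print(f"{arm}: {pct}% saved")
print(f"Helene 1 vs bear 2 at ratio 0.2: {helene_vs_bear}x the tokens removed")
print(f"Answerer latency reduction: {latency_reduction_pct}%")
```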
Method. CoQA, 150 conversational question-answering turns. A gpt-5.4 answerer responds from the compacted passage, and a gpt-5.4 judge grades each answer against the reference. Tokens saved is the reduction against the 55,016-token uncompressed transcript. Every fixed-ratio arm ran at 0.2, the setting The Token Company recommends as the middle ground for bear 2 (“medium”); Helene 1’s auto mode ran alongside them. One benchmark and one model pairing. We will publish more arms as they finish.
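For readers who want to picture the protocol, here is a minimal sketch of a per-arm evaluation loop of the kind the Method note describes. It is not the harness used for these results. The `compress`, `answer`, `judge`, and `count_tokens` callables are hypothetical stand-ins you would supply, and summing token counts turn by turn is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Turn:
    context: str    # transcript or passage the question refers to
    question: str
    reference: str  # gold answer the judge grades against


@dataclass
class ArmResult:
    accuracy: float    # fraction of turns the judge marked correct
    tokens_saved: int  # input tokens removed across all turns
    pct_saved: float   # tokens_saved as a percentage of the uncompressed total


def evaluate_arm(
    turns: Iterable[Turn],
    compress: Callable[[str], str],          # hypothetical: compressor under test
    answer: Callable[[str, str], str],       # hypothetical: answerer(context, question)
    judge: Callable[[str, str, str], bool],  # hypothetical: judge(question, answer, reference)
    count_tokens: Callable[[str], int],      # hypothetical: tokenizer length
) -> ArmResult:
    correct = total = tokens_before = tokens_after = 0
    for turn in turns:
        compacted = compress(turn.context)
        tokens_before += count_tokens(turn.context)
        tokens_after += count_tokens(compacted)
        # The answerer only ever sees the compacted context.
        prediction = answer(compacted, turn.question)
        correct += int(judge(turn.question, prediction, turn.reference))
        total += 1
    if total == 0 or tokens_before == 0:
        raise ValueError("need at least one non-empty turn")
    saved = tokens_before - tokens_after
    return ArmResult(correct / total, saved, 100 * saved / tokens_before)
```

The control arm is the same loop with an identity compressor (`compress=lambda text: text`), so every arm shares the answerer, the judge, and the turns.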
Long context, and structured data
CoQA is short and conversational. To see whether the same advantage survives at length, we ran LongBench v2, all domains, up to ~500k tokens of context per item, 15.7M input tokens in total. auto again removed the most of any arm, 38.8%, roughly a 1.6× reduction, and tied the highest accuracy in the field while doing it.
| arm | accuracy | tokens saved | % saved |
|---|---|---|---|
| uncompressed (control) | 48.7% | 0 | 0.0% |
| Helene 1 (auto) | 49.3% | 6,089,910 | 38.8% |
| Helene 1 (ratio 0.2) | 48.7% | 2,011,249 | 12.8% |
| LLMLingua 2 (ratio 0.2) | 49.3% | 3,342,363 | 21.3% |
| bear 2 (ratio 0.2, medium) | 48.7% | 843,338 | 5.4% |
Top accuracy, deepest cut. auto ties LLMLingua 2 for the highest score in the field (49.3%) and edges the uncompressed control (48.7%), and it gets there while removing 1.8× as many tokens as LLMLingua 2 and roughly 7× as many as bear 2. Same accuracy, far smaller bill.
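The arm-to-arm comparisons follow directly from the tokens-saved column of the LongBench v2 table. The sketch below reruns that arithmetic and also converts the 38.8% cut into a reduction factor. It uses only figures from the table, and the variable names are illustrative.

```python
# Tokens removed per arm, copied from the LongBench v2 table.
tokens_saved = {
    "Helene 1 (auto)": 6_089_910,
    "Helene 1 (ratio 0.2)": 2_011_249,
    "LLMLingua 2 (ratio 0.2)": 3_342_363,
    "bear 2 (ratio 0.2, medium)": 843_338,
}

auto = tokens_saved["Helene 1 (auto)"]

# How many times as many tokens auto removes compared with each baseline.
auto_vs_llmlingua = round(auto / tokens_saved["LLMLingua 2 (ratio 0.2)"], 1)
auto_vs_bear = round(auto / tokens_saved["bear 2 (ratio 0.2, medium)"], 1)

# Removing a fraction f of the input leaves 1 - f of it,
# so the prompt shrinks by a factor of 1 / (1 - f).
fraction_removed = 0.388
reduction_factor = round(1 / (1 - fraction_removed), 2)

print(f"auto vs LLMLingua 2: {auto_vs_llmlingua}x")
print(f"auto vs bear 2: {auto_vs_bear}x")
print(f"reduction factor at 38.8% removed: {reduction_factor}x")
```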
Method. LongBench v2, n=150 across all domains, contexts up to ~500k tokens, gpt-5.4 answerer and judge. Fixed-ratio arms ran at 0.2 (bear 2 at its recommended “medium”). Accuracy sits within about a point across every arm, and auto is at the top of that band while cutting the most. One benchmark and one model pairing. We will publish more as they finish.
Where it shows most: code and structured context
Two LongBench domains stress a compressor hardest: code repositories, where a dropped line breaks meaning, and long structured data (tables and records), where the wrong cut corrupts the answer. Across those 50 items both Helene 1 arms clear the uncompressed control while every other compressor holds flat or drops: auto lands at 44.0% (against 40.0%) while removing 40.9% of the tokens, and the 0.2 ratio reaches 42.0%. LLMLingua 2 only matches the control, and bear 2 falls to 36.0%.
| arm | accuracy | tokens saved | % saved |
|---|---|---|---|
| uncompressed (control) | 40.0% | 0 | 0.0% |
| Helene 1 (auto) | 44.0% | 3,698,035 | 40.9% |
| Helene 1 (ratio 0.2) | 42.0% | 870,618 | 9.6% |
| LLMLingua 2 (ratio 0.2) | 40.0% | 1,756,098 | 19.4% |
| bear 2 (ratio 0.2, medium) | 36.0% | 142,138 | 1.6% |
Method. The 50 code-repository and long-structured-data items from LongBench v2, effectively the benchmark’s whole code and structured-data slice and the domains most relevant to coding agents. This is where a careless cut is most likely to corrupt an answer.
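With 50 items in the slice, each item is worth two percentage points, so the accuracies in the table map to whole item counts. A small sketch of that conversion, using only the table's figures (variable names are illustrative):

```python
N_ITEMS = 50  # size of the code and structured-data slice

# Accuracy per arm in percent, copied from the table above.
accuracy_pct = {
    "uncompressed (control)": 40.0,
    "Helene 1 (auto)": 44.0,
    "Helene 1 (ratio 0.2)": 42.0,
    "LLMLingua 2 (ratio 0.2)": 40.0,
    "bear 2 (ratio 0.2, medium)": 36.0,
}

# Number of items answered correctly by each arm.
correct_items = {
    arm: round(pct * N_ITEMS / 100) for arm, pct in accuracy_pct.items()
}

# Difference from the control, in items.
control = correct_items["uncompressed (control)"]
delta_vs_control = {arm: n - control for arm, n in correct_items.items()}

for arm, n in correct_items.items():
    print(f"{arm}: {n}/{N_ITEMS} correct ({delta_vs_control[arm]:+d} vs control)")
```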
Also launching today: Codex and OpenCode
dense, the one-line install that routes a coding agent through condense, now drives Codex and OpenCode in addition to Claude Code. Install once, point your agent at condense, and Helene 1 compacts the context on the way upstream. Nothing about your model, your keys, or your workflow changes.
curl -fsSL https://cli.condense.chat/unix | sh # macOS / Linux
dense claude # run Claude Code through condense
dense codex # run Codex through condense
dense opencode # run OpenCode through condense
Same agent, same results, a fraction of the input tokens, now for whichever of the three you reach for. The proxy stays transparent and zero-retention: it sees just enough to compact a request in flight, and nothing persists beyond that.
See you next week.
Run your agent on the cheaper bill.
Sign up, wait for approval, point your agent at condense.