Your context, condensed

condense.chat compresses context by up to 70% without losing meaning, in under 6 seconds per one hundred thousand tokens.

runs claude through the proxy — no signup, no key swap.

70% compression

<6s per one hundred thousand tokens

1 line to integrate

Built by engineers shipping LLM infra at

Nord Security

kilo.health

nexos.ai

basedcollective_

before / after

Paste real context. Watch it shrink.

Run condense on the three payload shapes that bloat modern agent stacks. Same input, same downstream model — the difference is everything between them.

[Interactive demo: autoregressive (token-by-token) output vs. parallel generation on ~/payload/system-prompt.txt, reporting input, output, saved, and faithfulness.]

benchmark

Numbers, not adjectives.

avg compression

70%

p50 across 12k real agent sessions (Claude Code, Cline, Codex, Cursor).

latency per one hundred thousand tokens

<6s

parallel generation, not sequential summarization. p95 under 8.4s.

faithfulness

90

semantic match vs. ground truth on LongMemEval. competitors: 84.

integrate

One line. Any stack.

Drop-in proxy for the OpenAI SDK and Claude Code: point your existing client at api.condense.chat with a cxk_ key. Or call POST /v1/condense directly for fine-grained control. Your model, your tools, your evals. We just make them cheaper.

→ zero retraining, zero tuning
→ streams transparently
→ per-request override with x-condense-ratio / x-condense-target-tokens
→ read the docs
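
A minimal sketch of what the one-line switch could look like from Python. Only the host (api.condense.chat), the cxk_ key prefix, and the two header names come from this page. The https scheme, the /v1 path prefix, the header value formats, and the helper names `condense_headers` and `client_kwargs` are assumptions made for illustration, so check the docs before relying on them.

```python
from typing import Optional

# Assumption: HTTPS and an OpenAI-style "/v1" path prefix on the host named above.
CONDENSE_BASE_URL = "https://api.condense.chat/v1"


def condense_headers(
    ratio: Optional[float] = None,
    target_tokens: Optional[int] = None,
) -> dict[str, str]:
    """Build the per-request override headers named on this page.

    Assumption: the value formats (a fraction between 0 and 1 for the ratio,
    a positive integer for the token target) are guesses, not documented here.
    """
    headers: dict[str, str] = {}
    if ratio is not None:
        if not 0.0 < ratio < 1.0:
            raise ValueError("ratio is assumed to be a fraction between 0 and 1")
        headers["x-condense-ratio"] = str(ratio)
    if target_tokens is not None:
        if target_tokens <= 0:
            raise ValueError("target_tokens must be positive")
        headers["x-condense-target-tokens"] = str(target_tokens)
    return headers


def client_kwargs(api_key: str, **overrides) -> dict:
    """Keyword arguments for an OpenAI-compatible client pointed at the proxy."""
    if not api_key.startswith("cxk_"):
        raise ValueError("expected a condense key with the cxk_ prefix")
    return {
        "api_key": api_key,
        "base_url": CONDENSE_BASE_URL,
        "default_headers": condense_headers(**overrides),
    }
```

With the official `openai` package, these would plug in as `OpenAI(**client_kwargs(your_key, ratio=0.5))`, and a single call could carry its own override through `extra_headers=condense_headers(target_tokens=8000)`. Whether the ratio means the fraction kept or the fraction removed is not stated here.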

use cases

Wherever context is the bill.

coding agents

Ship longer sessions.

Tool outputs, file reads, test runs — the stuff that eats your window. Condense rewrites it on the edge, every turn, so sessions don't collapse into compact-and-lose-everything.

typical saving: −64%

RAG pipelines

Fit more retrieved docs.

Pack 3× the chunks into the same window without re-ranking or dropping recall. Faithfulness holds at 90 on LongMemEval even with long, citation-heavy payloads.

typical saving: −70%
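
The 3× figure follows from the 70% saving by simple arithmetic. A rough sketch, assuming every chunk compresses at the typical rate (real payloads will vary); the function names are illustrative only.

```python
def compressed_tokens(tokens: int, saving: float) -> int:
    """Tokens left after removing `saving` (a fraction, e.g. 0.70) of the input."""
    if not 0.0 <= saving < 1.0:
        raise ValueError("saving must be a fraction in [0, 1)")
    return round(tokens * (1.0 - saving))


def capacity_multiplier(saving: float) -> float:
    """How many times more raw content fits in a fixed window at this saving."""
    if not 0.0 <= saving < 1.0:
        raise ValueError("saving must be a fraction in [0, 1)")
    return 1.0 / (1.0 - saving)


# At a 70% saving, 100,000 tokens of retrieved text shrink to about 30,000,
# so the same window holds 1 / 0.3 (roughly 3.3) times the raw content.
remaining = compressed_tokens(100_000, 0.70)
multiplier = capacity_multiplier(0.70)
```

Rounding the multiplier down to whole chunks gives the 3× quoted above.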

chat products

Cut your per-msg cost.

System prompt + history + tool schemas add up fast when you're serving millions of turns. Condense runs once per request, transparent to your SDK.

typical saving: −55%
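
To see what a 55% saving could mean on a bill, here is a back-of-envelope sketch. The traffic volume, tokens per turn, and model price are hypothetical placeholders, and the estimate covers input tokens only: it ignores output tokens and whatever condense itself costs.

```python
def monthly_input_cost(
    turns: int,
    tokens_per_turn: int,
    usd_per_million_tokens: float,
    saving: float = 0.0,
) -> float:
    """Estimated input-token spend, with `saving` as the fraction of tokens removed."""
    if not 0.0 <= saving < 1.0:
        raise ValueError("saving must be a fraction in [0, 1)")
    billed_tokens = turns * tokens_per_turn * (1.0 - saving)
    return billed_tokens / 1_000_000 * usd_per_million_tokens


# Hypothetical workload: 1M turns a month, 4,000 input tokens each, $3 per 1M tokens.
baseline = monthly_input_cost(1_000_000, 4_000, 3.0)
condensed = monthly_input_cost(1_000_000, 4_000, 3.0, saving=0.55)
```

Under those placeholder numbers the input bill drops from $12,000 to about $5,400 a month. Swap in your own traffic and pricing to get a figure that means something for your product.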