Your context condensed

condense.chat compresses up to 70% without losing meaning in under 6 seconds per one hundred thousand tokens.

runs claude through the proxy — no signup, no key swap.
70% compression <6s one hundred thousand tokens 1 line to integrate ↻ replay
Built by engineers shipping LLM infra at
Nord Security kilo.health nexos.ai basedcollective_
before / after

Paste real context. Watch it shrink.

Run condense on the three payload shapes that bloat modern agent stacks. Same input, same downstream model — the difference is everything between them.

autoregressive token-by-token
0.00s waiting
parallel generation ~/payload/system-prompt.txt
0.00s
input output saved faithfulness
benchmark

Numbers, not adjectives.

avg compression
70%
across 12k real agent sessions (claude code, cline, codex, cursor). p50.
latency per one hundred thousand tokens
<6s
parallel generation, not sequential summarization. p95 under 8.4s.
faithfulness
90
semantic match vs. ground truth on LongMemEval. competitors: 84.
integrate

One line. Any stack.

Drop-in proxy for the OpenAI SDK and Claude Code — just point your existing client at api.condense.chat with a cxk_ key. Or call POST /v1/condense directly for fine-grained control. Your model, your tools, your evals — ours just makes them cheaper.

→ zero retraining, zero tuning
→ streams transparently
→ per-request override with x-condense-ratio / x-condense-target-tokens

      
use cases

Wherever context is the bill.

coding agents

Ship longer sessions.

Tool outputs, file reads, test runs — the stuff that eats your window. Condense rewrites it on the edge, every turn, so sessions don't collapse into compact-and-lose-everything.

typical saving−64%
RAG pipelines

Fit more retrieved docs.

Pack 3× the chunks into the same window without re-ranking or dropping recall. Faithfulness holds at 90 on LongMemEval even with long, citation-heavy payloads.

typical saving−70%
chat products

Cut your per-msg cost.

System prompt + history + tool schemas add up fast when you're serving millions of turns. Condense runs once per request, transparent to your SDK.

typical saving−55%

Your next turn starts with fewer tokens.

Sign up to claim a key, then drop one command into your terminal.

Sign up →