Cut your OpenAI bill in half. One line of code.

TokenTrim compresses your prompts before they hit OpenAI or Anthropic. Same responses. 30–50% fewer input tokens. No infra to manage.

Before
import OpenAI from "openai";
const client = new OpenAI();

const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: longPrompt }],
});
After
import { OptimizedOpenAI } from "tokentrim";
const client = new OptimizedOpenAI({
  apiKey: process.env.TOKENTRIM_KEY,
});

const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: longPrompt }],
});

console.log(response.tokentrim);
// → { tokens_in: 658, tokens_in_compressed: 397,
//     tokens_saved: 261, cost_saved_usd: "0.0013" }
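The savings figure above is plain arithmetic over the two token counts. It lines up with an assumed $5 per million input tokens (an older gpt-4o rate, used here only for illustration; check your provider's current pricing). A quick check:

```typescript
// Recompute the savings shown in response.tokentrim.
// The $5 per million input-token price is an assumption for
// illustration, not a quoted rate.
const PRICE_PER_INPUT_TOKEN_USD = 5 / 1_000_000;

const tokensIn = 658;
const tokensInCompressed = 397;

const tokensSaved = tokensIn - tokensInCompressed;
const costSavedUsd = (tokensSaved * PRICE_PER_INPUT_TOKEN_USD).toFixed(4);

console.log(tokensSaved, costSavedUsd); // 261 "0.0013"
```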
Get a beta key, then: npm install tokentrim

Pay less.

Typical savings: 30–50% on input tokens. Same quality, same responses.

Swap the import, get savings back in metadata.

Replace OpenAI with OptimizedOpenAI. Your call stays the same — response.tokentrim shows what you saved.

See every dollar saved.

Real-time dashboard showing compression ratio, tokens saved, and exact $ back in your pocket.

How it works

1. Your app calls OptimizedOpenAI.chat.completions.create() — same API you already use.

2. The TokenTrim gateway compresses the prompt with LLMLingua-2, a prompt-compression model from Microsoft Research.

3. The compressed prompt hits OpenAI. You get the response plus response.tokentrim metadata showing exactly what you saved.
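The steps above amount to a compress-then-forward wrapper. Here is a minimal sketch of that shape, not TokenTrim's actual implementation: the compress function merely collapses whitespace as a stand-in for the LLMLingua-2 call, and tokens are estimated with a rough 4-characters-per-token heuristic rather than a real tokenizer.

```typescript
// Illustrative compress-then-forward gateway. All names here are
// hypothetical; only the response.tokentrim field shape follows the docs.

type Message = { role: "system" | "user" | "assistant"; content: string };

// Rough token estimate (~4 characters per token). A real gateway
// would use the target model's tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Stand-in compressor: collapses runs of whitespace. The real
// gateway would call a learned compression model instead.
function compress(text: string): string {
  return text.replace(/\s+/g, " ").trim();
}

// Compress each message and attach savings metadata shaped
// like response.tokentrim.
function compressMessages(messages: Message[]) {
  const tokensIn = messages.reduce((n, m) => n + estimateTokens(m.content), 0);
  const compressed = messages.map((m) => ({ ...m, content: compress(m.content) }));
  const tokensOut = compressed.reduce((n, m) => n + estimateTokens(m.content), 0);
  return {
    messages: compressed,
    tokentrim: {
      tokens_in: tokensIn,
      tokens_in_compressed: tokensOut,
      tokens_saved: tokensIn - tokensOut,
    },
  };
}

const { tokentrim } = compressMessages([
  { role: "user", content: "Summarize   this   very   padded   prompt,   please." },
]);
console.log(tokentrim.tokens_saved > 0); // true
```

The key design point is that compression happens before the provider call, so the provider bills for the smaller prompt and the wrapper can report the delta.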

Join the beta.

Get your API key within 24 hours. Free during beta. Cancel anytime — there's nothing to cancel.