Build Token-Efficient AI Workflows
A practical guide to reducing LLM costs without compromising output quality. From input formatting to pipeline architecture, every optimization that actually matters.
LLM costs scale directly with token count. Most teams focus on choosing cheaper models or reducing output length, but the highest-impact optimizations happen at the input stage. Cleaning up what you send to the model is easier, safer, and more effective than constraining what comes back.
1. Format Input as Markdown
The single most impactful change. Converting PDFs, DOCX files, and HTML to clean Markdown before sending them to an LLM reduces token count by 80-95% while preserving the semantic content.
- PDFs -- strip layout metadata, font encoding, and page artifacts (up to 95% token reduction)
- DOCX -- remove Word's XML style definitions and revision history (up to 89% reduction)
- HTML -- extract content from tags, scripts, and stylesheets (up to 85% reduction)
- AI output -- normalize formatting inconsistencies from ChatGPT, Claude, and others
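To make the HTML case concrete, here is a minimal sketch of the cleanup using only Python's standard library. The class and function names are illustrative; a production pipeline would use a dedicated converter, but this shows where the savings come from: tags, scripts, and stylesheets are dropped while headings and body text survive.

```python
import re
from html.parser import HTMLParser


class MarkdownExtractor(HTMLParser):
    """Strip markup, scripts, and styles; keep headings and text."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self.skip_depth = 0   # non-zero while inside <script>/<style>
        self.heading = None   # current heading level, if any

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1
        elif tag in ("h1", "h2", "h3"):
            self.heading = int(tag[1])
        elif tag == "p":
            self.parts.append("\n\n")

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1
        elif tag in ("h1", "h2", "h3"):
            self.heading = None
            self.parts.append("\n\n")

    def handle_data(self, data):
        if self.skip_depth or not data.strip():
            return
        if self.heading:
            self.parts.append("\n\n" + "#" * self.heading + " " + data.strip())
        else:
            self.parts.append(data.strip())


def html_to_markdown(html: str) -> str:
    parser = MarkdownExtractor()
    parser.feed(html)
    # Collapse the blank-line runs left behind by dropped markup.
    return re.sub(r"\n{3,}", "\n\n", "".join(parser.parts)).strip()
```

Everything inside `<style>` and `<script>` is pure token waste for the model; dropping it is where most of the reduction comes from.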
2. Optimize Context Window Usage
Your context window is a budget. Spend it on information the model actually needs.
- Summarize long documents -- send a summary first, then the relevant sections
- Remove boilerplate -- headers, footers, and repeated legal disclaimers waste context
- Chunk strategically -- split large documents at semantic boundaries, not arbitrary token limits
- Use references -- instead of pasting 10 pages, paste 2 and reference the rest
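A sketch of the strategic-chunking point above: split a Markdown document at heading boundaries rather than at an arbitrary token offset, so each chunk is a self-contained section. The ~4-characters-per-token estimate is a rough English-prose heuristic, not a real tokenizer; swap in your model's tokenizer for accurate counts.

```python
import re


def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English prose.
    return max(1, len(text) // 4)


def chunk_markdown(doc: str, budget: int) -> list[str]:
    """Split a Markdown document at heading boundaries so each chunk
    stays under `budget` estimated tokens. Sections are kept intact;
    a single oversized section still becomes its own chunk."""
    # Split just before each heading line, keeping headings with their bodies.
    sections = re.split(r"\n(?=#{1,6} )", doc)
    chunks, current = [], ""
    for section in sections:
        candidate = current + "\n" + section if current else section
        if current and estimate_tokens(candidate) > budget:
            chunks.append(current)
            current = section
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Chunks produced this way arrive at the model as coherent units, so it never has to reason about a section whose first half was cut off by a token limit.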
3. Design Efficient Pipelines
For production AI workflows, architecture decisions compound over millions of requests.
- Pre-process before prompting -- clean and format documents in a preprocessing step
- Cache repeated context -- system prompts and reference documents should be cached, not re-sent
- Route by complexity -- use smaller, cheaper models for simple tasks and reserve large models for complex ones
- Batch similar requests -- combine related queries into single prompts where possible
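The caching and routing ideas above can be combined in a few lines. This is a sketch under stated assumptions: the model-tier names, the 500-character complexity threshold, and the `call_model(model, prompt)` callable are all illustrative placeholders, not any provider's actual API.

```python
import hashlib

# Hypothetical model tiers; the names and the length-based threshold
# below are illustrative assumptions, not a specific provider's API.
CHEAP_MODEL, LARGE_MODEL = "small-fast", "large-accurate"
_cache: dict[str, str] = {}


def route(prompt: str) -> str:
    """Pick a model tier using a crude complexity proxy (prompt length)."""
    return CHEAP_MODEL if len(prompt) < 500 else LARGE_MODEL


def complete(prompt: str, call_model) -> str:
    """Cache-then-route: identical prompts never hit the model twice.
    `call_model(model, prompt)` stands in for your provider's SDK call."""
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if key not in _cache:
        _cache[key] = call_model(route(prompt), prompt)
    return _cache[key]
```

In production you would route on a better signal than length (task type, classifier score) and use your provider's built-in prompt caching for the repeated system-prompt prefix, but the structure is the same: check the cache first, then pick the cheapest model that can do the job.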
4. Monitor and Measure
You can't optimize what you don't track.
- Log token counts -- track input and output tokens per request type
- Calculate cost per task -- not just cost per token, but cost per successful outcome
- A/B test formats -- compare response quality between raw and formatted inputs
- Set budgets -- establish per-task token budgets and alert on overruns
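The tracking loop above fits in a small class. This is a minimal sketch: the task names, budgets, and per-1k-token rate passed in are placeholder values you would replace with your own pricing and limits.

```python
from collections import defaultdict


class TokenBudgetTracker:
    """Log per-task token usage and flag overruns against a budget."""

    def __init__(self, budgets: dict[str, int]):
        self.budgets = budgets            # per-task token budgets
        self.usage = defaultdict(int)     # cumulative tokens per task

    def log(self, task: str, input_tokens: int, output_tokens: int) -> bool:
        """Record one request; return True if the task is now over budget."""
        self.usage[task] += input_tokens + output_tokens
        return self.usage[task] > self.budgets.get(task, float("inf"))

    def cost_per_task(self, task: str, usd_per_1k_tokens: float) -> float:
        """Convert cumulative usage into dollars at a flat per-1k rate."""
        return self.usage[task] / 1000 * usd_per_1k_tokens
```

Wire the `True` return value from `log` into your alerting, and report `cost_per_task` alongside a success rate so you are measuring cost per successful outcome, not just cost per token.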
Token efficiency isn't about being cheap -- it's about being intentional. Teams that optimize their AI input pipeline spend less, get faster responses, and often see better output quality because the model focuses on content instead of parsing noise.
Start with the easiest optimization
Convert your documents to clean Markdown before sending to any LLM.