Token Optimization

Every token costs money. QANATIX is designed to minimize token usage while maximizing information density.

The problem

A typical web-scraped result sends 2,000-5,000 tokens to your LLM — HTML fragments, navigation, ads, boilerplate. Most of it is noise. At scale, this burns budget fast.

QANATIX sends ~120 tokens per result in compact format. Against 2,000-5,000 tokens for a web-scraped result, that is roughly 17-42x fewer tokens per result.

How it works

1. Structured data in, structured data out

Your data is uploaded as structured fields (collection_data), not free text. QANATIX returns exactly the fields that matter — no parsing needed.

2. Three output formats

| Format | Tokens/result | Best for |
|--------|---------------|----------|
| `json` | ~800 | Applications that need full metadata |
| `yaml` | ~200 | Human readability, debugging |
| `compact` | ~120 | LLM context windows, MCP |
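As a rough illustration of how the format choice scales, the sketch below multiplies the approximate per-result figures from the table above by a result count. The function name `estimate_tokens` is hypothetical and not part of any QANATIX client; real token counts depend on your data and tokenizer.

```python
# Approximate per-result token counts, copied from the format table above.
TOKENS_PER_RESULT = {"json": 800, "yaml": 200, "compact": 120}


def estimate_tokens(num_results: int, fmt: str = "compact") -> int:
    """Return a rough token estimate for a response with num_results results."""
    if fmt not in TOKENS_PER_RESULT:
        raise ValueError(f"unknown format: {fmt!r}")
    return num_results * TOKENS_PER_RESULT[fmt]


# Compare the three formats for a 5-result response.
for fmt in TOKENS_PER_RESULT:
    print(fmt, estimate_tokens(5, fmt))
```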

3. Compact format details

The compact format returns a markdown table:

| # | Name | Score | Key Data |
|---|------|-------|----------|
| 1 | Stainless Steel Bolt M8x40 A2 | 0.87 | part_number: SS-M8-40-A2, material: Stainless Steel A2, price_eur: 0.12 |

Key optimizations:

Only top-level collection_data fields included (no nested objects)
Field names abbreviated where possible
No record IDs, timestamps, or metadata unless needed
Score included for relevance context
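To make the optimizations above concrete, here is a minimal sketch of how a result could be flattened into a compact table row. It is not the QANATIX implementation: the record shape (`id`, `name`, `score`, `collection_data`) and the function `to_compact` are assumptions made for illustration, and field-name abbreviation is left out.

```python
def to_compact(results):
    """Render results as a markdown table, keeping only top-level scalar fields."""
    lines = [
        "| # | Name | Score | Key Data |",
        "|---|------|-------|----------|",
    ]
    for i, r in enumerate(results, start=1):
        # Drop nested objects and lists; keep only top-level scalar values.
        fields = {
            k: v
            for k, v in r["collection_data"].items()
            if not isinstance(v, (dict, list))
        }
        key_data = ", ".join(f"{k}: {v}" for k, v in fields.items())
        # Record IDs and other metadata are deliberately not emitted.
        lines.append(f"| {i} | {r['name']} | {r['score']:.2f} | {key_data} |")
    return "\n".join(lines)


# Hypothetical record shape, used only for this example.
results = [
    {
        "id": "rec_123",
        "name": "Stainless Steel Bolt M8x40 A2",
        "score": 0.87,
        "collection_data": {
            "part_number": "SS-M8-40-A2",
            "material": "Stainless Steel A2",
            "price_eur": 0.12,
            "dimensions": {"thread": "M8", "length_mm": 40},  # nested, skipped
        },
    }
]

print(to_compact(results))
```

The nested `dimensions` object and the `id` never reach the output, which is where most of the savings over full JSON come from.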

4. MCP uses compact automatically

The MCP server returns compact format by default. When Claude calls qanatix_search(), it gets the markdown table — maximum information, minimum tokens.

5. Limit your results

The default limit is 20, but most AI queries need only 3-5 results. Set the limit explicitly:

{"query": "M8 bolt", "limit": 5, "format": "compact"}

Five results in compact format come to ~600 tokens (5 × ~120). The same query in full JSON comes to ~4,000 tokens (5 × ~800).
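One way to avoid relying on the default is to build the request body through a small helper that always sets `limit` and `format`. This is a sketch only: `build_search_request` is a hypothetical helper, and it uses just the three fields shown in the request above. How the body is sent (endpoint, authentication) is not covered here.

```python
import json


def build_search_request(query: str, limit: int = 5, fmt: str = "compact") -> str:
    """Serialize a search request with an explicit limit and format."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return json.dumps({"query": query, "limit": limit, "format": fmt})


body = build_search_request("M8 bolt")
print(body)
```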

Token budget comparison

| Scenario | Web scraping | QANATIX (compact) |
|----------|--------------|-------------------|
| 5 results | ~15,000 tokens | ~600 tokens |
| 10 results | ~30,000 tokens | ~1,200 tokens |
| 20 results | ~60,000 tokens | ~2,400 tokens |

At $3/M input tokens (Claude Sonnet), 1,000 queries/day with 10 results each:

Web scraping: ~30M tokens/day, or ~$90/day
QANATIX compact: ~1.2M tokens/day, or ~$3.60/day
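The daily figures follow directly from tokens per query × queries per day × price per token. The sketch below reproduces that arithmetic so you can plug in your own volumes and pricing; `daily_cost_usd` is an illustrative name, and the $3/M rate is the one quoted above.

```python
def daily_cost_usd(tokens_per_query, queries_per_day, usd_per_million_tokens=3.0):
    """Input-token cost per day, in US dollars."""
    return tokens_per_query * queries_per_day * usd_per_million_tokens / 1_000_000


# 10 results per query, 1,000 queries per day, figures from the table above.
web = daily_cost_usd(30_000, 1_000)
compact = daily_cost_usd(1_200, 1_000)

print(f"Web scraping:    ${web:.2f}/day")
print(f"QANATIX compact: ${compact:.2f}/day")
```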

Best practices

Use compact format for all AI agent integrations
Set explicit limits — don't fetch 20 results if you need 5
Use filters to narrow results before search, reducing irrelevant matches
Use the MCP server — it handles format and limit optimization automatically
Cache on your side if you're making repeated identical queries (QANATIX also caches for 30 seconds server-side)
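For the client-side caching suggestion, a minimal time-based cache could look like the sketch below. `TTLCache` and `fake_search` are hypothetical names, and the stand-in search function only simulates a call. The 30-second TTL mirrors the server-side window mentioned above, but the right value for your application is an assumption you should tune.

```python
import time


class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds=30.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}  # key -> (stored_at, value)

    def get_or_fetch(self, key, fetch):
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # still fresh: skip the network call
        value = fetch()
        self._store[key] = (now, value)
        return value


calls = []


def fake_search():
    """Stand-in for a real search call; records how often it runs."""
    calls.append(1)
    return "| # | Name | Score | Key Data |"


cache = TTLCache(ttl_seconds=30.0)
key = ("M8 bolt", 5, "compact")  # identical query, limit, and format
cache.get_or_fetch(key, fake_search)
cache.get_or_fetch(key, fake_search)
print(len(calls))
```

Keying on the full (query, limit, format) tuple means only truly identical requests share a cached response.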
