Token Efficiency¶

Gatemini's context savings come from shape, not magic: a small fixed gateway surface, brief discovery responses by default, full schemas only on demand, and automatic output reduction on every call_tool_chain response.

[Figure: Token comparison]

Where the savings come from¶

Fixed gateway surface¶

Every session sees the same 7 gateway tools instead of every backend tool schema.

Brief defaults¶

- search_tools defaults to brief=true
- tool_info defaults to detail="brief"

That means discovery usually starts with names, short descriptions, parameter names, and generated call examples instead of full JSON Schema blobs.

On-demand schema loading¶

Agents pull a full schema only for the tools they are likely to call.

Resources as compact indexes¶

gatemini://tools provides a compressed inventory view without the cost of loading every full schema.

Three-tier search reducing wasted queries¶

search_tools uses a three-tier fallback (BM25, then trigram substring matching, then fuzzy Levenshtein correction), so typos and partial terms still return the right tool without a follow-up query. The response also includes try_also suggestions: distinctive terms, scored by IDF, for narrowing follow-up searches.
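
The fallback order can be sketched in a few lines of Rust. This is a minimal illustration, not Gatemini's implementation: tier 1 uses a plain whole-token match as a stand-in for BM25 ranking, tier 2 uses a simple substring test rather than a trigram index, the edit-distance cutoff of 2 is an assumed value, and the names `search`, `Tier`, `tokens`, and `levenshtein` are hypothetical.

```rust
/// Which tier produced the result (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
enum Tier {
    Token,
    Substring,
    Fuzzy,
}

/// Dynamic-programming Levenshtein distance over Unicode scalar values.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.iter().enumerate() {
        let mut cur = vec![i + 1];
        for (j, cb) in b.iter().enumerate() {
            let cost = if ca == cb { 0 } else { 1 };
            cur.push((prev[j] + cost).min(prev[j + 1] + 1).min(cur[j] + 1));
        }
        prev = cur;
    }
    prev[b.len()]
}

/// Split a tool name into lowercase alphanumeric tokens.
fn tokens(name: &str) -> Vec<String> {
    name.to_lowercase()
        .split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(str::to_string)
        .collect()
}

/// Try each tier in order and stop at the first one that returns hits.
fn search<'a>(query: &str, tools: &[&'a str]) -> Option<(Tier, Vec<&'a str>)> {
    let q = query.to_lowercase();

    // Tier 1 (stand-in for BM25): the query equals a whole token of the name.
    let hits: Vec<&str> = tools
        .iter()
        .copied()
        .filter(|t| tokens(t).iter().any(|tok| *tok == q))
        .collect();
    if !hits.is_empty() {
        return Some((Tier::Token, hits));
    }

    // Tier 2: the query appears anywhere inside the name.
    let hits: Vec<&str> = tools
        .iter()
        .copied()
        .filter(|t| t.to_lowercase().contains(&q))
        .collect();
    if !hits.is_empty() {
        return Some((Tier::Substring, hits));
    }

    // Tier 3: pick the tool whose closest token is within edit distance 2.
    tools
        .iter()
        .copied()
        .filter_map(|t| {
            tokens(t)
                .iter()
                .map(|tok| levenshtein(tok, &q))
                .min()
                .map(|d| (d, t))
        })
        .filter(|(d, _)| *d <= 2)
        .min_by_key(|(d, _)| *d)
        .map(|(_, t)| (Tier::Fuzzy, vec![t]))
}
```

With a registry of `github.create_issue`, `github.list_issues`, and `slack.post_message`, the misspelled query `mesage` misses the first two tiers and resolves to `slack.post_message` in the fuzzy tier, so no second query is needed.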

Automatic output processing in `call_tool_chain`¶

Every response from call_tool_chain passes through a pipeline of output reductions (all on by default, configurable via output_config):

| Reduction | What it does |
| --- | --- |
| Smart truncation | Preserves the head 60% and tail 40% of output at line boundaries when size exceeds the limit |
| Auto-chunking | JSON responses over 10 KB are recursively decomposed into path-labeled chunks (e.g., `results > items > [0-4]`) at a 4 KB target chunk size |
| Uniform array collapse | Arrays where all items share the same key structure are collapsed: the first 3 items are shown in full, and the remaining items are summarized by identity fields (`id`, `name`, `title`, `slug`, `key`, `label`) |
| Intent filtering | When the `intent` parameter is set and output exceeds 5 KB, lines are scored for relevance to the intent string and non-matching sections are suppressed |
| Response metadata | When any reduction occurs, the response includes a metadata header showing KB returned vs. KB processed and the savings ratio |
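
To make the smart-truncation row concrete, here is a minimal sketch of a 60/40 head and tail split that cuts only at line boundaries. It assumes the limit is measured in bytes and that each line costs its length plus one newline; the function name `smart_truncate` and the omission marker text are hypothetical, and the marker itself is not counted against the limit.

```rust
/// Keep whole lines from the top (60% of the budget) and from the
/// bottom (40% of the budget), replacing the middle with a marker.
fn smart_truncate(text: &str, limit: usize) -> String {
    if text.len() <= limit {
        return text.to_string();
    }
    let head_budget = limit * 60 / 100;
    let tail_budget = limit - head_budget;
    let lines: Vec<&str> = text.lines().collect();

    // Take whole lines from the top while they fit in the head budget.
    let mut head_end = 0;
    let mut used = 0;
    while head_end < lines.len() && used + lines[head_end].len() + 1 <= head_budget {
        used += lines[head_end].len() + 1;
        head_end += 1;
    }

    // Take whole lines from the bottom while they fit in the tail budget,
    // never crossing into the lines already kept in the head.
    let mut tail_start = lines.len();
    let mut used = 0;
    while tail_start > head_end && used + lines[tail_start - 1].len() + 1 <= tail_budget {
        used += lines[tail_start - 1].len() + 1;
        tail_start -= 1;
    }

    let omitted = tail_start - head_end;
    let mut out: Vec<String> = lines[..head_end].iter().map(|l| (*l).to_string()).collect();
    if omitted > 0 {
        out.push(format!("[... {} lines omitted ...]", omitted));
    }
    out.extend(lines[tail_start..].iter().map(|l| (*l).to_string()));
    out.join("\n")
}
```

Keeping both ends matters for tool output in particular: the head usually carries headers or the first results, while the tail often carries totals, errors, or exit status.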

Session stats via `gatemini://stats`¶

The gatemini://stats resource exposes per-session byte accounting:

- total bytes returned to context (after all reductions)
- total bytes processed (before reduction)
- per-tool savings breakdowns
- estimated reduction percentage

This lets you quantify how much context the output pipeline is saving in a live session without any external tooling.

What is fixed versus variable¶

Fixed:

- the number of gateway meta-tools
- the existence of brief/full response modes
- the existence of resource templates
- the output reduction pipeline (smart truncation, auto-chunking, uniform array collapse, intent filtering)

Variable:

- total backend count
- total live tool count
- schema size per tool
- discovery depth per task
- effective savings ratio (depends on backend response sizes)

Because of that, any hard-coded token figure should be treated as an example or local measurement, not as a permanent truth about the repo.

Practical measurement points¶

If you want to measure the real savings in your own config, compare:

- the bytes returned by search_tools in brief mode versus full mode
- the bytes returned by tool_info in brief mode versus full mode
- the bytes for gatemini://tools versus serializing every registry entry with full schemas
- the gatemini://stats resource before and after a representative task to see output-pipeline savings

Cache and startup interaction¶

The cache system in src/cache.rs makes startup faster by restoring state from the previous run:

- namespaced tools can be restored immediately from cache
- optional embeddings can be restored with them
- usage stats are restored into the tracker

Current details:

- cache version: 4
- default path: platform cache directory plus gatemini/cache.json
- atomic writes: temp file plus rename
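
The "temp file plus rename" pattern is worth spelling out, since it is what keeps a crash mid-write from leaving a half-written cache behind. The sketch below uses only the Rust standard library; the function name `write_atomic` and the `.json.tmp` suffix are assumptions for illustration, not taken from src/cache.rs.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::Path;

/// Write `bytes` to a sibling temp file, flush it to disk, then rename it
/// over the target so readers see either the old file or the new one.
fn write_atomic(path: &Path, bytes: &[u8]) -> io::Result<()> {
    // Hypothetical temp name: "cache.json" becomes "cache.json.tmp".
    let tmp = path.with_extension("json.tmp");
    {
        let mut file = fs::File::create(&tmp)?;
        file.write_all(bytes)?;
        // Make sure the data is on disk before the rename publishes it.
        file.sync_all()?;
    }
    // The rename stays on one filesystem because the temp file is a sibling.
    fs::rename(&tmp, path)
}
```

Placing the temp file next to the target is the important design choice: a rename across filesystems is not atomic and can fail outright.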

The old sibling-of-config cache path still exists in tests and migration helpers, but the normal runtime default is the platform cache directory.

What the code already tracks¶

CallTracker in src/tracker.rs records:

- recent tool calls
- per-tool usage counts
- per-backend latency percentiles (HDR histogram, p50/p95/p99)
- per-tool bytes returned (after reduction) and bytes processed (before reduction)
- session start time and total calls

The record_bytes(tool, returned, processed) method is called after every call_tool_chain output pass. session_stats() aggregates this into the SessionStats struct that backs gatemini://stats.
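
The byte-accounting half of that flow can be sketched as follows. Only the `record_bytes(tool, returned, processed)` call shape and the `session_stats()` name come from the description above; the types `ByteTracker`, `ByteCounts`, and `SessionSummary`, their fields, and the `u64` counters are hypothetical stand-ins, and the real CallTracker also tracks calls and latency.

```rust
use std::collections::HashMap;

/// Bytes sent to context versus bytes seen before reduction, for one tool.
#[derive(Debug, Default, Clone, Copy, PartialEq)]
struct ByteCounts {
    returned: u64,
    processed: u64,
}

/// Hypothetical stand-in for the byte-accounting part of the tracker.
#[derive(Debug, Default)]
struct ByteTracker {
    per_tool: HashMap<String, ByteCounts>,
}

/// Hypothetical summary shape for a session.
#[derive(Debug, PartialEq)]
struct SessionSummary {
    total_returned: u64,
    total_processed: u64,
    reduction_pct: f64,
}

impl ByteTracker {
    /// Accumulate one output pass for `tool`.
    fn record_bytes(&mut self, tool: &str, returned: u64, processed: u64) {
        let entry = self.per_tool.entry(tool.to_string()).or_default();
        entry.returned += returned;
        entry.processed += processed;
    }

    /// Sum all tools and derive the estimated reduction percentage.
    fn session_stats(&self) -> SessionSummary {
        let total_returned: u64 = self.per_tool.values().map(|c| c.returned).sum();
        let total_processed: u64 = self.per_tool.values().map(|c| c.processed).sum();
        // Avoid dividing by zero before any output has been processed.
        let reduction_pct = if total_processed == 0 {
            0.0
        } else {
            100.0 * (1.0 - total_returned as f64 / total_processed as f64)
        };
        SessionSummary {
            total_returned,
            total_processed,
            reduction_pct,
        }
    }
}
```

For example, recording 2,000 of 10,000 bytes for one tool and 3,000 of 10,000 for another yields 5,000 bytes returned out of 20,000 processed, a 75% estimated reduction.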