Token Efficiency
Gatemini's context savings come from shape, not magic: a small fixed gateway surface, brief discovery responses by default, full schemas only on demand, and automatic output reduction on every call_tool_chain response.
Where the savings come from
Fixed gateway surface
Every session sees the same 7 gateway tools instead of every backend tool schema.
Brief defaults
- search_tools defaults to brief=true
- tool_info defaults to detail="brief"
That means discovery usually starts with names, short descriptions, parameter names, and generated call examples instead of full JSON Schema blobs.
On-demand schema loading
Agents only pull a full schema for tools they are likely to call.
Resources as compact indexes
gatemini://tools provides a compressed inventory view without the cost of loading every full schema.
Three-tier search reducing wasted queries
search_tools uses a three-tier fallback (BM25, then trigram substring matching, then fuzzy Levenshtein correction), so typos and partial terms still return the right tool without a follow-up query. The response also includes try_also, a set of IDF-scored distinctive terms for narrowing follow-up searches.
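The fallback order can be sketched in a few lines of Python. This is an illustrative model only, not Gatemini's Rust implementation: the tool catalog, the BM25 parameters, the edit-distance cutoff of 2, and every function name here are assumptions made for the example. Each tier runs only when the previous one returns nothing.

```python
import math
import re
from collections import Counter

# Hypothetical catalog: tool name -> short description.
TOOLS = {
    "github.create_issue": "create a new issue in a github repository",
    "github.list_pull_requests": "list open pull requests for a repository",
    "slack.post_message": "post a message to a slack channel",
}

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

DOCS = {name: tokenize(name + " " + desc) for name, desc in TOOLS.items()}
VOCAB = {tok for toks in DOCS.values() for tok in toks}
AVG_LEN = sum(len(toks) for toks in DOCS.values()) / len(DOCS)

def bm25(query_tokens, k1=1.2, b=0.75):
    """Tier 1: rank tools by BM25 over name + description tokens."""
    scores = {}
    for name, toks in DOCS.items():
        tf = Counter(toks)
        score = 0.0
        for q in query_tokens:
            if tf[q] == 0:
                continue
            df = sum(1 for d in DOCS.values() if q in d)
            idf = math.log(1 + (len(DOCS) - df + 0.5) / (df + 0.5))
            norm = tf[q] + k1 * (1 - b + b * len(toks) / AVG_LEN)
            score += idf * tf[q] * (k1 + 1) / norm
        if score > 0:
            scores[name] = score
    return sorted(scores, key=scores.get, reverse=True)

def trigrams(text):
    return {text[i:i + 3] for i in range(len(text) - 2)}

def trigram_substring(query):
    """Tier 2: keep tools whose text contains every trigram of the query."""
    wanted = trigrams(query.lower())
    if not wanted:
        return []
    return [name for name, desc in TOOLS.items()
            if wanted <= trigrams((name + " " + desc).lower())]

def levenshtein(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def fuzzy_correct(query_tokens, max_dist=2):
    """Tier 3: snap each token to its nearest vocabulary term, if close enough."""
    corrected = []
    for q in query_tokens:
        best = min(VOCAB, key=lambda v: (levenshtein(q, v), v))
        corrected.append(best if levenshtein(q, best) <= max_dist else q)
    return corrected

def search_tools(query):
    """Return (tier_used, ranked tool names), trying each tier in order."""
    tokens = tokenize(query)
    hits = bm25(tokens)
    if hits:
        return "bm25", hits
    hits = trigram_substring(query)
    if hits:
        return "trigram", hits
    hits = bm25(fuzzy_correct(tokens))
    return ("fuzzy", hits) if hits else ("none", [])
```

In this toy catalog, the exact term "issue" is answered by the first tier, the partial term "messag" by the second, and the typo "isue" by the third, each in a single call.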
Automatic output processing in call_tool_chain
Every response from call_tool_chain passes through a pipeline of output reductions (all on by default, configurable via output_config):
| Reduction | What it does |
|---|---|
| Smart truncation | Preserves head 60% and tail 40% of output at line boundaries when size exceeds the limit |
| Auto-chunking | JSON responses over 10 KB are recursively decomposed into path-labeled chunks (e.g., results > items > [0-4]) at a 4 KB target chunk size |
| Uniform array collapse | Arrays where all items share the same key structure are collapsed: first 3 items shown in full, remaining items summarized by identity fields (id, name, title, slug, key, label) |
| Intent filtering | When the intent parameter is set and output exceeds 5 KB, lines are scored for relevance to the intent string and non-matching sections are suppressed |
| Response metadata | When any reduction occurs, the response includes a metadata header showing KB returned vs. KB processed and the savings ratio |
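
Two of the reductions in the table are simple enough to model directly. The Python sketch below is a rough approximation under stated assumptions, not the actual pipeline: it reads "head 60% and tail 40%" as a split of the size limit, counts characters rather than bytes, and invents both the omission marker and the `_collapsed` summary shape for illustration.

```python
IDENTITY_FIELDS = ("id", "name", "title", "slug", "key", "label")

def smart_truncate(text, limit):
    """Keep whole lines: about 60% of the limit from the head, 40% from the tail."""
    if len(text) <= limit:
        return text
    lines = text.splitlines(keepends=True)
    head_budget = int(limit * 0.6)
    tail_budget = limit - head_budget

    head, used = [], 0
    for line in lines:
        if used + len(line) > head_budget:
            break
        head.append(line)
        used += len(line)

    tail, used = [], 0
    for line in reversed(lines[len(head):]):  # never re-use head lines
        if used + len(line) > tail_budget:
            break
        tail.append(line)
        used += len(line)
    tail.reverse()

    omitted = len(lines) - len(head) - len(tail)
    # Hypothetical marker format; it is added on top of the limit.
    return "".join(head) + f"[... {omitted} lines omitted ...]\n" + "".join(tail)

def collapse_uniform_array(items, keep=3):
    """Show the first `keep` items in full, then identity fields only."""
    if len(items) <= keep or not all(isinstance(i, dict) for i in items):
        return items
    keys = set(items[0])
    if any(set(i) != keys for i in items):
        return items  # not uniform: leave untouched
    summary = [{k: i[k] for k in IDENTITY_FIELDS if k in i} for i in items[keep:]]
    # "_collapsed" is a made-up wrapper for this sketch.
    return items[:keep] + [{"_collapsed": len(summary), "items": summary}]
```

For a five-item array of objects with id, name, and body keys, the collapsed result keeps items 0 through 2 unchanged and replaces the last two with their id and name only.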
Session stats via gatemini://stats
The gatemini://stats resource exposes per-session byte accounting:
- total bytes returned to context (after all reductions)
- total bytes processed (before reduction)
- per-tool savings breakdowns
- estimated reduction percentage
This lets you quantify how much context the output pipeline is saving in a live session without any external tooling.
What is fixed versus variable
Fixed:
- the number of gateway meta-tools
- the existence of brief/full response modes
- the existence of resource templates
- the output reduction pipeline (smart truncation, auto-chunking, uniform array collapse, intent filtering)
Variable:
- total backend count
- total live tool count
- schema size per tool
- discovery depth per task
- effective savings ratio (depends on backend response sizes)
Because of that, any hard-coded token figure should be treated as an example or local measurement, not as a permanent truth about the repo.
Practical measurement points
If you want to measure the real savings in your own config, compare:
- the bytes returned by search_tools in brief mode versus full mode
- the bytes returned by tool_info in brief mode versus full mode
- the bytes for gatemini://tools versus serializing every registry entry with full schemas
- the gatemini://stats resource before and after a representative task to see output-pipeline savings
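
A comparison like the ones listed above comes down to serializing both forms and counting bytes. The Python sketch below uses a made-up registry entry and a made-up brief shape (name, description, parameter names); substitute responses captured from your own session to get real numbers. None of the field names are taken from Gatemini's actual output.

```python
import json

# Hypothetical full registry entry, standing in for a captured response.
full_entry = {
    "name": "github.create_issue",
    "description": "Create a new issue in a GitHub repository.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "repo": {"type": "string", "description": "owner/name of the repository"},
            "title": {"type": "string", "description": "Issue title"},
            "body": {"type": "string", "description": "Issue body in Markdown"},
        },
        "required": ["repo", "title"],
    },
}

def to_brief(entry):
    """Assumed brief shape: name, description, and parameter names only."""
    return {
        "name": entry["name"],
        "description": entry["description"],
        "params": list(entry["inputSchema"]["properties"]),
    }

def byte_len(value):
    """Size of the compact JSON encoding in UTF-8 bytes."""
    return len(json.dumps(value, separators=(",", ":")).encode("utf-8"))

def compare(brief, full):
    """Report both sizes and the percentage saved by the brief form."""
    b, f = byte_len(brief), byte_len(full)
    saved = 0.0 if f == 0 else 100.0 * (1 - b / f)
    return {"brief_bytes": b, "full_bytes": f, "saved_pct": saved}

report = compare(to_brief(full_entry), full_entry)
```

The resulting percentage describes only this one entry, which is why such figures are better treated as local measurements than as properties of the gateway.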
Cache and startup interaction
The cache system in src/cache.rs improves startup ergonomics:
- namespaced tools can be restored immediately from cache
- optional embeddings can be restored with them
- usage stats are restored into the tracker
Current details:
- cache version: 4
- default path: platform cache directory plus gatemini/cache.json
- atomic writes: temp file plus rename
The old sibling-of-config cache path still exists in tests and migration helpers, but the normal runtime default is the platform cache directory.
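
The temp-file-plus-rename pattern is worth seeing concretely, since it is what keeps a crash mid-write from leaving a half-written cache behind. The Python sketch below shows the general technique, not the code in src/cache.rs; the function names, the payload layout, and the rule that a version mismatch discards the cache are all assumptions for illustration.

```python
import json
import os
import tempfile

def write_cache_atomically(path, payload):
    """Write to a temp file in the target directory, then rename over the target."""
    directory = os.path.dirname(os.path.abspath(path))
    os.makedirs(directory, exist_ok=True)
    # Same directory as the target, so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".cache-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(payload, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_path, path)  # readers see the old file or the new one
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise

def load_cache(path, expected_version=4):
    """Return the cached payload, or None if missing, corrupt, or stale.

    Treating a version mismatch as a miss is an assumption of this sketch.
    """
    try:
        with open(path, encoding="utf-8") as fh:
            data = json.load(fh)
    except (OSError, json.JSONDecodeError):
        return None
    if not isinstance(data, dict) or data.get("version") != expected_version:
        return None
    return data
```

Because the rename replaces the target in one step, a reader that opens the cache during a write gets either the previous complete file or the new complete file.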
What the code already tracks
CallTracker in src/tracker.rs records:
- recent tool calls
- per-tool usage counts
- per-backend latency percentiles (HDR histogram, p50/p95/p99)
- per-tool bytes returned (after reduction) and bytes processed (before reduction)
- session start time and total calls
The record_bytes(tool, returned, processed) method is called after every call_tool_chain output pass. session_stats() aggregates this into the SessionStats struct that backs gatemini://stats.
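
The byte accounting described above can be modeled in Python as follows. Only the names CallTracker, record_bytes, session_stats, and SessionStats come from the text; the field names, the ToolBytes helper, and the percentage formula are assumptions, and the real Rust structs in src/tracker.rs may differ.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class ToolBytes:
    returned: int = 0   # bytes sent to context, after reduction
    processed: int = 0  # bytes seen by the pipeline, before reduction

@dataclass
class SessionStats:
    total_returned: int
    total_processed: int
    reduction_pct: float
    per_tool: dict

class CallTracker:
    """Byte-accounting subset only; latency and call history are omitted."""

    def __init__(self):
        self._bytes = defaultdict(ToolBytes)

    def record_bytes(self, tool, returned, processed):
        entry = self._bytes[tool]
        entry.returned += returned
        entry.processed += processed

    def session_stats(self):
        total_returned = sum(e.returned for e in self._bytes.values())
        total_processed = sum(e.processed for e in self._bytes.values())
        pct = 0.0
        if total_processed > 0:
            pct = 100.0 * (1 - total_returned / total_processed)
        per_tool = {t: (e.returned, e.processed) for t, e in self._bytes.items()}
        return SessionStats(total_returned, total_processed, pct, per_tool)
```

Recording two calls of 1,000 returned out of 4,000 processed bytes for one tool, plus one unreduced 500-byte call for another, gives totals of 2,500 and 8,500 bytes and an estimated reduction of about 70.6%.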