# Token Savings Benchmarks
This repository does not ship one canonical benchmark fixture for registry size, backend count, or schema mix. Treat this page as a measurement guide and a place to record local snapshots, not as a permanent source-of-truth count.
## What to benchmark
The most meaningful comparisons are:
- `search_tools` brief versus full
- `tool_info` brief versus full
- `gatemini://tools` versus serializing all full schemas
- fixed gateway-tool overhead versus direct exposure of every backend tool
- `call_tool_chain` output before versus after the reduction pipeline (visible via `gatemini://stats`)
## Suggested methodology

### Discovery payload size
For a chosen task:
1. call `search_tools` with `brief=true`
2. call `search_tools` with `brief=false`
3. compare response bytes or tokenized output
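The comparison above can be scripted. A minimal sketch, assuming hypothetical response shapes for `search_tools` (the real payloads will differ) and using the rough ~4 bytes-per-token heuristic; substitute your model's actual tokenizer for exact counts:

```python
import json

def payload_cost(payload) -> tuple[int, int]:
    """Serialize a response payload and return (bytes, approx tokens).

    The token estimate is the common ~4 bytes/token heuristic, not a
    real tokenization.
    """
    raw = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    return len(raw), len(raw) // 4

# Hypothetical search_tools responses -- shapes are illustrative only.
brief = [{"name": "repo.search", "summary": "Search code"}]
full = [{"name": "repo.search", "summary": "Search code",
         "inputSchema": {"type": "object", "properties": {
             "query": {"type": "string", "description": "Search query"},
             "limit": {"type": "integer", "default": 20}}}}]

b_bytes, b_tokens = payload_cost(brief)
f_bytes, f_tokens = payload_cost(full)
print(f"brief: {b_bytes} B (~{b_tokens} tok), full: {f_bytes} B (~{f_tokens} tok)")
print(f"savings: {100 * (1 - b_bytes / f_bytes):.1f}%")
```

Measuring serialized bytes keeps the method tokenizer-agnostic; the percentage saved is usually close in bytes and in tokens for JSON payloads.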
### Tool inspection payload size
For a chosen tool:
1. call `tool_info(detail="brief")`
2. call `tool_info(detail="full")`
3. compare response bytes or tokenized output
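To see where the brief-versus-full gap comes from, it can help to break a full payload down by top-level field. The `tool_info` result shape below is an assumption for illustration, not the actual API:

```python
import json

def field_costs(payload: dict) -> dict[str, int]:
    """Size in bytes of each top-level field of a tool_info payload.

    Usually shows that the input schema dominates the brief/full gap.
    """
    return {k: len(json.dumps(v).encode("utf-8")) for k, v in payload.items()}

full_info = {  # hypothetical tool_info(detail="full") result
    "name": "db.query",
    "description": "Run a read-only SQL query against the analytics DB.",
    "inputSchema": {"type": "object", "properties": {
        "sql": {"type": "string"}, "timeout_s": {"type": "number"}}},
}
for field, size in sorted(field_costs(full_info).items(), key=lambda kv: -kv[1]):
    print(f"{field:12s} {size:5d} B")
```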
### Session-level overhead
Record:
- fixed gateway tool definitions
- server instruction block
- number of discovery steps needed before the first execution
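The first two items are a straightforward sum of serialized sizes. A sketch, where the gateway tool list and instruction text are placeholders rather than Gatemini's actual definitions:

```python
import json

def session_overhead(tool_defs: list[dict], instructions: str) -> int:
    """Bytes a client pays once per session before any task work:
    fixed tool definitions plus the server instruction block."""
    defs = sum(len(json.dumps(d).encode("utf-8")) for d in tool_defs)
    return defs + len(instructions.encode("utf-8"))

# Placeholder definitions; real ones carry full input schemas.
gateway_tools = [{"name": "search_tools"}, {"name": "tool_info"},
                 {"name": "call_tool_chain"}]
print(session_overhead(gateway_tools, "Use search_tools before calling tools."))
```

The same function applied to every backend tool's full definition gives the naive-exposure baseline to compare against.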
### Output pipeline savings
After running a representative task:
1. read `gatemini://stats`
2. record `bytes_returned`, `bytes_processed`, and the reduction percentage
3. note which tools generated the largest reductions (JSON auto-chunking and uniform array collapse produce the highest ratios for data-heavy backends)
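If you want to cross-check the reported percentage, the arithmetic is simple. This sketch assumes `bytes_processed` counts the raw backend output and `bytes_returned` counts what survived the pipeline:

```python
def reduction_pct(bytes_returned: int, bytes_processed: int) -> float:
    """Percentage of raw backend output that never reached the model.

    Assumed semantics: bytes_processed = raw backend total,
    bytes_returned = what the pipeline passed through.
    """
    if bytes_processed == 0:
        return 0.0
    return 100.0 * (1 - bytes_returned / bytes_processed)

# e.g. 18 KB returned out of 240 KB processed
print(f"{reduction_pct(18_000, 240_000):.1f}%")
```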
## Example result template
```text
Registry snapshot date:
Backends loaded:
Tools loaded:

search_tools brief:
search_tools full:
tool_info brief:
tool_info full:
gatemini://tools:
all full schemas:

Session stats (gatemini://stats):
  bytes_returned:
  bytes_processed:
  reduction_pct:
  top reducing tools:
```
## Interpretation
Expect Gatemini's relative advantage to grow as:
- backend count grows
- tool count grows
- schema size grows
- backend responses contain large JSON payloads (auto-chunking kicks in above 10 KB)
- backends return uniform-structure arrays (collapse is most aggressive for repeating-object arrays)
The gateway surface stays fixed while the naive "expose everything" surface expands with every backend you add. The output reduction pipeline provides an additional layer of savings orthogonal to discovery efficiency.
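That contrast can be made concrete with a back-of-envelope model; every number below is invented for illustration:

```python
def naive_surface(backends: int, tools_per_backend: int, schema_bytes: int) -> int:
    """Context cost when every backend tool schema is exposed directly:
    grows linearly with backend and tool count."""
    return backends * tools_per_backend * schema_bytes

# Placeholder: fixed gateway tool definitions plus instruction block.
GATEWAY_FIXED = 4_000

for n in (1, 5, 20):
    naive = naive_surface(n, tools_per_backend=12, schema_bytes=900)
    print(f"{n:2d} backends: naive {naive:7d} B vs gateway {GATEWAY_FIXED} B fixed")
```

Even with these made-up figures, the naive surface overtakes the fixed gateway cost after the first backend, and the gap widens from there.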