Token Savings Benchmarks

This repository does not ship one canonical benchmark fixture for registry size, backend count, or schema mix. Treat this page as a measurement guide and a place to record local snapshots, not as a permanent source-of-truth count.

What to benchmark

The most meaningful comparisons are:

search_tools brief versus full
tool_info brief versus full
gatemini://tools versus serializing all full schemas
fixed gateway-tool overhead versus direct exposure of every backend tool
call_tool_chain output before versus after the reduction pipeline (visible via gatemini://stats)

Suggested methodology

Discovery payload size

For a chosen task:

call search_tools with brief=true
call search_tools with brief=false
compare the two responses by byte size or by token count
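
The comparison step above can be sketched in Python as follows. This is a minimal illustration, not part of Gatemini: the helper names (`payload_bytes`, `approx_tokens`, `compare`) are hypothetical, and the four-characters-per-token ratio is only a rough assumption. Swap in your model's real tokenizer when you need accurate token counts. The two inputs are the raw response texts you captured from the `brief=true` and `brief=false` calls.

```python
def payload_bytes(text: str) -> int:
    """Size of a response when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def approx_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate (assumption: about 4 characters per token)."""
    return round(len(text) / chars_per_token)


def savings_pct(brief: int, full: int) -> float:
    """Percentage saved by the brief form relative to the full form."""
    if full <= 0:
        raise ValueError("full size must be positive")
    return 100.0 * (1 - brief / full)


def compare(brief_text: str, full_text: str) -> dict:
    """Summarize byte and estimated-token sizes for two captured responses."""
    brief_bytes, full_bytes = payload_bytes(brief_text), payload_bytes(full_text)
    brief_tokens, full_tokens = approx_tokens(brief_text), approx_tokens(full_text)
    return {
        "brief_bytes": brief_bytes,
        "full_bytes": full_bytes,
        "byte_savings_pct": savings_pct(brief_bytes, full_bytes),
        "brief_tokens": brief_tokens,
        "full_tokens": full_tokens,
        "token_savings_pct": savings_pct(brief_tokens, full_tokens),
    }
```

Run both calls against the same query and the same registry state, otherwise the two sizes are not comparable.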

Tool inspection payload size

For a chosen tool:

call tool_info(detail="brief")
call tool_info(detail="full")
compare the two responses by byte size or by token count
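
When the captured responses are JSON, whitespace and key order can skew a byte comparison. The sketch below re-serializes each response compactly before measuring it. It assumes you have already parsed both `tool_info` responses into Python objects; `serialized_size` and `size_ratio` are hypothetical helper names, and the two sample objects are placeholders rather than real Gatemini output.

```python
import json


def serialized_size(obj) -> int:
    """UTF-8 byte size of obj as compact JSON with stable key order."""
    compact = json.dumps(
        obj, separators=(",", ":"), ensure_ascii=False, sort_keys=True
    )
    return len(compact.encode("utf-8"))


def size_ratio(brief_obj, full_obj) -> float:
    """How many times larger the full form is than the brief form."""
    return serialized_size(full_obj) / serialized_size(brief_obj)


# Placeholder payloads for illustration only (not real tool_info output).
brief_example = {"name": "example_tool", "description": "Placeholder summary"}
full_example = {
    **brief_example,
    "inputSchema": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}

if __name__ == "__main__":
    print(f"full/brief size ratio: {size_ratio(brief_example, full_example):.2f}")
```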

Session-level overhead

Record:

fixed gateway tool definitions
server instruction block
number of discovery steps needed before the first execution
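
One way to keep these three measurements together is a small record type, sketched below. The class and field names are hypothetical, and the per-step token counts are assumed to come from your own measurements of each discovery call made before the first execution.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SessionOverhead:
    """Token costs paid before the first tool execution (all values measured locally)."""

    tool_definition_tokens: int  # fixed gateway tool definitions
    instruction_tokens: int      # server instruction block
    discovery_step_tokens: List[int] = field(default_factory=list)  # one entry per discovery call

    @property
    def fixed_tokens(self) -> int:
        """Cost that is paid once per session regardless of the task."""
        return self.tool_definition_tokens + self.instruction_tokens

    @property
    def discovery_steps(self) -> int:
        """Number of discovery calls needed before the first execution."""
        return len(self.discovery_step_tokens)

    @property
    def total_before_first_execution(self) -> int:
        """Fixed cost plus the cost of every discovery step."""
        return self.fixed_tokens + sum(self.discovery_step_tokens)
```

Separating the fixed cost from the per-step cost makes it easier to see whether a change affected the gateway surface itself or only the discovery path for a given task.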

Output pipeline savings

After running a representative task:

read gatemini://stats
record bytes_returned, bytes_processed, and the reduction percentage
note which tools generated the largest reductions (JSON auto-chunking and uniform array collapse produce the highest ratios for data-heavy backends)
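
If you want to cross-check the reported percentage or rank tools yourself, the arithmetic can be sketched as below. Two assumptions are made here: that the reduction percentage is defined as the share of processed bytes that was not returned, and that you have collected a per-tool breakdown yourself. Neither the formula nor the per-tool mapping is confirmed by this page, and both helper names are hypothetical.

```python
from typing import Dict, List, Tuple


def reduction_pct(bytes_processed: int, bytes_returned: int) -> float:
    """Assumed definition: share of processed bytes that was not returned."""
    if bytes_processed <= 0:
        return 0.0  # nothing processed yet, so nothing was reduced
    return 100.0 * (bytes_processed - bytes_returned) / bytes_processed


def top_reducers(per_tool: Dict[str, Tuple[int, int]], n: int = 3) -> List[str]:
    """Rank tools by absolute bytes saved.

    per_tool maps a tool name to (bytes_processed, bytes_returned),
    a breakdown you are assumed to have recorded yourself.
    """
    ranked = sorted(
        per_tool.items(),
        key=lambda item: item[1][0] - item[1][1],
        reverse=True,
    )
    return [name for name, _ in ranked[:n]]
```

Ranking by absolute bytes saved rather than by ratio keeps a tiny response with a high ratio from outranking a large response that dominates the session.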

Example result template

Registry snapshot date:
Backends loaded:
Tools loaded:

search_tools brief:
search_tools full:
tool_info brief:
tool_info full:
gatemini://tools:
all full schemas:

Session stats (gatemini://stats):
  bytes_returned:
  bytes_processed:
  reduction_pct:
  top reducing tools:

Interpretation

You should expect the relative advantage of Gatemini to grow as:

backend count grows
tool count grows
schema size grows
backend responses contain large JSON payloads (auto-chunking kicks in above 10 KB)
backends return uniform-structure arrays (collapse is most aggressive for repeating-object arrays)

The gateway surface stays fixed while the naive "expose everything" surface expands with every backend you add. The output reduction pipeline provides an additional layer of savings orthogonal to discovery efficiency.
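
The fixed-versus-growing relationship can be expressed as a simple cost model. The sketch below is a simplification under stated assumptions: every tool schema is treated as costing the same average number of tokens, and the gateway cost is the fixed surface plus the discovery cost for one task. All function names are hypothetical, and any numbers you feed in should come from your own measurements.

```python
from typing import Sequence


def naive_surface(tools_per_backend: Sequence[int], avg_schema_tokens: int) -> int:
    """Tokens needed to expose every backend tool directly (grows with each backend)."""
    return sum(tools_per_backend) * avg_schema_tokens


def gateway_surface(fixed_tokens: int, discovery_tokens: int) -> int:
    """Tokens for the fixed gateway surface plus discovery for one task."""
    return fixed_tokens + discovery_tokens


def break_even_tools(fixed_tokens: int, discovery_tokens: int, avg_schema_tokens: int) -> int:
    """Smallest total tool count at which direct exposure costs more than the gateway."""
    if avg_schema_tokens <= 0:
        raise ValueError("avg_schema_tokens must be positive")
    return gateway_surface(fixed_tokens, discovery_tokens) // avg_schema_tokens + 1
```

Below the break-even tool count, direct exposure is the cheaper option under this model; above it, the gap widens linearly with every tool added.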