Backend Management¶
The backend subsystem turns a config file into running MCP transports, supervises their restart behavior, and maintains a unified backend registry.
Public backend states¶
The public enum in src/backend/mod.rs is intentionally small:
- Starting
- Healthy
- Unhealthy
- Stopped
There are no public states such as "Degraded", "Restarting", or "Circuit Open". Circuit-breaker timing exists internally in the health checker and is reflected through Unhealthy and Stopped.
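A minimal sketch of how these four states might be modeled (illustrative only; the actual enum in src/backend/mod.rs may carry extra data, and accepts_calls is a hypothetical helper, not the real API):

```rust
/// Hypothetical sketch of the public backend state enum described above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BackendState {
    Starting,
    Healthy,
    Unhealthy,
    Stopped,
}

impl BackendState {
    /// Only Healthy backends accept calls directly; Starting backends
    /// may be waited on briefly (see the retry section below).
    fn accepts_calls(self) -> bool {
        matches!(self, BackendState::Healthy)
    }
}

fn main() {
    assert!(BackendState::Healthy.accepts_calls());
    assert!(!BackendState::Stopped.accepts_calls());
    println!("ok");
}
```

Keeping the public surface to four states means internal conditions like circuit-open windows never leak into the API; they are reported through Unhealthy and Stopped instead.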
BackendManager¶
BackendManager owns:
- the live backend map
- the backend config map
- per-backend semaphores
- per-backend retry configs
- rate limiters
- dynamic backend tracking
- managed prerequisite PIDs
- in-flight call draining
- optional call tracking hooks
The manager is transport-agnostic; it delegates concrete behavior to Backend implementations.
Supported transports¶
stdio¶
Child process backends communicate over stdin/stdout using rmcp.
Key details from src/backend/stdio.rs:
- stdin and stdout are piped
- stderr is piped to a 200-line ring buffer (exposed via gatemini://backend/{name} and gatemini://health)
- Unix builds place the child in a new process group
- a reaper task watches for unexpected exit and marks the backend stopped
That process-group isolation is what lets shutdown send SIGTERM to the whole backend tree instead of only the parent process.
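The new-process-group setup can be sketched with std alone (a Unix-only sketch; the real spawn and reaper logic in src/backend/stdio.rs is more involved, and spawn_in_own_group is an illustrative name):

```rust
use std::io;
use std::os::unix::process::CommandExt;
use std::process::{Child, Command, Stdio};

/// Spawn a stdio backend as the leader of its own process group.
/// Sketch only; the real code also wires the pipes into rmcp.
fn spawn_in_own_group(program: &str) -> io::Result<Child> {
    Command::new(program)
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .stderr(Stdio::piped())
        // process_group(0) makes the child's pid its own process group id,
        // so shutdown can later signal the entire backend tree at once
        // instead of only the direct child.
        .process_group(0)
        .spawn()
}

fn main() -> io::Result<()> {
    let mut child = spawn_in_own_group("true")?;
    assert!(child.wait()?.success());
    println!("ok");
    Ok(())
}
```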
streamable-http¶
Remote HTTP backends are implemented in src/backend/http.rs.
Use this transport in config:
```yaml
backends:
  github:
    transport: streamable-http
    url: "https://api.githubcopilot.com/mcp/"
    headers:
      Authorization: "Bearer ${GITHUB_PAT_TOKEN}"
```
The lenient client wrapper exists to tolerate some imperfect servers that omit expected response headers.
cli-adapter¶
CLI adapter backends let you publish tools without writing a separate MCP server.
You can either define tools inline:
```yaml
backends:
  jq-tools:
    transport: cli-adapter
    timeout: 30s
    tools:
      filter:
        description: "Apply a jq filter to JSON input"
        input_schema:
          type: object
          properties:
            filter: { type: string }
            input: { type: string }
          required: [filter, input]
        command: "jq '{{filter}}'"
        stdin: "{{input}}"
        output: json
```
Or point to an external adapter file:
```yaml
backends:
  ffmpeg-tools:
    transport: cli-adapter
    adapter_file: ~/.config/gatemini/adapters/ffmpeg.yaml
```
The adapter file path supports ~ expansion in the CLI adapter loader.
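Tilde expansion can be done with nothing but std (a sketch under the assumption that only the leading "~/" form matters and $HOME is set; the real loader's handling may differ, and expand_tilde is an illustrative name):

```rust
use std::env;
use std::path::PathBuf;

/// Expand a leading "~/" using $HOME. Other paths pass through
/// unchanged. Sketch only; ~user forms are not handled here.
fn expand_tilde(path: &str) -> PathBuf {
    if let Some(rest) = path.strip_prefix("~/") {
        if let Ok(home) = env::var("HOME") {
            return PathBuf::from(home).join(rest);
        }
    }
    PathBuf::from(path)
}

fn main() {
    // Absolute paths are untouched.
    assert_eq!(expand_tilde("/etc/hosts"), PathBuf::from("/etc/hosts"));
    // A leading tilde resolves against $HOME when it is set.
    if let Ok(home) = env::var("HOME") {
        assert_eq!(
            expand_tilde("~/adapters/ffmpeg.yaml"),
            PathBuf::from(home).join("adapters/ffmpeg.yaml")
        );
    }
    println!("ok");
}
```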
Dedicated instance mode¶
By default, all proxy sessions share a single backend instance (instance_mode: shared). For stateful backends like sequential-thinking that maintain per-session state, this causes state bleed across sessions.
Setting instance_mode: dedicated gives each proxy session its own isolated backend instance from an autoscaling pool:
```yaml
backends:
  sequential-thinking:
    command: mcp-server-sequential-thinking
    timeout: 120s
    instance_mode: dedicated
    pool:
      min_idle: 1
      max_instances: 10
      acquire_timeout: 30s
```
Pool behavior:
- pre-warms min_idle instances at startup (default: 1)
- lazily spawns new instances on demand up to max_instances (default: 20)
- on session disconnect, the assigned instance is stopped and a fresh one is spawned to maintain the idle pool
- if all instances are busy, new sessions wait up to acquire_timeout (default: 30s) before failing
Only stdio and cli-adapter transports support dedicated mode. HTTP backends ignore the setting.
The pool implementation lives in src/backend/pool.rs. The health checker calls restart_pool_primary() instead of restart_backend() for dedicated backends.
| Setting | Default |
|---|---|
| pool.min_idle | 1 |
| pool.max_instances | 20 |
| pool.acquire_timeout | 30s |
Concurrency, retries, and fallback¶
Per-backend limits come from config:
- max_concurrent_calls
- semaphore_timeout
- retry
- rate_limit
- fallback_chain
Retry behavior only applies to the Starting state, where the manager waits briefly for a backend that is still connecting. Calls to Unhealthy or Stopped backends fail immediately unless the manager routes into a fallback backend for a transient error.
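That routing rule reduces to: wait (with a deadline) only while the backend is Starting, and fail fast otherwise. A sketch of the idea with illustrative names (wait_for_ready is not the real API):

```rust
use std::time::{Duration, Instant};

#[derive(Clone, Copy, PartialEq, Debug)]
enum State { Starting, Healthy, Unhealthy, Stopped }

/// Poll `state_of` until the backend is Healthy or the deadline passes.
/// Only Starting is worth waiting on; Unhealthy and Stopped fail at once.
fn wait_for_ready(
    mut state_of: impl FnMut() -> State,
    deadline: Duration,
) -> Result<(), &'static str> {
    let start = Instant::now();
    loop {
        match state_of() {
            State::Healthy => return Ok(()),
            State::Unhealthy | State::Stopped => return Err("backend unavailable"),
            State::Starting if start.elapsed() >= deadline => return Err("startup timeout"),
            State::Starting => std::thread::sleep(Duration::from_millis(10)),
        }
    }
}

fn main() {
    // A backend that becomes Healthy on the third poll succeeds.
    let mut polls = 0;
    let ok = wait_for_ready(
        || { polls += 1; if polls >= 3 { State::Healthy } else { State::Starting } },
        Duration::from_secs(1),
    );
    assert!(ok.is_ok());
    // A Stopped backend fails immediately, with no waiting.
    assert_eq!(wait_for_ready(|| State::Stopped, Duration::from_secs(1)),
               Err("backend unavailable"));
    println!("ok");
}
```

Fallback routing would then sit above this: if the error is transient and a fallback_chain is configured, the manager retries the call against the next backend in the chain.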
Health checker¶
The health loop in src/backend/health.rs runs in three phases:
- ping healthy backends
- handle unhealthy and stopped backends
- retry pending configured backends that never became live
Current defaults from src/config.rs:
| Setting | Default |
|---|---|
| health.interval | 30s |
| health.timeout | 5s |
| health.failure_threshold | 3 |
| health.max_restarts | 5 |
| health.restart_window | 60s |
| health.restart_initial_backoff | 1s |
| health.restart_max_backoff | 30s |
| health.restart_timeout | 30s |
| health.recovery_multiplier | 3 |
| health.drain_timeout | 10s |
| health.memory_check_interval | 30s |
| health.memory_restart_cooldown | 60s |
Internal circuit-breaker behavior:
- healthy backends are pinged
- failures increment consecutive_failures
- once the threshold is reached, the backend is marked Unhealthy
- the health checker records circuit_open_since
- after interval * recovery_multiplier, a half-open probe is attempted
- if the probe fails, restart logic or another recovery window applies
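The half-open timing above is a simple deadline check. With the defaults (interval 30s, recovery_multiplier 3) a probe becomes due 90s after the circuit opens; a sketch with an illustrative function name:

```rust
use std::time::{Duration, Instant};

/// Is a half-open probe due? True once interval * recovery_multiplier
/// has elapsed since the circuit opened. Sketch of the timing rule only.
fn probe_due(
    circuit_open_since: Instant,
    interval: Duration,
    recovery_multiplier: u32,
    now: Instant,
) -> bool {
    now.duration_since(circuit_open_since) >= interval * recovery_multiplier
}

fn main() {
    let opened = Instant::now();
    let interval = Duration::from_secs(30); // health.interval default
    // 60s in: still inside the recovery window, no probe yet.
    assert!(!probe_due(opened, interval, 3, opened + Duration::from_secs(60)));
    // 90s in: the window has elapsed, a half-open probe is attempted.
    assert!(probe_due(opened, interval, 3, opened + Duration::from_secs(90)));
    println!("ok");
}
```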
Prerequisites¶
Some backends depend on another process already running. That is handled by src/backend/prerequisite.rs.
Features:
- optional pgrep -f dedup via process_match
- optional managed lifecycle on shutdown
- startup delay before backend connect
If managed: true, Gatemini records the spawned prerequisite PID and terminates the process group during shutdown.
Process supervision¶
Backend child processes are supervised with configurable shutdown behavior and memory monitoring.
Graceful shutdown¶
When a stdio backend is stopped, Gatemini sends SIGTERM to the process group (or taskkill /T on Windows), then polls try_wait() every 100ms for up to shutdown_grace_period (default 5s). If the child hasn't exited by the deadline, SIGKILL is sent (or taskkill /F /T on Windows).
Prerequisite processes follow the same pattern with a fixed 5s grace period.
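The poll-then-kill loop can be sketched with std alone. One simplification to note: std's Child::kill delivers SIGKILL to the child only, whereas Gatemini first sends SIGTERM to the whole process group; shutdown_with_grace is an illustrative name:

```rust
use std::process::{Child, Command};
use std::time::{Duration, Instant};

/// Poll try_wait() every 100ms until `grace` elapses; if the child is
/// still alive at the deadline, force-kill and reap it.
/// Returns true if the child exited within the grace period.
fn shutdown_with_grace(child: &mut Child, grace: Duration) -> std::io::Result<bool> {
    // The real code sends SIGTERM to the process group here first
    // (taskkill /T on Windows), then starts polling.
    let deadline = Instant::now() + grace;
    while Instant::now() < deadline {
        if child.try_wait()?.is_some() {
            return Ok(true); // exited gracefully
        }
        std::thread::sleep(Duration::from_millis(100));
    }
    child.kill()?; // grace expired: SIGKILL (taskkill /F /T on Windows)
    child.wait()?;
    Ok(false)
}

fn main() -> std::io::Result<()> {
    // A child that exits on its own is reaped well within the grace period.
    let mut quick = Command::new("sh").arg("-c").arg("exit 0").spawn()?;
    assert!(shutdown_with_grace(&mut quick, Duration::from_secs(5))?);
    println!("ok");
    Ok(())
}
```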
Stderr capture¶
Backend stderr is piped to a 200-line ring buffer per backend. Recent lines are exposed in gatemini://backend/{name} and gatemini://health. When a backend exits unexpectedly, the last stderr lines are logged at warn level.
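A fixed-size line ring buffer is essentially a capped VecDeque. A sketch of the idea (the real buffer in src/backend/stdio.rs is fed by the stderr reader task and shared across tasks; RingBuffer here is an illustrative name):

```rust
use std::collections::VecDeque;

/// Keep only the most recent `cap` lines; older lines are evicted.
struct RingBuffer {
    cap: usize,
    lines: VecDeque<String>,
}

impl RingBuffer {
    fn new(cap: usize) -> Self {
        RingBuffer { cap, lines: VecDeque::with_capacity(cap) }
    }

    fn push(&mut self, line: String) {
        if self.lines.len() == self.cap {
            self.lines.pop_front(); // drop the oldest line
        }
        self.lines.push_back(line);
    }
}

fn main() {
    let mut buf = RingBuffer::new(200);
    for i in 0..250 {
        buf.push(format!("line {i}"));
    }
    // Capacity holds at 200; the first 50 lines were evicted.
    assert_eq!(buf.lines.len(), 200);
    assert_eq!(buf.lines.front().map(String::as_str), Some("line 50"));
    println!("ok");
}
```

Bounding the buffer keeps a chatty or crash-looping backend from growing memory without limit while still preserving the most useful diagnostics: the last lines before an unexpected exit.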
Memory monitoring¶
The health checker samples RSS for all backends every memory_check_interval (default 30s) via a single ps call. Stats are exposed in gatemini://health. If a backend's RSS exceeds max_memory_mb, it is restarted with a cooldown of memory_restart_cooldown (default 60s). A warning is logged at 80% of the limit.
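Batching all PIDs into one ps invocation might look like this (a sketch assuming a Unix ps that accepts -o pid=,rss= and a comma-separated -p list; sample_rss is an illustrative name, not the real function):

```rust
use std::process::Command;

/// One `ps` call for all PIDs; returns (pid, rss_in_kib) pairs.
/// Sketch only; real code must handle ps failures gracefully.
fn sample_rss(pids: &[u32]) -> Vec<(u32, u64)> {
    let list = pids.iter().map(|p| p.to_string()).collect::<Vec<_>>().join(",");
    let out = Command::new("ps")
        .args(["-o", "pid=,rss=", "-p", &list])
        .output()
        .expect("ps not available");
    // Each output line is "<pid> <rss>"; skip anything unparseable.
    String::from_utf8_lossy(&out.stdout)
        .lines()
        .filter_map(|l| {
            let mut it = l.split_whitespace();
            Some((it.next()?.parse().ok()?, it.next()?.parse().ok()?))
        })
        .collect()
}

fn main() {
    // Sample this process's own RSS as a smoke test.
    let stats = sample_rss(&[std::process::id()]);
    assert_eq!(stats.len(), 1);
    assert!(stats[0].1 > 0); // a running process has nonzero RSS
    println!("ok");
}
```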
| Setting | Default |
|---|---|
| shutdown_grace_period | 5s |
| max_memory_mb | none |
| pool.replenish_delay | 2s |
Output processing pipeline¶
Tool call responses pass through a three-stage pipeline before being returned to the client.
Stage 1 — Intent filtering: if the caller passes an intent string to call_tool_chain, the raw output is filtered to sections relevant to that intent before any further processing.
Stage 2 — Auto-chunk: if output_config.auto_chunk_json is enabled and the output is parseable JSON above output_config.chunk_threshold, the response is decomposed. Uniform arrays are collapsed to the first 3 items plus a count summary; non-uniform objects are rendered as a key-path summary.
Stage 3 — Truncation: if the output after the previous stages exceeds max_output_size, it is truncated using a head-60%/tail-40% split to preserve both the beginning and end of the response.
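The Stage 3 split can be sketched as follows (the truncation marker text is illustrative; the function name and exact output format are assumptions, not the real implementation):

```rust
/// Keep 60% of the byte budget from the head and 40% from the tail,
/// joined by a marker. Sketch only: it slices raw bytes, so a real
/// implementation must also respect UTF-8 character boundaries.
fn truncate_head_tail(s: &str, max: usize) -> String {
    if s.len() <= max {
        return s.to_string(); // under budget: pass through unchanged
    }
    let head = max * 60 / 100;
    let tail = max - head;
    format!("{}\n[truncated]\n{}", &s[..head], &s[s.len() - tail..])
}

fn main() {
    // 1000 bytes against a 100-byte budget: keep the first 60 bytes
    // and the last 40 bytes of the payload.
    let data = "a".repeat(600) + &"z".repeat(400);
    let out = truncate_head_tail(&data, 100);
    assert!(out.starts_with(&"a".repeat(60)));
    assert!(out.ends_with(&"z".repeat(40)));
    // Short outputs are untouched.
    assert_eq!(truncate_head_tail("short", 100), "short");
    println!("ok");
}
```

Preserving both ends matters because many tool responses put the summary or status at the start and the error detail or final result at the end; a plain head-only cut would lose the latter.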
The tracker records bytes_returned (after the pipeline) and bytes_processed (raw bytes before) per tool call. These are exposed through gatemini://stats as a savings ratio and reduction percentage.
Composite tools¶
Composite tools are not a separate transport. They are registered under the virtual __composite backend and executed through the sandbox layer.
Important limitation:
- config watcher notices composite tool changes
- those changes are logged
- they are not hot-reloaded; daemon restart is required