Backend Management

The backend subsystem turns a config file into running MCP transports, restart behavior, and a unified registry.

Backend lifecycle

Public backend states

The public enum in src/backend/mod.rs is intentionally small:

  • Starting
  • Healthy
  • Unhealthy
  • Stopped

There are no public states such as "Degraded", "Restarting", or "Circuit Open". Circuit-breaker timing exists internally in the health checker and is reflected through Unhealthy and Stopped.

BackendManager

BackendManager owns:

  • the live backend map
  • the backend config map
  • per-backend semaphores
  • per-backend retry configs
  • rate limiters
  • dynamic backend tracking
  • managed prerequisite PIDs
  • in-flight call draining
  • optional call tracking hooks

The manager is transport-agnostic; it delegates concrete behavior to Backend implementations.
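
The ownership list above can be sketched as a struct. Field names and types here are assumptions for illustration, not the actual definitions in src/backend/mod.rs:

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// Hypothetical stand-ins for the real types.
struct Backend;       // a live transport (stdio, streamable-http, cli-adapter)
struct BackendConfig; // parsed config entry
struct Semaphore;     // per-backend concurrency cap
struct RetryConfig;
struct RateLimiter;

// Rough shape of the manager's state; real field names and types differ.
struct BackendManager {
    backends: Mutex<HashMap<String, Arc<Backend>>>, // live backend map
    configs: HashMap<String, BackendConfig>,        // backend config map
    semaphores: HashMap<String, Arc<Semaphore>>,    // per-backend semaphores
    retry: HashMap<String, RetryConfig>,            // per-backend retry configs
    rate_limiters: HashMap<String, RateLimiter>,
    dynamic: Mutex<Vec<String>>,                    // dynamic backend tracking
    prerequisite_pids: Mutex<Vec<u32>>,             // managed prerequisite PIDs
    in_flight: Mutex<HashMap<String, usize>>,       // in-flight call draining
}

impl BackendManager {
    fn new() -> Self {
        Self {
            backends: Mutex::new(HashMap::new()),
            configs: HashMap::new(),
            semaphores: HashMap::new(),
            retry: HashMap::new(),
            rate_limiters: HashMap::new(),
            dynamic: Mutex::new(Vec::new()),
            prerequisite_pids: Mutex::new(Vec::new()),
            in_flight: Mutex::new(HashMap::new()),
        }
    }
}
```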

Supported transports

stdio

Child process backends communicate over stdin/stdout using rmcp.

Key details from src/backend/stdio.rs:

  • stdin and stdout are piped
  • stderr is piped to a 200-line ring buffer (exposed via gatemini://backend/{name} and gatemini://health)
  • Unix builds place the child in a new process group
  • a reaper task watches for unexpected exit and marks the backend stopped

That process-group isolation is what lets shutdown send SIGTERM to the whole backend tree instead of only the parent process.
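
A minimal sketch of that spawn pattern, assuming a Unix target and Rust 1.64+ (std's `CommandExt::process_group`); the real stdio.rs also wires the pipes into rmcp and starts the reaper task:

```rust
use std::io;
use std::process::{Child, Command, Stdio};

#[cfg(unix)]
use std::os::unix::process::CommandExt;

// Spawn a backend child with piped stdio; on Unix, place it in a fresh
// process group so shutdown signals can reach the whole backend tree.
fn spawn_in_group(program: &str) -> io::Result<Child> {
    let mut cmd = Command::new(program);
    cmd.stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .stderr(Stdio::piped()); // stderr feeds the 200-line ring buffer
    #[cfg(unix)]
    cmd.process_group(0); // pgid 0: new group whose id is the child's pid
    cmd.spawn()
}
```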

streamable-http

Remote HTTP backends are implemented in src/backend/http.rs.

Use this transport in config:

backends:
  github:
    transport: streamable-http
    url: "https://api.githubcopilot.com/mcp/"
    headers:
      Authorization: "Bearer ${GITHUB_PAT_TOKEN}"

The lenient client wrapper exists to tolerate imperfect servers that omit expected response headers.

cli-adapter

CLI adapter backends let you publish tools without writing a separate MCP server.

You can either define tools inline:

backends:
  jq-tools:
    transport: cli-adapter
    timeout: 30s
    tools:
      filter:
        description: "Apply a jq filter to JSON input"
        input_schema:
          type: object
          properties:
            filter: { type: string }
            input: { type: string }
          required: [filter, input]
        command: "jq '{{filter}}'"
        stdin: "{{input}}"
        output: json
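
The {{filter}} and {{input}} placeholders suggest straightforward template substitution. A minimal sketch of that idea (the real adapter presumably also handles shell quoting and escaping, which this omits):

```rust
use std::collections::HashMap;

// Substitute each {{name}} placeholder with its value from the tool arguments.
// Illustration only: shell quoting/escaping of values is deliberately omitted.
fn render(template: &str, args: &HashMap<&str, &str>) -> String {
    let mut out = template.to_string();
    for (key, value) in args {
        out = out.replace(&format!("{{{{{key}}}}}"), value);
    }
    out
}
```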

Or point to an external adapter file:

backends:
  ffmpeg-tools:
    transport: cli-adapter
    adapter_file: ~/.config/gatemini/adapters/ffmpeg.yaml

The adapter file path supports ~ expansion in the CLI adapter loader.
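
A minimal version of that expansion, assuming only a leading ~/ is handled and $HOME is set; the actual loader may cover more cases:

```rust
use std::env;

// Expand a leading "~/" using $HOME; other paths pass through unchanged.
fn expand_tilde(path: &str) -> String {
    if let (Some(rest), Ok(home)) = (path.strip_prefix("~/"), env::var("HOME")) {
        return format!("{home}/{rest}");
    }
    path.to_string()
}
```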

Dedicated instance mode

By default, all proxy sessions share a single backend instance (instance_mode: shared). For stateful backends like sequential-thinking that maintain per-session state, this causes state bleed across sessions.

Setting instance_mode: dedicated gives each proxy session its own isolated backend instance from an autoscaling pool:

backends:
  sequential-thinking:
    command: mcp-server-sequential-thinking
    timeout: 120s
    instance_mode: dedicated
    pool:
      min_idle: 1
      max_instances: 10
      acquire_timeout: 30s

Pool lifecycle

Pool behavior:

  • pre-warms min_idle instances at startup (default: 1)
  • lazily spawns new instances on demand up to max_instances (default: 20)
  • on session disconnect, the assigned instance is stopped and a fresh one is spawned to maintain the idle pool
  • if all instances are busy, new sessions wait up to acquire_timeout (default: 30s) before failing
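
The acquire decision described above can be sketched as follows; the struct and names are assumptions, not the actual src/backend/pool.rs types:

```rust
use std::time::Duration;

// Hypothetical pool state; the real src/backend/pool.rs types differ.
struct Pool {
    idle: Vec<u32>, // ids of pre-warmed, unassigned instances
    busy: usize,    // instances currently bound to sessions
    max_instances: usize,
    acquire_timeout: Duration,
}

// What a session gets when it asks for a dedicated instance.
enum Acquire {
    Reuse(u32),     // hand out a warm idle instance
    Spawn,          // under max_instances: lazily spawn a fresh one
    Wait(Duration), // saturated: wait up to acquire_timeout, then fail
}

impl Pool {
    fn acquire(&mut self) -> Acquire {
        if let Some(id) = self.idle.pop() {
            self.busy += 1;
            return Acquire::Reuse(id);
        }
        if self.busy < self.max_instances {
            self.busy += 1;
            return Acquire::Spawn;
        }
        Acquire::Wait(self.acquire_timeout)
    }
}
```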

Only stdio and cli-adapter transports support dedicated mode. HTTP backends ignore the setting.

The pool implementation lives in src/backend/pool.rs. The health checker calls restart_pool_primary() instead of restart_backend() for dedicated backends.

Setting               Default
pool.min_idle         1
pool.max_instances    20
pool.acquire_timeout  30s

Concurrency, retries, and fallback

Per-backend limits come from config:

  • max_concurrent_calls
  • semaphore_timeout
  • retry
  • rate_limit
  • fallback_chain

Retry behavior only applies to the Starting state, where the manager waits briefly for a backend that is still connecting. Calls to Unhealthy or Stopped backends fail immediately unless the manager routes into a fallback backend for a transient error.
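
That fail-fast-or-fallback routing can be sketched with a hypothetical outcome type; the real manager works with concrete error types and backend handles:

```rust
// Hypothetical call outcome used to illustrate the routing decision.
enum Outcome {
    Ok,
    Transient, // e.g. timeout: eligible for fallback routing
    Fatal,     // target is Unhealthy/Stopped: fail immediately
}

// Try the primary backend first, then walk the fallback chain on
// transient errors; return the name of the backend that answered.
fn call_with_fallback(
    chain: &[&str],
    mut call: impl FnMut(&str) -> Outcome,
) -> Result<String, &'static str> {
    for &name in chain {
        match call(name) {
            Outcome::Ok => return Ok(name.to_string()),
            Outcome::Transient => continue,
            Outcome::Fatal => return Err("failed fast"),
        }
    }
    Err("chain exhausted")
}
```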

Health checker

The health loop in src/backend/health.rs runs in three phases:

  1. ping healthy backends
  2. handle unhealthy and stopped backends
  3. retry pending configured backends that never became live

Current defaults from src/config.rs:

Setting                         Default
health.interval                 30s
health.timeout                  5s
health.failure_threshold        3
health.max_restarts             5
health.restart_window           60s
health.restart_initial_backoff  1s
health.restart_max_backoff      30s
health.restart_timeout          30s
health.recovery_multiplier      3
health.drain_timeout            10s
health.memory_check_interval    30s
health.memory_restart_cooldown  60s

Internal circuit-breaker behavior:

  • healthy backends are pinged
  • failures increment consecutive_failures
  • once the threshold is reached, the backend is marked Unhealthy
  • the health checker records circuit_open_since
  • after interval * recovery_multiplier, a half-open probe is attempted
  • if the probe fails, restart logic or another recovery window applies
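
The counters and timing above can be sketched like this; field names mirror the bullets, but the actual health-checker types may differ:

```rust
use std::time::{Duration, Instant};

// Hypothetical circuit state; field names mirror the description above.
struct Circuit {
    consecutive_failures: u32,
    failure_threshold: u32,
    circuit_open_since: Option<Instant>,
    interval: Duration,
    recovery_multiplier: u32,
}

impl Circuit {
    // Record a ping result; returns true while the circuit is open.
    fn record(&mut self, ok: bool) -> bool {
        if ok {
            self.consecutive_failures = 0;
            self.circuit_open_since = None;
            return false;
        }
        self.consecutive_failures += 1;
        if self.consecutive_failures >= self.failure_threshold
            && self.circuit_open_since.is_none()
        {
            self.circuit_open_since = Some(Instant::now());
        }
        self.circuit_open_since.is_some()
    }

    // A half-open probe becomes due after interval * recovery_multiplier.
    fn probe_due(&self, now: Instant) -> bool {
        match self.circuit_open_since {
            Some(opened) => now >= opened + self.interval * self.recovery_multiplier,
            None => false,
        }
    }
}
```

With the defaults above (interval 30s, recovery_multiplier 3), the half-open probe is attempted 90 seconds after the circuit opens.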

Prerequisites

Some backends depend on another process already running. That is handled by src/backend/prerequisite.rs.

Features:

  • optional pgrep -f dedup via process_match
  • optional managed lifecycle on shutdown
  • startup delay before backend connect

If managed: true, Gatemini records the spawned prerequisite PID and terminates the process group during shutdown.

Process supervision

Backend child processes are supervised with configurable shutdown behavior and memory monitoring.

Graceful shutdown

When a stdio backend is stopped, Gatemini sends SIGTERM to the process group (or taskkill /T on Windows), then polls try_wait() every 100ms for up to shutdown_grace_period (default 5s). If the child hasn't exited by the deadline, SIGKILL is sent (or taskkill /F /T on Windows).
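
The poll-then-escalate loop can be sketched with std alone. Note that std's Child::kill signals only the child itself with SIGKILL; it stands in here for the group-wide signaling the real shutdown performs:

```rust
use std::io;
use std::process::Child;
use std::thread::sleep;
use std::time::{Duration, Instant};

// Poll try_wait() every 100ms until the grace period expires, then escalate.
// Returns Ok(true) if the child exited within the grace period. The real
// shutdown first sends SIGTERM to the whole process group; Child::kill
// (SIGKILL, child only) stands in for the forced step in this sketch.
fn stop_with_grace(child: &mut Child, grace: Duration) -> io::Result<bool> {
    let deadline = Instant::now() + grace;
    while Instant::now() < deadline {
        if child.try_wait()?.is_some() {
            return Ok(true);
        }
        sleep(Duration::from_millis(100));
    }
    child.kill()?;
    child.wait()?;
    Ok(false)
}
```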

Prerequisite processes follow the same pattern with a fixed 5s grace period.

Stderr capture

Backend stderr is piped to a 200-line ring buffer per backend. Recent lines are exposed in gatemini://backend/{name} and gatemini://health. When a backend exits unexpectedly, the last stderr lines are logged at warn level.

Memory monitoring

The health checker samples RSS for all backends every memory_check_interval (default 30s) via a single ps call. Stats are exposed in gatemini://health. If a backend's RSS exceeds max_memory_mb, it is restarted with a cooldown of memory_restart_cooldown (default 60s). A warning is logged at 80% of the limit.
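
A sketch of that sampling, split into a parser and a single batched ps invocation; error handling is simplified:

```rust
use std::process::Command;

// Parse `ps -o rss=` output: one RSS value per line, reported in KB.
fn parse_rss_mb(ps_output: &str) -> Vec<u64> {
    ps_output
        .lines()
        .filter_map(|line| line.trim().parse::<u64>().ok())
        .map(|kb| kb / 1024) // KB -> MB
        .collect()
}

// Sample every backend PID with a single ps invocation.
fn sample_rss_mb(pids: &[u32]) -> std::io::Result<Vec<u64>> {
    let list = pids
        .iter()
        .map(|p| p.to_string())
        .collect::<Vec<_>>()
        .join(",");
    let out = Command::new("ps")
        .args(["-o", "rss=", "-p", list.as_str()])
        .output()?;
    Ok(parse_rss_mb(&String::from_utf8_lossy(&out.stdout)))
}
```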

Setting                Default
shutdown_grace_period  5s
max_memory_mb          none
pool.replenish_delay   2s

Output processing pipeline

Tool call responses pass through a three-stage pipeline before being returned to the client.

Stage 1 — Intent filtering: if the caller passes an intent string to call_tool_chain, the raw output is filtered to sections relevant to that intent before any further processing.

Stage 2 — Auto-chunk: if output_config.auto_chunk_json is enabled and the output is parseable JSON above output_config.chunk_threshold, the response is decomposed. Uniform arrays are collapsed to the first 3 items plus a count summary; non-uniform objects are rendered as a key-path summary.
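
The uniform-array collapse can be sketched on a plain string list; the real stage operates on parsed JSON values:

```rust
// Collapse a uniform array to its first 3 items plus a count summary.
// Shown on plain strings; the real stage operates on parsed JSON values.
fn collapse_uniform(items: &[String]) -> Vec<String> {
    if items.len() <= 3 {
        return items.to_vec();
    }
    let mut out = items[..3].to_vec();
    out.push(format!("... ({} items total)", items.len()));
    out
}
```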

Stage 3 — Truncation: if the output after the previous stages exceeds max_output_size, it is truncated using a head-60%/tail-40% split to preserve both the beginning and end of the response.
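
A byte-level sketch of that split, assuming ASCII input; the real pipeline must also respect UTF-8 boundaries:

```rust
// Keep ~60% of the byte budget from the head and ~40% from the tail.
// ASCII-only sketch: slicing at arbitrary offsets panics on multi-byte
// UTF-8, and the marker makes the result slightly exceed max_size.
fn truncate_head_tail(s: &str, max_size: usize) -> String {
    if s.len() <= max_size {
        return s.to_string();
    }
    let head = max_size * 60 / 100;
    let tail = max_size * 40 / 100;
    format!("{}\n...[truncated]...\n{}", &s[..head], &s[s.len() - tail..])
}
```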

The tracker records bytes_returned (after the pipeline) and bytes_processed (raw bytes before) per tool call. These are exposed through gatemini://stats as a savings ratio and reduction percentage.

Composite tools

Composite tools are not a separate transport. They are registered under the virtual __composite backend and executed through the sandbox layer.

Important limitation:

  • config watcher notices composite tool changes
  • those changes are logged
  • they are not hot-reloaded; daemon restart is required