  • Claude Code MCP-CLI Mode for Token Optimization

    If you use multiple MCP servers with Claude Code, you’ve probably noticed your context window filling up fast: 40-50k tokens can be gone before you even ask a question. That’s because every MCP tool schema gets loaded into the system prompt at startup.

    The Problem

    MCP servers are great for extending Claude’s capabilities, but the default behavior loads every tool definition upfront. Got five servers with 20 tools each? That’s 100 JSON schemas eating your context. Some users reported starting sessions at 63% context usage, leaving only about a third of the window for actual work.

    The Fix

    There’s an experimental feature that switches to on-demand tool loading. Instead of preloading everything, Claude uses bash commands to fetch tool info only when needed.

    Enable it by adding this to your shell config:

    export ENABLE_EXPERIMENTAL_MCP_CLI=true

    Restart your terminal and Claude Code. Run /context to verify—your MCP token usage should drop significantly.
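
    If you’d rather script the config change than edit the file by hand, here is a minimal sketch of a helper that appends the export line only when it is missing. The function name enable_mcp_cli and the default ~/.bashrc path are assumptions for illustration; point it at whichever file your shell actually reads (~/.zshrc, for example).

    ```shell
    # Hypothetical helper (not part of Claude Code): add the export line
    # to a shell config file, but only if that exact line is not there yet.
    enable_mcp_cli() {
      rc_file="${1:-$HOME/.bashrc}"   # assumed default; pass another path if needed
      line='export ENABLE_EXPERIMENTAL_MCP_CLI=true'
      # grep flags: -q quiet, -x whole-line match, -F fixed string (no regex).
      # The leading \n guards against a file whose last line has no newline.
      grep -qxF "$line" "$rc_file" 2>/dev/null || printf '\n%s\n' "$line" >> "$rc_file"
    }

    # Example: enable_mcp_cli "$HOME/.zshrc"
    ```

    Running it more than once leaves a single copy of the line in the file. You still need to restart your terminal and Claude Code afterwards.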

    How It Works

    Instead of keeping every tool schema in context, Claude now runs mcp-cli commands through bash:

    # List available tools
    mcp-cli tools
    
    # Get schema for a specific tool
    mcp-cli info server/tool
    
    # Call the tool
    mcp-cli call server/tool '{"param": "value"}'
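
    To make the lookup-then-call sequence concrete, here is a rough sketch of a wrapper that fetches a tool’s schema before calling it. It uses only the info and call subcommands shown above. The function name mcp_inspect_then_call and the MCP_CLI override variable are made up for this example, and it assumes mcp-cli is reachable from the shell you run it in, which may not hold outside of Claude’s own sessions.

    ```shell
    # Hypothetical wrapper: print a tool's schema, then call the tool.
    # MCP_CLI is an invented override so the binary can be swapped out;
    # it defaults to the mcp-cli command used in the examples above.
    mcp_inspect_then_call() {
      tool="$1"                      # e.g. server/tool
      args="$2"                      # JSON arguments as a single string
      cli="${MCP_CLI:-mcp-cli}"
      # Step 1: fetch the schema (sent to stderr so stdout holds only the result).
      "$cli" info "$tool" >&2 || return 1
      # Step 2: make the actual call.
      "$cli" call "$tool" "$args"
    }

    # Example: mcp_inspect_then_call server/tool '{"param": "value"}'
    ```

    If the schema lookup fails, the wrapper returns early and never makes the call.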

    The tradeoff is an extra step before using any MCP tool, but the context savings are substantial—going from 63% to 11% at session start in some cases. That’s roughly 100k tokens reclaimed for actual work.
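
    As a sanity check on that figure, the arithmetic works out as follows, assuming a 200k-token context window. The window size isn’t given here, so treat it as an assumption.

    ```shell
    # Assumption: a 200,000-token context window (not stated in this post).
    window=200000
    before_pct=63   # context used at session start with every schema preloaded
    after_pct=11    # context used at session start with on-demand loading
    reclaimed=$(( (before_pct - after_pct) * window / 100 ))
    echo "$reclaimed tokens reclaimed"   # prints "104000 tokens reclaimed"
    ```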

    TL;DR

    Set ENABLE_EXPERIMENTAL_MCP_CLI=true in your shell config to load MCP tools on demand instead of up front. Saves a ton of context if you’re using multiple MCP servers.

    Unlike DDR5 prices in 2025, this memory is actually getting cheaper.