Building a Gemini MCP Server: Integrating Google's AI with Claude Code
This post documents the architecture and implementation of a production-ready MCP (Model Context Protocol) server that wraps Google's Gemini CLI. The server lets Claude Code use Gemini as a second AI for deep analysis, code review, and structured AI-to-AI debates.
Why Combine Claude and Gemini?
Each AI model has unique strengths:
| Capability | Claude | Gemini |
|---|---|---|
| Context Window | 200k tokens | 1M tokens |
| Coding | Excellent | Excellent |
| Web Search | Limited | Native Google Search |
| Image Analysis | Good | Excellent |
| Reasoning | Excellent | Excellent (Gemini 3 Pro) |
By integrating Gemini into Claude Code via MCP, you get the best of both worlds:
- Claude handles the primary development workflow
- Gemini provides second opinions, large context analysis, and web research
Architecture Overview
```text
                   Claude Code
                        │
                        ▼
               ┌──────────────────┐
               │   MCP Protocol   │
               └──────────────────┘
                        │
                        ▼
┌────────────────────────────────────────────────┐
│ gemini-mcp server (Docker)                     │
│ ┌────────────────────────────────────────────┐ │
│ │ FastMCP Server (Python)                    │ │
│ │  - 5 unified tools (v0.5.0)                │ │
│ │  - HTTP transport (host port 33020)        │ │
│ │  - Health check endpoint                   │ │
│ └────────────────────────────────────────────┘ │
│                       │                        │
│                       ▼                        │
│ ┌────────────────────────────────────────────┐ │
│ │ Gemini CLI (Node.js)                       │ │
│ │  - OAuth authentication                    │ │
│ │  - Google Search grounding                 │ │
│ │  - Multiple model support                  │ │
│ └────────────────────────────────────────────┘ │
└────────────────────────────────────────────────┘
```
Tool Consolidation: 44 → 5 Tools
Version 0.5.0 introduced a major token optimization by consolidating 44 individual tools into 5 unified tools:
| Tool | Purpose | Tools consolidated |
|---|---|---|
| `gemini` | All AI query modes | 12 tools → 1 |
| `analyze` | Code/file/codebase analysis | 15 tools → 1 |
| `search` | Web and docs research | 6 tools → 1 |
| `debate` | AI-to-AI discussions | 6 tools → 1 |
| `trading` | Domain-specific analysis | 5 tools → 1 |
Result: roughly a 91% reduction in the tokens spent on tool definitions (~29,000 → ~2,500 tokens)
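To make the consolidation concrete, here is a minimal, hypothetical sketch of the pattern: a single entry point takes a `mode` argument and dispatches to internal handlers, so the client loads one tool schema instead of one per operation. The handler names and their string return values are illustrative stand-ins, not the project's code; in the real server each handler would call the Gemini CLI.

```python
import asyncio
from collections.abc import Awaitable, Callable


# Hypothetical internal handlers (stand-ins for calls to the Gemini CLI).
async def _fast(prompt: str) -> str:
    return f"[fast] {prompt}"


async def _summarize(prompt: str) -> str:
    return f"[summarize] {prompt}"


# One registry replaces what used to be one MCP tool per operation.
_HANDLERS: dict[str, Callable[[str], Awaitable[str]]] = {
    "fast": _fast,
    "summarize": _summarize,
}


async def unified_tool(prompt: str, mode: str = "fast") -> str:
    """Single entry point: only this signature is advertised to the client."""
    handler = _HANDLERS.get(mode)
    if handler is None:
        valid = ", ".join(sorted(_HANDLERS))
        raise ValueError(f"unknown mode {mode!r}; expected one of: {valid}")
    return await handler(prompt)


if __name__ == "__main__":
    print(asyncio.run(unified_tool("What is MCP?", mode="summarize")))
    # prints "[summarize] What is MCP?"
```

Adding an operation then means adding a dictionary entry and a documented `mode` value, not another full tool schema in the client's context.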
Tool 1: gemini - Unified AI Query
```python
async def gemini(
    prompt: str,
    mode: str = "fast",        # fast, reasoning, explain, summarize, models
    model: str | None = None,  # override, e.g. gemini-3-pro-preview, gemini-2.5-flash
    context: str = "",
) -> dict | str:
    ...
```
Mode options:
| Mode | Model used | Use case |
|---|---|---|
| `fast` | gemini-2.5-flash | Quick questions, syntax help |
| `reasoning` | gemini-3-pro-preview | Complex analysis, architecture |
| `explain` | gemini-2.5-flash | Detailed explanations |
| `summarize` | gemini-2.5-flash | Bullet-point summaries |
| `models` | N/A | List available models |
Tool 2: analyze - Code Analysis
```python
async def analyze(
    target: str,             # file path, directory, code snippet, or git diff
    instruction: str,        # what to look for
    focus: str = "general",  # general, security, performance, architecture, patterns
) -> dict | str:
    ...
```
The tool automatically detects the input type:
- File path: reviews a single file
- Directory: analyzes the entire codebase using Gemini's 1M-token context
- Code snippet: performs an inline code review
- Git diff: reviews the change like a PR and returns recommendations
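The post does not say how the input type is detected. One plausible heuristic, sketched here under that assumption, checks for a unified-diff header first, then probes the filesystem for single-line targets, and treats anything else as a snippet. `detect_target_kind` is a hypothetical name. It uses `os.path.isdir`/`os.path.isfile` because those return `False` on any `OSError`, for example when a one-line snippet is too long to be a file name.

```python
import os


def detect_target_kind(target: str) -> str:
    """Classify an analyze() target as 'diff', 'directory', 'file', or 'snippet'."""
    text = target.strip()
    # `git diff` output starts with "diff --git"; plain unified diffs with ---/+++.
    if text.startswith("diff --git ") or (text.startswith("--- ") and "\n+++ " in text):
        return "diff"
    # Multi-line text cannot be a path, so only probe the filesystem for one-liners.
    if text and "\n" not in text:
        path = os.path.expanduser(text)
        if os.path.isdir(path):
            return "directory"
        if os.path.isfile(path):
            return "file"
    return "snippet"
```

A real implementation might also honor an explicit type parameter, since a short snippet could collide with an existing file name.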
Tool 3: search - Web Research
```python
async def search(
    query: str,
    depth: str = "quick",             # quick, deep, academic, docs
    topic_context: str | None = None,
) -> dict:
    ...
```
Depth options:
| Depth | Description |
|---|---|
| `quick` | Single web search |
| `deep` | Comprehensive multi-step research |
| `academic` | Academic papers, scholarly sources |
| `docs` | Library documentation lookup |
Docker Configuration
Multi-Stage Dockerfile
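```dockerfile
# Build stage: install the Python package and its dependencies
FROM python:3.12-slim AS builder
WORKDIR /build
# pip needs the package source, not just pyproject.toml, to build the project
COPY pyproject.toml .
COPY src/ src/
RUN pip install --no-cache-dir .

# Runtime stage
FROM python:3.12-slim
# Gemini CLI is a Node.js program, so Node.js 22 must exist in this stage.
# The slim image ships without curl, which the NodeSource setup script needs.
RUN apt-get update && \
    apt-get install -y --no-install-recommends curl ca-certificates && \
    curl -fsSL https://deb.nodesource.com/setup_22.x | bash - && \
    apt-get install -y --no-install-recommends nodejs && \
    rm -rf /var/lib/apt/lists/*
# Install Gemini CLI
RUN npm install -g @google/gemini-cli@nightly
COPY --from=builder /usr/local/lib/python3.12/site-packages /usr/local/lib/python3.12/site-packages
# Entrypoint from the OAuth section below (the docker/ location is assumed)
COPY docker/entrypoint.sh /usr/local/bin/entrypoint.sh
RUN chmod 755 /usr/local/bin/entrypoint.sh
# Run as non-root user
RUN useradd -m gemini
USER gemini
COPY src/gemini_mcp /app/gemini_mcp
WORKDIR /app
ENTRYPOINT ["/usr/local/bin/entrypoint.sh"]
```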
Docker Compose
```yaml
services:
  gemini-mcp-http:
    build:
      context: .
      dockerfile: docker/Dockerfile
    container_name: gemini_mcp_http
    restart: unless-stopped
    ports:
      - "33020:8765"  # host port 33020 -> container port 8765
    environment:
      - GEMINI_MCP_TRANSPORT=streamable-http
      - GEMINI_MCP_SERVER_HOST=0.0.0.0
      - GEMINI_MCP_SERVER_PORT=8765
      - GEMINI_MCP_GEMINI_DEFAULT_MODEL=gemini-3-pro-preview
      - GEMINI_MCP_GEMINI_TIMEOUT=300
      - GEMINI_MCP_GEMINI_ACTIVITY_TIMEOUT=1800
    volumes:
      # Mount OAuth credentials (read-only staging)
      - ${HOME}/.gemini:/home/gemini/.gemini-host:ro
    healthcheck:
      # python:3.12-slim has no curl, so probe with Python's standard library
      test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8765/health', timeout=5)"]
      interval: 30s
      timeout: 10s
      retries: 3
```
OAuth Authentication Strategy
The Gemini CLI authenticates with Google OAuth, not API keys. It needs write access to its credential directory to refresh tokens, but for security we mount the host credentials read-only.
Solution: a copy-and-symlink strategy in the container entrypoint:
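```bash
#!/bin/bash
# entrypoint.sh
set -euo pipefail

# Copy credentials from the read-only mount to a writable location.
# "/." also copies dotfiles, which a "/*" glob would skip.
mkdir -p /home/gemini/.gemini-active
cp -r /home/gemini/.gemini-host/. /home/gemini/.gemini-active/ 2>/dev/null || true

# Point the Gemini CLI's config directory (~/.gemini) at the writable copy.
# -n replaces an existing link instead of creating a new link inside its target.
ln -sfn /home/gemini/.gemini-active /home/gemini/.gemini

# Replace the shell with the server so it receives container stop signals
exec python -m gemini_mcp.server
```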
Streaming for Long Operations
Large codebase analysis can run for minutes. We use activity-based timeouts instead of fixed timeouts:
```python
from dataclasses import dataclass


@dataclass
class GeminiRequest:
    prompt: str
    stream: bool = False
    activity_timeout: int = 1800  # seconds; resets on each event (30 min max between events)
```
How it works:
- Use `--output-format stream-json` for real-time JSONL events
- Events include `init`, `message`, `tool_use`, `tool_result`, and `result`
- The activity timeout resets on each event received
- A request only times out if Gemini stops producing output for 30+ minutes
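An activity timeout can be implemented by putting the deadline on each read rather than on the whole process. Below is a minimal asyncio sketch under that assumption: `run_with_activity_timeout` is a hypothetical helper that takes the full command line, parses each stdout line as JSON, and kills the process only if no line arrives within `activity_timeout` seconds. It skips non-JSON lines and raises the stream reader's line-length limit, since a single JSON event can exceed asyncio's 64 KiB default.

```python
import asyncio
import contextlib
import json


async def run_with_activity_timeout(
    argv: list[str], activity_timeout: float = 1800.0
) -> list[dict]:
    """Run a JSONL-emitting command and return its parsed events.

    The deadline applies to each line, not to the whole run: the process may
    run indefinitely as long as it never stays silent for activity_timeout seconds.
    """
    proc = await asyncio.create_subprocess_exec(
        *argv,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.DEVNULL,
        limit=16 * 1024 * 1024,  # allow JSON events up to 16 MiB per line
    )
    assert proc.stdout is not None
    events: list[dict] = []
    try:
        while True:
            try:
                # Each successful read restarts the activity clock.
                raw = await asyncio.wait_for(proc.stdout.readline(), activity_timeout)
            except asyncio.TimeoutError:
                raise TimeoutError(f"no output for {activity_timeout} s") from None
            if not raw:  # EOF: the process closed stdout
                break
            line = raw.strip()
            if not line:
                continue
            try:
                events.append(json.loads(line))
            except json.JSONDecodeError:
                continue  # ignore non-JSON lines
    except BaseException:
        # On timeout, cancellation, or a read error, don't leave the child running.
        with contextlib.suppress(ProcessLookupError):
            proc.kill()
        raise
    finally:
        await proc.wait()
    return events
```

In the server, `argv` would be the Gemini CLI invocation with `--output-format stream-json` plus the prompt and model flags; check `gemini --help` for the exact flag names in your CLI version.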
Claude Code Integration
HTTP Transport (Recommended)
Add to your Claude profile JSON:
Stateless Mode for MCP Compatibility
Claude Code doesn't maintain session IDs between requests. Enable stateless HTTP:
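```python
# Assumes the official MCP Python SDK's FastMCP; `config` holds settings loaded at startup
from mcp.server.fastmcp import FastMCP

mcp = FastMCP(
    "gemini-mcp",
    host=config.server_host,
    port=config.server_port,
    stateless_http=True,  # Critical for Claude Code: no per-session state between requests
    json_response=True,   # Return plain JSON responses instead of an SSE stream
)
```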
Health Check Endpoint
Custom HTTP endpoint for Docker/Kubernetes health probes:
```python
from starlette.requests import Request
from starlette.responses import JSONResponse


@mcp.custom_route("/health", methods=["GET"])
async def health_check(request: Request) -> JSONResponse:
    return JSONResponse({
        "status": "healthy",
        "service": "gemini-mcp",
        "version": "0.5.0",
        "tools_enabled": ["gemini", "analyze", "search", "debate", "trading"],
        "models": {
            "default": "gemini-3-pro-preview",
            "fast": "gemini-2.5-flash",
        },
    })
```
Pricing: Flat Rate, Not Per-Token
Critical distinction: Gemini CLI usage is covered by a Google AI subscription (or the free tier), not billed at per-token API rates.
| Tier | Price | Limits |
|---|---|---|
| Free (AI Studio) | $0/mo | Basic daily limits |
| Google AI Pro | $19.99/mo | Higher daily requests |
| Google AI Ultra | $249.99/mo | Highest limits (20x Pro) |
Benefits:
- No per-token charges; each request can use the full 1M-token context window
- Predictable costs
- No API key management
- All models included
AI-to-AI Debates
The debate tool enables structured Claude-Gemini discussions:
```python
async def debate(
    topic: str,
    action: str = "start",            # start, list, stats, search, load, context
    strategy: str = "collaborative",  # collaborative, adversarial, socratic, devil_advocate
    context: str = "",
    debate_id: str | None = None,
) -> dict:
    ...
```
Use cases:
- Architectural decisions
- Strategy validation
- Risk assessment
- Design reviews
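The post doesn't describe how a debate is run turn by turn. A minimal, hypothetical sketch of one structure: two participants alternate, and each sees the topic, a strategy instruction, and the transcript so far. The participants are plain async callables here (in the server, one side would be Gemini via the CLI), and the strategy wording is illustrative, not the project's prompts.

```python
import asyncio
from collections.abc import Awaitable, Callable
from dataclasses import dataclass, field

Speaker = Callable[[str], Awaitable[str]]

# Illustrative instructions only; the real prompts are not shown in the post.
STRATEGIES = {
    "collaborative": "Build on the other side's points and look for common ground.",
    "adversarial": "Challenge the other side's weakest assumptions.",
    "socratic": "Respond mainly with probing questions.",
    "devil_advocate": "Argue against the emerging consensus, even if you agree with it.",
}


@dataclass
class Debate:
    topic: str
    strategy: str = "collaborative"
    turns: list[tuple[str, str]] = field(default_factory=list)

    def prompt_for(self, speaker: str) -> str:
        """Build the next speaker's prompt from the full transcript."""
        transcript = "\n".join(f"{who}: {text}" for who, text in self.turns)
        return (
            f"Topic: {self.topic}\nStyle: {STRATEGIES[self.strategy]}\n"
            f"Transcript so far:\n{transcript or '(none)'}\n{speaker}:"
        )


async def run_debate(
    debate: Debate, a: tuple[str, Speaker], b: tuple[str, Speaker], rounds: int = 2
) -> Debate:
    """Alternate turns between two named speakers for the given number of rounds."""
    if debate.strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy {debate.strategy!r}")
    for _ in range(rounds):
        for name, speak in (a, b):
            reply = await speak(debate.prompt_for(name))
            debate.turns.append((name, reply))
    return debate
```

Persisting `turns` under a debate ID would be one way to support the `list`, `load`, and `search` actions, though the post does not say how the server stores debates.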
Production Checklist
- Docker container running (`docker ps | grep gemini_mcp`)
- Health endpoint responding (`curl http://localhost:33020/health`)
- OAuth credentials mounted
- MCP configuration in Claude profile
- Systemd service enabled (optional)
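The health-endpoint item can be automated. A minimal standard-library sketch: `check_health` (a hypothetical name) fetches the endpoint and requires the `status` field to be `"healthy"`, as returned by the health-check handler above, and `main` turns the result into an exit code suitable for cron or a systemd timer. The default URL assumes the Compose port mapping shown earlier.

```python
import json
import sys
import urllib.request

DEFAULT_URL = "http://localhost:33020/health"


def check_health(url: str = DEFAULT_URL, timeout: float = 5.0) -> dict:
    """Return the health payload, raising if the server is down or not healthy."""
    # urlopen raises URLError (an OSError) if unreachable and HTTPError on 4xx/5xx.
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        body = json.load(resp)
    if not isinstance(body, dict) or body.get("status") != "healthy":
        raise RuntimeError(f"unexpected health payload: {body!r}")
    return body


def main(argv: list[str]) -> int:
    """Exit-code wrapper: 0 when healthy, 1 otherwise."""
    url = argv[0] if argv else DEFAULT_URL
    try:
        info = check_health(url)
    except (OSError, RuntimeError, ValueError) as exc:  # ValueError covers bad JSON
        print(f"FAIL {url}: {exc}", file=sys.stderr)
        return 1
    print(f"OK {info.get('service')} {info.get('version')}: tools={info.get('tools_enabled')}")
    return 0
```

To run it as a script, add `raise SystemExit(main(sys.argv[1:]))` under an `if __name__ == "__main__":` guard.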
Conclusion
The Gemini MCP server transforms Claude Code into a dual-AI development environment. By leveraging Gemini's 1M token context, Google Search integration, and reasoning capabilities alongside Claude's excellent coding abilities, you get a more powerful development assistant.
The tool consolidation from 44 to 5 tools demonstrates how thoughtful API design can dramatically improve token efficiency while maintaining full functionality.
Resources: