Max Subscription vs API for Telegram Bot

Date: 2026-02-28 Status: Research complete — ready for decision

The Problem

The Telegram bot (@heyroyalbot) uses the Anthropic API directly (claude-sonnet-4-20250514). The API key is on Eric's Individual Org at console.anthropic.com with $4.68 free credits, no credit card, and a brutal 10K input tokens/minute rate limit on the free tier.

After adding a skills integration (bot fetches skill files from GitHub before acting), the bot hit the rate limit during a test — multiple tool calls + large skill file content exceeded 10K tokens/minute. The bot errored out after escalating retries (11s → 36s → 56s → failure).
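The escalating waits are classic exponential backoff before giving up. A minimal sketch of that retry pattern (names here are illustrative stand-ins, not the bot's actual code):

```python
import random
import time

class RateLimitError(Exception):
    """Stand-in for the API client's rate-limit (HTTP 429) exception."""

def call_with_backoff(make_request, max_retries=4, base_delay=10.0):
    """Retry a zero-argument callable with jittered exponential backoff.

    Delays of roughly 10s, 20s, 40s plus jitter approximate the
    11s/36s/56s escalation the bot logged before failing.
    """
    for attempt in range(max_retries):
        try:
            return make_request()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the error, as the bot did
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
```

Backoff only smooths over brief spikes; with a hard 10K tokens/minute cap and 10K+ token requests, retries eventually fail regardless.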

Key Finding: Max Plan ≠ API Access

Eric pays $200/month for Claude Max 20x. But Max and the API are completely separate products: the Max subscription covers claude.ai and Claude Code, while API usage bills separately against console.anthropic.com credits.

The Exception: Claude Code CLI (claude -p)

Claude Code's -p flag is designed for scripted/automated use and authenticates via the Max subscription's OAuth login. Multiple projects already run Telegram bots this way (see Reference Projects below).
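The basic pattern is one short-lived subprocess per message. A minimal sketch, assuming the CLI is already logged in to the Max account (prompt and flag values are placeholders):

```python
import subprocess

def build_cmd(prompt: str) -> list[str]:
    """Argv for one non-interactive turn; --output-format text keeps
    stdout free of JSON framing."""
    return ["claude", "-p", prompt, "--output-format", "text"]

def ask_claude(prompt: str, timeout: int = 120) -> str:
    """Spawn the CLI, which authenticates via its own Max-subscription
    OAuth login, so no API key is involved."""
    result = subprocess.run(build_cmd(prompt), capture_output=True,
                            text=True, timeout=timeout, check=True)
    return result.stdout.strip()
```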

ToS Status

Per Anthropic's terms and the Feb 2026 clarification:

- Allowed: Spawning `claude -p` as a subprocess (it's Anthropic's own product)
- Prohibited: Extracting OAuth tokens and making raw API calls
- Gray area: Heavy automated use may exceed "ordinary individual usage"
- Source: https://autonomee.ai/blog/claude-code-terms-of-service-explained/
- Source: https://www.theregister.com/2026/02/20/anthropic_clarifies_ban_third_party_claude_access/

Token Overhead: Problem & Solution

The 50K Problem (default)

Each claude -p invocation loads everything from the user environment:

| Component | Tokens |
| --- | --- |
| System prompt | ~3,200 |
| 18 built-in tools | ~11,600 |
| MCP tools (Chrome, Gmail, Drive, etc.) | 10,000-32,000+ |
| CLAUDE.md + settings | ~5,000 |
| Skills/plugins | ~1,000+ |
| Total | ~30-50K+ |

Source: https://github.com/Piebald-AI/claude-code-system-prompts

The 3-5K Solution (verified)

Strip everything the bot doesn't need using CLI flags:

```shell
claude -p \
  --system-prompt "Custom bot prompt here" \
  --tools "Bash" \
  --setting-sources "" \
  --strict-mcp-config \
  --mcp-config /path/to/empty-mcp.json \
  --disable-slash-commands \
  --no-session-persistence \
  --dangerously-skip-permissions \
  "User message here"
```
| Flag | What it strips | Savings |
| --- | --- | --- |
| --system-prompt "..." | Default 3.2K Claude Code system prompt | ~3K |
| --tools "Bash" | 17 of 18 built-in tool schemas | ~10K |
| --setting-sources "" | All CLAUDE.md files + settings.json | ~5K |
| --strict-mcp-config + empty JSON | ALL MCP servers (Chrome, Gmail, etc.) | ~10-32K |
| --disable-slash-commands | All 50+ skill definitions | ~1K+ |
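The empty MCP config file passed via --mcp-config is just an object with no servers, e.g.:

```json
{
  "mcpServers": {}
}
```

Combined with --strict-mcp-config, this tells the CLI to ignore every MCP server configured elsewhere in the environment.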

DEV.to benchmark confirmed 10x reduction (50K → 5K per turn): https://dev.to/jungjaehoon/why-claude-code-subagents-waste-50k-tokens-per-turn-and-how-to-fix-it-41ma

Additional optimization: ENABLE_TOOL_SEARCH=auto:0 in settings defers MCP tool schemas to on-demand loading, saving 32K tokens: https://paddo.dev/blog/claude-code-hidden-mcp-flag/
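Assuming that flag is set through the standard settings `env` map, the settings fragment would look something like this (a sketch, not verified against the current schema):

```json
{
  "env": {
    "ENABLE_TOOL_SEARCH": "auto:0"
  }
}
```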

Even Better: Claude Agent SDK (persistent process)

The Python SDK (pip install claude-agent-sdk) keeps a single subprocess alive across messages:

```python
import asyncio

from claude_agent_sdk import ClaudeSDKClient, ClaudeAgentOptions

options = ClaudeAgentOptions(
    system_prompt="Custom bot prompt",
    allowed_tools=["Bash"],              # only Bash; drops the other tool schemas
    setting_sources=[],                  # skip CLAUDE.md + settings.json
    permission_mode="bypassPermissions",
)

async def main():
    async with ClaudeSDKClient(options=options) as client:
        # First message pays the startup cost (~5K with optimization)
        await client.query("User message")
        async for message in client.receive_response():
            pass  # stream/collect the reply back to Telegram here
        # Subsequent messages reuse the session with minimal overhead
        await client.query("Follow-up")

asyncio.run(main())
```

Benefits:

- No subprocess spawn per message
- Session context maintained
- Prompt caching reduces repeated content to 10% cost
- Source: https://platform.claude.com/docs/en/agent-sdk/python

Stream-JSON persistent process (raw CLI alternative)

```shell
claude -p \
  --input-format stream-json \
  --output-format stream-json \
  --session-id "bot-session" \
  [optimization flags above]
```

Keep process alive, pipe messages through stdin. System prompt loaded once.
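A sketch of the stdin framing, assuming the CLI's documented stream-json message shape (verify the exact schema against current docs before relying on it):

```python
import json

def user_message(text: str) -> str:
    """Serialize one user turn as a single stream-json line for stdin."""
    event = {
        "type": "user",
        "message": {
            "role": "user",
            "content": [{"type": "text", "text": text}],
        },
    }
    return json.dumps(event) + "\n"

# Bot loop (sketch): write one line per Telegram message to the
# long-lived process and read stream-json replies from its stdout:
#   proc.stdin.write(user_message(update.text).encode())
#   proc.stdin.flush()
```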

Max 20x Token Budget

| Scenario | Tokens/msg | Messages per 5hr window | Messages per day |
| --- | --- | --- | --- |
| Unoptimized (50K) | ~50,000 | ~4 | ~24 |
| Optimized (5K) | ~5,000 | ~44 | ~264 |
| Minimal (text-only) | ~3,000 | ~73 | ~438 |

Eric's actual usage: 5-20 messages/day. Even at 20 messages, the optimized approach uses under 10% of daily capacity.
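For reference, the table's figures are consistent with a budget of roughly 220K tokens per 5-hour window and six windows counted per day (both values inferred from the table itself, not published limits):

```python
WINDOW_TOKENS = 220_000   # inferred per-5-hour-window budget (assumption)
WINDOWS_PER_DAY = 6       # the table's daily column is the window figure x 6

def capacity(tokens_per_msg: int) -> tuple[int, int]:
    """(messages per window, messages per day) at a given per-message cost."""
    per_window = WINDOW_TOKENS // tokens_per_msg
    return per_window, per_window * WINDOWS_PER_DAY

for label, cost in [("Unoptimized", 50_000), ("Optimized", 5_000),
                    ("Minimal", 3_000)]:
    per_window, per_day = capacity(cost)
    print(f"{label}: ~{per_window}/window, ~{per_day}/day")
```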

Options Summary (Updated)

| Option | Monthly Cost | Speed | Effort | Token Efficiency |
| --- | --- | --- | --- | --- |
| A: Claude Agent SDK on droplet | $0 (Max covers it) | Medium | Significant rewrite | 5K/msg optimized |
| B: Add credit card to API | ~$3-9/mo | Fast (direct API) | Zero code changes | N/A (pay per token) |
| C: claude -p subprocess | $0 (Max covers it) | Slower (spawn per msg) | Moderate rewrite | 5K/msg optimized |
| D: Local LLM on Mac Mini | $599-1199 hardware | Varies | Major effort | Unlimited but lower quality |

Recommendation

Option A (Claude Agent SDK) is the best long-term play if we're rewriting anyway. Persistent process avoids spawn overhead, Max subscription covers all costs, and the Python API is clean.

Option B (credit card) is the fastest fix with zero code changes. The skills integration already works. $3-9/mo is trivial.

Architecture: Current vs Claude Code Approach

Current Bot (API direct)

Claude Code Approach

Key Tradeoff

The current bot's 14 custom tools are well-designed and efficient. Rewriting to use Claude Code means either:

1. Accepting that Claude will use Bash + raw SSH/gh commands (less structured, more token-heavy)
2. Setting up MCP servers for WordPress + GitHub (adds back MCP overhead)

Reference Projects

| Project | Approach | Optimization | Notes |
| --- | --- | --- | --- |
| Claudegram | Agent SDK | None documented | Full tool access, session resume |
| claude-code-telegram | SDK + CLI fallback | CLAUDE_ALLOWED_TOOLS | Per-user spending limits |
| Ductor | CLI subprocess | None documented | Docker sandboxing, cron jobs |
| claude-telegram-relay | CLI subprocess | None documented | Minimal, cross-platform |

Vision: Full Dev Assistant via Telegram

The ideal workflow Eric wants:

1. Client emails about a bug
2. Eric messages the bot via Telegram
3. Bot (Claude Code) SSHs to production, reads files, identifies the issue
4. Creates a branch, fixes the code, pushes to GitHub
5. Optionally deploys the fix to production via SSH
6. Reports back in Telegram with PR link and summary

This requires Claude Code (not just API calls) — meaning web search, file editing, Bash, Git, all built-in tools. The current $4 VPS (512MB RAM) can't run Claude Code.

Mac Mini vs Bigger VPS

| Factor | Mac Mini ($599 M4 16GB) | Bigger VPS ($24/mo 4GB) |
| --- | --- | --- |
| Claude Code | Yes | Yes |
| Max subscription auth | Yes | Yes |
| Local by Flywheel | Yes (macOS) | No (Linux) |
| Browser automation | Yes | No (no display) |
| Monthly cost | $0 (one-time purchase) | $24/mo |
| Networking | Cloudflare Tunnel (free) | Static IP included |
| Uptime | Depends on home power/ISP | 99.9% SLA |

Current direction: Mac Mini is the long-term play. Holding until ready to purchase.

Existing Telegram + Claude Code Projects

Current State of the Bot

The skills integration we added to the system prompt WORKS — the bot successfully fetched wordpress-modules/SKILL.md before answering a CalForever question. It hit the API rate limit (10K tokens/min on free tier with no credit card) before it could send the response.

Quick fix if needed: Add a credit card to console.anthropic.com to unlock rate limits. Zero code changes, bot works as-is with skills.

Decision

Holding on the Mac Mini migration. Research is complete and documented. When ready:

- [ ] Purchase Mac Mini (M4, 16GB minimum — 24GB recommended)
- [ ] Set up Cloudflare Tunnel for Telegram webhook access
- [ ] Install Claude Code, authenticate with Max subscription
- [ ] Use Claudegram or similar as the Telegram bridge
- [ ] Migrate SSH keys from droplet
- [ ] Optimize Claude Code invocations (see token optimization section above)
- [ ] Test the full bug-fix workflow end to end

Short-term option: Add credit card to console.anthropic.com to unblock the current bot + skills integration.