Max Subscription vs API for Telegram Bot

Date: 2026-02-28 Status: Research complete — ready for decision

The Problem

The Telegram bot (@heyroyalbot) uses the Anthropic API directly (claude-sonnet-4-20250514). The API key is on Eric's Individual Org at console.anthropic.com with $4.68 free credits, no credit card, and a brutal 10K input tokens/minute rate limit on the free tier.

After adding a skills integration (bot fetches skill files from GitHub before acting), the bot hit the rate limit during a test — multiple tool calls + large skill file content exceeded 10K tokens/minute. The bot errored out after escalating retries (11s → 36s → 56s → failure).
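The escalating waits are classic exponential backoff before giving up. A minimal sketch of that retry pattern (names here are illustrative stand-ins, not the bot's actual code):

```python
import random
import time

class RateLimitError(Exception):
    """Stand-in for the API client's rate-limit (HTTP 429) exception."""

def call_with_backoff(make_request, max_retries=4, base_delay=10.0):
    """Retry a zero-argument callable with jittered exponential backoff.

    Delays of roughly 10s, 20s, 40s plus jitter approximate the
    11s/36s/56s escalation the bot logged before failing.
    """
    for attempt in range(max_retries):
        try:
            return make_request()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the error, as the bot did
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
```

Backoff only smooths over brief spikes; with a hard 10K tokens/minute cap and 10K+ token requests, retries eventually fail regardless.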

Key Finding: Max Plan ≠ API Access

Eric pays $200/month for Claude Max 20x. But Max and the API are completely separate products: the Max subscription covers claude.ai and Claude Code, while API usage bills separately against console.anthropic.com credits.

The Exception: Claude Code CLI (claude -p)

Claude Code's -p flag is designed for scripted/automated use and authenticates via the Max subscription's OAuth login. Multiple projects already run Telegram bots this way (see Reference Projects below).
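The basic pattern is one short-lived subprocess per message. A minimal sketch, assuming the CLI is already logged in to the Max account (prompt and flag values are placeholders):

```python
import subprocess

def build_cmd(prompt: str) -> list[str]:
    """Argv for one non-interactive turn; --output-format text keeps
    stdout free of JSON framing."""
    return ["claude", "-p", prompt, "--output-format", "text"]

def ask_claude(prompt: str, timeout: int = 120) -> str:
    """Spawn the CLI, which authenticates via its own Max-subscription
    OAuth login, so no API key is involved."""
    result = subprocess.run(build_cmd(prompt), capture_output=True,
                            text=True, timeout=timeout, check=True)
    return result.stdout.strip()
```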

ToS Status

Per Anthropic's terms and the Feb 2026 clarification:

- Allowed: Spawning `claude -p` as a subprocess (it's Anthropic's own product)
- Prohibited: Extracting OAuth tokens and making raw API calls
- Gray area: Heavy automated use may exceed "ordinary individual usage"
- Source: https://autonomee.ai/blog/claude-code-terms-of-service-explained/
- Source: https://www.theregister.com/2026/02/20/anthropic_clarifies_ban_third_party_claude_access/

Token Overhead: Problem & Solution

The 50K Problem (default)

Each claude -p invocation loads everything from the user environment:

| Component | Tokens |
| --- | --- |
| System prompt | ~3,200 |
| 18 built-in tools | ~11,600 |
| MCP tools (Chrome, Gmail, Drive, etc.) | 10,000-32,000+ |
| CLAUDE.md + settings | ~5,000 |
| Skills/plugins | ~1,000+ |
| Total | ~30-50K+ |

Source: https://github.com/Piebald-AI/claude-code-system-prompts

The 3-5K Solution (verified)

Strip everything the bot doesn't need using CLI flags:

```shell
claude -p \
  --system-prompt "Custom bot prompt here" \
  --tools "Bash" \
  --setting-sources "" \
  --strict-mcp-config \
  --mcp-config /path/to/empty-mcp.json \
  --disable-slash-commands \
  --no-session-persistence \
  --dangerously-skip-permissions \
  "User message here"
```
| Flag | What it strips | Savings |
| --- | --- | --- |
| --system-prompt "..." | Default 3.2K Claude Code system prompt | ~3K |
| --tools "Bash" | 17 of 18 built-in tool schemas | ~10K |
| --setting-sources "" | All CLAUDE.md files + settings.json | ~5K |
| --strict-mcp-config + empty JSON | ALL MCP servers (Chrome, Gmail, etc.) | ~10-32K |
| --disable-slash-commands | All 50+ skill definitions | ~1K+ |
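The empty MCP config file passed via --mcp-config is just an object with no servers, e.g.:

```json
{
  "mcpServers": {}
}
```

Combined with --strict-mcp-config, this tells the CLI to ignore every MCP server configured elsewhere in the environment.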

DEV.to benchmark confirmed 10x reduction (50K → 5K per turn): https://dev.to/jungjaehoon/why-claude-code-subagents-waste-50k-tokens-per-turn-and-how-to-fix-it-41ma

Additional optimization: ENABLE_TOOL_SEARCH=auto:0 in settings defers MCP tool schemas to on-demand loading, saving 32K tokens: https://paddo.dev/blog/claude-code-hidden-mcp-flag/
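Assuming that flag is set through the standard settings `env` map, the settings fragment would look something like this (a sketch, not verified against the current schema):

```json
{
  "env": {
    "ENABLE_TOOL_SEARCH": "auto:0"
  }
}
```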

Even Better: Claude Agent SDK (persistent process)

The Python SDK (pip install claude-agent-sdk) keeps a single subprocess alive across messages:

```python
import asyncio

from claude_agent_sdk import ClaudeSDKClient, ClaudeAgentOptions

options = ClaudeAgentOptions(
    system_prompt="Custom bot prompt",
    allowed_tools=["Bash"],              # only Bash; drops the other tool schemas
    setting_sources=[],                  # skip CLAUDE.md + settings.json
    permission_mode="bypassPermissions",
)

async def main():
    async with ClaudeSDKClient(options=options) as client:
        # First message pays the startup cost (~5K with optimization)
        await client.query("User message")
        async for message in client.receive_response():
            pass  # stream/collect the reply back to Telegram here
        # Subsequent messages reuse the session with minimal overhead
        await client.query("Follow-up")

asyncio.run(main())
```

Benefits:

- No subprocess spawn per message
- Session context maintained
- Prompt caching reduces repeated content to 10% cost
- Source: https://platform.claude.com/docs/en/agent-sdk/python

Stream-JSON persistent process (raw CLI alternative)

```shell
claude -p \
  --input-format stream-json \
  --output-format stream-json \
  --session-id "bot-session" \
  [optimization flags above]
```

Keep process alive, pipe messages through stdin. System prompt loaded once.
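A sketch of the stdin framing, assuming the CLI's documented stream-json message shape (verify the exact schema against current docs before relying on it):

```python
import json

def user_message(text: str) -> str:
    """Serialize one user turn as a single stream-json line for stdin."""
    event = {
        "type": "user",
        "message": {
            "role": "user",
            "content": [{"type": "text", "text": text}],
        },
    }
    return json.dumps(event) + "\n"

# Bot loop (sketch): write one line per Telegram message to the
# long-lived process and read stream-json replies from its stdout:
#   proc.stdin.write(user_message(update.text).encode())
#   proc.stdin.flush()
```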

Max 20x Token Budget

| Scenario | Tokens/msg | Messages per 5hr window | Messages per day |
| --- | --- | --- | --- |
| Unoptimized (50K) | ~50,000 | ~4 | ~24 |
| Optimized (5K) | ~5,000 | ~44 | ~264 |
| Minimal (text-only) | ~3,000 | ~73 | ~438 |

Eric's actual usage: 5-20 messages/day. Even at 20 messages, the optimized approach uses under 10% of daily capacity.
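For reference, the table's figures are consistent with a budget of roughly 220K tokens per 5-hour window and six windows counted per day (both values inferred from the table itself, not published limits):

```python
WINDOW_TOKENS = 220_000   # inferred per-5-hour-window budget (assumption)
WINDOWS_PER_DAY = 6       # the table's daily column is the window figure x 6

def capacity(tokens_per_msg: int) -> tuple[int, int]:
    """(messages per window, messages per day) at a given per-message cost."""
    per_window = WINDOW_TOKENS // tokens_per_msg
    return per_window, per_window * WINDOWS_PER_DAY

for label, cost in [("Unoptimized", 50_000), ("Optimized", 5_000),
                    ("Minimal", 3_000)]:
    per_window, per_day = capacity(cost)
    print(f"{label}: ~{per_window}/window, ~{per_day}/day")
```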

Options Summary (Updated)

| Option | Monthly Cost | Speed | Effort | Token Efficiency |
| --- | --- | --- | --- | --- |
| A: Claude Agent SDK on droplet | $0 (Max covers it) | Medium | Significant rewrite | 5K/msg optimized |
| B: Add credit card to API | ~$3-9/mo | Fast (direct API) | Zero code changes | N/A (pay per token) |
| C: claude -p subprocess | $0 (Max covers it) | Slower (spawn per msg) | Moderate rewrite | 5K/msg optimized |
| D: Local LLM on Mac Mini | $599-1199 hardware | Varies | Major effort | Unlimited but lower quality |

Recommendation

Option A (Claude Agent SDK) is the best long-term play if we're rewriting anyway. Persistent process avoids spawn overhead, Max subscription covers all costs, and the Python API is clean.

Option B (credit card) is the fastest fix with zero code changes. The skills integration already works. $3-9/mo is trivial.

Architecture: Current vs Claude Code Approach

Current Bot (API direct)

Claude Code Approach

Key Tradeoff

The current bot's 14 custom tools are well-designed and efficient. Rewriting to use Claude Code means either:

1. Accepting that Claude will use Bash + raw SSH/gh commands (less structured, more token-heavy)
2. Setting up MCP servers for WordPress + GitHub (adds back MCP overhead)

Reference Projects

| Project | Approach | Optimization | Notes |
| --- | --- | --- | --- |
| Claudegram | Agent SDK | None documented | Full tool access, session resume |
| claude-code-telegram | SDK + CLI fallback | CLAUDE_ALLOWED_TOOLS | Per-user spending limits |
| Ductor | CLI subprocess | None documented | Docker sandboxing, cron jobs |
| claude-telegram-relay | CLI subprocess | None documented | Minimal, cross-platform |

Vision: Full Dev Assistant via Telegram

The ideal workflow Eric wants:

1. Client emails about a bug
2. Eric messages the bot via Telegram
3. Bot (Claude Code) SSHs to production, reads files, identifies the issue
4. Creates a branch, fixes the code, pushes to GitHub
5. Optionally deploys the fix to production via SSH
6. Reports back in Telegram with PR link and summary

This requires Claude Code (not just API calls) — meaning web search, file editing, Bash, Git, all built-in tools. The current $4 VPS (512MB RAM) can't run Claude Code.

Mac Mini vs Bigger VPS

| Factor | Mac Mini ($599 M4 16GB) | Bigger VPS ($24/mo 4GB) |
| --- | --- | --- |
| Claude Code | Yes | Yes |
| Max subscription auth | Yes | Yes |
| Local by Flywheel | Yes (macOS) | No (Linux) |
| Browser automation | Yes | No (no display) |
| Monthly cost | $0 (one-time purchase) | $24/mo |
| Networking | Cloudflare Tunnel (free) | Static IP included |
| Uptime | Depends on home power/ISP | 99.9% SLA |

Current direction: Mac Mini is the long-term play. Holding until ready to purchase.

Existing Telegram + Claude Code Projects

Current State of the Bot

The skills integration we added to the system prompt WORKS — the bot successfully fetched wordpress-modules/SKILL.md before answering a CalForever question. It hit the API rate limit (10K tokens/min on free tier with no credit card) before it could send the response.

Quick fix if needed: Add a credit card to console.anthropic.com to unlock rate limits. Zero code changes, bot works as-is with skills.

Decision

Holding on the Mac Mini migration. Research is complete and documented. When ready:

- [ ] Purchase Mac Mini (M4, 16GB minimum — 24GB recommended)
- [ ] Set up Cloudflare Tunnel for Telegram webhook access
- [ ] Install Claude Code, authenticate with Max subscription
- [ ] Use Claudegram or similar as the Telegram bridge
- [ ] Migrate SSH keys from droplet
- [ ] Optimize Claude Code invocations (see token optimization section above)
- [ ] Test the full bug-fix workflow end to end

Short-term option: Add credit card to console.anthropic.com to unblock the current bot + skills integration.