# Max Subscription vs API for Telegram Bot

**Date:** 2026-02-28
**Status:** Research complete — ready for decision
## The Problem
The Telegram bot (@heyroyalbot) uses the Anthropic API directly (claude-sonnet-4-20250514). The API key is on Eric's Individual Org at console.anthropic.com with $4.68 free credits, no credit card, and a brutal 10K input tokens/minute rate limit on the free tier.
After adding a skills integration (bot fetches skill files from GitHub before acting), the bot hit the rate limit during a test — multiple tool calls + large skill file content exceeded 10K tokens/minute. The bot errored out after escalating retries (11s → 36s → 56s → failure).
## Key Finding: Max Plan ≠ API Access
Eric pays $200/month for Claude Max 20x. But Max and the API are completely separate products:
- Max powers: claude.ai, desktop app, Claude Code
- API powers: external apps via console.anthropic.com
- Anthropic's own help center: "A paid Claude subscription doesn't include access to the Claude API or Console"
- Source: https://support.claude.com/en/articles/9876003
## The Exception: Claude Code CLI (`claude -p`)
Claude Code's `-p` flag is designed for scripted/automated use and authenticates via Max subscription OAuth. Multiple projects already run Telegram bots this way:
- Claudegram — https://github.com/NachoSEO/claudegram
- claude-code-telegram — https://github.com/RichardAtCT/claude-code-telegram
- Ductor — https://github.com/PleasePrompto/ductor
- claude-telegram-relay — https://github.com/godagoo/claude-telegram-relay
### ToS Status
Per Anthropic's terms and the Feb 2026 clarification:
- Allowed: Spawning claude -p as a subprocess (it's Anthropic's own product)
- Prohibited: Extracting OAuth tokens and making raw API calls
- Gray area: Heavy automated use may exceed "ordinary individual usage"
- Source: https://autonomee.ai/blog/claude-code-terms-of-service-explained/
- Source: https://www.theregister.com/2026/02/20/anthropic_clarifies_ban_third_party_claude_access/
## Token Overhead: Problem & Solution

### The 50K Problem (default)

Each `claude -p` invocation loads everything from the user environment:
| Component | Tokens |
|---|---|
| System prompt | ~3,200 |
| 18 built-in tools | ~11,600 |
| MCP tools (Chrome, Gmail, Drive, etc.) | 10,000-32,000+ |
| CLAUDE.md + settings | ~5,000 |
| Skills/plugins | ~1,000+ |
| Total | ~30-50K+ |
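As a sanity check, the per-component numbers in the table do sum to roughly the quoted total (the MCP figure is the variable that swings the range):

```python
# Per-component token overhead, taken from the table above.
system_prompt = 3_200
builtin_tools = 11_600
mcp_tools_low, mcp_tools_high = 10_000, 32_000  # varies with installed servers
claude_md = 5_000
skills = 1_000

low = system_prompt + builtin_tools + mcp_tools_low + claude_md + skills
high = system_prompt + builtin_tools + mcp_tools_high + claude_md + skills
print(low, high)  # 30800 52800, matching the "~30-50K+" total row
```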
Source: https://github.com/Piebald-AI/claude-code-system-prompts
### The 3-5K Solution (verified)

Strip everything the bot doesn't need using CLI flags:
```bash
claude -p \
  --system-prompt "Custom bot prompt here" \
  --tools "Bash" \
  --setting-sources "" \
  --strict-mcp-config \
  --mcp-config /path/to/empty-mcp.json \
  --disable-slash-commands \
  --no-session-persistence \
  --dangerously-skip-permissions \
  "User message here"
```
| Flag | What it strips | Savings |
|---|---|---|
| `--system-prompt "..."` | Default 3.2K Claude Code system prompt | ~3K |
| `--tools "Bash"` | 17 of 18 built-in tool schemas | ~10K |
| `--setting-sources ""` | All CLAUDE.md files + settings.json | ~5K |
| `--strict-mcp-config` + empty JSON | ALL MCP servers (Chrome, Gmail, etc.) | ~10-32K |
| `--disable-slash-commands` | All 50+ skill definitions | ~1K+ |
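The empty MCP config passed via `--mcp-config` just needs to register zero servers. A minimal sketch, assuming the standard `mcpServers` top-level key used by Claude's MCP config files:

```python
import json
from pathlib import Path

# Write an MCP config that declares no servers, so --strict-mcp-config
# loads zero MCP tool schemas into the context window.
Path("empty-mcp.json").write_text(json.dumps({"mcpServers": {}}))

print(Path("empty-mcp.json").read_text())  # {"mcpServers": {}}
```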
A DEV.to benchmark confirmed a 10x reduction (50K → 5K per turn): https://dev.to/jungjaehoon/why-claude-code-subagents-waste-50k-tokens-per-turn-and-how-to-fix-it-41ma

Additional optimization: `ENABLE_TOOL_SEARCH=auto:0` in settings defers MCP tool schemas to on-demand loading, saving ~32K tokens: https://paddo.dev/blog/claude-code-hidden-mcp-flag/
### Even Better: Claude Agent SDK (persistent process)

The Python SDK (`pip install claude-agent-sdk`) keeps a single subprocess alive across messages:
```python
import asyncio

from claude_agent_sdk import ClaudeSDKClient, ClaudeAgentOptions

options = ClaudeAgentOptions(
    system_prompt="Custom bot prompt",
    tools=["Bash"],
    setting_sources=[],
    permission_mode="bypassPermissions",
)

async def main():
    async with ClaudeSDKClient(options=options) as client:
        # First message pays startup cost (~5K with optimization)
        await client.query("User message")
        # Subsequent messages reuse the session — minimal overhead
        await client.query("Follow-up")

asyncio.run(main())
```
Benefits:

- No subprocess spawn per message
- Session context maintained
- Prompt caching reduces repeated content to 10% cost

Source: https://platform.claude.com/docs/en/agent-sdk/python
### Stream-JSON persistent process (raw CLI alternative)

```bash
claude -p \
  --input-format stream-json \
  --output-format stream-json \
  --session-id "bot-session" \
  [optimization flags above]
```

Keep the process alive and pipe messages through stdin; the system prompt is loaded once.
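For illustration, a sketch of the framing the bot would do around that long-lived process. The event shapes here (a `user` message object per input line, `assistant` events on output) are an assumption about the stream-json format, not verified against the CLI; the code only builds and parses the newline-delimited JSON frames, it does not spawn the real binary:

```python
import json

def make_user_frame(text: str) -> str:
    """Build one newline-delimited JSON frame to write to the CLI's stdin.

    Schema is assumed: a "user" message object per line.
    """
    frame = {
        "type": "user",
        "message": {"role": "user", "content": [{"type": "text", "text": text}]},
    }
    return json.dumps(frame) + "\n"

def extract_text(line: str):
    """Pull assistant text out of one stream-json output line, if any."""
    event = json.loads(line)
    if event.get("type") != "assistant":
        return None
    parts = event["message"]["content"]
    return "".join(p["text"] for p in parts if p.get("type") == "text")

print(make_user_frame("User message here"), end="")
```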
## Max 20x Token Budget
| Scenario | Tokens/msg | Messages per 5hr window | Messages per day |
|---|---|---|---|
| Unoptimized (50K) | ~50,000 | ~4 | ~24 |
| Optimized (5K) | ~5,000 | ~44 | ~264 |
| Minimal (text-only) | ~3,000 | ~73 | ~438 |
Eric's actual usage: 5-20 messages/day. Even at the high end, the optimized approach uses only a few percent of daily capacity.
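The table's arithmetic can be reproduced from two assumptions inferred from its rows: roughly a 220K-token budget per 5-hour window, and six windows counted per day. Both figures are inferences from the table, not published Anthropic numbers:

```python
WINDOW_BUDGET = 220_000   # assumed tokens per 5-hour window (inferred from the table)
WINDOWS_PER_DAY = 6       # assumed; matches the table's per-day column

def capacity(tokens_per_msg: int) -> tuple[int, int]:
    """Messages per 5-hour window and per day at a given per-message cost."""
    per_window = WINDOW_BUDGET // tokens_per_msg
    return per_window, per_window * WINDOWS_PER_DAY

print(capacity(50_000))  # (4, 24)    unoptimized
print(capacity(5_000))   # (44, 264)  optimized
print(capacity(3_000))   # (73, 438)  minimal text-only
```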
## Options Summary (Updated)
| Option | Monthly Cost | Speed | Effort | Token Efficiency |
|---|---|---|---|---|
| A: Claude Agent SDK on droplet | $0 (Max covers it) | Medium | Significant rewrite | 5K/msg optimized |
| B: Add credit card to API | ~$3-9/mo | Fast (direct API) | Zero code changes | N/A (pay per token) |
| C: `claude -p` subprocess | $0 (Max covers it) | Slower (spawn per msg) | Moderate rewrite | 5K/msg optimized |
| D: Local LLM on Mac Mini | $599-1199 hardware | Varies | Major effort | Unlimited but lower quality |
## Recommendation
Option A (Claude Agent SDK) is the best long-term play if we're rewriting anyway. Persistent process avoids spawn overhead, Max subscription covers all costs, and the Python API is clean.
Option B (credit card) is the fastest fix with zero code changes. The skills integration already works. $3-9/mo is trivial.
## Architecture: Current vs Claude Code Approach

### Current Bot (API direct)
- 14 custom tools (GitHub read/write, WordPress SSH)
- Direct Anthropic SDK calls
- Tool schemas defined in bot.py
- Fast, efficient, full control
### Claude Code Approach

- Bot becomes a thin Telegram ↔ Claude Code bridge
- Claude Code's built-in `Bash` tool replaces custom SSH tools (run ssh commands directly)
- GitHub tools replaced by the `gh` CLI via Bash
- Skills fetched via `github_read_file` no longer needed — could use CLAUDE.md or `--system-prompt` instead
- BUT: loses the tight custom tool definitions and clean error handling
### Key Tradeoff
The current bot's 14 custom tools are well-designed and efficient. Rewriting to use Claude Code means either:
1. Accepting that Claude will use Bash + raw SSH/gh commands (less structured, more token-heavy)
2. Setting up MCP servers for WordPress + GitHub (adds back MCP overhead)
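To make option 1 concrete, here is a hypothetical sketch of how two of the bot's custom tools might collapse into raw CLI one-liners under the Bash-only approach. The tool name `github_read_file` comes from the current bot; the repo/host names and exact command shapes are placeholders and assumptions, not the bot's real configuration:

```python
import shlex

def github_read_file(repo: str, path: str) -> str:
    # One common shape: gh's repo-contents endpoint, base64-decoded.
    return f"gh api repos/{repo}/contents/{shlex.quote(path)} --jq .content | base64 -d"

def wordpress_ssh(host: str, command: str) -> str:
    # Raw SSH replaces the structured WordPress tool call.
    return f"ssh {host} {shlex.quote(command)}"

print(github_read_file("eric/skills", "wordpress-modules/SKILL.md"))
print(wordpress_ssh("wp-prod", "wp plugin list --status=active"))
```

The point of the sketch: Claude composes these strings freehand at inference time, so every invocation spends tokens re-deriving what a custom tool schema encoded once, and errors surface as raw shell output instead of structured results.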
## Reference Projects
| Project | Approach | Optimization | Notes |
|---|---|---|---|
| Claudegram | Agent SDK | None documented | Full tool access, session resume |
| claude-code-telegram | SDK + CLI fallback | `CLAUDE_ALLOWED_TOOLS` | Per-user spending limits |
| Ductor | CLI subprocess | None documented | Docker sandboxing, cron jobs |
| claude-telegram-relay | CLI subprocess | None documented | Minimal, cross-platform |
## Vision: Full Dev Assistant via Telegram

The ideal workflow Eric wants:

1. Client emails about a bug
2. Eric messages the bot via Telegram
3. Bot (Claude Code) SSHs to production, reads files, identifies the issue
4. Creates a branch, fixes the code, pushes to GitHub
5. Optionally deploys the fix to production via SSH
6. Reports back in Telegram with PR link and summary
This requires Claude Code (not just API calls) — meaning web search, file editing, Bash, Git, all built-in tools. The current $4 VPS (512MB RAM) can't run Claude Code.
## Mac Mini vs Bigger VPS
| Factor | Mac Mini ($599 M4 16GB) | Bigger VPS ($24/mo 4GB) |
|---|---|---|
| Claude Code | Yes | Yes |
| Max subscription auth | Yes | Yes |
| Local by Flywheel | Yes (macOS) | No (Linux) |
| Browser automation | Yes | No (no display) |
| Monthly cost | $0 (one-time purchase) | $24/mo |
| Networking | Cloudflare Tunnel (free) | Static IP included |
| Uptime | Depends on home power/ISP | 99.9% SLA |
Current direction: Mac Mini is the long-term play. Holding until ready to purchase.
## Existing Telegram + Claude Code Projects
- Claudegram — https://github.com/NachoSEO/claudegram (Agent SDK, full tool access)
- claude-code-telegram — https://github.com/RichardAtCT/claude-code-telegram (SDK + CLI)
- Ductor — https://github.com/PleasePrompto/ductor (streaming, Docker)
- claude-telegram-relay — https://github.com/godagoo/claude-telegram-relay (minimal)
## Current State of the Bot

The skills integration we added to the system prompt WORKS — the bot successfully fetched wordpress-modules/SKILL.md before answering a CalForever question, but hit the API rate limit (10K tokens/min on the free tier with no credit card) before it could send the response.
Quick fix if needed: Add a credit card to console.anthropic.com to unlock rate limits. Zero code changes, bot works as-is with skills.
## Decision

Holding on the Mac Mini migration. Research is complete and documented. When ready:

- [ ] Purchase Mac Mini (M4, 16GB minimum — 24GB recommended)
- [ ] Set up Cloudflare Tunnel for Telegram webhook access
- [ ] Install Claude Code, authenticate with Max subscription
- [ ] Use Claudegram or similar as the Telegram bridge
- [ ] Migrate SSH keys from droplet
- [ ] Optimize Claude Code invocations (see token optimization section above)
- [ ] Test the full bug-fix workflow end to end
Short-term option: Add credit card to console.anthropic.com to unblock the current bot + skills integration.