OpenClaw Token Usage: How to Monitor and Reduce API Costs

If you’re running OpenClaw for your daily batch processing or real-time inference tasks, you’ve likely seen those API bills climb. It’s easy to get lost in the abstraction of “tokens” until the monthly statement hits. The problem isn’t just about the raw number of requests; it’s about the efficiency of those requests. Without a clear picture of token usage per model, per task, and over time, you’re effectively flying blind when it comes to cost optimization.


Understanding OpenClaw’s Token Reporting

OpenClaw, by default, provides some basic token usage information in its verbose logs, but it’s often not granular enough for real cost analysis. When you execute a command like openclaw process --config my_batch_job.yaml, you’ll see summary lines if your log level is set appropriately. For instance, an entry might look like this:

[INFO] [2024-07-23 10:35:12] [claude-haiku-4-5] Request complete. Prompt: 1200 tokens, Completion: 300 tokens. Total: 1500 tokens. Cost: $0.00315.

This is useful for individual requests, but aggregating this across hundreds or thousands of calls, potentially with different models, is manual and error-prone. The real challenge comes when you want to see which specific parts of your prompts are consuming the most tokens, or if a particular task is consistently over-budget. OpenClaw’s built-in metrics, while present, aren’t exposed in a way that allows for easy, real-time aggregation and visualization without some extra plumbing.
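Before wiring up anything fancier, you can get a feel for the problem by scraping those summary lines yourself. Here's a minimal sketch that parses the log format shown above (the format is assumed from that single example and may differ across OpenClaw versions) and totals tokens and cost per model:

```python
import re
from collections import defaultdict

# Regex matching the summary line format shown above (assumed stable).
LINE_RE = re.compile(
    r"\[(?P<model>[\w.-]+)\] Request complete\. "
    r"Prompt: (?P<prompt>\d+) tokens, Completion: (?P<completion>\d+) tokens\. "
    r"Total: (?P<total>\d+) tokens\. Cost: \$(?P<cost>[\d.]+)\."
)

def aggregate_log(lines):
    """Sum total tokens and cost per model from verbose log lines."""
    totals = defaultdict(lambda: {"tokens": 0, "cost": 0.0})
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            totals[m.group("model")]["tokens"] += int(m.group("total"))
            totals[m.group("model")]["cost"] += float(m.group("cost"))
    return dict(totals)

sample = ("[INFO] [2024-07-23 10:35:12] [claude-haiku-4-5] Request complete. "
          "Prompt: 1200 tokens, Completion: 300 tokens. "
          "Total: 1500 tokens. Cost: $0.00315.")
print(aggregate_log([sample]))
```

This works for a one-off audit of an existing log file, but it's exactly the kind of manual plumbing that breaks down at scale, which is why the callback approach below is worth the setup.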

Setting Up Custom Token Monitoring

To get a better handle on costs, we need to capture this data systematically. OpenClaw offers a callback mechanism that can be leveraged. We’ll set up a simple local Redis instance to store token data. This isn’t production-grade telemetry, but for a single VPS or a small cluster, it’s incredibly effective and lightweight. First, ensure Redis is installed and running on your system:

sudo apt update
sudo apt install redis-server
sudo systemctl enable redis-server
sudo systemctl start redis-server

Next, we need to configure OpenClaw to send token events to a custom script. Create a new Python file, say ~/.openclaw/callbacks/token_logger.py:

import json
import redis
import os
from datetime import datetime

# Initialize Redis client
REDIS_HOST = os.getenv('OPENCLAW_REDIS_HOST', 'localhost')
REDIS_PORT = int(os.getenv('OPENCLAW_REDIS_PORT', 6379))
REDIS_DB = int(os.getenv('OPENCLAW_REDIS_DB', 0))

try:
    r = redis.Redis(host=REDIS_HOST, port=REDIS_PORT, db=REDIS_DB)
    r.ping() # Test connection
except redis.exceptions.ConnectionError as e:
    print(f"ERROR: Could not connect to Redis: {e}")
    r = None # Disable Redis logging if connection fails

def on_request_complete(data):
    """
    Callback function executed by OpenClaw after each model request.
    Data contains: model_name, prompt_tokens, completion_tokens, total_tokens, cost, task_id (if available), timestamp.
    """
    if r is None:
        return

    try:
        log_entry = {
            "timestamp": datetime.now().isoformat(),
            "model_name": data.get("model_name"),
            "prompt_tokens": data.get("prompt_tokens"),
            "completion_tokens": data.get("completion_tokens"),
            "total_tokens": data.get("total_tokens"),
            "cost": data.get("cost"),
            "task_id": data.get("task_id", "unknown"), # Optional, useful for grouping
            "session_id": data.get("session_id", "default") # Optional, useful for multi-session runs
        }
        # Store as a list, or as a sorted set with timestamp for easier range queries
        r.rpush(f"openclaw:tokens:{log_entry['model_name']}", json.dumps(log_entry))
        r.rpush("openclaw:tokens:all", json.dumps(log_entry)) # Global log
        # You could also keep a hash of daily per-model totals:
        # r.hincrby("openclaw:daily_tokens:2024-07-23", log_entry["model_name"], log_entry["total_tokens"])
    except Exception as e:
        print(f"ERROR in token_logger callback: {e}")

Now, tell OpenClaw to use this callback. Add this to your ~/.openclaw/config.json:

{
  "callbacks": {
    "on_request_complete": [
      "~/.openclaw/callbacks/token_logger.py:on_request_complete"
    ]
  },
  "redis_host": "localhost",
  "redis_port": 6379
}

The redis_host and redis_port entries are custom parameters for our callback script; OpenClaw itself doesn’t use them. Note that the script above actually reads its connection settings from the OPENCLAW_REDIS_HOST and OPENCLAW_REDIS_PORT environment variables, which is often the simplest route for callback parameters, though you could read config.json directly instead. Every time OpenClaw completes a request, it will call on_request_complete in our Python script, which pushes the token data to Redis.
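If you'd rather keep everything in config.json, a small helper in the callback script can resolve settings with environment variables taking precedence. This is a sketch; the path and key names simply mirror the config example above and aren't anything OpenClaw mandates:

```python
import json
import os
from pathlib import Path

# Path mirrors the config file used earlier in this article.
CONFIG_PATH = Path("~/.openclaw/config.json").expanduser()

def redis_settings():
    """Resolve Redis host/port: env vars win, then config.json, then defaults."""
    cfg = {}
    if CONFIG_PATH.exists():
        cfg = json.loads(CONFIG_PATH.read_text())
    host = os.getenv("OPENCLAW_REDIS_HOST", cfg.get("redis_host", "localhost"))
    port = int(os.getenv("OPENCLAW_REDIS_PORT", cfg.get("redis_port", 6379)))
    return host, port
```

The env-var-first ordering means you can override the config per shell session (handy when testing against a scratch Redis) without editing the file.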

Non-Obvious Insight: Model Selection and Prompt Engineering

Here’s the kicker: the default models recommended in some OpenClaw tutorials, like gpt-4o or claude-3-opus-20240229, are often overkill for 90% of tasks. We’ve found that for summarization, entity extraction, or even light classification, models like claude-haiku-4-5, gpt-3.5-turbo-0125, or even gemini-pro offer a significantly better cost-to-performance ratio. A typical claude-haiku-4-5 interaction might cost $0.00025 per 1K tokens, whereas claude-3-opus could be $0.015 per 1K, a 60x difference. Monitor your Redis logs. If you consistently see gpt-4o or opus models used for simple tasks, you have a prime optimization target.
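To make that 60x concrete, here's a back-of-envelope calculator using the per-1K prices quoted above (these figures are illustrative and drift over time; check your provider's current pricing):

```python
# Illustrative per-1K-token prices, taken from the figures quoted above.
PRICE_PER_1K = {
    "claude-haiku-4-5": 0.00025,
    "claude-3-opus": 0.015,
}

def monthly_cost(model, tokens_per_day, days=30):
    """Estimate monthly spend for a given daily token volume."""
    return tokens_per_day / 1000 * PRICE_PER_1K[model] * days

# 2M tokens/day of simple summarization work:
haiku = monthly_cost("claude-haiku-4-5", 2_000_000)
opus = monthly_cost("claude-3-opus", 2_000_000)
print(f"haiku: ${haiku:.2f}/mo, opus: ${opus:.2f}/mo ({opus / haiku:.0f}x)")
```

At that volume the gap is the difference between pocket change and a real line item on the budget, which is why model selection is usually the first optimization to chase.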

Another crucial insight comes from prompt engineering. Often, developers use very chatty or verbose system prompts, or include excessive examples in few-shot prompts. For instance, providing 5 examples when 2 would suffice, or using a paragraph-long system instruction when a concise sentence would achieve the same result. The data in Redis will help you identify which models are used for which tasks. If your summarize_document task is pulling 5000 prompt tokens and only 200 completion tokens, investigate the prompt for unnecessary context or overly verbose instructions. Try to front-load important information and be as direct as possible. Avoid “fluff” in your prompts. Small changes here can have massive ripple effects on cost.
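Spotting those lopsided prompt-to-completion ratios is exactly what the Redis data is for. Here's a sketch of a per-task report built on the openclaw:tokens:all list our callback populates (the function takes any client exposing lrange, so it works against a stub in tests or a real redis.Redis in production):

```python
import json
from collections import defaultdict

def per_task_summary(client):
    """Aggregate tokens and cost per task_id from the global Redis list.

    `client` is anything exposing lrange() -- normally a redis.Redis instance.
    """
    summary = defaultdict(lambda: {"prompt": 0, "completion": 0,
                                   "cost": 0.0, "calls": 0})
    for raw in client.lrange("openclaw:tokens:all", 0, -1):
        entry = json.loads(raw)
        s = summary[entry.get("task_id") or "unknown"]
        s["prompt"] += entry.get("prompt_tokens") or 0
        s["completion"] += entry.get("completion_tokens") or 0
        s["cost"] += entry.get("cost") or 0.0
        s["calls"] += 1
    return dict(summary)
```

Point it at a live instance with per_task_summary(redis.Redis()) and sort the result by cost; a task with a prompt/completion ratio of 25:1, like the summarize_document example above, is your cue to trim the prompt.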

Limitations

This Redis-based monitoring setup is fantastic for understanding your own OpenClaw usage on a single machine or a small, self-managed cluster. It’s designed for quick, actionable insights without the overhead of a full-blown monitoring stack like Prometheus and Grafana. However, it does have limitations:

  • No long-term persistence: While Redis is persistent, it’s not designed for petabytes of historical data. For year-long trends or compliance, you’d want to periodically dump this data to a data warehouse or object storage.
  • Manual aggregation: You’ll need to write simple Python scripts to query Redis and aggregate the data into daily, weekly, or per-task summaries. It doesn’t give you a fancy dashboard out-of-the-box.
  • Scalability: For very high-throughput OpenClaw deployments across many machines, a single Redis instance might become a bottleneck. In such cases, a centralized logging solution like ELK or a dedicated metrics pipeline would be more appropriate.
  • Resource usage: While lightweight, Redis does consume some RAM. On a tiny VPS (e.g., a 512MB RAM instance), running OpenClaw, your application, and Redis might push memory limits. This setup is generally fine for VPS instances with at least 2GB RAM; a Raspberry Pi 4 (4GB or 8GB) should also handle it comfortably.
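For the long-term persistence gap, the periodic dump mentioned above can be as simple as appending the Redis list to a dated JSONL file and trimming what's been archived. This is a sketch; the archive path and retention count are arbitrary choices, not OpenClaw defaults:

```python
import json
from datetime import datetime, timezone
from pathlib import Path

def export_and_trim(client, out_dir="~/.openclaw/token_archive", keep_last=10_000):
    """Append the global token log to a dated JSONL file, then trim Redis.

    `client` is anything exposing lrange()/ltrim() -- normally a redis.Redis
    instance. Path and retention count here are illustrative defaults.
    """
    out = Path(out_dir).expanduser()
    out.mkdir(parents=True, exist_ok=True)
    entries = client.lrange("openclaw:tokens:all", 0, -1)
    if not entries:
        return 0
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d")
    with open(out / f"tokens-{stamp}.jsonl", "a") as f:
        for raw in entries:
            f.write(raw.decode() if isinstance(raw, bytes) else raw)
            f.write("\n")
    # Keep only the most recent entries in Redis; the rest live on disk.
    client.ltrim("openclaw:tokens:all", -keep_last, -1)
    return len(entries)
```

Run it from cron once a day and ship the JSONL files to object storage; note there's a small race window between the lrange and ltrim if writes arrive in between, which is acceptable for cost reporting but not for audit-grade logs.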
