If you’ve been experimenting with OpenClaw and feeling like your agents are either too verbose, not following instructions, or just plain expensive, you’re not alone. I’ve spent the last month iterating on my AGENTS.md file, trying to find the sweet spot between performance, cost, and output quality. The official examples are a great starting point, but they often lead to agents that overthink or consume too many tokens. This isn’t just about prompt engineering; it’s about structuring the AGENTS.md file itself to leverage OpenClaw’s capabilities efficiently. My setup, which I’ll detail below, focuses on a multi-agent approach where each agent has a very specific, limited role, and communication is explicit.
The Problem with Monolithic Agents
My initial approach, like many I’ve seen, was to create a single “Super Agent” designed to handle an entire complex task from start to finish. For example, an agent named ContentCreator would be responsible for generating an article idea, outlining it, writing the sections, and then refining the whole thing. While this sounds intuitive, it quickly became problematic. The context window would balloon, leading to higher token usage and increased latency. More critically, the agent would often “forget” earlier instructions or get sidetracked, requiring extensive prompt tuning and frequent human intervention. Debugging became a nightmare because it was hard to pinpoint exactly where the agent went off track. Costs also escalated rapidly, especially when using larger models like claude-3-opus-20240229.
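For contrast, here is a sketch of what that monolithic entry looked like. This is a reconstruction for illustration, not my exact file, and the wording of the prompt is approximate:

```markdown
# AGENTS.md (old, monolithic version)

## Agent: ContentCreator
Model: claude-3-opus-20240229
Temperature: 0.7
System Prompt: You are a full-service content creator. Given a topic, generate an article idea, outline it, write every section, and refine the final draft.
Instructions:
- Handle ideation, outlining, drafting, and editing in a single pass.
- Keep the entire article in context while you work.
```

Everything below replaces this one entry with four smaller ones.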
Introducing the Specialized Multi-Agent Framework
The breakthrough for me came when I realized OpenClaw excels when agents are highly specialized and tasks are broken down into granular steps. Instead of one large agent, I now use a chain of smaller, focused agents. Each agent has a single responsibility, and they pass their output to the next agent in the chain. This mirrors a traditional software pipeline and makes debugging significantly easier. If the final output is bad, I can examine the output of each preceding agent to find the bottleneck. This also allows for more flexible model selection; I can use cheaper, faster models for simpler tasks and reserve more powerful (and expensive) models for steps requiring deeper reasoning.
Here’s a snippet of my refined AGENTS.md structure:
```markdown
# AGENTS.md

## Agent: TaskDeconstructor
Model: claude-3-haiku-20240307
Temperature: 0.2
System Prompt: You are an expert task breakdown specialist. Given a user's high-level request, you will break it down into a list of specific, actionable sub-tasks. Each sub-task should be clear enough for another specialized agent to execute independently. Focus on logical sequential steps.
Instructions:
- Output a numbered list of sub-tasks.
- Do not add any conversational filler.

## Agent: ResearchAssistant
Model: claude-3-sonnet-20240229
Temperature: 0.3
System Prompt: You are a diligent research assistant. Your goal is to gather relevant information for a given sub-task. You will use the available `search` tool extensively. Synthesize findings concisely.
Tools:
- search
Instructions:
- Given a sub-task, use the `search` tool to find 2-3 credible sources.
- Summarize the key findings relevant to the sub-task.
- Cite your sources clearly.

## Agent: ContentGenerator
Model: claude-3-sonnet-20240229
Temperature: 0.7
System Prompt: You are a creative content writer. Your task is to generate high-quality, engaging content based on the provided research and a specific sub-task. Focus on clarity, accuracy, and tone.
Instructions:
- Based on the provided research and sub-task, write the content.
- Ensure the content flows logically.
- Maintain a consistent tone (e.g., informative, persuasive, casual).

## Agent: Editor
Model: claude-3-haiku-20240307
Temperature: 0.1
System Prompt: You are a meticulous editor. Your job is to refine, proofread, and improve the clarity, grammar, and style of the provided content. Ensure it meets the specified requirements.
Instructions:
- Review the content for grammar, spelling, punctuation, and syntax errors.
- Improve sentence structure and word choice.
- Ensure the content is concise and easy to read.
- Check for consistency in tone and style.
```
The Non-Obvious Insight: Model Selection for Each Step
This is where the real cost savings and performance gains come in. The documentation often suggests picking a single model for your agent, but with a specialized multi-agent setup you can be far more strategic. My TaskDeconstructor uses claude-3-haiku-20240307: Haiku is incredibly fast and cheap, and for simply breaking a request down into a list of sub-tasks it performs perfectly well. There’s no need for Opus here. Similarly, the Editor agent, focused on refinement and grammar, benefits from Haiku’s speed and precision without needing a large context window or complex reasoning. Its primary job is pattern matching and correction, which Haiku handles very well.
For agents like ResearchAssistant and ContentGenerator, I opt for claude-3-sonnet-20240229. Sonnet strikes an excellent balance between cost and capability. The research phase often requires synthesizing information from multiple sources, and content generation needs a certain level of creativity and coherence. Opus would certainly do a stellar job, but for 90% of my use cases, Sonnet is more than sufficient and significantly cheaper per token. The key is to match the model’s capabilities to the agent’s specific function, not the overall task complexity.
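To make the trade-off concrete, here is a back-of-the-envelope comparison. The prices and per-run token budgets below are illustrative assumptions only, not real Anthropic pricing; check the official price list before doing this math for your own pipeline.

```python
# All prices are made-up placeholders (USD per 1M input tokens).
PRICE_PER_MTOK = {
    "claude-3-haiku-20240307": 0.25,
    "claude-3-sonnet-20240229": 3.00,
    "claude-3-opus-20240229": 15.00,
}

# Hypothetical token budgets for one article run through the chain.
pipeline = [
    ("TaskDeconstructor", "claude-3-haiku-20240307", 1_000),
    ("ResearchAssistant", "claude-3-sonnet-20240229", 8_000),
    ("ContentGenerator", "claude-3-sonnet-20240229", 12_000),
    ("Editor", "claude-3-haiku-20240307", 6_000),
]

def pipeline_cost(stages):
    """Sum the cost of each stage: tokens * price-per-token."""
    return sum(tokens / 1_000_000 * PRICE_PER_MTOK[model]
               for _, model, tokens in stages)

mixed = pipeline_cost(pipeline)
# Same token budgets, but every stage routed to Opus instead:
all_opus = pipeline_cost([(name, "claude-3-opus-20240229", tokens)
                          for name, _, tokens in pipeline])
print(f"mixed models: ${mixed:.5f}, all Opus: ${all_opus:.5f}")
```

Even with invented numbers, the shape of the result holds: routing the cheap steps to a cheap model cuts the per-run cost by a large factor without touching the reasoning-heavy stages.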
Managing State and Communication
One of the biggest challenges with chained agents is ensuring smooth communication and state management. OpenClaw handles this implicitly when you chain agents in your workflow (e.g., `openclaw run TaskDeconstructor -> ResearchAssistant -> ContentGenerator -> Editor`). Each agent receives the output of the previous one as its input. However, it’s crucial to instruct each agent clearly on what to expect as input and what format its output should take for the next agent. For example, TaskDeconstructor outputs a numbered list, which ResearchAssistant is then prompted to act upon, typically iteratively.
I also use a simple convention: each agent’s output should be as “clean” as possible – no conversational filler, just the raw, processed data or content. This minimizes token usage for subsequent agents and prevents unnecessary context from accumulating. The Instructions section in each agent’s definition is paramount for this.
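The hand-off convention can be sketched in a few lines of Python. To be clear, none of this is OpenClaw’s API — the stage functions are stand-ins for real agent calls — but it shows the shape of the pipeline: each stage receives the previous stage’s cleaned output as its input.

```python
# Each stage is a stand-in for one OpenClaw agent call.
def task_deconstructor(request: str) -> str:
    # A real call would return the agent's numbered sub-task list.
    return "1. Research topic\n2. Draft sections\n3. Edit draft"

def research_assistant(subtasks: str) -> str:
    # A real call would search and summarize; here we just tag the input.
    return f"Findings for:\n{subtasks}"

def run_chain(request: str, stages) -> str:
    payload = request
    for stage in stages:
        # Strip whitespace so hand-offs stay "clean" -- no filler tokens.
        payload = stage(payload).strip()
    return payload

result = run_chain("Write a blog post about specialized AI agents",
                   [task_deconstructor, research_assistant])
```

The important property is that `run_chain` knows nothing about any individual stage: as long as each agent honors its input/output contract, stages can be swapped, reordered, or inspected in isolation.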
Limitations and When This Falls Short
This multi-agent strategy is highly effective for tasks that break down into discrete, sequential steps, but it’s not a silver bullet. If a task demands deep, multi-faceted reasoning that spans several steps, a complex internal state, or highly iterative, non-linear problem-solving, a single more powerful agent (like Opus) with a larger context window may still be the better choice. Deeply technical coding tasks or complex scientific simulations, for instance, can push the limits of the chained approach: the explicit context passing becomes cumbersome and can lose nuance.
Furthermore, this setup is most efficient on a VPS with at least 2GB of RAM, especially if you’re running multiple OpenClaw processes concurrently or dealing with very large outputs. While OpenClaw itself is lightweight, the underlying LLM calls and the processing of potentially large JSON outputs can consume memory. A Raspberry Pi, while capable of running OpenClaw for simpler tasks, might struggle with a complex multi-agent pipeline processing extensive research findings.
This setup also assumes a stable internet connection for consistent API calls. If your connection is flaky, the chained calls might fail more frequently, requiring more robust error handling in your OpenClaw scripts.
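One way to harden the chain against transient failures is a retry wrapper with exponential backoff. This is a generic sketch, not an OpenClaw feature — `call_agent` is a hypothetical stand-in for however you invoke an agent, and OpenClaw may well ship its own retry options worth checking first.

```python
import random
import time

def with_retries(call_agent, payload, attempts=4, base_delay=1.0):
    """Retry a flaky agent call with exponential backoff plus jitter."""
    for attempt in range(attempts):
        try:
            return call_agent(payload)
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # 1s, 2s, 4s, ... plus a little jitter to avoid thundering herd.
            time.sleep(base_delay * 2 ** attempt + random.random() * 0.1)
```

Wrapping each stage of the chain this way means one dropped connection costs a short delay rather than a failed pipeline run.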
To implement this setup, define your specialized agents in AGENTS.md, matching models to their specific roles as described, then chain them together in your workflow. The most direct next step is to open your existing AGENTS.md file, refactor your monolithic agents into specialized, single-responsibility units, and update your workflow script to chain them, for example:

```
openclaw run TaskDeconstructor -> ResearchAssistant -> ContentGenerator -> Editor --request "Write a blog post about the benefits of specialized AI agents."
```