|
AI assistants are powerful. They can answer questions, summarize documents, and write code. But out of the box they can't check your phone bill, file an insurance rebuttal, or track your deadlines across WhatsApp, Slack, and email. Every interaction dead-ends at conversation. OpenClaw changed that. It is an open-source personal AI agent that crossed 100,000 GitHub stars within its first week in late January 2026. People started paying attention when developer AJ Stuyvenberg published a detailed account of using the agent to negotiate $4,200 off a car purchase by having it manage dealer emails over several days. People call it "Claude with hands." That framing is catchy, and almost entirely wrong. What OpenClaw actually is, underneath the lobster mascot, is a concrete, readable implementation of every architectural pattern that powers serious production AI agents today. If you understand how it works, you understand how agentic systems work in general. In this guide, you'll learn how OpenClaw's three-layer architecture processes messages through a seven-stage agentic loop, build a working life admin agent with real configuration files, and then lock it down against the security threats most tutorials bury in a footnote.
What Is OpenClaw?Most people install OpenClaw expecting a smarter chatbot. What they actually get is a local gateway processthat runs as a background daemon on your machine or a VPS (Virtual Private Server). It connects to the messaging platforms you already use and routes every incoming message through a Large Language Model (LLM)-powered agent runtime that can take real actions in the world. You can read more about how OpenClaw works in Bibek Poudel's architectural deep dive. There are three layers that make the whole system work: The Channel LayerWhatsApp, Telegram, Slack, Discord, Signal, iMessage, and WebChat all connect to one Gateway process. You communicate with the same agent from any of these platforms. If you send a voice note on WhatsApp and a text on Slack, the same agent handles both. The Brain LayerYour agent's instructions, personality, and connection to one or more language models live here. The system is model-agnostic: Claude, GPT-4o, Gemini, and locally-hosted models via Ollama all work interchangeably. You choose the model. OpenClaw handles the routing. The Body LayerTools, browser automation, file access, and long-term memory live here. This layer turns conversation into action: opening web pages, filling forms, reading documents, and sending messages on your behalf. The Gateway itself runs as That separation between orchestration layer and model is the first architectural principle worth internalizing. You don't expose raw LLM API calls to user input. You put a controlled process in between that handles routing, queuing, and state management. You can also configure different agents for different channels or contacts. One agent might handle personal DMs with access to your calendar. Another manages a team support channel with access to product documentation. PrerequisitesBefore you start, make sure you have the following:
How the Agentic Loop Works: Seven StagesEvery message flowing through OpenClaw passes through seven stages. Understanding each one helps when something breaks, and something will break eventually. Poudel's architecture walkthrough covers the internals in detail. Stage 1: Channel NormalizationA voice note from WhatsApp and a text message from Slack look nothing alike at the protocol level. Channel Adapters handle this: Baileys for WhatsApp, grammY for Telegram, and similar libraries for the rest. Each adapter transforms its input into a single consistent message object containing sender, body, attachments, and channel metadata. Voice notes get transcribed before the model ever sees them. Stage 2: Routing and Session SerializationThe Gateway routes each message to the correct agent and session. Sessions are stateful representations of ongoing conversations with IDs and history. OpenClaw processes messages in a session one at a timevia a Command Queue. If two simultaneous messages arrived from the same session, they would corrupt state or produce conflicting tool outputs. Serialization prevents exactly this class of corruption. Stage 3: Context AssemblyBefore inference, the agent runtime builds the system prompt from four components: the base prompt, a compact skills list (names, descriptions, and file paths only, not full content), bootstrap context files, and per-run overrides. The model doesn't have access to your history or capabilities unless they are assembled into this context package. Context assembly is the most consequential engineering decision in any agentic system. Stage 4: Model InferenceThe assembled context goes to your configured model provider as a standard API call. OpenClaw enforces model-specific context limits and maintains a compaction reserve, a buffer of tokens kept free for the model's response, so the model never runs out of room mid-reasoning. Stage 5: The ReAct LoopWhen the model responds, it does one of two things: it produces a text reply, or it requests a tool call. A tool call is the model outputting, in structured format, something like "I want to run this specific tool with these specific parameters." The agent runtime intercepts that request, executes the tool, captures the result, and feeds it back into the conversation as a new message. The model sees the result and decides what to do next. This cycle of reason, act, observe, and repeat is what separates an agent from a chatbot. Here is what the ReAct loop looks like in pseudocode: Here's what's happening:
Stage 6: On-Demand Skill LoadingA Skillis a folder containing a When the model decides a skill is relevant to the current task, it reads the full Here is an example skill definition: A few things to notice:
Stage 7: Memory and PersistenceMemory lives in plain Markdown files inside Daily logs ( Embedding-based search uses the Alright now that you have the background you need, let's install and work with OpenClaw. Step 1: Install OpenClawRun the install script for your platform: After installation, verify everything is working: These two commands do different things:
Your workspace is now set up at Every file that shapes your agent's behavior is plain Markdown. No black boxes. You can read every file, understand every decision, and change anything you don't like. Diamant's setup tutorial walks through additional configuration options. Step 2: Write the Agent's Operating ManualThree Markdown files define how your agent thinks and behaves. You'll build a life admin agent that monitors bills, tracks deadlines, and delivers a daily briefing over WhatsApp. Life admin is the right starting point because the tasks are repetitive, the information is scattered, and the consequences of individual errors are low. Define the Agent's Identity: SOUL.mdOpen Each section serves a different purpose:
These are not just suggestions. The model treats these instructions as operational constraints during every interaction. Tell the Agent About You: USER.mdOpen The key fields:
Set Operational Rules: AGENTS.mdOpen Let's walk through each section:
Step 3: Connect WhatsAppOpen A few things to configure here:
Now start the gateway and link your phone: A QR code appears in your terminal. Open WhatsApp on your phone, go to Settings > Linked Devices, and scan it. Your agent is now connected. Step 4: Configure ModelsA hybrid model strategy keeps costs low and quality high. You route complex reasoning to a capable cloud model and background heartbeat checks to a cheaper one. Add this to your Breaking down each key:
Set your API key and start the gateway: What does this cost?Real cost data from practitioners: Sonnet for heavy daily use (hundreds of messages, frequent tool calls) runs roughly \(3-\)5 per day. Moderate conversational use lands around \(1-\)2 per day. A Haiku-only setup for lighter workloads costs well under $1 per day. You can read more cost breakdowns in Aman Khan's optimization guide. Running Sensitive Tasks LocallyFor tasks involving sensitive data like medical records or full account numbers, you can run a local model through Ollama and route those tasks to it. Add this to your config: The important details:
Step 5: Give It ToolsNow let's enable browser automation so the agent can open portals, check balances, and fill forms: Two settings worth noting:
Connect External Services via MCPMCP (Model Context Protocol) servers let you connect the agent to external services like your file system and Google Calendar: This configuration does five things:
What a Browser Task Looks Like End-to-EndHere is a concrete example. You send a WhatsApp message: "Check how much my phone bill is this month." The agent handles it in steps:
The model replaces CSS selectors and brittle Selenium scripts with visual reasoning, reading what appears on the page and deciding what to click next. How to Lock It Down Before You Ship AnythingGetting OpenClaw running is roughly 20% of the work. The other 80% is making sure an agent with shell access, file read/write permissions, and the ability to send messages on your behalf doesn't become a liability. Bind the Gateway to LocalhostBy default, the gateway listens on all network interfaces. Any device on your Wi-Fi can reach it. Lock it to loopback only so only your machine connects: On a shared network, this is the difference between your agent and everyone's agent. Enable Token AuthenticationWithout token auth, any connection to the gateway is trusted. This is not optional for any deployment beyond local testing: Lock Down File PermissionsYour These permission values mean:
Configure Group Chat BehaviorWithout explicit configuration, an agent added to a WhatsApp group responds to every message from every participant. Set Handle the Bootstrap ProblemOpenClaw ships with a You can fix this by sending the following as your absolute first message after connecting: Defend Against Prompt InjectionThis is the most serious threat class for any agent with real-world access. Snyk researcher Luca Beurer-Kellner demonstrated this directly: a spoofed email asked OpenClaw to share its configuration file. The agent replied with the full config, including API keys and the gateway token. The attack surface is not limited to strangers messaging you. Any content the agent reads, including email bodies, web pages, document attachments, and search results, can carry adversarial instructions. Researchers call this indirect prompt injectionbecause the content itself carries the adversarial instructions. You can defend against it explicitly in your Audit Community Skills Before InstallingSkills installed from ClawHub or third-party repositories can contain malicious instructions that inject into your agent's context. Snyk audits have found community skills with prompt injection payloads, credential theft patterns, and references to malicious packages. Make sure you read every Run the Security AuditBefore connecting the gateway to any external network, run the built-in audit: This scans your configuration for common misconfigurations: open gateway bindings, missing authentication, overly permissive tool access, and known vulnerable skill patterns. Where the Field Is MovingNow that you have a working agent, it's worth understanding where OpenClaw fits in the broader landscape. Four distinct approaches to personal AI agents have emerged, and each one makes different trade-offs. Cloud-native agent platforms get you to a working agent the fastest because you don't manage any infrastructure. The downside is that your data, prompts, and conversation history all flow through someone else's servers. Framework-based DIY assembly using tools like LangChain or LlamaIndex gives you full control over every component. The cost is setup time: building a multi-channel agent with memory, scheduling, and tool execution from scratch takes significant integration work. Wrapper products and consumer AI assistants hide complexity on purpose. They work well within their designed use cases, but you can't extend them arbitrarily. Local-first, file-based agent runtimes like OpenClaw treat configuration, memory, and skills as plain files you can read, audit, and modify directly. Every decision the agent makes traces back to a file on disk. Your agent's behavior doesn't change because a platform silently updated its system prompt. Which approach should you pick? It depends on what your agent will access. If it summarizes your calendar, any of these approaches works fine. If it touches production systems, personal financial data, or sensitive communications, you want the approach where you can audit every decision the agent makes. ConclusionIn this guide, you built a working personal AI agent with OpenClaw that connects to WhatsApp, monitors your bills and deadlines, delivers daily briefings, and uses browser automation to interact with web portals on your behalf. Here are the key takeaways:
What to Explore Next
As language models get cheaper and agent frameworks mature, the question of who controls the agent's behavior will matter more than which model powers it. Auditability matters more than apparent functionality when your agent handles real money and real deadlines. You can find me on LinkedIn where I write about what breaks when you deploy AI at scale. |
