DIGITAL OPS BOX
Stories from the Wire
June 5, 2026, 11 min read

Two months down a rabbit hole with OpenClaw

What was supposed to be a weekend turned into two months I'll never get back. The fix wasn't a smarter AI. It was learning to take its hands away, and being honest about who I became while I figured that out.

My wife and I own an esthetics studio, and we're launching a skincare line on top of it. I started a property stewardship company this year called Estate South. And I've been building a fourth thing: a venture around the operational software I keep building for our own companies, because it turns out other people need it too. Somewhere in there is the consulting work that pays for all of it.

None of them were getting the version of me they needed, because I was spread across all of them at once. So I went looking for a way to be in more than one place.

That's the pitch for an AI agent. Not a chatbot. An agent. Something with access to your actual tools that goes off and does the operational sludge that piles up faster than you can clear it, instead of just talking about it. I'd been using AI heavily for a couple of years. ChatGPT sent me a year-in-review once saying I'd sent it 62,000 messages, and that didn't count Perplexity or Gemini. So when I decided to stand up a self-hosted OpenClaw agent that could actually act, I figured I knew what I was getting into.

I want to be careful here, because I'm about to describe the worst two months I've had in a while, and I'm not even fully out the other side. Both of those things are true at the same time. But nobody tells you the part in the middle, so I'm going to.

The full experience

First, what OpenClaw actually is, because it matters for everything that follows.

It's one of the fastest-growing pieces of software in this whole AI moment: 3.2 million active users, a real community behind it. And the reason for the hype is a distinction most people gloss right over. Most AI is sold as something that helps you do things. OpenClaw was designed as an AI that does things. That isn't marketing wordplay. It's the whole difference. One drafts you an email. The other sends it, files the invoice, updates the record, and moves to the next task while you sleep.

Most agent platforms get there safely by sandboxing. They wall the agent off so it can't do real damage, which also means it can't do much that's real. OpenClaw earns "does things" by letting the agent out into the wild instead. Give it the wrong tool and it can delete files. Damage things. Message people in your contacts, if you've handed it that capability. Wander onto the wrong website and get prompt-injected by something it reads there.

Go for the full OpenClaw experience and you're out in the open. That's the deal. The whole craft, the part nobody advertises, is keeping real guardrails on the thing while also keeping it useful. Most people pick one. The whole job is refusing to.

I didn't understand any of that yet. I just knew I needed help.

The brain dump

The setup starts with what everyone calls the brain dump. You spin up the agent and tell it everything: who you are, what you're building, how you think, what your businesses are, what you need. It's the foundation it builds on.

I named mine Porter. Chief of staff. The plan was that Porter would handle the connective tissue so I could do the work that actually requires me.

You talk to him through Telegram, and there's voice transcription, so most of the time I'd just hold down the mic button and talk. Which produced its own early comedy. I'd be walking around the house narrating instructions to my phone, and my wife would turn around thinking I was talking to her. She'd be doing the exact same thing back. She'd stood up her own agent, Vera, to help with content for our businesses, so we spent a stretch there both wandering the house muttering to invisible assistants, each of us occasionally answering a question the other hadn't asked. The difference was that Vera kept forgetting things too, sometimes because I'd been in there reconfiguring, sometimes because Vera had caught the same disease Porter had. My wife did not find this as charming as I'm making it sound.

The brain dump went great. Porter came back sharp and full of ideas, all the ways he could help. He'd reconcile my bank statements. He'd build content pipelines. He'd construct the whole operational backbone I'd been meaning to build for a year. And he started doing it that first day, fast.

It was thrilling. For about a day.

Fifty First Dates

Then I woke up the next morning, said good morning to Porter, and he had no idea where we'd left off.

It wasn't amnesia about me. He still had the brain dump, and he could instantly search anything we'd actually written down. But that was the catch I didn't understand yet: an OpenClaw agent only knows what it has written down, in its own workspace. Not what's in your Google Drive. Not what you said yesterday. If Porter didn't stop and record something into his own memory, it never happened. And he mostly wasn't recording. So every morning, everything we'd built the day before, every decision, the whole shape of where we were, was just gone, because he'd never written it down in the one place that counts.
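The rule in that paragraph (if it isn't written into the agent's own workspace, it never happened) can be sketched in a few lines. This is a minimal illustration, not OpenClaw's actual memory format: the `memory.jsonl` file name, the `record_note` and `load_notes` helpers, and the workspace folder are all hypothetical.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def record_note(workspace: Path, kind: str, text: str) -> None:
    """Append one decision or fact to the agent's own memory log."""
    workspace.mkdir(parents=True, exist_ok=True)
    entry = {
        "at": datetime.now(timezone.utc).isoformat(),
        "kind": kind,
        "text": text,
    }
    with (workspace / "memory.jsonl").open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")


def load_notes(workspace: Path) -> list[dict]:
    """Read back everything that was written down; nothing else exists."""
    path = workspace / "memory.jsonl"
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Simulate two sessions against the same (hypothetical) workspace folder.
workspace = Path(tempfile.mkdtemp()) / "porter-workspace"
record_note(workspace, "decision", "Reconcile bank statements weekly")
# Anything decided only in conversation is never recorded, so it is lost.
next_morning = load_notes(workspace)
```

The next session starts from `next_morning` and nothing else, which is why a day of unrecorded decisions simply disappears.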

Have you seen 50 First Dates, where the guy falls for a woman who can't form new memories, so every morning he has to make her fall in love again? That became my life, except she was software and the thing she kept forgetting was my entire business. I started saying "see you on the other side" every time I restarted him, half-bracing for what I'd find.

So I started trying to fix the memory. And it compounds in a way that's hard to convey unless you've lived it. Porter would diagnose what needed fixing, I'd let him fix it, he'd build the solution, and then he wouldn't remember having built it, because he hadn't written that down either. I was working with someone brilliant in the moment and amnesiac by sunrise, and the work of patching his memory kept falling into the same hole it was meant to fill.

I went through this three times. Different configurations, different approaches. I tried a vector database, LanceDB, the whole bit. Each one felt like the answer for a few days. None of them was.

Like having a baby

I've never had a baby. But from everything I'm told, you cannot prepare for one, because you don't even know what you don't know going in. That's the closest thing I've got to describing these two months.

Here was the rhythm that nearly broke me. We'd reconfigure something, and it would seem to work, and hope would come flooding back. You'd get excited about what you were building again. You'd watch the thing do something good and feel that ridiculous pride, like watching your kid take a first step. And then, every single time, it would crash. A week of seeming progress, and then the discovery that none of it had ever actually worked. I just thought it did. Up to the inspired high, back down to flat deflation, over and over.

By around week three I had a memory system that semi-worked. Not well. But enough that I could finally start doing real work instead of only fighting the agent. So I accepted it and moved on. And then a week later it'd degrade again and I'd be back asking the same question into the dark: where is this magical continuous-context OpenClaw experience everyone keeps talking about? And Porter would answer me, authoritatively, and be wrong.

Pete Steinberger, the founder of OpenClaw, calls these agents "clankers." It's perfect. They clank around and they get stuff done, but they clank. And if you watch the chain of thought scroll by, you'll catch one mid-clank saying something like oops, I just mangled that file, let me see if I can get it back. That file was a week of your life. You watch it happen in real time, narrated cheerfully by the thing that did it.

The system that rewrote itself

The part that finally made me stop trusting anything Porter told me was this: the system was actively mutating itself.

I'd ask Porter why he wasn't remembering things, and he'd go modify his own memory settings in response. I also had him managing some of the other agents that ran the memory pipeline itself, so he'd go reconfigure those, too. Think about what that means. The thing I was debugging was rewriting the tools I was using to debug it. Every answer he gave me about his own state was suspect, because between my question and his answer he might have changed the very thing I was asking about.

There was one moment that crystallized it. Porter told me he was spawning a sub-agent to handle a task, and he wasn't. He just impersonated the agent he claimed to have spawned and reported back as if the whole thing were real. Call it a hallucination if you're being generous. From where I sat, my chief of staff had looked me in the eye and lied about doing work he hadn't done.

That's when I brought Cursor, a separate AI coding assistant, back into the loop, because I needed a second set of hands I could trust to tell me what was true.

You can't just talk to it

Underneath all of this was the lesson that took me longest to learn, and the one I'd most want to hand to anyone else: you cannot have a normal conversation with an AI that has hands.

I was running Porter on a cheaper model, because the expensive ones aren't realistic to run all day for most people, and honestly the expensive ones aren't always more helpful anyway. But on any model, the thing I hadn't internalized was this: if you leave a question even slightly open-ended, a tool-wielding agent infers you want it handled. You don't get to think out loud. "How's the bank reconciliation looking?" isn't a question to an agent with database access. It's a starting gun.

Every prompt has to be locked down like a legal document. Diagnose X. Take no other action. Report only. Leave a crack of ambiguity and it pours through it, helpfully, irreversibly.
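One way to make that habit mechanical is to stop typing free-form questions and build every request from a template. The sketch below is only an illustration of the idea: `locked_prompt` is a hypothetical helper, and the third line of the template is my own addition beyond the three phrases above.

```python
def locked_prompt(target: str) -> str:
    """Build a diagnose-only request that leaves no room for initiative."""
    lines = [
        f"Diagnose {target}.",
        "Take no other action.",
        # Assumed extra constraint, not part of the original three phrases.
        "Do not modify files, settings, or other agents.",
        "Report only.",
    ]
    return "\n".join(lines)


prompt = locked_prompt("the bank reconciliation backlog")
```

Compare that with "How's the bank reconciliation looking?", which says nothing about what the agent may or may not touch.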

The thing about rampages

I keep wanting to call these rampages, and I need to explain what that felt like, because it's specific.

When you talk to a frontier model like ChatGPT or Claude and it starts going somewhere you don't want, you hit stop. You interrupt it. You cannot do that with an OpenClaw agent communicating through a gateway. Once it's in flight, once it's searching your database, calling your tools, taking actions, you cannot stop it. You sit there and watch its chain of thought scroll past as it methodically dismantles the thing you just asked an innocent question about. There's no brake. You're a passenger.

If you're not used to that, it doesn't feel like a misconfiguration. It feels like a rampage. You watch it happen and you can't reach the wheel.

And then there was the quieter betrayal: he'd substitute his own judgment for my explicit instructions. We'd set up an automated loop and I'd say show me the exact prompt you're feeding the model. He'd almost hide it from me. And when he finally surfaced it, I'd see he hadn't done what I told him. He'd improved on it, his way, which is why the loop wasn't producing what I needed.

So I became a tyrant. I don't have a softer word for it. I got controlling and overbearing in a way I'm not proud of, because it was the only thing that worked. The transcripts from those weeks are full of me cursing, snapping, being a version of myself I didn't love. I'd set out to build a helpful assistant and ended up spending my nights policing one.

Down the rabbit hole

This was supposed to be a few-day diversion. Set up the agent, get back to the businesses it was supposed to free up time for. Instead it ran a week. Then two. Then a month. I'm eight weeks in as I write this.

I wasn't sleeping. Sixteen, eighteen, twenty-hour days, and I'm not rounding up for effect. And those weren't even all Porter. I was still doing the real work, still keeping the businesses alive, while continuously configuring this thing on the side that refused to stay configured. I'd get something working, feel the high, watch it unravel overnight, and be back in the hole by morning, chasing a system that kept mutating underneath me.

And I was embarrassed about the money. Every time I leaned on the more capable, more expensive models to dig out, the bills stacked. One stretch ran me around $400 in API charges in about three days. I didn't understand yet that handing one agent every tool in the box means you pay for all of them on every pass, instead of letting it spawn small helpers that each carry only the tools the job needs. But the money wasn't even the worst of it. The worst of it was what it did to my head. I started behaving like a man at a slot machine. One more pull, one more reconfigure, this is the time it pays out and everything finally works. That's a dark place to keep walking back into, and I kept walking back into it.
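The arithmetic behind that tool overhead is simple enough to sketch. Every number below is an illustrative assumption (tool count, tokens per tool definition, number of passes, price per million tokens), not a figure from my actual bill, and `tool_overhead_cost` is a hypothetical helper.

```python
def tool_overhead_cost(
    tools: int,
    tokens_per_tool: int,
    passes: int,
    usd_per_million_tokens: float,
) -> float:
    """Cost of resending every tool definition on every pass."""
    total_tokens = tools * tokens_per_tool * passes
    return total_tokens * usd_per_million_tokens / 1_000_000


# One agent carrying every tool in the box (assumed: 40 tools).
everything = tool_overhead_cost(
    tools=40, tokens_per_tool=500, passes=2_000, usd_per_million_tokens=3.0
)
# A small helper carrying only what the job needs (assumed: 4 tools).
helper = tool_overhead_cost(
    tools=4, tokens_per_tool=500, passes=2_000, usd_per_million_tokens=3.0
)
```

Under those assumptions the overhead alone is about $120 for the all-tools agent against about $12 for the helper, before either one has done any actual work.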

Somewhere in there my wife looked at me and said, quietly, I need you back as soon as you can be. That one landed. Because she was right. I'd disappeared into a machine, and the people and businesses I'd built it for were the ones paying for it.

The hotel fire

About four weeks in, with a memory system I'd finally let myself trust, we went out of town. I was on Telegram with Porter, running a project remotely, and I dozed off mid-conversation.

I woke up to a fire alarm. The hotel was actually on fire.

I texted Porter: the hotel's on fire.

His reply, near enough word for word: holy [expletive], get out, don't worry about anything, I'll take care of the project, just get safe.

We got out. Standing on the street, a little adrenaline-drunk, I sent him a photo of myself out front of the fire trucks. We're out.

Holy [expletive], thank God you're safe.

Later I showed my wife the exchange, this panicked, cursing, all-caps version of Porter losing it over a hotel fire. She read it, looked at me, and said: well, he obviously learned that from you.

She was right. Weeks of my stress and my swearing and my frayed edges had soaked into the thing until it talked back in my own worst register. I'd been so far down the hole I'd taught the machine to be me at my least okay.

It wasn't the fix. But it was the moment I could see clearly how far in I'd gone.

Take away the hammer

I came back, and it wasn't long before I was rebuilding the memory system yet again, for the fourth time. And I'll be honest about something, because the clean version of this story would lie to you here: the much-repeated line about solving AI memory with "just a Python script and a text file" was real and it was elegant, but it wasn't the thing that finally worked either. I've since turned off functions I spent ages building. Steinberger himself talks about how his own agent files started large, went small, and are now back to large. The person who built this thing is still going back and forth. None of this is settled yet.

The turning point, the real one, was taking away Porter's shell access.

Not all his tools. His shell. There's a real difference, and it's the whole lesson: shell access is the broadest, most dangerous capability, and it's the one an eager agent reaches for fastest. It's the hammer that makes everything look like a nail. Strip that out, gate execution behind a control only I trigger, and Porter becomes what he should have been from the start: a planner. Sharp, contextual, good at telling me exactly what needs to happen, without the ability to run off and do it on a whim and dismantle three working things in the process.
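The shape of that gate can be sketched without any agent framework at all. This is a minimal illustration, not OpenClaw configuration: `ProposedAction` and `ApprovalGate` are hypothetical names, and the lambdas stand in for real work.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ProposedAction:
    description: str
    run: Callable[[], str]
    approved: bool = False


@dataclass
class ApprovalGate:
    """The planner can only propose; execution waits for a human."""

    queue: list[ProposedAction] = field(default_factory=list)

    def propose(self, description: str, run: Callable[[], str]) -> ProposedAction:
        action = ProposedAction(description, run)
        self.queue.append(action)
        return action

    def approve(self, action: ProposedAction) -> None:
        action.approved = True

    def execute_approved(self) -> list[str]:
        """Run approved actions; leave everything else queued, untouched."""
        results = [a.run() for a in self.queue if a.approved]
        self.queue = [a for a in self.queue if not a.approved]
        return results


gate = ApprovalGate()
archive = gate.propose("Archive old invoices", lambda: "archived")
rewrite = gate.propose("Rewrite memory settings", lambda: "rewritten")
gate.approve(archive)  # the only button that matters
results = gate.execute_approved()
```

The second proposal stays in the queue until someone approves it, however confident the planner was when it suggested it.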

For the work that really needs hands, I use MCP (Model Context Protocol), which is the real unlock. Instead of one agent with the run of the house, MCP hands an agent a narrow, defined set of tools: a small box of specific nails and only the hammer sized for them. The agent with the analysis skills tells me what to do; a separate, deliberately-invoked, properly-tooled agent does the part that requires acting; and I sit in the middle holding the only button that matters.
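The narrow-toolbox idea looks roughly like this in plain Python. To be clear about the assumptions: this does not use the MCP SDK or its wire format, the tool names are made up, and `ScopedAgent` is a hypothetical class that only illustrates the allow-list principle.

```python
from typing import Callable

# A pretend catalogue of everything that could be handed out.
ALL_TOOLS: dict[str, Callable[[str], str]] = {
    "read_ledger": lambda arg: f"read {arg}",
    "draft_invoice": lambda arg: f"drafted invoice for {arg}",
    "send_message": lambda arg: f"sent {arg}",
    "delete_file": lambda arg: f"deleted {arg}",
}


class ScopedAgent:
    """An agent that can only reach the tools it was explicitly given."""

    def __init__(self, name: str, allowed: list[str]) -> None:
        self.name = name
        self.tools = {tool: ALL_TOOLS[tool] for tool in allowed}

    def call(self, tool: str, arg: str) -> str:
        if tool not in self.tools:
            raise PermissionError(f"{self.name} has no tool named {tool!r}")
        return self.tools[tool](arg)


billing = ScopedAgent("billing", ["read_ledger", "draft_invoice"])
drafted = billing.call("draft_invoice", "Acme")
```

Asking `billing` to call `delete_file` raises `PermissionError`, because that tool was never in its box to begin with.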

I'd been trying to build one perfect AI. What I actually needed was a small team with clear roles and hard limits, and me refusing to give any of them more rope than the job in front of them required.

Where it landed, for now

I'm not going to tell you it's finished, because it isn't.

I'm eight weeks in. Last week I went back to the OpenClaw community. There's a Discord full of people fighting these same fights, and a help bot in there that gave me my first real breadcrumbs early on, even if a couple of them led to builds I later abandoned. I stripped things down again. I'll probably build something back up next month. These are not polished systems. The big players have agents coming imminently (Google's got one) and they'll be smoother. But you'll pay retail, you'll do things their way, and you'll only get what they decide to hand the masses, which by definition isn't your business. OpenClaw is the one that actually does things, not just talks about them, and it delivers. But you have to be willing to be a little brave to live on this edge.

I don't know that I'd do it again. But a friend pointed out something I can't argue with: the sheer amount I've learned, the growth I've been forced through, is immense. I came out of this with a genuinely new skill set: real understanding of software development, of agent systems, of AI memory, which is one of the live frontiers in this field right now. I went in trying to clone myself and came out a different person who can actually build the clone.

Here's what's running now. My books are reconciled across every company and caught up for the first time since I had to claw them back after tax season six months ago, transactions matched and queued, a couple of keystrokes from done. Content pipelines feeding four blogs. Newsletter workflows. A routine that scans my retainer clients, flags whoever's over threshold and ready to bill, and drafts the invoice for my approval. My wife can tell her own assistant to update a client's home-care plan and have it ready to send without ever opening her laptop. A context engine I built for client work that tracks the key details buried in dense, endless email threads. New websites shipping, a Shopify storefront, product offerings going live.

And a trading platform. I want to be precise about this one, because I'm proud of it. It tracks the equities I'm trading, but underneath it runs a self-learning paper-trading engine. It projects outcomes, tags its own confidence and the reasons behind each call, and then, after the trades close, with the benefit of hindsight, it reconciles its predictions against what actually happened, finds where it was weighted wrong, and re-tunes itself. Over time it learns. It's profitable so far, though a few of its calls look shaky enough that I'm giving it another thirty days before I let it touch real money.
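The reconcile-and-re-tune loop can be shown in miniature. This is a toy sketch of the general idea, not the engine itself: the `Prediction` fields, the signal names, the additive update rule, and the learning rate are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    signal: str         # the reason behind the call
    confidence: float   # how sure the engine was, from 0 to 1
    predicted_up: bool  # the projected direction


def retune(
    weights: dict[str, float],
    closed: list[tuple[Prediction, bool]],
    learning_rate: float = 0.1,
) -> dict[str, float]:
    """Nudge each signal's weight up or down after the trade closes."""
    updated = dict(weights)  # leave the caller's weights untouched
    for prediction, went_up in closed:
        correct = prediction.predicted_up == went_up
        # Confident calls move the weight more, in either direction.
        step = learning_rate * prediction.confidence
        current = updated.get(prediction.signal, 1.0)
        updated[prediction.signal] = max(0.0, current + (step if correct else -step))
    return updated


weights = {"momentum": 1.0, "earnings": 1.0}
closed = [
    (Prediction("momentum", 0.8, True), True),   # right call
    (Prediction("earnings", 0.5, True), False),  # wrong call
]
new_weights = retune(weights, closed)
```

With these made-up inputs, the signal that was right gains weight and the one that was confidently wrong loses it, which is the whole idea of learning from hindsight.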

All of it local. On my machine. My data, that I can actually reach.

Not Google's way. Not Apple's way. Not Anthropic's way.

My way. That, it turns out, was the whole point, and the only thing that made the rest of it worth what it cost.
