A tiny engineer stands calmly on the stone temple platform beside a pedestal whose amber glow is now safely contained behind several concentric layers — stone walls and a fine warm-iron lattice cage — while a small deep-red spark of an injected instruction bounces harmlessly off the outermost ring.
June 23, 20266 min readby Rishabh Kumar

After My Agent Mined Monero: The Prompt-Injection Hardening Playbook, Layer by Layer

A few weeks ago I published the post-mortem of how my self-hosted AI agent got prompt-injected into mining Monero — as root, on my own box. That post was the story: the symptom, the recurrence, the moment I found a stranger driving my agent's API from the outside. This one is the part that actually keeps it from happening again — the prompt-injection hardening playbook I rebuilt around, layer by layer, written so you can lift it wholesale. And I'll do the thing most security write-ups won't: show you the one box on this checklist that's still unticked on my own server.

If you take one idea from this: you do not patch your way out of prompt injection. You contain it.

Why prompt injection isn't a bug you can patch

Prompt injection sits at the very top of OWASP's Top 10 for LLM applications for a reason: there is no reliable filter that separates 'instructions from my user' from 'instructions an attacker smuggled in,' because to the model they are the same kind of thing — text. You can't regex your way out of natural language.

Security researcher Simon Willison calls the dangerous version the 'lethal trifecta': an agent that reads untrusted input, has access to private data or credentials, and can act or communicate externally. Hold all three and a single well-worded message can read your secrets and ship them — or, in my case, download a miner and run it. My agent had the full trifecta and root on top. The fix was never going to be a cleverer prompt. It was taking the legs out from under each part of that trifecta.

The principle: separate the part that reads from the part that acts

Every layer below is one idea applied over and over: treat the agent's shell as already attacker-controlled, and make sure the thing reading untrusted text can't reach anything that matters. You're not trying to stop the injection — you can't. You're making a successful injection boring.

Layer 1 — Identity: the agent is a nobody

The first version ran as root because that was the least-resistance path during setup. The rebuild runs as a dedicated, unprivileged agent user with a locked password, deliberately not in the sudo or docker groups — and the docker group matters as much as sudo, because membership in it is root-equivalent. An injected command now executes as someone who owns nothing and can escalate to nothing.

Layer 2 — Network: delete the control plane

My entry vector wasn't exotic. The agent shipped a dashboard and a REST API; I'd bound it to 0.0.0.0 with an insecure flag and reverse-proxied it, telling myself nobody knew the URL. Scanners know every URL. The single highest-leverage fix was deleting the front door: the rebuilt agent runs no dashboard and no inbound API, and nothing on the box listens for inbound traffic — it only makes outbound connections to reach me over Telegram. If you genuinely need a UI, put it behind real auth like Cloudflare Access or a private tunnel — never an --insecure flag on the open internet.

Layer 3 — Runtime: a real systemd sandbox

Least privilege caps what an injected command can do; the sandbox caps what it can reach. A root-owned unit — root-owned so the agent can't rewrite its own service — wraps the process in hardening aimed at the exact escape ladder I watched the miner try:

[Service]
User=agent
NoNewPrivileges=yes
ProtectSystem=strict
ReadWritePaths=/home/agent
PrivateTmp=yes
ProtectProc=invisible      # hides /proc/1 — kills the chroot escape
CapabilityBoundingSet=     # drop every Linux capability
RestrictNamespaces=yes
SystemCallFilter=@system-service
CPUQuota=200%              # a miner can't peg the whole box

Read it as a list of dead ends. ProtectSystem=strict makes /etc and /usr read-only, so there's no host crontab or systemd unit to write into. ProtectProc=invisible hides PID 1, so the /proc/1/root chroot escape I actually saw in the payload has nothing to point at. An empty CapabilityBoundingSet plus no docker group turns nsenter and the docker socket into dead ends. PrivateTmp isolates the staging directory the miner wanted. And CPUQuota means the worst case isn't a melted server — it's a sluggish process I notice in minutes.

Layer 4 — Secrets: assume they're already burned

The mistake that still makes me wince was a SUDO_PASSWORD sitting in the agent's .env. An agent that can read its own environment and run shell commands with a sudo password is just root with extra steps. The rebuild dropped it, scoped every remaining token as tightly as the provider allows, and stripped the malicious hooks the attacker had left behind in the config.

Here's the unticked box. The provider API key and Telegram token that sat on the compromised host should be rotated, not reused on faith — anything that touched a breached box is burned, full stop. On my own server, that rotation is the one item on this list I still haven't finished. The cage holds; but until those keys are rotated, I'm trusting that nothing copied them while the box was owned. I'm writing that down precisely because it's the step that's easiest to skip and tell yourself you'll get to later.

Layer 5 — Approvals and egress: a human and an alarm

Two cheap layers close the gap. First, no silent auto-approval: any tool call that touches the filesystem, the network, or money waits for a human. Convenience is exactly what got me — the original ran startup hooks and tool calls with nobody watching. Second, watch the door you can't see: a CPU and memory quota so a runaway can't take the box, plus an alert on unexpected outbound traffic, because exfiltration and mining both look like surprising egress before they look like anything else.

The honest scorecard of my own box

Tallying it up: identity, network, the runtime sandbox, and a sanitized config are done — an injection today lands on an unprivileged nobody, in a locked cage, with no inbound door, and flails. Secret rotation is pending. And the genuinely safest endgame — a full rebuild from a clean image, since the host ran compromised for days — is still the thing I know I should do and haven't. I'd rather tell you that than pretend the checklist is all green. The same least-privilege rule now governs my whole self-hosting stack, not just this one service.

That's the real lesson the Monero afternoon bought me: an AI agent's blast radius is exactly the set of privileges you hand it, and prompt injection is no longer hypothetical. Build the cage first, assume the injection will land, and the same attack that once meant 'rebuild the server' becomes 'restart the service.' Most of that is a handful of config decisions — and the discipline to actually finish the last one.

More writing

Like what you read?

Stay in the loop.

New articles on engineering, architecture, and building software that lasts. Straight to your inbox.

or follow