AI

What Makes AI Agents Useful Is Exactly What Makes Them Dangerous

OpenClaw's security failures weren't bugs in one project. They're architectural to the entire category of AI agents. Here's where real solutions are emerging.

I run AI agents daily. Claude Code sits in my terminal with access to my filesystem, my shell, my git credentials. I’ve built custom agents that read my email and draft responses. These tools have changed how I work in ways I wouldn’t reverse.

I also understand, on a visceral level, why people handed OpenClaw the keys to everything. You’re three configuration steps into setting up an autonomous assistant and it asks for your Outlook credentials. You’ve already given it shell access. What’s one more thing? The slope isn’t just slippery, it’s seductive, because each incremental permission makes the agent meaningfully more useful. Twenty-five years in this industry and I still feel that pull.

This is the part most security commentary on OpenClaw misses. The people who connected their corporate credentials to a personal AI agent weren’t being reckless. They were being rational, in the moment, one small step at a time.

What happened with OpenClaw (briefly)

In late January 2026, an open-source AI agent called OpenClaw went viral, crossing 100,000 GitHub stars within days of its rebrand. It ran locally, controlled your computer autonomously, and extended its capabilities through a community plugin marketplace.

Then everything broke. A CVSS 8.8 remote code execution vulnerability. Over 135,000 instances exposed to the public internet. 20% of marketplace plugins found to be malicious. Commodity malware updated within weeks to specifically harvest OpenClaw’s credential store. Kaspersky found 512 vulnerabilities. Cisco called it “Exhibit A in how not to do AI security.”

The instinct is to treat this as a story about one project with bad security practices. It isn’t. OpenClaw’s creator could have done everything right and the hardest problems would remain.

The tension that doesn’t resolve

An AI agent that can’t access your data, communicate on your behalf, or interact with external services is a chatbot. The moment you give it those capabilities, you’ve created something that combines access to private information, the ability to act externally, and exposure to untrusted content. Security researcher Simon Willison called this combination the “lethal trifecta.” The name is dramatic but the observation is accurate: these properties together create an attack surface that doesn’t map to existing security models.

This isn’t a flaw to be patched. It’s the architecture of usefulness. Every agentic AI tool worth using will have these properties. The question isn’t how to eliminate them but how to manage the risk they create.

Why the obvious responses fail

Lock it down. Run the agent in a sandbox with restricted filesystem access and no network connectivity. You’ve built an expensive chatbot. Sandboxing limits blast radius when something goes wrong, and that’s worth doing. But it doesn’t prevent the agent from being hijacked within its own scope, and an agent with no scope isn’t an agent.

Filter the inputs. Build classifiers that scan for malicious prompts before they reach the agent. In October 2025, researchers tested 12 published prompt injection defenses that all claimed near-zero attack success rates. Adaptive attacks broke every one of them above 90% success. This is the virus signature problem all over again: defenders need perfect coverage, attackers need one bypass, and the attack surface is natural language with infinite variation.

Tell the model to be careful. This is the response that should be obviously inadequate but keeps getting proposed. Adding “ignore malicious instructions” to a system prompt is just adding more tokens to a context window where all tokens receive equal attention. Transformers lack any architectural mechanism to distinguish trusted instructions from untrusted data. The instruction to ignore injections and the injection itself are processed identically. You’re asking a lock to also be a key.

Where real progress is happening

None of these problems are solved. But actual work is being done, by people building real systems, with measurable results.

Agents are getting their own identities. Today, most AI agents operate with their user’s credentials: your OAuth token, your API key, your session cookie, your access. When the agent is compromised, the attacker has everything you have. All three major cloud providers have shipped or previewed agent-specific identity systems in the last six months. Microsoft’s Entra Agent ID creates a distinct identity type for agents with its own authentication methods, conditional access policies, and audit trail. AWS Bedrock AgentCore Identity provides native OAuth support for agent-to-service authentication. Google Cloud now auto-provisions agent workloads with their own x509 certificates through Workload Identity Federation.

The shift is conceptual as much as technical: treating an AI agent as a non-human identity with its own scoped permissions rather than a proxy wearing its user’s badge. NIST opened a concept paper in February for a practical implementation guide. This won’t prevent every attack, but it limits what a compromised agent can do and creates accountability that doesn’t exist today.

Protocols are standardizing. Anthropic’s Model Context Protocol and Google’s Agent-to-Agent protocol are both now under the Linux Foundation, with MCP housed in the new Agentic AI Foundation and A2A as a separate project. Payment protocols for agent transactions (Google’s AP2 with 60+ partners, Stripe’s x402) are further along than most people realize. Standardized interfaces can be audited, monitored, and secured in ways that ad-hoc integrations can’t. The bad news: the Coalition for Secure AI identified 12 threat categories and nearly 40 distinct threats in MCP alone. These protocols standardize communication but mostly punt on security to implementers. A market for security gateway layers is already forming to fill that gap.

Supply chain security is an old problem in new packaging. OpenClaw’s ClawHub marketplace, where 20% of plugins were malicious, is npm circa 2018. Or PyPI circa 2022. We’ve been here before with every community package ecosystem. The techniques that made progress there (package signing, automated scanning, publisher verification, reproducible builds) apply directly. VirusTotal is already scanning ClawHub uploads. This isn’t solved in any ecosystem, but it’s a problem we know how to improve incrementally. The patterns are established.

Prompt injection is the hard one. The core problem: transformers treat everything in the context window with equal attention, and there’s no architectural mechanism to mark some input as trusted and other input as data. Adversarial instructions embedded in an email, a web page, or a document can hijack an agent’s behavior because the model cannot tell those instructions are less privileged than the ones it was given by its operator.

Anthropic’s reinforcement learning approach has achieved a 1% attack success rate for Claude in browser operations, the best published production number. It came from training the model against simulated attacks rather than just filtering inputs. But Anthropic is honest about the limitation: 1% is still meaningful risk for high-stakes applications.

The most promising research direction is architectural: approaches like ASIDE, which separates instructions and data at the embedding level through orthogonal rotation, creating geometrically distinct representations that the model can differentiate. It’s clean, has zero performance overhead, and addresses the root cause. It also isn’t deployed anywhere and hasn’t been tested against adaptive adversaries.

Prompt injection is going to require changes deep in how models are built and trained. Not filters on top. Not clever system prompts. Actual architectural changes to how transformers process privileged vs. unprivileged input. That work is underway, but it’s years from production. The people closest to it say so openly.

What to watch

Agent identity is the nearest-term win. The infrastructure is shipping now, the concept is sound, and adoption will accelerate as enterprises demand it. If your organization is evaluating agentic AI, start here: does the platform treat the agent as its own identity with scoped permissions, or is it just borrowing yours?

Protocol standardization matters on a slightly longer timeline. The protocols are fragmented today, but convergence under the Linux Foundation is a good sign. Watch whether security becomes a first-class protocol concern or stays an afterthought bolted on by third parties.

Supply chain security is tractable. The playbook from npm and PyPI applies. Expect rapid improvement as the same tools and practices get adapted for agent plugin ecosystems.

Prompt injection is the wildcard. The research is promising but years from production. For now, defense means defense in depth: scoped permissions, monitoring, human-in-the-loop for high-stakes actions. Anyone selling you a prompt injection “solution” today is selling you virus signatures in 1999.

The question isn’t whether to build with agentic AI. The tools are too useful and the productivity gains too real. I’m not going back to working without them. The question is whether the security infrastructure can mature fast enough to match the pace of adoption. OpenClaw showed what happens when it doesn’t.