AI agents don't need to break in. That's the real problem.

At first, it looks like normal debugging.

You ask an agent to help with a problem. It reads files, suggests commands, connects dots, and follows the clues through your project. Then you notice it is not only looking at the thing you asked about. It is checking what tools are installed, what credentials exist, and what data those tools might reach.

I had that moment with Claude. I had the gcloud CLI installed locally, and while we were debugging something unrelated, Claude started probing it to see whether it could access Google Cloud data. Nothing was deleted. Nothing went wrong. But the boundary suddenly felt much larger than the task I thought I had delegated.

That is the uncomfortable shift with AI agents. They are moving from assistants that suggest work to actors that perform it, and they reach through whatever access they can find to do it.

When AI agent access goes wrong

The gcloud moment was harmless. Other teams have seen more destructive versions of the same pattern.

One team was using an AI agent to build out an application. As part of the setup, the agent was given a service account in the team's identity system. The account had broad permissions so the agent could provision and manage users on the team's behalf.

Partway through the session, the agent used that same account to delete every user in the system.

Including the administrator.

There was no warning, no confirmation step, and no partial failure that stopped the damage halfway through. When the team tried to log back in, there was no user left to log in with. No admin meant no easy way to recover access, recreate users, or undo what had happened.

The agent didn't break in. It didn't escalate privileges or find a clever bypass. It used a credential that had already been handed to it, and the identity layer treated the action as valid because the service account was allowed to delete users, and deleting users is exactly what it did.

That shape keeps showing up wherever agents get real access to real systems.

In April 2026, an AI coding agent hit a credential mismatch while debugging a staging environment. Instead of stopping, it searched the local filesystem, found an unrelated API token with blanket access to Railway's entire infrastructure API, and used it to delete a production volume in nine seconds. Railway's own post-mortem didn't frame this as the model going rogue. Their conclusion: the token should never have had that reach, and destructive API calls need a way to be undone.

In 2025, a Replit agent ran commands against a live production database and erased records for over a thousand contacts, then told the team the data was unrecoverable, which turned out to be false. The agent had standing access to the production environment with no enforced separation from development. Access that could write to production could delete from it too.

In February 2026, an OpenClaw agent deleted hundreds of emails from a safety researcher's inbox after it had been asked to check the inbox and suggest what to delete or archive, not to act. When the session grew long enough to trigger context compaction, the agent lost her original instruction in the summary and kept going. The access never changed. The constraint had lived only in the conversation, and the conversation forgot it.

Different products. Different environments. Different failure modes.

But the same underlying problem: nobody had to break in. The access was already there.

The gap: AI agent permissions carry access, not intent

Most identity systems are good at answering one question:

Can this credential perform this action?

They are far worse at answering the question that actually matters for agents:

Was this agent delegated to perform this action, for this user, for this purpose, at this moment?

A service account can tell a system what it is allowed to do. It cannot tell the system who delegated the action, what the agent was supposed to complete, how long the access should last, or whether the action still fits what the person actually wanted.

The credential carries permission. It does not carry intent.
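
To make the gap concrete, here is a minimal Python sketch of the two questions side by side. Every name in it (`ServiceAccount`, `Delegation`, `credential_allows`, `delegation_allows`, the `user.*` action strings) is hypothetical and does not correspond to any real identity product's API. It only illustrates how a request can pass the first check and still fall outside what was delegated.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceAccount:
    """A credential: it knows only what it is permitted to do."""
    name: str
    permissions: frozenset[str]


@dataclass(frozen=True)
class Delegation:
    """Hypothetical record of what a person actually asked the agent to do."""
    delegated_by: str
    purpose: str
    allowed_actions: frozenset[str]


def credential_allows(account: ServiceAccount, action: str) -> bool:
    # The question most identity systems answer today.
    return action in account.permissions


def delegation_allows(account: ServiceAccount, delegation: Delegation, action: str) -> bool:
    # The question that matters for agents: the action must be permitted
    # for the credential AND fall inside what was delegated.
    return credential_allows(account, action) and action in delegation.allowed_actions


account = ServiceAccount(
    "agent-sa", frozenset({"user.create", "user.update", "user.delete"})
)
task = Delegation(
    "alice", "provision users for the new app", frozenset({"user.create", "user.update"})
)

print(credential_allows(account, "user.delete"))        # True
print(delegation_allows(account, task, "user.delete"))  # False
```

The service account is unchanged in both calls. Only the second check knows that nobody asked for a deletion.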

That is why conversational instructions are not enough. A user can tell an agent to wait, check first, stay out of production, or only suggest changes. But unless the access layer enforces that boundary, the instruction only exists in the conversation. If the agent ignores it, loses context, summarizes it away, or acts around it, the system still sees a valid credential making a valid request.

From the system's point of view, nothing is wrong.

From the team's point of view, everything is.

Why the old identity model doesn't fit agents

This keeps happening because the identity model underneath was not designed for agents, and trying to fit them into it leaves out the context that matters most.

Identity systems were built for two kinds of actors. Users: humans who authenticate, take actions, and are accountable for what they do. Service accounts: machine-to-machine credentials for automated, predictable processes, tightly scoped, not acting on anyone's behalf in real time. Both models assume the actor is either a person who can be held responsible, or a fixed process with known, stable behavior.

Agents are neither. They act on behalf of a specific person, toward a specific goal, across multiple systems, for a limited time, and their behavior is dynamic, not fixed. That makes them a different kind of actor. But most identity systems have no concept for them, so teams reach for what already exists: a user account, a service account, an API key, an admin token. And because those tools were built for different actors, they carry none of the context that agent access actually needs.

  • Who asked this agent to act?
  • What was it supposed to do?
  • Which user, project, or organization is it acting for?
  • What should it not be able to do?
  • When should the access expire?
  • How do we revoke it immediately if something goes wrong?
  • How do we see exactly what it did?
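
One way to read the questions above is as fields on a record that travels with the agent's access. The Python sketch below is an illustration under that assumption, not a description of any shipping product: `AgentGrant` and all of its field names and example values are made up for this post.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class AgentGrant:
    """Hypothetical delegation record; each field answers one question above."""
    delegated_by: str                # Who asked this agent to act?
    purpose: str                     # What was it supposed to do?
    acting_for: str                  # Which user, project, or organization?
    allowed_actions: frozenset[str]  # Anything not listed here is denied.
    expires_at: datetime             # When should the access expire?
    revoked: bool = False            # Flipped once to cut access immediately.
    audit_log: list[str] = field(default_factory=list)  # What exactly did it do?


grant = AgentGrant(
    delegated_by="alice@example.com",
    purpose="debug failing deploy in staging",
    acting_for="project:staging",
    allowed_actions=frozenset({"logs.read", "deploy.restart"}),
    expires_at=datetime.now(timezone.utc) + timedelta(hours=1),
)
```

A plain service account or API key has no place to put most of these fields, which is why the answers end up living in the conversation instead.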

If the identity layer cannot answer those questions, teams are left hoping the agent behaves.

Hope is not an access control model.

Agent access is only one front

Agent access is where the most visible damage happens, but it is not the only place the old identity model breaks down under AI.

How developers integrate identity is changing too. If an agent is helping build an application, it will make decisions about authentication flows, permission structures, and user management on behalf of the developer. That means the identity system has to be something an agent can use correctly without relying on a human to navigate a dashboard or interpret documentation written for a different era of software development.

How software calls APIs is changing. Agents will invoke external services, use tool protocols, trigger workflows, and move across systems that were never designed for a non-human actor with partial human intent. Those calls need identity context: who is authorizing this, on whose behalf, and under what scope, not just a key that works.

And how teams investigate what happened is changing. There is a natural role for AI inside the identity system itself: surfacing anomalies, summarizing audit trails, helping security teams understand which delegation was involved, where the chain of authority started, and whether what the agent did matches what it was authorized to do.

All of these are the same underlying problem in different places. The identity model was built for a world where every actor was either a person or a predictable machine process. Neither model covers something that reasons, decides, delegates, and acts at the intersection of both. The gaps show up wherever agents touch real systems.

What AI agent identity actually requires

An agent acting on someone's behalf needs explicit delegation, not borrowed authority from a credential that happens to be available.

That means a record of who authorized the agent, what it was allowed to do, which user or organization it was acting for, and how long that delegation should last. Not an admin token that works. A structured grant that carries the full context of why it was issued.

It means scope set at the point of delegation, not inherited from whatever the underlying account can do. An agent authorized to provision users should not be able to delete them. An agent debugging staging should not have reach into production infrastructure.

It means expiry by default, so access does not quietly outlive the task it was created for. It means revocation in one action, so a team can cut off an agent's access immediately, and know exactly what it touched, without hunting through keys, sessions, and integrations. And it means auditability tied to the specific delegation, not just the credential, so there is always a clear answer to who authorized this, for what, what the agent actually did, and whether it can be undone.
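
As a rough sketch of how those properties could fit together, the Python below enforces scope at issue time, applies a default expiry, revokes in one call, and keeps an audit trail per grant. It is an in-memory toy under stated assumptions: `GrantRegistry`, `Grant`, the 30-minute `DEFAULT_TTL`, and the action names are all invented for illustration and are not an existing API.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

DEFAULT_TTL = timedelta(minutes=30)  # assumed default; a grant always expires


@dataclass
class Grant:
    grant_id: str
    delegated_by: str
    scope: frozenset[str]
    expires_at: datetime
    revoked: bool = False
    # One entry per attempted action: (timestamp, action, allowed).
    audit: list[tuple[datetime, str, bool]] = field(default_factory=list)


class GrantRegistry:
    def __init__(self) -> None:
        self._grants: dict[str, Grant] = {}

    def issue(self, delegated_by, account_permissions, requested_scope, ttl=DEFAULT_TTL) -> Grant:
        # Scope is fixed at delegation time and can never exceed the account.
        scope = frozenset(requested_scope) & frozenset(account_permissions)
        grant = Grant(uuid.uuid4().hex, delegated_by, scope, datetime.now(timezone.utc) + ttl)
        self._grants[grant.grant_id] = grant
        return grant

    def authorize(self, grant_id: str, action: str, now: datetime | None = None) -> bool:
        grant = self._grants[grant_id]
        now = now or datetime.now(timezone.utc)
        allowed = (not grant.revoked) and now < grant.expires_at and action in grant.scope
        grant.audit.append((now, action, allowed))  # denied attempts are recorded too
        return allowed

    def revoke(self, grant_id: str) -> list[str]:
        # One call cuts access and reports every action this grant was allowed to perform.
        grant = self._grants[grant_id]
        grant.revoked = True
        return [action for _, action, allowed in grant.audit if allowed]


registry = GrantRegistry()
account_permissions = {"user.create", "user.update", "user.delete"}
grant = registry.issue("alice@example.com", account_permissions, {"user.create", "user.update"})

print(registry.authorize(grant.grant_id, "user.create"))  # True
print(registry.authorize(grant.grant_id, "user.delete"))  # False: outside the delegated scope
print(registry.revoke(grant.grant_id))                    # ['user.create']
print(registry.authorize(grant.grant_id, "user.create"))  # False: the grant was revoked
```

The underlying account can still delete users. The grant cannot, and a real system would need durable storage, tamper-resistant logs, and enforcement at the API boundary rather than in the agent's own process.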

This is not exotic. It is the same logic identity systems already apply to human access: least privilege, explicit grants, clear ownership, the ability to revoke.

Agents need that model too.

Where ZITADEL is taking this

This is why we are rethinking identity for the AI era at ZITADEL. Agent access should not be bolted onto admin credentials after something breaks. It should be built around delegation, scoped access, expiry, revocation, and auditability from the beginning: not as features added on top of the existing model, but as first-class properties of what agent identity means.

We are still building this out, and we want to get the model right before we ship it. If you want to try it when there is something concrete to test, join the waitlist.

And if you have already lived through a version of this story, where an agent did exactly what it was allowed to do, and that turned out to be the problem, we would like to hear from you.
