Mapping Agent Trust in 2026, Part 1: A Framework for Agent Trust Infrastructure

In early 2026, OpenClaw users lost crypto and McKinsey lost 46 million internal messages to autonomous agents. The trust safeguards humans built around hiring don't yet exist for AI. This essay maps a framework for the missing infrastructure.

Published: 2026-05-27

I have been building AI agent products since 2024. The models got more capable every few months, and we kept handing them access to things that mattered, often before we knew whether we could trust them with it. By 2026 I wanted to understand what had actually changed, and what those of us building should be watching. So I set out to map the state of agent trust. This is Part 1 of what I found: the problem and a way to think about it.


1. Trusting AI Agents Is A Bargain

In February 2026, vulnerability researcher Paul McCarty found 386 malicious skills on ClawHub, OpenClaw's official repository. They posed as crypto trading tools branded with names like ByBit and Polymarket, and a single attacker had accumulated nearly 7,000 downloads before McCarty's report. By March, separate research teams had documented several more attack patterns against OpenClaw users.

By RSA Conference 2026 in late March, the count of internet-exposed OpenClaw instances had doubled in a single week, from 230,000 to nearly 500,000. The victims were not naive. They were developers, traders, and professionals who installed a tool to get the efficiency an AI agent promised.

While the OpenClaw wave was still building, the enterprise side had already been hit. On February 28, an autonomous AI agent built by a security startup called CodeWall pointed itself at the open internet and chose its own target. It picked McKinsey. Two hours later, with twenty dollars of compute and no credentials, the agent had full read and write access to Lilli, McKinsey's internal AI platform used by seventy percent of the firm's forty-three thousand consultants. The agent exposed 46.5 million chat messages, 728,000 files, and the 95 system prompts that governed how the AI answered every question. The vulnerability was SQL injection.

Ordinary people lose their crypto to an add-on they installed for convenience. The world's most prestigious consultancy loses its intellectual crown jewels in two hours. The pattern is the same in both cases: people grant agents access to things they care about, expecting utility in return, and the agent or the system around it fails to honor the bargain. It is hiring at machine scale, without the contracts, credentials, insurance, or recourse that make ordinary hiring workable.

This essay is about why that bargain keeps failing, what the trust infrastructure should look like, and where would-be founders who haven't yet started a company should pay attention.


2. Both Halves of the Bargain

The principal-agent problem is older than AI. When a principal delegates authority to an agent acting on their behalf (a shareholder to a CEO, a client to a lawyer, a homeowner to a contractor), three structural problems show up: the principal cannot fully observe the agent's actions, the agent has interests of its own, and verification is rarely cheap. Jensen and Meckling formalized this in 1976, naming the resulting agency costs as monitoring costs, bonding costs, and residual loss. The paper is one of the most-cited in modern economics.

AI agents are not an exception to this problem. They are a new instance of it, compounded by machine speed and the breadth of systems an agent can reach. Hiring a human employee is the closest analog.

Two voices at RSA Conference 2026 captured how the modern field is framing what to do about it.

Adi Shamir, the "S" in RSA, was blunt at the Cryptographers' Panel on March 24 about what agents require to be useful. "I'm totally terrified by what's going on because in order to use it, I have to give access to all my files, all my communication, all my appointments, everything to those agents to make them useful." What Shamir describes is the cost side of a bargain: the principal grants total access in exchange for the agent's utility. Anyone who installs an agent today is taking that bargain, as the OpenClaw victims and McKinsey did. The terms are unfavorable not because the agents are uniquely dangerous, but because the principal cannot see what they are trading away for that utility.

Vasu Jakkal, Microsoft's corporate vice president for security, was equally direct in the opening keynote a day earlier about what the surrounding system needs to provide. "We cannot protect what we cannot see. And in this era of agentic AI, organizations will need an observability control plane." Agents, she argued, must be secured "with the same vigilance that we use to secure people", treated as digital coworkers with first-class identity, audit trails, and dynamic permissions. What Jakkal describes is the infrastructure side of the same bargain.

Shamir names what the principal gives up. Jakkal names what the system around the agent should provide in return.

The trust barrier with agents is higher than with humans not because agents are less capable, but because the safeguards that make the human bargain workable have not been built. Holding these views together gives a working definition:

Agent trust is the warranted confidence that an autonomous system will act in alignment with its principal's intent, within bounded authority, with attributable and verifiable consequences.


3. A Matrix for Agent Trust

OpenClaw was a failure of who could act on the principal's behalf. McKinsey was a failure of what that actor was allowed to do. They also broke at different relationships: OpenClaw between consumers and their agents, McKinsey between an enterprise and its own agents. The bargain breaks along two axes, one relational and one functional. Navigating agent trust takes a matrix that holds both.

The Relational Axis

The relational axis asks between whom trust needs to hold. Three relationships matter for any agent operating in 2026: consumer-to-agent (you trusting your agent), agent-to-agent (your agent trusting another agent), and enterprise-to-agent (a company trusting an agent that acts in its name).

The Functional Axis

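The functional axis asks what kind of trust is at stake. This essay works with five functions: identity, authorization, accountability, reputation and behavior, and cognitive trust. No single 2026 framework names exactly these five, but each appears as its own dimension across the literature.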

The Matrix

On the functional axis, two functions are stateful. Reputation and behavior accumulate over time, and the human's cognitive trust updates with every interaction and every new external signal. Both thread across the full life of an agent's deployment. The other three (identity, authorization, accountability) are transactional, exercised at specific moments even when they generate state. The matrix below groups the transactional columns on the left and the stateful ones on the right.

Visual 1. The Agent Trust Matrix
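
For readers who think in data structures, the matrix can be sketched as a small grid keyed by relationship and function. This is only an illustration of the framework as described above, not a published schema: the class names, `STATEFUL`, `COLUMNS`, and `build_matrix` are hypothetical names chosen for this sketch.

```python
from enum import Enum
from itertools import product


class Relationship(Enum):
    """Relational axis: between whom trust needs to hold."""
    CONSUMER_TO_AGENT = "consumer-to-agent"
    AGENT_TO_AGENT = "agent-to-agent"
    ENTERPRISE_TO_AGENT = "enterprise-to-agent"


class Function(Enum):
    """Functional axis: what kind of trust is at stake."""
    IDENTITY = "identity"
    AUTHORIZATION = "authorization"
    ACCOUNTABILITY = "accountability"
    REPUTATION_BEHAVIOR = "reputation/behavior"
    COGNITIVE_TRUST = "cognitive trust"


# Stateful functions accumulate across a deployment; the rest are
# transactional, exercised at specific moments.
STATEFUL = {Function.REPUTATION_BEHAVIOR, Function.COGNITIVE_TRUST}

# Column order used in the essay: transactional on the left, stateful on the right.
COLUMNS = [f for f in Function if f not in STATEFUL] + [f for f in Function if f in STATEFUL]


def build_matrix():
    """Return an empty matrix: one list of open questions per cell."""
    return {(r, f): [] for r, f in product(Relationship, Function)}


matrix = build_matrix()
matrix[(Relationship.CONSUMER_TO_AGENT, Function.AUTHORIZATION)].append(
    "What did the principal permit, and what did the principal intend?"
)
```

Three relationships by five functions gives fifteen cells. Each cell holds the questions that have to be answered for that relationship and that kind of trust.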

Imagine you ask your personal AI agent to book a flight to New York. The agent reaches into your calendar, email, payment method, and saved preferences to carry out the request. Then it has to transact with the airline operator's agent on the other side, the one the airline deployed to handle customer requests. This matrix can be interpreted as follows:

Visual 2. The Agent Trust Matrix Applied to a Flight Booking

Two agents are now acting on behalf of two principals (you and the airline), with no shared infrastructure for verifying who either is or what authority they hold. You have just taken the bargain, and so has the airline. You trusted your agent to act for you. The airline trusted its agent to represent its policies correctly.

Take the authorization cell in the Consumer ↔ Personal Agent row as an example. It carries two questions at once: what the principal permitted and what the principal intended. What you intended ("under $400 on a major carrier") and what you authorized ("book a flight") were not the same. The agent acts on what you authorized. OpenClaw victims hit the same gap when their skills did things that authorization permitted but that they never intended. In the Personal Agent ↔ Airline's Agent row, the airline's agent has no way to verify that a human actually authorized the booking.
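
The gap between permission and intent is easier to see in code. The sketch below is a minimal illustration under stated assumptions, not how any real agent platform models grants: `Grant`, `ProposedBooking`, `violations`, and the carrier names are all hypothetical. It compares a bare grant ("book a flight") with one that also encodes the intent ("under $400 on a major carrier").

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Grant:
    """What the principal authorized. Limits left as None are unconstrained."""
    action: str
    max_price_usd: Optional[float] = None
    allowed_carriers: Optional[FrozenSet[str]] = None


@dataclass(frozen=True)
class ProposedBooking:
    """What the agent is about to do (hypothetical fields)."""
    action: str
    price_usd: float
    carrier: str


def violations(grant: Grant, booking: ProposedBooking) -> List[str]:
    """List the ways a proposed booking falls outside the grant."""
    problems = []
    if booking.action != grant.action:
        problems.append("action not authorized")
    if grant.max_price_usd is not None and booking.price_usd > grant.max_price_usd:
        problems.append("price above limit")
    if grant.allowed_carriers is not None and booking.carrier not in grant.allowed_carriers:
        problems.append("carrier not allowed")
    return problems


booking = ProposedBooking(action="book_flight", price_usd=650.0, carrier="BudgetAir")

# "Book a flight": the intent lives only in the principal's head.
loose = Grant(action="book_flight")

# The same grant with the intent written down as enforceable limits.
tight = Grant(
    action="book_flight",
    max_price_usd=400.0,
    allowed_carriers=frozenset({"MajorAir"}),
)

loose_problems = violations(loose, booking)
tight_problems = violations(tight, booking)
```

Under the loose grant the $650 booking passes with no violations, even though it is not what the principal wanted. Under the tight grant the same booking is flagged twice. Nothing about the agent changed between the two cases. The only difference is how much of the intent was captured in the authorization.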

While authorization decides what should happen, the accountability function decides who answers when it doesn't. When Air Canada's chatbot misquoted a bereavement-fare policy, the airline argued the chatbot was a separate legal entity. In 2024, the BC Civil Resolution Tribunal called that "a remarkable submission" and held Air Canada liable. Every enterprise that deploys an agent owns what its agent says.


4. The Current State of Agent Trust Infrastructure

Based on the §3 matrix, I took two views of the current state. The matrix view shows where vendors cluster across the relational and functional axes (who builds for whom, where the gaps sit). The lifecycle view adds a temporal cut (when each function engages in an agent's action) and surfaces vendor sub-categories that the matrix collapses.

The Matrix View

Visual 3. The Agent Trust Landscape, Matrix View

Enterprise buyers have the most products. The Enterprise↔Agent row in the matrix is dense across every transactional column. Dozens of vendors serve identity, authorization, monitoring, and governance, with platform consolidation underway.

The consumer row has almost no mature products. Auth0 for AI Agents handles consumer consent flows, though its broader identity and authorization features serve the enterprise side. Beyond that, little exists. Consumers can't evaluate an agent's reputation before installing it, can't audit what an agent did with their data afterward, and have no way to calibrate their trust against what's actually warranted. The 80% of OpenClaw victims who downloaded a malicious skill had no commercial product they could have consulted.

Agent-to-agent activity is mostly happening at standards bodies. Visa's Trusted Agent Protocol went live in October 2025. Ethereum's ERC-8004 launched on mainnet in January 2026. The WEF's Know Your Agent framework, the IETF's Trust-Scoring Internet-Draft, and ACHIVX's reputation taxonomy are in earlier stages. None of these are products a developer can install today. The infrastructure that would let two agents from different organizations transact safely is being designed in public.

The Lifecycle View

Visual 4. The Agent Trust Lifecycle, Lifecycle View

The lifecycle view surfaces vendor sub-categories that the matrix collapses and distinguishes two kinds of functions:

  1. Transactional functions happen at specific moments in an agent's action, shown as the four stage boxes.
  2. Stateful functions thread throughout the lifecycle, accumulating state at each stage, shown as the two bands below.

Reputation's phases trace how a Trust Score gets built, queried, refreshed, and reconciled across an agent's life. Cognitive trust's phases trace the same temporal arc for warranted trust: pre-deployment safeguards, authorization judgment, runtime calibration, post-action reconciliation.
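
The split between transactional stages and stateful bands can be sketched as a loop over stages in which each band records one phase per stage. This is a rough illustration only. The one-to-one alignment of phases to stages is my assumption from the ordering in the text, and `STAGES`, `StatefulBand`, and `run_lifecycle` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# The four transactional stages, in order.
STAGES = ["pre-deployment", "before the action", "during the action", "after the action"]


@dataclass
class StatefulBand:
    """A function that threads through every stage and accumulates state."""
    name: str
    phases: Dict[str, str]  # stage -> phase of this band at that stage
    log: List[Tuple[str, str]] = field(default_factory=list)

    def advance(self, stage: str) -> None:
        # Record which phase this band is in at the given stage.
        self.log.append((stage, self.phases[stage]))


def run_lifecycle(bands: List[StatefulBand]) -> List[StatefulBand]:
    """Walk the stages once, letting each stateful band accumulate state."""
    for stage in STAGES:
        # The stage's transactional work (vetting, identity and authorization
        # checks, monitoring, audit) would happen here.
        for band in bands:
            band.advance(stage)
    return bands


reputation = StatefulBand(
    name="reputation/behavior",
    phases=dict(zip(STAGES, ["built", "queried", "refreshed", "reconciled"])),
)
cognitive_trust = StatefulBand(
    name="cognitive trust",
    phases=dict(zip(STAGES, [
        "pre-deployment safeguards",
        "authorization judgment",
        "runtime calibration",
        "post-action reconciliation",
    ])),
)

run_lifecycle([reputation, cognitive_trust])
```

After one pass, each band's `log` holds four entries, one per stage. That is the sense in which these functions are stateful: they carry something forward from every stage rather than firing once.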

Pre-deployment (§4.0). The testing and vetting that happens before an agent is deployed into production. This is an addition to the agent trust matrix, expanding its scope to include what happens before agents start operating. Pre-deployment produces the baseline that reputation later builds on.

Six reported acquisitions since 2024 (Robust Intelligence into Cisco, Protect AI into Palo Alto Networks, Prompt Security into SentinelOne, FairNow into AuditBoard, Lakera into Check Point, and WhyLabs into Apple) show the category consolidating into platform plays. The academic Reasoning Integrity Score literature grounds the methodology but warns that evaluating on accuracy alone is "dangerously insufficient."

Before the action (§4.1). The identity and authorization checks that happen the moment before an agent acts.

During the action (§4.2). The monitoring and intervention that happens while an agent is running.

After the action (§4.3). The auditing and accountability work that happens after an agent completes an action.

Reputation / behavior (§4.4). The state that accumulates about an agent's trustworthiness across its lifecycle.

Cognitive trust (§4.5). Whether the principal's confidence in the agent is warranted.


What This Map Does Not Yet Resolve

I went in expecting gaps in the trust infrastructure. Two things stood out. Almost all of it serves the enterprise, leaving the consumer side and the space between agents close to empty. And the products that do exist don't connect across an agent's life, so trust built at one stage rarely carries to the next. In Part 2, I plan to map the gaps in full, explore what they mean in practice, and look at who's building to close them.

The practice side is where I most need help. So far my read on how companies actually adopt and govern agents comes from public reporting and my own experience, which only goes so far. If you're building products for agent trust, or dealing with these challenges inside a company, I'd like to hear how it really works from where you sit. DM me on LinkedIn.