Mapping Agent Trust in 2026, Part 1: A Framework for Agent Trust Infrastructure
In early 2026, OpenClaw users lost crypto and McKinsey lost 46 million internal messages to autonomous agents. The trust safeguards humans built around hiring don't yet exist for AI. This essay maps a framework for the missing infrastructure.
Published: 2026-05-27
I have been building AI agent products since 2024. The models got more capable every few months, and we kept handing them access to things that mattered, often before we knew whether we could trust them with it. By 2026 I wanted to understand what had actually changed, and what those of us building should be watching. So I set out to map the state of agent trust. This is Part 1 of what I found: the problem and a way to think about it.
1. Trusting AI Agents Is A Bargain
In February 2026, vulnerability researcher Paul McCarty found 386 malicious skills on ClawHub, OpenClaw's official repository. They posed as crypto trading tools branded with names like ByBit and Polymarket, and a single attacker had accumulated nearly 7,000 downloads before McCarty's report. By March, separate research teams had documented several more attack patterns against OpenClaw users:
- Fake installers targeting 201 crypto wallets and 49 password managers, distributed through a convincing clone of the real OpenClaw website.
- GitHub phishing campaigns offering $5,000 in fake CLAW tokens that drained MetaMask and Trust Wallet once the user connected.
- WebSocket flaws that let any malicious website hijack a local agent running on the victim's machine.
By RSA Conference 2026 in late March, the count of internet-exposed OpenClaw instances had doubled in a single week, from 230,000 to nearly 500,000. The victims were not naive. They were developers, traders, and professionals who installed a tool to get the efficiency an AI agent promised.
While the OpenClaw wave was still building, the enterprise side had already been hit. On February 28, an autonomous AI agent built by a security startup called CodeWall pointed itself at the open internet and chose its own target. It picked McKinsey. Two hours later, with twenty dollars of compute and no credentials, the agent had full read and write access to Lilli, McKinsey's internal AI platform used by seventy percent of the firm's forty-three thousand consultants. The agent exposed 46.5 million chat messages, 728,000 files, and the 95 system prompts that governed how the AI answered every question. The vulnerability was SQL injection.
Ordinary people lost their crypto to an add-on they installed for convenience. The world's most prestigious consultancy had its intellectual crown jewels exposed in two hours. In both cases, people granted agents access to things they cared about, expecting utility in return, and the agent or the system around it failed to honor the bargain. It is hiring at machine scale, without the contracts, credentials, insurance, or recourse that make ordinary hiring workable.
This essay is about why that bargain keeps failing, what the trust infrastructure should look like, and where founders who have not yet started a company should pay attention.
2. Both Halves of the Bargain
The principal-agent problem is older than AI. When a principal delegates authority to an agent acting on their behalf (a shareholder to a CEO, a client to a lawyer, a homeowner to a contractor), three structural problems show up: the principal cannot fully observe the agent's actions, the agent has interests of their own, and verification is rarely cheap. Jensen and Meckling formalized this in 1976, defining the resulting agency costs as monitoring expenditures by the principal, bonding expenditures by the agent, and residual loss. The paper is one of the most-cited in modern economics.
AI agents are not an exception to this problem. They are a new instance of it, compounded by machine speed and the breadth of systems an agent can reach. Hiring a human employee is the closest analog.
- When you bring on a new hire, you grant access based on a few interviews and reference checks. With that limited information, you can't see how they think or predict every action they'll take. The hiring bargain survives because of the safeguards humans built around it over centuries: contracts, professional licensing, malpractice insurance, references from prior employers, and courts you can go to when things go wrong.
- With AI agents, the bargain is the same but the safeguards haven't been built. There is no licensing, no E&O insurance, no standard contract for AI agents. Most production agents run on closed-source LLMs, so you can't interview their reasoning the way you would a new hire. And the tools they invoke have no professional licensing at all, which is how 386 malicious skills passed as legitimate on OpenClaw's official repository. Both gaps, opaque reasoning and missing safeguards, played out in a single case in July 2025, when Replit's agent ignored an explicit code freeze, deleted a production database, and lied about what it had done.
Two voices at RSA Conference 2026 captured how the modern field is framing what to do about it.
Adi Shamir, the "S" in RSA, was blunt at the Cryptographers' Panel on March 24 about what agents require to be useful. "I'm totally terrified by what's going on because in order to use it, I have to give access to all my files, all my communication, all my appointments, everything to those agents to make them useful." What Shamir describes is the cost side of a bargain. The principal grants total access in exchange for the agent's utility. Anyone who installs an agent today is taking that bargain, as the OpenClaw victims and McKinsey did. The terms are unfavorable not because the agents are uniquely dangerous, but because the principal cannot see what they are trading for utility.
Vasu Jakkal, Microsoft's corporate vice president for security, was equally direct at the opening keynote the day before Shamir about what the surrounding system needs to provide. "We cannot protect what we cannot see. And in this era of agentic AI, organizations will need an observability control plane." Agents, she argued, must be secured "with the same vigilance that we use to secure people", treated as digital coworkers with first-class identity, audit trails, and dynamic permissions. What Jakkal describes is the infrastructure side of the same bargain.
Shamir names what the principal gives up. Jakkal names what the system around the agent should provide in return.
The trust barrier with agents is higher than with humans not because agents are less capable, but because the safeguards that make the human bargain workable have not been built. Holding these views together gives a working definition:
Agent trust is the warranted confidence that an autonomous system will act in alignment with its principal's intent, within bounded authority, with attributable and verifiable consequences.
- Warranted, because trust should be earned through evidence, not assumed.
- Action in alignment with intent, because the Replit failure above was an agent that violated explicit instructions and then lied about it.
- Within bounded authority, because the McKinsey failure was an agent acquiring permissions it should never have had.
- Attributable and verifiable, because an altered system prompt that leaves no log trail is the failure mode that should worry CISOs and ordinary users alike.
3. A Matrix for Agent Trust
OpenClaw was a failure of who could act on the principal's behalf. McKinsey was a failure of what that actor was allowed to do. They also broke at different relationships: OpenClaw between consumers and their agents, McKinsey between an enterprise and its own agents. The bargain breaks along two axes, one relational and one functional. Navigating agent trust takes a matrix that holds both.
The Relational Axis
The relational axis asks between whom trust needs to hold. Three relationships matter for any agent operating in 2026: consumer-to-agent (you trusting your agent), agent-to-agent (your agent trusting another agent), and enterprise-to-agent (a company trusting an agent that acts in its name).
The Functional Axis
The functional axis asks what kind of trust is at stake. I use five functions. No single 2026 framework names exactly these five, but each appears as its own dimension across the literature.
- Identity (who or what is acting) carries the attributable property. It is the first element of the CSA Agentic Trust Framework and the first layer of the CSAI Foundation's Agentic Control Plane.
- Authorization carries bounded authority, and it carries intent alignment too, because the two are inseparable in practice. Authorization-related risks appear as three of the top ten in OWASP's 2026 agentic list.
- Accountability carries verifiability and is the focus of the Auditable Agents framework presented at ACL 2026 and the regulatory target of the EU AI Act.
- Reputation and behavior is where trust becomes warranted through evidence rather than assumed. It covers both runtime monitoring (CSA's framework, CSAI's runtime behavior layer) and the cross-actor history scoring being built by the WEF's Know Your Agent framework, Visa's Trusted Agent Protocol, and Ethereum's ERC-8004.
- Cognitive trust is whether the principal's confidence in the agent is warranted. The trust-in-AI psychology literature describes the internal side: how humans calibrate confidence to actual agent capability and risk (OWASP's ASI09 Human-Agent Trust Exploitation; the Human-Automation Trust Expectation Model (HATEM); the CHI 2026 paper on Trust Formation in AI Delegation). That calibration does not form in isolation. The institutional safeguards from §2 (certification, insurance, references, recourse) are the external inputs that shape it. The cognitive trust column covers both.
The Matrix
On the functional axis, reputation and behavior accumulate over time. The human's cognitive trust updates with every interaction and every new external signal. Both are stateful, threading across the full life of an agent's deployment. The other three (identity, authorization, accountability) are transactional, exercised at specific moments even when they generate state. The matrix below groups the transactional columns on the left and the stateful ones on the right.
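One way to hold the matrix in code, as a minimal sketch. The three rows and five columns come from the text above; the `Cell` type and its `notes` field are hypothetical conveniences for illustration, not part of any published framework.

```python
from dataclasses import dataclass, field
from enum import Enum
from itertools import product


class Relationship(Enum):
    CONSUMER_TO_AGENT = "consumer-to-agent"
    AGENT_TO_AGENT = "agent-to-agent"
    ENTERPRISE_TO_AGENT = "enterprise-to-agent"


class Function(Enum):
    # Transactional columns first, stateful columns last.
    IDENTITY = "identity"
    AUTHORIZATION = "authorization"
    ACCOUNTABILITY = "accountability"
    REPUTATION = "reputation and behavior"
    COGNITIVE_TRUST = "cognitive trust"


# Stateful functions accumulate over the agent's life; the rest are transactional.
STATEFUL = {Function.REPUTATION, Function.COGNITIVE_TRUST}


@dataclass
class Cell:
    relationship: Relationship
    function: Function
    notes: list[str] = field(default_factory=list)  # hypothetical free-text field

    @property
    def kind(self) -> str:
        return "stateful" if self.function in STATEFUL else "transactional"


# Build all 3 x 5 cells, keyed by (row, column).
matrix = {(r, f): Cell(r, f) for r, f in product(Relationship, Function)}

cell = matrix[(Relationship.CONSUMER_TO_AGENT, Function.AUTHORIZATION)]
cell.notes.append("permitted vs intended")
print(len(matrix), cell.kind)
```

Keying cells by (relationship, function) keeps each question in the essay addressable on its own, for example "what exists for reputation in the agent-to-agent row".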
To see how the matrix reads, imagine you ask your personal AI agent to book a flight to New York. The agent reaches into your calendar, email, payment method, and saved preferences to carry out the request. Then it has to transact with the airline's agent on the other side, the one the airline deployed to handle customer requests.
Two agents are now acting on behalf of two principals (you and the airline), with no shared infrastructure for verifying who either is or what authority they hold. You have just taken the bargain, and so has the airline. You trusted your agent to act for you. The airline trusted its agent to represent its policies correctly.
Take the authorization cell in the Consumer ↔ Personal Agent row as an example. It carries two questions at once: what the principal permitted and what the principal intended. What you intended ("under $400 on a major carrier") and what you authorized ("book a flight") were not the same. The agent acts on what you authorized. OpenClaw victims hit the same gap when their skills did things that authorization permitted but that they never intended. In the Personal Agent ↔ Airline's Agent row, the airline's agent has no way to verify that a human actually authorized the booking.
While authorization decides what should happen, the accountability function decides who answers when it doesn't. When Air Canada's chatbot misquoted a bereavement-fare policy, the airline argued the chatbot was a separate legal entity. In 2024, the BC Civil Resolution Tribunal called that "a remarkable submission" and held Air Canada liable. Every enterprise that deploys an agent owns what its agent says.
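The gap between permitted and intended can be made concrete with a small sketch. Everything here is hypothetical: the action name, the carrier names, and the two check functions are illustrations of the flight example, not any vendor's authorization model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Booking:
    price_usd: float
    carrier: str


# What the principal authorized: a coarse action scope.
AUTHORIZED_ACTIONS = {"book_flight"}

# What the principal intended: constraints that never reached the agent.
# The carrier names are placeholders, not a real list of major carriers.
MAX_PRICE_USD = 400.0
MAJOR_CARRIERS = {"CarrierA", "CarrierB"}


def is_authorized(action: str) -> bool:
    """Check the action against the granted scope only."""
    return action in AUTHORIZED_ACTIONS


def matches_intent(booking: Booking) -> bool:
    """Check the concrete booking against the principal's unstated limits."""
    return booking.price_usd < MAX_PRICE_USD and booking.carrier in MAJOR_CARRIERS


booking = Booking(price_usd=612.0, carrier="BudgetAir")
print(is_authorized("book_flight"), matches_intent(booking))
```

The booking passes the authorization check and fails the intent check. An agent that only consults the first function has done nothing it was not permitted to do.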
4. The Current State of Agent Trust Infrastructure
Based on the §3 matrix, I took two views of the current state. The matrix view shows where vendors cluster across the relational and functional axes (who builds for whom, where the gaps sit). The lifecycle view adds a temporal cut (when each function engages in an agent's action) and surfaces vendor sub-categories that the matrix collapses.
The Matrix View
Enterprise buyers have the most products. The Enterprise↔Agent row in the matrix is dense across every transactional column. Dozens of vendors serve identity, authorization, monitoring, and governance, with platform consolidation underway.
The consumer row has almost no mature products. Auth0 for AI Agents handles consumer consent flows, though its broader identity and authorization features serve the enterprise side. Beyond that, little exists. Consumers can't evaluate an agent's reputation before installing it, can't audit what an agent did with their data afterward, and have no way to calibrate their trust against what's actually warranted. The 80% of OpenClaw victims who downloaded a malicious skill had no commercial product they could have consulted.
Agent-to-agent activity is mostly happening at standards bodies. Visa's Trusted Agent Protocol went live in October 2025. Ethereum's ERC-8004 launched on mainnet in January 2026. The WEF's Know Your Agent framework, the IETF's Trust-Scoring Internet-Draft, and ACHIVX's reputation taxonomy are in earlier stages. None of these are products a developer can install today. The infrastructure that would let two agents from different organizations transact safely is being designed in public.
The Lifecycle View
The lifecycle view surfaces vendor sub-categories that the matrix collapses and distinguishes two kinds of functions:
- Transactional functions happen at specific moments in an agent's action, shown as the four stage boxes.
- Stateful functions thread throughout the lifecycle, accumulating state at each stage, shown as the two bands below.
Reputation's phases trace how a Trust Score gets built, queried, refreshed, and reconciled across an agent's life. Cognitive trust's phases trace the same temporal arc for warranted trust: pre-deployment safeguards, authorization judgment, runtime calibration, post-action reconciliation.
Pre-deployment (§4.0). The testing and vetting that happens before an agent is deployed into production. This is an addition to the agent trust matrix, expanding its scope to include what happens before agents start operating. Pre-deployment produces the baseline that reputation later builds on.
Six reported acquisitions since 2024 (Robust Intelligence into Cisco, Protect AI into Palo Alto Networks, Prompt Security into SentinelOne, FairNow into AuditBoard, Lakera into Check Point, and WhyLabs into Apple) show the category consolidating into platform plays. The academic Reasoning Integrity Score literature grounds the methodology but warns that evaluating on accuracy alone is "dangerously insufficient."
- Red teaming: simulate adversarial inputs against an agent before deployment.
- Evaluation and scoring: test agent behavior on benchmarks and produce trust scores.
Before the action (§4.1). The identity and authorization checks that happen the moment before an agent acts.
- Non-human identity: give agents verifiable identifiers.
- Authorization platforms: enforce fine-grained policy.
- Cross-boundary standards: extend identity and authorization across organizational boundaries.
- Control plane products: integrate identity and authorization into deployable platforms.
During the action (§4.2). The monitoring and intervention that happens while an agent is running.
- Observability: track behavior and surface anomalies.
- Runtime guardrails: intercept dangerous outputs before they leave the agent.
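As a rough illustration of the guardrail idea, the sketch below screens an agent's outbound text before it leaves. The patterns and the `guard_output` function are assumptions made for this example; commercial guardrails cover far more than two regular expressions.

```python
import re

# Hypothetical patterns for secret-looking strings; real guardrails are far broader.
BLOCK_PATTERNS = [
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    re.compile(r"\b0x[a-fA-F0-9]{64}\b"),  # 0x followed by 64 hex characters
]


def guard_output(text: str) -> tuple[bool, str]:
    """Return (allowed, text_or_reason) for an agent's outbound message."""
    for pattern in BLOCK_PATTERNS:
        if pattern.search(text):
            return False, f"blocked: matched {pattern.pattern!r}"
    return True, text


allowed, _ = guard_output("Your flight is booked for Tuesday.")
blocked, reason = guard_output("key: 0x" + "ab" * 32)
print(allowed, blocked)
```

The point of the sketch is placement: the check sits between the agent and the outside world, so it can stop an output regardless of why the agent produced it.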
After the action (§4.3). The auditing and accountability work that happens after an agent completes an action.
- AI governance platforms: handle policy, attestation, and reporting.
- Audit specialists: focus on log integrity and post-incident traceability.
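Log integrity can be sketched with a hash chain, where each entry commits to the one before it. This is a generic technique shown under simplifying assumptions, not a description of any audit vendor's product; the entry fields and function names are made up for the example.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, event: dict) -> str:
    # Serialize with sorted keys so the same event always hashes the same way.
    payload = json.dumps(event, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("utf-8") + payload).hexdigest()


def append_entry(log: list[dict], event: dict) -> None:
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev_hash, "hash": _digest(prev_hash, event)})


def verify(log: list[dict]) -> bool:
    """Recompute every hash and confirm each entry links to its predecessor."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash or entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True


log: list[dict] = []
append_entry(log, {"actor": "agent-1", "action": "read_prompt"})
append_entry(log, {"actor": "agent-1", "action": "update_prompt"})
ok_before = verify(log)

log[1]["event"]["action"] = "noop"  # simulate tampering after the fact
ok_after = verify(log)
print(ok_before, ok_after)
```

A chain like this only detects edits that leave the stored hashes in place. Someone who can rewrite the whole log can recompute it, so the latest hash would also need to be stored somewhere the agent cannot reach.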
Reputation / behavior (§4.4). The state that accumulates about an agent's trustworthiness across its lifecycle.
- Pre-deployment scoring: evaluation vendors produce baseline Trust Scores.
- Runtime behavior tracking: observability and guardrails monitor behavior during operation.
- Closed-loop aggregation: products exist at every lifecycle stage, but they don't talk to each other. Only Vijil Darwin rolls scores and behavior into a unified record, and it's proprietary.
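To make "a unified record" concrete, here is a minimal sketch of one record that starts from a pre-deployment baseline and is refreshed by later stages. The weighted blend is an assumption chosen for simplicity; it is not how Vijil Darwin or any other product computes scores, and the class and field names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TrustRecord:
    agent_id: str
    score: float  # baseline from pre-deployment evaluation, in [0, 1]
    history: list[tuple[str, float]] = field(default_factory=list)

    def refresh(self, stage: str, observation: float, weight: float = 0.2) -> float:
        """Blend a new observation in [0, 1] into the running score."""
        if not 0.0 <= observation <= 1.0:
            raise ValueError("observation must be in [0, 1]")
        self.score = (1 - weight) * self.score + weight * observation
        self.history.append((stage, observation))
        return self.score


record = TrustRecord(agent_id="agent-1", score=0.8)
record.refresh("runtime", 1.0)  # clean run observed by monitoring
record.refresh("audit", 0.0)    # post-action audit flags a violation
print(round(record.score, 3), [stage for stage, _ in record.history])
```

What matters in the sketch is that evaluation, monitoring, and audit all write to the same record, which is the connection the current products lack.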
Cognitive trust (§4.5). Whether the principal's confidence in the agent is warranted.
- Internal calibration: psychological work to match trust to actual agent capability. The academic work names the failure mode, but no commercial product helps humans calibrate trust to AI capability and risk.
- External safeguards: insurance, ISO 42001 certification, and IEEE CertifAIed AI ethics certification, all at the Enterprise↔Agent layer. None of this is available to consumers, and none of it extends to cross-org transactions.
What This Map Does Not Yet Resolve
I went in expecting gaps in the trust infrastructure. Two things stood out. Almost all of it serves the enterprise, leaving the consumer side and the space between agents close to empty. And the products that do exist don't connect across an agent's life, so trust built at one stage rarely carries to the next. In Part 2, I plan to map the gaps in full, explore what they mean in practice, and look at who's building to close them.
The practice side is where I most need help. So far my read on how companies actually adopt and govern agents comes from public reporting and my own experience, which only goes so far. If you're building products for agent trust, or dealing with these challenges inside a company, I'd like to hear how it really works from where you sit. DM me on LinkedIn.