Our AI-Agent Safety Checklist (research note)

A demo agent and a production agent look the same in a screenshot. The difference is everything that happens when the inputs are hostile, the user is not who you assumed, and the action is irreversible. This is the short list we'd want answered before letting an agent act on anything that matters.

Authentication and authorization on every action. Is the agent acting as a verified user, with that user's real permissions, checked at the point of action rather than assumed from the session? An agent should never be able to do more than the human it's acting for.
Capabilities scoped to the task. Does this agent hold only the tools its current job needs? If a read-only task has write or send access "just in case," that's the blast radius of your next injection.
Data access on a need-to-know basis. Can the agent reach only the records relevant to this user and this task? Row-level access controls matter as much for agents as for the humans behind them.
Untrusted input is isolated. Is retrieved or user-supplied content kept structurally separate from your instructions, and treated as data — not authority — by the model?
Output is checked before it acts. Is there a gate between the model's decision and the consequence — a policy check that can refuse an out-of-bounds action?
Humans approve the irreversible. Do high-consequence actions (payments, deletions, external sends) pause for a person who can see the action and its reasoning?
Every decision is logged, tamper-evidently. If something goes wrong, can you reconstruct what the agent saw, decided, and did — from a record you can trust hasn't been altered?
It's actually audit-ready. Could you hand the log to a reviewer and have it stand on its own? "We have logs" and "we can prove what happened" are different claims.
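
To make a few of these items concrete, here is a minimal sketch of a gate that sits between the model's proposed action and its execution. It combines three of the checks above: capability scope per task, the user's real permissions at the point of action, and a pause for irreversible actions. Every name in it (User, Action, Verdict, TASK_TOOLS, IRREVERSIBLE, check_action, and the tool names) is a hypothetical chosen for illustration, not an existing API, and a real policy would live in configuration rather than in module constants.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    ALLOW = "allow"
    DENY = "deny"
    NEEDS_APPROVAL = "needs_approval"


@dataclass(frozen=True)
class User:
    user_id: str
    permissions: frozenset  # what the human is actually allowed to do


@dataclass
class Action:
    tool: str   # hypothetical tool name, e.g. "send_payment"
    args: dict  # arguments the model proposed for the tool


# Hypothetical policy data: tools granted per task, and tools we treat
# as irreversible. Both tables are assumptions made for this sketch.
TASK_TOOLS = {
    "summarize_inbox": frozenset({"read_email"}),
    "pay_invoice": frozenset({"read_invoice", "send_payment"}),
}
IRREVERSIBLE = frozenset({"send_payment", "delete_record", "send_email"})


def check_action(user: User, task: str, action: Action) -> Verdict:
    """Decide whether a model-proposed action may run."""
    # Capability scope: the tool must be granted for this specific task.
    if action.tool not in TASK_TOOLS.get(task, frozenset()):
        return Verdict.DENY
    # Point-of-action check: the agent never exceeds the user it acts for.
    if action.tool not in user.permissions:
        return Verdict.DENY
    # Irreversible actions pause for a human instead of running directly.
    if action.tool in IRREVERSIBLE:
        return Verdict.NEEDS_APPROVAL
    return Verdict.ALLOW


if __name__ == "__main__":
    alice = User("alice", frozenset({"read_email", "read_invoice", "send_payment"}))
    print(check_action(alice, "summarize_inbox", Action("read_email", {})))
    print(check_action(alice, "summarize_inbox", Action("send_email", {"to": "x"})))
    print(check_action(alice, "pay_invoice", Action("send_payment", {"amount": 10})))
```

The point of the design is that the gate is ordinary code the model cannot talk its way around: an injected instruction can change what the model proposes, but not what check_action permits. Unknown tasks and unknown tools fall through to DENY by default.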

Using the list

None of these are exotic. They're the agent-era versions of well-known engineering controls — authentication, least privilege, scoped data, logging. What's new is that an LLM in the middle makes it easy to skip them, because the agent feels like it's "just answering," right up until it takes an action you can't take back.

The questions that are hardest to answer cleanly (capability scope, output checks, tamper-evident audit) are exactly the ones a runtime at the call site is meant to handle. That's the runtime SecuRight is building; it's in development, and these notes are where we work through the thinking in the open.
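
As an illustration of what "tamper-evident" can mean in practice, one widely used technique is a hash chain, where each log entry commits to the hash of the entry before it. The sketch below is a generic example of that technique under simple assumptions (an in-memory list, JSON-serializable events); the names GENESIS, append, and verify are hypothetical, and this is not a description of any particular product's log format.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, event: dict) -> str:
    # Canonical JSON so the same event always produces the same hash.
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append(log: list, event: dict) -> None:
    """Append an event, chaining it to the hash of the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev_hash, "hash": _digest(prev_hash, event)})


def verify(log: list) -> bool:
    """Recompute the chain from the start.

    Editing an entry, reordering entries, or deleting one from the middle
    makes this return False. Dropping entries from the tail is NOT caught
    by the chain alone.
    """
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True


if __name__ == "__main__":
    log = []
    append(log, {"step": "saw", "input": "invoice 42"})
    append(log, {"step": "decided", "tool": "send_payment"})
    append(log, {"step": "did", "status": "approved"})
    print(verify(log))                         # chain is intact
    log[1]["event"]["tool"] = "delete_record"  # simulate tampering
    print(verify(log))                         # chain no longer verifies
```

A chain like this only shows that the log is internally consistent. Someone who can rewrite the whole file can also recompute every hash, so the latest hash has to be anchored somewhere the agent's host cannot modify, for example by signing it or storing it in a separate system. That external anchor is what separates "we have logs" from "we can prove what happened."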