The Ultimate Guide to Safe AI Agent Runtimes in 2026

Last updated: 2026-04-27 · Read time: 8 minutes · Maintained by GnamiAI

1. What is an AI agent runtime?

An AI agent runtime is the host environment that gives a large-language-model agent its capabilities: file access, network calls, shell commands, tool registrations, persistent memory, and multi-agent handoff. The runtime is what actually does things; the model only generates text instructing the runtime to do them.

This separation matters: the model is mostly fungible — you can swap Claude, GPT-5, Gemini, or a local Llama in and out — but the runtime is what determines what your agent can break. Picking a safer runtime is a one-time architectural decision; picking a safer model is a continuous bet.

2. Three runtime architectures, compared

Today's AI agent platforms fall into three categories. Each removes a different attack surface, and each makes a different trade against flexibility.

| Property | Browser-first | Docker / microVM | Local install |
| --- | --- | --- | --- |
| Shell access | Not registered | Container-bounded | Full host |
| Filesystem | Not registered | Container disk | Full host |
| Network | Provider API only | Container-bounded | Full host |
| Escape vector | None — capability missing | Kernel CVE → host | Already inside the host |
| Install step | None (browser tab) | Docker engine + image | Native binary |
| Latency | ≈10ms (in-process) | ~50–500ms (container spawn) | ≈10ms (in-process) |
| Best for | Approval-gated work, multi-tenant SaaS | Code execution, untrusted plugin runtime | Power-user single-tenant tooling |

Browser-first (e.g. GnamiAI)

The runtime is a static web page plus serverless functions. The agent is given exactly the capabilities the application code registers — and no others. There is no shell tool to call because the build doesn't contain one. A jailbreak prompt that says "run this in bash" produces text the runtime ignores; there's nothing on the other side of the request to execute it.
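
To make the "capability missing" idea concrete, here is a minimal sketch (with hypothetical tool names) of how a browser-first runtime can dispatch tool calls: the registry is built from application code at startup, so a tool the build never registered simply cannot be invoked, no matter what the model asks for.

```typescript
type ToolHandler = (args: Record<string, unknown>) => Promise<string>;

const registry = new Map<string, ToolHandler>();

// Only these capabilities exist in this build; there is no "bash" entry to find.
registry.set("search_docs", async (args) => `results for ${String(args.query)}`);
registry.set("create_draft", async (args) => `draft saved: ${String(args.title)}`);

async function dispatch(toolName: string, args: Record<string, unknown>): Promise<string> {
  const handler = registry.get(toolName);
  if (!handler) {
    // "run this in bash" ends here: the capability was never compiled in.
    return `Tool "${toolName}" is not registered in this runtime.`;
  }
  return handler(args);
}
```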

Docker / microVM sandbox (e.g. e2b, Daytona, Modal)

The runtime spawns an isolated container per task. The agent has a full Linux environment inside, including shell, package managers, and local network. The sandbox boundary is enforced by the kernel, not by the application. Strong choice when the agent legitimately needs to execute code (data analysis, code generation that gets run); the residual risk is a kernel-level escape, which has happened historically.
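
A minimal sketch of the per-task pattern, assuming Docker is installed and using only standard docker run flags; e2b, Daytona, and Modal each expose their own APIs, which this does not reproduce. Each task gets a fresh, network-isolated, resource-capped container that is removed when the task exits, with the kernel enforcing the boundary.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const run = promisify(execFile);

async function runInSandbox(pythonSource: string): Promise<string> {
  const { stdout } = await run("docker", [
    "run", "--rm",            // throwaway container per task
    "--network=none",         // no outbound network from inside the sandbox
    "--memory=512m",          // cap memory
    "--cpus=1",               // cap CPU
    "--pids-limit=128",       // cap process count
    "python:3.12-slim",
    "python", "-c", pythonSource,
  ]);
  return stdout;
}

// Usage: the agent's generated code runs inside the container, not on the host.
runInSandbox("print(2 + 2)").then(console.log);
```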

Local install (e.g. CLIs, desktop agents)

The runtime is a native binary on your machine. The agent inherits whatever permissions your user account has: it can read your SSH keys, modify any of your code, and post to any service you're logged into. There is no boundary — anything you'd worry about malware doing, the agent can do too.

3. Threat model: what can an AI agent actually do wrong?

Three failure modes drive every design decision below. Naming them explicitly makes the trade-offs concrete.

  1. Prompt injection. Untrusted input reaches the model and contains hidden instructions that override the intended task. The model is not a security boundary — it will obey injected instructions if the runtime lets it.
  2. Misaligned reasoning. The model, with no malicious input, decides the right next step is something destructive that the user did not ask for. Often a side effect of vague goals.
  3. Compromised skill / tool. A community-distributed skill has been tampered with or contains exfiltration logic the user didn't read. Most relevant for marketplaces.

The runtime architecture mostly defends against #1 (by removing capabilities). HITL defends against #2 (by gating side effects on explicit human consent). Signed skills + content classification defend against #3.

4. Human-in-the-loop: when does the agent pause?

An HITL agent doesn't pause for everything — that's just chat. It pauses when an action has side effects: anything destructive, or anything that reaches a third party, rather than a read-only call confined to the runtime.

The pause is a UI prompt with the full action context: what tool, what arguments, against which account, with what expected outcome. The approval is recorded against the turn so an audit log can prove which human signed off on which action and when.
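
A minimal sketch of such a gate, with hypothetical types and a UI callback assumed to render the prompt and return the approver's identity. The runtime blocks the call on the human decision and records who approved what, against which turn.

```typescript
interface ProposedAction {
  turnId: string;
  tool: string;
  args: Record<string, unknown>;
  account: string;
  expectedOutcome: string;
}

interface ApprovalRecord {
  turnId: string;
  tool: string;
  approvedBy: string;
  approvedAt: string; // ISO timestamp
}

const auditLog: ApprovalRecord[] = [];

// requestApproval is assumed to show the full action context in the UI and
// resolve with the approver's id, or null if the human declines.
async function gate(
  action: ProposedAction,
  requestApproval: (action: ProposedAction) => Promise<string | null>,
): Promise<boolean> {
  const approvedBy = await requestApproval(action);
  if (approvedBy === null) return false; // declined: the side effect never happens
  auditLog.push({
    turnId: action.turnId,
    tool: action.tool,
    approvedBy,
    approvedAt: new Date().toISOString(),
  });
  return true;
}
```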

5. Signed skills: how to safely extend an agent

A "skill" is reusable agent behavior — a prompt fragment, sometimes with attached configuration, occasionally with executable code. Marketplaces of skills are inevitable for any popular agent runtime, so the question is how to distribute them safely.

The safest design — chosen by GnamiAI — is to make skills non-executable content. A skill is a Markdown file injected into the system prompt at runtime. There is no code to sandbox because there is no code; the only attack surface is prompt injection, which is the same surface every other prompt opens. The skill registry adds publisher signatures, content classification, and per-publisher revocation on top of that.

Compare this to skill formats that are bundled archives containing scripts: those need a sandbox, an interpreter version pin, and dynamic analysis to be trusted. Markdown-only is simpler to verify because there is less to verify.
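
As one way to implement the signing half, here is a sketch assuming Ed25519 publisher keys and Node's built-in crypto module; the names are illustrative, not GnamiAI's actual API. The runtime refuses to inject any skill whose Markdown body doesn't verify against its publisher's key.

```typescript
import { createPublicKey, verify } from "node:crypto";

interface SignedSkill {
  publisherId: string;
  markdown: string;   // the skill is plain Markdown, not executable code
  signature: string;  // base64 Ed25519 signature over the Markdown bytes
}

// Registry of publisher public keys (PEM), with revoked publishers removed.
const publisherKeys = new Map<string, string>();

function verifySkill(skill: SignedSkill): boolean {
  const pem = publisherKeys.get(skill.publisherId);
  if (!pem) return false; // unknown or revoked publisher
  return verify(
    null, // Ed25519 uses no separate digest algorithm
    Buffer.from(skill.markdown, "utf8"),
    createPublicKey(pem),
    Buffer.from(skill.signature, "base64"),
  );
}

function injectSkill(systemPrompt: string, skill: SignedSkill): string {
  if (!verifySkill(skill)) throw new Error("skill signature rejected");
  return `${systemPrompt}\n\n${skill.markdown}`;
}
```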

6. Capability-scoped multi-agent handoff

Multi-agent systems amplify any of the three failure modes above unless handoff is restricted. The pattern that works:

  1. Each sub-agent declares the capabilities it needs at registration time.
  2. The handoff carries only those capabilities — never the parent's full set.
  3. Context slices replace full transcripts: a research sub-agent gets the question, not the conversation history.
  4. Sub-agent output goes back through the parent's HITL gate before reaching the user — the parent stays accountable.

This stops a coding sub-agent from accidentally seeing the user's memory, a research sub-agent from accidentally getting the network, and a third-party-tool sub-agent from accidentally exfiltrating anything by virtue of having "access to the parent context."
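
A sketch of the handoff pattern with illustrative capability names: the sub-agent receives only the intersection of what it declared and what the parent actually holds, plus a context slice instead of the full transcript.

```typescript
type Capability = "web_search" | "code_edit" | "memory_read" | "http_post";

interface SubAgent {
  name: string;
  declaredCapabilities: Capability[];
  run: (contextSlice: string, capabilities: Set<Capability>) => Promise<string>;
}

async function handoff(
  parentCapabilities: Set<Capability>,
  subAgent: SubAgent,
  contextSlice: string, // e.g. just the research question, not the conversation
): Promise<string> {
  // Grant only what was declared AND what the parent actually holds.
  const granted = new Set(
    subAgent.declaredCapabilities.filter((c) => parentCapabilities.has(c)),
  );
  const output = await subAgent.run(contextSlice, granted);
  // In the full pattern, this output then passes through the parent's HITL gate.
  return output;
}
```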

7. How to choose: a decision matrix

If every external effect is an API call that can be gated behind an approval, choose browser-first. If the agent has to execute arbitrary code (data analysis, running generated programs), choose a Docker or microVM sandbox and treat kernel escapes as the residual risk. If you want single-tenant power-user tooling on your own machine and accept that the agent can do anything malware could, choose a local install.

8. FAQ

Do I need a sandbox if my agent only does API calls?

No. If every external effect goes through a small set of inspectable HTTP clients with HITL approvals on side-effectful calls, you don't need a process-level sandbox at all. Browser-first is the right architecture for this.

Is OpenAI's Responses API a sandbox?

No. It's a model API. The runtime that calls it owns the security model. The same is true for Anthropic, Gemini, and OpenRouter — picking the safer model doesn't change which capabilities your agent has access to.

What's the lowest-effort way to add HITL to my agent?

Tag every tool call as either "safe" (read-only, no third party) or "approval-required" (everything else). Block on a UI prompt before any approval-required call. Record the approval against the turn id. Don't let the model hide approvals behind an inner loop — the gate must be at the runtime layer.
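
A sketch of that tagging step, with hypothetical tool names; the check lives in the runtime's call path, not in the prompt, so the model cannot route around it.

```typescript
type SafetyTier = "safe" | "approval-required";

interface ToolSpec {
  name: string;
  tier: SafetyTier;
  execute: (args: Record<string, unknown>) => Promise<string>;
}

// Declared at registration time: read-only, no-third-party tools are "safe".
const tools: ToolSpec[] = [
  { name: "read_calendar", tier: "safe", execute: async () => "calendar contents" },
  { name: "send_email", tier: "approval-required", execute: async () => "sent" },
];

async function callTool(
  spec: ToolSpec,
  args: Record<string, unknown>,
  turnId: string,
  confirm: (spec: ToolSpec, turnId: string) => Promise<boolean>, // UI approval prompt
): Promise<string> {
  if (spec.tier === "approval-required" && !(await confirm(spec, turnId))) {
    return "call declined by user";
  }
  return spec.execute(args);
}
```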

Do I still need signed skills if I trust my own creators?

If "your own creators" includes any user who can publish, yes — trust at the publisher level is not enough. Signed skills give you per-publisher revocation and an audit chain that's more credible than "I checked their identity at signup."

Try a browser-first runtime

GnamiAI implements every pattern in this guide as the default behavior. No shell, no sandbox to escape, signed-skill marketplace, HITL gates on every destructive action.