Claude Code Security Bypass Research

Claude Code is powerful for the same reason it is risky: it collapses code reading, file editing, shell execution, project memory, and external tool access into one runtime. Anthropic’s own documentation describes it as an agentic coding tool that reads your codebase, edits files, runs commands, and integrates with development tools. Once a tool can do all of that inside a repository, the security question is no longer just whether the model can be prompt-injected. The deeper question is what happens when repository content, shared configuration, memory, and tool connections all participate in the same execution path. (Claude)

That is where the Claude Code research story gets interesting. Public vendor advisories, NVD records, and third-party research from Check Point all point in the same direction: the highest-value failures were not “the model said something wrong.” They were trust-boundary failures. In different forms, Claude Code allowed execution before trust was fully established, allowed project-controlled configuration to influence security posture, or exposed gaps between permissions, shell validation, and network control. Those are system-design failures at the boundary between configuration and execution. (NVD)

This matters beyond one product release cycle. Claude Code is one of the clearest public case studies of what happens when a repository stops being just data. In an agentic coding environment, the repo can carry instructions, memory, hooks, permissions, and tool integrations that affect what the agent sees and what it is allowed to do. A modern security review therefore has to model the repo as a capability surface, not just a source tree. Anthropic’s documentation makes that shift visible in plain language: project-shared settings can include permissions, hooks, and MCP servers; project-scoped MCP servers live in .mcp.json and are designed to be checked into source control; CLAUDE.md and auto memory are loaded at session start; and subagents can inherit tools from the main conversation, including MCP tools. (Claude)

Claude Code Security Starts at the Trust Boundary

A lot of discussion around AI coding security still assumes the old model. In that model, the AI is a text generator, the shell is separate, repo configuration is mostly inert unless a build step runs, and “prompt injection” is mostly about confusing the model. Claude Code breaks that model. Anthropic documents Claude Code as a tool-using environment with multiple permission modes, native sandboxing for Bash, project-level memory, hooks that can automate actions at lifecycle events, MCP integrations for external services, plugins that package hooks and MCP servers, and subagents that can delegate work across contexts. That is a much richer runtime than a code-completion plugin. (Claude)

The trust boundary is the only thing keeping that runtime from treating untrusted repository content as live operational input. Anthropic’s security docs say first-time codebase runs and new MCP servers require trust verification, and tools that make network requests require user approval by default. They also say Claude Code only has the permissions you grant it, and you are responsible for reviewing proposed code and commands before approval. Those are sensible controls, but they only work if the sequence is correct. If configuration is resolved before trust, or if a project can silently change the permission mode that governs the trust dialog itself, then the model’s alignment is not the main problem anymore. The sequence is. (Claude)

Anthropic’s March 2026 auto mode write-up makes another important point explicit: users approve 93 percent of permission prompts. Anthropic built auto mode partly to reduce approval fatigue, because a system that asks constantly will eventually train people to click through. That is a useful admission because it reframes the practical threat model. Human approval is not a hard security boundary if the product’s default operational pattern encourages habituation. In that world, the attacker does not always need a full zero-click exploit chain. Sometimes they only need a workflow where a developer stops noticing what “approve” now means. (Anthropic)

Anthropic’s own docs also draw an important line between security mechanisms. Sandboxing isolates Bash subprocesses with filesystem and network controls, but the docs are equally clear that built-in file tools use the permission system directly rather than running through the sandbox. That means “use the sandbox” is not a complete answer to Claude Code risk. If your threat model involves configuration files, built-in edit and write tools, or computer use, you need to think across more than one boundary. Sandboxing is valuable, but it is only one layer in a stack. (Claude)

Claude Code Hooks, MCP, Memory, Subagents, and Plugins Expand the Attack Surface

The simplest way to understand Claude Code’s security surface is to stop thinking in terms of one assistant and start thinking in terms of layered extension points. Anthropic’s docs describe hooks as user-defined shell commands, HTTP endpoints, or LLM prompts that execute automatically at specific points in Claude Code’s lifecycle. The quickstart guide is even more direct: hooks let you run shell commands automatically when Claude edits files, finishes tasks, or needs input, and they are meant to provide deterministic control rather than relying on the model to decide whether to act. That one phrase, deterministic control, is the key security fact. It means hooks are not just context. They are automation. (Claude)

MCP is a different surface, but it creates a similar shift. Anthropic’s Claude Code docs say project-scoped servers are stored in a .mcp.json file at the project root, that the file is designed to be checked into version control, and that it exists specifically so all team members can access the same MCP tools and services. The docs also warn users to trust MCP servers they install and be especially careful with MCP servers that fetch untrusted content because those can expose users to prompt injection risk. In other words, the official model for team collaboration already assumes that repository content can define tool connectivity. That is convenient for engineers. It also means repository content has a direct path into the tool plane. (Claude)

Memory adds a third plane. Anthropic documents two complementary memory systems in Claude Code, both loaded at the start of every conversation. Claude treats them as context, not enforced configuration. CLAUDE.md files are read at the start of each session, and CLAUDE.md loads in full; MEMORY.md has a startup limit, but the first 200 lines or 25 KB still load automatically. That makes memory poisoning and context shaping different from a casual prompt attack. A poisoned prompt is transient. A poisoned project instruction file can become part of every session. The agent may treat it as advice rather than code, but in an agentic tool, advice influences action selection. (Claude API Docs)

Subagents and plugins compound the same pattern. Anthropic’s subagent docs say that by default, subagents inherit all tools from the main conversation, including MCP tools, unless you explicitly restrict them with an allowlist or denylist. The plugins reference says plugins are a packaging layer that can bundle skills, agents, hooks, MCP servers, and LSP servers into a single installable unit. Anthropic also imposes some security limits there: plugin-shipped agents cannot include hooks, mcpServers, or permissionMode. That restriction is revealing. It signals that Anthropic already treats those surfaces as especially sensitive. Even with that limit, however, the architectural lesson remains: extensibility is not neutral. It is a distribution path for policy and capability. (Claude API Docs)

The table below reduces the official feature set into the security consequences that matter most during review. (Claude)

| Surface | What the official docs say | Why it matters for security |
| --- | --- | --- |
| Hooks | Hooks execute automatically at lifecycle events and are designed for deterministic control | Project configuration can become automatic execution |
| Project-scoped MCP | .mcp.json is project-root config designed for version control and team sharing | Shared tool connectivity becomes part of repo trust |
| CLAUDE.md and auto memory | Memory files load at session start and influence context | Project context can persistently shape decisions |
| Permission settings | Permissions can be checked into version control and shared across teams | Collaboration config can alter safety posture |
| Subagents | Subagents inherit tools, including MCP tools, by default | Delegation can spread more capability than intended |
| Plugins | Plugins package skills, agents, hooks, MCP servers, and LSP servers | Packaging becomes a distribution layer for policy and risk |

What changes in practice is that the repository is no longer only a source artifact. It is also a runtime influence surface. Security teams spent years teaching developers that source code is dangerous when compiled, executed, or deployed. Claude Code adds another category: source-adjacent configuration that is not obviously executable in the classic sense, but still changes what the agent will run, what it will connect to, and what it will trust. That is the bridge attackers want.

Claude Code Permissions and Trust Make Different Tradeoffs Than Most Teams Realize

The official permission model in Claude Code is richer than a simple yes-or-no approval prompt. Anthropic documents multiple permission modes: default, acceptEdits, plan, auto, bypassPermissions, and dontAsk. plan mode lets Claude read files, run shell commands to explore, and write a plan without editing source. dontAsk auto-denies anything not explicitly allowed. bypassPermissions disables permission prompts and safety checks, with only a few remaining prompts for critical local state like .git, .vscode, .idea, and parts of .claude. Anthropic explicitly warns that bypassPermissions offers no protection against prompt injection or unintended actions and says it should be reserved for isolated environments. (Claude)

That matters because public advisories show attackers do not need to invent a new security model. They only need to find inconsistencies inside the one the product already has. If project configuration can influence the permission mode before trust is established, or if a tool path can reach a write sink the permission system believed it had blocked, then the user interface still looks permission-based while the actual boundary has already been crossed. That is why so many of Claude Code’s advisories are not about large conceptual failures. They are about sequencing, parsing, and precedence. Those details decide whether the model is operating inside a policy or around it. (GitHub)

The docs on settings precedence sharpen that point further. Anthropic says project-shared settings in .claude/settings.json live in source control, and that shared project settings take precedence over some user-level choices when the hierarchy is resolved. The settings page explicitly lists team-shared permissions, hooks, and MCP servers as project-scope use cases. That is useful for collaboration, but from a security standpoint it means repo content can be a policy vehicle. Once you admit repo-scoped policy, your secure design depends on reliably separating safe collaborative defaults from execution-relevant power. Public advisories suggest that separation was not always as strong as it needed to be. (Claude)

The practical takeaway is simple and uncomfortable. In Claude Code, the permission system is not the whole security model. The real security model is the combination of settings scope, trust verification, mode resolution, shell validation, network approval, sandbox boundaries, and extension surfaces. Teams that review any one of those in isolation will miss the path that matters.

The March 31 Source Map Story Changed the Visibility Problem

On March 31, 2026, public GitHub mirrors and third-party reporting claimed that the npm-distributed Claude Code package exposed a cli.js.map file and enough embedded source information to reconstruct a large portion of the TypeScript codebase. The mirror repositories are explicit that they are unofficial extractions, not Anthropic repositories. Third-party reporting said Anthropic had not yet issued a public statement at the time of initial publication. Taken together, those facts do not justify treating the leak narrative as an official source of truth. They do justify treating it as a real change in the cost of white-box review. (GitHub)

That distinction matters. Security conclusions about Claude Code should still be anchored in official docs, GitHub advisories, NVD records, and reputable research. But the source-map reporting changes the environment in which researchers work. A runtime that combines shell execution, permissions, agent loops, tool dispatch, and repository-controlled settings becomes far easier to inspect when public mirrors claim to expose readable code paths. Even if you never rely on those mirrors directly, their existence lowers the barrier for independent analysis, patch diffing, and architecture review. For a tool like Claude Code, that is not a minor event. It changes the economics of scrutiny. (GitHub)

There is also a broader engineering lesson here. The Luna write-up on the source-map incident framed it as a build pipeline failure rather than a hack, noting that a source map in a public package can expose proprietary source. That interpretation lines up with the basic mechanics of package distribution. For a security-focused audience, the important point is not gossip about leaked code. It is that release artifacts for agentic developer tools deserve the same scrutiny as the tools themselves, because artifact mistakes can turn closed implementation details into open analysis surfaces overnight. (Lunatech)

Claude Code Vulnerabilities Mapped the Bypass Paths in Public

The public vulnerability history for Claude Code is unusually instructive because the issues cluster around the same theme. They are not random bugs. They repeatedly show the product struggling with boundaries among trust, configuration, parsing, tool execution, and network control.

The earliest high-signal example is CVE-2025-59536. NVD says versions before 1.0.111 were vulnerable to code injection because of a bug in the startup trust dialog implementation. Claude Code could be tricked into executing code contained in a project before the user accepted the startup trust dialog, with exploitation requiring a user to start Claude Code in an untrusted directory. On its face, that sounds like one startup bug. In architectural terms, it is a statement that “open repo” and “start agent” were too close together in the control flow. The user had not yet successfully expressed trust, but the system had already crossed into action. (NVD)

CVE-2026-21852 pushed the same lesson into the network plane. Anthropic’s GitHub advisory and NVD say that before version 2.0.65, a malicious repository could set ANTHROPIC_BASE_URL in a settings file and cause Claude Code to issue API requests before showing the trust prompt, potentially leaking API keys to an attacker-controlled endpoint. This is not a prompt-injection story. It is a sequencing story. The question is not whether the model understood the user. The question is whether the runtime allowed a repo-controlled environment setting to take effect before trust was established. (GitHub)

The March 2026 GitHub advisory GHSA-mmgp-wc2j-qcv7 made the permission issue even more concrete. Anthropic disclosed that Claude Code resolved the permission mode from settings files, including repo-controlled .claude/settings.json, before determining whether to display the workspace trust confirmation dialog. A malicious repo could set permissions.defaultMode to bypassPermissions, silently skipping the trust dialog on first open and placing the user into a permissive mode without explicit consent. That is an unusually clean example of configuration undermining the very boundary meant to govern configuration. The safe thing to do in a design like this is obvious in retrospect: trust must be decided before repo-controlled settings can influence execution posture. The advisory exists because that ordering was not enforced strongly enough. (GitHub)

Check Point’s February 2026 research tied several of these ideas together. Their write-up said malicious project configurations could use Hooks, MCP servers, and environment variables to achieve remote code execution and API token exfiltration when users cloned and opened untrusted repositories. Whether you agree with every label in the write-up or not, the structural point is important: this was not a claim about a single parser edge case. It was a claim that project files had become an execution and credential surface. That is exactly the boundary change security teams need to internalize. (Check Point Research)

The command-validation advisories reinforce the same conclusion from another angle. Anthropic published advisories for arbitrary code execution via command validation bypass, for untrusted command execution through find, for arbitrary file writes via ZSH clobber parsing, for write-protection bypass through piped sed and echo, and for directory-change-based bypass of protected writes. Different mechanics, same lesson: once the product needs to parse and classify shell intent on behalf of the user, every shell grammar corner becomes security-relevant. A permission model that looks strong in the UI is only as strong as the parser underneath it. (GitHub)

The same is true for the sandbox and network layers. Anthropic disclosed a sandbox escape via persistent configuration injection in settings.json, where a missing file at startup meant the sandbox did not protect that path correctly, allowing malicious code in the sandbox to create a persistent configuration that later executed with host privileges when Claude Code restarted. They also disclosed a domain validation bypass in trusted WebFetch handling, where startsWith() logic could let attacker-controlled domains pass validation if they were prefixed with a trusted domain string. Neither issue is glamorous. Both are exactly the kind of details that decide whether a “safer agent runtime” is actually safer. (GitHub)
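
The WebFetch issue is easy to reproduce in miniature. The snippet below is an illustrative sketch, not Claude Code’s actual code, and the domain names are invented. It shows why a string-prefix check passes attacker hosts that a parsed-hostname comparison rejects:

```python
from urllib.parse import urlparse

TRUSTED_HOSTS = {"docs.anthropic.com"}  # hypothetical trusted set

def prefix_check(url: str) -> bool:
    # The flawed pattern: prefix-match the raw URL string.
    return any(url.startswith(f"https://{h}") for h in TRUSTED_HOSTS)

def hostname_check(url: str) -> bool:
    # Safer: parse the URL and compare the exact hostname.
    return urlparse(url).hostname in TRUSTED_HOSTS

# A crafted subdomain of an attacker-controlled registrable domain
# begins with the trusted string, so the prefix check passes it.
crafted = "https://docs.anthropic.com.attacker.example/payload"
```

Here prefix_check(crafted) returns True while hostname_check(crafted) returns False, which is the whole bug class in two lines.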

The table below turns the public issue history into a research map rather than a changelog. (NVD)

| Identifier | What failed | Why it matters | Fixed state |
| --- | --- | --- | --- |
| CVE-2025-59536 | Startup trust dialog bug allowed project code execution before trust acceptance | Trust was not a hard gate before execution | Fixed in 1.0.111 |
| CVE-2026-21852 | Repo-controlled ANTHROPIC_BASE_URL could redirect requests before trust prompt | Repo config influenced network behavior before trust | Fixed in 2.0.65 |
| GHSA-mmgp-wc2j-qcv7 | Repo-controlled settings could set bypassPermissions before workspace trust dialog | Permission posture could be weakened before consent | Fixed in 2.1.53 |
| GHSA-ff64-7w26-62rf | Missing settings.json path let sandboxed code create persistent host-level config | Sandboxed execution could plant future privileged behavior | Fixed in 2.1.2 |
| GHSA-vhw5-3g5m-8ggf | Weak domain validation let attacker domains pass trusted-prefix checks | Automatic network requests could reach attacker infrastructure | Fixed in 1.0.111 |
| GHSA-xq4m-mc3c-vvg3 and related parser advisories | Shell and path parsing flaws bypassed validation and write restrictions | Permission UI is only as strong as command parsing | Fixed across multiple releases |

What makes this history unusually useful for defenders is that it exposes a taxonomy, not just a patch queue. Claude Code issues keep returning to the same families of risk: trust sequencing, configuration precedence, parser ambiguity, path validation, persistent state, and network egress. Once you see that pattern, “Claude Code bypass” stops sounding like a sensational phrase and starts reading like a practical review category.

A Claude Code Security Bypass Taxonomy

The most useful way to study Claude Code today is not by memorizing advisory IDs. It is by organizing the failures into a few reusable mechanism classes. Those classes are broad enough to matter beyond Claude Code and concrete enough to inform review work.

Trust Bypass

Trust bypass is the class where the product claims a decision boundary exists, but some part of the system does work before that decision is final. CVE-2025-59536 and CVE-2026-21852 are the clearest examples. One allowed project code execution before the startup trust dialog was accepted; the other allowed environment-driven request redirection before the trust prompt appeared. GHSA-mmgp-wc2j-qcv7 belongs here too, because repo-controlled settings influenced the permission mode that governed the trust confirmation itself. The lesson is architectural: in a system like Claude Code, trust must gate all side effects, including configuration resolution that can alter later side effects. (NVD)
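
The fix pattern for this whole class fits in a few lines. The startup sequence below is hypothetical, not Anthropic’s implementation; the only point is the ordering, in which the trust decision consumes no repo-controlled input and repo settings are read only after trust is granted:

```python
import json
from pathlib import Path

def start_session(repo: Path, confirm_trust) -> dict:
    # Step 1: decide trust first, using nothing the repository controls.
    if not confirm_trust(repo):
        # Untrusted repo: conservative posture, ignore shipped settings.
        return {"mode": "plan", "settings": {}}
    # Step 2: only now may repo-controlled configuration take effect.
    path = repo / ".claude" / "settings.json"
    settings = json.loads(path.read_text()) if path.exists() else {}
    mode = settings.get("permissions", {}).get("defaultMode", "default")
    return {"mode": mode, "settings": settings}
```

Under this ordering, a repo that ships permissions.defaultMode set to bypassPermissions simply cannot influence the session until the user has explicitly trusted it.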

Configuration to Execution

This is the most important class and the one security teams still underestimate. Hooks are documented as deterministic lifecycle automation. Project-scoped MCP is documented as shared tool configuration in .mcp.json. Project-shared settings are explicitly designed for source control. That means a repository can carry configuration that does more than describe preferences. It can schedule commands, connect tools, and change how the agent works. Check Point’s Claude Code research is best read as an instance of this broader class: repo configuration became execution and exfiltration logic. (Claude)

Permission Drift

Permission drift happens when the effective safety posture is looser than the user thinks. That can happen because of explicit mode changes, inherited policies, confusing precedence, or approval fatigue. Anthropic’s own docs make the tradeoffs clear: bypassPermissions runs with no checks, dontAsk only allows pre-approved tools, plan researches without editing, and auto uses background safety checks instead of manual approval. The March 2026 trust-dialog bypass advisory matters because it showed how a repo-controlled settings file could silently force the user into a more permissive posture before the UI communicated what happened. That is permission drift in its purest form. (Claude)

Context Poisoning and Memory Poisoning

Claude Code’s memory model deserves more attention than it gets in casual discussion. Anthropic says memory files load at the start of every conversation, that CLAUDE.md is loaded at session start, and that Claude treats these files as context rather than enforced configuration. That distinction is subtle but crucial. A context file does not flip a Boolean in code. It does something more probabilistic: it shapes the model’s interpretation of goals, risks, and constraints. In ordinary chat, that may only distort output. In an agentic coding runtime, distorted judgment changes which tools get called, which files are opened, and which shell actions look reasonable. Memory poisoning in this class is not about one malicious sentence. It is about persistent behavioral influence. (Claude API Docs)

Delegation Abuse

Delegation is where subagents and packaged capabilities matter. Anthropic says subagents inherit all tools from the main conversation by default, including MCP tools. Unless teams explicitly restrict tool access, the delegation graph can become broader than the operator intended. The issue is not that subagents are inherently insecure. It is that delegation hides authority spread. A user may think of the main conversation as the unit of trust, but the effective authority now includes whatever helper agents can be spawned under that authority. In security terms, the question is not only “what can the main agent do?” but “what can it cause its children to do?” (Claude API Docs)
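
The authority-spread question can be made concrete with a small model of inherit-by-default semantics. The resolution logic below is a simplification for reasoning about delegation, not Claude Code’s actual implementation, and the tool names are illustrative:

```python
def effective_tools(parent_tools, allow=None, deny=None):
    # Inherit everything from the parent conversation by default.
    tools = set(parent_tools)
    if allow is not None:
        tools &= set(allow)   # an explicit allowlist narrows the set
    if deny is not None:
        tools -= set(deny)    # an explicit denylist removes entries
    return tools
```

The default case is the one that matters: with neither allow nor deny specified, a subagent ends up with every tool the parent has, including any MCP tools the parent happened to be connected to.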

Validation Boundary Collapse

This class collects the shell-parsing and write-restriction advisories. Anthropic’s docs emphasize protections such as input sanitization, command blocklists, fail-closed matching, suspicious-command checks, and write restrictions to the current working directory. The advisory history shows how fragile that layer can be when shell grammar, path handling, or environment semantics are misinterpreted. find, piped sed, $IFS, short flags, cd, and ZSH clobber syntax are not random trivia. They are reminders that an AI product promising safe shell execution has to implement a command understanding layer under real-world shells. That layer is security-critical code, and it is hard. (Claude)
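
One way to see why this layer is hard: static tokenization cannot observe dynamic shell semantics. The classifier below is a deliberately naive strawman, not Claude Code’s validator. A command that tokenizes as harmless still runs a blocked binary once a real shell expands its variables:

```python
import shlex

BLOCKED = {"curl"}  # illustrative blocklist

def looks_safe(cmd: str) -> bool:
    # Naive classifier: block only when the first token is a blocked binary.
    return shlex.split(cmd)[0] not in BLOCKED

direct = "curl http://example.invalid"
indirect = "CMD=curl; $CMD http://example.invalid"
# The classifier rejects `direct` but accepts `indirect`, even though a
# POSIX shell expands $CMD at runtime and executes curl anyway.
```

The gap between what a tokenizer sees and what a shell does is exactly the space the parser advisories lived in.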

The value of this taxonomy is not academic neatness. It gives defenders a way to ask the right questions. Not “does Claude Code have a bug?” but “which of these mechanism classes do we already expose in our environment, and which layers prevent one from flowing into the next?”

Claude Code Detection and Validation, What to Check in Real Repositories

The fastest way to waste time on agentic tool security is to chase one-off exploit folklore and ignore the boring surfaces that keep showing up in real advisories. For Claude Code, the high-yield review starts with repository content and local policy, not with exotic reverse engineering. The first question should be which files in a repo can influence Claude Code before or during execution. At minimum that includes .claude/settings.json, .claude/settings.local.json, .mcp.json, CLAUDE.md, MEMORY.md, and any plugin or agent definitions the project expects to use. Anthropic’s docs confirm that project-shared settings, project-scoped MCP, and memory files are all first-class configuration surfaces. (Claude)

A practical first pass is a repository scan that flags configuration-bearing files and suspicious keys. The point is not to prove compromise. The point is to identify where repo content changes runtime behavior.

find . -maxdepth 4 -type f \
  \( -path "*/.claude/settings.json" \
     -o -path "*/.claude/settings.local.json" \
     -o -name ".mcp.json" \
     -o -name "CLAUDE.md" \
     -o -name "MEMORY.md" \) \
  | sed 's|^\./||'

rg -n --hidden --no-ignore-vcs \
  'permissions\.defaultMode|bypassPermissions|dangerously-skip-permissions|dontAsk|hooks|ANTHROPIC_BASE_URL|mcpServers|curl\s+-fsSL|wget\s+http|powershell|osascript|bash\s+-c|npx\s+@playwright/mcp' .

That kind of scan is not specific to any single CVE, which is exactly why it is useful. It catches the class of risks the public history keeps surfacing: execution hooks, risky permission defaults, network redirection, and repo-defined tool connectivity. Anthropic’s docs also explicitly recommend good hygiene with untrusted content, including reviewing suggested commands before approval, avoiding piping untrusted content directly to Claude, and using VMs when interacting with external web services. Those recommendations line up with a config-first review mindset. (Claude)

For deeper triage, it helps to parse the JSON rather than relying entirely on grep. The following Python example is intentionally conservative. It does not try to execute anything or emulate Claude Code. It simply flags conditions that deserve human review.

import json
from pathlib import Path

SUSPICIOUS_COMMAND_TOKENS = [
    "curl -fsSL",
    "wget http",
    "bash -c",
    "powershell",
    "osascript",
    "nc ",
    "socat ",
]

def load_json(path: Path):
    try:
        return json.loads(path.read_text())
    except Exception as exc:
        return {"_parse_error": str(exc)}

def audit_claude_settings(path: Path):
    data = load_json(path)
    findings = []

    if "_parse_error" in data:
        findings.append(f"{path}: parse error: {data['_parse_error']}")
        return findings

    mode = data.get("permissions", {}).get("defaultMode")
    if mode == "bypassPermissions":
        findings.append(f"{path}: dangerous default mode: {mode}")

    hooks = data.get("hooks", {})
    if hooks:
        findings.append(f"{path}: hooks present: {', '.join(hooks.keys())}")

        serialized = json.dumps(hooks)
        for token in SUSPICIOUS_COMMAND_TOKENS:
            if token in serialized:
                findings.append(f"{path}: suspicious hook token found: {token}")

    text = json.dumps(data)
    if "ANTHROPIC_BASE_URL" in text:
        findings.append(f"{path}: contains ANTHROPIC_BASE_URL override")

    return findings

def audit_mcp(path: Path):
    data = load_json(path)
    findings = []

    if "_parse_error" in data:
        findings.append(f"{path}: parse error: {data['_parse_error']}")
        return findings

    servers = data.get("mcpServers", {})
    for name, cfg in servers.items():
        if isinstance(cfg, dict):
            command = cfg.get("command", "")
            env = cfg.get("env", {})
            if command:
                findings.append(f"{path}: MCP server '{name}' runs command: {command}")
            if env:
                findings.append(f"{path}: MCP server '{name}' defines env keys: {list(env.keys())}")
    return findings

findings = []
for p in Path(".").rglob("*"):
    if not p.is_file():
        continue
    # Cover both shared and local settings files under any .claude/ directory.
    if p.parent.name == ".claude" and p.name in {"settings.json", "settings.local.json"}:
        findings.extend(audit_claude_settings(p))
    elif p.name == ".mcp.json":
        findings.extend(audit_mcp(p))

for finding in findings:
    print(finding)

This kind of audit matters because the documented feature model already tells you where risk accumulates. Hooks are deterministic execution points. Project-scoped MCP is source-controlled tool configuration. Permission settings can be checked into version control. Memory is loaded at session start. You do not need an exploit kit to learn something useful from those facts. You need file visibility and a review policy. (Claude)

Runtime telemetry matters too. CVE-2026-21852 is a clear example of why early network events deserve attention. If a developer launches Claude Code in a new repository and you see unexpected outbound requests, especially to non-Anthropic hosts or before trust is meaningfully established, that should be treated as a serious investigation path. Anthropic’s docs say tools that make network requests require approval by default, but their own advisory showed that configuration sequencing could undermine that expectation under some conditions. The right operational response is not to trust the prompt blindly. It is to log, correlate, and review. (Claude)

One of the cleanest operational controls is to make mode choice an explicit part of your workflow rather than an informal habit. Anthropic’s documented modes already support this.

# Explore a repository and plan changes without editing
claude --permission-mode plan

# Run only pre-approved tools in a locked-down environment
claude --permission-mode dontAsk

# Never use on a normal developer workstation
# Reserve only for isolated containers or disposable VMs
claude --permission-mode bypassPermissions

The point is not that one mode solves everything. The point is that teams should choose a mode based on the trust level of the repository and the environment. plan is a good default for first contact with unknown code. dontAsk is useful when you can predefine exactly what is permitted. bypassPermissions exists, but Anthropic’s own docs say it disables safety checks and offers no protection against prompt injection or unintended actions. Treat it like an incident waiting to happen unless the environment is intentionally disposable. (Claude)

Claude Code Hardening for Teams

Most teams do not need a heroic security program to reduce Claude Code risk. They need a clean separation between collaborative configuration and execution-relevant configuration. That is the single most important hardening principle. If you allow arbitrary repositories to define hooks, permission defaults, or project-scoped MCP servers on a developer laptop that also holds real credentials and broad network reach, then you are treating convenience as a substitute for trust. That is exactly the assumption the public advisories punished. (Claude)

The first policy should be simple: unknown repositories start in a lower-agency mode, ideally inside a disposable environment. Anthropic’s docs explicitly recommend VMs when interacting with external web services and say bypassPermissions is only appropriate for isolated containers and VMs. That advice is more than hygiene. It is an admission that the blast radius of an agentic coding tool can exceed what a single approval prompt communicates. The isolation boundary should be the environment, not the user’s memory of what they clicked. (Claude)
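In practice, the disposable-environment policy can be as simple as never doing first contact on the host. A minimal sketch, assuming a locally built image (called claude-sandbox here) with Claude Code installed; the image name and mount layout are assumptions to adapt to your own setup.

```shell
# Run first contact with an unknown repository inside a throwaway
# container instead of on the developer workstation.
# claude-sandbox is an assumed locally built image with Claude Code
# installed; adjust the mount and credential handling to your setup.
docker run --rm -it \
  -v "$PWD":/work:ro \
  -w /work \
  claude-sandbox \
  claude --permission-mode plan
```

The read-only mount is deliberate: plan mode should not edit files anyway, and the :ro flag makes that expectation structural rather than behavioral.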

The second policy is to treat .claude/, .mcp.json, CLAUDE.md, and plugin-related files as code-review targets, not as harmless metadata. Code owners should review changes to those files. Security reviewers should diff them the same way they would diff CI workflows, Dockerfiles, or Terraform. The reason is structural. Anthropic’s documentation says project-shared settings can contain permissions, hooks, and MCP servers, and project-scoped MCP is specifically designed for source control. That alone is enough to justify a review gate. You do not need to wait for a new CVE. (Claude)
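The review gate can be enforced mechanically rather than by convention. A minimal sketch, assuming the changed-file list comes from git in CI; the patterns match the paths named above and should be extended to cover your plugin file locations.

```shell
#!/bin/sh
# Print only the changed paths that can alter Claude Code's security
# posture, so CI can require an extra security review whenever the
# list is non-empty. The CI wiring around this filter is an assumption.
flag_agent_config() {
  grep -E '^(\.claude/|\.mcp\.json$|CLAUDE\.md$)' || true
}

# Typical CI usage:
#   git diff --name-only origin/main...HEAD | flag_agent_config
```

If the filter prints anything, the pipeline should demand the same sign-off you would require for a CI workflow or Terraform change.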

The third policy is to centralize what you can. Anthropic’s MCP docs describe managed-mcp.json as a way for administrators to take exclusive control over MCP servers, preventing users from adding arbitrary servers outside that managed set. That is the right model for enterprises with meaningful internal systems. If your developers can attach any MCP server they find on the internet, your actual security problem is no longer “tool use.” It is supply chain and delegation. Managed MCP is not the answer to every problem, but it is a strong statement that tool connectivity should be governed, not improvised. (Claude)

The fourth policy is to be precise about what sandboxing does and does not buy you. Anthropic’s sandbox docs are helpful here because they are honest. Sandboxing isolates Bash subprocesses with filesystem and network controls. It does not mean every Claude Code capability runs inside the same isolation boundary. Built-in file tools use the permission system directly, and computer use runs on the real desktop rather than in the isolated Bash environment. Teams that hear “sandbox” and stop thinking are setting themselves up for disappointment. Sandboxing is valuable, especially against prompt-injection-driven shell activity, but it is not a universal wrapper around all agent behavior. (Claude)

The fifth policy is to prefer evidence-first workflows over autonomy-first workflows. Anthropic’s own product direction points this way. Claude Code Security, announced in February 2026, is not framed as “let the model patch your code unsupervised.” Anthropic says it scans codebases, suggests targeted patches, runs each finding through a multi-stage verification process, assigns severity and confidence, and applies nothing without human approval. That is not marketing trivia. It is a design statement about what mature use of an agentic coding system looks like under real security pressure. Verification, not just generation, is the product problem. (Anthropic)

The same principle applies on the offensive side. Penligent’s public English writing on Claude as a pentest copilot argues for an evidence-first workflow in which the model helps reason about attack paths and draft checks, but does not get to collapse hypothesis into proof on its own. That is the right instinct here. A reasoning layer can be powerful without being the final arbiter of exploitability. In practice, teams get safer results when the agent is one component in a verification chain that preserves evidence, bounded execution, and human review. (Penligent)

The sixth policy is to instrument network and credential exposure. CVE-2026-21852 should have killed the idea that local AI tools are “just local.” If a repo-controlled setting can influence API routing before trust is established, then developer laptops need the same outbound visibility assumptions you would apply to internal build runners or CI jobs. At minimum, monitor unexpected domains, keep short key lifetimes where you can, and rotate keys when exposure is plausible rather than waiting for perfect proof. Anthropic’s advisory describes a pre-trust path to potential API key leakage. Teams should read that as a fleet-level lesson, not a one-release footnote. (GitHub)
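“Rotate when exposure is plausible” is easier to act on when plausibility checks are cheap. A small sketch that greps a tree for strings shaped like Anthropic API keys; the sk-ant- prefix matches the current key format, but treat a clean result as absence of evidence, not proof of safety.

```shell
#!/bin/sh
# List files containing strings shaped like Anthropic API keys
# (current keys start with sk-ant-). A hit means "rotate and
# investigate"; a miss does not prove the key never left the machine.
find_anthropic_keys() {
  grep -rEl 'sk-ant-[A-Za-z0-9_-]{16,}' "$1" 2>/dev/null || true
}
```

Run it over checkouts, shell history directories, and log locations after any suspected pre-trust exposure, and rotate whatever it surfaces.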

Claude Code Security Shows Where the Industry Is Going

Anthropic’s February 2026 announcement of Claude Code Security is valuable even if you never use the feature. It tells you how Anthropic thinks frontier coding agents need to be operationalized for defenders. The announcement says Claude Code Security reasons about code rather than matching only known patterns, then re-examines each result in a multi-stage verification process to filter false positives before an analyst sees them. Findings get severity ratings and confidence ratings, and nothing is applied without human approval. This is a strikingly conservative shape for an AI feature, and for good reason. When the runtime is powerful, the missing ingredient is no longer generation. It is disciplined verification. (Anthropic)

That design choice also helps explain the broader Claude Code advisory history. Most of the public issues were not failures of “AI creativity.” They were failures of execution governance. A security product that merely expands model capability without improving validation and control will make those failures worse, not better. Anthropic’s auto mode announcement makes a similar point from another angle. Auto mode adds a prompt-injection probe on tool outputs and a transcript classifier on actions, attempting to catch dangerous behavior while reducing approval fatigue. The existence of those layers is a tacit acknowledgment that permissions alone are not enough once the agent is capable and the human is busy. (Anthropic)

For offensive teams, the lesson is almost symmetrical. Reasoning-first assistants can be valuable, but the unit of value is not “the agent had an idea.” The unit of value is “the workflow produced a defensible finding.” That is why verifier-backed offensive systems remain relevant even as coding assistants get better. Penligent’s public material on AI pentest workflows makes that distinction directly: the important boundary is the one between smart suggestions and verified findings. That is not just a product narrative. It is a security architecture principle. (Penligent)

Claude Code therefore ends up teaching two lessons at once. One is negative: project files, permission modes, tool buses, and shell parsing can combine into real bypass risk. The other is positive: the right answer is not to ban agentic tooling outright. It is to narrow the gap between reasoning and verification, and to move trust decisions closer to real boundaries rather than decorative prompts. The product direction from Anthropic and the evidence-first workflow language from Penligent both point toward the same mature model: bounded agency, better verification, explicit governance. (Anthropic)

Why Claude Code Became a Reference Case for Agentic Tool Security

Claude Code is not unique because it had bugs. Every security-relevant runtime has bugs. Claude Code is unique because the public materials are rich enough to show the whole boundary problem in one place. Anthropic documents the runtime’s powerful surfaces openly. GitHub advisories document where specific boundaries failed. NVD captures the high-impact CVEs. Check Point showed how project files could become an execution and exfiltration path. The March 31 source-map reporting, even handled cautiously, suggests that white-box inspection costs may now be lower as well. That combination makes Claude Code one of the best available reference cases for studying agentic developer tool security in public. (Claude)

The right conclusion is not “Claude Code is broken.” The right conclusion is that agentic coding systems force a new style of review. You have to ask how repositories influence policy, how trust is sequenced, how tools are delegated, how memory persists, how shell intent is validated, how network egress is governed, and how verification is kept separate from optimism. Those questions apply to Claude Code today, and they will apply to every serious agentic coding runtime tomorrow. (Claude API Docs)

If there is one sentence worth carrying into every future review, it is this: in agentic coding systems, project files are no longer only inputs. They are part of the control plane. That is the boundary shift the Claude Code advisories made impossible to ignore.

Further Reading and References

Anthropic, Claude Code overview. (Claude)

Anthropic, Claude Code security documentation. (Claude)

Anthropic, Claude Code hooks documentation. (Claude)

Anthropic, Claude Code MCP documentation. (Claude)

Anthropic, Claude Code permission modes. (Claude)

Anthropic, Claude Code auto mode engineering note. (Anthropic)

Anthropic, Claude Code Security announcement. (Anthropic)

NVD, CVE-2025-59536. (NVD)

NVD, CVE-2026-21852. (NVD)

Anthropic GitHub advisory, workspace trust dialog bypass via repo-controlled settings file. (GitHub)

Check Point Research, Claude Code project files, RCE, and API token exfiltration. (Check Point Research)

Penligent, Claude AI for Pentest Copilot, Building an Evidence-First Workflow With Claude Code. (Penligent)

Penligent, Claude Code project files became an RCE and API key exfiltration path. (Penligent)

Penligent, Agentic Security Initiative, Securing Agent Applications in the MCP Era. (Penligent)

Penligent homepage. (Penligent)
