AI Agents Hacking in 2026: Defending the New Execution Boundary

AI Agents Hacking is no longer a niche phrase

“AI Agents Hacking” used to sound like a speculative phrase people used in conference talks. In 2026, it maps to real operational risk.

The reason is simple: an AI agent is not just a language model answering questions. It is a system that can read untrusted content, call tools, use credentials, store state, and take actions. That combination turns ordinary application security mistakes into amplified failures. NIST describes a closely related risk category as agent hijacking, a form of indirect prompt injection where malicious instructions are hidden in data an agent consumes and can push the agent toward unintended actions.(NIST) OpenAI’s own agent-building safety guidance similarly calls prompt injection “common and dangerous,” and explicitly warns that even with mitigations, agents can still make mistakes or be tricked.(OpenAI Developers)

There is also a practical content reason this keyword has become high intent: people searching for “AI Agents Hacking” are usually not looking for a generic “AI security 101” article. They want answers to one or more of these questions:

How do agents get hijacked in real workflows?
What are the highest-risk attack paths?
Which CVEs actually matter in agent ecosystems?
How should we harden agent runtimes and tool use?
How do we verify a fix beyond “we patched it”?

I can’t directly access proprietary clickstream/CTR datasets from search engines or publishers. So for this article, I used a transparent proxy method: exact-match relevance, recurrence in top search results, recency of coverage, and presence across reputable vendor and media sources. By that measure, the Check Point page titled “AI Agents Hacking” is one of the strongest direct-match pages currently ranking for the phrase, and it reflects the mainstream framing well: agents are valuable, autonomous, and increasingly attractive targets because they combine sensitive data access with actionability.(checkpoint.com)

That framing is useful, but it is not enough for a serious security team. The missing piece is engineering reality: what exactly breaks first, what compounds the blast radius, and how to build controls that survive contact with adversarial inputs.

This article focuses on that layer.

The security boundary moved, and many teams are still defending the old one

A traditional LLM application might be dangerous if it leaks data or generates unsafe text. An agentic system can be dangerous because it can do things.

That distinction sounds obvious, but it changes everything about the threat model.

Microsoft’s 2026 OpenClaw guidance captures this shift unusually well. Their security research team describes how self-hosted agent runtimes can ingest untrusted text, download and execute skills from external sources, and act using assigned credentials, effectively shifting the execution boundary away from static application code and toward dynamically supplied content plus third-party capabilities.(Microsoft) They also break the boundary into three practical components—identity, execution, and persistence—which is a much more useful defensive model than generic “AI risk” language.(Microsoft)

That model is worth generalizing across agent ecosystems:

Identity boundary: Which tokens, API keys, OAuth grants, and cloud roles can the agent use?
Execution boundary: Which tools, shells, APIs, filesystems, browsers, and runtimes can it invoke?
Persistence boundary: What memory, config, schedules, task queues, or state stores can it modify across runs?
Instruction boundary: What untrusted content can influence planning and tool selection?
Supply-chain boundary: What code, plugins, skills, extensions, or models can be installed or loaded?

If your team is still mainly asking “Did we patch the app?” while ignoring those five boundaries, you are defending the wrong system.

Try AI Agent Hacking >>

What “AI Agents Hacking” usually means in practice

The phrase gets used loosely, so it helps to split it into two categories:

Agents as targets

This is the scenario most defenders care about: attackers hijack or abuse an agent system to exfiltrate data, manipulate workflows, steal credentials, or execute code.

Örnekler şunları içerir:

indirect prompt injection in content the agent reads
malicious plugins/skills/extensions
exposed control interfaces
weak runtime isolation
token theft from config/session stores
browser-to-backend chaining in web-based agent platforms

Agents as offensive tooling

This is the other side: attackers use agents to automate scanning, exploitation attempts, reconnaissance, or social engineering. Wiz’s 2026 comparison of AI agents vs humans in web hacking challenges is a good reality check: agents showed strong capability in focused, testable scenarios (solving 9 out of 10 challenges in their setup), but performance degraded in broader, more realistic contexts where prioritization and strategic pivots mattered.(wiz.io)

For defenders, both categories matter. But the more urgent enterprise problem today is still the first one: you deployed an agent to improve productivity, and it quietly became a new privileged execution surface.

The attack chain view is more useful than the buzzword view

Many AI security articles list “prompt injection, data poisoning, adversarial attacks, supply chain” as if they are independent boxes. In real incidents, they chain.

A more operational way to think about AI Agents Hacking is as a sequence:

Influence the agent (prompt injection, poisoned input, deceptive UI, malicious content)
Authorize action (agent already has credentials or broad scopes)
Yürütmek through tools (shell/API/file/browser)
Devam et changes (memory/config/scheduled task)
Expand via supply chain or adjacent systems
Cover tracks inside noisy automation traffic

This is why the problem feels new even when individual techniques are old. Cline’s February 2026 npm publish incident is a strong example. The final published payload was benign and the impact was limited, but the underlying chain—AI-powered triage workflow processing untrusted issue input, shell access in CI, cache poisoning path, and eventual publication credential abuse—demonstrated how familiar attack primitives become more dangerous when AI automation is inserted into privileged pipelines.(Cline)

That combination effect is the story of AI Agents Hacking in 2026.

Prompt injection is still the center of gravity, but the “why” matters more than the phrase

The industry has repeated “prompt injection” so often that the term risks becoming background noise. For agent systems, it should not be treated as a buzzword. It is a structural design issue.

OpenAI’s agent safety guidance is blunt: prompt injection happens when untrusted text or data enters an AI system and malicious content attempts to override instructions, and the outcome can include private data exfiltration via tool calls or other unintended actions.(OpenAI Developers) The same page also warns that agents won’t be perfect and that builders should be careful about what access they grant.(OpenAI Developers)

NIST’s description of agent hijacking goes one level deeper. It frames the issue as the latest version of an old security problem: lack of clear separation between trusted instructions and untrusted data. In agent systems, the attacker hides malicious instructions in something that looks like normal data the agent would consume (email, file, website), and the agent can then be “hijacked” into harmful behavior.(NIST)

That framing is important because it shifts the conversation from “Can we block all bad prompts?” to “Where are we letting untrusted data become control input?”

Why agents make prompt injection materially worse than chatbots

A chatbot compromised by prompt injection may produce a bad answer.

An agent compromised by prompt injection may:

read and exfiltrate private records
trigger money movement or approvals
post messages as a trusted identity
modify repos/configs/infrastructure
plant persistent instructions into memory/state
call tools that transform a text mistake into a systems incident

Microsoft’s OpenClaw guidance highlights this compounding effect: self-hosted agents combine untrusted code and untrusted instructions into a single execution loop running with valid credentials.(Microsoft) That line should be printed and taped to every internal AI platform team’s wall.

Direct vs indirect prompt injection is not just a taxonomy detail

Security teams usually understand direct prompt injection quickly (“the user typed malicious instructions”). Indirect prompt injection remains under-modeled, even though it is often the higher enterprise risk.

Why? Because indirect injection hides inside normal business data flows:

email threads
ticket comments
docs and wikis
web pages
attachments
scraped content
“helpful” instructions in plugin/skill READMEs

Once the agent is allowed to browse or fetch content autonomously, the data layer becomes part of the control plane.

This is also why “sanitize inputs” alone is not a sufficient strategy. You need routing, isolation, approvals, schema constraints, and privilege scoping.

Real-world signal in 2025–2026: the ecosystem is learning the hard way

The most useful way to understand the trend is to look at incidents and vendor guidance together—not because any one event “proves everything,” but because the patterns repeat.

OpenClaw made the risk legible to non-specialists

OpenClaw became a focal point in 2026 not just because of technical risk, but because it made a broader architectural issue visible: self-hosted agent runtimes can be treated casually by users while actually behaving more like privileged automation platforms.

Microsoft’s February 2026 post explicitly warns that OpenClaw includes limited built-in security controls and should be treated as untrusted code execution with persistent credentials, recommending isolated environments, dedicated non-privileged credentials, and continuous monitoring with a rebuild plan if organizations still choose to evaluate it.(Microsoft)

That is not “AI panic.” It is a mature ops framing: assume compromise is possible, design for containment and recovery.

Penligent’s own OpenClaw analysis page (which is highly relevant to this keyword cluster) also frames the problem in practical terms—skills marketplace supply-chain risk, indirect prompt injection, token/config theft, and patch-lag reality in self-hosted deployments—rather than as a single-bug story.(Penligent)

The Cline npm incident mattered because the chain mattered

Cline’s post-mortem states that an unauthorized party used a compromised npm publish token to publish [email protected], adding a postinstall script that installed OpenClaw globally, while the CLI binary and other contents were unchanged and no malicious code was delivered. The exposure window was approximately eight hours, and the incident was limited to the Cline CLI npm package.(Cline)

That limited impact is good news.

The bad news is the underlying path. Cline also documented that an AI-powered issue triage workflow had shell access, processed untrusted issue input, and enabled a prompt injection path to code execution on GitHub Actions runners, which was then chained with cache poisoning and token handling failures.(Cline) The post-mortem’s own conclusion is the right one: broad tool access for AI agents in CI/CD creates new attack surfaces, and “giving an LLM shell access in a CI context where it processes untrusted input is functionally equivalent to giving every GitHub user shell access.”(Cline)

That sentence should also become a design rule.

High-impact CVEs that matter for AI agent and AI workflow environments

Security teams need a way to triage “AI-related” CVEs without drowning in noise. The standard I use is simple:

Does it affect a platform commonly used to build or host agentic workflows?
Does it create a path to code execution, token theft, account takeover, or privileged tool misuse?
Can it chain with normal agent capabilities (tools, memory, connectors, browser, model endpoints)?
Does it break trust boundaries rather than just a minor UI issue?

By that standard, the following CVEs deserve attention.

AI Agent Hacking

Try Agentic Hacker >>

CVE-2025-3248 in Langflow: unauthenticated code injection to RCE

NVD describes CVE-2025-3248 as a code injection issue in Langflow versions prior to 1.3.0 affecting /api/v1/validate/code, where a remote unauthenticated attacker can send crafted HTTP requests to execute arbitrary code. The listed CVSS v3.1 score from the CNA is 9.8 (Critical).(NVD)

Why this matters in the AI Agents Hacking context:

Langflow is not “just another app”; it is often part of AI workflow and agent orchestration environments.
A code-validation endpoint becoming unauthenticated RCE collapses the distinction between “workflow builder” and “runtime compromise.”
Teams often expose these systems internally first, then accidentally broaden access during demos or integrations.

Even after patching, this kind of issue demands post-patch verification:

confirm route access controls
validate no alternative code paths remain exposed
rotate secrets on compromised hosts
review persisted flows/configs for tampering

CVE-2025-34291 in Langflow: CORS + token cookie configuration chain to account takeover and RCE

NVD describes CVE-2025-34291 as a chained vulnerability in Langflow (up to and including 1.6.9) where permissive CORS configuration combined with a refresh token cookie configured as SameSite=None can allow a malicious webpage to perform credentialed cross-origin requests, obtain fresh tokens, and then access authenticated endpoints—including built-in code-execution functionality—leading to arbitrary code execution and full system compromise.(NVD)

This is exactly the kind of vulnerability many teams underestimate because it looks like “web config hygiene.” In an AI workflow platform, “web config hygiene” can become “backend code execution with agent credentials.”

Why it is high leverage:

It chains browser trust and backend execution trust.
It exploits default assumptions teams make while enabling integrations.
It can be triggered from a malicious origin under user interaction conditions, which is common in admin/operator workflows.

CVE-2025-64496 in Open WebUI: malicious model server to browser token theft and backend RCE chain

NVD describes CVE-2025-64496 in Open WebUI (versions 0.6.224 and prior) as a code injection vulnerability in the Direct Connections feature where malicious external model servers can execute arbitrary JavaScript in victim browsers via SSE execute events, leading to authentication token theft, complete account takeover, and, when chained with the Functions API, remote code execution on the backend server.(NVD)

This one is especially important for AI Agents Hacking because it illustrates a recurring pattern:

teams trust “model endpoints” more than they should
browser-side compromise can steal tokens from operator/admin sessions
backend “functions” or tool APIs convert stolen trust into system compromise

The NVD description notes attack preconditions (Direct Connections disabled by default; admin must add a malicious model URL), which matters operationally because it shows this is not purely remote worm behavior—it is workflow + trust + social engineering.(NVD)

That is exactly what many real agent ecosystem compromises look like.

The emerging pattern: AI agent incidents are often chain failures, not single failures

If you step back from individual incidents and CVEs, the repeated architecture pattern looks like this:

untrusted input is treated as benign data
agent/tool runtime has more privilege than the business task requires
boundaries between planning, execution, and persistence are weak
supply-chain and runtime controls are inconsistent
teams patch the app but don’t verify the workflow

This is why purely model-centric thinking fails. The model is only one component in the attack path.

NIST’s agent hijacking work and evaluation emphasis is useful here because it pushes teams toward measurement and adaptive testing, not static assumptions. Their 2025 technical blog stresses that evaluations need to be continuously improved and adaptive, and that multiple attempts and task-specific analysis matter when assessing risk.(NIST) That is exactly how real attackers behave.

Try AI Agent Hacking >>

A practical defense model for AI Agents Hacking

You do not need a perfect defense to improve outcomes. You need to stop designing agent systems as if they were harmless chat UIs.

The highest ROI approach is boundary engineering: define, constrain, monitor, and routinely test the trust boundaries that matter.

1) Identity: assume agent credentials will be targeted

Agents are valuable because they can act. That means their credentials are valuable too.

Priorities:

dedicated identities per agent/workflow
least privilege scopes
short-lived tokens when possible
explicit approval for sensitive OAuth scopes
rotation playbooks that actually verify revocation (not just dashboard checks)

Cline’s post-mortem is a good reminder that token rotation can fail operationally even when the team believes it succeeded. They describe deleting the wrong token and relying on a management view that did not surface the exposed one.(Cline)

Design implication: treat credential rotation as a tested control, not a checklist item.

2) Execution: isolate the runtime like it’s untrusted automation (because it is)

Microsoft’s OpenClaw guidance repeatedly recommends isolated execution (dedicated VM or separate device), dedicated credentials, and a rebuild plan.(Microsoft) That advice generalizes well to self-hosted agent runtimes and internal pilot environments.

Practical baseline:

run agents in dedicated VMs or hardened containers
avoid running on primary user workstations
mount only required directories
deny host networking by default where possible
enforce outbound egress allowlists
keep environments disposable and rebuildable

3) Instruction boundary: never let untrusted data sit in privileged instruction channels

OpenAI’s agent safety guidance explicitly warns not to use untrusted variables in developer messages, because those messages have higher precedence and give attackers maximum control if contaminated.(OpenAI Developers)

This is one of the most common implementation mistakes in custom agent systems:

developers interpolate fetched content or user-provided strings into “system/developer” instructions for convenience
the agent becomes dramatically easier to steer

Safer pattern:

keep untrusted content in lower-priority channels
extract structured fields first
validate schemas before passing downstream
require approval for high-risk tool invocations

4) Tool boundary: treat tool calls as transaction boundaries, not helper functions

OpenAI also recommends keeping tool approvals on and enabling user review/confirmation for operations, including reads and writes, especially with MCP tools.(OpenAI Developers)

In enterprise settings, this should be translated into a control policy:

low-risk reads may be auto-approved
sensitive reads require explicit approval or policy checks
all state-changing writes require approval, dry-run preview, or step-up authentication
privileged shell/network tools are never directly reachable from untrusted content paths

5) Persistence boundary: memory is a security surface

Microsoft’s OpenClaw post highlights state and memory manipulation risk, and recommends regular review of saved instructions/state, backups for rapid rebuild, and treating rebuild as an expected control if anomalous behavior appears.(Microsoft)

This is one of the most under-defended parts of agent operations.

If an attacker can influence:

saved instructions
trusted sources lists
task schedules
plugin settings
execution defaults

they may not need repeated access. They can convert a one-time injection into durable behavior drift.

6) Supply chain: plugins, skills, and extensions are code, even when they look like content

Agent ecosystems often blur the line between “configuration,” “instructions,” and “code.” That is exactly why they are attractive targets.

Practical controls:

signed packages and provenance (OIDC attestations where possible)
registry allowlists and approved publisher lists
no direct install from arbitrary URLs in production
sandbox untrusted extensions
review installation instructions/README steps as executable risk, not documentation only

Cline’s response is instructive here too: they moved npm publishing to OIDC provenance and removed long-lived npm tokens after the incident.(Cline) That kind of supply-chain hardening matters well beyond one product.

AI Agent Hacking

Try AI Agent Hacking >>

A threat-oriented control matrix you can actually use

Attack surface	Typical attacker goal	What usually fails first	High-ROI controls	What to verify after patch/change
Indirect prompt injection (docs/web/email)	Steer tool use, exfiltrate data	Untrusted content reaches planning/tool path	Channel separation, schema extraction, tool approvals, least privilege	Replay adversarial content and confirm tool calls are blocked or downgraded
Agent runtime on workstation	Token theft, host compromise	Over-privileged execution environment	Dedicated VM/container, non-sensitive data, disposable runtime	Confirm no access to primary creds/home dirs; test rebuild process
Plugin/skill/extension supply chain	Code execution, persistence	“Convenience installs” from untrusted sources	Approved registries, provenance, publisher allowlists, sandboxing	Validate install provenance and runtime behavior of approved extensions
Web UI + backend functions chain	Session theft, account takeover, backend RCE	Browser trust and API trust are coupled	CSP/session hardening, origin restrictions, function authz, token scope reduction	Simulate malicious origin/model server path and confirm chain breaks
CI/CD automation with AI tooling	Secret theft, unauthorized publish	AI workflow has shell + untrusted input + secret adjacency	No shell in untrusted triage flows, secret isolation, cache hardening, OIDC publish	Red-team workflow with hostile issues/PRs and verify no privileged execution path
Agent memory/state	Persistent behavior manipulation	State changes are invisible or unaudited	State diffs, signing, review workflows, backup + restore, anomaly detection	Inspect diffs across runs; test rollback and clean rebuild

Code examples for defenders

The examples below are intentionally defensive. They are meant to help teams design controls and detection—not to reproduce exploitation.

Example 1: Policy gate for agent tool calls

This pattern separates planning from authorization and forces explicit checks at the point where text becomes action.

from dataclasses import dataclass
from typing import Dict, Any, List

@dataclass
class ToolCall:
    name: str
    args: Dict[str, Any]
    source: str              # e.g., "user", "email_fetch", "web_content", "memory"
    risk_level: str          # "low" | "medium" | "high"
    modifies_state: bool

ALLOWED_READ_TOOLS = {"search_docs", "read_ticket", "list_repo_files"}
ALLOWED_WRITE_TOOLS = {"create_draft_reply", "open_change_request"}
DENY_FROM_UNTRUSTED_SOURCES = {"shell_exec", "run_sql_write", "send_payment", "rotate_keys"}

def approve_tool_call(call: ToolCall, user_role: str, has_human_approval: bool) -> bool:
    # 1) Never allow certain tools when the plan is influenced by untrusted content
    if call.source in {"web_content", "email_fetch", "attachment_text"} and call.name in DENY_FROM_UNTRUSTED_SOURCES:
        return False

    # 2) Read-only tools can be auto-approved if explicitly allowlisted
    if not call.modifies_state and call.name in ALLOWED_READ_TOOLS and call.risk_level == "low":
        return True

    # 3) State-changing actions require human approval
    if call.modifies_state and call.name in ALLOWED_WRITE_TOOLS:
        return has_human_approval

    # 4) High-risk tools require human approval + privileged role
    if call.risk_level == "high":
        return has_human_approval and user_role in {"admin", "security_operator"}

    # Default deny
    return False

Why this matters: it encodes a core principle from modern agent safety guidance—untrusted text should not directly drive privileged tool use.(OpenAI Developers)

Example 2: KQL hunting patterns for agent runtimes and suspicious child processes

Microsoft’s OpenClaw guidance includes concrete hunting queries for discovering agent runtimes, ClawHub skill installs, unexpected listeners, and suspicious child processes. The exact field availability depends on your telemetry stack, but the shape is broadly reusable.(Microsoft)

A simplified adaptation:

// Discover likely agent runtimes by process names/commands
DeviceProcessEvents
| where Timestamp > ago(14d)
| where ProcessCommandLine has_any ("openclaw","moltbot","clawdbot","agent runtime")
   or FileName has_any ("openclaw","moltbot","clawdbot")
| project Timestamp, DeviceName, InitiatingProcessAccountName, FileName, ProcessCommandLine
| order by Timestamp desc

// Agent runtimes spawning shells or download tools (high-signal triage pattern)
DeviceProcessEvents
| where Timestamp > ago(14d)
| where InitiatingProcessFileName has_any ("openclaw","moltbot","clawdbot")
   or InitiatingProcessCommandLine has_any ("openclaw","moltbot","clawdbot")
| where FileName in ("cmd.exe","powershell.exe","pwsh.exe","bash","sh","curl","wget")
| project Timestamp, DeviceName,
          Parent=InitiatingProcessFileName,
          AccountName=InitiatingProcessAccountName,
          FileName, ProcessCommandLine
| order by Timestamp desc

// Unexpected listening ports opened by agent-associated processes
DeviceNetworkEvents
| where Timestamp > ago(14d)
| where ActionType == "ListeningConnectionCreated"
| where InitiatingProcessCommandLine has_any ("openclaw","moltbot","clawdbot")
   or InitiatingProcessFileName has_any ("openclaw","moltbot","clawdbot")
| summarize FirstSeen=min(Timestamp), LastSeen=max(Timestamp), Ports=make_set(LocalPort)
  by DeviceName, InitiatingProcessFileName, InitiatingProcessAccountName, LocalIP
| order by LastSeen desc

These aren’t “AI detections.” They are boundary detections:

what is running,
under which identity,
what it spawned,
what it exposed.

That’s the level where incident response gets traction.

Example 3: Egress allowlisting for agent-to-tool and agent-to-model traffic

If your agent can call arbitrary URLs, you’ve already lost too much control. An egress allowlist is one of the highest-ROI controls for self-hosted agents and internal automation services.

A minimal reverse-proxy style example (conceptual NGINX pattern):

# Conceptual example: only allow agent traffic to approved domains
map $http_host $agent_upstream_allowed {
    default 0;
    api.company.internal 1;
    models.company.internal 1;
    github.enterprise.local 1;
    tickets.company.internal 1;
}

server {
    listen 443 ssl;
    server_name agent-egress-proxy.internal;

    location / {
        if ($agent_upstream_allowed = 0) {
            return 403;
        }

        proxy_set_header X-Forwarded-For $remote_addr;
        proxy_set_header X-Agent-Egress "approved";
        proxy_pass https://$http_host$request_uri;
    }
}

In production, pair this with:

DNS controls
TLS validation/pinning where appropriate
logging of destination, identity, and tool context
change management for allowlist additions

This won’t stop every indirect prompt injection attempt. It will dramatically reduce what a successful injection can do.

AI Agent Hacking

Try AI Hacker Tool >>

A better way to think about “fixes”: patches are necessary, but proof matters

One of the most dangerous habits in AI workflow security is declaring success too early.

A patch may remove one vulnerable function. It may not:

revoke stolen tokens
clear poisoned state
undo malicious memory changes
remove risky scopes
fix unsafe runtime placement
prevent the same attack path from reappearing via a different tool

NIST’s emphasis on adaptive evaluations and repeated testing is useful here, because it pushes teams toward evidence over assumptions.(NIST) OpenAI’s guidance on trace graders/evals for agent workflows points in the same direction from the builder side.(OpenAI Developers)

For security teams, that means the bar should be:

“We changed the code” ve
“We replayed the abuse path” ve
“We confirmed the agent can no longer reach the harmful action with the same trust conditions” ve
“We checked adjacent boundaries (identity, tools, persistence, egress)”

That is the difference between confidence and verification.

Penligent’s current product messaging emphasizes reproducible proof, evidence-first findings, and CVE exploit workflows for validation-oriented testing. The homepage language highlights “Find Vulnerabilities. Verify Findings. Execute Exploits,” plus “Evidence-First Results You Can Reproduce,” which maps well to the post-patch verification problem in agent and workflow ecosystems.(Penligent)

That matters because many AI-agent incidents are not just code bugs; they are boundary failures that require testing the actual reachable path in the deployed environment. A platform like Penligent can be useful in the validation layer—confirming whether a supposed fix actually blocks exploitation paths and whether risky defaults, exposed services, or chained conditions still produce real impact—rather than replacing core governance, identity engineering, or runtime hardening.(Penligent)

Penligent’s OpenClaw-focused article is also a good example of the right analytical angle for this topic: it treats agent risk as a combination of exposure, skills supply chain, prompt injection, token theft, and patch-lag, instead of collapsing everything into a single CVE narrative.(Penligent)

Final take

The biggest mistake in AI Agents Hacking discussions is treating the problem as either hype or just “prompt injection.”

It is neither.

What is actually happening is more concrete:

organizations are deploying systems that combine untrusted input, real permissions, and automated execution
attackers are exploiting the seams between those layers
defenders are learning that patching code is necessary but insufficient unless the workflow itself is verified under adversarial conditions

The security question in 2026 is not whether agents can be attacked. They can.

The real question is whether your environment is designed so that a successful attack becomes:

a contained incident,
a visible incident,
and a recoverable incident.

That is the difference between an AI-enabled workflow and an AI-enabled breach.

Referanslar

NIST: Strengthening AI Agent Hijacking Evaluations (agent hijacking definitions and evaluation guidance) (NIST)
OpenAI API Docs: Safety in Building Agents (prompt injection, tool approvals, structured outputs, evals) (OpenAI Developers)
Microsoft Security Blog: Running OpenClaw safely (identity, isolation, runtime risk, hunting guidance) (Microsoft)
NVD: CVE-2025-3248 (Langflow) (NVD)
NVD: CVE-2025-34291 (Langflow) (NVD)
NVD: CVE-2025-64496 (Open WebUI) (NVD)
Cline Post-mortem: Unauthorized Cline CLI npm publish (Feb 2026) (Cline)
Wiz: AI Agents vs Humans in Web Hacking (2026) (wiz.io)
Check Point: AI Agents Hacking (checkpoint.com)
Penligent HackingLabs (Penligent)
People giving OpenClaw root access to their entire life (Penligent)
Penligent Homepage (Penligent)
The Verge
The Times of India

Gönderiyi paylaş:

İlgili Yazılar

CVE-2023-43208 and the Patch Bypass That Turned Mirth Connect Into a High-Priority Internet-Facing RCE

CVE-2023-43208 is one of those vulnerabilities that keeps showing up in security operations conversations long after initial disclosure, not because

Daha fazla bilgi edinin

People giving OpenClaw root access to their entire life

The uncomfortable truth behind “root access” Calling it “root access” sounds dramatic—until you map what modern agentic tools actually hold.

Daha fazla bilgi edinin