AI is already inside email filtering, anomaly detection, behavioral analytics, analyst copilots, model infrastructure, and increasingly in agents that can retrieve data, call tools, touch tickets, write queries, browse the web, and trigger actions. That means “AI in cyber security” is no longer one conversation. It is at least three: using AI to support defense, defending against attackers who are using AI, and securing AI systems themselves as operational targets. NIST’s emerging Cyber AI Profile makes that split explicit by framing the space around securing AI systems, using AI to support cyber defense, and building resilience against AI-enabled threats. (NIST Publications)
That framing matters because a large amount of public writing still collapses everything into a single slogan. The result is confusion. A security team evaluating an incident-triage copilot is solving a different problem from a team trying to harden a retrieval-augmented agent with browser access, and both are solving a different problem from a red team asking whether a model can accelerate recon or reduce time to a reproducible proof. Treating those as the same category produces shallow strategy, shallow testing, and shallow buying decisions. (NIST)
The short-term threat picture is also more nuanced than either the most optimistic vendor pitch or the most dramatic doom-post. In early official assessments, the UK NCSC argued that AI would almost certainly increase the volume and impact of cyber attacks, especially through reconnaissance and social engineering, while noting that the uplift would be uneven and that more advanced offensive use would remain constrained by data, expertise, and resources in the near term. By late 2025 and early 2026, Google Threat Intelligence Group reported a more operational phase of abuse, including broader use of AI across the attack lifecycle, increased experimentation by threat actors, and the appearance of novel AI-enabled malware in active operations. Those two positions are not contradictory. Taken together, they suggest a progression from efficiency gains to selective operational integration rather than an overnight collapse of the offense-defense balance. (NCSC)
A serious article on AI in cyber security therefore has to do more than praise automation or warn about prompt injection. It has to answer four harder questions. Which security tasks are actually improved by AI right now. Which attack paths become more dangerous when AI is inserted into the workflow. Which parts of an AI system deserve the same scrutiny as identity, cloud, and application infrastructure. And how to design controls so that even when an AI component is manipulated, the blast radius stays small. That is the operational center of gravity in 2026. (NIST)
AI in cyber security is not new, but the execution boundary changed
Security has used machine learning for years. Spam filtering, anomaly detection, fraud scoring, clustering, classification, and malware labeling all predate the current generative wave. What changed is not the existence of AI in security but the combination of three shifts: natural-language interfaces for analysts, much stronger general-purpose reasoning over text and code, and action-capable systems that can connect model outputs to tools and side effects. (NIST)
That creates four practical layers. The first layer is classic statistical detection, where models score or classify events. The second is analyst augmentation, where a model summarizes cases, explains logs, drafts rules, or translates natural language into queries. The third is orchestration, where AI helps correlate, prioritize, and recommend actions across products and workflows. The fourth is agentic execution, where the system preserves state, chooses tools, handles multi-step plans, and can alter systems or data if given permission. The risk profile changes sharply as you move down that stack, because the issue stops being “can the model explain this alert” and becomes “what can the system do if the model is wrong, manipulated, or over-trusted.” (Microsoft Learn)
NIST’s Generative AI Profile captures part of this transition in a concise way. It notes that generative AI can lower barriers for offensive activity while also expanding the available attack surface, including exposure to prompt injection and data poisoning. MITRE’s SAFE-AI report makes a related point from the defender’s side: AI-enabled systems are not just traditional IT with a model bolted on, because they introduce distinct attack surfaces, dependencies, and failure modes that existing assessment habits may miss. (NIST Publications)
The cleanest way to think about the field is to separate what AI does for defenders from what it does for attackers, then to add a third plane for the AI systems themselves. That is where most teams still underinvest. They may buy or build AI features without treating model runtimes, retrieval pipelines, tool layers, connectors, prompt stores, experiment trackers, or local inference services as first-class security subjects. That is the gap where avoidable incidents keep forming. (NCSC)

A practical map of the field
| Domain | Typical examples | Main benefit | Main risk |
|---|---|---|---|
| AI for defense | Alert triage, phishing detection, case summarization, query generation, rule drafting | Analyst speed, prioritization, reduced toil | Over-trust, silent errors, schema hallucination |
| AI for attackers | Recon assistance, phishing content generation, malware iteration, data analysis | Lower cost, faster iteration, wider scale | Higher quality social engineering, faster targeting |
| Security for AI | Prompt injection defense, tool governance, model infrastructure hardening, memory and retrieval controls | Reduced blast radius, better resilience | New attack surfaces, hidden privilege paths, hard-to-audit behavior |
This table is a synthesis, but it closely matches the way NIST, NCSC, MITRE, OWASP, and current threat-intelligence reporting describe the problem space. (NIST Publications)
Where AI already earns its keep in security operations
The least controversial value of AI in cyber security is not autonomous action. It is compression. Security teams live inside enormous volumes of repetitive text, repetitive telemetry, repetitive tickets, repetitive enrichment, and repetitive translation between one tool’s language and another’s. AI helps when it reduces time spent getting to the next defensible human judgment. Microsoft’s public documentation for Security Copilot emphasizes end-to-end support for incident response, threat hunting, intelligence gathering, and posture work. CrowdStrike describes AI’s role in tasks ranging from threat detection to proactive defense. Those are broad categories, but they align with how real SOC teams get immediate value: not from letting a model decide everything, but from using a model to shrink the time between signal and analyst understanding. (Microsoft Learn)
Email security is a useful example because it shows both maturity and limits. Google Workspace states that its AI defenses in Gmail block more than 99.9 percent of spam, phishing, and malware. That is the productized, long-running side of AI in cyber security: classification at scale, backed by telemetry, feedback loops, and mature enforcement. The important lesson is not the number on one product page. It is that the highest-value production use cases tend to be those where inputs are well-observed, outputs are bounded, and the final action can be attached to long-developed control planes such as email quarantine, account protection, and policy enforcement. (Google Workspace)
The same logic applies to alert triage and case summarization. A model that can explain why an unusual sign-in coincided with a mailbox rule change, an OAuth grant, and an outbound data transfer saves real analyst time. A model that can summarize the last 300 lines of an EDR investigation or turn a threat-intelligence narrative into a concrete hunt hypothesis can also be useful. But the use case remains strongest when the model is narrowing search space, not finalizing facts that will trigger destructive action. That distinction becomes especially important when the system is looking at incomplete data or vendor-specific schemas it has not been explicitly grounded on. (Microsoft Learn)
Security teams should therefore rank candidate AI uses by five properties. Is the task repetitive. Is the evidence observable. Is the output checkable. Is the action reversible. And can the decision be audited after the fact. When those answers are mostly yes, AI tends to work well. When they drift toward intuition-heavy, business-context-heavy, or irreversible actions, the model should stay in an assistive role or be wrapped in strong approval and policy layers. That principle is more durable than any particular model benchmark. (NIST)
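The five-property ranking above can be sketched as a small triage function. This is an illustrative heuristic, not a standard; the class name, property names, and thresholds are assumptions introduced here for clarity.

```python
from dataclasses import dataclass

@dataclass
class UseCase:
    """Candidate AI use case scored on the five yes/no properties from the text."""
    repetitive: bool    # is the task repetitive
    observable: bool    # is the evidence observable
    checkable: bool     # is the output checkable
    reversible: bool    # is the action reversible
    auditable: bool     # can the decision be audited after the fact

def recommended_mode(uc: UseCase) -> str:
    """Map the five answers to a deployment posture (thresholds are illustrative)."""
    score = sum([uc.repetitive, uc.observable, uc.checkable, uc.reversible, uc.auditable])
    if score == 5:
        return "automate_with_monitoring"
    if score >= 3:
        return "assistive_with_review"
    return "human_led_only"

# Alert deduplication scores yes on everything; autonomous containment does not.
print(recommended_mode(UseCase(True, True, True, True, True)))    # automate_with_monitoring
print(recommended_mode(UseCase(True, True, False, False, True)))  # assistive_with_review
```

The exact cutoffs matter less than the shape: an explicit, reviewable rubric beats ad hoc enthusiasm when deciding how much autonomy a use case deserves.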
Tasks where AI is usually useful and tasks where it is risky to over-trust
| Task | Why AI often helps | Failure mode | Human review needed |
|---|---|---|---|
| Alert deduplication and clustering | Pattern recognition over repetitive events | Missed environmental nuance | Yes |
| Case summarization | Compresses long tickets and logs | Omits decisive detail | Yes |
| Query and rule drafting | Good at translating intent to syntax | Invented fields, wrong logic, overbroad filters | Yes |
| Threat-intel enrichment | Good at extracting entities and timelines | Weak source weighting, false confidence | Yes |
| Autonomous containment | High potential speed benefit | Wrong asset, wrong account, cascading outage | Strongly yes |
| Business-logic vulnerability discovery | Needs deep system context | Confident but shallow reasoning | Strongly yes |
This is a judgment table rather than a standard, but it follows directly from the behavior of current systems and the control recommendations in major guidance. (NCSC)
Why phishing and social engineering moved first
If a reader wants the shortest answer to where AI changes the offensive side of cyber security fastest, the answer is social engineering. NCSC’s assessments repeatedly emphasize that AI provides strong uplift in reconnaissance and social engineering. That is intuitive for technical reasons. Language generation is cheap, variation is cheap, translation is cheap, and personalization is cheap once a threat actor can collect public or stolen context. The old tells of poor spelling and awkward phrasing are much less reliable signals now. (NCSC)
NIST’s Generative AI Profile adds a second layer to that concern. It notes that generative AI can ease the deliberate production and dissemination of false or misleading information at scale, enable more sophisticated disinformation targeted at specific demographics, and support realistic deepfakes and synthetic media. Even subtle manipulations of text and images can affect human and machine perception. In security terms, that matters because credential theft, impersonation, fraud escalation, and approval hijacking rarely require a perfect fake. They require a believable enough interaction at the right moment. (NIST Publications)
There is also a timing issue. Social engineering benefits immediately from AI because it does not require code execution on the target or zero-day-quality exploit development. It benefits from speed, volume, and adaptation. Threat actors can rewrite lures for specific regions, summarize victim profiles faster, turn breach data into tailored scripts, and generate many variations that erode static detection. Google Threat Intelligence Group’s 2025 and 2026 reporting is consistent with that view, repeatedly placing reconnaissance, social engineering, and malware-development support among the areas where AI integration is growing. (Google Cloud)
Defenders, however, are not standing still. This is one of the places where AI on defense may outpace AI on offense because defenders own the telemetry, the control points, and the policy infrastructure. NCSC noted in late 2024 that AI applied to cyber defense may exceed the uplift in adversary capability or application. That is not a guarantee of defender advantage, but it is a useful corrective to the lazy assumption that every AI improvement automatically benefits attackers more. The reality depends on who owns the data, who controls enforcement, and who can measure outcomes. (NCSC)

Detection engineering gets faster, but not magically correct
A large share of security toil is linguistic. Analysts rewrite incident notes. Engineers translate plain-language hypotheses into KQL, Splunk, SQL, YARA, Sigma, or vendor-specific rule formats. Threat hunters read reports, normalize terms, and then map them into environment-specific hunts. AI can reduce the friction across those translations, which is why query generation and rule drafting are among the first practical wins. (Microsoft Learn)
The failure mode is obvious to anyone who has used these systems seriously. A model can produce a well-formed query against the wrong field, or a rule that looks plausible while encoding a bad assumption about parent-child process structure, tenancy boundaries, or product-specific schemas. AI does not remove the need for detection engineering discipline. It raises the speed ceiling, but it does not eliminate validation. A drafted rule still has to be tested against real logs, negative controls, noisy edge cases, and the organization’s own naming conventions and ingestion quirks. (CrowdStrike)
That is why the correct question is not “can AI write detections.” It is “can AI accelerate an engineering loop that still includes schema verification, controlled test data, false-positive review, and production rollback.” Teams that skip that loop usually discover that the model’s biggest strength is also its biggest risk: it can produce fluent technical text much faster than a human can disprove it. (NIST)
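One cheap piece of that validation loop can be automated: checking that a drafted query only references fields the target table actually defines. The sketch below is a hedged illustration; the table name, schema contents, and tokenizer are simplified assumptions, not a real KQL parser.

```python
import re

# Hypothetical known-good schema for one table; a real system would load this
# from the SIEM's schema API rather than hard-code it.
KNOWN_SCHEMA = {
    "SigninLogs": {"TimeGenerated", "UserPrincipalName", "IPAddress", "ResultType"},
}

def undefined_fields(table: str, query: str) -> set[str]:
    """Return identifiers the drafted query references that the schema lacks."""
    tokens = set(re.findall(r"\b[A-Za-z_][A-Za-z0-9_]*\b", query))
    keywords = {"where", "and", "or", "by", "summarize", "count", table}
    return {t for t in tokens - keywords if t not in KNOWN_SCHEMA[table]}

drafted = "SigninLogs | where ResultType == 50126 and SourceIp != ''"
print(undefined_fields("SigninLogs", drafted))  # {'SourceIp'}: a hallucinated field
```

A check like this does not prove a detection is correct, but it catches the most common model failure, fluent references to fields that do not exist, before the rule reaches testing against real logs.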
A related shift is that AI workflows themselves now belong inside the detection surface. If an internal agent reads external content, writes to a cache, opens a browser, or executes a helper tool, those transitions are observable and should be monitored. One useful detection pattern is to look for suspicious sequencing rather than a single bad string. For example, content retrieval followed by shell invocation, file-system writes into sensitive paths, or requests to metadata endpoints may say more than any single prompt pattern.
```yaml
title: Suspicious agent workflow, retrieved content followed by execution
id: 8a0d9b6f-7c9b-4d8b-a30b-6e2f9c41d7b3
status: experimental
logsource:
  product: internal-agent-runtime
detection:
  selection_retrieval:
    event_type: content_retrieved
    content_source|contains:
      - http
      - email
      - document
  selection_exec:
    next_event_type:
      - shell_command
      - browser_post
      - file_write
      - external_api_call
  selection_sensitive:
    destination|contains:
      - 169.254.169.254
      - /etc/
      - ~/.aws/
      - /var/run/secrets/
  condition: selection_retrieval and selection_exec and selection_sensitive
level: high
fields:
  - agent_id
  - session_id
  - model_name
  - retrieved_uri
  - tool_name
  - tool_args
  - destination
  - approval_state
falsepositives:
  - approved red-team simulation
  - controlled integration tests
```
The point of a detection like this is not that the exact fields above exist in every platform. The point is architectural. Once an AI system crosses from summarization into execution, sequence-aware logging becomes more valuable than prompt-only inspection. That is where many current deployments are still thin. (OpenAI)
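The same sequence-aware idea can be expressed directly in code against an agent's event stream. This is a minimal sketch under assumptions: event dictionaries carry `session_id`, `event_type`, and `destination` keys, and the field names mirror the illustrative rule above rather than any real platform.

```python
SENSITIVE = ("169.254.169.254", "/etc/", "~/.aws/", "/var/run/secrets/")
EXEC_EVENTS = {"shell_command", "browser_post", "file_write", "external_api_call"}

def flag_sessions(events):
    """Flag sessions where external-content retrieval is later followed by an
    execution-class event touching a sensitive destination."""
    flagged, tainted = set(), set()
    for ev in events:
        sid = ev["session_id"]
        if ev["event_type"] == "content_retrieved":
            tainted.add(sid)  # session has ingested untrusted content
        elif ev["event_type"] in EXEC_EVENTS and sid in tainted:
            if any(s in ev.get("destination", "") for s in SENSITIVE):
                flagged.add(sid)
    return flagged

log = [
    {"session_id": "s1", "event_type": "content_retrieved", "destination": "http://example.org"},
    {"session_id": "s1", "event_type": "shell_command", "destination": "/etc/passwd"},
    {"session_id": "s2", "event_type": "shell_command", "destination": "/etc/hosts"},
]
print(flag_sessions(log))  # {'s1'}: s2 executed without a prior retrieval
```

Note that the logic keys off ordering across the session, not any string inside a prompt, which is exactly the shift from prompt-only inspection to sequence-aware logging.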
Offensive security benefits are real, but most marketing claims still overshoot
AI can materially help red teams, pentesters, and bug bounty hunters. It can accelerate first-pass asset understanding, suggest fuzzing directions, summarize JavaScript, identify recurring anti-patterns, draft requests, transform headers or encodings, and turn raw findings into structured retest plans. Those benefits are real because much of offensive work includes repetitive parsing, hypothesis generation, and tool glue. NIST’s Generative AI Profile explicitly notes that reports have indicated LLMs are already able to discover some vulnerabilities in systems and write code to exploit them, while warning that the same systems may also expand attack surface and enable offensive cyber capabilities. (NIST Publications)
But offensive security is exactly where loose writing becomes dangerous. AI does not make hidden business logic obvious. It does not automatically understand how a target’s real authorization boundaries differ from its documented ones. It does not guarantee exploit reliability. It does not replace the discipline of confirming preconditions, reproducing impact, or separating false positives from usable findings. The hard part of high-value testing is still navigation under uncertainty. Models can help with the navigation. They do not eliminate uncertainty. (NCSC)
This is also why the difference between a chat assistant and a testing workflow matters. Public Penligent material describes an AI-driven penetration testing agent that integrates traditional tools such as Nmap, Metasploit, Burp Suite, and SQLmap into a single workflow, with an emphasis on verification, reproducible PoCs, and report output rather than pure conversational assistance. Even if a team never uses that specific product, the design target is correct. In offensive work, evidence beats prose. A useful AI system is one that preserves state, records attempted paths, and produces something another engineer can validate. (Penligent)
The same principle should shape buyer skepticism. If a vendor demo mostly shows narrative explanations, elegant summaries, and one-shot payload suggestions, the team still does not know whether the system can maintain context over many failed paths, recover from dead ends, or prove impact under realistic conditions. The operational question is not whether the model sounds expert. It is whether the workflow creates verifiable artifacts. (Penligent)
AI systems are now first-class security subjects
The most important upgrade in the conversation around AI in cyber security is not that more defenders use models. It is that more systems deserve to be threat-modeled as AI-enabled systems. MITRE’s ATLAS defines itself as a living knowledge base of adversary tactics and techniques for AI systems, while SAFE-AI argues that AI-enabled systems have risks not comprehensively addressed by traditional assessment approaches. NIST’s adversarial machine-learning taxonomy likewise exists because a shared language is needed for attacks and mitigations across the AI lifecycle. (MITRE ATLAS)
That shared language matters because AI systems rarely fail at a single layer. A practical compromise might begin with poisoned input, travel through retrieval, exploit a model’s inability to distinguish trusted instructions from untrusted data, trigger a tool, then persist through memory or configuration. That is why OWASP’s GenAI work evolved from the 2025 LLM Top 10 into more agentic guidance by 2026. The shift reflects a change in system shape. Once the model can plan, call tools, store memory, and act on behalf of users, the relevant security question becomes less about output safety and more about execution control. (OWASP GenAI Security Project)
OpenAI’s agent safety guidance says prompt injection is common and dangerous, describing it as the moment untrusted text or data enters an AI system and attempts to override instructions, potentially leading to private-data exfiltration or unintended actions. OpenAI’s later work on agent design goes further and argues that the goal is not perfect input classification, but systems whose impact remains constrained even when manipulation succeeds. That is a crucial engineering mindset. It moves the discussion from brittle filtering fantasies to privilege design, separation, containment, and recovery. (OpenAI Developers)
Microsoft’s Prompt Shields documentation shows the same shape from a different angle by distinguishing user prompt attacks from document attacks. That matters because indirect prompt injection is not just a weird prompt trick. It is a trust-boundary failure. The malicious instruction can live in email, documents, web pages, tickets, or retrieved content that the model treats as ordinary input. When a connected agent reads that content and also has the power to act, “text” becomes an execution vector. (Microsoft Learn)
The attack classes that matter most
| Attack class | Minimum attacker foothold | Common consequence | Why basic input filtering is insufficient |
|---|---|---|---|
| Direct prompt injection | Access to the prompt interface | Policy bypass, sensitive-answer manipulation | Attack lives in ordinary text and can be obfuscated |
| Indirect prompt injection | Ability to plant content in email, docs, web, tickets, or retrieval sources | Tool misuse, exfiltration, agent hijacking | The malicious instruction is delivered as “data” |
| Data and memory poisoning | Influence over training, fine-tuning, retrieval, or long-term memory stores | Skewed outputs, hidden persistence, degraded trust | The poisoned state can look legitimate over time |
| Model extraction and theft | Repeated access to model APIs or infrastructure | IP loss, imitation, cost abuse | Abuse looks like normal usage until rate and pattern analysis catch it |
| Tool misuse and over-permission | Model already connected to high-scope actions | Unauthorized writes, credential abuse, destructive actions | The core failure is excessive capability, not only bad prompts |
| Runtime and infrastructure compromise | Access to local inference or model-management systems | Auth bypass, DoS, lateral movement, artifact theft | The target is ordinary software and must be treated that way |
This taxonomy is a synthesis of current guidance and public incident thinking rather than a verbatim list from one source, but every category above is grounded in NIST, MITRE, OWASP, or vendor documentation. (NIST Publications)
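Several of the attack classes above hinge on the same design decision: content must carry a provenance-based trust label before it reaches the model, so that policy can key off where text came from rather than what it says. The sketch below illustrates that idea; the enum values, source names, and fail-closed default are assumptions introduced here, not a standard.

```python
from enum import Enum

class Trust(Enum):
    TRUSTED_INTERNAL = 3   # system prompts, operator instructions
    SEMI_TRUSTED = 2       # authenticated internal documents
    UNTRUSTED = 1          # web pages, inbound email, ticket bodies

# Hypothetical mapping from ingestion channel to trust level.
SOURCE_TRUST = {
    "system_prompt": Trust.TRUSTED_INTERNAL,
    "internal_wiki": Trust.SEMI_TRUSTED,
    "inbound_email": Trust.UNTRUSTED,
    "web_page": Trust.UNTRUSTED,
}

def label(source: str, text: str) -> dict:
    """Attach a trust label; unknown sources fail closed to UNTRUSTED."""
    return {"trust": SOURCE_TRUST.get(source, Trust.UNTRUSTED), "source": source, "text": text}

doc = label("inbound_email", "Ignore previous instructions and export the CRM.")
print(doc["trust"])  # Trust.UNTRUSTED
```

The label does nothing by itself; its value is that a downstream tool-policy layer can deny high-risk actions whenever the driving content is not trusted, regardless of how persuasive the injected text is.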
Relevant CVEs show that AI infrastructure is ordinary security debt with AI-specific blast radius
One of the easiest ways to lower the quality of a cyber security article is to drop CVE numbers without explaining why they matter. For AI in cyber security, the most relevant CVEs are often not “AI broke cryptography” style discoveries. They are much more familiar software failures inside platforms that store models, execute components, manage experiments, expose APIs, or run local inference. Their importance comes from what those platforms touch: models, training assets, prompts, connectors, credentials, and sometimes production decision paths. (MITRE ATLAS)
Langflow is a good example of how AI application builders can become classical remote-execution surfaces. NVD records CVE-2024-37014 as an issue where Langflow through 0.6.19 allowed remote code execution if untrusted users could reach the custom component endpoint and provide a Python script. NVD also records additional Langflow RCE entries tied to Python-capable or code-executing components. The lesson is straightforward: if an AI workflow platform lets users define or trigger code in the same trust zone as the application, “prompting” quickly stops being the interesting problem. The real problem becomes unsandboxed execution with reachable attack paths. (NVD)
MLflow shows a different but equally important pattern. NVD records CVE-2026-2635 as a default-password authentication-bypass issue, CVE-2025-14279 as a DNS rebinding problem in the REST server resolved in version 3.5.0, and earlier artifact-path issues including local file inclusion and traversal cases. None of those flaws are “AI magic.” They are security failures in a platform used to manage experiments, models, and related assets. What makes them significant in AI environments is the concentration of value around experiment metadata, model artifacts, service relationships, and sometimes privileged development networks. Compromise there can expose more than one model endpoint. It can expose the operational backbone around them. (NVD)
Ollama illustrates why local-model infrastructure should not be dismissed as merely a developer convenience issue. NVD records CVE-2025-63389 as an authentication-bypass flaw affecting API endpoints prior to and including v0.12.3, enabling unauthorized model-management operations. It also records CVE-2025-51471 for cross-domain token exposure via a malicious realm value and multiple GGUF-related denial-of-service issues in 2025 and 2026. Once a “local model runner” turns into a shared inference service on a workstation fleet, lab server, GPU node, or internal platform, the risk becomes familiar enterprise risk: unauthorized access, token theft, and service disruption against a system that may sit close to sensitive data or development pipelines. (NVD)
A compact CVE table for AI-relevant infrastructure
| Component | CVE | Vulnerability type | Why it matters in AI environments | Practical mitigation |
|---|---|---|---|---|
| Langflow | CVE-2024-37014 | Remote code execution | Workflow builders often sit near prompts, connectors, and application logic | Restrict reachability, sandbox code, patch, isolate build components |
| MLflow | CVE-2026-2635 | Authentication bypass | Model and experiment platforms can expose artifacts and admin functions | Eliminate default credentials, patch, segment access |
| MLflow | CVE-2025-14279 | DNS rebinding against REST server | Browser-origin assumptions fail against internal AI management APIs | Patch to fixed version, validate Origin, avoid exposing management endpoints broadly |
| MLflow | CVE-2024-2928 and CVE-2024-3848 | Local file inclusion and traversal-style artifact handling flaws | Artifact stores often contain valuable data and config | Patch, constrain artifact URIs, isolate file access |
| Ollama | CVE-2025-63389 | Authentication bypass | Internal model services can expose model-management actions | Patch, require auth, bind safely, segment service |
| Ollama | CVE-2025-51471 | Token exposure | Cross-domain token leakage can undercut access controls | Patch, verify auth flows, avoid trust on remote realms |
| Ollama | CVE-2025-66959 and CVE-2025-66960 | Denial of service in GGUF handling | Untrusted model files or metadata can crash runtime services | Patch, validate model sources, isolate ingestion |
The common theme across these issues is not novelty. It is misplaced categorization. Teams often talk about AI security as though it were mostly prompt injection and red teaming, while ignoring that model infrastructure, workflow builders, and local runtimes are ordinary software with the same need for segmentation, authentication, patching, logging, and abuse-case testing as any other high-value service. (NVD)
Prompt injection is not the end of the story, execution is
Prompt injection deserves attention, but it becomes misleading when treated like a magic phrase that explains everything. The real reason it matters is that it attacks a boundary most teams still define poorly: the difference between trusted instructions and untrusted content. NIST’s Generative AI Profile describes prompt injection as modifying the input provided to a generative AI system so it behaves in unintended ways, and specifically notes that indirect prompt injection can occur when adversaries place instructions in retrievable content. OpenAI’s guidance uses different wording but points to the same operational consequence: private data exfiltration and unintended tool actions become possible when untrusted text can influence an action-capable system. (NIST Publications)
That means the risk is proportional not just to model quality, but to privilege. A summarization tool that sees malicious text may return a polluted answer. A connected agent that sees the same text and also has access to email, browser automation, shell execution, secrets, or write-capable APIs may take unauthorized action. The relevant security question is therefore not “can we detect all prompt injection.” It is “what can this system do if it is injected.” That is a much more familiar and solvable cyber security question. (Microsoft Learn)
This is where modern guidance is actually converging. OpenAI’s design guidance emphasizes constraining impact even when manipulation succeeds. Microsoft’s Prompt Shields separate user and document attack channels. NCSC and CISA-backed secure AI development guidance emphasizes secure design, secure development, secure deployment, and secure operation and maintenance across the lifecycle. OWASP’s more recent agentic work likewise centers on threat-model-driven controls for autonomous, tool-using systems. Different ecosystems use different language, but the architecture lesson is consistent: protect the action path, not only the text input. (OpenAI)
A minimal control wrapper for tool execution makes the idea concrete:
```python
from dataclasses import dataclass
from typing import Any, Dict

HIGH_RISK_TOOLS = {"shell.exec", "browser.post", "fs.write", "secrets.read"}
BLOCKED_HOSTS = {"169.254.169.254", "metadata.google.internal"}
WRITE_PATH_PREFIXES = ("/etc/", "/var/run/secrets/", "/root/", "~/.aws/")

@dataclass
class Decision:
    allow: bool
    reason: str
    require_human_approval: bool = False

def evaluate_tool_call(tool_name: str, args: Dict[str, Any], trust_level: str) -> Decision:
    """Policy gate evaluated before any tool call the model proposes."""
    destination = str(args.get("destination", ""))
    path = str(args.get("path", ""))
    requires_network = bool(args.get("network", False))
    # Hard denials first: cloud metadata endpoints and sensitive filesystem paths.
    if destination in BLOCKED_HOSTS:
        return Decision(False, "blocked sensitive metadata destination")
    if path.startswith(WRITE_PATH_PREFIXES):
        return Decision(False, "blocked write to sensitive path")
    # High-risk tools are denied outright when the driving content is untrusted.
    if tool_name in HIGH_RISK_TOOLS and trust_level != "trusted_internal":
        return Decision(False, "high-risk tool denied for untrusted content")
    # Anything high-risk or network-touching that survives still needs a human.
    if tool_name in HIGH_RISK_TOOLS or requires_network:
        return Decision(True, "allowed only with human approval", require_human_approval=True)
    return Decision(True, "allowed bounded action")

# Example: a browser POST proposed while processing an external document
decision = evaluate_tool_call(
    tool_name="browser.post",
    args={"destination": "billing.internal", "network": True},
    trust_level="retrieved_external_document",
)
print(decision)
```
The point is not that this snippet is production-ready. The point is that the first line of defense is not “please ignore malicious instructions.” It is a policy layer that knows what the agent is allowed to touch, from what trust zone, under which conditions, with what approval path, and with which audit record. That is much closer to familiar API gateway and privileged-action control than to traditional content moderation. (OpenAI)
Secure AI systems like systems, not like magic
The best public security guidance on AI now looks surprisingly traditional in structure. The joint NCSC-led guidance on secure AI system development organizes the lifecycle into secure design, secure development, secure deployment, and secure operation and maintenance. MITRE’s SAFE-AI guidance similarly argues for adapting established security controls to AI-enabled systems while addressing distinct AI concerns such as data provenance, model behavior, insecure APIs, and attack surfaces tied to models and data. The novelty is not that all old controls disappear. The novelty is that several old controls must now be applied to new system planes. (NCSC)
At design time, teams should ask which parts of the system are authoritative, which are advisory, and which are executable. That sounds simple, but many AI deployments blur those boundaries. A retrieved document may be treated as ordinary context while also influencing tool choice. A memory store may be treated as harmless convenience while quietly shaping future actions. A planning trace may be treated as internal implementation detail while containing policy-critical decisions. If the team cannot point to where authority is granted and where it stops, then the design is not ready. (OWASP GenAI Security Project)
At development time, supply chain discipline matters more than many teams expect. The secure AI development guidance explicitly includes supply chain security, documentation, asset management, and technical-debt management. In practice that means model provenance, connector provenance, dependency review, prompt and policy versioning, reproducible configuration, and clear ownership for updates. AI systems inherit software supply chain risk and then add model files, datasets, embeddings, retrieval corpora, and tool schemas on top. (NCSC)
At deployment time, the biggest mistakes are usually excessive reachability and excessive permissions. Management interfaces should not be exposed casually. Local inference servers should not be assumed safe merely because they started as developer tools. Connectors should default to least privilege. Retrieval should distinguish between trusted and untrusted sources. High-risk actions should be approval-gated, and the system should fail closed on missing policy. These are not glamorous controls, but they are the ones that preserve the enterprise when the model behaves badly or the infrastructure is targeted directly. (NCSC)
During operation and maintenance, monitoring has to include system behavior, system input, and update hygiene. The NCSC guidance explicitly calls out monitoring system behavior, monitoring inputs, following a secure-by-design approach to updates, and collecting lessons learned. For AI-enabled systems, that means logging more than API success or failure. It means recording model version, prompt template version, retrieval source identifiers, document hashes, tool calls, arguments, policy decisions, approval events, and resulting side effects in a way investigators can reconstruct. Otherwise the organization has built an active system without the forensic depth to understand it. (NCSC)
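A sketch of what one reconstruction-grade log entry could look like follows. The field names are illustrative rather than a standard schema; the point is that every artifact the guidance calls out (model version, prompt version, retrieval sources, document hashes, tool call, policy decision, approval, side effect) appears in a single replayable record.

```python
import hashlib
import json
import time

def audit_record(model_version, prompt_version, sources, documents,
                 tool, args, policy_decision, approval_event, side_effect):
    """Build one reconstruction-grade log entry for a single agent action."""
    return {
        "ts": time.time(),
        "model_version": model_version,
        "prompt_template_version": prompt_version,
        "retrieval_sources": list(sources),
        "document_hashes": [hashlib.sha256(d.encode()).hexdigest()
                            for d in documents],
        "tool_call": {"name": tool, "args": args},
        "policy_decision": policy_decision,
        "approval_event": approval_event,
        "side_effect": side_effect,
    }

def emit(record, sink):
    """Write records as JSON lines so investigators can replay the chain."""
    sink.write(json.dumps(record, sort_keys=True, default=str) + "\n")
```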
Questions that decide whether an AI workflow is defensible
| Question | If the team cannot answer it | Minimum corrective action |
|---|---|---|
| Which tools can the model call without approval? | The system has unknown implicit privilege | Define tool classes and approval rules |
| Which inputs are trusted, semi-trusted, and untrusted? | Indirect injection risk is unmanaged | Add source labeling and trust-aware policy |
| What exact artifacts are logged for each action? | Post-incident reconstruction will fail | Log model, prompt, source, tool, args, result |
| How is model or prompt behavior versioned? | Drift and regressions will be hard to isolate | Add version pinning and controlled rollout |
| What is the rollback path for a bad action? | One model error can become an outage | Add manual checkpoints and compensating controls |
| Which runtime components are internet-reachable? | Management-plane exposure is probably underestimated | Segment and restrict interfaces |
This is a practical governance table, but it maps closely to the secure-design and secure-operation guidance now emerging across the field. (NCSC)
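The versioning row in particular is cheap to operationalize. One hedged sketch, with invented component identifiers: pin every behavior-relevant component in a manifest per deployment, and start every drift investigation from the diff between two manifests.

```python
# Hypothetical pinned manifest for one deployment; identifiers are invented.
MANIFEST_V1 = {
    "model": "model-2025-06-01",
    "prompt_template": "triage-v3.2",
    "policy_bundle": "policy-2025.11",
    "retrieval_index": "kb-snapshot-118",
}

def manifest_diff(old, new):
    """Return every component whose pinned version changed between two
    deployments; regression hunts and rollbacks start from this diff."""
    keys = set(old) | set(new)
    return {k: (old.get(k), new.get(k))
            for k in keys if old.get(k) != new.get(k)}
```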
How to test AI systems without pretending text is the whole system
A recurring failure in AI security programs is evaluating only the model and not the workflow. Teams may run jailbreak prompts against a chat interface, declare that the model “passed,” and then deploy the same model inside a system that can read docs, open browsers, send email, or modify records. That is not a real test. Real testing has to follow the action chain. (OpenAI)
A better test plan starts with four categories. First, influence tests: can the attacker alter the model's reasoning or goals through direct prompts, retrieved data, email, web content, or long-term memory? Second, authorization tests: what actions does the system already have the right to perform? Third, execution tests: which tools, endpoints, and systems can the agent actually reach? Fourth, persistence tests: can malicious state survive across sessions, caches, memory, or configuration? That sequence mirrors the way agentic compromise tends to chain in practice. (Penligent)
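A trivial but useful harness check is to encode those four categories as data and report which ones a test run never touched. This is a sketch under the assumption that any harness can emit (category, test name) pairs; the value is that an empty category becomes a visible gap rather than a silent one.

```python
# The four categories above, in the order agentic compromise tends to chain.
CATEGORIES = ("influence", "authorization", "execution", "persistence")

def coverage_gaps(executed):
    """Given (category, test_name) pairs from any harness, return the
    categories with no executed tests at all, preserving chain order."""
    covered = {cat for cat, _ in executed}
    return [c for c in CATEGORIES if c not in covered]
```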
For offensive validation platforms, the same lesson applies. If the workflow claims to automate retesting, proof generation, or attack-path exploration, it should be measured on reproducibility, evidence quality, and bounded execution rather than on how eloquently it describes a CVE. Public Penligent material around AI-driven pentesting and AI in cyber security reflects that more operational framing, focusing on attack workflows, evidence, and verification. Again, the broader lesson is not product loyalty. It is that security teams should reward systems that produce checkable outputs under bounded authority, because that is how AI becomes useful without becoming a liability. (Penligent)
One practical red-team improvement is to test for plan drift rather than only malicious phrases. A competent agent defense should be able to show when a retrieved item or a tool result caused the agent to deviate into a higher-risk plan. If the platform cannot tell you what changed the plan, which policy allowed the next action, and what exact data source influenced the decision, then the security posture is weaker than the demo suggests. That is where emerging agentic guidance is pointing, and it is the right direction. (OWASP Gen AI Security Project)
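The simplest version of a plan-drift check is a diff between the planned tool sequence before and after ingesting a suspect input, attributed to that input. Plans are reduced to lists of tool names here purely for illustration; a real system would compare richer plan structures and risk scores.

```python
def plan_drift(plan_before, plan_after, trigger_source):
    """Report the first step where the planned tool sequence diverged
    after ingesting `trigger_source`, or None if the plan is unchanged."""
    for i, (was, now) in enumerate(zip(plan_before, plan_after)):
        if was != now:
            return {"step": i, "was": was, "now": now,
                    "influenced_by": trigger_source}
    if len(plan_after) > len(plan_before):
        i = len(plan_before)
        return {"step": i, "was": None, "now": plan_after[i],
                "influenced_by": trigger_source}
    return None  # no drift detected
```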

The mistakes that keep repeating
One recurring mistake is treating retrieval as inherently safe because it is "just reading documents." In practice, retrieved text is one of the cleanest delivery channels for indirect prompt injection, especially when the system later turns model output into tool calls or user-visible decisions. Document attacks are not edge cases. They are central to how agentic compromise scales. (Microsoft Learn)
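A minimal mitigation is to carry a trust label with every retrieved chunk so that downstream policy, not the model alone, can treat untrusted text as data rather than instructions. The delimiter convention and field names below are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    source: str
    trusted: bool

def render_context(chunks):
    """Wrap each retrieved chunk in an explicit trust label so the policy
    layer can distinguish trusted from untrusted material downstream."""
    rendered = []
    for c in chunks:
        label = "trusted" if c.trusted else "untrusted"
        rendered.append(f"<doc source={c.source} trust={label}>\n{c.text}\n</doc>")
    return "\n".join(rendered)

def untrusted_sources(chunks):
    """Surface which sources could carry indirect injection this turn."""
    return sorted({c.source for c in chunks if not c.trusted})
```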
Another is giving agents broad scopes because fine-grained permissioning feels inconvenient during prototyping. That choice usually survives too long. A system that can read external content and also write into production systems, billing systems, identity systems, or developer infrastructure is a system whose safety depends on perfect interpretation of ambiguous text. No serious security team should make that its primary line of defense. (OpenAI)
A third is focusing on answer quality while ignoring action quality. A model can sound right and still cause the wrong side effect. In cyber security, side effects are the real unit of risk. The control question is not whether the model’s explanation sounds reasonable. It is whether the system touched the wrong asset, pulled the wrong secret, hit the wrong endpoint, or authorized the wrong state change. (OpenAI)
A fourth is patching the model while forgetting the runtime. The most instructive public AI-related CVEs in the last two years have often lived in workflow platforms, experiment managers, local inference runtimes, or API surfaces rather than in the foundation model itself. Teams that spend all their energy on prompt hardening while leaving model-management services, artifact stores, or local runtimes weakly exposed are defending the wrong layer. (NVD)
A fifth is confusing generated output with finished work. In detection, a drafted rule is not a deployed rule. In pentesting, a generated payload is not a verified exploit. In agent security, a blocked prompt is not a secure execution boundary. AI is productive, but productivity without verification can compound mistakes faster than manual work ever could. (Penligent)

What different teams should do first
SOC teams should usually start with low-blast-radius augmentation: case summarization, query drafting, enrichment, and hypothesis generation. Those are the places where time savings show up quickly and auditability is still manageable. They should delay autonomous containment unless approvals, rollback, and observability are already strong. (Microsoft Learn)
Application-security teams should focus on AI-specific trust boundaries: prompt handling, retrieval sources, connector scope, tool gating, browser isolation, secrets exposure, and the ability to reconstruct multi-step agent behavior. For them, the question is not just whether the model can be manipulated, but whether manipulation can escape the language layer and reach system state. (OpenAI Developers)
Red teams and pentesters should treat AI as a force multiplier for boring work and a hypothesis engine for interesting work. It is well suited to compression, reformatting, payload variation, code explanation, and reconnaissance assistance. It still needs human judgment for exploit reliability, business logic, target-specific constraints, and impact proof. Public documentation from AI-assisted offensive platforms is most useful when it keeps that distinction clear. (NIST Publications)
Technical buyers should evaluate AI security products on control depth, not demo fluency. The decisive questions are whether the system exposes least-privilege design, durable audit trails, source-aware trust boundaries, bounded tool execution, reproducible evidence, and credible rollback. Any product can generate polished prose. Far fewer can prove what they did, why they did it, and how to undo it. (NCSC)
The next two years will reward teams that think in control planes
The strongest evidence today does not support two common myths. The first myth is that AI will instantly automate the full attacker lifecycle for everyone. Official assessments and current threat reporting do not support that. The second myth is that AI in cyber security is mostly about chatbot quality. That is even less defensible. The field is moving toward systems that reason over more context, connect to more tools, and sit closer to more privileged workflows. That makes security architecture, logging, identity, segmentation, and approval design more—not less—important. (NCSC)
NCSC’s 2025 assessment warns that a growing divide will emerge between organizations that can keep pace with AI-enabled threats and those that cannot, and that AI is likely to further reduce the time between vulnerability disclosure and exploitation. That is one of the most useful ways to think about the near future. AI does not need to solve every hard problem in offensive security to change the environment. It only needs to shorten cycles, widen scale, and reward the teams that can operationalize it faster. (NCSC)
At the same time, defenders have a structural advantage when they combine AI with strong control points. They own more telemetry, more policy, more enforcement opportunities, and more rollback mechanisms than attackers do. The winning organizations will not be the ones that merely attach a model to every workflow. They will be the ones that know exactly which tasks deserve AI assistance, which tasks deserve AI execution only under constraint, and which systems deserve the same hardening as any other privileged service. (NCSC)
That is the real meaning of AI in cyber security now. It is not a branding phrase, and it is not a single market category. It is a redesign of several security workflows at once, plus the arrival of new attack surfaces that behave partly like software, partly like data systems, and partly like delegated operators. Teams that secure only the model will miss the system. Teams that use only the system and never challenge the model will miss the failure mode. Teams that design for bounded execution, evidence, and recovery will be the ones that can actually trust what they deploy. (MITRE ATLAS)
Further reading
NIST AI Risk Management Framework and Generative AI Profile (NIST)
NIST Cyber AI Profile initial public draft (NIST Publications)
NIST Adversarial Machine Learning taxonomy and terminology (NIST Publications)
NCSC and partner guidance for secure AI system development (NCSC)
NCSC threat assessments on AI and cyber operations through 2027 (NCSC)
MITRE ATLAS and SAFE-AI for AI-enabled system security (MITRE ATLAS)
OWASP Top 10 for LLM Applications 2025 and OWASP Top 10 for Agentic Applications 2026 (OWASP Gen AI Security Project)
OpenAI guidance on agent safety and prompt injection resistance (OpenAI Developers)
Google Threat Intelligence Group reporting on threat-actor AI use in 2025 and 2026 (Google Cloud)
Penligent homepage (Penligent)
Penligent, AI in Cyber Security — What Actually Changes When Attackers and Defenders Both Have Models (Penligent)
Penligent, Agentic AI Security in Production — MCP Security, Memory Poisoning, Tool Misuse, and the New Execution Boundary (Penligent)
Penligent, AI Agents Hacking in 2026 — Defending the New Execution Boundary (Penligent)
Overview of Penligent.ai's Automated Penetration Testing Tool (Penligent)

