
Claude Mythos Preview Is Not Black Box Pentesting

Anthropic’s public write-up on Claude Mythos Preview matters. The company says the model is “strikingly capable” at computer security tasks, launched Project Glasswing around it, and documents Mythos Preview as an invitation-only research preview for defensive cybersecurity workflows rather than a generally available, self-serve model. That is not routine model marketing. It is a frontier lab telling defenders that cyber capability is moving fast enough to justify a dedicated defensive program. (Red Anthropic)

The mistake is not taking that warning seriously. The mistake is reading more into the public evidence than the public evidence can support. The strongest material Anthropic has released so far shows three things very clearly: source-visible vulnerability research in open-source code, offline reverse-engineering-assisted analysis of stripped binaries, and increasingly capable exploit development, including N-day exploit generation. All of that is significant. None of that, by itself, proves reliable black-box web pentesting against live internet-facing applications. (Red Anthropic)

That distinction is not academic hair-splitting. It changes what was measured, what remains unknown, how much confidence security buyers should place in the “AI pentesting” label, and what a fair benchmark for real-world application security testing should look like. If a model can read the repository, rank files, run debuggers, iterate in an isolated container, and then emit a bug report plus proof-of-concept, you are looking at a powerful exploit-research assistant. If a model starts from only a URL, a few low-privilege identities, and the messy behavior of a deployed web application and still finds a stateful logic flaw, proves impact, rules out false positives, and packages defensible evidence, that is much closer to black-box pentesting. Anthropic has publicly proven much more of the first category than the second. (Red Anthropic)

Why Claude Mythos Preview matters, and why the public proof is narrower than the headlines

What Anthropic publicly says Claude Mythos Preview actually did

The public Anthropic material is unusually strong by the standards of model capability reporting. The company says it shifted toward novel real-world security tasks because Mythos Preview had “mostly” saturated its earlier internal and external benchmarks. It also says the model was used over several weeks to search for vulnerabilities in the open-source ecosystem, perform offline exploratory work against closed-source software in line with bug bounty programs, and generate exploits from model findings. That alone makes the report worth serious attention. (Red Anthropic)

Anthropic also frames Mythos Preview as a defensive story, not just an offensive one. Project Glasswing brings together major software, cloud, and security companies in an effort to secure critical software, and Anthropic’s own model documentation says Mythos Preview is not generally available and currently requires invitation-only access. In other words, even Anthropic is not presenting this as a normal public developer model rollout. It is presenting it as a controlled defensive capability program. (anthropic.com)

That context matters because it explains both the seriousness of the report and the limits of public verification. Anthropic says more than 99 percent of the vulnerabilities it found have not yet been patched and fewer than 1 percent are fully patched, which means much of the strongest evidence cannot yet be disclosed in technical detail. To create some future accountability, the company publishes SHA-3 commitments for findings it says it will reveal later. That is a reasonable coordinated disclosure posture. It is also an admission that large parts of the most important public claims remain hard for outside readers to independently verify today. (Red Anthropic)

So the correct skeptical reading is not “Anthropic made it up.” The correct skeptical reading is “the public report is strong evidence of a capability jump, but it is still a capability report with narrow public observability.” That is exactly why the boundary of the claim matters so much. Once people hear “zero-days,” “browser exploit chains,” “auth bypass,” and “AI wrote working exploits,” they naturally jump to “so black-box pentesting is solved.” That leap is bigger than the public materials justify. (Red Anthropic)

What Anthropic actually tested in Claude Mythos Preview

Anthropic is unusually explicit about its core scaffold, and that transparency is one of the best parts of the write-up. For the zero-day case studies it discusses, Anthropic says it launches an isolated container that runs the project under test and its source code. It then invokes Claude Code with Mythos Preview and effectively asks it to find a security vulnerability in the program. In a typical run, the model reads code, forms hypotheses, runs the project to confirm or reject them, repeats as needed, adds debugging logic or uses debuggers, and finally outputs either “no bug” or a bug report with a proof-of-concept exploit and reproduction steps. (Red Anthropic)

The company also describes how it scales that process. Instead of scanning the same repository the same way every time, Anthropic first asks the model to rank files by how likely they are to contain interesting bugs on a one-to-five scale. Files ranked higher are more likely to parse raw network data or handle user authentication. Multiple agents then work in parallel on different files to increase bug diversity and reduce duplicate findings. At the end, Anthropic runs another Mythos Preview agent to decide whether a bug report is “real and interesting,” filtering out valid but minor or obscure issues. (Red Anthropic)
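
That ranked, parallelized triage loop can be pictured as a small orchestration sketch. Everything below is illustrative: the scoring heuristics, the `agent_search` stub, and the judging threshold are stand-ins for real model calls, not Anthropic's actual implementation.

```python
import concurrent.futures

def rank_file(path: str, source: str) -> int:
    """Crude 1-5 interest score: files touching parsing or auth rank higher.
    A real system would ask the model for this ranking; keywords are stand-ins."""
    score = 1
    if any(k in source for k in ("parse", "decode", "deserialize")):
        score += 2
    if any(k in source for k in ("auth", "session", "password")):
        score += 2
    return min(score, 5)

def agent_search(path: str) -> dict:
    """Placeholder for one agent run over one file; returns a candidate report."""
    return {"file": path, "report": f"candidate finding in {path}", "severity": 3}

def judge(report: dict, min_severity: int = 3) -> bool:
    """Second-pass filter: keep only reports deemed 'real and interesting'."""
    return report["severity"] >= min_severity

def triage(files: dict, top_n: int = 3) -> list:
    """Rank files, fan agents out over the top-ranked ones, then filter."""
    ranked = sorted(files, key=lambda p: rank_file(p, files[p]), reverse=True)
    with concurrent.futures.ThreadPoolExecutor() as pool:
        reports = list(pool.map(agent_search, ranked[:top_n]))
    return [r for r in reports if judge(r)]
```

The design point is the separation of concerns: ranking narrows the search, parallel agents increase diversity, and the judging pass trades recall for precision before a human ever reads a report.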

Anthropic also explains why so many of its public examples are about memory safety. Critical software is still heavily built in C and C++, remaining bugs in mature codebases are harder and therefore more revealing, and memory safety violations are easier to validate than many logic issues because tools such as Address Sanitizer separate real bugs from hallucinations with high confidence. That is a sound research choice if your goal is to measure upper-bound exploit research ability. It is not the same goal as measuring whether a model can reason through a modern external web application’s authentication flows, business states, tenant boundaries, and API behaviors under realistic operational constraints. (Red Anthropic)

This matters because methodology is not a side note in security evaluation. Methodology defines the question. Anthropic asked, in effect: “Given code and runtime access, can the model independently find and exploit serious vulnerabilities?” That is a hard and important question. It is just not the same as asking: “Given an internet-facing target and only the behavior it exposes, can the model perform black-box web pentesting at a level a security team can trust?” (Red Anthropic)

The difference becomes even clearer in Anthropic’s reverse-engineering section. For closed-source targets, Anthropic says Mythos Preview reconstructs plausible source code from stripped binaries, then receives both that reconstructed source and the original binary, and performs the same agentic search process as before. Anthropic says this was done entirely offline and used to find vulnerabilities and exploits in closed-source browsers and operating systems. Again, that is impressive. It is also not the same thing as starting with only a URL, a browser, some JavaScript, a few API traces, and one legitimate account. (Red Anthropic)

Why source-visible exploit research is not black box pentesting

Source-visible exploit research is a different job from real pentesting

Source access changes the job. It changes it so much that using the same label for both tasks creates more confusion than clarity.

When a model can read the repository, the search space changes from behavioral inference to structural analysis. It can see function names, permission checks, parsing routines, feature flags, fallback paths, test-only routes, alternative authentication branches, and places where untrusted data crosses into critical operations. It can correlate behavior with implementation instead of inferring implementation from behavior. That does not make the job easy, especially in operating systems, browsers, and crypto libraries. But it does remove the most expensive layer of uncertainty that defines black-box application work. (Red Anthropic)

A second shift is that source-visible testing lowers the cost of hypothesis generation. In real web engagements, a tester often spends a large fraction of the work simply discovering what the application is and how it behaves: which identities exist, how the state machine moves, which APIs belong to which workflow, whether a redirect is cosmetic or authoritative, whether a hidden field is enforced server-side, whether a mobile endpoint differs from the browser endpoint, whether a password reset path enforces the same checks as the main login path, and whether a retry is hitting the same backend path or a different edge service. Source access short-circuits many of those questions. (owasp.org)

A third shift is that unlimited or loosely bounded agentic iteration measures a different kind of success. If you let an agent run for hours, add debugging logic, split work across many files, and retry until it either disproves or proves its hypothesis, you are measuring what the system can eventually achieve with enough context and compute budget. That is a valid research lens. But a black-box pentest buyer also cares about unit economics, reproducibility, false positives, evidence quality, rate discipline, workflow coverage, and whether the system can converge without contaminating the target or drifting into a dead-end loop. Anthropic’s public scaffold is optimized for upper-bound capability, not for the full production reality of external target validation. (Red Anthropic)

The difference can be summarized like this:

Question | What Anthropic publicly shows | What black-box web pentesting still needs
Can the model find serious bugs with rich internal context? | Yes, strongly | Not enough by itself
Can the model generate exploits for some bugs and N-days? | Yes, strongly | Still not sufficient for web proof
Can the model reconstruct stripped binaries and reason about them offline? | Yes, publicly claimed | Still not the same as external SaaS testing
Can the model discover attack surface from externally visible behavior alone? | Not publicly demonstrated in detail | Required
Can the model reason through deployed business workflows, MFA, alternate channels, tenant boundaries, and race windows with only behavioral evidence? | Not publicly demonstrated in detail | Required
Can the model produce low-false-positive, replayable, evidence-first findings against live internet-facing apps? | Not publicly demonstrated in detail | Required

The “what Anthropic publicly shows” column is a synthesis of Anthropic’s published methodology and examples. The “what black-box web pentesting still needs” column reflects what OWASP and PortSwigger treat as core parts of real web testing, especially in authentication, business logic, workflow circumvention, and race-condition analysis. (Red Anthropic)

This is why “white-box strength” is not an insult. It is a precise job description. Claude Mythos Preview appears to be very strong at exploit research with high-information inputs. That is a major milestone. It just is not equivalent to proving black-box web pentesting against the internet.

Why reconstructed binaries are still not black box web reality

The closed-source section of Anthropic’s report is one of the places readers most often over-interpret. Anthropic says Mythos Preview can reverse-engineer stripped binaries into plausible source code, then validate against the original binary, all offline, and use that workflow to find vulnerabilities and exploits in browsers, operating systems, and firmware. That is a serious capability claim. It is also a very different problem from external application security testing. (Red Anthropic)

A browser exploit chain or local privilege escalation chain lives in a world where the model can reason about memory layouts, primitives, mitigations, and internal program structure. A web application bug often lives in a world where the most important questions are semantic rather than structural. Did the checkout path revalidate the discount on the server? Did the invitation flow bind the invite to the organization or only to an email address? Does the mobile API enforce MFA the same way as the browser path? Does the “complete profile” step actually gate access, or only change the frontend state? Does the application lock on account, IP, tenant, or device when multiple MFA failures occur? Did a cache layer serve privileged data after a role change? Those are black-box application questions. The hard part is not merely finding code that looks risky. The hard part is reconstructing the security model from behavior. (owasp.org)

Anthropic’s own wording reinforces the point. The reverse-engineering work is said to be done “entirely offline.” That makes perfect sense for responsibly exploring closed-source systems. But real external web pentesting is defined by the opposite condition: the target is live, often noisy, partially observable, sometimes rate-limited, occasionally shielded by WAFs or anti-bot systems, and full of side effects that can mislead both humans and models. An agent that succeeds in a clean offline reverse-engineering loop has not yet proven it can survive that operational mess. (Red Anthropic)

This is also why “binary bug found” is not the same as “web pentest solved.” Many of the most expensive failures in real SaaS systems have nothing to do with low-level memory safety. They are about authorization drift, identity lifecycle bugs, invite abuse, reset flow confusion, alternative-channel weakening, cross-tenant reference mistakes, payment-state mismatches, workflow skipping, and concurrency edge cases. Those categories are explicitly treated as first-class testing areas by OWASP and PortSwigger because they do not fall out of the target just because you know how code is written. They fall out of careful interaction with the deployed system and the different states it can enter. (owasp.org)

Claude Mythos Preview and the missing web proof

Anthropic does say Mythos Preview found many web application logic vulnerabilities. The report lists complete authentication bypasses that let unauthenticated users grant themselves administrator privileges, account login bypasses that let unauthenticated users log in without a password or two-factor code, and denial-of-service attacks that could remotely delete data or crash the service. Those are exactly the kinds of findings that would matter in black-box web work. The issue is not relevance. The issue is evidence density. Anthropic does not publish concrete public case studies for those web findings yet because the vulnerabilities are still under disclosure. (Red Anthropic)

That creates an unavoidable asymmetry in the current public record. For source-visible memory safety work, Anthropic gives the audience enough methodology and enough examples to understand the shape of the result. For web logic work, the audience receives categories of outcomes but not yet the methodological detail needed to judge whether those results came from a genuinely black-box process, a source-visible process, or some hybrid. The report’s own caveat makes that clear: most bugs remain unpatched, the current public material is only a lower bound, and several claims are intentionally abstract until coordinated disclosure completes. (Red Anthropic)

That does not mean the web claims are false. It means they are not yet a public proof of black-box web pentesting. In security, “public proof” matters because defenders, buyers, and other researchers need to know what has actually been demonstrated. A category claim without a reproducible method and without concrete case material is enough to justify interest and caution. It is not enough to settle the label. (Red Anthropic)

This is also where reading multiple sources side by side becomes important. Consider the FreeBSD case that Anthropic used as a flagship example. Anthropic described it as a 17-year-old remote code execution vulnerability that “allows anyone to gain root” starting from an unauthenticated user anywhere on the internet. The current NVD entry, however, says kernel remote code execution is possible by an authenticated user who can send packets to the kernel’s NFS server while kgssapi.ko is loaded, while user-space RPC servers with librpcgss_sec loaded are vulnerable from any client able to send packets. Meanwhile, the official FreeBSD advisory says the stack overflow can be triggered without the client authenticating first. Those details may eventually reconcile cleanly, but the fact that early public descriptions differ is a good reminder that capability reports should be read alongside vendor and CVE records, not in isolation. (Red Anthropic)

That kind of source drift is not unusual in fast-moving disclosures. It is also a strong argument for precise boundaries. If even public flagship cases need careful cross-reading, then it is even more important not to overclaim on the categories Anthropic has not yet detailed in public.

What black box web pentesting actually demands

OWASP’s Web Security Testing Guide is useful here because it is not written to flatter any particular tool category. It is written as a practical framework for testing web applications and web services. In that framework, authentication testing, authorization testing, business logic testing, workflow circumvention, request forgery, integrity checks, application misuse, file handling, and more are all part of the job. Black-box application security work is not just “find injection.” It is “understand how the system is supposed to behave, observe how it actually behaves, and identify where the two diverge.” (owasp.org)

OWASP’s business-logic guidance makes the point bluntly: testing for business logic flaws in a dynamic web application requires thinking in unconventional ways. Its example is intentionally simple but representative: what happens if the authentication mechanism expects steps one, two, and three in order, and the user goes from step one straight to step three? That question is the opposite of source-first reasoning. It is state-first reasoning. The tester is asking what the system will let them do, not what a function name implies it should do. (owasp.org)

OWASP’s workflow-circumvention guidance goes further. It says the application must ensure users complete workflow steps in the correct order and prevent them from skipping or repeating steps, and it describes testing as building abuse and misuse cases that successfully complete a business process without following the intended path. That is exactly the sort of problem that makes black-box web pentesting hard: you do not get a single line of code telling you which step is canonical. You learn it by interacting with the system, comparing states, replaying requests, and looking for the gap between intended and actual security semantics. (owasp.org)

OWASP’s MFA testing guidance is equally revealing. It tells testers to enumerate every authentication path, including the main login page, security-critical account actions, federated login providers, API endpoints from both web and mobile interfaces, alternative non-HTTP protocols, and even test or debug functionality. It explicitly says all login methods should be reviewed to ensure MFA is enforced consistently, because if some methods do not require MFA, they can provide a simple bypass. It also calls out a classic black-box scenario: complete the username and password step, then force-browse or make direct API requests without completing the second factor and see whether access is granted anyway. (owasp.org)

PortSwigger’s logic-flaw material aligns with that view. It defines business logic vulnerabilities as design and implementation flaws that allow unintended behavior and says they usually arise from failure to anticipate unusual application states. That is a key phrase. A code-aware model can help reason about expected states. A black-box pentesting system has to discover unexpected ones from behavior. PortSwigger’s race-condition material makes the same point from another angle: race conditions are closely related to business logic flaws, and meaningful testing often involves hidden multi-step sequences, multi-endpoint races, and carefully aligned request timing. (portswigger.net)

That is why real black-box web pentesting is partly a state-reconstruction problem, partly an identity-and-authorization problem, partly an experimentation-and-evidence problem, and only sometimes a classic exploit-generation problem. A system can be brilliant at source-aware bug discovery and still be mediocre at live application reasoning. Those are adjacent skills, not identical ones.

Business logic vulnerabilities, MFA bypasses, and race conditions are the hard part

Business logic is where the “AI pentesting” label becomes most likely to mislead. If people hear the phrase and picture SQL injection payload generation, endpoint fuzzing, or code review, they are imagining the easier half of the market conversation. The harder half is whether the system can reason about state, not just syntax.

PortSwigger’s summary is useful: business logic vulnerabilities happen when attackers can manipulate legitimate functionality to achieve a malicious goal, often because the application fails to anticipate unusual states. OWASP’s workflow guidance says the same thing in more operational language: testers develop abuse and misuse cases to complete the business process while not completing the correct steps in the correct order. That is why business logic bugs often defeat scanners and naive agents. The system under test may be functioning exactly as coded and still be security-broken because the deployed workflow semantics are wrong. (portswigger.net)

MFA is a perfect example. An AI system that can read code may quickly discover that a boolean “mfa_complete” flag gates privileged routes. A black-box system has to infer where MFA is enforced, whether the enforcement is session-based or endpoint-based, whether password-first routes grant partial or full app access, whether mobile APIs enforce the same policy, whether federated logins are stronger or weaker, whether account recovery invalidates MFA, and whether disabling MFA requires re-authentication. OWASP’s guidance explicitly tells testers to enumerate all of those surfaces and then attempt to bypass them. That is not just “find a bug.” It is “understand the security model from outside.” (owasp.org)

Race conditions are even more revealing because they expose a different kind of weakness in automation. PortSwigger points out that race conditions are often tied to hidden multi-step sequences and multi-endpoint collisions. In practice, that means the tester needs to understand when two requests interact with the same state, when the “race window” opens, how to warm connections, and how to prove the effect is not a random transient. A model that can read the code may have a huge head start. A black-box agent has to infer the race from behavior, build the timing experiment, and gather repeatable evidence. That is much closer to the reality of offensive validation against live systems. (portswigger.net)

A practical black-box workflow therefore needs a state ledger, not just a prompt. It needs to track identities, preconditions, cookies, CSRF tokens, expected transitions, negative cases, and replay constraints. A useful minimal contract might look like this:

target:
  base_url: https://app.example.com
  scope: authenticated web application
identities:
  - name: anonymous
    session: none
  - name: normal_user
    session_source: fresh browser login
  - name: admin_user
    session_source: separate browser profile
critical_flows:
  - login
  - password_reset
  - mfa_enrollment
  - mfa_challenge
  - org_invite_acceptance
  - checkout
  - billing_change
evidence_requirements:
  save_request_response_pairs: true
  capture_negative_cases: true
  require_replay_on_fresh_session: true
  record_server_side_effects: true
safety_limits:
  max_requests_per_minute: 20
  destructive_actions_forbidden: true
  stop_on_unexpected_write: true
hypotheses:
  - password stage may grant access before MFA completion
  - invite acceptance may not bind the correct tenant
  - checkout flow may allow step skipping

The point of a structure like this is not elegance. It is discipline. Black-box testing fails all the time because people jump from “interesting response” to “critical finding” without proving state, scope, and repeatability.

OWASP’s MFA guidance also maps cleanly to a simple verification pattern. After completing the password step, the test is not over; it has barely started. The next question is whether the session already has more reach than it should.

# Step 1: complete username and password only
curl -i -c jar.txt \
  -H "Content-Type: application/json" \
  -d '{"username":"user@example.com","password":"REDACTED"}' \
  https://app.example.com/api/login

# Step 2: without completing MFA, try force-browsing a privileged page
curl -i -b jar.txt \
  https://app.example.com/app/dashboard

# Step 3: try a direct API request that should require completed MFA
curl -i -b jar.txt \
  https://app.example.com/api/account/profile

# Step 4: compare behavior after completing MFA in a separate clean session
curl -i -b post_mfa_jar.txt \
  https://app.example.com/api/account/profile

That pattern is conceptually simple and operationally important. It tests whether MFA is truly an authorization boundary or only a frontend waypoint. OWASP explicitly recommends this kind of force-browsing and direct API validation because partial authentication states are a common source of real-world bypasses. (owasp.org)

This is also the place where an evidence-first AI workflow becomes more useful than a “smart scanner.” The useful AI system is not the one that narrates a suspicion in fluent prose. It is the one that can preserve state, replay hypotheses, capture negative cases, compare identities, and package the result so a human can verify it without trusting the model’s storytelling. That distinction is exactly what gets lost when source-aware exploit research and black-box pentesting are collapsed into the same bucket.

Recent web CVEs show what black box reality looks like

The fastest way to see the difference between exploit research and black-box web reality is to look at the kinds of web CVEs that repeatedly become urgent in real operations. They are often not exotic. They are reachable, internet-facing, and business-critical.

JetBrains TeamCity is a good example. JetBrains said in March 2024 that two critical vulnerabilities in TeamCity On-Premises could allow an unauthenticated attacker with HTTP(S) access to bypass authentication checks and gain administrative control, and that all TeamCity On-Premises versions were affected until the fix in 2023.11.4. NVD summarizes CVE-2024-27198 as authentication bypass in TeamCity before 2023.11.4 allowing admin actions, and notes the vulnerability is in CISA’s Known Exploited Vulnerabilities catalog. This is exactly the kind of issue that matters to black-box testing because the initial question is not “can I derive a complex exploit chain from source?” It is “from the outside, can I reach privileged functionality I should not reach?” (The JetBrains Blog)

ConnectWise ScreenConnect tells a similar story. NVD describes CVE-2024-1709 as an authentication bypass using an alternate path or channel affecting ScreenConnect 23.9.7 and earlier. ConnectWise’s own bulletin told on-premises customers to immediately update to 23.9.8 or higher and said the company had added a mitigation step that suspends outdated on-prem instances until they are updated. That is almost a textbook example of OWASP’s warning about weaker authentication in alternative channels. The lesson is not merely “auth bypasses exist.” The lesson is that external management surfaces, auxiliary paths, and policy inconsistency are live, repeatable, internet-grade failure modes. (ConnectWise)

MOVEit Transfer is another useful case because it shows how quickly reachable web flaws turn into operational crises. NVD says CVE-2023-34362 is a SQL injection vulnerability in MOVEit Transfer that could allow an unauthenticated attacker to gain access to the database on affected versions. CISA and the FBI publicly said CL0P exploited the zero-day. Again, the key point for this discussion is not that SQL injection is novel. It is that internet-facing application weaknesses with immediate black-box reachability can become mass-impact incidents very quickly. (nvd.nist.gov)

A compact way to compare these examples is the table below:

CVE | Publicly documented weakness | Why it is a black-box-relevant case
CVE-2024-27198 | TeamCity auth bypass enabling admin control over HTTP(S) on on-prem servers | Demonstrates that external reachability and authentication semantics are often the whole story
CVE-2024-1709 | ScreenConnect auth bypass via alternate path or channel | Maps directly to alternate-channel and inconsistent-auth testing rather than source-only reasoning
CVE-2023-34362 | MOVEit Transfer unauthenticated SQL injection with real-world exploitation | Shows how internet-facing application bugs can turn into broad operational incidents quickly

The facts in this table are drawn from vendor and NVD records, plus the CISA-FBI public notice for MOVEit exploitation. (The JetBrains Blog)

What ties these together is not a particular bug class. It is the mode of security failure. These are cases where external behavior, authentication semantics, and reachable management or transfer surfaces mattered immediately. A black-box-capable AI pentesting system should be able to reason about exactly those conditions: entry points, identities, alternative channels, sequence checks, exposure, and verifiable impact.

A benchmark that would actually prove black box pentesting

A benchmark for real black-box AI pentesting

If a lab wanted to make a strong public claim that Claude Mythos Preview or any other frontier model had crossed into real black-box web pentesting, the benchmark would have to look very different from the one Anthropic publicly described.

First, there should be no source code, no reconstructed source, and no privileged architectural hints. The system should start from what a real external tester starts from: a target URL, allowed identities, scope constraints, request budgets, and perhaps a browser or HTTP runtime. Second, it should have to discover attack surface itself: pages, scripts, routes, APIs, state transitions, auxiliary channels, and role differences. Third, it should be measured on verified findings, not speculative ones. The output that matters is not a plausible theory. The output that matters is a replayable finding with evidence, negative controls, and a clean explanation of preconditions. (Red Anthropic)

Fourth, the benchmark should make business logic first-class rather than incidental. Anthropic’s public report treats web logic issues as a real category but does not publish detailed case studies yet. A black-box benchmark should do the opposite: force the system to deal with the categories OWASP and PortSwigger emphasize, including workflow circumvention, MFA enforcement, alternative channels, race conditions, request forgery, authorization drift, and other multi-step stateful behaviors. These are precisely the areas where source visibility gives the most unfair advantage and where deployed behavior matters most. (Red Anthropic)

Fifth, the scoring should punish false confidence. Many AI security systems look strong when judged by “interesting ideas generated.” Buyers need a much stricter metric set: verified high-severity findings per unit budget, false-positive rate, replay success on fresh sessions, percentage of flows mapped correctly, percentage of role boundaries tested, percentage of evidence packets accepted by an independent human reviewer, and success in retesting after remediation. Those are pentesting metrics. Everything else is partial assistance.

A useful public benchmark spec could be summarized like this:

| Dimension | Anthropic public method | A stronger black-box benchmark |
| --- | --- | --- |
| Starting context | Project plus source code, or reconstructed source plus binary | URL, browser, low-priv accounts, and scope only |
| Environment | Isolated offline container or offline binary analysis | Live application or high-fidelity deployment with realistic controls |
| Primary task | Find and exploit serious vulnerabilities | Discover attack surface, infer security model, prove externally visible impact |
| Hardest categories | Memory safety, exploit chains, N-day weaponization | MFA bypass, workflow skipping, authorization drift, race conditions, multi-channel auth |
| Success criterion | Bug report and exploit or PoC | Low-false-positive, replayable finding with stateful evidence and retest path |

This table synthesizes Anthropic’s published method and what OWASP and PortSwigger describe as real web testing requirements. (Red Anthropic)
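Among the "hardest categories" in the right-hand column, race conditions are the easiest to make concrete. The sketch below is purely illustrative: it models a race-prone one-use coupon redemption in-process (no live target, no real HTTP), and shows the probe pattern a black-box system would need, which is to synchronize identical requests and count successes, where more than one success means the one-use invariant broke.

```python
import threading
import time

# Illustrative in-process model of a race-prone endpoint: a one-use coupon
# redeemed with a non-atomic check-then-set. No live target is involved.
class CouponStore:
    def __init__(self):
        self.redeemed = False

    def redeem(self):
        if not self.redeemed:
            time.sleep(0.05)       # widen the race window for demonstration
            self.redeemed = True
            return True
        return False

def race_probe(endpoint, parallelism=20):
    """Fire identical requests in lockstep; >1 success flags a race."""
    results = []
    barrier = threading.Barrier(parallelism)

    def worker():
        barrier.wait()             # release all requests at the same instant
        results.append(endpoint())

    threads = [threading.Thread(target=worker) for _ in range(parallelism)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results)

successes = race_probe(CouponStore().redeem)
print("successful redemptions:", successes)  # >1 means the one-use rule broke
```

Source visibility makes this bug trivial to spot in the `redeem` body; discovering it from the outside requires exactly the lockstep probing shown above, which is why the category belongs in a black-box benchmark.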

An evidence-first execution contract for such a benchmark might look like this:

{
  "target": "https://tenant.example.app",
  "identities": ["anonymous", "basic_user", "manager_user"],
  "allowed_actions": ["read", "create_test_records", "update_test_records"],
  "forbidden_actions": ["delete_production_data", "mass_email", "payment_capture"],
  "required_coverage": [
    "login",
    "password_reset",
    "mfa",
    "org_switching",
    "invite_acceptance",
    "billing",
    "admin_panel",
    "mobile_api"
  ],
  "required_evidence": [
    "request_response_pairs",
    "fresh_session_replay",
    "negative_control",
    "impact_statement",
    "remediation_retest"
  ],
  "budget": {
    "max_runtime_minutes": 180,
    "max_requests": 1500,
    "max_parallel_flows": 3
  }
}

That kind of contract forces the system to do what real offensive validation requires: preserve state, respect scope, test across identities, and produce evidence another person can audit. It also makes it much harder to hide behind a mountain of clever but unverified suspicions.
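A validator for that contract is straightforward to sketch. Everything below is hypothetical and shaped only by the JSON above: the packet fields and the validator itself are illustrative, not any benchmark's real harness.

```python
import json

# Minimal sketch assuming the contract format shown above; validator and
# packet field names are hypothetical.
CONTRACT = json.loads("""{
  "required_evidence": ["request_response_pairs", "fresh_session_replay",
                        "negative_control", "impact_statement",
                        "remediation_retest"],
  "budget": {"max_requests": 1500}
}""")

def validate_packet(packet, contract):
    """Reject findings with incomplete evidence or a blown request budget."""
    errors = []
    missing = [e for e in contract["required_evidence"]
               if e not in packet["evidence"]]
    if missing:
        errors.append(f"missing evidence: {missing}")
    if packet["requests_used"] > contract["budget"]["max_requests"]:
        errors.append("request budget exceeded")
    return errors

packet = {
    "evidence": {
        "request_response_pairs": ["GET /billing/export -> 200 as basic_user"],
        "fresh_session_replay": True,
        "negative_control": "same request as anonymous returns 401",
        "impact_statement": "manager-only export readable by basic_user",
        "remediation_retest": "pending",
    },
    "requests_used": 420,
}
print(validate_packet(packet, CONTRACT))  # [] -> packet is admissible
```

A finding that cannot pass even this trivial gate is a suspicion, not a result, which is the discipline the contract is designed to enforce.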


How defenders should use Claude Mythos Preview today

Pointing out that Claude Mythos Preview has not publicly proven black-box web pentesting does not make the Anthropic report less important. If anything, it makes the report more useful because it clarifies where the signal actually is. The real signal is that AI exploit research and vulnerability triage are improving fast enough to affect patch windows, disclosure handling, and defensive staffing assumptions.

Anthropic itself makes this case directly. In the “Suggestions for defenders today” section, it says Mythos Preview will not be generally available, but defenders can still use generally available frontier models to strengthen defenses now. It explicitly recommends thinking beyond vulnerability finding into triage, deduplication, reproduction steps, initial patch proposals, cloud misconfiguration analysis, pull request review, and migration work. It also says organizations should shorten patch cycles because the process of turning public identifiers such as a CVE and a commit hash into a working exploit is becoming faster, cheaper, and more automated. (Red Anthropic)
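The deduplication step in that list is the easiest to mechanize. The following is a hypothetical sketch of the grouping logic; the report fields and normalization rule are illustrative, not part of any vendor's pipeline.

```python
from collections import defaultdict

# Hypothetical deduplication step for the triage workflow described above;
# the report fields are illustrative.
def dedupe(reports):
    """Group raw reports by (vulnerability class, normalized route)."""
    groups = defaultdict(list)
    for r in reports:
        key = (r["class"], r["route"].rstrip("/").lower())
        groups[key].append(r)
    return groups

reports = [
    {"class": "idor", "route": "/api/orders/", "source": "scanner"},
    {"class": "idor", "route": "/API/orders",  "source": "bug_bounty"},
    {"class": "xss",  "route": "/search",      "source": "scanner"},
]
groups = dedupe(reports)
print({k: len(v) for k, v in groups.items()})
# {('idor', '/api/orders'): 2, ('xss', '/search'): 1}
```

Even this crude normalization collapses the two IDOR reports into one work item, which is the kind of low-glamour leverage Anthropic is pointing defenders toward.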

Anthropic’s public write-up also says defenders should tighten patch enforcement windows, enable auto-update where possible, treat dependency bumps carrying CVE fixes as urgent, revisit vulnerability disclosure policies, and automate incident response because more vulnerability disclosures will likely mean more attacker attempts during the disclosure-to-patch window. That is an operational message, not a benchmark message. It is also the part security leaders should probably take most seriously. (Red Anthropic)
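The shortened-patch-window recommendation can also be mechanized. The toy check below is an assumption-laden sketch: the dependency records and the seven-day window are illustrative choices, not Anthropic guidance.

```python
from datetime import date, timedelta

# Illustrative patch-window check; the records and the seven-day window
# are assumptions, not Anthropic guidance.
PATCH_WINDOW = timedelta(days=7)   # tightened on the N-day acceleration argument

def overdue_fixes(deps, today):
    """Name dependencies whose CVE fix shipped longer ago than the window."""
    return [d["name"] for d in deps
            if d.get("cve_fix_released")
            and today - d["cve_fix_released"] > PATCH_WINDOW]

deps = [
    {"name": "libfoo", "cve_fix_released": date(2026, 1, 2)},   # 21 days stale
    {"name": "libbar", "cve_fix_released": date(2026, 1, 20)},  # inside window
    {"name": "libbaz", "cve_fix_released": None},               # no pending fix
]
print(overdue_fixes(deps, today=date(2026, 1, 23)))  # ['libfoo']
```

The window itself is a policy decision; the argument in the report is only that whatever window you have today is probably too long.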

So the fair conclusion is not “ignore Mythos until it proves black-box web pentesting.” The fair conclusion is “use Mythos as a warning about exploit research acceleration, but do not mistake that warning for a settled answer to the black-box application-testing question.” Those are compatible views. In fact, holding both at once is probably the most mature reading of the report.

White-box reasoning still needs black-box proof

The most practical takeaway from the Mythos debate is that security teams should stop treating white-box reasoning and black-box validation as substitute products. They are different stages in a stronger workflow.

That split already appears in public technical writing around AI pentesting. Penligent’s public “From White-Box Findings to Black-Box Proof” piece explicitly describes a workflow in which code-aware reasoning and patch direction live on one side, while black-box confirmation of reachability and impact lives on the other. That is the right operational instinct even if you swap in different tools. Source-aware AI can narrow the search. Target-facing validation decides whether the issue is real, reachable, and worth escalating. (Penligent)

Penligent’s public AI pentest tool write-up makes a closely related point in different words: a system that explains nmap output nicely is not automatically doing pentesting; the hard middle of the job is attack paths, exploit validation, and defensible evidence. That framing is more useful than most marketing categories because it forces the buyer to ask the right question. Not “does the model sound smart?” but “can it turn a hypothesis into proof without losing discipline?” The public Penligent site similarly frames the product as an AI-powered pentesting platform, which at minimum is a much closer mental model for this debate than equating repository analysis with external validation. (Penligent)

This is where Claude Mythos Preview is at its most exciting and easiest to misuse as a talking point. If you plug a model like Mythos into a real offensive workflow, the highest leverage may come not from pretending it solved the entire engagement loop, but from letting it compress the expensive early stages of exploit research and hypothesis generation while a separate target-facing system proves what actually matters in the deployed environment. That is a much narrower claim than “AI pentesting is solved,” but it is also much closer to what serious teams can operate today.
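That two-stage split can be sketched as a pipeline. Both stages below are stand-ins, `whitebox_hypotheses` for a source-aware model pass and the `probe` argument for target-facing validation; none of this is any vendor's actual API.

```python
# Stand-ins for the two stages: whitebox_hypotheses represents a source-aware
# model pass, and the probe argument represents target-facing validation.
def whitebox_hypotheses(repo_summary):
    # Code-aware reasoning should emit ranked, falsifiable claims.
    return [
        {"claim": "export endpoint skips org membership check",
         "route": "/api/export"},
        {"claim": "password reset token is reusable",
         "route": "/password_reset"},
    ]

def triage(repo_summary, probe):
    """Escalate only hypotheses the target-facing probe confirms."""
    confirmed, unproven = [], []
    for h in whitebox_hypotheses(repo_summary):
        (confirmed if probe(h["route"]) else unproven).append(h)
    return confirmed, unproven

# Toy probe: pretend only the export route reproduces against the target.
confirmed, unproven = triage("repo summary", probe=lambda r: r == "/api/export")
print([h["route"] for h in confirmed])  # ['/api/export']
```

The design choice worth noticing is that the unproven list is never silently promoted: a hypothesis that fails external confirmation stays a hypothesis, no matter how plausible the source-level reasoning was.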

Claude Mythos Preview is a milestone, not yet a black-box verdict

Claude Mythos Preview appears to mark a meaningful jump in AI exploit research. Anthropic’s public record supports that reading. The company has published a methodology for source-visible vulnerability finding, described offline reverse-engineering-assisted work on closed-source binaries, documented autonomous N-day exploitation success, and tied the whole thing to a defensive program rather than a broad public release. Those are not trivial signals. (Red Anthropic)

But the strongest public proof still clusters around high-information environments. Anthropic’s own scaffold runs the project and its source code in an isolated container. Its closed-source section feeds the model reconstructed source plus the original binary and runs offline. Its web logic section lists important categories of bugs but does not yet provide detailed public case studies. That is enough to prove something important. It is not enough to prove everything people want the “AI pentesting” label to mean. (Red Anthropic)

So the cleanest conclusion is also the least fashionable one. Claude Mythos Preview proves that AI exploit research is getting very serious. It proves that defender patch windows and response pipelines should be rethought now, not later. It does not yet publicly prove that frontier models can perform real black-box web pentesting against live internet-facing applications at the level buyers should assume from the phrase. Until a model is publicly shown handling stateful workflows, MFA boundaries, alternative channels, race conditions, authorization edges, and evidence-first replay from the outside in, the honest label remains narrower than the hype. (Red Anthropic)

Further reading and references

  • Anthropic, Claude Mythos Preview and the public cybersecurity capability report. (Red Anthropic)
  • Anthropic, Project Glasswing announcement. (anthropic.com)
  • Anthropic docs, Models overview, including the invitation-only note for Mythos Preview. (Claude API Docs)
  • OWASP, Web Security Testing Guide. (owasp.org)
  • OWASP, Business Logic Testing and Testing for the Circumvention of Work Flows. (owasp.org)
  • OWASP, Testing Multi-Factor Authentication. (owasp.org)
  • PortSwigger, Business logic vulnerabilities and Race conditions. (portswigger.net)
  • JetBrains, TeamCity 2023.11.4 security update and NVD record for CVE-2024-27198. (The JetBrains Blog)
  • ConnectWise, ScreenConnect 23.9.8 security fix and NVD record for CVE-2024-1709. (ConnectWise)
  • NVD record for CVE-2023-34362 and the CISA-FBI public notice on MOVEit exploitation. (nvd.nist.gov)
  • FreeBSD advisory and NVD entry for CVE-2026-4747. (The FreeBSD Project)
  • Penligent, Claude Code Security and Penligent, From White-Box Findings to Black-Box Proof. (Penligent)
  • Penligent, AI Pentest Tool, What Real Automated Offense Looks Like in 2026. (Penligent)
  • Penligent homepage. (Penligent)
