The One-Line Truth

XBOW deploys thousands of parallel AI agents to find and exploit real vulnerabilities in web applications continuously, validating every finding with reproducible proof-of-concept exploits before surfacing it to a security team.


The Role: CISO / Head of Application Security / VP of Engineering
Founded: January 2024 | HQ: Seattle, WA (distributed globally; founder based in Malta) | Funding: $270M+ ($155M Series C led by DFJ Growth and Northzone, plus earlier rounds from Sequoia Capital, Altimeter, NFDG Ventures)
Founder: Oege de Moor (CEO, creator of GitHub Copilot and GitHub Advanced Security; DPhil Oxford, formerly Professor of Computer Science at Oxford; founded Semmle, acquired by GitHub in 2019)
CISO: Nico Waisman (formerly CISO at Lyft; assembled the offensive security team that trained XBOW's autonomous system)


The Disruption Connection

In December, The Heed Report mapped how AI-accelerated software development was compressing the gap between code shipped and code tested. The same generative tools increasing development velocity were also expanding the attack surface faster than human security teams could audit it. XBOW is the offensive response.

The problem is structural. Developers using AI coding assistants now ship code at a pace that annual or quarterly penetration tests cannot govern. Every release introduces new endpoints, new logic, new potential entry points. Traditional pentesting was built for a world of infrequent releases. That world ended when AI started writing the code.


The Problem It Kills

Manual penetration testing is one of the most expensive and least scalable practices in enterprise security. A traditional engagement can cost $10,000 to $35,000 or more, takes weeks to schedule, and delivers a point-in-time snapshot that begins aging the moment the report is written. The average enterprise ships code daily. The average pentest happens annually. That gap is where attackers live.

The human talent shortage compounds the problem. Offensive security researchers capable of finding complex, chained exploits are among the scarcest professionals in technology. The result is a market where organizations either pay top dollar for infrequent manual tests or rely on dynamic application security testing (DAST) scanners that generate noise without confirming whether a vulnerability is actually exploitable.

XBOW compresses what a senior pentester does in weeks into hours. Pricing starts at $4,000 per test with results delivered within five business days. For organizations running continuous assessments, the economics shift from "can we afford to test" to "can we afford not to."


Who This Is For / Who Should Skip It

Build with this if: You run application security for an organization shipping code frequently, you need pentest reports that satisfy SOC 2 or ISO 27001 compliance requirements, you want continuous offensive validation rather than annual point-in-time snapshots, or your security team lacks the headcount to run regular red team exercises against a growing application portfolio.

Skip this if: Your primary security concern is network infrastructure and Active Directory rather than web applications (Horizon3.ai's NodeZero is stronger in that domain), you need deep business-logic testing that requires intimate knowledge of organization-specific access rules, or you are looking for mobile or cloud infrastructure testing (XBOW's current scope is web-application-focused, with standalone API, mobile, and cloud expansion planned for late 2026).


How It Actually Works

Minute 1. You submit a target URL through the XBOW console or API. You set boundaries, authentication credentials, and optional context to guide testing. The interface is sparse and technical, built for security professionals rather than general IT staff. Setup is fast if you know what you are scoping.
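The scoping step can be pictured as a small data structure plus a boundary check. The sketch below is illustrative only: `Scope`, its fields, and `in_scope` are hypothetical names chosen for this example, not XBOW's console or API schema, which is not documented here.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class Scope:
    """Hypothetical scope definition; field names are assumptions, not XBOW's schema."""
    target: str                                          # base URL submitted for testing
    excluded_paths: list = field(default_factory=list)   # path prefixes that are off limits
    context: str = ""                                    # optional free-text guidance

    def in_scope(self, url: str) -> bool:
        """True if url is on the target origin and not under an excluded prefix."""
        base, candidate = urlparse(self.target), urlparse(url)
        if (candidate.scheme, candidate.netloc) != (base.scheme, base.netloc):
            return False
        path = candidate.path or "/"
        if not path.startswith(base.path or "/"):
            return False
        # Simple prefix match: anything under an excluded prefix is out of scope.
        return not any(path.startswith(prefix) for prefix in self.excluded_paths)


scope = Scope(
    target="https://app.example.com",
    excluded_paths=["/admin/billing"],
    context="Staging tenant; test accounts only.",
)
print(scope.in_scope("https://app.example.com/api/users"))
print(scope.in_scope("https://app.example.com/admin/billing/refund"))
print(scope.in_scope("https://other.example.com/"))
```

Writing exclusions down as explicit prefixes before launch is the point of the exercise: the boundary is decided by the person scoping the test, not inferred by the tool.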

First Hour. XBOW's Coordinator begins mapping the application, identifying entry points, and planning attack paths. It deploys thousands of parallel agents, each with fresh context and a focused objective. Unlike a monolithic scanner, these agents are short-lived and retired after completing their mission. This prevents context collapse, a common failure mode where an AI accumulates incorrect assumptions over a long-running task. The system handles authentication flows autonomously, re-authenticating if sessions terminate, and navigates SSO, MFA, and token-based access without manual configuration.
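The re-authentication behavior described above follows a familiar pattern: detect that the session has expired, log in again, and retry the request. A minimal sketch of that pattern, assuming a 401 status signals an expired session; `AuthenticatedClient` and `make_fake_app` are hypothetical stand-ins written for this example and say nothing about how XBOW implements it.

```python
class AuthenticatedClient:
    """Wraps a login function and a send function; re-logs in once on a 401."""

    def __init__(self, login, send):
        self._login = login    # callable returning a fresh token
        self._send = send      # callable(token, path) -> (status, body)
        self._token = None
        self.logins = 0        # how many times we had to authenticate

    def _authenticate(self):
        self._token = self._login()
        self.logins += 1

    def get(self, path):
        if self._token is None:
            self._authenticate()
        status, body = self._send(self._token, path)
        if status == 401:      # session expired: re-authenticate and retry once
            self._authenticate()
            status, body = self._send(self._token, path)
        return status, body


def make_fake_app():
    """Toy in-memory application so the sketch runs without a network."""
    valid_tokens = set()
    issued = {"count": 0}

    def login():
        issued["count"] += 1
        token = f"token-{issued['count']}"
        valid_tokens.clear()
        valid_tokens.add(token)
        return token

    def send(token, path):
        return (200, f"ok {path}") if token in valid_tokens else (401, "")

    def expire_sessions():
        valid_tokens.clear()

    return login, send, expire_sessions


login, send, expire_sessions = make_fake_app()
client = AuthenticatedClient(login, send)
print(client.get("/dashboard"))
expire_sessions()                    # simulate the server ending the session
print(client.get("/settings"))       # succeeds after one silent re-login
print(client.logins)
```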

First Week. Findings arrive validated. Each vulnerability report includes a proof-of-concept exploit that was safely executed against the target, step-by-step reproduction instructions, and remediation guidance. No theoretical risk scores. No noise. The Validator layer applies deterministic logic to confirm every finding before it reaches your team. XBOW's reports meet penetration testing requirements for SOC 2, ISO 27001, and other compliance frameworks. For teams integrating into Microsoft environments, the XBOW Pentest Manager Agent plugs directly into Microsoft Security Copilot and Sentinel, surfacing offensive insights within existing defensive workflows.


The Features That Matter

The Coordinator-Agent-Validator Architecture. The Coordinator maintains a global view of the target environment and directs testing. Autonomous agents explore in parallel. Validators confirm exploitability using deterministic logic. This three-layer separation means AI does the creative discovery, but proof, not probability, determines what gets reported.
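One way to picture the three-layer separation is a pipeline in which agents may over-report, but only candidates that pass a deterministic check are returned. The sketch below is a toy under stated assumptions: `TOY_APP`, `Candidate`, `agent`, `validator`, and `coordinator` are hypothetical names, and the string check stands in for whatever proof logic the real system applies.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

PROBE = "<script>probe-1</script>"

# Toy responses: one endpoint reflects the probe unescaped, the other escapes it.
TOY_APP = {
    "/search": "results for <script>probe-1</script>",
    "/profile": "hello &lt;script&gt;probe-1&lt;/script&gt;",
}


@dataclass(frozen=True)
class Candidate:
    endpoint: str
    claim: str


def agent(endpoint):
    """Stand-in for creative discovery: deliberately flags every endpoint."""
    return [Candidate(endpoint, "reflected XSS")]


def validator(candidate):
    """Deterministic proof: the probe must come back unescaped."""
    return PROBE in TOY_APP[candidate.endpoint]


def coordinator(entry_points, agent, validator, max_workers=8):
    """Fan out one agent per entry point, then report only validated candidates."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        batches = list(pool.map(agent, entry_points))   # order follows entry_points
    candidates = [c for batch in batches for c in batch]
    return [c for c in candidates if validator(c)]


print(coordinator(["/search", "/profile"], agent, validator))
```

Both endpoints are flagged by the agents, but only `/search` survives validation, which is the sense in which proof rather than probability decides what gets reported.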

Exploit Chaining. XBOW does not just find individual vulnerabilities. It chains them. In one documented case, the system chained a low-severity blind server-side request forgery into a complete arbitrary file read capability by crafting malicious image files and exploiting specific parsing behaviors across multiple steps. This is the kind of finding that separates autonomous pentesting from automated scanning.

Agent Retirement. Each agent is purpose-built for a specific task and retired after completion. Fresh agents start with clean context. This design choice prevents the "hallucination drift" that plagues long-running LLM sessions and is a direct technical response to the known failure modes of generative AI in high-stakes environments.
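The difference between a long-lived agent and retired agents comes down to how much accumulated context each task inherits. A toy comparison, assuming context can be modeled as a growing list of notes; the `Agent` class and both runner functions are hypothetical and exist only to illustrate the idea.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    objective: str
    context: list = field(default_factory=list)   # notes the agent carries forward

    def run(self, objective=None):
        task = objective or self.objective
        self.context.append(f"observations about {task}")
        return len(self.context)   # context size when this task finished


def run_long_lived(objectives):
    """One agent handles every objective, so its context keeps growing."""
    agent = Agent("everything")
    return [agent.run(objective) for objective in objectives]


def run_with_retirement(objectives):
    """A fresh agent per objective; each one is discarded after its task."""
    sizes = []
    for objective in objectives:
        agent = Agent(objective)   # clean context every time
        sizes.append(agent.run())
    return sizes


tasks = ["login form", "file upload", "export API"]
print(run_long_lived(tasks))        # context grows with each task
print(run_with_retirement(tasks))   # context stays the same size
```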

Business Logic Detection. XBOW identifies vulnerabilities that DAST scanners typically miss, including Insecure Direct Object References (IDOR) and Broken Object Level Authorization (BOLA). The system understands multi-tenant boundaries and can distinguish between the permissions and contexts of different user roles.
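A BOLA or IDOR check reduces to a simple question: can one user read an object that belongs to another? The sketch below shows that check against a toy in-memory handler. All names (`ORDERS`, `get_order_vulnerable`, `get_order_fixed`, `find_bola`) are hypothetical; a real test would issue authenticated HTTP requests as each user.

```python
ORDERS = {101: "alice", 102: "bob"}   # object id -> owning user


def get_order_vulnerable(user, order_id):
    """Returns 200 for any existing order: no ownership check."""
    return 200 if order_id in ORDERS else 404


def get_order_fixed(user, order_id):
    """Returns 200 only to the owner, 403 to everyone else."""
    if order_id not in ORDERS:
        return 404
    return 200 if ORDERS[order_id] == user else 403


def find_bola(handler, owners):
    """List (user, object_id) pairs where a user can read an object they do not own."""
    users = sorted(set(owners.values()))
    return [
        (user, object_id)
        for object_id, owner in owners.items()
        for user in users
        if user != owner and handler(user, object_id) == 200
    ]


print(find_bola(get_order_vulnerable, ORDERS))
print(find_bola(get_order_fixed, ORDERS))
```

A payload-list scanner sees two successful 200 responses in the vulnerable case and nothing unusual; the flaw only appears once the test tracks which user owns which object.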

Microsoft Security Ecosystem Integration. Announced at RSAC 2026, the XBOW integration with Microsoft Security Copilot and Microsoft Sentinel embeds autonomous pentesting directly into the tools enterprise security teams already use. Shawn Bice, then Corporate Vice President of Security Platform and AI at Microsoft, noted at the time that the integration connects offensive insights directly into existing defensive workflows.

Compliance-Ready Reporting. Every finding includes reproducible evidence. Reports satisfy SOC 2, ISO 27001, and PCI DSS requirements. The emphasis on proof-based validation rather than probabilistic risk scoring means security teams spend time on remediation, not triage.


Real Cost

Penetration testing starts at $4,000 per test for web application assessments with supported API coverage. Reports are delivered within five business days. Standalone API and mobile testing are planned for late 2026 but are not yet available.

Enterprise pricing for continuous assessments is not publicly listed; it is available from the sales team on request.

For context, traditional manual penetration testing engagements typically run $18,000 to $30,000+ per system depending on scope and depth. XBOW's entry-level pricing represents a significant reduction, though the comparison is imperfect: manual testers bring contextual business logic understanding that autonomous systems currently struggle to match, and many compliance frameworks still require human attestation alongside automated findings.

The honest cost picture: XBOW does not replace manual pentesting entirely. It replaces the periodic, expensive, human-bottlenecked cadence with continuous autonomous coverage, and reserves human expertise for the highest-complexity assessments where business logic and contextual judgment matter most.


What Customers Say

SentinelOne uses XBOW as both a customer and an investor through S Ventures. Alex Krongold, Director of Corporate Development and Ventures at SentinelOne, describes the operational impact directly: the platform surfaces exploitable findings at machine speed, functioning as a scalable extension of internal red team operations. SentinelOne views XBOW as a critical input for hardening its defensive telemetry.

Samsung Ventures participates as both a strategic investor and a customer. A Samsung Ventures America representative stated that the platform surfaces real-world risks with speed and precision. Samsung now serves as a preferred reseller for XBOW in the South Korean market.

Moderna, the biotechnology company, is publicly cited as a customer. In life sciences, where the security of intellectual property and sensitive data is paramount, XBOW's non-destructive validation approach allows testing of critical applications without risk of downtime or data corruption.

Seznam.cz, the Czech Republic's leading web portal and search engine, is another prominent customer deploying XBOW for continuous validation against a high-traffic, rapidly changing attack surface.

The customer pattern is consistent: organizations with large application portfolios and frequent release cycles adopt XBOW to close the gap between deployment speed and security coverage. No independent customer reviews on platforms like G2 or Gartner Peer Insights were available at the time of this writing.


The Competitive Read

Pentera is the most mature competitor in automated security validation, but its focus is infrastructure-level exposure rather than application-layer exploitation. Organizations whose primary risk surface is network infrastructure and Active Directory may find Pentera stronger. XBOW typically demonstrates deeper application-layer coverage and more sophisticated exploit chaining.

Horizon3.ai (NodeZero) is a well-established autonomous network pentesting platform with over 170,000 tests completed. It excels at credential-based attacks, lateral movement, and Active Directory audits. Where Horizon3 targets the network layer, XBOW targets the application layer. They are more complementary than competitive for organizations with both concerns.

Escape competes directly in the agentic web and API security category, with particular strength among engineering-led organizations that want findings delivered directly into Jira and CI/CD pipelines. For teams prioritizing developer workflow integration over depth of exploitation, Escape is worth evaluating.

Cobalt and Synack represent the traditional crowdsourced and managed pentesting models. Their strength is the human signature on high-stakes compliance audits. Their limitations are cost ($18,000+ per system) and scheduling constraints that make continuous testing impractical.

XBOW's positioning is clearest against DAST scanners. Traditional DAST relies on static payload lists without understanding application logic, frequently gets logged out of authenticated sessions, and produces high volumes of unvalidated alerts. XBOW's adaptive, AI-driven approach handles authentication autonomously and validates every finding with proof-of-concept exploits before surfacing results.


The Honest Verdict

Excellent for: Enterprise application security teams that need continuous offensive validation at machine speed, organizations subject to PCI DSS 4.0 or DORA requirements mandating continuous technical controls testing, and any team whose application release cadence has outpaced its pentest schedule.

Breaks at: Complex, organization-specific business logic where the system cannot distinguish whether a particular user should access a particular resource under rules that exist only in human understanding. The AI excels at technical exploitation but struggles with the contextual judgment a senior human tester brings to scenarios governed by nuanced access policies. Additionally, XBOW's current scope is web-application-focused. Organizations needing network, mobile, or cloud infrastructure testing will need complementary tools until XBOW's expansion roadmap delivers.

Trajectory: The Microsoft Security Copilot and Sentinel integration, moving from RSAC 2026 public preview toward general availability, positions XBOW to become a native feature within the security stack of thousands of Microsoft enterprise customers. The $35M Series C extension from Accenture Ventures, DNX Ventures, Liberty Global Tech Ventures, NVentures (NVIDIA), Samsung Ventures, and SentinelOne S Ventures in May 2026 funds Asia-Pacific expansion through Samsung's reseller channel and DNX Ventures' regional network. WonLae Lee's appointment as General Manager for South Korea signals the geographic priority. The planned expansion into standalone API testing, mobile application testing, and cloud infrastructure assessment in late 2026 would broaden the competitive surface against Pentera and Horizon3.ai. The company has grown to 250+ employees and continues to scale across engineering, go-to-market, and operations. If the Microsoft integration reaches general availability and the testing scope expands as planned, XBOW is positioned to define autonomous offensive security as a category rather than a feature.


Set It Up with AI

Prompt 1: Scope Definition
"I need to prepare a penetration testing scope document for an autonomous AI pentesting tool. My web application is [describe: SaaS platform / e-commerce / internal tool / API-first]. List the target URLs, authentication methods (SSO, OAuth, API keys, session tokens), sensitive endpoints I should explicitly include or exclude, and any rate-limiting or production-safety constraints I should configure before launching an autonomous assessment."

Prompt 2: Findings Triage Framework
"I have received a penetration testing report containing [X] validated findings across critical, high, medium, and low severity levels. Help me build a remediation prioritization matrix that weighs: exploitability (proof-of-concept provided vs. theoretical), blast radius (number of users or data records affected), regulatory exposure (PCI DSS, SOC 2, HIPAA, DORA implications), and engineering effort to fix. Output a ranked list with estimated remediation timelines."
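For teams that prefer to see the matrix as logic rather than prose, the four factors in this prompt can be combined into a single score. The weights and thresholds below are arbitrary assumptions chosen for illustration, and `Finding`, `priority`, and `rank` are hypothetical names; tune all of them to your own risk model.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    title: str
    has_poc: bool            # proof-of-concept provided vs. theoretical
    records_affected: int    # blast radius
    regulated: bool          # touches PCI DSS, SOC 2, HIPAA, or DORA scope
    effort_days: int         # estimated engineering effort to fix


def priority(f):
    """Higher is more urgent. Weights are illustrative assumptions."""
    exploitability = 3 if f.has_poc else 1
    if f.records_affected >= 10_000:
        blast_radius = 3
    elif f.records_affected >= 100:
        blast_radius = 2
    else:
        blast_radius = 1
    regulatory = 2 if f.regulated else 1
    # Dividing by effort favors fixes that remove more risk per day of work.
    return exploitability * blast_radius * regulatory / max(f.effort_days, 1)


def rank(findings):
    return sorted(findings, key=priority, reverse=True)


findings = [
    Finding("Verbose error page", False, 500, False, 1),
    Finding("IDOR on invoices", True, 50_000, True, 2),
    Finding("Stored XSS in admin notes", True, 50, False, 5),
]
for f in rank(findings):
    print(f"{priority(f):.1f}  {f.title}")
```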

Prompt 3: Compliance Mapping
"Map the following penetration testing findings to the specific controls they satisfy across SOC 2 Type II, ISO 27001 Annex A, PCI DSS 4.0 Requirement 11, and DORA Article 26 (Threat-Led Penetration Testing). For each finding, indicate whether the proof-of-concept evidence in the report meets the documentation standard required by each framework."

Prompt 4: Continuous Testing Architecture
"Design a continuous penetration testing workflow for an engineering team shipping [daily/weekly] releases. Include: trigger points in the CI/CD pipeline where autonomous testing should run, criteria for blocking a deployment based on findings severity, escalation paths for critical and high findings, and a quarterly review cadence for assessing the autonomous testing program's coverage against our evolving attack surface."
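The deployment-blocking criterion in this prompt usually ends up as a small gate function in the pipeline. A minimal sketch, assuming findings arrive as dictionaries with `id` and `severity` keys; that input shape, the `gate` function, and the default threshold are all assumptions for illustration, not a format any particular tool emits.

```python
SEVERITY_ORDER = {"low": 0, "medium": 1, "high": 2, "critical": 3}


def gate(findings, block_at="high"):
    """Block the deploy if any finding is at or above the threshold severity."""
    threshold = SEVERITY_ORDER[block_at]
    blocking = [f for f in findings if SEVERITY_ORDER[f["severity"]] >= threshold]
    return {
        "deploy": not blocking,                    # True only when nothing blocks
        "escalate": [f["id"] for f in blocking],   # ids to route to the on-call owner
    }


release_findings = [
    {"id": "F-1", "severity": "medium"},
    {"id": "F-2", "severity": "critical"},
]
print(gate(release_findings))
print(gate(release_findings[:1]))
```

Keeping the threshold as a parameter lets a team start by blocking only on critical findings and tighten the gate once the remediation backlog is under control.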



Day 27 of 30. Tomorrow: Dash0 -- Day 28 lands in the Foundation layer.