AI just found 10,000 bugs in a month. Finding them is the easy part

A month into Anthropic’s Project Glasswing, the numbers are in, and they settle an argument the security industry has been having for years.

Claude Mythos Preview, working with about 50 partner organizations, found more than 10,000 high or critical vulnerabilities. Cloudflare alone surfaced 2,000 bugs, 400 of them high or critical, with a lower false-positive rate than human testers. Mozilla found 271 in Firefox, ten times as many as an older model caught. When Anthropic pointed Mythos at 1,000+ open-source projects that hold up internet infrastructure, it found 6,202 high and critical vulnerabilities.

Here’s what those numbers actually prove: AI discovery works. The question of whether frontier models can find real, exploitable bugs at scale is answered. They can.

So the bottleneck moved. It used to be finding the bugs. Now it’s fixing them.

Discovery scaled. Remediation didn’t.

Anthropic’s own disclosure process showed the constraint better than any slide could. The average high or critical bug takes about two weeks to patch, and even at a deliberately measured disclosure pace, their pipeline backed up. The NVD was already struggling to keep up with the CVE inflow before any of this. Exploitation timelines have collapsed from years to hours.

Now picture AI-driven discovery running across the entire software ecosystem at once. Every defender gets a list of thousands of confirmed findings. And then what?

This is the part people underestimate. A finding tells you where a bug is. It doesn’t tell you what fixing it will break.

Every critical patch is a decision tree. What business rules does this code enforce? What depends on it, directly and indirectly? Is there threading involved? A compliance requirement? Some downstream system quietly relying on the current soft-fail behavior? Patch in the wrong order and you break the audit trail. Refactor the wrong way and you blow an SLA. None of that is a speed problem you can solve by finding bugs faster.

And the failure mode is worse than slow. A fix that looks complete but respects none of the system’s actual constraints ships a regression that wasn’t there before. You close one CVE and open another. The fastest patch is the one that doesn’t introduce a new vulnerability while fixing the old one.

So the real gap is understanding, not capacity. You can’t safely fix what you don’t fully understand, and most large codebases stopped being fully understood by any one person years ago.

Why patching is an understanding problem

When teams hit this wall, they usually reach for one of three things, and each falls short on its own.

Static analysis is precise but blind to intent. It maps what the code does, not what the business needs it to do, so it can’t tell you which “correct” fix violates a rule no one wrote down.

AI is fast and broad but hallucinates. Point it at a patch with no grounding in your actual system and it will confidently produce something that looks right and isn’t.

Senior engineers have the judgment to catch all of this, the tribal knowledge of what’s safe to touch. But they don’t scale to thousands of findings, and that knowledge usually lives in a handful of heads.

At Swimm, we don’t pick one. Swimm is a product-enabled service company. We combine deterministic analysis for accuracy, AI for scale, and SMEs to catch what tools miss. For remediation, that means three layers working together.

The first layer is deterministic analysis. Our proprietary engine maps the blast radius of every finding: what depends on what, which callers are direct, which are indirect, which paths touch business logic. That’s the factual base a remediation team works from, before anyone changes a line.
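To make "blast radius" concrete, here is a minimal sketch of the idea, not Swimm's engine. It assumes you already have a call graph as a plain dictionary; the function names in the example (`validate_token`, `charge_card`, and so on) and the `business_logic` tag set are hypothetical. It inverts the graph, then walks outward from the vulnerable function to separate direct callers from indirect ones and to flag which of them touch business logic.

```python
from collections import defaultdict, deque


def blast_radius(calls, target):
    """Return (direct, indirect) callers of `target`.

    `calls` maps each caller name to the list of functions it calls.
    """
    # Invert the call graph: callee -> set of callers.
    callers = defaultdict(set)
    for caller, callees in calls.items():
        for callee in callees:
            callers[callee].add(caller)

    direct = set(callers.get(target, ()))
    indirect = set()
    seen = {target} | direct  # guards against cycles and repeat visits
    queue = deque(direct)

    # Breadth-first walk up the caller chain, starting from the direct callers.
    while queue:
        node = queue.popleft()
        for parent in callers.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                indirect.add(parent)
                queue.append(parent)
    return direct, indirect


# Hypothetical call graph, for illustration only.
calls = {
    "checkout": ["charge_card", "write_audit_log"],
    "charge_card": ["validate_token"],
    "refund": ["validate_token"],
    "nightly_batch": ["refund"],
    "write_audit_log": [],
}
# Hypothetical tags marking functions that enforce business rules.
business_logic = {"checkout", "refund"}

direct, indirect = blast_radius(calls, "validate_token")
touches_business_logic = (direct | indirect) & business_logic

print(sorted(direct))                  # ['charge_card', 'refund']
print(sorted(indirect))                # ['checkout', 'nightly_batch']
print(sorted(touches_business_logic))  # ['checkout', 'refund']
```

A real codebase needs far more than this (dynamic dispatch, cross-service calls, data dependencies), but the shape of the answer is the same: a finding on one function becomes a list of everything a patch could affect.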

The second layer is AI, anchored to that map. Our agents extract the constraints, contracts, and rules each patch has to respect, then index them so they’re queryable. Not a guess about your system. Ground truth your team and your own AI tools can pull from.

The third layer is people. Senior engineers validate every patch in context and catch the cases where the textbook fix breaks a threading model, an SLA, or a compliance rule. This is the layer that keeps your incident response from shipping a new bug while closing an old one.

Build the understanding layer before you need it

The output of this isn’t a stack of reports. It’s a knowledge base that lives in one workspace, yours to keep: architecture and dependency maps, blast-radius analysis per finding, extracted business logic, documented critical flows, all queryable over MCP by your engineers, your AI tools, and your SOC.

We build it in stages, fixed price per stage, with validation evidence at each one, so you commit one step at a time instead of betting everything up front. Assessment scopes the work. Specification extracts the rules and flows your patches have to respect. Modernization builds the knowledge base itself. Enablement hands it off with parity test suites, remediation playbooks, and training so your team keeps patching safely as the code evolves.

The uncomfortable truth in the Project Glasswing data is that our vulnerability management infrastructure was built for a world where humans found bugs at human speed. That world is gone. Discovery is now effectively unlimited. Patch cycles, disclosure timelines, and remediation workflows were not built for that, and they’re the next thing to break.

The teams that come through this well won’t be the ones with the longest list of findings. Everyone will have that. They’ll be the ones who can act on the list without breaking what already works. That takes an understanding layer, and it’s a lot easier to build before the flood than during it.

If you’re staring at a backlog of findings and a codebase no one fully understands anymore, that’s the conversation to have. Reach me at oren@swimm.io.

Oren Toledano

Co-founder & CEO

Oren is the Co-Founder and CEO of Swimm. Before Swimm, Oren co-founded and led Israel Tech Challenge, a non-profit organization that trains and connects talented professionals with the most in-demand skills in tech.
