
Jul 1, 2026

We Attack Our Own Models First

The detection model is itself an attack surface, so before it ships, our own engineers spend weeks trying to defeat it

Most teams test a detection model by measuring how well it catches attacks. That tells you it works against the threats you already thought of. It says nothing about the threats an adversary will design specifically to slip past it.

A model that decides what's malicious is a target in its own right.

Red-Teaming Our Own Detection

Before a detection model ships, Abnormal's engineers take the attacker's seat. They probe it the way a motivated adversary would: crafting inputs to find its blind spots, mapping where the decision boundary sits, and generating attack variants tuned to score just under the threshold. The goal isn't to confirm the model works. It's to make it fail on our terms, in a lab, before anyone's inbox is on the line.
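To make "mapping where the decision boundary sits" concrete, here is a minimal sketch of one common black-box technique: bisecting along the straight line between a sample the model flags and a benign one until the score crosses the alert threshold. Everything here is hypothetical and invented for illustration: the three-feature logistic scorer, its weights, and the 0.5 threshold are not Abnormal's model, and this is not necessarily how Abnormal's engineers probe theirs.

```python
import math

THRESHOLD = 0.5  # hypothetical alert threshold

# Hypothetical toy detector: a logistic score over three invented features
# (think urgency wording, link mismatch, sender novelty). Not a real model.
WEIGHTS = [2.0, 3.0, 1.5]
BIAS = -3.0


def score(x):
    """Return the toy model's probability that feature vector x is malicious."""
    z = BIAS + sum(w * xi for w, xi in zip(WEIGHTS, x))
    return 1.0 / (1.0 + math.exp(-z))


def interpolate(a, b, t):
    """Return the point a fraction t of the way from a to b."""
    return [ai + t * (bi - ai) for ai, bi in zip(a, b)]


def find_boundary(malicious, benign, tol=1e-6):
    """Bisect along the segment from a flagged sample toward a benign one.

    Invariant: the point at `lo` scores at or above THRESHOLD and the point
    at `hi` scores below it. Returns `hi` and its sample, i.e. a variant
    that sits just under the threshold (within `tol` of the boundary).
    """
    assert score(malicious) >= THRESHOLD > score(benign)
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if score(interpolate(malicious, benign, mid)) >= THRESHOLD:
            lo = mid
        else:
            hi = mid
    return hi, interpolate(malicious, benign, hi)


t, variant = find_boundary([1.0, 1.0, 1.0], [0.0, 0.0, 0.0])
print(f"boundary at t={t:.3f}, variant score={score(variant):.6f}")
```

With these toy weights the boundary falls at t = 7/13 (about 0.538), and the returned variant scores a hair under 0.5: exactly the kind of "just under the threshold" input a red team wants to find before an attacker does. Against a real model the same search only needs query access to scores, which is why the model itself counts as an attack surface.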

What Adversarial Pressure Surfaces

This pressure exposes what accuracy metrics hide. A model can post excellent numbers and still collapse against an input shaped to exploit how it reasons. Find that input internally and you close the gap before it becomes a vulnerability. Skip the exercise and you've shipped an attack surface with a detection model bolted on top.
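The gap between a good accuracy number and adversarial robustness can be shown with a deliberately tiny example. The sketch below is a toy illustration only: the keyword weights, threshold, labelled test set, and look-alike substitutions are all invented and say nothing about how Abnormal's detectors work. It scores perfectly on its fixed test set, then a greedy attacker swaps the highest-weighted tokens for look-alikes until the message slips under the threshold.

```python
# Hypothetical keyword-weight detector; all weights are invented.
WEIGHTS = {"urgent": 1.5, "wire": 2.0, "invoice": 1.0, "password": 2.5, "gift": 1.0}
THRESHOLD = 2.0

# Hypothetical look-alike spellings the model has never seen.
SUBSTITUTES = {"urgent": "urg3nt", "wire": "w1re", "invoice": "inv0ice",
               "password": "passw0rd", "gift": "g1ft"}

# Small made-up labelled test set (True = malicious).
TEST_SET = [
    ("urgent wire the invoice today", True),
    ("reset your password now urgent", True),
    ("buy gift cards and wire funds", True),
    ("lunch at noon tomorrow", False),
    ("attached is the quarterly invoice", False),
    ("team offsite agenda", False),
]


def score(text):
    """Sum the weights of known keywords in the message."""
    return sum(WEIGHTS.get(tok, 0.0) for tok in text.lower().split())


def flagged(text):
    return score(text) >= THRESHOLD


def accuracy(model, data):
    """Fraction of labelled examples the model gets right."""
    return sum(model(text) == label for text, label in data) / len(data)


def greedy_evade(text, substitutes):
    """Replace the highest-weighted swappable token, one at a time,
    until the message scores under THRESHOLD. Returns None if the
    attacker runs out of substitutions first."""
    tokens = text.lower().split()
    while score(" ".join(tokens)) >= THRESHOLD:
        candidates = [i for i, tok in enumerate(tokens) if tok in substitutes]
        if not candidates:
            return None
        best = max(candidates, key=lambda i: WEIGHTS.get(tokens[i], 0.0))
        tokens[best] = substitutes[tokens[best]]
    return " ".join(tokens)


print("test-set accuracy:", accuracy(flagged, TEST_SET))
evaded = greedy_evade("urgent wire the invoice today", SUBSTITUTES)
print("evasive variant:", evaded, "| flagged:", flagged(evaded))
```

The toy detector is 100% accurate on its test set, yet two swaps ("wire" to "w1re", then "urgent" to "urg3nt") turn the first malicious message into "urg3nt w1re the invoice today", which it no longer flags. The metric was never wrong; it simply measured the wrong thing.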

It also changes how the model gets built. Knowing your own red team is coming for it forces design choices that assume an intelligent, adaptive adversary from the start, rather than a static list of known-bad patterns to match against.

Attackers were always going to test these models in production. We'd rather be the ones who find what breaks first.
