How Abnormal Taught AI Agents To Write Detectors

Key Insights

AI Detection Agents move from a customer-reported miss to a deployed detector in hours, eliminating manual analyst review queues.

Agents apply 'second-order thinking'—matching behavioral characteristics like domain age or novelty rather than raw values that attackers can rotate.

Good detectors require both statistical separation of attacks from normal traffic and semantic alignment with the specific attack type.

False positives from broad historical traffic evaluation feed directly into iterative refinement, sharpening detector precision before deployment.

Trusted-platform abuse attacks can pass SPF/DKIM authentication, making behavioral signals more reliable than authentication checks alone.

Most automatically generated detectors fail the same way: they learn to match specific examples rather than recognize the underlying attacker intent. They match on sender domains or subject line keywords, so the next campaign variant—with a different domain, a reworded subject, and the same underlying attack—passes through undetected.

At Abnormal, our engineering team has spent years working on this problem. Now, we've built AI Detection Agents that automatically write and deploy detectors from customer-reported misses in Detection 360. Getting the agents to generate something syntactically valid is straightforward. The challenge was pushing them to reason at the right level of abstraction, prioritize behavioral signals over surface features, and identify patterns that generalize rather than ones that do not.

This post explores how we built the agents, how we keep them grounded in real data rather than assumptions, and the evaluation pipeline that gates deployment on real-world performance.

From Submission to Protection: How It Works

1. Understand the Attack in Context

To write an effective detector, the agent first needs to figure out what actually makes a message malicious in a specific environment. This is where Abnormal’s Deep Research Agent comes in. It takes the submitted message and performs a thorough analysis, synthesizing everything known about it: who sent it, how they normally communicate, how the message compares to typical organizational patterns, and what signals point toward intent or impersonation. The output is a detailed behavioral profile of the attack that the rest of the system builds on.

Let’s take a look at an example.

This attack is easy to identify, but the challenge lies in determining which threat signals are truly predictive. Is it the grammatical errors and urgent tone, highly atypical for a brand like PayPal? Or the fact that the listed support number (+61) 180 084 5132 diverges from PayPal's official line (1-888-221-1161)? Even the delivery method, a Microsoft Teams meeting invite instead of a standard email, serves as a critical behavioral indicator.

These signals are what the Deep Research Agent tries to decipher. In fact, these are the exact signals it identified when it first evaluated the above problem:

Based on the research output and sample messages, I can identify:

**Attack Technique**: Brand impersonation (PayPal) via Microsoft Teams event registration abuse

- Attacker created fake Teams event with PayPal phishing content embedded in the event title/subject

- Uses legitimate Microsoft Teams infrastructure (teams-events.com) to send automated event registration confirmations

- Subject line contains urgent financial threat + phone number for fake support

- Sender domain: jorjarounsevell.s08.usa1.teams-events.com (subdomain of legitimate teams-events.com)

- Authentication passes (SPF/DKIM) because it's sent from legitimate Microsoft infrastructure

- Body links are all legitimate Microsoft domains, but the event content itself is malicious

This is a **trusted platform abuse** attack - using Microsoft Teams' event registration system to deliver phishing content.

Once satisfied with its analysis, the Deep Research Agent hands off its findings, and the Adaptive Detection Agent takes over, using the behavioral profile to detect the attack using our system.

2. Generate a Detector Specific to Your Organization

With analysis in hand, the Adaptive Detection Agent finds the attributes that represent the underlying behavior behind the attack. It does this by looking for attributes that are relevant in two ways:

Statistically relevant. There's no point in using a signal if it doesn't truly distinguish the target emails from normal traffic for a given recipient. This is something that previous, simpler systems were actually great at. They could crunch the numbers across massive volumes of traffic and surface statistically interesting patterns all day long. The problem is that without any deeper understanding, teams end up with rules that are technically correct but practically useless. A system might flag that 100% of attacks in a cluster lack an attachment, which is true but meaningless since most legitimate emails don't have attachments either. Statistical relevance alone produces a lot of noise without the next critical piece:

Semantically relevant. The agent won't waste time looking at attributes such as CC'd recipients if they aren't meaningful to the attack type in question. For humans, this part comes naturally. We have the context to understand what each attribute does and can tell right away whether it matters for a given attack. But doing this by hand across thousands of attack patterns and millions of mailboxes, each with their own version of normal, is incredibly slow and impossible to scale.

The Adaptive Detection Agent is able to walk both of these lines. It has enough statistical sense to tell whether an attribute actually separates attacks from normal traffic, and enough contextual understanding to know whether that attribute is worth examining in the first place.

But finding the right attributes is only half the problem. Once the agent identifies a signal like the sender's email address as relevant, it still needs to decide how to use it. This is where we push the agent toward what we call second-order thinking: instead of matching on the raw value, we match on a behavioral characteristic of that value.

For example, the agent might notice that all the attacks in a cluster came from the same sender domain. The naive approach is to write a rule that matches on that exact domain. But when the attacker rotates it next week, the rule becomes useless. The second-order approach is to ask: what's true about the domain that makes it suspicious? Is it a domain the organization has never seen before? Is it newly registered? Those characteristics will still hold when the attacker switches to a different domain next time.

As a technical note, this is the piece that generative models tend to struggle with and something we spent a lot of time tweaking. Left to their own devices, they latch onto the most distinctive surface features of the example they were shown. We worked to guide the agent toward behavioral abstractions over raw values, and its tools let it verify those intuitions against real data before committing to a rule. The result is detectors that generalize well beyond the specific messages they were built from.

Sticking with our example from earlier, here are some emails caught by the same detector. Notice the changes between the emails—different senders, distinct invite platforms (Google Meet vs. Microsoft Teams), and a wide range of products and organizations.

3. Test It Against Your Historical Data

At this point the agent has a set of attributes it believes are both statistically and semantically relevant to the attack. But believing and knowing are two different things. Before a detector goes live, the agent needs to prove it actually performs reliably against real traffic.

This happens in two stages. First, the agent runs a fast local evaluation. It takes the attack messages it already identified along with a random sample of normal traffic for the recipient and tests the detector against them. This is intentionally lightweight. The goal is to iterate quickly, tweaking which attributes are used and how they're weighted until the detector reliably catches the attacks without flagging the safe messages in the sample. Think of it as a tight feedback loop where the agent can try something, see how it performs, and adjust without waiting on a full evaluation every time.

Once the detector succeeds locally, we run a much broader evaluation across a larger slice of the customer's real traffic. This is where we really stress test it. Every message the detector would have flagged gets reviewed, and we LLM-label each one to determine what it actually is. This serves two purposes. First, it lets us catch any false positives hiding in the broader traffic that didn't show up in the smaller sample. Second, it shows us whether the detector was able to generalize beyond the original attacks. Did it find other instances of this attack type that were previously missed? If so, that's a strong signal that the detector is capturing the underlying behavior and not just memorizing the specific messages it was built from.

The false positives found during this deep evaluation are particularly valuable, and they feed directly into the next step.

4. Optimize for Precision

The false positives uncovered during the deep evaluation aren't just a report card. They're the training data for the next iteration. The agent takes each flagged message that turns out to be safe, examines why the detector caught it, and adjusts accordingly. Maybe the rule was too broad on sender reputation and needs to be tightened. Maybe it was weighting a particular attribute too heavily and catching edge cases that don't actually look like the attack. Each false positive tells the agent something specific about where the detector's boundaries need to shift.

The agent loops through this cycle multiple times, refining the detector with each pass until it can reliably catch the attack without compromising on safe emails. Every iteration makes the rule a little bit sharper and more precise.

5. Deploy with Full Transparency

Once the agent has iterated its way to a detector it's confident in, it doesn't just ship it silently. It provides a detailed overview of the rule, explaining what it's looking for, why those specific attributes were chosen, and how they work together to catch the attack. Everything the agent learned through the analysis, attribute selection, and testing phases gets distilled into a clear summary that ties the detector back to the original attack.

For example, the agent might report something like:

'We have updated our system to detect additional unsolicited academic and conference solicitation scams that use flattering invitations to extract publication or registration fees from employees.',

‘This detector targets advance-fee spam campaigns that use image-based links to evade detection. It identifies messages from unfamiliar senders that combine advance-fee fraud vocabulary with suspicious image links and authentication failures.’,

Or:

'We have updated our system to detect additional phishing attacks that use online form platforms to harvest credentials and sensitive information from employees, particularly those impersonating internal departments like HR.'

Each description reflects the specific attack behavior the agent identified, the attributes it selected, and the testing it ran to make sure the detector holds up. You can trace a straight line from the original suspicious emails all the way through to what the detector is actually doing and why.

Closing the Detection Loop

With the Adaptive Detection Agent, the gap between reporting a missed attack and being protected against it is measured in hours, not days. Every submission kicks off the full loop: analysis, detection, testing, refinement, deployment. No analyst queue, no manual review, no waiting.

And it compounds. Every detector the agent writes makes the system smarter and harder to evade. The attack that got through today becomes the protection that catches the next variant tomorrow.

Identity & AI Security

Platform

Product Resources

Support

Featured

Thought Leadership

Featured

Company

News & Events

Featured

How Abnormal Taught AI Agents To Write Detectors

Key Insights

From Submission to Protection: How It Works

1. Understand the Attack in Context

2. Generate a Detector Specific to Your Organization

3. Test It Against Your Historical Data

4. Optimize for Precision

5. Deploy with Full Transparency

Closing the Detection Loop

Protect Against Evolving Email Threats