Abnormal AI Innovation: High-Scale Aggregation Systems in Production

At Abnormal AI, detecting malicious behavior at scale means aggregating vast volumes of signals in realtime and batch. This post breaks down how we implemented the Signals DAG across both systems to achieve consistency, speed, and detection accuracy at scale.
July 9, 2025

In Part 1 of this series, we introduced the problem of ML pipeline entanglement, where “changing anything” could “change everything” in naively stood-up ML pipelines with dense dependencies. We introduced the concept of a Signals DAG (Directed Acyclic Graph) as Abnormal AI’s solution to the problem: by building a framework that requires every signal extraction function to declare its inputs and outputs, signal relationships are explicitly modeled and dependencies are fully specified.
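To make the idea concrete, here is a minimal sketch of such a framework: each extractor declares its inputs, and an executor resolves the dependency graph. All names here (`signal`, `run_dag`, the example signals) are hypothetical illustrations, not Abnormal's actual API.

```python
from graphlib import TopologicalSorter

REGISTRY = {}

def signal(name, inputs=()):
    """Register a signal extractor with explicitly declared inputs (hypothetical API)."""
    def wrap(fn):
        REGISTRY[name] = (fn, tuple(inputs))
        return fn
    return wrap

@signal("sender_domain", inputs=("raw_email",))
def sender_domain(raw_email):
    return raw_email["from"].split("@")[-1]

@signal("is_free_mail", inputs=("sender_domain",))
def is_free_mail(sender_domain):
    return sender_domain in {"gmail.com", "yahoo.com"}

def run_dag(seed):
    # Because every dependency is declared, execution order falls out
    # of a topological sort rather than implicit call ordering.
    values = dict(seed)
    graph = {name: deps for name, (_, deps) in REGISTRY.items()}
    for name in TopologicalSorter(graph).static_order():
        if name in values:
            continue
        fn, deps = REGISTRY[name]
        values[name] = fn(*(values[d] for d in deps))
    return values

out = run_dag({"raw_email": {"from": "alice@gmail.com"}})
```

Because relationships are fully specified, adding or changing one extractor cannot silently reorder or break others: the executor recomputes only what the declared edges require.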

Dense dependency graph (left) and sparse dependency graph (right)

While Part 1 introduced the theory, this post explores how it’s put into practice. The Signals DAG now runs in 3 production systems at the core of our Detection Engine: 2 online, serving up to 35k QPS, and 1 batch job processing 3TB of data daily.

A Tale of 3 Services (and 1 DAG)

Why does Abnormal AI’s Detection System need this architecture, and where does the Signals DAG fit in? To answer this, we’ll start with the core ML problem in email security: for every email, we need to determine whether it poses a threat or is safe to deliver without disrupting legitimate mailflow. This scoring is performed by Abnormal AI’s Realtime Scorer, which needs to be both high-precision and high-recall to catch as many attacks as possible while minimizing disruption to safe mailflow. As discussed in Part 1, this is not an easy problem, and our team continues to explore methods of varying sophistication to improve our Detection System.

The core of the Realtime Scorer is an ensemble of ML models and LLMs, each taking in some set of features. Aggregate Features, derived from patterns across emails rather than any single message, are especially predictive. To illustrate this, consider a feature like the frequency of correspondence between a sender and a recipient. On one hand, this signal clearly cannot be computed from a single data point; on the other, you can intuit how powerful it is for modeling normal behavior, allowing our models to flag abnormalities.

These Aggregate Features must be precomputed—that’s the role of Realtime and Batch Signal Aggregates. As 2 implementations of our Signal Aggregates System, they ingest data, extract signals, and compute aggregates in either realtime or batch. These aggregates are upserted into data stores that the Realtime Scorer then queries as a client.
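A toy sketch of this write/read split, with a plain dict standing in for the production data store (Redis, as discussed later in this post); the key scheme and function names are hypothetical:

```python
# In-memory stand-in for the aggregate store (Redis in production).
store = {}

def upsert_aggregate(key, delta):
    """Precomputation path: realtime or batch aggregation upserts counts."""
    store[key] = store.get(key, 0) + delta

# The aggregation systems write as emails flow through...
upsert_aggregate("corr:alice@a.com->bob@b.com:180d", 1)
upsert_aggregate("corr:alice@a.com->bob@b.com:180d", 1)

# ...and the Realtime Scorer reads at scoring time as a client.
def correspondence_count(sender, recipient):
    return store.get(f"corr:{sender}->{recipient}:180d", 0)
```

The key point is that the scorer never computes aggregates on the hot path; it only performs cheap lookups against values precomputed elsewhere.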

Realtime Scorer queries precomputed aggregates built in realtime and batch

All 3 systems run separate instances of the same Signals DAG, ensuring that signal extraction during scoring and aggregate precomputation follows identical flows. This eliminates drift and ensures data consistency. Right now, the Signals DAG is constructed with primitives provided by a Python library we built in-house.

Signal Aggregates

Realtime and Batch Signal Aggregates were designed to solve complementary halves of the same problem. The Realtime System is a streaming service optimized for low-cardinality, time-sensitive features. The Batch System, on the other hand, is offline by design, built for high-cardinality Aggregate Features where freshness is less critical.

Realtime Signal Aggregates System design

Realtime Signal Aggregates was built to support features like “how many emails from a given sender were detected as threats in the past hour.” It ingests a Kafka stream of email events, transforms them into storage update instructions via the Signals DAG Executor, and processes those instructions using a Go-based Kafka consumer. The storage update logic is encapsulated in its own Go microservice to leverage Go’s performance and maintain a clean separation of concerns. At peak traffic, Realtime Signal Aggregates handles up to 35k QPS, the same volume as the Realtime Scorer.
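A simplified sketch of the transform step, in Python: one email event becomes a list of storage update instructions for the downstream consumer. The instruction shape and field names here are hypothetical; the real executor runs the full Signals DAG before emitting updates.

```python
def event_to_instructions(event):
    # Hypothetical instruction shape: in production, signal extraction
    # happens via the Signals DAG Executor, and a Go-based Kafka
    # consumer applies the resulting instructions to storage.
    sender = event["sender"]
    hour_bucket = event["ts"] // 3600  # bucket by hour for time-windowed features
    instructions = [{"op": "INCR", "key": f"sent:{sender}:{hour_bucket}", "by": 1}]
    if event.get("verdict") == "threat":
        instructions.append(
            {"op": "INCR", "key": f"threats:{sender}:{hour_bucket}", "by": 1}
        )
    return instructions

out = event_to_instructions({"sender": "a@x.com", "ts": 7300, "verdict": "threat"})
```

Emitting declarative instructions rather than performing writes inline keeps signal extraction (Python) cleanly separated from storage updates (Go).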

Batch Signal Aggregates System design

Batch Signal Aggregates was designed for Aggregate Features like “how often a sender email address corresponds with a recipient address over the last 180 days.” The system ingests email logs and builds Aggregate Features using Spark, orchestrated by Airflow. As Abnormal AI grew, Batch Signal Aggregates had to be scaled up and redesigned to handle its current processing load of 3TB of data per day. For more on these scaling challenges and design decisions, stay tuned for a future post.
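The shape of such a batch aggregation can be sketched in plain Python (a stand-in for the actual Spark job, which expresses the same grouping over distributed data; field names and the day-number convention are illustrative):

```python
from collections import Counter

def correspondence_counts(email_logs, window_days=180, today=1000):
    # Stand-in for the Spark job: count how often each sender/recipient
    # pair appears within the lookback window. In Spark this would be a
    # filter on the window followed by a groupBy + count.
    counts = Counter()
    for log in email_logs:
        if today - log["day"] <= window_days:
            counts[(log["sender"], log["recipient"])] += 1
    return counts

logs = [
    {"sender": "a@x.com", "recipient": "b@y.com", "day": 990},
    {"sender": "a@x.com", "recipient": "b@y.com", "day": 950},
    {"sender": "a@x.com", "recipient": "b@y.com", "day": 500},  # outside window
]
counts = correspondence_counts(logs)
```

High-cardinality features like this one suit batch processing: the full key space is too large to hold hot in a streaming system, and day-level freshness is acceptable.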

Ultimately, the Signal Aggregates Systems enabled a new level of expressiveness, one that translated directly into improved detection efficacy, with significant gains in both precision and recall.

Design Decisions and Future Plans

This section covers how the Signal Aggregates System was built, highlighting key design decisions, current work, and future plans.

Separation of Realtime and Batch

When we built Batch and Realtime Signal Aggregates, we decided to stand them up as separate systems for velocity. Before embarking on this, we considered a number of off-the-shelf solutions but found them ultimately unsuitable for Abnormal’s needs. We continued with a prototype phase to derisk the project, proving that the new Signal Aggregates Systems could improve topline precision and recall before committing engineering resources.

Building Batch and Realtime Signal Aggregates in parallel shortened time to value, but introduced a leaky abstraction, requiring ML engineers to account for differences between batch and realtime implementations when adding new features. In the future, we plan to unify the systems under a Lambda architecture. This not only creates a system that “just works” for researchers, but also one that builds aggregates more robustly and accurately—ultimately improving detection efficacy.

Lambda architecture for aggregates across time batches
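The read path of such a Lambda architecture could be as simple as merging the two views at query time, a minimal sketch under the assumption that batch and realtime cover disjoint time buckets:

```python
def serve_aggregate(batch_view, realtime_view, key):
    # Lambda-style read: the batch view covers older, fully reprocessed
    # time buckets; the realtime view covers recent buckets the batch
    # pipeline hasn't reached yet. Their sum is the full-window value.
    return batch_view.get(key, 0) + realtime_view.get(key, 0)

batch_view = {"corr:a->b": 40}    # e.g. days 2 through 180
realtime_view = {"corr:a->b": 3}  # e.g. the most recent day
total = serve_aggregate(batch_view, realtime_view, "corr:a->b")
```

With this split, ML engineers would define a feature once and get both freshness and completeness, rather than reasoning about two implementations.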

Big data that gets bigger

A major engineering challenge at Abnormal AI stems from the fact that we’re constantly hitting scaling bottlenecks driven by rapid business growth. As a result, every system we build must use technologies that not only meet current needs but also scale with us. These choices should ideally make scaling to the next order of magnitude possible without making operating at our current scale onerous. A full deep dive on our tech stack and design decisions could fill its own blog post, but here we explore a few of our critical decisions:

  • We chose Kafka for realtime streaming because it’s a battle-tested framework that can scale to millions of QPS with built-in data partitioning.

  • We used Spark for batch data processing due to its scalability, fault tolerance, and open-source nature. This lets us tap into a broad ecosystem of shared knowledge and gives us the flexibility to choose between managed and self-hosted deployments, mitigating vendor lock-in.

  • We chose Redis for our data store because we needed high write throughput and did not require complex relations or range queries. Redis also provides a variety of native data structures we can use out of the box.

  • We used Python to build the Signals DAG and extraction framework to ensure interoperability with ML libraries and tooling.

This post has provided an overview of how Abnormal AI implements function composition in production. What began as an abstract concept, the Signals DAG, is now a core part of 3 high-scale production systems powering our Detection Engine.

If you're passionate about solving meaningful engineering challenges, we encourage you to explore open roles on our careers page.

To explore the email security solutions powered by Abnormal’s AI-native innovation, schedule a demo today!
