Applying software engineering discipline to threat detection

At some point, most security teams will have a rule that fires 4,000 times in a single night. Nobody knows when it was changed, who changed it, or what it was supposed to catch in the first place. The post-mortem reveals that someone edited it directly in the SIEM console six weeks earlier, with no documentation, no peer review, and no way to roll back. This is the default state of detection engineering in most organizations. It has a name: tribal knowledge, or more precisely, the absence of any engineering discipline applied to what is, structurally, a software problem.

Detection as Code is a methodology, not a tool, a vendor category, or a buzzword. It is the answer to that problem.

In brief

Detection rules are software artifacts: they deserve version control, peer review, automated testing, and a deployment pipeline.
Sigma is the closest thing the industry has to a vendor-neutral detection language, compiling to Splunk SPL, Sentinel KQL, Elastic EQL, and more.
A CI/CD pipeline for detections catches logic errors before they reach production, and provides a full audit trail when something goes wrong.
Detection as Code fails without clean telemetry: version-controlling garbage rules still yields garbage, just with a better commit history.
Starting small works: a handful of high-value rules in Git with basic tests already eliminates the most damaging failure modes.

The gap that detection engineering refuses to acknowledge

Compare how your development team ships code with how your detection team ships rules. Developers branch, test, review, and deploy. If something breaks, they revert in seconds. If someone asks what changed, the answer is in the commit history. Your detection team, in all likelihood, clicks a rule into the SIEM console, saves it, and hopes for the best.

The process gap here is not cosmetic. It maps directly onto the problems that software engineering solved decades ago. No version control means no history and no rollback. No tests mean the rule fires on legitimate traffic at 3 AM and nobody knows why. No peer review means a logic error survives because the author was the only person who read the query. No deployment pipeline means “staging” and “production” are indistinguishable concepts. Rapid7 summarized this gap in a blunt comparison: software engineering teams operate from Git with automated validation; detection engineering teams operate from a wiki and a Save button. One of these models scales. The other one collapses under its own weight.

The reason this persists is cultural rather than technical. Detection rules have traditionally been treated as configuration, not code. Configuration lives in a UI. You tune it by feel. You delete it when it gets annoying. The implicit assumption is that detection is an operational task, not an engineering one. Detection as Code challenges that assumption directly, and the discomfort that sometimes follows says everything about how embedded the old model is.

What Detection as Code actually means

The core idea is simple: every detection rule lives in a version control system (Git, overwhelmingly), is reviewed before deployment, is validated by automated tests, and is deployed through a pipeline rather than directly from a console. The rule is an artifact. Its history is preserved. Its ownership is explicit. Its behavior is testable.

This sounds obvious in the abstract. In practice it means several things change simultaneously.

Rules need metadata to be manageable at scale: author, creation date, last review date, MITRE ATT&CK technique IDs, target data source, expected false positive rate. Without metadata, a repository of 500 rules becomes a mystery archive that nobody wants to touch. With it, a simple script can emit a live coverage heatmap keyed by ATT&CK technique ID, updated on every commit, with no manual spreadsheet to go stale.
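As a rough illustration of the coverage script mentioned above, the sketch below counts rules per ATT&CK technique. It assumes rules carry Sigma-style tags such as attack.t1059.001; the RULES list and the technique_coverage function are hypothetical stand-ins for metadata loaded from a real repository.

```python
from collections import Counter

# Hypothetical rule metadata, as it might be loaded from rule files in the repo.
RULES = [
    {"title": "PowerShell encoded command", "tags": ["attack.execution", "attack.t1059.001"]},
    {"title": "LSASS memory access", "tags": ["attack.credential_access", "attack.t1003.001"]},
    {"title": "Suspicious PowerShell download", "tags": ["attack.t1059.001"]},
]


def technique_coverage(rules):
    """Count rules per ATT&CK technique ID found in each rule's tags."""
    counts = Counter()
    for rule in rules:
        for tag in rule.get("tags", []):
            namespace, _, value = tag.lower().partition(".")
            # Technique tags look like attack.t1059 or attack.t1059.001;
            # tactic tags such as attack.execution are skipped.
            if namespace == "attack" and value[:1] == "t" and value[1:5].isdigit():
                counts[value.upper()] += 1
    return counts


coverage = technique_coverage(RULES)
for technique, rule_count in sorted(coverage.items()):
    print(f"{technique}: {rule_count} rule(s)")
```

Run on every commit, the same counts can feed whatever heatmap or dashboard format the team prefers.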

Rules need tests. A detection without a test is a hope, not a control. The minimum viable test is two fixtures: a malicious event the rule must match, and a benign event it must not match. Pull requests that lack both test cases should be rejected by CI, the same way you would reject application code without unit tests. This pattern sounds aggressive until you consider the alternative, which is finding out that your detection fires on legitimate admin activity during an actual incident.
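One way to enforce the two-fixture rule in CI is a check over the file paths in the repository or pull request. The sketch below assumes a hypothetical layout in which rules/NAME.yml is tested by tests/NAME/positive.json and tests/NAME/negative.json; the layout and the function name are illustrative, not a convention of any particular tool.

```python
from pathlib import PurePosixPath

FIXTURE_KINDS = ("positive", "negative")


def rules_missing_fixtures(repo_paths):
    """Return {rule_name: [missing fixture kinds]} for rules lacking fixtures."""
    paths = {PurePosixPath(p) for p in repo_paths}
    missing = {}
    for path in sorted(paths):
        # Only files directly under rules/ with a .yml suffix count as rules.
        if path.parent == PurePosixPath("rules") and path.suffix == ".yml":
            absent = [
                kind
                for kind in FIXTURE_KINDS
                if PurePosixPath("tests", path.stem, f"{kind}.json") not in paths
            ]
            if absent:
                missing[path.stem] = absent
    return missing


# Hypothetical file listing for a pull request.
changed = [
    "rules/ps_encoded_command.yml",
    "tests/ps_encoded_command/positive.json",
    "tests/ps_encoded_command/negative.json",
    "rules/lsass_access.yml",
    "tests/lsass_access/positive.json",
]
problems = rules_missing_fixtures(changed)
print(problems)
```

A CI job would exit non-zero whenever the returned dictionary is non-empty, which blocks the merge until both fixtures exist.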

Rules need deployment discipline. New rules should enter shadow mode for at least seven days: the rule runs and writes matches to a separate index, but creates no analyst alert. After a week you read the hit volume, sample a dozen events, and either tune, promote, or kill the rule. Detections that never leave shadow mode are still better than detections that page an analyst every twelve minutes with garbage. This is a lesson that applies equally to large enterprise SOCs and to the small teams I described in my post on 24/7 security monitoring: the goal is fewer alerts with better context, not more coverage with more noise.
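The shadow-mode workflow can be sketched in a few lines. Everything here is an assumption made for illustration: the ShadowRule type, the destination names, and especially the thresholds in review_decision, which a real team would set from its own alert budget. The seven-day minimum is the only number taken from the text.

```python
from dataclasses import dataclass
from datetime import date, timedelta

SHADOW_DAYS = 7  # minimum shadow period described above


@dataclass
class ShadowRule:
    name: str
    shadow_since: date
    promoted: bool = False


def route_match(rule, event):
    """Send a match to the shadow index or to the alert queue, never both."""
    destination = "alerts" if rule.promoted else "shadow-index"
    return {"destination": destination, "rule": rule.name, "event": event}


def review_decision(rule, today, hits, sample_size, sample_true_positives,
                    max_hits=50, min_precision=0.8):
    """Suggest wait, promote, tune, or kill. Thresholds are illustrative."""
    if today - rule.shadow_since < timedelta(days=SHADOW_DAYS):
        return "wait"
    if hits == 0 or sample_size == 0:
        # Nothing to judge: check telemetry and field names before promoting.
        return "tune"
    precision = sample_true_positives / sample_size
    if precision >= min_precision and hits <= max_hits:
        return "promote"
    if precision == 0:
        return "kill"
    return "tune"


rule = ShadowRule("ps_encoded_command", shadow_since=date(2024, 1, 1))
print(route_match(rule, {"host": "ws-042"})["destination"])
print(review_decision(rule, date(2024, 1, 9), hits=20,
                      sample_size=12, sample_true_positives=11))
```

The design choice worth keeping from this sketch is that routing depends only on the promoted flag, so a rule cannot page an analyst until a human has flipped it.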

How Sigma makes detection logic portable

The portability problem in detection engineering is real. Most organizations run more than one security platform. Rules written natively for Splunk are useless in Sentinel. Rules written for Elastic require a rewrite for Chronicle. Multiply this by every SIEM migration you will ever conduct, and the cost of vendor-native rule formats becomes visible.

Sigma is the closest thing the industry has to a solution. Created by Florian Roth and Thomas Patzke and first released in 2017, it defines detection logic in YAML: a logsource block that describes the data source, a detection block that describes what to look for, and a condition that ties them together. Tools like sigma-cli and pySigma compile that YAML into Splunk SPL, Microsoft Sentinel KQL, Elastic EQL, IBM QRadar AQL, Chronicle YARA-L, Wazuh rules, and a growing list of other targets. A single Sigma file is the source of truth; the platform-specific query is a build artifact.

The SigmaHQ community repository on GitHub contains thousands of maintained rules covering techniques across the MITRE ATT&CK matrix, from credential dumping to living-off-the-land execution. The practical value is that you do not start a Detection as Code program from zero: you fork a curated baseline, map it to your environment, add tests, and own the result.

A minimal Sigma rule for detecting PowerShell encoded command execution looks like this in plain terms: logsource is Windows process creation; detection requires Image ending in powershell.exe and CommandLine containing -enc or -EncodedCommand; condition is selection. That single YAML file compiles to a working KQL query targeting DeviceProcessEvents in Sentinel and a working YARA-L rule targeting udm.principal.process in Chronicle. The portability is not theoretical. It is the reason Sigma has become, as the documentation now explicitly states, the de facto standard for SIEM-agnostic detection.
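To make the structure concrete, the sketch below writes the rule described above as the dictionary a YAML parser would produce, then evaluates it against two sample events. The matcher is a deliberately simplified stand-in, not pySigma: it handles only the endswith and contains modifiers and a condition that names a single selection. The title and the sample events are made up for illustration.

```python
# The rule from the paragraph above, as a parsed-YAML dictionary.
RULE = {
    "title": "PowerShell Encoded Command Execution",
    "logsource": {"category": "process_creation", "product": "windows"},
    "detection": {
        "selection": {
            "Image|endswith": "\\powershell.exe",
            "CommandLine|contains": ["-enc", "-EncodedCommand"],
        },
        "condition": "selection",
    },
}


def matches_selection(selection, event):
    """Fields are ANDed, a list of values is ORed, string matching ignores case."""
    for key, expected in selection.items():
        field, _, modifier = key.partition("|")
        actual = str(event.get(field, "")).lower()
        values = expected if isinstance(expected, list) else [expected]
        values = [str(v).lower() for v in values]
        if modifier == "endswith":
            ok = any(actual.endswith(v) for v in values)
        elif modifier == "contains":
            ok = any(v in actual for v in values)
        elif modifier == "":
            ok = actual in values
        else:
            raise ValueError(f"unsupported modifier: {modifier}")
        if not ok:
            return False
    return True


POWERSHELL = "C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe"
malicious = {"Image": POWERSHELL, "CommandLine": "powershell.exe -NoP -enc SQBFAFgA"}
benign = {"Image": POWERSHELL, "CommandLine": "powershell.exe -File C:\\Scripts\\backup.ps1"}

selection = RULE["detection"]["selection"]
print(matches_selection(selection, malicious))  # True
print(matches_selection(selection, benign))     # False
```

The two events double as the positive and negative fixtures a pull request for this rule would carry.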

One important failure mode worth naming: Sigma rules that use vendor-specific field names in the detection block break portability silently. The rule compiles, the translation succeeds, and the query fires on nothing because the field name does not exist in the target SIEM’s schema. Field name normalization, whether through the OCSF schema, ECS, or your own data model, is not optional if you want portability to work in practice.
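A cheap guard against this silent failure is to lint the field names in each rule against the schema the rules are meant to be written in. The sketch below assumes the team maintains a set of allowed field names for its data model; the SCHEMA_FIELDS set and both function names are hypothetical.

```python
# Hypothetical normalized schema for process creation events.
SCHEMA_FIELDS = {"Image", "CommandLine", "ParentImage", "User"}


def referenced_fields(detection):
    """Collect field names used in a Sigma detection block, without modifiers."""
    fields = set()
    for name, block in detection.items():
        if name == "condition":
            continue
        # A selection is a map of fields, or a list of such maps.
        candidates = block if isinstance(block, list) else [block]
        for candidate in candidates:
            if isinstance(candidate, dict):
                fields.update(key.split("|", 1)[0] for key in candidate)
    return fields


def unknown_fields(detection, schema_fields):
    """Return fields the rule references that the schema does not define."""
    return sorted(referenced_fields(detection) - set(schema_fields))


detection = {
    "selection": {
        "Image|endswith": "\\powershell.exe",
        # An ECS-style name that has leaked into a rule written for another schema.
        "process.command_line|contains": "-enc",
    },
    "condition": "selection",
}
print(unknown_fields(detection, SCHEMA_FIELDS))
```

Failing the build on a non-empty result turns a query that quietly matches nothing into a visible review comment.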

YARA and YARA-X for content-based detection

Sigma handles log-based detection. YARA handles file and memory-based detection: matching binary patterns, strings, and structural characteristics of files rather than log events. The two are complementary, and a mature Detection as Code program uses both.

I covered the YARA-X transition in detail earlier this year: the complete Rust rewrite addresses the performance ceiling that classic YARA’s C codebase had hit, particularly on rules with complex regular expressions and nested loops. For Detection as Code specifically, YARA-X brings two important improvements. First, the yr compile command works cleanly as a CI validation step, catching syntax errors before a rule is deployed to production scanning. Second, the JSON and YAML output formats make integration with automated pipelines practical without fragile text parsing.
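A CI validation step built on yr compile can be as small as the wrapper below. It is a sketch under two assumptions: that the yr binary is on the runner's PATH, and that it exits with a non-zero status when a rule fails to compile. The compile_errors name and the injectable run parameter are illustrative; the compiled output file is a by-product the job can discard.

```python
import subprocess


def compile_errors(rule_paths, run=subprocess.run):
    """Run `yr compile` on each rule file and collect the ones that fail.

    `run` defaults to subprocess.run and can be swapped for a stub in tests.
    Assumes a non-zero exit status signals a compilation error.
    """
    failures = {}
    for path in rule_paths:
        result = run(["yr", "compile", str(path)], capture_output=True, text=True)
        if result.returncode != 0:
            failures[str(path)] = result.stderr.strip()
    return failures
```

In a pipeline, calling compile_errors on the .yar files touched by a pull request and exiting non-zero when the result is non-empty is enough to keep syntax errors out of production scanning.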

YARA rules, like Sigma rules, need metadata and ownership. A rule without a description, author, and reference field is a detection waiting to become orphaned. For malware hunting workflows, the threat_name, malware_family, and MITRE technique tags are the difference between a rule library and a pile of patterns. YARA-X also fits naturally into threat hunting pipelines alongside Volatility3 and other DFIR tools, where the same rule corpus used in CI can be applied to memory dumps or disk images during incident response.

Where Detection as Code breaks down

The discipline only works on top of a functioning foundation. Git does not cure bad telemetry. Version-controlling rules that fire on unstructured logs, or that target fields that your parser does not reliably populate, produces a neat repository of rules that detect nothing. Before imposing engineering discipline on detection logic, it is worth confirming that your log sources are complete, that parsing is consistent, and that the fields your rules reference actually exist in your SIEM’s schema.

Ownership is the second failure mode. Detection as Code creates an audit trail for who wrote a rule and who approved it, but it does not automatically create a pager assignment. Rules without a named owner accumulate. They fire. Nobody investigates. After six months the SOC has learned to ignore the alert, and the detection is functionally dead while appearing alive in the repository. Every rule should carry an explicit owner and an explicit escalation path, and that information should be part of the metadata that CI validates, not an afterthought in a comment.
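Making ownership part of what CI validates can start with a check like the one below. The owner and escalation keys are hypothetical custom fields (Sigma itself defines author but not these), and the list of placeholder values is an assumption about what tends to creep into repositories.

```python
REQUIRED_KEYS = ("author", "owner", "escalation")  # owner/escalation are hypothetical custom fields
PLACEHOLDERS = {"", "tbd", "todo", "unknown", "n/a"}


def metadata_errors(rule):
    """Return human-readable problems with a rule's ownership metadata."""
    errors = []
    for key in REQUIRED_KEYS:
        value = rule.get(key)
        if not isinstance(value, str) or value.strip().lower() in PLACEHOLDERS:
            errors.append(f"missing or placeholder '{key}'")
    return errors


orphan = {"title": "LSASS memory access", "author": "j.doe", "owner": "TBD"}
owned = {
    "title": "PowerShell encoded command",
    "author": "j.doe",
    "owner": "detection-team",
    "escalation": "soc-tier2-oncall",
}
print(metadata_errors(orphan))
print(metadata_errors(owned))
```

Rejecting placeholder values matters as much as rejecting missing keys, since a rule owned by "TBD" is orphaned in practice.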

The third failure mode is scope creep in automation. The appeal of CI/CD for detections is real, but the same automation that deploys a well-tested rule can deploy a poorly tested one just as fast. The solution is not less automation but better gates: require positive and negative test fixtures, enforce review by at least one person who did not write the rule, and run shadow mode for any rule touching high-volume data sources. Teams building 24/7 monitoring capability know the pattern well: the safe automation ladder applies here too. Enrich and validate automatically; promote to production only after human confirmation.
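The three gates can be expressed as a single check that a pipeline runs before promotion. This is a sketch: the PullRequest type and its fields are hypothetical, and a real pipeline would populate them from the code host's API and the shadow-mode tooling. The seven-day default is an assumed value mirroring the shadow period suggested earlier.

```python
from dataclasses import dataclass


@dataclass
class PullRequest:
    author: str
    approvers: set
    has_positive_fixture: bool
    has_negative_fixture: bool
    high_volume_source: bool
    shadow_days_completed: int = 0


def gate_failures(pr, min_shadow_days=7):
    """Return the reasons a rule change may not be promoted (empty means pass)."""
    failures = []
    if not (pr.has_positive_fixture and pr.has_negative_fixture):
        failures.append("needs both a positive and a negative fixture")
    # Self-approval does not count: drop the author from the approver set.
    if not (pr.approvers - {pr.author}):
        failures.append("needs approval from someone other than the author")
    if pr.high_volume_source and pr.shadow_days_completed < min_shadow_days:
        failures.append("high-volume source: complete shadow mode first")
    return failures


pr = PullRequest(
    author="alice",
    approvers={"alice"},
    has_positive_fixture=True,
    has_negative_fixture=True,
    high_volume_source=True,
    shadow_days_completed=3,
)
for reason in gate_failures(pr):
    print(reason)
```

Returning every failed gate at once, rather than stopping at the first, saves the author a round trip per missing requirement.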

The cultural challenge is perhaps the most stubborn. Analysts who have spent years clicking rules into a console often experience Detection as Code as friction rather than discipline. The fix is demonstrating the value through a concrete incident: show the team the pull request that changed the rule three weeks before it started misfiring. That commit history, that diff, that author name is the argument that no process document can make as effectively.

A practical starting point

The good news is that you do not need to boil the ocean to start. A working Detection as Code program can begin with five rules and a single Git repository.

Pick your five highest-value detections, the ones covering your most credible risks and the ones your team actually investigates when they fire. Write each one as a Sigma rule. Add metadata: ATT&CK technique ID, data source, author, last reviewed date. Add a positive test fixture and a negative test fixture. Set up a basic CI pipeline that runs sigma-cli against every pull request and rejects merges without test coverage. That is version one. It is not impressive. It is sufficient.

From there, the program grows naturally: add YARA rules for your malware detection library, introduce shadow mode tooling for new rules before they go live, connect the rule metadata to an ATT&CK Navigator layer for coverage tracking, and eventually integrate deployment automation that pushes approved rules to your SIEM rather than requiring a manual copy-paste. The returns from each step compound: the coverage map improves, the audit trail extends, the on-call analyst gets better context at 3 AM.

Rapid7’s Terraform provider for InsightIDR demonstrates what the mature end of this looks like: detection rules expressed as Terraform resources with inline test cases, deployed through the same IaC pipeline your infrastructure team already uses. Governance happens as a natural side effect of the pull request workflow, not as a separate compliance exercise. Whether or not Terraform fits your environment, the model is instructive: detection logic as a first-class infrastructure artifact, subject to the same rigor as everything else you run in production.

Teams that have followed this path consistently report a 40 to 60 percent reduction in alert volume after six months, faster analyst onboarding because new team members can read the rule history to understand intent, and dramatically smoother SIEM migrations because the detection library is portable rather than locked into a vendor console. The coverage map becomes a credible artifact rather than a fiction. Most importantly, the team stops being afraid to delete bad rules, because the history is preserved and the test fixtures remain.

FAQ

What is Detection as Code?

Detection as Code is a methodology that treats threat detection rules as software artifacts, managed through version control, peer review, automated testing, and CI/CD pipelines rather than ad-hoc console changes. The goal is a detection library that is auditable, portable, and maintainable rather than dependent on individual tribal knowledge.

What is Sigma and why does it matter for Detection as Code?

Sigma is an open, vendor-neutral rule format that lets you write detection logic once and compile it to Splunk SPL, Microsoft Sentinel KQL, Elastic EQL, or other SIEM-specific languages. It is the most practical foundation for a portable, versionable detection library, and the SigmaHQ community repository provides a large baseline of maintained rules to start from.

Do I need a large team to implement Detection as Code?

No. Even a small security team benefits from Detection as Code. Starting with a handful of high-value rules in a Git repository with basic CI validation already eliminates the most damaging failure modes: rules without owners, changes with no rollback path, and edits with no audit trail. The overhead is low; the returns are immediate and compound over time.