# Splunk Detection Rule Audit
Four ways my own rules would flood a real analyst with noise. I audited every SPL detection against 283,976 events, classified the failure modes, and fixed the gaps.
## Problem / Hypothesis
I had written detection rules in Splunk targeting a constrained Windows telemetry environment — single sourcetype, no Sysmon, no Windows TA, manual rex-based field extraction. The rules worked. They fired. They matched MITRE ATT&CK techniques.
The hypothesis: working rules are not the same as deployable rules. If a real analyst inherited these rules in a production SOC, what would their first week look like?
## Environment
Splunk Enterprise, REST API on port 8089. Single Windows 11 workstation (HO-WE-01). Wazuh agent forwarding Security Event Log as XmlWinEventLog:Security. 7-day window, ~283k events. All field extraction via manual rex against raw XML. No CIM normalization.
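Without the Windows TA, every field these rules depend on comes from inline extraction against the raw event XML. A representative pattern looks roughly like this (a sketch; the exact `<Data Name=...>` element names are assumptions about the forwarded Security log XML):

```spl
index=wazuh data.win.system.eventID=4688
| rex field=_raw "<Data Name='NewProcessName'>(?<NewProcessName>[^<]+)</Data>"
| rex field=_raw "<Data Name='CommandLine'>(?<CommandLine>[^<]*)</Data>"
| table _time NewProcessName CommandLine
```

If either regex stops matching — a schema change, a quoting difference in the forwarded XML — the field is simply absent and the rule degrades silently. That fragility is the fourth noise source in the findings table.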
## Methodology
### Step 1 — Inventory
Cataloged every SPL detection query. Each mapped to a MITRE ATT&CK technique with defined thresholds and target EventIDs.
### Step 2 — Run against production data
Every rule executed against the full 283,976-event dataset via Splunk REST API. Hit count, volume distribution, and sampled matches recorded.
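Concretely, each rule's tail was swapped for a volume aggregation so hit counts and daily distribution over the 7-day window could be recorded. A sketch of the pattern (the `-enc` filter is illustrative, not one of the audited rules):

```spl
index=wazuh data.win.system.eventID=4688
| rex field=_raw "<Data Name='CommandLine'>(?<CommandLine>[^<]*)</Data>"
| search CommandLine="*-enc*"
| bin _time span=1d
| stats count AS hits BY _time
```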
### Step 3 — Analyst workload test
For each rule: if this alert fired in a SOC queue, could an analyst triage it to a conclusion with the information available? Or would they need to pivot to data that doesn’t exist?
### Step 4 — Classify the noise
Every rule that failed the analyst-workload test was categorized by the specific reason it would generate untriageable alerts.
## Findings
| # | Noise Source | Rules Affected | Analyst Impact |
|---|---|---|---|
| 1 | Empty CommandLine | 3 rules | Alerts fire but can’t be triaged |
| 2 | No failed logon baseline | 1 rule | Silent failure or no-context fire |
| 3 | Missing sourcetype | 2 rules | Rules never fire — false coverage |
| 4 | Rex fragility | All rules | Silent extraction failure |
Of 8 detection rules: 3 immediately actionable, 3 require command-line auditing, 1 requires failed logon auditing, 1 requires Sysmon or network telemetry.
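For the empty-CommandLine class, the interim fix was to make the dependency explicit in the rule itself rather than letting null fields vanish silently. A minimal sketch (not the final rule text; the `-enc` filter and `triageable` field name are illustrative):

```spl
index=wazuh data.win.system.eventID=4688
| rex field=_raw "<Data Name='CommandLine'>(?<CommandLine>[^<]*)</Data>"
| fillnull value="" CommandLine
| eval triageable=if(len(CommandLine)>0, "yes", "no")
| search triageable="yes" CommandLine="*-enc*"
```

Counting the `triageable="no"` population separately turns a silent gap into a measurable one: if that bucket dominates, the prerequisite — command-line auditing — is missing from the endpoint configuration.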
## Operational Impact
Every new detection now includes a “deployment prerequisites” section before the SPL, not after. Rules are not marked stable until their dependencies are confirmed present.
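Each rule file now opens with a block along these lines (format illustrative; the audit policy and GPO setting named are real Windows controls, the status convention is my own):

```
Detection: <name> (MITRE ATT&CK technique ID)
Deployment prerequisites:
  - Audit Process Creation enabled (EventID 4688 present in the index)
  - "Include command line in process creation events" policy enabled
  - sourcetype XmlWinEventLog:Security confirmed searchable
Status: unstable until every prerequisite is verified against live data
```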
## Verification
- Detection rules with dependency flags: content/detection-rules/splunk/
- Full analysis: content/case-studies/signalfoundry-splunk-detection-engineering.md
- Phase 1 remediation: Enterprise Security Hardening case study
- CommandLine gap verification (the `isnull` guard matters: with `len(CommandLine)>0` alone, events where extraction failed entirely produce a null result and drop out of the stats, hiding exactly the gap being measured):

  ```spl
  index=wazuh data.win.system.eventID=4688
  | eval has_cmdline=if(isnull(CommandLine) OR len(CommandLine)=0, "no", "yes")
  | stats count by has_cmdline
  ```
## What This Demonstrates
Writing detection rules is the easy part. Knowing whether your rules are deployable — whether they’ll help an analyst or bury them — requires running them against real data and asking uncomfortable questions about what happens when the alert fires.
I found four ways my own rules would make a real analyst’s life worse. I documented them, flagged the dependencies, and fixed the underlying gaps. That sequence — build, audit, document the gaps honestly, fix them — is what separates a detection library from a noise generator.