Case Study — Detection Engineering

Wazuh Windows Telemetry Remediation: Three Phases, Zero to Full Visibility

A multi-agent Wazuh deployment looked healthy but was blind to Windows process creation. Three independent failures — an alert-level threshold, a broken indexer pipeline, and unconfigured Sysmon — were diagnosed and fixed with end-to-end proof.

2026-04-08 | Zero visibility | 3-phase remediation | 2,120+ 4688 alerts | 143 Sysmon EID 1

Problem

The deployment appeared healthy: daemons running, 5,000+ alerts per day, all agents connected. A configuration audit revealed it was blind to Windows process creation — zero Security Event 4688 alerts and zero Sysmon Event ID 1 alerts had ever been indexed. Custom rules covering MITRE ATT&CK techniques were loaded but had no data to match against.

Without process creation telemetry, credential dumping, PowerShell download cradles, lateral movement tools, and LOLBin abuse are all invisible. The rules existed. The telemetry they needed did not.

Environment

Deployment overview (Wazuh 4.14.4-rc2)
Manager:           Ubuntu, single-node (manager + indexer + dashboard)
Total agents:      10 (9 non-manager)
Windows agents:    2 (Windows 11 Enterprise)
Linux agents:      7 (Ubuntu, Debian, Linux Mint)
Custom rules:      28 detection + 5 local operational
log_alert_level:   5 (only level 5+ alerts indexed)
OpenSearch shards: 294

Phase 1 — Alert-Level Threshold

Root cause

Rule 67027 (Security 4688) fires at level 3, but the manager's log_alert_level is 5. Every 4688 event was decoded, matched, and then silently discarded because 3 < 5. Separately, a host-scoped custom rule was suppressing all 67027 alerts for the primary Windows workstation.

Finding: Alert-level threshold mismatch. Process creation events decoded and matched by rule 67027, then discarded before reaching the indexer because level 3 < log_alert_level 5.

Fix

Removed the host-scoped suppression rule. Added rule 100203 (level 5, child of 67027) to elevate Security 4688 above the indexing threshold. Restarted wazuh-manager (8,490 rules loaded, no errors).
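The elevation rule described above can be sketched as follows. The source gives only the rule ID, level, and parent, so the group name and description text here are assumptions for illustration, and the real rule 100203 may carry additional fields.

```python
import xml.etree.ElementTree as ET


def build_child_rule(rule_id: int, level: int, parent_sid: int, description: str) -> ET.Element:
    """Build a Wazuh-style <rule> that fires whenever the parent rule matched."""
    rule = ET.Element("rule", {"id": str(rule_id), "level": str(level)})
    ET.SubElement(rule, "if_sid").text = str(parent_sid)
    ET.SubElement(rule, "description").text = description
    return rule


# Group name and description are hypothetical; IDs and level come from the case study.
group = ET.Element("group", {"name": "windows,local,"})
group.append(
    build_child_rule(100203, 5, 67027, "Windows process creation (Security 4688)")
)

ET.indent(group)  # pretty-print; requires Python 3.9+
rule_xml = ET.tostring(group, encoding="unicode")
print(rule_xml)
```

Because the child inherits the parent's match through if_sid, it needs no conditions of its own: its only job is to re-emit the event at a level that clears log_alert_level.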

Validation

Rule 100203 was confirmed loaded via the API. The manager was receiving 20,343+ Windows events and writing 607+ alerts, with 0 dropped. Yet zero alerts were reaching the indexer, which revealed a second, independent break.

Phase 2 — Indexer Connector Restoration

Root cause

A prior securityadmin.sh -cd command had reloaded ALL 10 OpenSearch security config files from disk defaults. This broke two settings simultaneously: clientcert_auth_domain.http_enabled was set to false (rejecting the manager’s TLS client cert), and the manager’s certificate identity had no role mapping (no write permissions even if auth succeeded).

Finding: securityadmin.sh -cd is a shotgun, not a scalpel. The -cd flag reloads all 10 security configs from disk, not just the one that changed. The correct approach is -f <file> -t <type> to scope the change.
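A minimal sketch of building the scoped invocation, assuming a default Wazuh indexer layout. Every path below is an assumption and should be checked against the actual installation. The -icl, -nhnv, -cacert, -cert, and -key options are standard securityadmin.sh connection flags, not something the case study specifies.

```python
import shlex
from pathlib import Path

# Assumed default locations; adjust for the real deployment.
TOOL = Path("/usr/share/wazuh-indexer/plugins/opensearch-security/tools/securityadmin.sh")
SECURITY_DIR = Path("/etc/wazuh-indexer/opensearch-security")
CERT_DIR = Path("/etc/wazuh-indexer/certs")


def scoped_securityadmin_cmd(config_file: Path, config_type: str) -> list[str]:
    """Return an argv that uploads ONE security config file instead of the whole directory."""
    return [
        str(TOOL),
        "-f", str(config_file),   # single file to upload
        "-t", config_type,        # its type, e.g. "config" or "rolesmapping"
        "-icl",                   # ignore cluster name
        "-nhnv",                  # skip hostname verification
        "-cacert", str(CERT_DIR / "root-ca.pem"),
        "-cert", str(CERT_DIR / "admin.pem"),
        "-key", str(CERT_DIR / "admin-key.pem"),
    ]


cmd = scoped_securityadmin_cmd(SECURITY_DIR / "roles_mapping.yml", "rolesmapping")
print(shlex.join(cmd))
```

Building the argv in one helper makes it easy to review exactly which file will be pushed before anything touches the security index.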

Fix

Enabled clientcert_auth_domain.http_enabled: true in config.yml. Added the manager server identity to all_access.users in roles_mapping.yml. Applied via securityadmin.sh, restarted wazuh-manager.
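The two settings can be checked together before re-applying the security config. The sketch below works on already-parsed YAML, represented as plain dicts. The nesting under config.dynamic.authc follows the usual OpenSearch security config.yml layout, and the manager identity string is a hypothetical placeholder, not the value used in this deployment.

```python
def find_connector_blockers(config: dict, roles_mapping: dict, manager_identity: str) -> list[str]:
    """Return the reasons the manager's client certificate would be rejected or powerless."""
    problems = []

    domain = (
        config.get("config", {})
        .get("dynamic", {})
        .get("authc", {})
        .get("clientcert_auth_domain", {})
    )
    if domain.get("http_enabled") is not True:
        problems.append("clientcert_auth_domain.http_enabled is not true")

    users = roles_mapping.get("all_access", {}).get("users", [])
    if manager_identity not in users:
        problems.append("manager identity missing from all_access.users")

    return problems


MANAGER = "CN=wazuh-manager,O=Example"  # hypothetical certificate identity

broken_config = {"config": {"dynamic": {"authc": {"clientcert_auth_domain": {"http_enabled": False}}}}}
broken_mapping = {"all_access": {"users": []}}
print(find_connector_blockers(broken_config, broken_mapping, MANAGER))
```

Checking both conditions in one pass matters here because fixing only the authentication setting would still leave the manager without write permissions.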

Validation

Indexer connector restored: PASS
IndexerConnector: initialized for ALL indices (25 sec)
New alerts index: 2,197+ documents, growing real-time
Rule 100203 hits: 2,120+ (Security 4688 from primary Windows agent)
Pipeline: Windows 4688 → agent → manager → rule 100203 → indexer → OpenSearch ✓

Phase 3 — Sysmon Telemetry

Root cause

Sysmon v15.15 was installed, running, and had generated 39,318 Event ID 7 (ImageLoad) events. But it had no configuration file: the registry showed only DriverName: SysmonDrv with no ConfigFile parameter. In this state the endpoint produced no Event ID 1 (ProcessCreate) events. Additionally, rule 61603 (Sysmon EID 1) fires at level 0, the same threshold pattern as Phase 1.

Finding: Sysmon “installed and running” does not mean “configured and generating.” 39,318 ImageLoad events created the appearance of telemetry without the substance.

Fix

Deployed a minimal Sysmon config (schema 4.90): EID 1 (ProcessCreate) enabled with noise exclusions, EID 7 (ImageLoad) disabled, targeted rules for LOLBin network connections, registry persistence, LSASS access, and DNS queries. Added rule 100204 (level 5, child of 61603) to elevate Sysmon EID 1 above the indexing threshold.
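The skeleton of such a config can be sketched as below. It covers only the first two items: ProcessCreate enabled with exclusions, and ImageLoad disabled. The single exclusion shown is a hypothetical example rather than the deployed list, and the targeted rules for network connections, registry, LSASS access, and DNS are left out.

```python
import xml.etree.ElementTree as ET


def rule_group(parent: ET.Element, event_tag: str, onmatch: str) -> ET.Element:
    """Add a RuleGroup holding one event filter and return the filter element."""
    group = ET.SubElement(parent, "RuleGroup", {"name": "", "groupRelation": "or"})
    return ET.SubElement(group, event_tag, {"onmatch": onmatch})


sysmon = ET.Element("Sysmon", {"schemaversion": "4.90"})
filtering = ET.SubElement(sysmon, "EventFiltering")

# onmatch="exclude": log every process creation except the listed noise.
process_create = rule_group(filtering, "ProcessCreate", "exclude")
noise = ET.SubElement(process_create, "Image", {"condition": "is"})
noise.text = r"C:\Windows\System32\conhost.exe"  # hypothetical exclusion

# onmatch="include" with no child rules: nothing matches, so EID 7 is not logged.
rule_group(filtering, "ImageLoad", "include")

ET.indent(sysmon)  # requires Python 3.9+
sysmon_xml = ET.tostring(sysmon, encoding="unicode")
print(sysmon_xml)
```

The asymmetry is the point of the design: an exclude filter starts from "log everything" while an empty include filter starts from "log nothing", which is how one event type is kept while another is silenced.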

Validation

Sysmon EID 1 telemetry restored: PASS
Sysmon EID 1 events indexed: 143 in first 5 minutes
Telemetry fields confirmed: process image, command line,
  parent process, user context
User-level and SYSTEM-level processes: both captured
Sample: net.exe user → parent pwsh.exe → user context confirmed
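A field check like the one above can be automated against indexed alerts. This sketch assumes the alert follows the usual Wazuh layout for Windows event data (data.win.eventdata with image, commandLine, parentImage, and user). The sample document is hypothetical, modeled on the net.exe example, and is not an export from this deployment.

```python
REQUIRED_FIELDS = ("image", "commandLine", "parentImage", "user")


def missing_process_fields(alert: dict) -> list[str]:
    """List the process-creation fields that are absent or empty in an alert document."""
    eventdata = alert.get("data", {}).get("win", {}).get("eventdata", {})
    return [name for name in REQUIRED_FIELDS if not eventdata.get(name)]


# Hypothetical alert; paths and account name are illustrative only.
sample_alert = {
    "rule": {"id": "100204", "level": 5},
    "data": {
        "win": {
            "eventdata": {
                "image": r"C:\Windows\System32\net.exe",
                "commandLine": "net.exe user",
                "parentImage": r"C:\Program Files\PowerShell\7\pwsh.exe",
                "user": r"WORKSTATION\analyst",
            }
        }
    },
}

print(missing_process_fields(sample_alert))
```

An empty list means the alert carries all four fields the validation looked for.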

Before / After

Telemetry delta (results)
                                Before        After
Security 4688 indexed:          0             2,120+
Sysmon EID 1 indexed:           0             143+ (first 5 min)
Sysmon EID 7 noise:             39,318        Disabled
Alerts reaching indexer:        0             2,197+ (growing)
OpenSearch client cert auth:    Disabled      Enabled
Primary endpoint visibility:    Blind         Full (4688 + Sysmon EID 1)

Remaining Gaps

Secondary endpoint: Closed. The endpoint is back online and all 9 non-manager agents are connected, putting the fleet at full visibility.

FIM limit: Primary endpoint at 99,999/100,000 files monitored, one entry short of the limit.

Lessons Learned

The same threshold pattern broke two independent telemetry sources. Security 4688 (level 3) and Sysmon EID 1 (level 0) both fell below log_alert_level=5. Any Wazuh deployment with log_alert_level > 3 should audit all base event rules to ensure needed events cross the indexing threshold. Level 0 rules deserve a check in every deployment, since level 0 sits below any threshold.
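One way to run that audit is to pull rule IDs and levels out of the rule files and compare them with the threshold. The sketch below is an assumption about approach, not the tooling used in this engagement. It uses a regular expression instead of an XML parser because Wazuh rule files may hold several top-level elements. The sample text is a trimmed stand-in for real rule files.

```python
import re

RULE_TAG = re.compile(r"<rule\b[^>]*>")
ATTRIBUTE = re.compile(r'(\w+)="([^"]*)"')  # assumes double-quoted attributes


def rule_levels(xml_text: str) -> dict[int, int]:
    """Map rule ID to level for every <rule> opening tag in the text."""
    levels = {}
    for tag in RULE_TAG.findall(xml_text):
        attrs = dict(ATTRIBUTE.findall(tag))
        if "id" in attrs and "level" in attrs:
            levels[int(attrs["id"])] = int(attrs["level"])
    return levels


def below_threshold(levels: dict[int, int], needed_ids: list[int], log_alert_level: int) -> list[int]:
    """Return needed rule IDs whose own level would not clear the threshold."""
    return sorted(rid for rid in needed_ids if rid in levels and levels[rid] < log_alert_level)


# Stand-in rule text using the IDs and levels reported in this case study.
sample_rules = """
<group name="windows,">
  <rule id="67027" level="3"><description>4688</description></rule>
  <rule id="61603" level="0"><description>Sysmon EID 1</description></rule>
</group>
"""

print(below_threshold(rule_levels(sample_rules), [67027, 61603], log_alert_level=5))
```

The check looks only at each base rule's own level. Whether a child rule already elevates it, as 100203 and 100204 do here, still has to be confirmed separately.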

A broken indexer pipeline is invisible until you look at the index. The manager continued processing events, writing alerts, and reporting healthy daemon status. The only sign was in connector logs and the absence of new documents in the index.
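A simple guard against this failure mode compares two counters over time: alerts the manager has written and documents in the alerts index. The sketch below assumes those counts are collected elsewhere, for example from the manager's alert output and the index's document count. Apart from the 607 and 2,197 figures taken from the validation sections, the numbers are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    alerts_written: int  # alerts the manager reports having written
    index_docs: int      # documents counted in the alerts index


def indexer_stalled(earlier: Snapshot, later: Snapshot) -> bool:
    """True when the manager wrote new alerts but the index did not grow."""
    wrote_new = later.alerts_written > earlier.alerts_written
    indexed_new = later.index_docs > earlier.index_docs
    return wrote_new and not indexed_new


# Phase 1 validation state: alerts written, nothing indexed.
before_fix = (Snapshot(0, 0), Snapshot(607, 0))
# After Phase 2: both counters move (the later alerts_written value is illustrative).
after_fix = (Snapshot(607, 0), Snapshot(3000, 2197))

print(indexer_stalled(*before_fix), indexer_stalled(*after_fix))
```

Daemon status alone would report healthy in both cases, which is why the comparison has to include the index itself.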

Before-state evidence is the case study. Every config file, daemon status, and dashboard screenshot was captured before changes. This made it possible to precisely document what was broken, prove what changed, and demonstrate the delta.

Verification

A reviewer can confirm these claims through:

  • Source case study: content/case-studies/wazuh-telemetry-remediation.md
  • Evidence: 14 before-state screenshots, 7 config exports, 20 after-state rule exports with diffs
  • Phase fix documents: 3 root cause analyses with validation reports
  • Session log: Complete START/CHANGE/VALIDATION/END entries for all phases