Skip to content

Latest commit

 

History

History

Folders and files

NameName
Last commit message
Last commit date

parent directory

..
 
 
 
 
 
 
 
 
 
 

README.md

rsigma-parser

CI

rsigma-parser is a parser for Sigma detection rules, correlations, and filters. It parses Sigma YAML into a strongly-typed AST covering the full Sigma 2.0 specification, and includes a 65-rule linter derived from the Sigma v2.1.0 spec.

This library is part of rsigma.

Public API

Parsing

Function Description
parse_sigma_yaml(yaml: &str) Parse a multi-document YAML string into a SigmaCollection
parse_sigma_file(path: &Path) Parse a single YAML file
parse_sigma_directory(dir: &Path) Recursively parse all .yml/.yaml files in a directory
parse_condition(input: &str) Parse a condition expression string into a ConditionExpr
parse_field_spec(key: &str) Parse a field specification like "CommandLine|contains|all" into a FieldSpec

Linting

Function Description
lint_yaml_value(value: &Value) Lint a single YAML document value (auto-detects document type)
lint_yaml_str(text: &str) Lint raw YAML string with source span resolution
lint_yaml_str_with_config(text: &str, config: &LintConfig) Lint with config-based suppression
lint_yaml_file(path: &Path) Lint all documents in a file
lint_yaml_file_with_config(path: &Path, config: &LintConfig) Lint a file with config
lint_yaml_directory(dir: &Path) Recursively lint all .yml/.yaml in a directory
lint_yaml_directory_with_config(dir: &Path, config: &LintConfig) Lint a directory with config
parse_inline_suppressions(text: &str) Parse # rsigma-disable comments from YAML text
apply_suppressions(warnings, config, inline) Filter and override warnings using config and inline suppressions

Auto-Fix Types

Each LintWarning can carry an optional Fix describing how to automatically correct the issue.

Type Description
Fix A suggested fix: title, disposition (Safe/Unsafe), and ordered patches
FixDisposition Safe (no semantic change) or Unsafe (may change meaning)
FixPatch A single edit: ReplaceValue, ReplaceKey, or Remove with a JSON-pointer path

Fix types are yamlpath/yamlpatch-agnostic (all owned data, no external dependencies). Consumers (CLI, LSP) convert FixPatch to concrete patch operations at apply time.

Value Types

Type/Function Description
SigmaString::new(s: &str) Parse a string with wildcard interpretation (*, ?, \ escape)
SigmaString::from_raw(s: &str) Create from raw string (no wildcard parsing; used for |re modifier)
SigmaString::is_plain() Returns true if no wildcards
SigmaString::contains_wildcards() Returns true if any wildcard present
SigmaString::as_plain() Get plain string; None if wildcards present
SigmaValue::from_yaml(v: &Value) Create from a YAML value
Timespan::parse(s: &str) Parse timespan like 1h, 15s, 7d

Parsing

  • Multi-document YAML: --- separators, action: global/reset/repeat for rule templates
  • Condition expressions: PEG grammar (pest) with Pratt parsing and correct operator precedence (NOT > AND > OR). Supports and, or, not, 1 of, all of, any of, N of, parenthesized groups, wildcard patterns — them excludes _-prefixed identifiers per spec
  • Value types: strings with wildcards (*, ?), escape sequences (\*, \?, \\), integers, floats, booleans, null
  • Timespan parsing: 15s, 30m, 1h, 7d, 1w, 1M, 1y
  • Logsource: category, product, service, definition, custom fields

Multi-Document Behavior

  1. serde_yaml::Deserializer yields documents separated by ---.
  2. Non-mapping documents are skipped; errors are accumulated in collection.errors.
  3. YAML parse errors stop iteration (the deserializer may not recover from malformed input).
  4. Collection actions:
    • action: global — store document as a template; remove action key; do not produce a rule.
    • action: reset — clear the global template.
    • action: repeat — merge current document onto the previous document; apply global template if present; parse the merged result. Error if no previous document exists.
  5. Merge order: For normal documents: merged = deep_merge(global, value). For repeat: merged = deep_merge(global, deep_merge(previous, repeat_doc)).
  6. deep_merge: Recursive. Source mappings override destination keys; non-mapping source replaces destination entirely.
  7. Previous tracking: Updated after each non-action document and after each repeat. Repeat chains from the last document, not the original.

SigmaString Escape Semantics

Input Parsed as
\* literal * (not a wildcard)
\? literal ? (not a wildcard)
\\ literal \
\W (non-special) literal \W (both characters kept)

Backslash only consumes itself when followed by *, ?, or \. This preserves Windows paths like \Windows\System32.

Condition Expression Grammar (PEG)

The full PEG grammar is defined in src/sigma.pest. It implements the Sigma condition expression syntax using a Pratt parser with not > and > or precedence.

Parsing quirks:

  • !ident_char lookahead ensures and_filter parses as a single identifier, not and + filter.
  • 1 of and any of both map to Quantifier::Any.
  • Nested same-type binary ops are flattened: a and b and c becomes And([a, b, c]), not And(a, And(b, c)).

Operator Precedence

Precedence (highest first) Operator Associativity
1 not (prefix)
2 and (infix) Left
3 or (infix) Left

a or not b and c parses as a or ((not b) and c).

AST Types

Core Types

Type Description
SigmaCollection Collection of rules, correlations, filters, and errors
SigmaRule A parsed detection rule with metadata, logsource, and detections
CorrelationRule A correlation rule with type, referenced rules, timespan, and conditions
FilterRule A filter rule that injects AND NOT conditions into referenced rules
Detections Named detections, condition expressions, and optional timeframe
Detection AllOf (AND-linked items), AnyOf (OR-linked), or Keywords (plain values)
FieldSpec Field name + modifier chain; name is None for keyword detections
ConditionExpr And, Or, Not, Identifier, or Selector with quantifier and pattern
SigmaValue String, Integer, Float, Bool, or Null
SigmaString String with wildcard parts (Plain text + WildcardMulti/WildcardSingle)

Enums

Enum Variants
Status Stable, Test, Experimental, Deprecated, Unsupported
Level Informational, Low, Medium, High, Critical
RelationType Derived, Obsolete, Merged, Renamed, Similar
Quantifier Any, All, Count(u64)
SelectorPattern Them, Pattern(String)
CorrelationType EventCount, ValueCount, Temporal, TemporalOrdered, ValueSum, ValueAvg, ValuePercentile, ValueMedian

Detection Parsing

  • YAML mappingDetection::AllOf (AND-linked items)
  • YAML list of mappingsDetection::AnyOf (OR-linked)
  • YAML list of plain valuesDetection::Keywords (keyword search across all fields)
  • Condition as list (condition: [s1, s2]) → multiple ConditionExpr parsed independently
  • Empty field name (parse_field_spec("")) → FieldSpec { name: None, modifiers: [] } (keyword)

Field Modifiers (30)

The parser recognizes 30 modifier variants, some with aliases:

Category Modifiers Aliases
String matching contains, startswith, endswith
Value linking all
Encoding base64, base64offset, wide, utf16be, utf16, windash utf16lewide
Pattern re, cidr
Case cased
Existence exists
Placeholder expand
Field reference fieldref
Numeric comparison gt, gte, lt, lte, neq
Regex flags i, m, s ignorecasei, multilinem, dotalls
Timestamp parts minute, hour, day, week, month, year

When the re modifier is present, string values are parsed with SigmaValue::from_raw_string (no wildcard interpretation).

Correlation Rules (8 types)

Type Description
event_count Count matching events per group key
value_count Count distinct field values per group key
temporal Require multiple rules to fire in the same window
temporal_ordered Same as temporal, but rules must fire in order
value_sum Sum a numeric field across events
value_avg Average a numeric field across events
value_percentile Compute a percentile of a numeric field
value_median Compute the median of a numeric field

Correlation Conditions

  • Threshold (mapping): { gte: 100 } or { gt: 10, lte: 100, field: "TargetUser" }. Operators: lt, lte, gt, gte, eq, neq. Values must be numeric.
  • Extended (string): "rule_a and rule_b" for temporal types — parsed as a boolean expression over rule references.
  • Default (temporal, no condition): Threshold { predicates: [(Gte, 1)], field: None }.
  • Timeframe/timespan: The parser accepts both timeframe and timespan keys.
  • Custom attributes: Correlation rules support a custom_attributes mapping (string-to-string) for rsigma.* engine extensions.

Filter Rules

Filter rules inject exclusion conditions into referenced detection rules — enables centralized tuning without modifying original rule files. See the Sigma Filters Specification for the full standard.

Per the spec, selection, condition, and rules all live inside the filter section:

title: Exclude Admin Users
logsource:
    category: process_creation
    product: windows
filter:
    rules:
        - <rule-id>
    selection:
        User|startswith: 'adm_'
    condition: selection

Timespan Parsing

Unit Suffix Multiplier (seconds)
Second s 1
Minute m 60
Hour h 3,600
Day d 86,400
Week w 604,800
Month M (uppercase) 2,629,746 (~30.44 days)
Year y 31,556,952 (~365.25 days)

The string must be at least 2 characters (e.g. 1h). The last character is the unit; the prefix must be a positive integer.

Linter (65 rules)

65 built-in lint rules derived from the Sigma v2.1.0 specification. Four severity levels: Error (spec violation), Warning (best-practice issue), Info (soft suggestion), Hint (stylistic). Info/Hint findings don't cause lint failure.

The linter operates on raw YAML values to catch issues the parser silently ignores.

Infrastructure (4)

Rule Severity Fix Trigger
yaml_parse_error Error YAML parse failure
not_a_mapping Error Document is not a YAML mapping
file_read_error Error Cannot read file
schema_violation Error JSON schema validation failure (optional)

Shared Metadata (16)

Rule Severity Fix Trigger
missing_title Error No title field
empty_title Error title is empty or whitespace
title_too_long Warning title exceeds 256 characters
missing_description Info No description
missing_author Info No author
invalid_id Warning id not a valid UUID (8-4-4-4-12 hex)
invalid_status Error Yes status not in stable/test/experimental/deprecated/unsupported
missing_level Warning No level (detection rules)
invalid_level Error Yes level not in informational/low/medium/high/critical
invalid_date Error date not YYYY-MM-DD with valid day-of-month
invalid_modified Error modified not YYYY-MM-DD
modified_before_date Warning modified is earlier than date
description_too_long Warning description exceeds 65,535 characters
name_too_long Warning name exceeds 256 characters
taxonomy_too_long Warning taxonomy exceeds 256 characters
non_lowercase_key Warning Yes Top-level key is not lowercase

Detection Rules (17)

Rule Severity Fix Trigger
missing_logsource Error No logsource
missing_detection Error No detection
missing_condition Error No condition in detection
empty_detection Warning No named search identifiers
invalid_related_type Error related[].type not in derived/obsolete/merged/renamed/similar
invalid_related_id Warning related[].id not a valid UUID
related_missing_required Error related[] missing id or type
deprecated_without_related Warning status: deprecated but no related
invalid_tag Warning Tag doesn't match ^[a-z0-9_-]+\.[a-z0-9._-]+$
unknown_tag_namespace Warning Tag namespace not in attack/car/cve/d3fend/detection/stp/tlp
duplicate_tags Warning Yes Duplicate tag
duplicate_references Warning Yes Duplicate reference URL
duplicate_fields Warning Yes Duplicate field name
falsepositive_too_short Warning falsepositives entry under 2 characters
scope_too_short Warning scope entry under 2 characters
logsource_value_not_lowercase Warning Yes Logsource category/product/service not lowercase
condition_references_unknown Error Condition references non-existent detection identifier

Correlation Rules (13)

Rule Severity Fix Trigger
missing_correlation Error No correlation or not a mapping
missing_correlation_type Error No correlation.type
invalid_correlation_type Error Type not a recognized correlation type
missing_correlation_rules Error No correlation.rules
empty_correlation_rules Warning correlation.rules is empty
missing_correlation_timespan Error No correlation.timespan or correlation.timeframe
invalid_timespan_format Error Timespan format invalid
missing_group_by Error No correlation.group-by
missing_correlation_condition Error Non-temporal type without condition
missing_condition_field Error value_count/value_sum/value_avg/value_percentile without condition.field
invalid_condition_operator Error Operator not in gt/gte/lt/lte/eq/neq
condition_value_not_numeric Error Condition value not numeric
generate_not_boolean Error generate is not a boolean

Filter Rules (8)

Rule Severity Fix Trigger
missing_filter Error No filter or not a mapping
missing_filter_rules Error No filter.rules
empty_filter_rules Warning filter.rules is empty
missing_filter_selection Error No filter.selection
missing_filter_condition Error No filter.condition
filter_has_level Warning Yes Filter has level (not applicable)
filter_has_status Warning Yes Filter has status (not applicable)
missing_filter_logsource Warning No logsource

Detection Logic (7)

Rule Severity Fix Trigger
null_in_value_list Warning null mixed with other values in a list
single_value_all_modifier Warning Yes |all with a single value
all_with_re Warning Yes |all and |re combined
incompatible_modifiers Warning Incompatible modifier combination (e.g. contains|startswith, re|contains, gt|contains, regex flags without re)
empty_value_list Warning Empty value list
wildcard_only_value Warning Yes Lone * value (suggests |exists: true instead)
unknown_key Info Yes Top-level key likely a typo of a known key (edit distance ≤ 2); custom fields are allowed per the Sigma spec

Rule Suppression

Three-tier system to disable or override lint rules:

  • CLI: --disable rule1,rule2 suppresses specific rules globally
  • Config file: .rsigma-lint.yml (or .rsigma-lint.yaml) with disabled_rules and severity_overrides, auto-discovered by walking ancestor directories from the target path upward
  • Inline comments: # rsigma-disable, # rsigma-disable rule1, rule2, # rsigma-disable-next-line, # rsigma-disable-next-line rule1, rule2
# .rsigma-lint.yml
disabled_rules:
  - missing_description
  - missing_author
severity_overrides:
  title_too_long: info

Suppression order: Warnings are first filtered by disabled_rules, then by inline suppressions, then severity_overrides are applied to remaining warnings.

Inline # inside quoted YAML strings is not treated as a comment.

Optional Schema Validation

The schema_violation lint rule optionally validates rules against a JSON schema. The schema can be the official Sigma schema (downloaded and cached for 7 days, with offline fallback to stale cache) or a local file.

Directory Parsing and Linting

  • parse_sigma_directory: Recursively walks directories; only processes .yml/.yaml files. File-level parse errors are accumulated in collection.errors (not fatal). Sub-collections (rules, correlations, filters) are merged.
  • lint_yaml_directory: Skips hidden directories (. prefix). Canonicalizes paths to detect symlink cycles. Sorts entries by path for deterministic output.

Error Types

Error When
Yaml serde_yaml parse failure
Condition Condition expression parse failure (PEG/Pratt); carries optional SourceLocation with line/column
UnknownModifier Unknown modifier in field spec
InvalidFieldSpec Invalid field specification
InvalidRule Document not a mapping, or invalid structure
MissingField Required field missing (e.g. title, detection)
InvalidDetection Detection section invalid
InvalidCorrelation Correlation rule invalid
InvalidTimespan Timespan string invalid (wrong format, unknown unit)
InvalidValue Invalid value in detection
InvalidAction Unknown collection action
Io File read error

Compatibility

Tested against the SigmaHQ/sigma rule repository:

Corpus Rules Parsed Errors
rules/ 3,110 0
rules-emerging-threats/ 436 0
rules-threat-hunting/ 133 0
rules-compliance/ 3 0
rules-placeholder/ 14 0
unsupported/ 31 58 (deprecated pipe syntax)
deprecated/ 165 1 (deprecated pipe syntax)
Total 3,892 0 real errors

The deprecated pipe aggregation syntax (selection | count(field) by field > N) is intentionally rejected, matching pySigma behavior.

Usage

use rsigma_parser::{parse_sigma_yaml, parse_condition};

let collection = parse_sigma_yaml(yaml).unwrap();
assert_eq!(collection.rules[0].title, "Detect Whoami");

// Parse a condition expression directly
let expr = parse_condition("selection1 and not filter").unwrap();

Benchmarks

Criterion.rs benchmarks with synthetic rules (Apple M-series, single-threaded):

Scenario Time
1 rule 11.7 us
100 rules 1.1 ms
1,000 rules 11.1 ms
Complex condition (8 selections, nested booleans) 23.2 us
cargo bench --bench parse

License

MIT License.