A Go rewrite of yarGen (Python) by Florian Roth - an automatic YARA rule generator.
yarGen-Go generates YARA rules from strings found in malware files while removing all strings that also appear in goodware files. It includes:
- yargen - Main rule generator (CLI + web server)
- yargen-util - Database management utility
Linux/macOS:
- Prerequisites: Install Go 1.22+
- Build: Clone repository or download the ZIP and extract it
- Build binaries: Run the following commands:

      go mod tidy
      go build -o yargen ./cmd/yargen
      go build -o yargen-util ./cmd/yargen-util

- Databases: Run `./yargen-util update` to download goodware databases
- Configure (Optional): Copy `config/config.example.yml` to `config/config.yaml` and set your LLM API key
- Use: Run `./yargen serve` and open the Web UI at http://127.0.0.1:8080
Windows:
- Prerequisites: Install Go 1.22+
- Build: Clone repository or download the ZIP and extract it
- Build binaries: Run the following commands:

      go mod tidy
      go build -o yargen.exe .\cmd\yargen
      go build -o yargen-util.exe .\cmd\yargen-util

- Databases: Run `.\yargen-util.exe update` to download goodware databases
- Configure (Optional): Copy `.\config\config.example.yml` to `.\config\config.yaml` and set your LLM API key
- Use: Run `.\yargen.exe serve` and open the Web UI at http://127.0.0.1:8080
For detailed setup instructions, see the Step-by-Step Setup Guide (docs/SETUP.md).
- ASCII and UTF-16LE (wide) string extraction
- Opcode extraction from PE/ELF executables
- Encoding detection: Base64, hex-encoded, reversed strings
- Magic header and filesize conditions
- Super rule generation (overlapping string patterns across files)
- Customizable scoring rules (SQLite-backed, editable via Web UI)
- Efficient LLM integration for string selection (OpenAI, Anthropic, Gemini, Ollama)
  - Only submits prefiltered top candidates (no goodware strings, max 500 from automatic evaluation)
  - Requests numbered list instead of full strings to minimize token usage
  - Significantly reduces API costs compared to naive approaches
- Web UI for rule generation and scoring rules management
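
The numbered-list approach to LLM string selection can be sketched as follows. This is a minimal illustration, not the project's actual code: `buildPrompt`, `parseSelection`, the prompt wording, and the comma-separated reply format are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// buildPrompt renders candidate strings as a numbered list so the model
// can answer with indices instead of echoing full strings.
// Hypothetical helper; the prompt text is an assumption.
func buildPrompt(candidates []string) string {
	var b strings.Builder
	b.WriteString("Reply with the numbers of the best strings, comma-separated.\n")
	for i, s := range candidates {
		fmt.Fprintf(&b, "%d. %s\n", i+1, s)
	}
	return b.String()
}

// parseSelection maps a reply such as "1, 3" back to the candidate strings.
// Tokens that are not valid 1-based indices are skipped.
func parseSelection(reply string, candidates []string) []string {
	var picked []string
	for _, tok := range strings.Split(reply, ",") {
		n, err := strconv.Atoi(strings.TrimSpace(tok))
		if err != nil || n < 1 || n > len(candidates) {
			continue
		}
		picked = append(picked, candidates[n-1])
	}
	return picked
}

func main() {
	candidates := []string{"cmd.exe /c del", "Mozilla/5.0", "inject_payload"}
	fmt.Print(buildPrompt(candidates))
	fmt.Println(parseSelection("1, 3", candidates))
}
```

Because the reply contains only short indices, the response size stays small regardless of how long the candidate strings are.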
Download pre-built binaries from the Releases page for your platform.
go install github.com/Neo23x0/yarGen-Go/cmd/yargen@latest
go install github.com/Neo23x0/yarGen-Go/cmd/yargen-util@latest

Binaries will be installed to $GOPATH/bin or $HOME/go/bin (add to PATH if needed).
# Basic usage
yargen -m ./malware-samples
# With options
yargen -m ./malware-samples \
  -o rules.yar \
  -a "Your Name" \
  -r "Internal Research" \
  --opcodes \
  --score
# Show all options
yargen -h

# Start web server on localhost:8080
yargen serve
# Custom port
yargen serve --port 3000

Then open http://127.0.0.1:8080 (or the custom port you chose) in your browser.
# Download built-in databases from GitHub
yargen-util update
# List all databases
yargen-util list
# Create new goodware database
yargen-util create -g /path/to/goodware -i mydb
# Append to existing database
yargen-util append -g /path/to/more/goodware -i mydb
# Inspect database
yargen-util inspect ./dbs/good-strings-mydb.db
# Merge databases
yargen-util merge -o combined.db db1.db db2.db

Default Config Location:
- The default config file is `./config/config.yaml` (in the project directory)
- For backward compatibility, the application will automatically check `~/.yargen/config.yaml` or `~/.yargen/config.yml` if the default location doesn't exist
- Use the `--config` flag to specify a different config file path
- Example: `./yargen serve --config /path/to/custom/config.yml`
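
The lookup order described above could be implemented along these lines. This is a sketch under stated assumptions: `resolveConfigPath`, `fileExists`, and the injectable `exists` callback are hypothetical names chosen for the example, and the real tool may resolve paths differently.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// resolveConfigPath returns the config file to use and whether it exists.
// An explicit --config value wins; otherwise the project-local default is
// tried first, then the two legacy locations in the home directory.
// Hypothetical helper, not the project's actual API.
func resolveConfigPath(flagPath, home string, exists func(string) bool) (string, bool) {
	if flagPath != "" {
		return flagPath, exists(flagPath)
	}
	candidates := []string{
		filepath.Join("config", "config.yaml"),
		filepath.Join(home, ".yargen", "config.yaml"),
		filepath.Join(home, ".yargen", "config.yml"),
	}
	for _, p := range candidates {
		if exists(p) {
			return p, true
		}
	}
	return "", false
}

// fileExists reports whether p names an existing regular file or link.
func fileExists(p string) bool {
	info, err := os.Stat(p)
	return err == nil && !info.IsDir()
}

func main() {
	home, _ := os.UserHomeDir()
	path, ok := resolveConfigPath("", home, fileExists)
	fmt.Println(path, ok)
}
```

Passing the existence check as a function keeps the lookup order testable without touching the file system.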
Quick Setup:
- Copy the example config: `cp config/config.example.yml config/config.yaml` (see Step 5 in the Setup Guide for details)
- Edit the file to match your LLM provider
- Set your API key as an environment variable
Example Configuration (from config/config.example.yml):
llm:
  provider: "openai"            # openai, anthropic, gemini, ollama
  model: "gpt-4o-mini"
  api_key: "${OPENAI_API_KEY}"  # Uses environment variable
  endpoint: ""                  # For ollama: http://localhost:11434
  timeout: 60
  max_candidates: 500
database:
  dbs_dir: "./dbs"
  scoring_db: "~/.yargen/scoring.db"
defaults:
  author: "yarGen"
  min_string_length: 8
  max_string_length: 128
  min_score: 0
  max_strings: 20
  super_rule_overlap: 5
  filesize_multiplier: 3
  include_opcodes: true
  num_opcodes: 3
server:
  host: "127.0.0.1"
  port: 8080

Environment Variables:
The config file supports environment variable expansion using ${VARIABLE_NAME} syntax. Common variables:
- `OPENAI_API_KEY` - OpenAI API key
- `ANTHROPIC_API_KEY` - Anthropic API key
- `GEMINI_API_KEY` - Google Gemini API key
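
For readers unfamiliar with `${VARIABLE_NAME}` expansion, the Go standard library offers `os.Expand`, which substitutes such references through a lookup function. Whether yarGen-Go uses this exact call is an assumption; `expandConfigValue` is a hypothetical wrapper, and note that `os.Expand` also accepts the bare `$NAME` form.

```go
package main

import (
	"fmt"
	"os"
)

// expandConfigValue replaces ${NAME} references in a config value using
// the supplied lookup function. Hypothetical helper for illustration;
// unknown variables expand to whatever lookup returns (here: "").
func expandConfigValue(value string, lookup func(string) string) string {
	return os.Expand(value, lookup)
}

func main() {
	// Example value only; never hard-code a real key.
	os.Setenv("OPENAI_API_KEY", "sk-example")
	fmt.Println(expandConfigValue("${OPENAI_API_KEY}", os.Getenv)) // prints "sk-example"
}
```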
Custom Config Location:
If you prefer to use a config file in your home directory (e.g., ~/.yargen/config.yml), use the --config flag:
./yargen serve --config ~/.yargen/config.yml

See Step 5 in the Setup Guide for platform-specific environment variable setup instructions.
| Flag | Description | Default |
|---|---|---|
| `-m` | Path to malware directory | required |
| `-y` | Minimum string length | 8 |
| `-z` | Minimum score threshold | 0 |
| `-x` | High-scoring string threshold | 30 |
| `-w` | Super rule overlap threshold | 5 |
| `-s` | Maximum string length | 128 |
| `-rc` | Max strings per rule | 20 |
| `--excludegood` | Exclude all goodware strings | false |
| `--opcodes` | Enable opcode extraction | false |
| `-n` | Number of opcodes to include | 3 |

Note on `--excludegood`: By default, goodware strings receive very low scores but are still included, as they can be useful when combined with more specific strings in a malware sample. This flag forces complete removal of all goodware strings from the candidate set.
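
To make the `-y` and `-s` length bounds concrete, here is a minimal sketch of ASCII string extraction in the style of the Unix `strings` tool. It is not the project's extractor: `extractASCII` is a hypothetical function, and skipping over-long runs (rather than truncating them) is an assumption made for the example.

```go
package main

import "fmt"

// extractASCII returns runs of printable ASCII bytes (0x20 to 0x7e) whose
// length lies in [minLen, maxLen]. Runs outside that range are skipped;
// the real tool may handle over-long runs differently.
func extractASCII(data []byte, minLen, maxLen int) []string {
	var out []string
	start := -1 // index where the current printable run began, or -1

	// flush closes the current run at end and keeps it if the length fits.
	flush := func(end int) {
		if start >= 0 {
			if n := end - start; n >= minLen && n <= maxLen {
				out = append(out, string(data[start:end]))
			}
			start = -1
		}
	}

	for i, b := range data {
		if b >= 0x20 && b <= 0x7e {
			if start < 0 {
				start = i
			}
		} else {
			flush(i)
		}
	}
	flush(len(data))
	return out
}

func main() {
	sample := []byte("MZ\x00\x00http://example.invalid/gate.php\x00ab\x00")
	fmt.Println(extractASCII(sample, 8, 128))
}
```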
| Flag | Description | Default |
|---|---|---|
| `-o` | Output rule file | yargen_rules.yar |
| `-a` | Author name | "yarGen" |
| `-r` | Reference | "" |
| `-l` | License | "" |
| `-p` | Rule description prefix | "" |
| `-b` | Identifier | (folder name) |
| `--score` | Show scores as comments | false |
| `--nosimple` | Skip simple rules in super rules | false |
| `--nomagic` | No magic header condition | false |
| `--nofilesize` | No filesize condition | false |
| `-fm` | Filesize multiplier | 3 |
| `--nosuper` | Disable super rules | false |
| Flag | Description | Default |
|---|---|---|
| `--config` | Config file path | ./config/config.yaml |
| `--nr` | Non-recursive scan | false |
| `--oe` | Only executable extensions | false |
| `-fs` | Max file size (MB) | 10 |
| `--no-llm` | Disable LLM | false |
| `--debug` | Debug output | false |
yarGen-Go uses a customizable scoring system to rank extracted strings. Scores accumulate when multiple rules match.
Categories include:
- Reductions (negative scores): `..`, triple spaces, packer strings
- File paths (+2 to +4): drive letters, extensions
- System keywords (+5): cmd.exe, system32
- Network (+3 to +5): protocols, IP addresses
- Malware keywords (+5): RAT, spy, inject
- Encoding (+5 to +10): Base64, hex-encoded, reversed strings
- PowerShell (+4): bypass, encoded commands
Manage scoring rules via the Web UI:
- Add/edit/delete rules
- Enable/disable rules
- Import/export as JSON
- Three match types: exact, contains, regex
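
The accumulation behaviour and the three match types can be illustrated with a small sketch. The `Rule` type, its field names, and the concrete patterns and score values below are assumptions for the example; they do not mirror the project's SQLite schema or its built-in rule set.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Rule is a simplified stand-in for a scoring rule (hypothetical fields).
type Rule struct {
	MatchType string // "exact", "contains", or "regex"
	Pattern   string
	Score     int
	Enabled   bool
}

// matches reports whether the rule applies to s.
// An invalid regex pattern is treated as a non-match.
func (r Rule) matches(s string) bool {
	switch r.MatchType {
	case "exact":
		return s == r.Pattern
	case "contains":
		return strings.Contains(s, r.Pattern)
	case "regex":
		ok, err := regexp.MatchString(r.Pattern, s)
		return err == nil && ok
	}
	return false
}

// scoreString sums the scores of every enabled rule that matches s,
// so several matching rules accumulate into one total.
func scoreString(s string, rules []Rule) int {
	total := 0
	for _, r := range rules {
		if r.Enabled && r.matches(s) {
			total += r.Score
		}
	}
	return total
}

func main() {
	rules := []Rule{
		{MatchType: "contains", Pattern: "cmd.exe", Score: 5, Enabled: true},
		{MatchType: "regex", Pattern: `^[A-Za-z]:\\`, Score: 2, Enabled: true},
		{MatchType: "contains", Pattern: "   ", Score: -5, Enabled: true},
	}
	// The system keyword (+5) and the drive letter (+2) both match.
	fmt.Println(scoreString(`C:\Windows\system32\cmd.exe`, rules)) // prints 7
}
```

A production implementation would likely compile each regex once instead of calling `regexp.MatchString` per string.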
The Web UI provides:
- Generate Page - Upload files, configure options, generate rules
- Scoring Rules Page - Manage built-in and custom scoring rules
- Settings Page - View LLM configuration status
Features:
- Drag-and-drop file upload
- Real-time rule generation progress
- Download generated .yar files
- CRUD operations for scoring rules
- Import/export scoring rules as JSON
- Minimum: 4 GB RAM
- With opcodes: 8 GB RAM
The goodware database is loaded entirely into memory for O(1) lookups.
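
As a rough illustration of why an in-memory database gives constant-time lookups, a Go map can serve as a hash set of known-good strings. The `goodwareSet` type and helper names are hypothetical, and the sketch shows the strict removal that `--excludegood` describes rather than the default low-score behaviour.

```go
package main

import "fmt"

// goodwareSet is a hash set of known-good strings. Map membership checks
// run in constant time on average. Hypothetical type for illustration.
type goodwareSet map[string]struct{}

// newGoodwareSet builds the set from a list of strings.
func newGoodwareSet(items []string) goodwareSet {
	set := make(goodwareSet, len(items))
	for _, s := range items {
		set[s] = struct{}{}
	}
	return set
}

// contains reports whether s is a known goodware string.
func (g goodwareSet) contains(s string) bool {
	_, ok := g[s]
	return ok
}

// filterCandidates drops every candidate present in the goodware set.
func filterCandidates(candidates []string, good goodwareSet) []string {
	var kept []string
	for _, c := range candidates {
		if !good.contains(c) {
			kept = append(kept, c)
		}
	}
	return kept
}

func main() {
	good := newGoodwareSet([]string{"kernel32.dll", "GetProcAddress"})
	fmt.Println(filterCandidates([]string{"kernel32.dll", "evil_c2_beacon"}, good))
}
```

The trade-off is memory: every string in the database stays resident, which is consistent with the RAM requirements listed above.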
yarGen-Go/
├── cmd/
│   ├── yargen/          # Main binary
│   └── yargen-util/     # Database utility
├── docs/
│   └── SETUP.md         # Step-by-step setup guide
├── internal/
│   ├── config/          # YAML configuration
│   ├── database/        # Goodware DB loading/saving
│   ├── extractor/       # String/opcode extraction
│   ├── filter/          # String filtering & scoring
│   ├── llm/             # LLM integration
│   ├── rules/           # YARA rule generation
│   ├── scanner/         # File scanning
│   ├── scoring/         # Scoring engine & SQLite store
│   ├── service/         # Core service layer
│   └── web/             # HTTP server & static files
├── config/
│   └── config.example.yml
├── go.mod
└── README.md
See LICENSE file for details. Same license as the original yarGen project (GPL-3.0).
yarGen-Go is a Go rewrite of yarGen (Python), created by Florian Roth.




