This repository contains error detection catalogs and templates for the autosubmit-scan tool.
.
├── templates/
│   └── default_autosubmit.yaml       # Default template for as-scan <expid>
└── examples/
    ├── production_catalog.yaml       # Production example
    └── development_catalog.yaml      # Development example
The simplest way to scan an Autosubmit experiment:
# Automatically uses the template from this repository
as-scan a23i
This will:
- Load templates/default_autosubmit.yaml from this repository
- Render it with your experiment ID (a23i)
- Run the scan with comprehensive error detection
- Generate a report
Configure default behavior with environment variables:
# Override default template location
export AUTOSUBMIT_SCAN_DEFAULT_TEMPLATE=github://DestinE-Climate-DT:autosubmit-scan-error-catalogs@<ref>/templates/default_autosubmit.yaml
# Override target host (default: mn5)
export AUTOSUBMIT_HOST=mn5
# Override base experiment path
export AUTOSUBMIT_BASE_PATH=/gpfs/scratch/ehpc01/awi478153
Load catalogs from any location:
# From this repository
as-scan scan --catalog github://DestinE-Climate-DT:autosubmit-scan-error-catalogs@main/examples/production_catalog.yaml
# From local file
as-scan scan --catalog /path/to/my_catalog.yaml
# From SSH
as-scan scan --catalog ssh://mn5:/shared/catalogs/production.yaml
# From S3
as-scan scan --catalog s3://bucket/catalogs/production.yaml
# From HTTPS
as-scan scan --catalog https://example.com/catalogs/production.yaml
The default template (templates/default_autosubmit.yaml) is used when you run as-scan <expid>.
Template Variables:
- {{ expid }} - Experiment ID (e.g., a23i, t001)
- {{ user }} - Current username
- {{ host }} - Target host (from AUTOSUBMIT_HOST or mn5)
- {{ base_path }} - Base experiment path (from AUTOSUBMIT_BASE_PATH or default)
- {{ timestamp }} - Current timestamp in ISO format
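As a rough illustration of how these variables might be resolved, the sketch below collects them from the environment and fills in `{{ name }}` placeholders with a small regular expression. The regex substitution is only a stand-in for a real template engine. `build_context` and `render` are hypothetical names, not part of as-scan, and the fallback base path is simply the one used in this README's examples.

```python
import os
import re
from datetime import datetime, timezone

# Assumed fallback values; the base path is the one shown in this README.
DEFAULT_HOST = "mn5"
DEFAULT_BASE_PATH = "/gpfs/scratch/ehpc01/awi478153"

# Matches {{ name }} with optional whitespace inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def build_context(expid, env=None):
    """Collect the template variables, preferring environment overrides."""
    env = os.environ if env is None else env
    return {
        "expid": expid,
        "user": env.get("USER", "unknown"),
        "host": env.get("AUTOSUBMIT_HOST", DEFAULT_HOST),
        "base_path": env.get("AUTOSUBMIT_BASE_PATH", DEFAULT_BASE_PATH),
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


def render(template, context):
    """Replace each {{ name }} placeholder; unknown names raise KeyError."""
    return PLACEHOLDER.sub(lambda m: str(context[m.group(1)]), template)
```

Passing an explicit `env` dictionary to `build_context` keeps the sketch easy to test without touching the real environment.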
Example Usage:
files:
  - ssh://{{ host }}:{{ base_path }}/{{ expid }}/LOG_{{ expid }}/*.out
  - ssh://{{ host }}:{{ base_path }}/{{ expid }}/LOG_{{ expid }}/*.err
Renders to:
files:
  - ssh://mn5:/gpfs/scratch/ehpc01/awi478153/a23i/LOG_a23i/*.out
  - ssh://mn5:/gpfs/scratch/ehpc01/awi478153/a23i/LOG_a23i/*.err
The default template detects:
- SlurmOutOfMemory - Job killed due to memory limit
- SlurmTimeLimit - Job exceeded walltime
- SlurmNodeFailure - Node failure or launch problem
- MissingInputFile - Required input file not found
- FortranRuntimeError - Model crash or segfault
- MPIError - MPI communication failure
- PythonTraceback - Python exception with traceback
- PythonImportError - Missing Python module
- NetCDFError - NetCDF file operation failed
- PythonKeyError - Dictionary key access error
- ErrorKeyword - Generic error keyword
Some errors trigger follow-up checks. For example, detecting a PythonTraceback will automatically check for:
- PythonImportError (if traceback contains "ImportError")
- PythonKeyError (if traceback contains "KeyError")
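The follow-up behaviour can be pictured as a small graph walk: when an error matches, each of its follow-ups is evaluated only if its condition holds. The sketch below is a minimal illustration over a simplified in-memory catalog. The `CATALOG` layout, the `contains` condition, and the `scan` function are hypothetical and do not reflect as-scan's actual internals.

```python
import re

# Hypothetical, simplified catalog: each error has a regex and optional follow-ups.
CATALOG = {
    "PythonTraceback": {
        "pattern": r"Traceback \(most recent call last\)",
        "next_errors": [
            {"error_id": "PythonImportError", "contains": "ImportError"},
            {"error_id": "PythonKeyError", "contains": "KeyError"},
        ],
    },
    "PythonImportError": {"pattern": r"ImportError: .+"},
    "PythonKeyError": {"pattern": r"KeyError: .+"},
}


def scan(text, error_id, catalog=CATALOG, seen=None):
    """Return ids of matching errors, following next_errors after each match."""
    seen = set() if seen is None else seen
    if error_id in seen:  # guard against cycles in the follow-up graph
        return []
    seen.add(error_id)
    entry = catalog[error_id]
    if not re.search(entry["pattern"], text):
        return []
    found = [error_id]
    for follow_up in entry.get("next_errors", []):
        # Only run the follow-up check when its trigger text is present.
        if follow_up["contains"] in text:
            found += scan(text, follow_up["error_id"], catalog, seen)
    return found
```

Calling `scan(log_text, "PythonTraceback")` returns the traceback id first, followed by any follow-up errors that also matched.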
See examples/production_catalog.yaml for a complete production example with:
- Specific file paths
- Team assignments in metadata
- Railway pattern for cascading checks
See examples/development_catalog.yaml for a minimal testing example with:
- Local file paths (/tmp/test_logs/)
- Simple pattern matching
- Basic error categories
Copy an example:
cp examples/development_catalog.yaml my_catalog.yaml
Each error definition includes:
ErrorName:
  id: ErrorName                       # Unique identifier
  pattern:
    type: literal|regex|callable      # Pattern type
    pattern: "error text"             # Pattern to match
    flags: ["IGNORECASE"]             # Optional regex flags
  files:                              # Where to search
    - /path/to/logs/*.log
    - ssh://host:/path/*.err
  meaning: "What this error means"
  suggestion: "How to fix it"
  context_lines: 10                   # Lines before/after match
  next_errors:                        # Railway pattern (optional)
    - error_id: FollowUpError
      when:
        type: always
  metadata:                           # Custom metadata
    severity: critical|high|medium|low
    category: resource|model|data|code
Literal - Exact string match:
pattern:
  type: literal
  pattern: "ERROR"
Regex - Regular expression:
pattern:
  type: regex
  pattern: 'ERROR|FATAL|CRITICAL'
  flags: ["IGNORECASE", "MULTILINE"]
Callable - Custom Python function:
pattern:
  type: callable
  pattern: "my_module:my_matcher_function"
Supported protocols:
- Local: /path/to/file.log
- SSH: ssh://user@host:/path/*.log
- S3: s3://bucket/path/*.log
- FTP: ftp://user:pass@host/path/*.log
- GitHub: github://org:repo@ref/path/file
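To make the location formats concrete, the sketch below classifies a location string by its scheme and splits a `github://` location into its parts. It is only an illustration of the formats listed here. `classify` and `parse_github` are hypothetical helpers, not as-scan functions, and `https` is included only because it appears in the catalog examples earlier in this README.

```python
import re

# Schemes from the list above, plus https from the catalog examples.
KNOWN_SCHEMES = {"ssh", "s3", "ftp", "github", "https"}

# Mirrors the documented shape github://org:repo@ref/path/file
GITHUB_RE = re.compile(
    r"^github://(?P<org>[^:/]+):(?P<repo>[^@/]+)@(?P<ref>[^/]+)/(?P<path>.+)$"
)


def classify(location):
    """Return the protocol name; strings without '://' count as local paths."""
    scheme, sep, _ = location.partition("://")
    if not sep:
        return "local"
    scheme = scheme.lower()
    if scheme not in KNOWN_SCHEMES:
        raise ValueError(f"unsupported protocol: {scheme}")
    return scheme


def parse_github(location):
    """Split a github:// location into org, repo, ref and path."""
    match = GITHUB_RE.match(location)
    if match is None:
        raise ValueError(f"not a github:// location: {location}")
    return match.groupdict()
```

For instance, `parse_github` applied to the production catalog location shown earlier yields the org `DestinE-Climate-DT`, the repo `autosubmit-scan-error-catalogs`, the ref `main`, and the path `examples/production_catalog.yaml`.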
Glob patterns supported:
- *.log - All .log files
- **/*.err - Recursive search for .err files
- LOG_*/job_*.out - Pattern matching directories
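The three pattern styles can be tried out locally with Python's `pathlib`, as sketched below. This only shows how such globs behave on a local directory tree; how as-scan expands them, especially over SSH or S3, may differ. The file names and the `expand` and `demo` helpers are made up for the illustration.

```python
import tempfile
from pathlib import Path


def expand(root, pattern):
    """Return sorted file paths under root that match a glob pattern."""
    root = Path(root)
    return sorted(
        p.relative_to(root).as_posix() for p in root.glob(pattern) if p.is_file()
    )


def demo():
    """Build a throwaway log tree and expand the three pattern styles on it."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for name in ("run.log", "LOG_a23i/job_1.out", "LOG_a23i/nested/job_2.err"):
            path = root / name
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text("")
        return {
            "*.log": expand(root, "*.log"),
            "**/*.err": expand(root, "**/*.err"),
            "LOG_*/job_*.out": expand(root, "LOG_*/job_*.out"),
        }
```

In this tree, `*.log` only sees files directly under the root, while `**/*.err` also descends into nested directories.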
# Validate a catalog
as-scan validate my_catalog.yaml
# Dry run
as-scan scan --catalog my_catalog.yaml --output ./results --dryrun
This repository follows semantic versioning:
- main - Latest stable version
- develop - Development branch
- v1.0, v1.1, etc. - Tagged releases
# Use specific version
export AUTOSUBMIT_SCAN_DEFAULT_TEMPLATE=github://DestinE-Climate-DT:autosubmit-scan-error-catalogs@<ref>/templates/default_autosubmit.yaml
# Use development branch
export AUTOSUBMIT_SCAN_DEFAULT_TEMPLATE=github://DestinE-Climate-DT:autosubmit-scan-error-catalogs@develop/templates/default_autosubmit.yaml
To add a template:
- Create the template in the templates/ directory
- Use Jinja2 syntax for variables
- Document available variables in comments
- Test with: as-scan scan --catalog templates/your_template.yaml
To add an example catalog:
- Create a complete, working catalog in examples/
- Use realistic file paths and patterns
- Add descriptive comments
- Include railway patterns if applicable
To contribute:
- Fork the repository
- Create a feature branch
- Make your changes
- Test thoroughly
- Submit a pull request
[License information to be added]