
ETL Processing Pipeline

A pipeline for creating Knowledge Graphs from CSV files tagged with the IEC 61850 standard, featuring automated evaluation and performance monitoring. The pipeline follows the Omega-X ontology pattern for energy data modeling and semantic interoperability.

Omega-X Pattern for Energy Data Modeling

The pipeline processes IEC 61850 energy data following the Omega-X ontology pattern, which defines the semantic relationships between energy datasets, data collections, and market participants:

(Figure: Omega-X methodology pattern diagram)

IEC 61850 Device Modeling

The pipeline processes IEC 61850 compliant energy devices, mapping their logical structure to semantic representations:

(Figure: IEC 61850 tag hierarchy)

Device Hierarchy Processing:

  • Physical Device: Top-level device identifier (e.g., PARK)
  • Logical Device: Specific device instance (e.g., ECP001_S3_SHL001_Inverter01)
  • Logical Nodes: Functional components (sddinv1, MMXU1)
  • Data Attributes: Measurement properties (heatsinktmp, encltmp, TotW, W)

The pipeline automatically extracts and maps these hierarchical relationships to create semantically rich knowledge graphs compatible with the Omega-X energy ontology.
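The hierarchy extraction can be sketched as follows. This is a minimal illustration assuming a `PhysicalDevice/LogicalDevice/LogicalNode.DataAttribute` tag layout; the actual delimiter scheme in the CSV tags may differ.

```python
# Hedged sketch: parse a hypothetical IEC 61850-style tag of the form
# "PhysicalDevice/LogicalDevice/LogicalNode.DataAttribute". The real CSV
# tag layout may differ from this assumed delimiter scheme.
def parse_iec61850_tag(tag):
    physical, logical, node_attr = tag.split("/", 2)
    logical_node, _, attribute = node_attr.partition(".")
    return {
        "physical_device": physical,   # e.g. PARK
        "logical_device": logical,     # e.g. ECP001_S3_SHL001_Inverter01
        "logical_node": logical_node,  # e.g. sddinv1
        "data_attribute": attribute,   # e.g. heatsinktmp
    }

parts = parse_iec61850_tag("PARK/ECP001_S3_SHL001_Inverter01/sddinv1.heatsinktmp")
```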

Pipeline Overview

The enhanced pipeline consists of three main steps with integrated evaluation, all following the Omega-X ontology specifications:

  1. Extract - Convert timestamps to ISO 8601, extract the device list, and split the CSV file by device using regex
  2. Transform - Generate RML mapping files from a Jinja2 template and create the knowledge graph with SDM-RDFizer
  3. Load - Import the result into a running GraphDB repository for storage and querying
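The Extract step can be sketched with pandas along these lines. The column names `Time` and `Device` and the `_W1` filename suffix are placeholders mirroring the defaults described below; the real pipeline takes them via `--time-col`, `--device-col`, and `--wid`.

```python
import pandas as pd

# Hedged sketch of the Extract step: normalize the time column to ISO 8601
# and split the input CSV into one file per device. Column names and the
# "_W1" suffix are placeholder defaults, not the pipeline's exact logic.
def extract(csv_path, time_col="Time", device_col="Device"):
    df = pd.read_csv(csv_path)
    # Convert timestamps to ISO 8601 (e.g. 2024-09-26T00:00:00)
    df[time_col] = pd.to_datetime(df[time_col]).dt.strftime("%Y-%m-%dT%H:%M:%S")
    # Write one CSV per device, as in split_csvs/
    for device, group in df.groupby(device_col):
        group.to_csv(f"{device}_W1.csv", index=False)
    return sorted(df[device_col].unique())
```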

Omega-X Ontology Integration

The pipeline is specifically designed to work with the Omega-X ontology for energy data interoperability:

Semantic Mapping Features

  • Automatic IEC 61850 tag parsing and mapping to Omega-X classes
  • Energy device hierarchy preservation in RDF structure
  • Standardized property mappings using QUDT units
  • Market participant role assignments
  • Time series data modeling following ETS (EventsTimeSeries) patterns

Ontology Compliance

  • All generated RDF follows Omega-X namespace conventions
  • Device properties are mapped to appropriate ontology classes
  • Temporal data is structured according to ETS specifications
  • Energy roles and market participant relationships are preserved

Prerequisites

Required Dependencies

pip install pandas
pip install python-dateutil
pip install tqdm
pip install psutil
pip install jinja2
pip install requests
pip install rdflib

External Tools

  • SDM-RDFizer: Automatically installed by the pipeline, or install manually from SDM-RDFizer GitHub
  • GraphDB: Install GraphDB for knowledge graph storage and querying (optional)

Directory Structure

The pipeline automatically creates the following structure:

pipeline_output_YYYYMMDD_HHMMSS/
├── split_csvs/
├── rml_files/
├── knowledge_graph/
├── config/
├── logs/
└── evaluation/

Quick Start

Basic Usage

# Run the complete pipeline with timestamped output
python Pipeline_CSV_RML_KG.py input_data.csv

# Run with evaluation
python Pipeline_CSV_RML_KG.py input_data.csv --evaluate

# Run with GraphDB import
python Pipeline_CSV_RML_KG.py input_data.csv --import-to-graphdb --graphdb-repo my-repo-id

Command Line Usage

Complete Pipeline Script

python Pipeline_CSV_RML_KG.py input.csv [options]

Required Arguments

  • input_csv: Path to the input CSV file to be processed

Pipeline Output Options

  • --output-dir: Base output directory (default: pipeline_output)
  • --no-timestamp: Disable timestamped output folders

CSV Processing Options

  • --time-col: Name of the time column (default: Time)
  • --device-col: Name of the column with device identifiers
  • --file-id: ID to append to split CSV filenames

RML Generation Options

  • --rml-template: Path to Jinja2 template file (default: Jinja_RML-Template_PerDevice.j2)
  • --prefix: Ontology prefix URL (default: Omega-X namespace)
  • --wid: Window ID (default: W1)
  • --timestamp-column: Name of timestamp column for RML
  • --source-participant: Name of the source participant
  • --target-participant: Name of the target participant
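Rendering the per-device mapping can be sketched with Jinja2 as follows. The inline template is a toy stand-in for `Jinja_RML-Template_PerDevice.j2`; the real template produces full Omega-X-compliant RML mappings.

```python
from jinja2 import Template

# Hedged sketch of RML generation: render a per-device Jinja2 template.
# This tiny inline template only illustrates the mechanism; it is not the
# project's actual template.
toy_template = Template(
    "@prefix rml: <http://semweb.mmlab.be/ns/rml#> .\n"
    "<#{{ device }}Map> rml:logicalSource [\n"
    "    rml:source \"{{ csv_path }}\"\n"
    "] .\n"
)
rml_text = toy_template.render(
    device="INVERTER01",
    csv_path="split_csvs/INVERTER01_W1.csv",
)
```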

Knowledge Graph Options

  • --kg-format: Output format (turtle, n-triples, rdf-xml)
  • --remove-duplicates/--no-remove-duplicates: Control duplicate removal
  • --all-in-one: Generate all datasets in one file
  • --no-enrichment: Disable enrichment
  • --no-ordered: Disable ordered processing

Note: For now, when using --source-participant or --target-participant together with --all-in-one, you must also pass --no-remove-duplicates; otherwise the generated knowledge graph will not be valid.

GraphDB Import Options

  • --import-to-graphdb: Enable GraphDB import
  • --graphdb-url: GraphDB instance URL (default: http://localhost:7200)
  • --graphdb-repo: Repository ID (required for import)
  • --graphdb-user: Username for authentication
  • --graphdb-password: Password for authentication
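A minimal sketch of the import step, assuming GraphDB's RDF4J-style REST interface (`POST` Turtle to `/repositories/<repo>/statements`); verify the endpoint shape and authentication behavior against your GraphDB version.

```python
import requests

# Hedged sketch of a GraphDB import via its RDF4J-style REST API.
# Endpoint layout and auth handling are assumptions to check against
# your GraphDB deployment.
def statements_endpoint(base_url, repo):
    """Build the statements endpoint for a repository."""
    return f"{base_url.rstrip('/')}/repositories/{repo}/statements"

def import_turtle(ttl_path, repo, base_url="http://localhost:7200",
                  user=None, password=None):
    with open(ttl_path, "rb") as f:
        resp = requests.post(
            statements_endpoint(base_url, repo),
            data=f,
            headers={"Content-Type": "text/turtle"},
            auth=(user, password) if user else None,
        )
    resp.raise_for_status()  # surface HTTP errors from GraphDB
```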

Evaluation Options

  • --evaluate: Run evaluation after pipeline completion

Examples

Basic Pipeline Execution

# Simple run with default settings
python .\ETL-Pipeline\Resources\Pipeline_CSV_RML_KG.py '.\Input_CSV_Datasets\PARK-2024-09-26_week1.csv' --prefix https://w3id.org/omega-x/ontology/KG/PARK-DataSets --wid W1 --evaluate

Advanced Configuration

# Full pipeline with custom Omega-X settings
python .\ETL-Pipeline\Resources\Pipeline_CSV_RML_KG.py '.\Input_CSV_Datasets\PARK-2024-09-26_week1.csv' --prefix https://w3id.org/omega-x/ontology/KG/PARK-DataSets --wid W1 --all-in-one --import-to-graphdb --graphdb-repo PARK-Inverter-W1-2025 --graphdb-user admin --graphdb-password pass12345 --evaluate

Evaluation Features

The integrated evaluation system provides comprehensive analysis of Omega-X compliant knowledge graphs:

Performance Metrics

  • Total pipeline execution time
  • Per-stage timing breakdown
  • Throughput (triples per second)
  • Resource usage monitoring (CPU, memory)
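The throughput metric can be computed from a simple wall-clock timer, sketched below; the real evaluation step also samples CPU and memory via psutil, which is omitted here.

```python
import time

# Hedged sketch of the throughput metric (triples per second). Only
# wall-clock timing is shown; resource sampling via psutil is omitted.
def timed(fn, *args):
    """Run fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

# Placeholder workload standing in for a pipeline stage
triple_count, elapsed = timed(lambda n: n, 100_000)
throughput = triple_count / max(elapsed, 1e-9)  # triples per second
```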

Knowledge Graph Analysis

  • Accurate triple counting using rdflib
  • Omega-X ontology compliance validation
  • File size analysis
  • Format distribution
  • Quality assessment

Output Reports

  • JSON evaluation reports with detailed metrics
  • Console summary with key statistics
  • Timestamped logs for debugging

Output Structure

Each pipeline run creates a timestamped directory containing Omega-X compliant knowledge graphs:

pipeline_output_20250614_163000/
├── split_csvs/                    # Device-separated CSV files
│   ├── METEOSTA001_W1.csv
│   └── INVERTER01_W1.csv
├── rml_files/                     # Generated RML mappings (Omega-X compliant)
│   ├── generated_METEOSTA001_W1.rml.ttl
│   └── generated_INVERTER01_W1.rml.ttl
├── knowledge_graph/               # Generated RDF files (Omega-X format)
│   ├── knowledge_graph_METEOSTA001_W1.ttl
│   └── knowledge_graph_INVERTER01_W1.ttl
├── config/                        # SDM-RDFizer configuration
│   └── rdfizer_config.ini
├── logs/                          # Detailed execution logs
│   └── pipeline_20250614_163000.log
└── evaluation/                    # Performance reports
    └── evaluation_report_20250614_163000.json

License

This project is licensed under the MIT License.

Acknowledgments

