Note: The Pylint badge is a static indicator. For actual Pylint scores, see the automated Pylint reports in PR comments generated by our CI checks.
The Mapping Suite SDK, or MSSDK, is a software development kit (SDK) designed to standardize and simplify the handling of packages that contain transformation rules and related artefacts for mapping data from XML to RDF using the RDF Mapping Language (RML).
A mapping package is a standardized collection of files and directories that contains all the necessary components for transforming data from one format to another, specifically from XML to RDF using RDF Mapping Language (RML).
A mapping package consists of the following core components:
- Metadata - Essential identifying information about the package, including:
  - Identifier
  - Title
  - Issue date
  - Description
  - Mapping version
  - Ontology version
  - Type
  - Eligibility constraints
  - Signature (hash digest for integrity verification)
- Conceptual Mapping Asset - Excel spreadsheets that define high-level mapping concepts and relationships between source data and target ontologies.
- Technical Mapping Suite - A collection of implementation-specific mapping files:
  - RML Mapping files - Define transformations from heterogeneous data structures to RDF
- Vocabulary Mapping Suite - Files that define specific value transformations and mappings between source and target data values (JSON, CSV, XML).
- Test Data Suites - Collections of test data files used for validation and verification of mapping processes.
- SPARQL Test Suites - Collections of SPARQL query files used for testing and validation of the transformed data.
- SHACL Test Suites - Collections of SHACL (Shapes Constraint Language) files used for RDF data validation.
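The metadata signature is described above as a hash digest for integrity verification. How the SDK computes it is not stated here, so the following is only a generic sketch of the idea, assuming SHA-256 over the relative path and contents of every file in a package folder. `compute_folder_digest` is a hypothetical helper and not part of the SDK.

```python
import hashlib
from pathlib import Path


def compute_folder_digest(folder: Path) -> str:
    """Return a SHA-256 hex digest over all files in `folder` (hypothetical helper)."""
    digest = hashlib.sha256()
    # Sort by relative path so the digest does not depend on filesystem ordering.
    files = sorted(
        (p for p in folder.rglob("*") if p.is_file()),
        key=lambda p: p.relative_to(folder).as_posix(),
    )
    for file_path in files:
        # Hash the relative path too, so renaming a file changes the digest.
        digest.update(file_path.relative_to(folder).as_posix().encode("utf-8"))
        digest.update(b"\0")
        digest.update(file_path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()
```

Comparing a stored digest with a freshly computed one is enough to detect that any file in the folder was added, removed, renamed, or modified.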
mapping-package/
├── metadata.json                     # Package metadata
├── transformation/                   # Transformation assets
│   ├── conceptual_mappings.xlsx      # Excel file with conceptual mappings
│   ├── mappings/                     # Technical mapping suite
│   │   ├── mapping1.rml.ttl          # RML mapping files
│   │   ├── mapping2.rml.ttl
│   │   └── mapping3.rml.ttl
│   └── resources/                    # Vocabulary mapping suite
│       ├── codelist1.json            # Value mapping files in various formats
│       └── codelist2.csv
├── validation/                       # Validation assets
│   ├── shacl/                        # SHACL test suites
│   │   └── shacl_suite1/             # Domain-specific SHACL shapes
│   │       └── shape1.ttl            # SHACL shape files
│   └── sparql/                       # SPARQL test suites
│       └── sparql_suite1/            # Category-specific SPARQL queries
│           ├── query1.rq             # SPARQL query files
│           └── query2.rq
└── test_data/                        # Test data suites
    ├── test_data_suite1/             # Test case directory
    │   └── input.xml                 # Input test data
    └── test_data_suite2/             # Another test case directory
        └── input.xml                 # Input test data
This standardized structure ensures consistency across mapping packages and simplifies the process of loading, validating, and executing data transformations.
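As a quick illustration of what the layout implies, the sketch below checks that a folder contains the main entries shown in the example tree. It uses only the standard library. `find_missing_entries` and `EXPECTED_ENTRIES` are hypothetical names derived from that example, and the SDK's own loaders may enforce different or additional rules.

```python
from pathlib import Path

# Relative paths taken from the example tree; an assumption, not the SDK's rule set.
EXPECTED_ENTRIES = (
    "metadata.json",
    "transformation/conceptual_mappings.xlsx",
    "transformation/mappings",
    "transformation/resources",
    "validation/shacl",
    "validation/sparql",
    "test_data",
)


def find_missing_entries(package_root: Path) -> list[str]:
    """Return the expected relative paths that do not exist under `package_root`."""
    return [entry for entry in EXPECTED_ENTRIES if not (package_root / entry).exists()]
```

An empty result means the folder matches the example layout at this level of detail.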
Install the SDK using pip:
pip install mapping-suite-sdk

or using poetry:

poetry add mapping-suite-sdk

The SDK provides several ways to load mapping packages:
from pathlib import Path
import mapping_suite_sdk as mssdk

# Load from a local folder
package = mssdk.load_mapping_package_from_folder(
    mapping_package_folder_path=Path("/path/to/mapping/package")
)

# Load from a ZIP archive
package = mssdk.load_mapping_package_from_archive(
    mapping_package_archive_path=Path("/path/to/package.zip")
)

# Load from GitHub
packages = mssdk.load_mapping_packages_from_github(
    github_repository_url="https://github.com/your-org/mapping-repo",
    packages_path_pattern="mappings/package*",
    branch_or_tag_name="main"
)

# Serialize a mapping package to a dictionary
package_dict = mssdk.serialise_mapping_package(package)

The SDK provides a CLI command to convert mapping packages between versions:
Convert a single mapping package from one version to another (in-place conversion):

mssdk convert --to-version v3 --from-version v2 \
    from-package /path/to/mapping/package

Convert all mapping packages in a folder (in-place conversion):

mssdk convert --to-version v3 --from-version v2 \
    from-folder /path/to/mappings/folder

The from-folder command will:
- Iterate through all subdirectories in the specified folder
- Convert each valid mapping package in-place
- Skip packages that cannot be converted (e.g., already in target version)
- Report a summary with counts of successful and failed conversions
Options:
- --to-version: Target mapping package version (e.g., v3)
- --from-version: Source mapping package version (e.g., v2)
- --verbose, -v: Show detailed debug logs
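The from-folder behaviour described above (iterate, convert in place, skip what cannot be converted, report counts) can be pictured with a small loop. This is a sketch under assumptions, not the CLI's implementation: `convert_folder` and the `convert_one` callback are hypothetical, and a skipped package is simply counted as failed here.

```python
from pathlib import Path
from typing import Callable


def convert_folder(folder: Path, convert_one: Callable[[Path], None]) -> tuple[int, int]:
    """Apply `convert_one` to each subdirectory; return (succeeded, failed) counts."""
    succeeded = 0
    failed = 0
    for candidate in sorted(folder.iterdir()):
        if not candidate.is_dir():
            continue  # Only subdirectories are treated as candidate packages.
        try:
            convert_one(candidate)  # Hypothetical in-place conversion callback.
            succeeded += 1
        except Exception as error:
            # A package that cannot be converted is skipped rather than aborting the run.
            print(f"Skipped {candidate.name}: {error}")
            failed += 1
    print(f"Converted: {succeeded}, failed: {failed}")
    return succeeded, failed
```

Catching the error per package is what lets one unconvertible package leave the rest of the folder unaffected.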
The SDK provides flexible extractors for working with mapping packages from different sources.
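Conceptually, an extractor turns a source into a local directory, optionally a temporary one that is removed once you are done with it. The sketch below illustrates that pattern for a ZIP source with only the standard library. `extract_zip_temporarily` is a hypothetical helper and makes no claim about how the SDK's extractors are implemented.

```python
import tempfile
import zipfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def extract_zip_temporarily(archive_path: Path) -> Iterator[Path]:
    """Unpack a ZIP into a temporary directory that is removed on exit."""
    with tempfile.TemporaryDirectory() as temp_dir:
        with zipfile.ZipFile(archive_path) as archive:
            archive.extractall(temp_dir)
        # The caller works with the files while the context is open.
        yield Path(temp_dir)
    # Leaving the outer `with` deletes the temporary directory.
```

Using a context manager ties the lifetime of the extracted files to a `with` block, so cleanup happens even if the processing code raises.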
Extract mapping packages from ZIP archives:
from pathlib import Path
from mapping_suite_sdk import ArchivePackageExtractor

extractor = ArchivePackageExtractor()

# Extract to a specific location
output_path = extractor.extract(
    source_path=Path("package.zip"),
    destination_path=Path("output_directory")
)

# Extract to a temporary location (automatically cleaned up)
with extractor.extract_temporary(Path("package.zip")) as temp_path:
    # Work with files in temp_path
    pass  # Cleanup is automatic

Clone and extract mapping packages directly from GitHub repositories:
from mapping_suite_sdk import GithubPackageExtractor

extractor = GithubPackageExtractor()

# Extract multiple packages matching a pattern
with extractor.extract_temporary(
    repository_url="https://github.com/org/repo",
    packages_path_pattern="mappings/package*",
    branch_or_tag_name="v1.0.0"
) as package_paths:
    for path in package_paths:
        # Process each package
        print(f"Found package at: {path}")

The SDK provides seamless integration with MongoDB for storing and retrieving mapping packages.
from pymongo import MongoClient
from mapping_suite_sdk import MongoDBRepository
from mapping_suite_sdk.models.mapping_package_v2 import MappingPackageABC

# Initialize MongoDB client
mongo_client = MongoClient("mongodb://localhost:27017/")

# Create a repository for mapping packages
repository = MongoDBRepository(
    model_class=MappingPackageABC,
    mongo_client=mongo_client,
    database_name="mapping_suites",
    collection_name="packages"
)

from pathlib import Path
from mapping_suite_sdk import load_mapping_package_from_folder, load_mapping_package_from_mongo_db

# Load a package from a folder
package = load_mapping_package_from_folder(
    mapping_package_folder_path=Path("/path/to/package")
)

# Store the package in MongoDB
repository.create(package)

# Retrieve the package by ID
retrieved_package = load_mapping_package_from_mongo_db(
    mapping_package_id=package.id,
    mapping_package_repository=repository
)

# Query multiple packages
packages = repository.read_many({"metadata.version": "1.0.0"})

The SDK includes built-in support for OpenTelemetry tracing, which helps with performance monitoring and debugging.
from mapping_suite_sdk import set_mssdk_tracing, get_mssdk_tracing

# Enable tracing
set_mssdk_tracing(True)

# Check if tracing is enabled
is_enabled = get_mssdk_tracing()

from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor
from mapping_suite_sdk import add_span_processor_to_mssdk_tracer_provider

# Add a console exporter for tracing output
console_exporter = ConsoleSpanExporter()
span_processor = SimpleSpanProcessor(console_exporter)
add_span_processor_to_mssdk_tracer_provider(span_processor)

from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from mapping_suite_sdk import add_span_processor_to_mssdk_tracer_provider, set_mssdk_tracing

# Configure and enable OpenTelemetry with OTLP exporter
otlp_exporter = OTLPSpanExporter(endpoint="localhost:4317", insecure=True)
span_processor = BatchSpanProcessor(otlp_exporter)
add_span_processor_to_mssdk_tracer_provider(span_processor)
set_mssdk_tracing(True)
# Now all SDK operations will be traced and sent to your collector

Contributions to the Mapping Suite SDK are welcome! Use the fork and pull request workflow.
# Clone the repository
git clone https://github.com/meaningfy-ws/mapping-suite-sdk.git
cd mapping-suite-sdk
# Install dependencies
# Use Makefile commands
make install
# Run tests
make test-unit

- LinkML 1.9.5 onwards introduces breaking changes in our data
- Click 8.2 onwards introduces breaking changes in our CLI
- Pandas 2.1.4 and OpenTelemetry 1.29.0 are required due to a downstream consumer which relies on Airflow 2.10.x
- Issues: Report bugs and feature requests on our GitHub Issues
- Email: Contact the team at hi@meaningfy.ws
- Website: Visit our website at meaningfy.ws