Python parser and metadata generator for the Czech Core Metadata Model (CCMM).
pip install -r requirements.txt

from pyccmm import CCMMHandler
# Initialize handler (uses bundled CCMM schemas)
handler = CCMMHandler()
# Set basic required fields
handler.set_title("My dataset")
handler.set_publication_year(2024)
# Add identifier
handler.add_identifier("12345", "DOI")
# Add description
handler.add_description("Dataset description")
# Validation
if handler.is_valid():
    print("Dataset is valid!")
# Save to file
handler.save_to_file("metadata.xml")

# Add metadata as needed
handler.add_subject("keyword", "CCMM")
handler.add_agent_relationship("Author", "creator", "person")
handler.add_distribution("https://example.com/data", "text/csv")
handler.add_location("Prague", "place")
handler.add_time_reference("2024-01-01", "created")

# Load existing metadata file
handler = CCMMHandler()
if handler.load_from_file("existing_metadata.xml"):
    print("File loaded successfully!")
    print("Title:", handler.get_title())

CCMMHandler is the main class for CCMM metadata manipulation.
- __init__(ccmm_path: Optional[str] = None) - Initializes handler with path to CCMM schemas (uses bundled schemas if None)

- set_title(title: str) - Set dataset title
- set_publication_year(year: int) - Set publication year
- add_identifier(value: str, scheme: IdentifierScheme, iri: Optional[str] = None) - Add identifier

- set_version(version: str) - Set version
- add_description(text: str) - Add description
- add_alternate_title(title: str, title_type: Optional[str] = None) - Add alternate title
- add_subject(subject: str, scheme: Optional[str] = None) - Add subject/keyword
- add_agent_relationship(agent_name: str, role: AgentRole, agent_type: AgentType = AgentType.PERSON) - Add agent relationship
- add_distribution(access_url: str, format_type: Optional[DistributionFormat] = None) - Add distribution
- add_location(location: str, location_type: LocationType) - Add location
- add_time_reference(time_value: str, time_type: TimeReferenceType) - Add time reference

- is_valid() -> bool - Check overall validity (includes XSD validation)

- load_from_file(xml_file_path: str) -> bool - Load from file
- save_to_file(file_path: str) - Save to file
- to_xml_string(pretty_print: bool = True) -> str - Convert to XML string

- get_title() -> str - Get dataset title
- get_publication_year() -> int - Get publication year
- get_identifiers() -> List[Identifier] - Get all identifiers
- get_subjects() -> List[Subject] - Get all subjects
- get_summary() -> Dict[str, Any] - Get dataset summary
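The methods listed above can be combined so that a file is only written once validation passes. The following is a minimal sketch, not part of the library: `save_if_valid` and the `SupportsValidateAndSave` protocol are hypothetical names introduced here, and the sketch assumes only the documented `is_valid()` and `save_to_file()` signatures (the return type of `save_to_file` is not documented and is assumed to be `None`).

```python
from typing import Protocol


class SupportsValidateAndSave(Protocol):
    """Structural type for the two handler methods this helper relies on."""

    def is_valid(self) -> bool: ...

    # Assumption: the return value of save_to_file is not used.
    def save_to_file(self, file_path: str) -> None: ...


def save_if_valid(handler: SupportsValidateAndSave, file_path: str) -> bool:
    """Write metadata only when the handler reports it as valid.

    Returns True if the file was written, False if validation failed.
    """
    if not handler.is_valid():
        return False
    handler.save_to_file(file_path)
    return True
```

Because the helper depends on a structural type rather than on `CCMMHandler` itself, it should work with a real handler instance and can also be exercised with a small stub in tests.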
The library supports the following CCMM structure:
- Dataset (root element)
  - title (required)
  - publication_year (required)
  - identifier (required)
  - version
  - description
  - alternate_title
  - subject
  - qualified_relation (agent relationships)
  - distribution
  - location
  - time_reference
  - and more...
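To make the required/optional distinction in the structure above concrete, here is a small illustrative sketch using only the standard library. It is not how the library builds its XML: the element names are copied from the list as plain, un-namespaced tags, which is an assumption (the real CCMM schemas may use namespaces and different element names), and `build_dataset` and `missing_required` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET
from typing import List

# Assumption: plain tag names taken from the structure list above.
REQUIRED_FIELDS = ("title", "publication_year", "identifier")


def build_dataset(title: str, year: int, identifier: str) -> ET.Element:
    """Build a toy Dataset element containing the three required children."""
    root = ET.Element("Dataset")
    ET.SubElement(root, "title").text = title
    ET.SubElement(root, "publication_year").text = str(year)
    ET.SubElement(root, "identifier").text = identifier
    return root


def missing_required(root: ET.Element) -> List[str]:
    """Return the names of required children that are absent from root."""
    return [name for name in REQUIRED_FIELDS if root.find(name) is None]


dataset = build_dataset("My dataset", 2024, "12345")
print(missing_required(dataset))  # []

# Dropping a required child makes the check report it.
dataset.remove(dataset.find("identifier"))
print(missing_required(dataset))  # ['identifier']
```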
This library includes the official CCMM schemas as a git submodule from the CCMM repository. The schemas are automatically bundled with the package and used for validation.
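For readers unfamiliar with XSD validation in lxml, the general technique can be sketched as follows. This is not the library's internal code: the schema below is a toy stand-in written for this example (the bundled CCMM schemas are not reproduced here), and `validation_errors` is a hypothetical helper. Only standard lxml calls are used (`etree.fromstring`, `etree.XMLSchema`, `validate`, `error_log`).

```python
from typing import List

from lxml import etree

# Assumption: a toy schema for illustration only, not the CCMM XSD.
TOY_XSD = b"""<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Dataset">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
        <xs:element name="publication_year" type="xs:gYear"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""


def validation_errors(xml_bytes: bytes, schema: etree.XMLSchema) -> List[str]:
    """Return schema violations as strings; an empty list means valid."""
    doc = etree.fromstring(xml_bytes)
    if schema.validate(doc):
        return []
    return [f"line {err.line}: {err.message}" for err in schema.error_log]


schema = etree.XMLSchema(etree.fromstring(TOY_XSD))

good = (b"<Dataset><title>My dataset</title>"
        b"<publication_year>2024</publication_year></Dataset>")
bad = b"<Dataset><title>My dataset</title></Dataset>"  # missing required child

print(validation_errors(good, schema))  # []
for message in validation_errors(bad, schema):
    print(message)
```

Returning the error messages rather than a bare boolean makes it easier to see why a document was rejected.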
See examples/example_usage.py for complete usage examples.
- lxml - for XSD validation
- xml.etree.ElementTree - for XML manipulation (part of the Python standard library)
- typing - for type hints (part of the Python standard library)
This project is licensed under the MIT License.
Contributions are welcome.
Developed by Roman Dvořák (romandvorak@mlab.cz) at the Institute of Physics of the Czech Academy of Sciences (FZU).