Croissant RDF: merge-rdf and convert RDF back to Croissant JSON-LD #958
Open
david4096 wants to merge 7 commits into mlcommons:main from
Some datasets return @context as a simple string (e.g., "https://schema.org") while others return it as a dict with @vocab and namespace prefixes. Updated test_fetch_data_workflow to handle both formats correctly.
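A minimal sketch of how a test might accept both shapes of @context. The helper name `mentions_schema_org` is hypothetical and the actual assertions in test_fetch_data_workflow are not shown in this PR, so treat this as an illustration of the idea only.

```python
def mentions_schema_org(context) -> bool:
    """Return True if a JSON-LD @context value refers to schema.org.

    Hypothetical helper: accepts the plain-string form, the dict form
    (with @vocab and namespace prefixes), and a list mixing both.
    """
    if isinstance(context, str):
        # Simple form, e.g. "https://schema.org"
        return "schema.org" in context
    if isinstance(context, dict):
        # Dict form: look at @vocab and every prefix mapping that is a string
        return any(isinstance(v, str) and "schema.org" in v for v in context.values())
    if isinstance(context, list):
        # JSON-LD also allows an array of contexts
        return any(mentions_schema_org(item) for item in context)
    return False


string_form = "https://schema.org"
dict_form = {"@vocab": "https://schema.org/", "cr": "http://mlcommons.org/croissant/"}

print(mentions_schema_org(string_form))
print(mentions_schema_org(dict_form))
```

Branching on the type keeps the test independent of which provider produced the document.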
Implements the reverse operation to regenerate Croissant JSON-LD from RDF files. This addresses one of the key objectives from issue mlcommons#850.
Changes:
- Add convert_from_rdf() method to CroissantHarvester
- Create new rdf-to-jsonld CLI tool for easy conversion
- Add comprehensive tests for round-trip conversion
- Supports all RDF formats (Turtle, N-Triples, RDF/XML, etc.)
Implements the ability to merge RDF files from multiple Croissant providers into a unified knowledge graph. This addresses an objective of issue mlcommons#850.
Features:
- Merge multiple RDF files with automatic deduplication
- Support for various RDF formats (Turtle, N-Triples, RDF/XML, etc.)
- CLI tool 'merge-rdf' for easy merging
- Wildcard support for batch merging (e.g., *.ttl)
- Output format selection (turtle, json-ld, n3, nt, xml)
- Comprehensive tests for merging and deduplication
Example:
  merge-rdf huggingface.ttl openml.ttl kaggle.ttl -o unified.ttl
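The merge-with-deduplication idea can be sketched with the standard library alone, because an RDF graph is a set of triples and a merge is a set union. The function `merge_ntriples` below is hypothetical and handles only N-Triples input, comparing lines textually; it is not the PR's implementation, which supports several formats and presumably relies on an RDF library.

```python
import glob
from pathlib import Path


def merge_ntriples(patterns, output):
    """Merge N-Triples files matching the given glob patterns into one file.

    Hypothetical sketch: duplicates are detected by exact line text, so
    differently spelled but equivalent triples and blank-node identity are
    not handled.
    """
    seen = set()
    merged = []
    for pattern in patterns:
        # sorted() keeps the output order stable across runs
        for path in sorted(glob.glob(pattern)):
            for line in Path(path).read_text(encoding="utf-8").splitlines():
                triple = line.strip()
                if not triple or triple.startswith("#"):
                    continue  # skip blank lines and comments
                if triple not in seen:
                    seen.add(triple)
                    merged.append(triple)
    Path(output).write_text("\n".join(merged) + "\n", encoding="utf-8")
    return len(merged)
```

Writing the output outside the directory matched by the wildcard avoids re-reading a previous merge result on the next run.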
Major improvements:
- Added comprehensive Quick Start section with all providers
- Documented new CLI tools: rdf-to-jsonld and merge-rdf
- Added CLI tools reference table
- Included practical use cases (cross-platform catalogs, bioinformatics KG)
- Improved SPARQL query examples with better descriptions
- Added architecture diagram
- Reorganized development section for better clarity
- Highlighted multi-provider and knowledge graph merging capabilities
The README now reflects all new features implemented for issue mlcommons#850.
MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅
Force-pushed from eeb7446 to f297487
Changed rdf-to-jsonld and merge-rdf from standalone commands to subcommands (to-jsonld and merge) under a unified croissant-rdf CLI. The old standalone commands remain for backward compatibility.
Added documentation for the new croissant-rdf command with to-jsonld and merge subcommands. Updated all usage examples to show the new unified CLI while noting that legacy commands remain available.
Only the unified croissant-rdf CLI with to-jsonld and merge subcommands is now available. Updated documentation accordingly.
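The unified CLI layout described in these commits can be sketched with argparse subparsers. Only the command name croissant-rdf and the subcommand names to-jsonld and merge come from the PR; the positional arguments and the -o/--output option are assumptions modelled on the earlier merge-rdf example, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Build a parser with to-jsonld and merge subcommands (illustrative only)."""
    parser = argparse.ArgumentParser(prog="croissant-rdf")
    subparsers = parser.add_subparsers(dest="command", required=True)

    # Assumed arguments: one RDF input file and an optional output path
    to_jsonld = subparsers.add_parser(
        "to-jsonld", help="Convert RDF back to Croissant JSON-LD"
    )
    to_jsonld.add_argument("input")
    to_jsonld.add_argument("-o", "--output")

    # Assumed arguments: one or more RDF inputs and a required output path
    merge = subparsers.add_parser(
        "merge", help="Merge multiple RDF files into a unified graph"
    )
    merge.add_argument("inputs", nargs="+")
    merge.add_argument("-o", "--output", required=True)
    return parser


args = build_parser().parse_args(
    ["merge", "huggingface.ttl", "openml.ttl", "-o", "unified.ttl"]
)
print(args.command, args.inputs, args.output)
```

Keeping each subcommand's arguments on its own subparser lets `croissant-rdf merge --help` and `croissant-rdf to-jsonld --help` document themselves separately.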
Looks good @david4096! I only wonder if the simple dispatches to
This PR implements features from issue #850 to enhance croissant-rdf with round-trip RDF conversion and multi-provider graph merging capabilities.
Changes
1. Fix test for varying @context formats
Some datasets return @context as a string while others return it as a dict. Updated test_fetch_data_workflow to handle both formats correctly.
2. RDF to JSON-LD conversion
Implements the reverse operation to regenerate Croissant JSON-LD from RDF files.
- convert_from_rdf() method in CroissantHarvester
- rdf-to-jsonld CLI tool
3. Multi-provider RDF merging
Enables combining RDF files from multiple Croissant providers into unified knowledge graphs.
- merge-rdf CLI tool
4. Documentation improvements
Test Results
All tests passing: 21/21 (71% code coverage)
CLI Tools Added
- rdf-to-jsonld: Convert RDF back to Croissant JSON-LD
- merge-rdf: Merge multiple RDF files into unified graphs
Example Usage
Addresses objectives from #850. @stefanches7