Releases: commondataio/dataportals-registry
v1.7.0
Release v1.7.0 (February 24, 2026)
Summary
- 1,647 new catalog entries (net from v1.6.0)
- 3,432 catalog entries updated with refreshed metadata
- 3,472 catalog entries removed (inactive, duplicate, or consolidated)
- Export snapshot: 14,346 catalog records in catalogs.jsonl / full.jsonl
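The snapshot count can be checked against a local copy of the export. The following is a minimal sketch, assuming the export stores one JSON object per line (the usual JSON Lines layout); the function name and the file path in the usage comment are illustrative and not part of the repository's tooling.

```python
import json


def count_jsonl_records(lines):
    """Count the lines that parse as JSON objects; blank lines are skipped."""
    count = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        record = json.loads(line)  # raises ValueError on a malformed line
        if isinstance(record, dict):
            count += 1
    return count


# Usage against a local copy of the export (the path is an assumption):
# with open("catalogs.jsonl", encoding="utf-8") as handle:
#     print(count_jsonl_records(handle))
```

If the layout assumption holds, the result for the v1.7.0 export should match the 14,346 records reported above.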
Changes
- Data quality rules and fixes (including API status mismatch handling)
- Subregion name/ID mismatch fixes (fix_subregion_name_id_mismatch.py)
- Regenerated datasets and quality reports
- 136 software definitions; 0 scheduled (all promoted or removed)
See CHANGELOG.md for full details.
v1.5.0 - February 2026
Changed
- Refreshed catalog metadata across entity YAML records and rebuilt generated dataset artifacts.
- Updated export snapshots in README.md to reflect the latest counts.
Removed
- Removed legacy History.md; changelog history is now maintained in CHANGELOG.md.
Snapshot
- catalogs.jsonl: 12,604 catalog records
- software.jsonl: 136 software/platform definitions
- scheduled.jsonl: 677 scheduled sources
- full.jsonl: 13,281 combined entities + scheduled records
v1.4.0 - February 2026
Release v1.4.0 - February 9, 2026
Added
- 208 new catalog entries (12,489 total catalogs, up from 12,281)
- Many new CKAN data catalogs from ecosystem.ckan.org synchronization
- Reference data files for validation and consistency:
  - data/reference/access_modes.yaml - Standardized access mode values
  - data/reference/catalog_types.yaml - Allowed catalog type values
  - data/reference/software_ids.yaml - Comprehensive software ID mappings
  - data/reference/status.yaml - Status value definitions
- New documentation:
  - devdocs/quality-fix-workflow.md - Guide for fixing data quality issues
  - devdocs/scheduled-to-entities.md - Process for promoting scheduled entries to entities
  - docs/metadata-quality.md - Metadata quality standards and guidelines
- OpenSpec proposal for schema allowed values enhancement
Changed
- Schema validation enhanced with allowed values validation for key fields (access_mode, catalog_type, software.id, status)
- Raw JSONL files restored: both compressed (.zst) and uncompressed versions are now available
- Updated entity metadata across multiple catalog entries
- Rebuilt JSONL/Parquet exports and type/software slices (12,489 catalogs; 134 software platforms; 758 scheduled sources; 12,623 combined records)
- Documentation improvements:
  - Enhanced AGENTS.md with OpenSpec workflow instructions
  - Expanded CONTRIBUTING.md with quality fix workflow and scheduled-to-entities process
  - Updated README.md with latest statistics and data export information
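The allowed-values validation mentioned above can be pictured roughly as follows. This is a sketch under stated assumptions, not the repository's validator: the value sets are placeholders (the real lists live in the data/reference/*.yaml files), and the helper names `get_path` and `find_disallowed` are hypothetical.

```python
# Placeholder value sets for illustration only; the actual allowed values
# are defined in the data/reference/*.yaml files.
ALLOWED = {
    "access_mode": {"open", "restricted"},
    "catalog_type": {"Geoportal", "Open data portal"},
    "software.id": {"ckan", "geoserver"},
    "status": {"active", "inactive"},
}


def get_path(record, dotted):
    """Look up a dotted key such as 'software.id'; return None if absent."""
    current = record
    for key in dotted.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def find_disallowed(record, allowed=ALLOWED):
    """Return (field, value) pairs whose value is outside the allowed set."""
    problems = []
    for field, values in allowed.items():
        raw = get_path(record, field)
        if raw is None:
            continue  # missing fields are a separate check
        candidates = raw if isinstance(raw, list) else [raw]
        for value in candidates:
            if value not in values:
                problems.append((field, value))
    return problems
```

For example, a record with `"software": {"id": "notreal"}` would be reported as `[("software.id", "notreal")]`, while fields that are absent are left to a required-fields check.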
Fixed
- Various metadata gaps and inconsistencies in catalog entries
- Improved data quality through enhanced validation rules
Removed
- Legacy files cleaned up from repository
Statistics
- Total catalogs: 12,489 (up from 12,281)
- Software platforms: 134
- Scheduled entries: 758 (up from 749)
- Combined records: 12,623
v1.2.0 - Major Data Catalog Registry Update
Release v1.2.0 - 2025-11-21
Major Additions
- 1,993 new data catalog records across multiple countries and regions
- 1,515 ArcGIS Server instances - massive expansion of geoportal coverage
- 293 World-level catalogs - international and global data repositories
- 97 French data catalogs - significant expansion of French open data coverage
Geospatial Infrastructure Expansion
- 83 GeoServer instances
- 37 GeoNode installations
- 33 GeoNetwork catalogs
- 8 Lizmap instances
- 3 MapProxy instances
- 2 MapBender instances
Open Data Platforms
- 47 OpenDataSoft instances
- 42 CKAN portals
- 5 DKAN installations
Scientific Data Repositories
- 38 Figshare-based repositories
- 6 DSpace installations
- 6 NADA microdata catalogs
- 9 THREDDS servers
Improvements
- 363 records updated with improved metadata
- Updated API endpoints for IPT-based data catalogs
- Enhanced metadata completeness across multiple records
- Better geographic and administrative region coverage
Statistics
Record Changes
- New records: 1,993
- Modified records: 363
- Deleted records: 0
Software Types (Top 10)
- ArcGIS Server: 1,515
- Custom/Unknown: 89
- GeoServer: 83
- OpenDataSoft: 47
- CKAN: 42
- Figshare: 38
- GeoNode: 37
- GeoNetwork: 33
- ArcGIS Hub: 26
- THREDDS: 9
Catalog Types
- Geoportal: 1,726 (86.6%)
- Open data portal: 181 (9.1%)
- Scientific data repository: 68 (3.4%)
- Microdata catalog: 7
- Indicators catalog: 6
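The percentages above appear to be shares of the 1,993 new records in this release. A short sketch that reproduces them under that assumption (the variable names are illustrative):

```python
NEW_RECORDS = 1993  # new records reported for v1.2.0

CATALOG_TYPES = {
    "Geoportal": 1726,
    "Open data portal": 181,
    "Scientific data repository": 68,
    "Microdata catalog": 7,
    "Indicators catalog": 6,
}


def shares(counts, total):
    """Return each count as a percentage of total, rounded to one decimal."""
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


SHARES = shares(CATALOG_TYPES, NEW_RECORDS)
LISTED_TOTAL = sum(CATALOG_TYPES.values())

for name, pct in SHARES.items():
    print(f"{name}: {pct}%")
```

The five listed types add up to 1,988, so the remaining five new records presumably fall under catalog types not shown in this list.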
Geographic Coverage
- United States: 1,472 records (top states: Minnesota 54, California 51, Wisconsin 43, Ohio 42, Texas 39)
- World-level: 293 records
- France: 97 records
- Netherlands: 11 records
- Plus 30+ additional countries
See CHANGELOG.md for complete details and full statistics.
v1.1.0: Data Quality Analysis Tools
Added
- Comprehensive data quality analysis tool (devdocs/analyze_duplicates_and_errors.py)
  - Detects duplicate UIDs and IDs across all records
  - Identifies missing required fields
  - Finds filename mismatches (where the id field doesn't match the filename)
  - Reports empty files and YAML parsing errors
  - Generates detailed reports in JSON, Markdown, and text formats
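The kinds of checks listed above can be sketched as follows. This is an illustration only, not the code in devdocs/analyze_duplicates_and_errors.py: it assumes the YAML files have already been parsed into `(filename, record)` pairs, and the function names and the `("id", "uid")` required-field list are assumptions.

```python
from collections import Counter
from pathlib import PurePath


def find_duplicate_ids(records, key="id"):
    """Return values of `key` that occur in more than one record."""
    counts = Counter(rec[key] for _, rec in records if rec.get(key))
    return sorted(value for value, n in counts.items() if n > 1)


def find_missing_fields(records, required=("id", "uid")):
    """Return (filename, missing field names) for incomplete records."""
    report = []
    for name, rec in records:
        missing = [field for field in required if not rec.get(field)]
        if missing:
            report.append((name, missing))
    return report


def find_filename_mismatches(records):
    """Return filenames whose stem differs from the record's id field."""
    return [name for name, rec in records
            if PurePath(name).stem != rec.get("id")]


# Hypothetical sample data standing in for parsed YAML files.
sample = [
    ("entities/a.yaml", {"id": "a", "uid": "cdi1"}),
    ("software/a.yaml", {"id": "a"}),
    ("entities/b.yaml", {"id": "c", "uid": "cdi2"}),
]
print(find_duplicate_ids(sample))
print(find_missing_fields(sample))
print(find_filename_mismatches(sample))
```

Working on already-parsed records keeps the sketch free of third-party dependencies; a real run would first load each YAML file and handle empty files and parse errors separately.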
Changed
- Updated README.md with data quality and validation section
- Added documentation for analysis tools in the devdocs/ directory
Fixed
- Identified 7 duplicate IDs (same ID in both entities and software directories)
- Identified 204 records missing the required uid field
- Identified 63 files with filename mismatches
- Identified 1 empty file requiring attention
See CHANGELOG.md for full details.