Commit c58c8e2

Merge branch 'main' into fix_url
2 parents 20dac89 + 005cc18 commit c58c8e2

48 files changed, +8209 -1876 lines changed

.github/workflows/main.yml

Lines changed: 1293 additions & 0 deletions
Large diffs are not rendered by default.

.python-version

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
1+
3.10.19

CHANGELOG.md

Lines changed: 61 additions & 8 deletions
@@ -6,19 +6,72 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
66

77
## [Unreleased]
88

9+
## What's Changed
10+
* Update README with some fixes by @tino097 in https://github.com/dathere/datapusher-plus/pull/178
11+
* Druf apr2025 by @jqnatividad in https://github.com/dathere/datapusher-plus/pull/180
12+
* Refactor upload log level by @jqnatividad in https://github.com/dathere/datapusher-plus/pull/181
13+
* feat: zip file support by @jqnatividad in https://github.com/dathere/datapusher-plus/pull/182
14+
* feat: shapefile support by @jqnatividad in https://github.com/dathere/datapusher-plus/pull/183
15+
* Refactor jobs py by @jqnatividad in https://github.com/dathere/datapusher-plus/pull/184
16+
* feat: make frequency limit configurable; move stats/freq copying to datastore from jobs.py to qsv_utils.py by @jqnatividad in https://github.com/dathere/datapusher-plus/pull/185
17+
* Lat lon columns inferencing for use in Formulas by @jqnatividad in https://github.com/dathere/datapusher-plus/pull/188
18+
* Configurable Date/Datetime inferencing and dataset stats by @jqnatividad in https://github.com/dathere/datapusher-plus/pull/190
19+
* refactor: move pii-screening to a separate module by @jqnatividad in https://github.com/dathere/datapusher-plus/pull/191
20+
* chore: add WIP geojson update by @rzmk in https://github.com/dathere/datapusher-plus/pull/186
21+
* "smart" formula spatial functions by @jqnatividad in https://github.com/dathere/datapusher-plus/pull/192
22+
* Jobs cleanup by @jqnatividad in https://github.com/dathere/datapusher-plus/pull/193
23+
* Fix datastore upload log timestamps by @jqnatividad in https://github.com/dathere/datapusher-plus/pull/194
24+
* DCAT 3 formula helpers by @jqnatividad in https://github.com/dathere/datapusher-plus/pull/195
25+
* fix: tmp input was being wrongfully assigned by @jqnatividad in https://github.com/dathere/datapusher-plus/pull/197
26+
* refactored SQL-enabled formulas by @jqnatividad in https://github.com/dathere/datapusher-plus/pull/199
27+
* auto unzip one file setting by @jqnatividad in https://github.com/dathere/datapusher-plus/pull/200
28+
* add LRU caches to potentially expensive Formula methods by @jqnatividad in https://github.com/dathere/datapusher-plus/pull/201
29+
* feat: add `dpp_suggestions.STATUS` to track formulae processing progress by @jqnatividad in https://github.com/dathere/datapusher-plus/pull/202
30+
* refactor dpp_suggestions.STATUS to sync with Suggestions UI by @jqnatividad in https://github.com/dathere/datapusher-plus/pull/203
31+
* Refactor: remove stats & freq table save to datastore by @jqnatividad in https://github.com/dathere/datapusher-plus/pull/204
32+
33+
## New Contributors
34+
* @rzmk made their first contribution in https://github.com/dathere/datapusher-plus/pull/186
35+
36+
**Full Changelog**: https://github.com/dathere/datapusher-plus/compare/2.0.0...2.1.0
37+
938
## [2.0.0] - 2025-04-25
1039

11-
## Highlights
12-
Data Resource Upload First (DRUF) Workflow is here!
40+
## 🎉 Data Resource Upload First (DRUF) Workflow is finally here! 🎉
1341
A workflow that flips traditional CKAN data ingestion on its head.
1442
* Instead of filling out the metadata first and then uploading the data, users upload data resources first
1543
* In a few seconds, even for very large datasets, analysis and validation are done while precompiling statistical metadata
16-
* This precompiled metadata are then used by formulas defined in the scheming yaml files to either precompute other metadata fields and/or to offer metadata suggestions
17-
* Formulas use the same powerful Jinja2 template engine that powers CKAN's templating system.
18-
* It comes with an extensible library of Jinja2 filters/functions that can be used in formulas ala Excel.
19-
20-
The DRUF reinvents CKAN data ingestion - making it easier for Data Publishers to ensure their data catalog has high-quality, high-resolution metadata that actually reflects and describes the data in the catalog.
21-
44+
* This precompiled metadata is then used by Metadata Formulae defined in the [scheming](https://github.com/ckan/ckanext-scheming?tab=readme-ov-file#ckanext-scheming) yaml files to either precompute other metadata fields (on both package & resource levels) or to offer metadata suggestions
45+
* Metadata Formulae use the same powerful Jinja2 template engine that powers CKAN's templating system.
46+
* It comes with an extensible library of Jinja2 filters/functions that can be used in Metadata Formulae ala Excel.
47+
48+
The DRUF reinvents CKAN data ingestion by automatically calculating or suggesting "**Automagical Metadata**": high-quality, high-resolution metadata that reflects and describes what's **INSIDE** the dataset (e.g. summary stats, frequency table, spatial extent, date range, and outliers, calculated with Metadata Formulae), in addition to metadata about the dataset **FILE** (e.g. last updated, file size, owner, format, and license, which is what traditional data catalogs normally contain).
49+
50+
Future improvements planned:
51+
- **Expanded Data Dictionary**
52+
53+
- **"entry-time" Metadata Formulae**
54+
In addition to the two formula types (`formula` to set a metadata field directly during creation/update, and `suggestion_formula` to suggest values using the Bootstrap Popover UI), we'll let Data Publishers enter formulas while they're entering metadata, fully embracing the Excel formula UI/UX aesthetic.
55+
- **DCAT3-optimized reference profiles**
56+
Reference scheming profiles that follow the implementation guidance for both [DCAT-US v3](https://doi-do.github.io/dcat-us/) and [DCAT-AP 3](https://semiceu.github.io/DCAT-AP/releases/3.0.0/), with Metadata Formulae that compute recommended and optional properties. These let publishers take fuller advantage of DCAT3 features and improvements, including metadata properties that are often too laborious to compile manually.
57+
- **Co-Curator AI**
58+
"Automagical metadata" is the perfect context for AI engines - as it summarizes even very large datasets in just a few kilobytes. It allows the Co-Curator[^1] to suggest tags, descriptions, links to related data sets and chat about the corpus WHILE the Data Publisher is curating the data.
59+
- **Inline Data Validation**
60+
Optional ability to [infer an initial JSON Schema validation file](https://github.com/dathere/qsv?tab=readme-ov-file#schema_deeplink), and then [validate future updates](https://github.com/dathere/qsv?tab=readme-ov-file#validate_deeplink) to the dataset using it, leveraging the same blazing-fast qsv engine (validating up to 340,000 records per second[^2]).
61+
- **Customizable DRUF Data ingestion pipeline**
62+
Currently, there are [numerous configuration settings](https://github.com/dathere/datapusher-plus/blob/main/ckanext/datapusher_plus/config.py) to fine-tune the DRUF data-ingestion pipeline. However, the built-in default pipeline can only be customized to a limited extent without modifying the code. We will expose hooks that CKAN operators can use to tailor their DRUF pipelines to their requirements, while preserving access to the precompiled statistical metadata that DP+ maintains.
63+
- **Dynamic loading of Formula filters/functions**
64+
Lets users share the custom Jinja2 filters and functions they develop for their Metadata Formulae.
65+
- **Inline Data Enrichment**
66+
Data can optionally be enriched during ingestion, using other reference datasets within the same CKAN instance or external sources (e.g. high-value curated sources like the Census, geocoding, etc.)
67+
- **and more!**
68+
It took a while for us to bake 2.0.0, but we look forward to picking up the pace and co-innovating with the CKAN ecosystem.
69+
70+
71+
> NOTE: To fully experience the DRUF workflow, you'll need to use [scheming dataset form pages](https://excess.org/scheming-formpages/) and apply some CKAN core changes. A detailed installation procedure will be published on the Wiki shortly.
72+
73+
[^1]: Inspired by the [Curator in Ready Player One](https://hero.fandom.com/wiki/Curator)
74+
[^2]: `validate_index benchmark` - https://qsv.dathere.com/benchmarks
2275
---
2376

2477
### Added

CONFIG.md

Lines changed: 96 additions & 0 deletions
@@ -0,0 +1,96 @@
1+
# DataPusher Plus Configuration
2+
3+
## Optional Features
4+
5+
DataPusher Plus includes some optional features that can be enabled through configuration. These features are disabled by default to ensure compatibility with different CKAN versions.
6+
7+
### IFormRedirect Support
8+
9+
The IFormRedirect interface provides custom redirect behavior after dataset and resource form submissions. This interface is only available in certain CKAN branches and is not yet merged into the main CKAN codebase.
10+
11+
**Note**: IFormRedirect methods are only defined when this feature is enabled, so the plugin exposes no redirect hooks when it is disabled.
12+
13+
**Configuration:**
14+
```ini
15+
# Enable IFormRedirect functionality (default: false)
16+
ckanext.datapusher_plus.enable_form_redirect = true
17+
```
18+
19+
**What it does:**
20+
- **Dynamically adds IFormRedirect methods** only when enabled
21+
- Provides custom redirect URLs after dataset/resource creation or editing
22+
- Redirects to dataset page after dataset metadata submission
23+
- Redirects to resource view after resource editing
24+
- Allows "add another resource" workflow
25+
- **Works best with DRUF** for complete resource-first workflow
26+
27+
**Requirements:**
28+
- CKAN version with IFormRedirect interface support
29+
- If the interface is not available, the feature will be automatically disabled with a warning
30+
- **Recommended**: Enable together with DRUF for optimal resource-first experience
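
The enable-and-detect behavior described above can be sketched roughly as follows. This is an illustrative sketch only, not the plugin's actual code: `install_form_redirect`, `as_bool`, `SketchPlugin`, and the method name `redirect_after_dataset_save` are hypothetical, and a `SimpleNamespace` stands in for the CKAN module that may or may not provide `IFormRedirect`. Only the configuration key is taken from the text.

```python
import logging
from types import SimpleNamespace

log = logging.getLogger("dpp_sketch")

KEY = "ckanext.datapusher_plus.enable_form_redirect"


def as_bool(value):
    """Interpret common ini-style truthy strings; anything else is False."""
    return str(value).strip().lower() in {"true", "yes", "on", "1"}


def install_form_redirect(plugin_cls, config, interfaces):
    """Attach redirect methods to plugin_cls only if enabled and supported.

    Returns True when the methods were attached, False otherwise.
    """
    if not as_bool(config.get(KEY, "false")):
        return False  # default: nothing is added to the plugin
    if getattr(interfaces, "IFormRedirect", None) is None:
        log.warning("%s is set, but IFormRedirect is unavailable; feature disabled", KEY)
        return False

    def redirect_after_dataset_save(self, dataset_name):
        # Hypothetical method name and URL shape, for illustration only.
        return f"/dataset/{dataset_name}"

    plugin_cls.redirect_after_dataset_save = redirect_after_dataset_save
    log.info("IFormRedirect support enabled")
    return True


class SketchPlugin:
    """Stand-in for the real plugin class."""


without_interface = SimpleNamespace()                    # CKAN build lacking the interface
with_interface = SimpleNamespace(IFormRedirect=object)   # CKAN build providing it

print(install_form_redirect(SketchPlugin, {KEY: "true"}, without_interface))
print(install_form_redirect(SketchPlugin, {KEY: "true"}, with_interface))
print(SketchPlugin().redirect_after_dataset_save("my-data"))
```

Defining the methods inside the guarded branch is one way to keep the class untouched when the option is off or the interface is missing; the real plugin may achieve this differently.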
31+
32+
### DRUF (Data Resource Upload First) Support
33+
34+
DRUF allows users to upload resources before creating the dataset metadata, providing a resource-first workflow.
35+
36+
**Configuration:**
37+
```ini
38+
# Enable DRUF functionality (default: false)
39+
ckanext.datapusher_plus.enable_druf = true
40+
```
41+
42+
**What it does:**
43+
- Adds a `/resource-first/new` endpoint
44+
- Creates a temporary dataset and redirects to resource upload
45+
- Useful for workflows where users want to upload data files first
46+
- **Overrides templates**: Modifies "Add Dataset" buttons and form stages to support resource-first workflow
47+
48+
**Template Overrides:**
49+
When DRUF is enabled, the following templates are overridden:
50+
- `snippets/add_dataset.html`: Changes "Add Dataset" to redirect to resource upload
51+
- `package/snippets/package_form.html`: Modifies form stages to show "Add data" first
52+
- `scheming/package/snippets/package_form.html`: Modifies scheming form stages
53+
54+
**Requirements:**
55+
- No special CKAN version requirements
56+
- Works with standard CKAN installations
57+
- Compatible with ckanext-scheming
58+
59+
## Example Configuration
60+
61+
Add these lines to your CKAN configuration file (e.g., `/etc/ckan/default/ckan.ini`):
62+
63+
```ini
64+
# Enable DRUF (Data Resource Upload First) workflow
65+
ckanext.datapusher_plus.enable_druf = true
66+
67+
# Enable IFormRedirect for better form redirects (recommended with DRUF)
68+
ckanext.datapusher_plus.enable_form_redirect = true
69+
```
70+
71+
**Recommended combinations:**
72+
- **Standard mode**: Both disabled (default) - maintains standard CKAN behavior
73+
- **Resource-first workflow**: Both enabled - complete resource-first experience
74+
- **DRUF only**: Only `enable_druf = true` - resource-first without custom redirects
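
To check which of these combinations a given configuration file selects, something like the following sketch could be used. It relies only on Python's standard `configparser`; the helper names `read_feature_flags` and `describe_mode` are hypothetical, and it assumes the options live in the `[app:main]` section of the CKAN ini file.

```python
import configparser

SAMPLE_INI = """
[app:main]
ckanext.datapusher_plus.enable_druf = true
ckanext.datapusher_plus.enable_form_redirect = false
"""

PREFIX = "ckanext.datapusher_plus."


def read_feature_flags(ini_text, section="app:main"):
    """Return both optional-feature flags, defaulting to False when unset."""
    # interpolation=None avoids errors on values such as %(here)s in real ini files.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    return {
        name: parser.getboolean(section, PREFIX + name, fallback=False)
        for name in ("enable_druf", "enable_form_redirect")
    }


def describe_mode(flags):
    """Map the two flags onto the combinations listed above."""
    druf, redirect = flags["enable_druf"], flags["enable_form_redirect"]
    if druf and redirect:
        return "Resource-first workflow"
    if druf:
        return "DRUF only"
    if redirect:
        return "Form redirects only (not one of the listed combinations)"
    return "Standard mode"


flags = read_feature_flags(SAMPLE_INI)
print(flags, "->", describe_mode(flags))
```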
75+
76+
## Template Organization
77+
78+
DataPusher Plus uses a conditional template loading system to avoid conflicts when optional features are disabled:
79+
80+
- **Base templates** (`templates/`): Always loaded, provides standard DataPusher Plus functionality
81+
- **DRUF templates** (`templates/druf/`): Only loaded when `enable_druf = true`, overrides default dataset creation workflow
82+
83+
This ensures that when DRUF is disabled, your CKAN installation maintains completely standard behavior without any template modifications.
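
A minimal sketch of the conditional loading idea, assuming a lookup where earlier directories take precedence. The function names, the `AVAILABLE` mapping, and the base template name `datapusher/resource_data.html` are hypothetical illustrations; only the directory names and `snippets/add_dataset.html` come from the text above.

```python
def template_directories(enable_druf):
    """Return template directories in lookup order (first match wins in this sketch)."""
    directories = ["templates"]
    if enable_druf:
        # Put the DRUF overrides ahead of the base templates.
        directories.insert(0, "templates/druf")
    return directories


def resolve(template_name, directories, available):
    """Return the first directory/template path that provides template_name.

    `available` maps a directory to the set of template names it contains.
    None means no plugin directory provides it, so CKAN's own template is used.
    """
    for directory in directories:
        if template_name in available.get(directory, set()):
            return f"{directory}/{template_name}"
    return None


AVAILABLE = {
    "templates": {"datapusher/resource_data.html"},      # hypothetical base template
    "templates/druf": {"snippets/add_dataset.html"},     # override named in the text
}

for enabled in (False, True):
    dirs = template_directories(enabled)
    print(enabled, resolve("snippets/add_dataset.html", dirs, AVAILABLE))
```

With DRUF disabled, the override directory is never registered, so the lookup for `snippets/add_dataset.html` falls through to CKAN's standard template.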
84+
85+
## Backwards Compatibility
86+
87+
When these features are disabled (default), DataPusher Plus maintains full backwards compatibility with standard CKAN installations. The plugin will automatically detect if required interfaces are available and disable features gracefully if they are not supported.
88+
89+
## Logging
90+
91+
The plugin will log the status of these features:
92+
- Info messages when features are successfully enabled
93+
- Warning messages when features are configured but not available
94+
- Debug messages for DRUF blueprint registration
95+
96+
Check your CKAN logs to verify the status of these optional features.
