From 6402532e56b9be5475f84e05ee6c5579e29e6f2a Mon Sep 17 00:00:00 2001 From: Nicolas Dupont Date: Thu, 13 Nov 2025 17:11:06 +0100 Subject: [PATCH 1/3] Add datagouv dataset publishing documentation --- content/api/cli.md | 30 +++++++++++++++---- .../collections/reference/configuration.md | 20 ++++++++++++- .../reference/environment-variables.md | 6 ++-- 3 files changed, 47 insertions(+), 9 deletions(-) diff --git a/content/api/cli.md b/content/api/cli.md index 3d1440fe..cf5103d7 100644 --- a/content/api/cli.md +++ b/content/api/cli.md @@ -48,19 +48,37 @@ In these commands: ## Publishing dataset -{{< refItem name="ota dataset [--file ]" description="Export the versions dataset into a ZIP file and publish it to GitHub releases. The dataset title and the URL of the versions repository are defined in the configuration." example="npx ota dataset --file dataset.zip" />}} +{{< refItem name="ota dataset [--file ]" description="Export the versions dataset into a ZIP file. The dataset title is defined in the configuration." example="npx ota dataset --file dataset.zip" />}} -To export the dataset into a ZIP file and publish it on GitHub releases: +To export the dataset into a ZIP file and publish it to configured platforms (GitHub releases, GitLab releases, and/or data.gouv.fr): -{{< refItem name="ota dataset --publish [--file ]" description="Export and publish dataset to GitHub releases" example="GITHUB_TOKEN=ghp_XXXXXXXXX npx ota dataset --publish" />}} +{{< refItem name="ota dataset --publish [--file ]" description="Export and publish dataset to all configured platforms" example="npx ota dataset --publish" />}} -The `GITHUB_TOKEN` can also be defined in a [`.env` file]({{< relref "collections/reference/environment-variables" >}}). 
+The dataset can be published to multiple platforms simultaneously: + +- **GitHub releases**: Requires `OTA_ENGINE_GITHUB_TOKEN` environment variable +- **GitLab releases**: Requires `OTA_ENGINE_GITLAB_TOKEN` environment variable (used only if GitHub token is not configured) +- **data.gouv.fr**: Requires `OTA_ENGINE_DATAGOUV_API_KEY` environment variable and `dataset.datagouv.datasetId` in configuration + +These environment variables can be defined in a [`.env` file]({{< relref "collections/reference/environment-variables" >}}). + +> **Note**: If both GitHub and GitLab tokens are configured, GitHub takes precedence. data.gouv.fr can be used alongside either GitHub or GitLab. To export, publish the dataset and remove the local copy that was created after it has been uploaded: -{{< refItem name="ota dataset --publish --remove-local-copy [--file ]" description="Export, publish dataset and remove local copy after upload" example="GITHUB_TOKEN=ghp_XXXXXXXXX npx ota dataset --publish --remove-local-copy" />}} +{{< refItem name="ota dataset --publish --remove-local-copy [--file ]" description="Export, publish dataset and remove local copy after upload" example="npx ota dataset --publish --remove-local-copy" />}} + +{{< refItem name="ota dataset --schedule [--file ]" description="Schedule export, publishing and local copy removal" example="npx ota dataset --schedule --publish --remove-local-copy" />}} + +### Setting up data.gouv.fr publishing + +Before publishing to data.gouv.fr for the first time, you need to: -{{< refItem name="ota dataset --schedule [--file ]" description="Schedule export, publishing and local copy removal" example="GITHUB_TOKEN=ghp_XXXXXXXXX npx ota dataset --schedule --publish --remove-local-copy" />}} +1. Create a dataset on [data.gouv.fr](https://www.data.gouv.fr/) or [demo.data.gouv.fr](https://demo.data.gouv.fr/) (for testing) +2. Copy the dataset ID from the URL (e.g., `6914a64b17a0a91bb0a61222`) +3. 
Add it to your configuration at `dataset.datagouv.datasetId` +4. Set `dataset.datagouv.useDemo` to `true` if using the demo environment +5. Set the `OTA_ENGINE_DATAGOUV_API_KEY` environment variable with your API key ## Exposing the collection API diff --git a/content/collections/reference/configuration.md b/content/collections/reference/configuration.md index ec4be876..8032c67e 100644 --- a/content/collections/reference/configuration.md +++ b/content/collections/reference/configuration.md @@ -180,7 +180,7 @@ The reporter section manages how issues are reported when terms content is inacc ### Dataset -The dataset section configures how datasets are published. +The dataset section configures how datasets are published. Datasets can be published to GitHub releases, GitLab releases, and/or data.gouv.fr. If both GitHub and GitLab tokens are configured, GitHub takes precedence. {{< refItem name="dataset.title" @@ -203,6 +203,24 @@ The dataset section configures how datasets are published. default="30 8 * * MON" />}} +#### data.gouv.fr publishing + +The data.gouv.fr section configures publishing to the French government's open data platform. + +{{< refItem + name="dataset.datagouv.datasetId" + type="string" + description="ID of the dataset on data.gouv.fr. Required for publishing to data.gouv.fr. Must be created manually on the data.gouv.fr website." + example="6914a64b17a0a91bb0a61222" +/>}} + +{{< refItem + name="dataset.datagouv.useDemo" + type="boolean" + description="Set to true to use the demo.data.gouv.fr environment for testing." + default="false" +/>}} + ### Collection API The collection API section sets the parameters for the API server. 
diff --git a/content/collections/reference/environment-variables.md b/content/collections/reference/environment-variables.md index 4c84277d..1e238322 100644 --- a/content/collections/reference/environment-variables.md +++ b/content/collections/reference/environment-variables.md @@ -13,12 +13,14 @@ This reference documentation details all available environment variables that ca {{< refItem name="OTA_ENGINE_SENDINBLUE_API_KEY" type="string" description="API key for SendInBlue." />}} -{{< refItem name="OTA_ENGINE_GITHUB_TOKEN" type="string" description="GitHub token for API access." />}} +{{< refItem name="OTA_ENGINE_GITHUB_TOKEN" type="string" description="GitHub token for API access and dataset publishing. If both GitHub and GitLab tokens are configured, GitHub takes precedence for dataset publishing." />}} -{{< refItem name="OTA_ENGINE_GITLAB_TOKEN" type="string" description="GitLab token for API access." />}} +{{< refItem name="OTA_ENGINE_GITLAB_TOKEN" type="string" description="GitLab token for API access and dataset publishing. Used only if GitHub token is not configured." />}} {{< refItem name="OTA_ENGINE_GITLAB_RELEASES_TOKEN" type="string" description="GitLab token for dataset releases." />}} +{{< refItem name="OTA_ENGINE_DATAGOUV_API_KEY" type="string" description="API key for data.gouv.fr dataset publishing. Requires dataset.datagouv.datasetId to be configured." 
/>}} + --- ### Federation API From bdd4c14d3411c5c2e502a829e7ac7fabfdb93efc Mon Sep 17 00:00:00 2001 From: Nicolas Dupont Date: Thu, 20 Nov 2025 16:10:20 +0100 Subject: [PATCH 2/3] Update datagouv documentation --- content/api/cli.md | 12 +--------- .../collections/reference/configuration.md | 22 +++++++++++++++---- .../reference/environment-variables.md | 4 ++-- 3 files changed, 21 insertions(+), 17 deletions(-) diff --git a/content/api/cli.md b/content/api/cli.md index cf5103d7..eff6bd23 100644 --- a/content/api/cli.md +++ b/content/api/cli.md @@ -58,7 +58,7 @@ The dataset can be published to multiple platforms simultaneously: - **GitHub releases**: Requires `OTA_ENGINE_GITHUB_TOKEN` environment variable - **GitLab releases**: Requires `OTA_ENGINE_GITLAB_TOKEN` environment variable (used only if GitHub token is not configured) -- **data.gouv.fr**: Requires `OTA_ENGINE_DATAGOUV_API_KEY` environment variable and `dataset.datagouv.datasetId` in configuration +- **data.gouv.fr**: Requires `OTA_ENGINE_DATAGOUV_API_KEY` environment variable. To set up data.gouv.fr publishing, see the [guide to publish datasets to data.gouv.fr]({{< relref "collections/how-to/publish-to-datagouv" >}}). These environment variables can be defined in a [`.env` file]({{< relref "collections/reference/environment-variables" >}}). @@ -70,16 +70,6 @@ To export, publish the dataset and remove the local copy that was created after {{< refItem name="ota dataset --schedule [--file ]" description="Schedule export, publishing and local copy removal" example="npx ota dataset --schedule --publish --remove-local-copy" />}} -### Setting up data.gouv.fr publishing - -Before publishing to data.gouv.fr for the first time, you need to: - -1. Create a dataset on [data.gouv.fr](https://www.data.gouv.fr/) or [demo.data.gouv.fr](https://demo.data.gouv.fr/) (for testing) -2. Copy the dataset ID from the URL (e.g., `6914a64b17a0a91bb0a61222`) -3. Add it to your configuration at `dataset.datagouv.datasetId` -4. 
Set `dataset.datagouv.useDemo` to `true` if using the demo environment -5. Set the `OTA_ENGINE_DATAGOUV_API_KEY` environment variable with your API key - ## Exposing the collection API {{< refItem name="ota serve" description="Start the collection Web API server. The Web API will be available under `:///`. The server port and base path are defined in the configuration." example="npx ota serve" />}} diff --git a/content/collections/reference/configuration.md b/content/collections/reference/configuration.md index 8032c67e..203041d7 100644 --- a/content/collections/reference/configuration.md +++ b/content/collections/reference/configuration.md @@ -199,25 +199,39 @@ The dataset section configures how datasets are published. Datasets can be publi {{< refItem name="dataset.publishingSchedule" type="string" - description="Cron expression for dataset publishing. By default, it runs every Monday at 8:30 AM." + description="Cron expression for dataset publishing. By default, it runs every Monday at 8:30 AM. If publishing to data.gouv.fr, remember to update `dataset.datagouv.frequency` to match the actual publishing frequency." default="30 8 * * MON" />}} #### data.gouv.fr publishing -The data.gouv.fr section configures publishing to the French government's open data platform. +The data.gouv.fr section configures publishing to the French government's open data platform. Either `datasetId` or `organizationIdOrSlug` must be configured. {{< refItem name="dataset.datagouv.datasetId" type="string" - description="ID of the dataset on data.gouv.fr. Required for publishing to data.gouv.fr. Must be created manually on the data.gouv.fr website." + description="ID of an existing dataset on data.gouv.fr. Use this to publish to an existing dataset. Either this or `organizationIdOrSlug` is required." example="6914a64b17a0a91bb0a61222" />}} +{{< refItem + name="dataset.datagouv.organizationIdOrSlug" + type="string" + description="ID or slug of the organization on data.gouv.fr. 
Use this to automatically create and publish a dataset. The dataset will be created with the title from `dataset.title` if it doesn't exist. Either this or `datasetId` is required." + example="open-terms-archive" +/>}} + +{{< refItem + name="dataset.datagouv.frequency" + type="string" + description="Update frequency of the dataset. Used when creating or updating a dataset on data.gouv.fr. See [data.gouv.fr API](https://www.data.gouv.fr/api/1/datasets/frequencies/) for all allowed values." + example="weekly" +/>}} + {{< refItem name="dataset.datagouv.useDemo" type="boolean" - description="Set to true to use the demo.data.gouv.fr environment for testing." + description="Set to `true` to use the demo.data.gouv.fr environment for testing." default="false" />}} diff --git a/content/collections/reference/environment-variables.md b/content/collections/reference/environment-variables.md index 1e238322..a7dceaac 100644 --- a/content/collections/reference/environment-variables.md +++ b/content/collections/reference/environment-variables.md @@ -19,10 +19,10 @@ This reference documentation details all available environment variables that ca {{< refItem name="OTA_ENGINE_GITLAB_RELEASES_TOKEN" type="string" description="GitLab token for dataset releases." />}} -{{< refItem name="OTA_ENGINE_DATAGOUV_API_KEY" type="string" description="API key for data.gouv.fr dataset publishing. Requires dataset.datagouv.datasetId to be configured." />}} +{{< refItem name="OTA_ENGINE_DATAGOUV_API_KEY" type="string" description="API key for data.gouv.fr dataset publishing." />}} --- -### Federation API +### Federation API {{< refItem name="OTA_FEDERATION_API_SMTP_PASSWORD" type="string" description="SMTP password for email error notifications." 
/>}} From cd9559a76cf64549615a09ba080788de099c6c09 Mon Sep 17 00:00:00 2001 From: Nicolas Dupont Date: Mon, 24 Nov 2025 11:36:33 +0100 Subject: [PATCH 3/3] Add guide to publish on data.gouv.fr --- .../collections/how-to/publish-to-datagouv.md | 119 ++++++++++++++++++ 1 file changed, 119 insertions(+) create mode 100644 content/collections/how-to/publish-to-datagouv.md diff --git a/content/collections/how-to/publish-to-datagouv.md b/content/collections/how-to/publish-to-datagouv.md new file mode 100644 index 00000000..74e4f16e --- /dev/null +++ b/content/collections/how-to/publish-to-datagouv.md @@ -0,0 +1,119 @@ +--- +title: Publish to data.gouv.fr +weight: 6 +--- + +# How to publish datasets to data.gouv.fr + +This guide explains how to configure your collection to automatically publish datasets to [data.gouv.fr](https://www.data.gouv.fr/), the French government's open data platform. + +## Prerequisites + +- A [data.gouv.fr](https://www.data.gouv.fr/) account +- An API key from your data.gouv.fr account settings +- Either an existing dataset or an organization where you can create datasets + +## Choose your approach + +There are two ways to publish datasets to data.gouv.fr: + +### Option 1: Automatically create a dataset in an organization + +This approach is suitable when you want the system to automatically create and manage the dataset within your organization. + +1. Find your organization on [data.gouv.fr](https://www.data.gouv.fr/) (or [demo.data.gouv.fr](https://demo.data.gouv.fr/) for testing) +2. Copy the organization ID or slug from the URL (e.g., `open-terms-archive`) +3. Add it to your configuration at `dataset.datagouv.organizationIdOrSlug` +4. Set `dataset.title` in your configuration (this will be used as the dataset title) + +The dataset will be automatically created if it doesn't already exist in the organization. 
+ +### Option 2: Use an existing dataset + +This approach is suitable when you already have a dataset created on data.gouv.fr and want to update it automatically. + +1. Create a dataset on [data.gouv.fr](https://www.data.gouv.fr/) (or [demo.data.gouv.fr](https://demo.data.gouv.fr/) for testing) +2. Copy the dataset ID from the Informations tab on the dataset page (e.g., `6914a64b17a0a91bb0a61222`) +3. Add it to your configuration at `dataset.datagouv.datasetId` + +## Configure your collection + +### 1. Add configuration settings + +In your collection's configuration file (e.g., `config/production.json`), add the `datagouv` settings under the `dataset` section: + +**For Option 1 (automatic creation):** + +```json +{ + "dataset": { + "title": " collection dataset", + "datagouv": { + "organizationIdOrSlug": "open-terms-archive" + } + } +} +``` + +**For Option 2 (existing dataset):** + +```json +{ + "dataset": { + "title": " collection dataset", + "datagouv": { + "datasetId": "6914a64b17a0a91bb0a61222" + } + } +} +``` + +### 2. Configure for testing (optional) + +If you want to test with the demo environment first, add `useDemo`: + +```json +{ + "dataset": { + "title": " collection dataset", + "datagouv": { + "organizationIdOrSlug": "open-terms-archive", + "useDemo": true + } + } +} +``` + +### 3. Set the API key + +Create a `.env` file at the root of your collection repository (if it doesn't already exist) and add your data.gouv.fr API key: + +``` +OTA_ENGINE_DATAGOUV_API_KEY=your_api_key_here +``` + +## Test your configuration + +You can test your configuration by manually publishing a dataset: + +```bash +npx ota dataset --publish +``` + +This will create and publish a dataset to data.gouv.fr. Check the output to verify the dataset was published successfully. 
+ +## Set up automatic publishing + +To automatically publish datasets on a schedule, use the `--schedule` flag: + +```bash +npx ota dataset --schedule --publish --remove-local-copy +``` + +This will publish datasets according to the schedule defined in your configuration (by default, every Monday at 8:30 AM). + +## Publishing to multiple platforms + +data.gouv.fr publishing can be used alongside GitHub or GitLab releases. Simply configure both platforms and datasets will be published to all configured platforms simultaneously. + +See the [configuration reference]({{< relref "collections/reference/configuration#datagouv-fr-publishing" >}}) for all available options.
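
Before running the publish command, it can help to sanity-check the `dataset.datagouv` settings against the rule stated in the configuration reference (either `datasetId` or `organizationIdOrSlug` must be set). The following is a minimal sketch of such a check, not the engine's own validation: the function names `check_datagouv_config` and `target_host` are hypothetical, and the rule that `dataset.title` is needed for automatic creation is inferred from Option 1 of the guide.

```python
import json


def check_datagouv_config(config: dict) -> list[str]:
    """Return a list of problems found in the `dataset.datagouv` settings.

    Hypothetical helper: it only mirrors the rules described in the guide.
    """
    problems = []
    dataset = config.get("dataset", {})
    datagouv = dataset.get("datagouv", {})

    has_dataset_id = bool(datagouv.get("datasetId"))
    has_organization = bool(datagouv.get("organizationIdOrSlug"))

    # The reference states that one of the two identifiers is required.
    if not (has_dataset_id or has_organization):
        problems.append(
            "set dataset.datagouv.datasetId or dataset.datagouv.organizationIdOrSlug"
        )

    # Assumption: automatic creation (Option 1) uses dataset.title as the title.
    if has_organization and not has_dataset_id and not dataset.get("title"):
        problems.append("set dataset.title, used as the title of the created dataset")

    return problems


def target_host(config: dict) -> str:
    """Return the platform host implied by `useDemo` (which defaults to false)."""
    use_demo = config.get("dataset", {}).get("datagouv", {}).get("useDemo", False)
    return "demo.data.gouv.fr" if use_demo else "www.data.gouv.fr"


# Example configuration; the title value is an illustrative placeholder.
example = json.loads("""
{
  "dataset": {
    "title": "Example collection dataset",
    "datagouv": {
      "organizationIdOrSlug": "open-terms-archive",
      "useDemo": true
    }
  }
}
""")

print(check_datagouv_config(example))
print(target_host(example))
```

An empty list means the settings satisfy the documented requirements. The API key is deliberately not checked here, since it lives in the environment (`OTA_ENGINE_DATAGOUV_API_KEY`) rather than in the configuration file.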