Commit f8f8476

Add data.gouv dataset publishing documentation (#204)
2 parents 5b82499 + cd9559a commit f8f8476

4 files changed

+172
-11
lines changed

content/api/cli.md

Lines changed: 14 additions & 6 deletions
@@ -58,19 +58,27 @@ In these commands:
5858

5959
## Publishing dataset
6060

61-
{{< refItem name="ota dataset [--file <filename>]" description="Export the versions dataset into a ZIP file and publish it to GitHub releases. The dataset title and the URL of the versions repository are defined in the configuration." example="npx ota dataset --file dataset.zip" />}}
61+
{{< refItem name="ota dataset [--file <filename>]" description="Export the versions dataset into a ZIP file. The dataset title is defined in the configuration." example="npx ota dataset --file dataset.zip" />}}
6262

63-
To export the dataset into a ZIP file and publish it on GitHub releases:
63+
To export the dataset into a ZIP file and publish it to configured platforms (GitHub releases, GitLab releases, and/or data.gouv.fr):
6464

65-
{{< refItem name="ota dataset --publish [--file <filename>]" description="Export and publish dataset to GitHub releases" example="GITHUB_TOKEN=ghp_XXXXXXXXX npx ota dataset --publish" />}}
65+
{{< refItem name="ota dataset --publish [--file <filename>]" description="Export and publish dataset to all configured platforms" example="npx ota dataset --publish" />}}
6666

67-
The `GITHUB_TOKEN` can also be defined in a [`.env` file]({{< relref "collections/reference/environment-variables" >}}).
67+
The dataset can be published to multiple platforms simultaneously:
68+
69+
- **GitHub releases**: Requires `OTA_ENGINE_GITHUB_TOKEN` environment variable
70+
- **GitLab releases**: Requires `OTA_ENGINE_GITLAB_TOKEN` environment variable (used only if GitHub token is not configured)
71+
- **data.gouv.fr**: Requires `OTA_ENGINE_DATAGOUV_API_KEY` environment variable. To set up data.gouv.fr publishing, see the [guide to publish datasets to data.gouv.fr]({{< relref "collections/how-to/publish-to-datagouv" >}}).
72+
73+
These environment variables can be defined in a [`.env` file]({{< relref "collections/reference/environment-variables" >}}).
74+
75+
> **Note**: If both GitHub and GitLab tokens are configured, GitHub takes precedence. data.gouv.fr can be used alongside either GitHub or GitLab.
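The precedence rule in the note can be sketched as follows. This is a minimal Python illustration, not the engine's actual code: the function name `resolve_platforms` and the platform labels are hypothetical, and only the three environment variable names come from the documentation above. It also ignores the `dataset.datagouv` configuration that data.gouv.fr publishing additionally needs.

```python
import os


def resolve_platforms(env=None):
    """Return the platforms a publish run would target under the documented rules.

    Hypothetical helper for illustration; not part of the `ota` CLI.
    """
    env = os.environ if env is None else env
    platforms = []

    # GitHub takes precedence: GitLab is considered only without a GitHub token.
    if env.get("OTA_ENGINE_GITHUB_TOKEN"):
        platforms.append("github")
    elif env.get("OTA_ENGINE_GITLAB_TOKEN"):
        platforms.append("gitlab")

    # data.gouv.fr is independent and can be combined with either of the above.
    if env.get("OTA_ENGINE_DATAGOUV_API_KEY"):
        platforms.append("datagouv")

    return platforms


print(resolve_platforms({
    "OTA_ENGINE_GITHUB_TOKEN": "ghp_example",
    "OTA_ENGINE_GITLAB_TOKEN": "glpat_example",
    "OTA_ENGINE_DATAGOUV_API_KEY": "key_example",
}))  # ['github', 'datagouv']
```

With only the GitLab token and the data.gouv.fr key set, the same function would return `['gitlab', 'datagouv']`.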
6876
6977
To export and publish the dataset, then remove the local copy once it has been uploaded:
7078

71-
{{< refItem name="ota dataset --publish --remove-local-copy [--file <filename>]" description="Export, publish dataset and remove local copy after upload" example="GITHUB_TOKEN=ghp_XXXXXXXXX npx ota dataset --publish --remove-local-copy" />}}
79+
{{< refItem name="ota dataset --publish --remove-local-copy [--file <filename>]" description="Export, publish dataset and remove local copy after upload" example="npx ota dataset --publish --remove-local-copy" />}}
7280

73-
{{< refItem name="ota dataset --schedule [--file <filename>]" description="Schedule export, publishing and local copy removal" example="GITHUB_TOKEN=ghp_XXXXXXXXX npx ota dataset --schedule --publish --remove-local-copy" />}}
81+
{{< refItem name="ota dataset --schedule [--file <filename>]" description="Schedule export, publishing and local copy removal" example="npx ota dataset --schedule --publish --remove-local-copy" />}}
7482

7583
## Exposing the collection API
7684

Lines changed: 119 additions & 0 deletions
@@ -0,0 +1,119 @@
1+
---
2+
title: Publish to data.gouv.fr
3+
weight: 6
4+
---
5+
6+
# How to publish datasets to data.gouv.fr
7+
8+
This guide explains how to configure your collection to automatically publish datasets to [data.gouv.fr](https://www.data.gouv.fr/), the French government's open data platform.
9+
10+
## Prerequisites
11+
12+
- A [data.gouv.fr](https://www.data.gouv.fr/) account
13+
- An API key from your data.gouv.fr account settings
14+
- Either an existing dataset or an organization where you can create datasets
15+
16+
## Choose your approach
17+
18+
There are two ways to publish datasets to data.gouv.fr:
19+
20+
### Option 1: Automatically create a dataset in an organization
21+
22+
This approach is suitable when you want the system to automatically create and manage the dataset within your organization.
23+
24+
1. Find your organization on [data.gouv.fr](https://www.data.gouv.fr/) (or [demo.data.gouv.fr](https://demo.data.gouv.fr/) for testing)
25+
2. Copy the organization ID or slug from the URL (e.g., `open-terms-archive`)
26+
3. Add it to your configuration at `dataset.datagouv.organizationIdOrSlug`
27+
4. Set `dataset.title` in your configuration (this will be used as the dataset title)
28+
29+
The dataset will be automatically created if it doesn't already exist in the organization.
30+
31+
### Option 2: Use an existing dataset
32+
33+
This approach is suitable when you already have a dataset created on data.gouv.fr and want to update it automatically.
34+
35+
1. Create a dataset on [data.gouv.fr](https://www.data.gouv.fr/) (or [demo.data.gouv.fr](https://demo.data.gouv.fr/) for testing)
36+
2. Copy the dataset ID from the Informations tab on the dataset page (e.g., `6914a64b17a0a91bb0a61222`)
37+
3. Add it to your configuration at `dataset.datagouv.datasetId`
38+
39+
## Configure your collection
40+
41+
### 1. Add configuration settings
42+
43+
In your collection's configuration file (e.g., `config/production.json`), add the `datagouv` settings under the `dataset` section:
44+
45+
**For Option 1 (automatic creation):**
46+
47+
```json
48+
{
49+
"dataset": {
50+
"title": "<collection_name> collection dataset",
51+
"datagouv": {
52+
"organizationIdOrSlug": "open-terms-archive"
53+
}
54+
}
55+
}
56+
```
57+
58+
**For Option 2 (existing dataset):**
59+
60+
```json
61+
{
62+
"dataset": {
63+
"title": "<collection_name> collection dataset",
64+
"datagouv": {
65+
"datasetId": "6914a64b17a0a91bb0a61222"
66+
}
67+
}
68+
}
69+
```
70+
71+
### 2. Configure for testing (optional)
72+
73+
If you want to test with the demo environment first, add `useDemo`:
74+
75+
```json
76+
{
77+
"dataset": {
78+
"title": "<collection_name> collection dataset",
79+
"datagouv": {
80+
"organizationIdOrSlug": "open-terms-archive",
81+
"useDemo": true
82+
}
83+
}
84+
}
85+
```
86+
87+
### 3. Set the API key
88+
89+
Create a `.env` file at the root of your collection repository (if it doesn't already exist) and add your data.gouv.fr API key:
90+
91+
```
92+
OTA_ENGINE_DATAGOUV_API_KEY=your_api_key_here
93+
```
94+
95+
## Test your configuration
96+
97+
You can test your configuration by manually publishing a dataset:
98+
99+
```bash
100+
npx ota dataset --publish
101+
```
102+
103+
This will create and publish a dataset to data.gouv.fr. Check the output to verify the dataset was published successfully.
104+
105+
## Set up automatic publishing
106+
107+
To automatically publish datasets on a schedule, use the `--schedule` flag:
108+
109+
```bash
110+
npx ota dataset --schedule --publish --remove-local-copy
111+
```
112+
113+
This will publish datasets according to the schedule defined in your configuration (by default, every Monday at 8:30 AM).
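For readers unfamiliar with cron syntax, the default `dataset.publishingSchedule` value from the configuration reference, `30 8 * * MON`, breaks down into five fields. The Python sketch below only labels those fields; `describe_cron` is a hypothetical name, and the function does not validate or evaluate the expression.

```python
# Names of the five standard cron fields, in order.
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def describe_cron(expression):
    """Map each of the five standard cron fields to its value."""
    values = expression.split()
    if len(values) != len(CRON_FIELDS):
        raise ValueError(f"Expected 5 fields, got {len(values)}")
    return dict(zip(CRON_FIELDS, values))


print(describe_cron("30 8 * * MON"))
# {'minute': '30', 'hour': '8', 'day_of_month': '*', 'month': '*', 'day_of_week': 'MON'}
```

If the schedule is changed, `dataset.datagouv.frequency` should be updated to match, as the configuration reference notes.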
114+
115+
## Publishing to multiple platforms
116+
117+
data.gouv.fr publishing can be used alongside GitHub or GitLab releases. Simply configure both platforms and datasets will be published to all configured platforms simultaneously.
118+
119+
See the [configuration reference]({{< relref "collections/reference/configuration#datagouv-fr-publishing" >}}) for all available options.

content/collections/reference/configuration.md

Lines changed: 34 additions & 2 deletions
@@ -180,7 +180,7 @@ The reporter section manages how issues are reported when terms content is inacc
180180

181181
### Dataset
182182

183-
The dataset section configures how datasets are published.
183+
The dataset section configures how datasets are published. Datasets can be published to GitHub releases, GitLab releases, and/or data.gouv.fr. If both GitHub and GitLab tokens are configured, GitHub takes precedence.
184184

185185
{{< refItem
186186
name="dataset.title"
@@ -199,10 +199,42 @@ The dataset section configures how datasets are published.
199199
{{< refItem
200200
name="dataset.publishingSchedule"
201201
type="string"
202-
description="Cron expression for dataset publishing. By default, it runs every Monday at 8:30 AM."
202+
description="Cron expression for dataset publishing. By default, it runs every Monday at 8:30 AM. If publishing to data.gouv.fr, remember to update `dataset.datagouv.frequency` to match the actual publishing frequency."
203203
default="30 8 * * MON"
204204
/>}}
205205

206+
#### data.gouv.fr publishing
207+
208+
The data.gouv.fr section configures publishing to the French government's open data platform. Either `datasetId` or `organizationIdOrSlug` must be configured.
209+
210+
{{< refItem
211+
name="dataset.datagouv.datasetId"
212+
type="string"
213+
description="ID of an existing dataset on data.gouv.fr. Use this to publish to an existing dataset. Either this or `organizationIdOrSlug` is required."
214+
example="6914a64b17a0a91bb0a61222"
215+
/>}}
216+
217+
{{< refItem
218+
name="dataset.datagouv.organizationIdOrSlug"
219+
type="string"
220+
description="ID or slug of the organization on data.gouv.fr. Use this to automatically create and publish a dataset. The dataset will be created with the title from `dataset.title` if it doesn't exist. Either this or `datasetId` is required."
221+
example="open-terms-archive"
222+
/>}}
223+
224+
{{< refItem
225+
name="dataset.datagouv.frequency"
226+
type="string"
227+
description="Update frequency of the dataset. Used when creating or updating a dataset on data.gouv.fr. See [data.gouv.fr API](https://www.data.gouv.fr/api/1/datasets/frequencies/) for all allowed values."
228+
example="weekly"
229+
/>}}
230+
231+
{{< refItem
232+
name="dataset.datagouv.useDemo"
233+
type="boolean"
234+
description="Set to `true` to use the demo.data.gouv.fr environment for testing."
235+
default="false"
236+
/>}}
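The either/or requirement and the `useDemo` switch could be checked before publishing with something like the sketch below. It is written in Python purely for illustration: `check_datagouv_config` is a hypothetical helper, not an engine function. The behavior when both `datasetId` and `organizationIdOrSlug` are set is not specified here, so the sketch does not reject that case.

```python
def check_datagouv_config(config):
    """Validate the `dataset.datagouv` section and return the target site URL.

    Hypothetical helper; the real engine may validate differently.
    """
    datagouv = config.get("dataset", {}).get("datagouv", {})

    # Either an existing dataset or an organization to create one in is required.
    if not (datagouv.get("datasetId") or datagouv.get("organizationIdOrSlug")):
        raise ValueError(
            "Set dataset.datagouv.datasetId or dataset.datagouv.organizationIdOrSlug"
        )

    # `useDemo` defaults to false, which targets the production site.
    if datagouv.get("useDemo", False):
        return "https://demo.data.gouv.fr"
    return "https://www.data.gouv.fr"


config = {
    "dataset": {
        "title": "<collection_name> collection dataset",
        "datagouv": {"organizationIdOrSlug": "open-terms-archive", "useDemo": True},
    }
}
print(check_datagouv_config(config))  # https://demo.data.gouv.fr
```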
237+
206238
### Collection API
207239

208240
The collection API section sets the parameters for the API server.

content/collections/reference/environment-variables.md

Lines changed: 5 additions & 3 deletions
@@ -13,14 +13,16 @@ This reference documentation details all available environment variables that ca
1313

1414
{{< refItem name="OTA_ENGINE_SENDINBLUE_API_KEY" type="string" description="API key for SendInBlue." />}}
1515

16-
{{< refItem name="OTA_ENGINE_GITHUB_TOKEN" type="string" description="GitHub token for API access." />}}
16+
{{< refItem name="OTA_ENGINE_GITHUB_TOKEN" type="string" description="GitHub token for API access and dataset publishing. If both GitHub and GitLab tokens are configured, GitHub takes precedence for dataset publishing." />}}
1717

18-
{{< refItem name="OTA_ENGINE_GITLAB_TOKEN" type="string" description="GitLab token for API access." />}}
18+
{{< refItem name="OTA_ENGINE_GITLAB_TOKEN" type="string" description="GitLab token for API access and dataset publishing. Used only if GitHub token is not configured." />}}
1919

2020
{{< refItem name="OTA_ENGINE_GITLAB_RELEASES_TOKEN" type="string" description="GitLab token for dataset releases." />}}
2121

22+
{{< refItem name="OTA_ENGINE_DATAGOUV_API_KEY" type="string" description="API key for data.gouv.fr dataset publishing." />}}
23+
2224
---
2325

24-
### Federation API
26+
### Federation API
2527

2628
{{< refItem name="OTA_FEDERATION_API_SMTP_PASSWORD" type="string" description="SMTP password for email error notifications." />}}
