Merged
20 changes: 14 additions & 6 deletions content/api/cli.md
@@ -48,19 +48,27 @@ In these commands:

## Publishing dataset

{{< refItem name="ota dataset [--file <filename>]" description="Export the versions dataset into a ZIP file and publish it to GitHub releases. The dataset title and the URL of the versions repository are defined in the configuration." example="npx ota dataset --file dataset.zip" />}}
{{< refItem name="ota dataset [--file <filename>]" description="Export the versions dataset into a ZIP file. The dataset title is defined in the configuration." example="npx ota dataset --file dataset.zip" />}}

To export the dataset into a ZIP file and publish it on GitHub releases:
To export the dataset into a ZIP file and publish it to configured platforms (GitHub releases, GitLab releases, and/or data.gouv.fr):

{{< refItem name="ota dataset --publish [--file <filename>]" description="Export and publish dataset to GitHub releases" example="GITHUB_TOKEN=ghp_XXXXXXXXX npx ota dataset --publish" />}}
{{< refItem name="ota dataset --publish [--file <filename>]" description="Export and publish dataset to all configured platforms" example="npx ota dataset --publish" />}}

The `GITHUB_TOKEN` can also be defined in a [`.env` file]({{< relref "collections/reference/environment-variables" >}}).
The dataset can be published to multiple platforms simultaneously:

- **GitHub releases**: Requires `OTA_ENGINE_GITHUB_TOKEN` environment variable
- **GitLab releases**: Requires `OTA_ENGINE_GITLAB_TOKEN` environment variable (used only if GitHub token is not configured)
- **data.gouv.fr**: Requires `OTA_ENGINE_DATAGOUV_API_KEY` environment variable. To set up data.gouv.fr publishing, see the [guide to publish datasets to data.gouv.fr]({{< relref "collections/how-to/publish-to-datagouv" >}}).

These environment variables can be defined in a [`.env` file]({{< relref "collections/reference/environment-variables" >}}).

> **Note**: If both GitHub and GitLab tokens are configured, GitHub takes precedence. data.gouv.fr can be used alongside either GitHub or GitLab.
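The selection rules above can be sketched as a small shell function. This is only an illustration of the documented precedence, not the engine's actual implementation; the function name `ota_publish_targets` and the labels it prints are hypothetical.

```shell
# Hypothetical helper (not part of the `ota` CLI): prints which platforms the
# rules described above would select, based on the variables that are set.
# The body runs in a subshell so that `targets` does not leak to the caller.
ota_publish_targets() (
  targets=""
  if [ -n "${OTA_ENGINE_GITHUB_TOKEN:-}" ]; then
    # GitHub takes precedence when both tokens are configured
    targets="github"
  elif [ -n "${OTA_ENGINE_GITLAB_TOKEN:-}" ]; then
    targets="gitlab"
  fi
  if [ -n "${OTA_ENGINE_DATAGOUV_API_KEY:-}" ]; then
    # data.gouv.fr can be combined with either GitHub or GitLab
    targets="${targets:+$targets }datagouv"
  fi
  echo "${targets:-none}"
)
```

Calling `ota_publish_targets` before `npx ota dataset --publish` gives a quick view of which platforms the current environment would target.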

To export, publish the dataset and remove the local copy that was created after it has been uploaded:

{{< refItem name="ota dataset --publish --remove-local-copy [--file <filename>]" description="Export, publish dataset and remove local copy after upload" example="GITHUB_TOKEN=ghp_XXXXXXXXX npx ota dataset --publish --remove-local-copy" />}}
{{< refItem name="ota dataset --publish --remove-local-copy [--file <filename>]" description="Export, publish dataset and remove local copy after upload" example="npx ota dataset --publish --remove-local-copy" />}}
The display of examples does not seem to take double dashes into account.


{{< refItem name="ota dataset --schedule [--file <filename>]" description="Schedule export, publishing and local copy removal" example="GITHUB_TOKEN=ghp_XXXXXXXXX npx ota dataset --schedule --publish --remove-local-copy" />}}
{{< refItem name="ota dataset --schedule [--file <filename>]" description="Schedule export, publishing and local copy removal" example="npx ota dataset --schedule --publish --remove-local-copy" />}}

## Exposing the collection API

119 changes: 119 additions & 0 deletions content/collections/how-to/publish-to-datagouv.md
@@ -0,0 +1,119 @@
---
title: Publish to data.gouv.fr
weight: 6
---

# How to publish datasets to data.gouv.fr

This guide explains how to configure your collection to automatically publish datasets to [data.gouv.fr](https://www.data.gouv.fr/), the French government's open data platform.

## Prerequisites

- A [data.gouv.fr](https://www.data.gouv.fr/) account
- An API key from your data.gouv.fr account settings
- Either an existing dataset or an organization where you can create datasets

## Choose your approach

There are two ways to publish datasets to data.gouv.fr:

### Option 1: Automatically create a dataset in an organization

This approach is suitable when you want the system to automatically create and manage the dataset within your organization.

1. Find your organization on [data.gouv.fr](https://www.data.gouv.fr/) (or [demo.data.gouv.fr](https://demo.data.gouv.fr/) for testing)
2. Copy the organization ID or slug from the URL (e.g., `open-terms-archive`)
3. Add it to your configuration at `dataset.datagouv.organizationIdOrSlug`
4. Set `dataset.title` in your configuration (this will be used as the dataset title)

The dataset will be automatically created if it doesn't already exist in the organization.

### Option 2: Use an existing dataset

This approach is suitable when you already have a dataset created on data.gouv.fr and want to update it automatically.

1. Create a dataset on [data.gouv.fr](https://www.data.gouv.fr/) (or [demo.data.gouv.fr](https://demo.data.gouv.fr/) for testing)
2. Copy the dataset ID from the Informations tab on the dataset page (e.g., `6914a64b17a0a91bb0a61222`)
3. Add it to your configuration at `dataset.datagouv.datasetId`

## Configure your collection

### 1. Add configuration settings

In your collection's configuration file (e.g., `config/production.json`), add the `datagouv` settings under the `dataset` section:

**For Option 1 (automatic creation):**

```json
{
  "dataset": {
    "title": "<collection_name> collection dataset",
    "datagouv": {
      "organizationIdOrSlug": "open-terms-archive"
    }
  }
}
```

**For Option 2 (existing dataset):**

```json
{
  "dataset": {
    "title": "<collection_name> collection dataset",
    "datagouv": {
      "datasetId": "6914a64b17a0a91bb0a61222"
    }
  }
}
```

### 2. Configure for testing (optional)

To test against the demo environment first, add `useDemo`:

```json
{
  "dataset": {
    "title": "<collection_name> collection dataset",
    "datagouv": {
      "organizationIdOrSlug": "open-terms-archive",
      "useDemo": true
    }
  }
}
```

### 3. Set the API key

Create a `.env` file at the root of your collection repository (if it doesn't already exist) and add your data.gouv.fr API key:

```
OTA_ENGINE_DATAGOUV_API_KEY=your_api_key_here
```
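
As a quick sanity check before publishing, a small shell function can confirm that the key is present in the `.env` file. This is a minimal sketch: the function name `has_datagouv_key` is hypothetical, and it only looks for a plain `NAME=value` line, so it does not handle quoting or an `export` prefix.

```shell
# Hypothetical pre-flight check (not part of the `ota` CLI): succeeds if the
# given file (default: .env) contains a non-empty OTA_ENGINE_DATAGOUV_API_KEY.
has_datagouv_key() {
  grep -Eq '^OTA_ENGINE_DATAGOUV_API_KEY=.+' "${1:-.env}"
}
```

For example, `has_datagouv_key && npx ota dataset --publish` would only attempt to publish when the key is defined.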

## Test your configuration

You can test your configuration by manually publishing a dataset:

```bash
npx ota dataset --publish
```

This exports the dataset and publishes it to data.gouv.fr, as well as to any other configured platform. Check the command output to verify that the dataset was published successfully.

## Set up automatic publishing

To automatically publish datasets on a schedule, use the `--schedule` flag:

```bash
npx ota dataset --schedule --publish --remove-local-copy
```

This will publish datasets according to the schedule defined in your configuration (by default, every Monday at 8:30 AM).
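
The default schedule is the cron expression `30 8 * * MON` (the default of `dataset.publishingSchedule`). To make its five fields explicit, the following sketch splits an expression and labels each field; the helper name `describe_cron` is hypothetical and it only labels fields, without validating them.

```shell
# Hypothetical helper: labels the five fields of a cron expression.
# The body runs in a subshell so that `set -f` (which keeps `*` from being
# expanded as a filename pattern) does not affect the calling shell.
describe_cron() (
  set -f
  set -- $1
  echo "minute=$1 hour=$2 day-of-month=$3 month=$4 day-of-week=$5"
)

describe_cron '30 8 * * MON'
# prints: minute=30 hour=8 day-of-month=* month=* day-of-week=MON
```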

## Publishing to multiple platforms

data.gouv.fr publishing can be used alongside GitHub or GitLab releases. Configure both, and each dataset is published to all configured platforms simultaneously.

See the [configuration reference]({{< relref "collections/reference/configuration#datagouv-fr-publishing" >}}) for all available options.
36 changes: 34 additions & 2 deletions content/collections/reference/configuration.md
@@ -180,7 +180,7 @@ The reporter section manages how issues are reported when terms content is inacc

### Dataset

The dataset section configures how datasets are published.
The dataset section configures how datasets are published. Datasets can be published to GitHub releases, GitLab releases, and/or data.gouv.fr. If both GitHub and GitLab tokens are configured, GitHub takes precedence.

{{< refItem
name="dataset.title"
@@ -199,10 +199,42 @@ The dataset section configures how datasets are published.
{{< refItem
name="dataset.publishingSchedule"
type="string"
description="Cron expression for dataset publishing. By default, it runs every Monday at 8:30 AM."
description="Cron expression for dataset publishing. By default, it runs every Monday at 8:30 AM. If publishing to data.gouv.fr, remember to update `dataset.datagouv.frequency` to match the actual publishing frequency."
default="30 8 * * MON"
/>}}

#### data.gouv.fr publishing

The data.gouv.fr section configures publishing to the French government's open data platform. Either `datasetId` or `organizationIdOrSlug` must be configured.

{{< refItem
name="dataset.datagouv.datasetId"
type="string"
description="ID of an existing dataset on data.gouv.fr. Use this to publish to an existing dataset. Either this or `organizationIdOrSlug` is required."
example="6914a64b17a0a91bb0a61222"
/>}}

{{< refItem
name="dataset.datagouv.organizationIdOrSlug"
type="string"
description="ID or slug of the organization on data.gouv.fr. Use this to automatically create and publish a dataset. The dataset will be created with the title from `dataset.title` if it doesn't exist. Either this or `datasetId` is required."
example="open-terms-archive"
/>}}

{{< refItem
name="dataset.datagouv.frequency"
type="string"
description="Update frequency of the dataset. Used when creating or updating a dataset on data.gouv.fr. See [data.gouv.fr API](https://www.data.gouv.fr/api/1/datasets/frequencies/) for all allowed values."
example="weekly"
/>}}

{{< refItem
name="dataset.datagouv.useDemo"
type="boolean"
description="Set to `true` to use the demo.data.gouv.fr environment for testing."
default="false"
/>}}

### Collection API

The collection API section sets the parameters for the API server.
8 changes: 5 additions & 3 deletions content/collections/reference/environment-variables.md
@@ -13,14 +13,16 @@ This reference documentation details all available environment variables that ca

{{< refItem name="OTA_ENGINE_SENDINBLUE_API_KEY" type="string" description="API key for SendInBlue." />}}

{{< refItem name="OTA_ENGINE_GITHUB_TOKEN" type="string" description="GitHub token for API access." />}}
{{< refItem name="OTA_ENGINE_GITHUB_TOKEN" type="string" description="GitHub token for API access and dataset publishing. If both GitHub and GitLab tokens are configured, GitHub takes precedence for dataset publishing." />}}

{{< refItem name="OTA_ENGINE_GITLAB_TOKEN" type="string" description="GitLab token for API access." />}}
{{< refItem name="OTA_ENGINE_GITLAB_TOKEN" type="string" description="GitLab token for API access and dataset publishing. Used only if GitHub token is not configured." />}}

{{< refItem name="OTA_ENGINE_GITLAB_RELEASES_TOKEN" type="string" description="GitLab token for dataset releases." />}}

{{< refItem name="OTA_ENGINE_DATAGOUV_API_KEY" type="string" description="API key for data.gouv.fr dataset publishing." />}}

---

### Federation API
### Federation API

{{< refItem name="OTA_FEDERATION_API_SMTP_PASSWORD" type="string" description="SMTP password for email error notifications." />}}