This is the WRI Open Data Portal extension for CKAN. It contains CKAN backend customizations for this project.
| Variable | Description |
|---|---|
| ckanext.wri.odp_url | The URL of the frontend application, e.g., https://wri.dev.frontend.datopian.com. |
| ckanext.wri.prefect_url | The URL of the Prefect API, e.g., https://wri.dev.prefect.datopian.com. |
| ckanext.wri.datapusher_deployment_name | The name of the Datapusher deployment, e.g., datapusher-dev. |
| ckanext.wri.migration_deployment_name | The name of the migration deployment, e.g., migration_deployment. |
| ckanext.wri.migration_flow_name | The name of the migration flow, e.g., trigger-migration. |
| ckanext.wri.migration_deployment_env | The environment of the migration deployment, e.g., dev, staging, prod. |
| ckanext.wri.api_analytics.measurement_id | The Google Analytics measurement ID for API tracking, e.g., G-XXXXXXXXXX. |
| ckanext.wri.api_analytics.api_secret | The Google Analytics API secret for API tracking, e.g., XXXXXXXXXX. |
For information on the Prefect variables/blocks required for migrations, see the migration README.
This extension includes the following plugins:
- `wri` - The main plugin that loads ckanext-wri.
- `wri_api_tracking` - A plugin that tracks API usage with Google Analytics.
- `wri_harvester` - A custom harvester for ingesting datasets between WRI CKAN portals.
This extension includes a notification feature that uses its own database table, action endpoints, and custom validators. The feature adds three actions:
- `notification_create`
- `notification_update`
- `notification_get_all`
This extension adds a table named notifications to the database for the API notification feature. Initialization is required on the initial boot-up of the extension.
To initiate the database setup, use the following command:
ckan -c <path-to-ini-file> notificationdb
Parameters:
- recipient_id (string) – The user ID of the recipient of the notification (required).
- sender_id (string) – The user ID of the sender of the notification (required).
- activity_type (string) – The type of activity that triggers the notification, such as `dataset_create`, etc. (required).
- object_type (string) – The type of the object on which the action is being performed (e.g., dataset, resource, etc.) (required).
- object_id (string) – The ID of the object on which the action is being performed (required).
The parameters time_sent (set to the current timestamp), state (set as active) and is_unread (set to false) are automatically configured during creation.
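As a minimal sketch of how a client might assemble a `notification_create` call, the Python below validates the five required parameters and builds a POST request without sending it. The helper name `build_notification_create_request`, the `https://ckan.example.org` URL, and the example IDs are hypothetical; the `Authorization` header follows the `curl` examples used elsewhere in this README.

```python
import json
import urllib.request

# Required parameters for notification_create, as listed above.
REQUIRED_FIELDS = ("recipient_id", "sender_id", "activity_type", "object_type", "object_id")


def build_notification_create_request(ckan_url: str, api_token: str, **fields: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the notification_create action."""
    missing = [name for name in REQUIRED_FIELDS if not fields.get(name)]
    if missing:
        raise ValueError(f"Missing required parameters: {', '.join(missing)}")
    # time_sent, state, and is_unread are set by the server, so they are not sent.
    payload = {name: fields[name] for name in REQUIRED_FIELDS}
    return urllib.request.Request(
        url=f"{ckan_url.rstrip('/')}/api/3/action/notification_create",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": api_token, "Content-Type": "application/json"},
        method="POST",
    )


request = build_notification_create_request(
    "https://ckan.example.org",  # hypothetical CKAN URL
    "YOUR_API_TOKEN",
    recipient_id="recipient-user-id",  # hypothetical IDs
    sender_id="sender-user-id",
    activity_type="dataset_create",
    object_type="dataset",
    object_id="dataset-id",
)
# To actually send it: urllib.request.urlopen(request)
```

Building the request separately from sending it keeps the validation easy to test without a running CKAN instance.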
Parameters:
- recipient_id (string) – The user ID of the recipient of the notification (required).
- sender_id (string) – The user ID of the sender of the notification (required).
- activity_type (string) – The type of activity that triggers the notification, such as `dataset_create`, etc. (required).
- object_type (string) – The type of the object on which the action is being performed (e.g., dataset, resource, etc.) (required).
- object_id (string) – The ID of the object on which the action is being performed (required).
- time_sent (datetime without timezone) – The timestamp of the sent time (required).
- is_unread (string) – Indicates whether the notification is read or not (required).
- state (string) – `active` or `deleted` (required).
Returns a list of notifications for a sender or recipient.
Parameters:
- recipient_id (string) – The user ID of the recipient of the notification (optional, but either `recipient_id` or `sender_id` is required).
- sender_id (string) – The user ID of the sender of the notification (optional, but either `recipient_id` or `sender_id` is required).
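The "either `recipient_id` or `sender_id`" rule can be checked on the client before calling the API. The sketch below assumes the action accepts its parameters as a GET query string, which is an assumption rather than something this README states; the function name and URL are hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode


def notification_get_all_url(
    ckan_url: str,
    recipient_id: Optional[str] = None,
    sender_id: Optional[str] = None,
) -> str:
    """Return the notification_get_all URL, requiring at least one of the two IDs."""
    candidates = (("recipient_id", recipient_id), ("sender_id", sender_id))
    params = {name: value for name, value in candidates if value}
    if not params:
        raise ValueError("Provide recipient_id, sender_id, or both")
    return f"{ckan_url.rstrip('/')}/api/3/action/notification_get_all?{urlencode(params)}"


url = notification_get_all_url("https://ckan.example.org", recipient_id="abc")  # hypothetical URL and ID
```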
A pending dataset is dataset metadata that's been submitted for approval. While pending, the dataset metadata lives in a separate table, pending_datasets, rather than in the main package table. Once approved, the existing dataset is updated with the new metadata.
The pending_datasets table has the following columns:
| package_id | package_data | last_modified |
|---|---|---|
| text (PK) | jsonb | timestamp |
The package_id column is the UUID of the dataset (and it's the primary key). The package_data column contains the dataset metadata as a JSONB object. The last_modified column is a timestamp that is automatically generated whenever package_data is updated.
You can initialize the pending dataset table by running the following command:
ckan -c <path-to-ini-file> pendingdatasetsdb
Parameters:
- package_id (string) – The UUID of the dataset (required).
- package_data (JSON object) – The dataset metadata (required).
Creates a new pending dataset and returns the newly created pending dataset.
Parameters:
- package_id (string) – The UUID of the dataset (required).
- package_data (JSON object) – The dataset metadata (required).
Updates an existing pending dataset and returns the updated pending dataset.
Parameters:
- package_id (string) – The UUID of the dataset (required).
Deletes an existing pending dataset.
Parameters:
- package_id (string) – The UUID of the dataset (required).
Returns the pending dataset for the given package_id.
Parameters:
- package_id (string) – The UUID of the dataset (required).
Returns the diff between the pending dataset and the existing dataset for the given package_id.
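As a rough illustration of how a result keyed by paths such as `resources[0].format` can arise, the sketch below flattens two metadata dictionaries and compares them. It is not the extension's implementation: the helper names `flatten` and `diff_metadata`, and the use of `None` for a value that is missing on one side, are assumptions made for this example.

```python
from typing import Any, Dict


def flatten(value: Any, prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts and lists of dicts into path keys like 'resources[0].format'."""
    flat: Dict[str, Any] = {}
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            flat.update(flatten(child, path))
    elif isinstance(value, list) and any(isinstance(item, dict) for item in value):
        for index, child in enumerate(value):
            flat.update(flatten(child, f"{prefix}[{index}]"))
    else:
        # Scalars and lists of scalars (e.g. languages) are compared as whole values.
        flat[prefix] = value
    return flat


def diff_metadata(old: Dict[str, Any], new: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Return {path: {"old_value": ..., "new_value": ...}} for every path that differs."""
    old_flat, new_flat = flatten(old), flatten(new)
    changes: Dict[str, Dict[str, Any]] = {}
    for path in sorted(set(old_flat) | set(new_flat)):
        old_value, new_value = old_flat.get(path), new_flat.get(path)
        if old_value != new_value:
            changes[path] = {"old_value": old_value, "new_value": new_value}
    return changes


existing = {"title": "My dataset title", "languages": ["fr"], "resources": [{"format": "CSV"}]}
pending = {"title": "My better dataset title", "languages": ["en"], "resources": [{"format": "HTML"}]}
changes = diff_metadata(existing, pending)
```

The response returned by the endpoint uses the same `old_value`/`new_value` shape for each changed path.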
Here's an example:
{
"help": "http://ckan-dev:5000/api/3/action/help_show?name=pending_diff_show",
"success": true,
"result": {
"title": {
"old_value": "My dataset title",
"new_value": "My better dataset title"
},
"application": {
"old_value": "",
"new_value": "wri"
},
"resources[0].description": {
"old_value": "My resource description",
"new_value": "My better resource description"
},
"resources[0].format": {
"old_value": "CSV",
"new_value": "HTML"
},
"resources[1].title": {
"old_value": "My resource title",
"new_value": "My better resource title"
},
"wri_data": {
"old_value": false,
"new_value": true
},
"cautions": {
"old_value": "",
"new_value": "This is a caution"
},
"languages": {
"old_value": [
"fr"
],
"new_value": [
"en"
]
},
"function": {
"old_value": "The function of this dataset is to x...",
"new_value": "The function of this dataset is to y..."
}
}
}
This extension includes a migration API that allows users to migrate datasets from RW to CKAN.
Note: The functionality of this endpoint is limited to the user's permissions in CKAN. For example, if a user has the correct permissions to create a dataset but doesn't have permissions to add it to the Team or Topic specified, the request will return an authorization error.
Migrates an RW dataset/metadata to CKAN. It maps all supported RW fields to CKAN fields. All additional RW fields (except objects) are stored in the migration_extras field of the CKAN dataset. This endpoint handles both the creation and updating of datasets (this is determined automatically—no need to specify).
Parameters:
- rw_dataset_id (string) – The RW UUID of the dataset to migrate (required, unless `gfw_dataset` is provided). Example: `c0b5f4b1-4f3b-4f1e-8f1e-3f4b1f3b4f1e`.
- rw_application (string) – The RW application of the dataset to migrate (required). Example: `rw`.
- dx_application (string) – The destination DX application name (group name) to associate the dataset with (required). Example: `land-carbon-lab`.
- dataset_slug (string) – The desired slug of the dataset to migrate (optional). If you use this option, you will need to include this parameter each time you call `migrate_dataset` for this dataset. This value will override the `slug` value from the RW/GFW APIs. Example: `my-dataset`.
- dataset_title (string) – The desired title of the dataset to migrate (optional). If you use this option, you will need to include this parameter each time you call `migrate_dataset` for this dataset. This value will override the `name` value from the RW API or the `title` value from the GFW API. Example: `My Dataset`.
- gfw_dataset (string) – The GFW dataset to migrate (optional). If this dataset also has metadata in the RW API, you should also include `rw_dataset_id`. Example: `gfw_forest_data`.
- gfw_version (string) – The version of the GFW dataset to migrate (optional). Will default to the latest if a specific version isn't provided. Example: `v2020.01.01`.
- team (string) – The `name` (slug) of the Team to associate the dataset with (optional). Example: `land-carbon-lab`.
- topics (string) – A comma-separated list of Topic `slug`s to associate the dataset with (optional). Example: `atmosphere,biodiversity`.
- geographic_coverage (string) – The geographic coverage of the dataset (optional). Example: `Global`.
- authors (array of objects or string) – An array of objects containing the `name` and `email` of the authors of the dataset (optional). Example: `[{"name": "John Smith", "email": "john.smith@example.com"}, {"name": "Jane Smith", "email": "jane.smith@example.com"}]`. Alternatively, you can use the following string format, with each author name and email separated by a colon (`:`) and each pair of author name/email separated by a semicolon (`;`). Example: `John Smith:john.smith@example.com;Jane Smith:jane.smith@example.com`.
- maintainers (array of objects or string) – An array of objects containing the `name` and `email` of the maintainers of the dataset (optional). Example: `[{"name": "John Smith", "email": "john.smith@example.com"}, {"name": "Jane Smith", "email": "jane.smith@example.com"}]`. Alternatively, you can use the following string format, with each maintainer name and email separated by a colon (`:`) and each pair of maintainer name/email separated by a semicolon (`;`). Example: `John Smith:john.smith@example.com;Jane Smith:jane.smith@example.com`.
- layer_ids (string) – A comma-separated list of RW Layer UUIDs to associate with the dataset (optional). All other layers will be skipped. Example: `c0b5f4b1-4f3b-4f1e-8f1e-3f4b1f3b4f1e,c0b5f4b1-4f3b-4f1e-8f1e-3f4b1f3b4f1e`.
- blacklist (string) – A comma-separated list of CKAN fields to exclude from the migration mapping (optional; cannot be used with `whitelist`). Example: `resources,notes` will exclude the `resources` (Layers) and `notes` (Description) fields from the migration mapping.
- whitelist (string) – A comma-separated list of CKAN fields to include in the migration mapping (optional; cannot be used with `blacklist`). Example: `title,notes` will only include the `title` (Title) and `notes` (Description) fields in the migration mapping.
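The parameter rules above (one of `rw_dataset_id`/`gfw_dataset`, the two required application names, and the mutually exclusive `blacklist`/`whitelist`) can be checked before a request is made. The following is a minimal sketch, not part of the extension: `format_people`, `migrate_dataset_url`, and the `https://ckan.example.org` URL are hypothetical names chosen for this example.

```python
from typing import Dict, List
from urllib.parse import urlencode


def format_people(people: List[Dict[str, str]]) -> str:
    """Encode authors/maintainers in the 'Name:email;Name:email' string format."""
    return ";".join(f"{person['name']}:{person['email']}" for person in people)


def migrate_dataset_url(ckan_url: str, **params: str) -> str:
    """Validate the documented parameter rules and return the request URL."""
    if not params.get("rw_dataset_id") and not params.get("gfw_dataset"):
        raise ValueError("rw_dataset_id is required unless gfw_dataset is provided")
    for required in ("rw_application", "dx_application"):
        if not params.get(required):
            raise ValueError(f"{required} is required")
    if params.get("blacklist") and params.get("whitelist"):
        raise ValueError("blacklist and whitelist cannot be used together")
    # Drop empty values; urlencode percent-encodes commas, colons, and semicolons.
    query = urlencode({name: value for name, value in params.items() if value})
    return f"{ckan_url.rstrip('/')}/api/3/action/migrate_dataset?{query}"


authors = format_people([
    {"name": "John Smith", "email": "john.smith@example.com"},
    {"name": "Jane Smith", "email": "jane.smith@example.com"},
])
url = migrate_dataset_url(
    "https://ckan.example.org",  # hypothetical CKAN URL
    rw_dataset_id="c12446ce-174f-4ffb-b2f7-77ecb0116aba",
    rw_application="rw",
    dx_application="land-carbon-lab",
    topics="lucas-topic,nov-16-topic",
    authors=authors,
)
```

The resulting URL would still need the `Authorization` header when it is requested.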
A successful request will return the Prefect status of the new migration job.
% curl -H "Authorization: YOUR_API_TOKEN" "https://wri.dev.ckan.datopian.com/api/3/action/migrate_dataset?rw_dataset_id=c12446ce-174f-4ffb-b2f7-77ecb0116aba&rw_application=rw&dx_application=land-carbon-lab&team=migration-test&topics=lucas-topic,nov-16-topic"
{
"help": "https://wri.dev.ckan.datopian.com/api/3/action/help_show?name=migration_status",
"success": true,
"result": {
"id": "2b3d8bf5-80a1-4816-a2f0-55a97f720471",
"created": "2024-04-19T16:25:50.064208+00:00",
"updated": "2024-04-19T16:26:09.039985+00:00",
"name": "masked-penguin",
"flow_id": "0c5a71cd-ce9f-448e-8453-366cbb6944c0",
"state_id": "142982aa-2c10-4859-b2fd-68beb2be7bdf",
"deployment_id": "7191012a-0572-4dfe-8e0d-be5de4acc39c",
"work_queue_id": null,
"work_queue_name": null,
"flow_version": "659bece7fac5af816d61217219193235",
"parameters": {
"data_dict": {
"rw_dataset_id": "c12446ce-174f-4ffb-b2f7-77ecb0116aba",
"team": "migration-test",
"topics": [
"lucas-topic",
"nov-16-topic"
],
"rw_application": "rw",
"dx_application": "land-carbon-lab"
}
},
"idempotency_key": null,
"context": {},
"empirical_policy": {
"max_retries": 0,
"retry_delay_seconds": 0,
"retries": 0,
"retry_delay": 0,
"pause_keys": [],
"resuming": false
},
"tags": [],
"parent_task_run_id": null,
"state_type": "COMPLETED",
"state_name": "Completed",
"run_count": 1,
"expected_start_time": "2024-04-19T16:25:50.064030+00:00",
"next_scheduled_start_time": null,
"start_time": "2024-04-19T16:26:03.090073+00:00",
"end_time": "2024-04-19T16:26:09.038686+00:00",
"total_run_time": 5.948613,
"estimated_run_time": 5.948613,
"estimated_start_time_delta": 13.026043,
"auto_scheduled": false,
"infrastructure_document_id": null,
"infrastructure_pid": null,
"created_by": null,
"work_pool_id": null,
"work_pool_name": null,
"state": {
"id": "142982aa-2c10-4859-b2fd-68beb2be7bdf",
"type": "COMPLETED",
"name": "Completed",
"timestamp": "2024-04-19T16:26:09.038686+00:00",
"message": null,
"data": {
"type": "unpersisted"
},
"state_details": {
"flow_run_id": "2b3d8bf5-80a1-4816-a2f0-55a97f720471",
"task_run_id": null,
"child_flow_run_id": null,
"scheduled_time": null,
"cache_key": null,
"cache_expiration": null,
"untrackable_result": false,
"pause_timeout": null,
"pause_reschedule": false,
"pause_key": null,
"run_input_keyset": null,
"refresh_cache": null
}
}
}
}
You'll need this ID: "id": "2b3d8bf5-80a1-4816-a2f0-55a97f720471" (result.id) to check the status of the migration job at a later time.
Note: This endpoint is currently only available to system admins.
Triggers a full migration of RW datasets/metadata using the pre-defined datasets.csv file. Optionally, the file_name parameter can be used to point to another file (e.g. ?file_name=my_file.csv—only provide the file name, not the path, as the assumed path is migration/files/<FILE_NAME>). The files must be placed in the same directory as datasets.csv. This endpoint handles both the creation and updating of datasets (this is determined automatically—no need to specify).
Parameters:
- blacklist (string) – A comma-separated list of CKAN fields to exclude from the migration mapping (optional; cannot be used with `whitelist`). Example: `resources,notes` will exclude the `resources` (Layers) and `notes` (Description) fields from the migration mapping.
- whitelist (string) – A comma-separated list of CKAN fields to include in the migration mapping (optional; cannot be used with `blacklist`). Example: `title,notes` will only include the `title` (Title) and `notes` (Description) fields in the migration mapping.
- file_name (string) – The name of the file to use for the migration (optional). Example: `my_datasets.csv`.
A successful request will return the Prefect status of the new migration job.
Same as above, but with the /trigger_migration endpoint.
% curl -H "Authorization: YOUR_API_TOKEN" "https://wri.dev.ckan.datopian.com/api/3/action/trigger_migration"
{
"help": "https://wri.dev.ckan.datopian.com/api/3/action/help_show?name=trigger_migration",
"success": true,
"result": {
"id": "7cd8a09e-1834-4ab5-8b72-bd638e9392ae",
"created": "2024-04-19T13:35:02.161350+00:00",
"updated": "2024-04-19T13:35:02.158656+00:00",
"name": "hospitable-barracuda",
"flow_id": "0c5a71cd-ce9f-448e-8453-366cbb6944c0",
"state_id": "ecfa19b2-dd38-4434-a167-974fd8149b68",
"deployment_id": "7191012a-0572-4dfe-8e0d-be5de4acc39c",
"work_queue_id": null,
"work_queue_name": null,
"flow_version": null,
"parameters": {
"data_dict": {
"is_bulk": true
}
},
"idempotency_key": null,
"context": {},
"empirical_policy": {
"max_retries": 0,
"retry_delay_seconds": 0,
"retries": null,
"retry_delay": null,
"pause_keys": [],
"resuming": false
},
"tags": [],
"parent_task_run_id": null,
"state_type": "SCHEDULED",
"state_name": "Scheduled",
"run_count": 0,
"expected_start_time": "2024-04-19T13:35:02.161117+00:00",
"next_scheduled_start_time": "2024-04-19T13:35:02.161117+00:00",
"start_time": null,
"end_time": null,
"total_run_time": 0,
"estimated_run_time": 0,
"estimated_start_time_delta": 0.035684,
"auto_scheduled": false,
"infrastructure_document_id": null,
"infrastructure_pid": null,
"created_by": null,
"work_pool_id": null,
"work_pool_name": null,
"state": {
"id": "ecfa19b2-dd38-4434-a167-974fd8149b68",
"type": "SCHEDULED",
"name": "Scheduled",
"timestamp": "2024-04-19T13:35:02.161053+00:00",
"message": null,
"data": null,
"state_details": {
"flow_run_id": "7cd8a09e-1834-4ab5-8b72-bd638e9392ae",
"task_run_id": null,
"child_flow_run_id": null,
"scheduled_time": "2024-04-19T13:35:02.161117+00:00",
"cache_key": null,
"cache_expiration": null,
"untrackable_result": false,
"pause_timeout": null,
"pause_reschedule": false,
"pause_key": null,
"run_input_keyset": null,
"refresh_cache": null
}
}
}
}
You'll need this ID: "id": "7cd8a09e-1834-4ab5-8b72-bd638e9392ae" (result.id) to check the status of the migration job at a later time.
Add a custom file to the migration/files directory and commit it to the repo. Once deployed, you can use the file_name parameter to specify it. The file should be a CSV with the following columns:
- `rw_dataset_id` (required, unless `gfw_dataset` is provided)
- `rw_application` (required)
- `dx_application` (required)
- `team` (optional)
- `topics` (optional)
- `geographic_coverage` (optional)
- `authors` (optional)
- `maintainers` (optional)
- `layer_ids` (optional)
- `layer_names` (optional)
- `gfw_dataset` (optional; required if `rw_dataset_id` isn't provided)
- `gfw_version` (optional)
- `dataset_title` (optional)
- `dataset_slug` (optional)
Note: authors and maintainers must follow the format Author One:Author One Email;Author Two:Author Two Email;Author Three:Author Three Email. Each author/maintainer name and email must be separated by a colon (:), and each author/maintainer pair must be separated by a semicolon (;). If there's only one author/maintainer, you shouldn't include a semicolon.
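Before committing a custom file, it can help to check it against the rules above. The sketch below is one possible pre-commit check using only the standard library; `parse_people` and `validate_rows` are hypothetical helpers, not functions provided by this extension.

```python
import csv
import io
from typing import Dict, List


def parse_people(value: str) -> List[Dict[str, str]]:
    """Parse 'Name:email;Name:email' into a list of {'name', 'email'} dicts."""
    people = []
    for pair in (part.strip() for part in value.split(";")):
        if not pair:
            continue
        name, separator, email = pair.partition(":")
        if not separator or not name.strip() or not email.strip():
            raise ValueError(f"expected 'Name:email', got {pair!r}")
        people.append({"name": name.strip(), "email": email.strip()})
    return people


def validate_rows(csv_text: str) -> List[str]:
    """Return a list of human-readable problems found in the CSV text."""
    errors = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        line = reader.line_num  # line of the row that was just read
        if not row.get("rw_dataset_id") and not row.get("gfw_dataset"):
            errors.append(f"line {line}: needs rw_dataset_id or gfw_dataset")
        for column in ("rw_application", "dx_application"):
            if not row.get(column):
                errors.append(f"line {line}: {column} is required")
        for column in ("authors", "maintainers"):
            try:
                parse_people(row.get(column) or "")
            except ValueError as exc:
                errors.append(f"line {line}: {column}: {exc}")
    return errors


sample = (
    "rw_dataset_id,gfw_dataset,rw_application,dx_application,authors\n"
    "abc,,rw,aqueduct,John Smith:john.smith@example.com\n"
    ",,rw,,Jane Smith\n"
)
problems = validate_rows(sample)
```

In this sample the first data row passes, while the second is reported three times: no dataset identifier, no `dx_application`, and an author without an email.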
Example:
rw_dataset_id,gfw_dataset,rw_application,team,topics,geographic_coverage,authors,maintainers,layer_ids,dataset_title,dataset_slug,dx_application
d491f094-ad6e-4015-b248-1d1cd83667fa,,aqueduct-water-risk,aqueduct,"freshwater,surface-water-bodies",Global,,John Smith:john.smith@example.com;Jane Smith:jane.smith@example.com,,An Aqueduct Dataset,an-aqueduct-dataset,aqueduct
b318381e-485d-46c9-8958-c9a9d75d7e91,,aqueduct-water-risk,aqueduct,"freshwater,water-risks",Global,John Smith:john.smith@example.com;Jane Smith:jane.smith@example.com,,,Another Aqueduct Dataset,another-aqueduct-dataset,aqueduct
,gfw_forest_flux_forest_age_category,gfw,global-forest-watch,"land,ghg-emissions,forest",,,John Smith:john.smith@example.com,,,,global-forest-watch
,gfw_forest_flux_removal_forest_type,gfw,global-forest-watch,"land,ghg-emissions,forest",,Jane Smith:jane.smith@example.com,John Smith:john.smith@example.com,,Another Title Example,,global-forest-watch
47a8e6cc-ea40-44a8-b1fc-6cf4fcc7d868,nasa_viirs_fire_alerts,gfw,global-forest-watch,"land,natural-hazards,forest",Global,,,2462cceb-41de-4bd2-8251-a6f75fe4e3d5,,another-slug-example,global-forest-watch
c92b6411-f0e5-4606-bbd9-138e40e50eb8,,gfw,global-forest-watch,"land,forest",,Jane Smith:jane.smith@example.com,,"0cba3c4f-2d3b-4fb1-8c93-c951dc1da84b,2351399c-ef2c-48da-9485-20698190acb0",,,global-forest-watch
Returns the status of the specified migration job in Prefect.
Parameters:
- id (string) – The Prefect flow run ID (required). This is found at `result.id` in the response from the `/migrate_dataset` or `/trigger_migration` endpoints.
A successful request will return the current status of the migration job.
The following uses the flow run ID from the /migrate_dataset endpoint example above:
% curl -H "Authorization: YOUR_API_TOKEN" "https://wri.dev.ckan.datopian.com/api/3/action/migration_status?id=2b3d8bf5-80a1-4816-a2f0-55a97f720471"
{
"help": "https://wri.dev.ckan.datopian.com/api/3/action/help_show?name=migration_status",
"success": true,
"result": {
"id": "2b3d8bf5-80a1-4816-a2f0-55a97f720471",
"created": "2024-04-19T16:25:50.064208+00:00",
"updated": "2024-04-19T16:26:09.039985+00:00",
"name": "masked-penguin",
"flow_id": "0c5a71cd-ce9f-448e-8453-366cbb6944c0",
"state_id": "142982aa-2c10-4859-b2fd-68beb2be7bdf",
"deployment_id": "7191012a-0572-4dfe-8e0d-be5de4acc39c",
"work_queue_id": null,
"work_queue_name": null,
"flow_version": "659bece7fac5af816d61217219193235",
"parameters": {
"data_dict": {
"id": "c12446ce-174f-4ffb-b2f7-77ecb0116aba",
"team": "migration-test",
"topics": [
"lucas-topic",
"nov-16-topic"
],
"rw_application": "rw",
"dx_application": "land-carbon-lab"
}
},
"idempotency_key": null,
"context": {},
"empirical_policy": {
"max_retries": 0,
"retry_delay_seconds": 0,
"retries": 0,
"retry_delay": 0,
"pause_keys": [],
"resuming": false
},
"tags": [],
"parent_task_run_id": null,
"state_type": "COMPLETED",
"state_name": "Completed",
"run_count": 1,
"expected_start_time": "2024-04-19T16:25:50.064030+00:00",
"next_scheduled_start_time": null,
"start_time": "2024-04-19T16:26:03.090073+00:00",
"end_time": "2024-04-19T16:26:09.038686+00:00",
"total_run_time": 5.948613,
"estimated_run_time": 5.948613,
"estimated_start_time_delta": 13.026043,
"auto_scheduled": false,
"infrastructure_document_id": null,
"infrastructure_pid": null,
"created_by": null,
"work_pool_id": null,
"work_pool_name": null,
"state": {
"id": "142982aa-2c10-4859-b2fd-68beb2be7bdf",
"type": "COMPLETED",
"name": "Completed",
"timestamp": "2024-04-19T16:26:09.038686+00:00",
"message": null,
"data": {
"type": "unpersisted"
},
"state_details": {
"flow_run_id": "2b3d8bf5-80a1-4816-a2f0-55a97f720471",
"task_run_id": null,
"child_flow_run_id": null,
"scheduled_time": null,
"cache_key": null,
"cache_expiration": null,
"untrackable_result": false,
"pause_timeout": null,
"pause_reschedule": false,
"pause_key": null,
"run_input_keyset": null,
"refresh_cache": null
}
}
}
}
The most important part of the response is the state object, which contains the current state of the migration job. The actual status can be found at result.state.name or result.state.type.
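A client that waits for a migration to finish could poll this endpoint and read `result.state.type` each time. The sketch below is an illustration under assumptions: the set of terminal state types is taken from Prefect's state model rather than from this README, and `fetch_status` is a hypothetical callable that returns the parsed JSON response of `/migration_status`.

```python
import time
from typing import Any, Callable, Dict

# Assumption: state types after which Prefect will not run the flow any further.
TERMINAL_STATES = {"COMPLETED", "FAILED", "CRASHED", "CANCELLED"}


def migration_state(response: Dict[str, Any]) -> str:
    """Extract result.state.type from a migration_status response."""
    return response["result"]["state"]["type"]


def wait_for_migration(
    fetch_status: Callable[[], Dict[str, Any]],
    interval_seconds: float = 5.0,
    max_checks: int = 60,
) -> str:
    """Poll until the job reaches a terminal state or max_checks is exhausted."""
    state = "UNKNOWN"
    for attempt in range(max_checks):
        state = migration_state(fetch_status())
        if state in TERMINAL_STATES:
            break
        if attempt < max_checks - 1:
            time.sleep(interval_seconds)
    return state


# Usage with canned responses standing in for real HTTP calls.
canned = iter(
    {"result": {"state": {"type": state_type}}}
    for state_type in ("SCHEDULED", "RUNNING", "COMPLETED")
)
final_state = wait_for_migration(lambda: next(canned), interval_seconds=0)
```

Passing the fetch function in as an argument keeps the polling logic independent of any particular HTTP client.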
The following fields are supported for the blacklist and whitelist parameters (all other fields do not currently have a direct mapping):
- `cautions` - Maps to `cautions` in the RW dataset.
- `citation` - Maps to `citation` in the RW dataset.
- `function` - Maps to `function` or `functions` in the RW dataset.
- `language` - Maps to `language` in the RW dataset. Note: This field in CKAN requires an ISO 639-1 language code. If the incoming language is not an ISO 639-1 code, it will be ignored.
- `learn_more` - Maps to the `learn_more_link` in the RW dataset.
- `notes` - Maps to `description` in the RW dataset.
- `resources` - Maps to the Layers in the RW dataset. Each Layer is mapped to a CKAN resource.
- `title` - Maps to `name` in the RW dataset if it exists; otherwise, uses the same slugified name.
- `url` - Maps to the `data_download_original_link` if it exists; otherwise, uses `data_download_link`. Note: This field in CKAN requires a valid URL. If the incoming URL is not valid, it will be ignored.
In all cases above, both the RW metadata and dataset are checked for a value, defaulting to the RW metadata value if it exists. In most cases, there's no comparable key/value in the RW dataset, but there are a few cases where the RW dataset has a key that's not in the RW metadata.
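To make the effect of `blacklist` and `whitelist` concrete, the sketch below applies them to a dictionary of already-mapped CKAN fields. It illustrates the documented behaviour only and is not the extension's code; `parse_field_list` and `select_fields` are hypothetical names, and rejecting unsupported field names is a choice made for this example.

```python
from typing import Any, Dict, Optional, Set

# The fields listed above as supported by blacklist/whitelist.
SUPPORTED_FIELDS = {
    "cautions", "citation", "function", "language", "learn_more",
    "notes", "resources", "title", "url",
}


def parse_field_list(value: Optional[str]) -> Set[str]:
    """Split a comma-separated parameter into a set of field names."""
    return {item.strip() for item in (value or "").split(",") if item.strip()}


def select_fields(
    mapped: Dict[str, Any],
    blacklist: Optional[str] = None,
    whitelist: Optional[str] = None,
) -> Dict[str, Any]:
    """Keep only whitelisted fields, or drop blacklisted ones."""
    excluded, included = parse_field_list(blacklist), parse_field_list(whitelist)
    if excluded and included:
        raise ValueError("blacklist and whitelist cannot be used together")
    unknown = (excluded | included) - SUPPORTED_FIELDS
    if unknown:
        raise ValueError(f"Unsupported fields: {', '.join(sorted(unknown))}")
    if included:
        return {name: value for name, value in mapped.items() if name in included}
    return {name: value for name, value in mapped.items() if name not in excluded}


mapped = {"title": "Major Ports", "notes": "A description", "resources": []}
without_layers = select_fields(mapped, blacklist="resources,notes")
title_and_notes = select_fields(mapped, whitelist="title,notes")
```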
Most fields that are not mapped directly to CKAN are stored in a custom field called migration_extras (Note: some data is not migrated, such as objects, e.g., the full layer metadata, application config, etc.). This field holds a JSON object where the keys are the paths within the source dataset objects and the values are the values of those paths. For example:
{
... (other CKAN dataset fields) ...
"migration_extras": {
"dataset.dataPath": "None",
"metadata.info.sources.0.id": "0",
"metadata.info.citation": "National Geospatial Intelligence Agency. 2019. \"World Port Index.\" Accessed from https://msi.nga.mil/NGAPortal/MSI.portal?_nfpb=true&_pageLabel=msi_portal_page_62&pubCode=0015. Accessed through Resource Watch, (date). [www.resourcewatch.org](https://www.resourcewatch.org).",
"metadata.updatedAt": "2022-03-18T04:56:01.316Z",
"dataset.overwrite": "false",
"dataset.protected": "false",
"metadata.info.geographic_coverage": "Global",
"metadata.info.technical_title": "World Port Index",
"metadata.info.cautions": "Any changes to the port facilities that have occurred since the last publication will not be reflected. This reproduction, partial or complete, of any National Geospatial-Intelligence Agency (NGA), National Imagery and Mapping Agency (NIMA), or Defense Mapping Agency (DMA) product, information, or data is not approved, authorized, or endorsed by the Secretary of Defense, the Director of National Intelligence, the Director of the NGA, or any other element of the U.S. government. The U.S. government and the NGA accept no liability for the accuracy or quality of this reproduction or the use of any NGA, NIMA, or DMA products, information, or data.",
"dataset.taskId": "None",
"dataset.type": "tabular",
"dataset.application.0": "rw",
"metadata.info.data_download_link": "https://wri-public-data.s3.amazonaws.com/resourcewatch/com_017_rw2_major_ports.zip",
"dataset.tableName": "com_017_rw2_major_ports_edit",
"dataset.requested_application": "rw",
"metadata.resource.type": "dataset",
"dataset.name": "com.017.rw2 Major Ports",
"dataset.published": "true",
"dataset.dataLastUpdated": "None",
"dataset.subtitle": "None",
"metadata.info.data_type": "Vector",
"metadata.info.license_link": "https://creativecommons.org/share-your-work/public-domain/",
"metadata.dataset": "28d1f505-571c-4a52-8215-48ea02aa4928",
"metadata.createdAt": "2020-09-22T17:42:17.618Z",
"metadata.dataset_type": "metadata",
"metadata.application": "rw",
"dataset.createdAt": "2020-09-22T17:42:11.637Z",
"dataset.env": "production",
"dataset.updatedAt": "2020-09-25T13:05:42.080Z",
"metadata.language": "en",
"dataset.errorMessage": "",
"dataset.mainDateField": "None",
"metadata.info.sources.0.source-name": "",
"metadata.source": "NGA",
"metadata.info.sources.0.source-description": "National Geospatial-Intelligence Agency (NGA)",
"metadata.info.license": "Public domain",
"dataset.provider": "cartodb",
"dataset.connectorUrl": "https://wri-rw.carto.com/tables/com_017_rw2_major_ports_edit/public",
"metadata.resource.id": "28d1f505-571c-4a52-8215-48ea02aa4928",
"metadata.info.frequency_of_updates": "Varies",
"metadata.info.functions": "Locations, physical characteristics, facilities, and services offered by major ports around the world",
"metadata.info.learn_more_link": "https://msi.nga.mil/Publications/WPI",
"metadata.dataset_id": "5f6a3779b16c4d001a2f0f40",
"metadata.info.spatial_resolution": "None",
"metadata.name": "Major Ports",
"dataset.userId": "5efe38618e222c0010996c3c",
"dataset.dataset_type": "dataset",
"metadata.status": "published",
"metadata.info.name": "Major Ports",
"dataset.connectorType": "rest",
"metadata.info.date_of_content": "2019",
"metadata.info.data_download_original_link": "https://msi.nga.mil/Publications/WPI",
"metadata.info.rwId": "com.017.rw2",
"dataset.dataset_id": "28d1f505-571c-4a52-8215-48ea02aa4928",
"dataset.status": "saved",
"dataset.geoInfo": "true",
"dataset.slug": "com017rw2-Major-Ports",
"metadata.description": "The World Port Index is created by the Maritime Security Office of the National Geospatial-Intelligence Agency (NGA) to document the locations and features of major ports around the world. The Maritime Security Office requests that mariners send it corrections in plain language, which the office subsequently codes to create a consistent record of port facilities. Resource Watch shows only a subset of the data set. For access to the full data set and additional information, see the Learn More link. \n \n### Additional Information \n \nResource Watch shows only a subset of the dataset. For access to the full dataset and additional information, click on the “Learn more” button. \n \n### Disclaimer \n \nExcerpts of this description page were taken from the source metadata.",
"dataset.attributesPath": "None"
},
... (other CKAN dataset fields) ...
}
This extension includes a custom harvester for ingesting datasets from other instances of this project (e.g., harvesting datasets from Production to Staging). Most of the customizations are behind the scenes, but a new config option was added. Set it if you want to create new resources within the datasets instead of using links to the original resources.
In the harvest source configuration section, you can add the following option:
{
"create_resources": true
}
Along with the new config option, all of the existing harvest options should still work as expected, such as specifying organizations to include or exclude, using an API key/token, etc.
For example, let's assume that you want to:
- Create new resources (full resources, instead of merely metadata pointing to the URL of the original data)
- Create new organizations to match the source instance (if they don't exist already)
- Only harvest datasets from two source organizations (`global-forest-watch` and `land-carbon-lab`), and
- a single source group (`forests`)
Your harvest source configuration would look something like this:
{
"remote_orgs": "create",
"remote_groups": "create",
"create_resources": true,
"organizations_filter_include": ["global-forest-watch", "land-carbon-lab"],
"groups_filter_include": ["forests"]
}
Note: This plugin requires a Google Analytics account with a Measurement ID and API Secret to work. For more information on how to set up Google Analytics, see the Google Analytics documentation. Steps to set up reports can be found in the main docs directory in this repo (see wri-odp/docs/ga-api/README.md).
This extension includes an optional plugin that sends API usage analytics to Google Analytics. To enable this feature, you need to add the wri_api_tracking plugin to the ckan.plugins list in your CKAN configuration file, just after the wri plugin:
ckan.plugins = ... wri wri_api_tracking ...
You also need to add the following configuration options to your CKAN configuration file (these are required if the wri_api_tracking plugin is enabled):
ckanext.wri.api_analytics.measurement_id = G-XXXXXXXXXX
ckanext.wri.api_analytics.api_secret = XXXXXXXXXXX
The plugin sends a simple event to Google Analytics whenever an API request is made. The event is tracked in Google Analytics as ckan_api with a few params/dimensions (action and user_agent being the most useful). This is the object that's sent to Google Analytics:
{
'client_id': cid, # This is either a randomly generated UUID or the hash hex digest of the current CKAN user.
'events': [
{
'name': 'ckan_api', # The event name, used to group all API events.
'params': {
'action': tk.request.environ['PATH_INFO'].split('/')[-1], # The API action, extracted from the URL (e.g., 'package_show').
'user_agent': tk.request.environ.get('HTTP_USER_AGENT', ''), # The user agent of the request (e.g., "curl/8.6.0", "ckanapi/4.7", "node", etc.).
'session_id': uuid.uuid4().hex, # A randomly generated UUID to trigger a session.
'engagement_time_msec': 1, # Engagement time set to 1, just to trigger user engagement.
},
}
],
}
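For context, an object like this is typically delivered to Google Analytics 4 through the Measurement Protocol `mp/collect` endpoint. The sketch below shows one way such a request could be assembled without sending it. It is not the plugin's code: `client_id_for` and `build_ga_request` are hypothetical names, and SHA-256 is an assumed choice, since the hash used for the CKAN user is not stated here.

```python
import hashlib
import json
import urllib.request
import uuid
from typing import Optional
from urllib.parse import urlencode

GA_ENDPOINT = "https://www.google-analytics.com/mp/collect"


def client_id_for(user_id: Optional[str]) -> str:
    """Hash the CKAN user if known (SHA-256 is an assumption), else use a random UUID."""
    if user_id:
        return hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return str(uuid.uuid4())


def build_ga_request(
    measurement_id: str,
    api_secret: str,
    action: str,
    user_agent: str,
    user_id: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a Measurement Protocol request for a ckan_api event."""
    payload = {
        "client_id": client_id_for(user_id),
        "events": [
            {
                "name": "ckan_api",
                "params": {
                    "action": action,
                    "user_agent": user_agent,
                    "session_id": uuid.uuid4().hex,
                    "engagement_time_msec": 1,
                },
            }
        ],
    }
    query = urlencode({"measurement_id": measurement_id, "api_secret": api_secret})
    return urllib.request.Request(
        url=f"{GA_ENDPOINT}?{query}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


ga_request = build_ga_request("G-XXXXXXXXXX", "XXXXXXXXXX", "package_show", "curl/8.6.0", user_id="alice")
# To actually send it: urllib.request.urlopen(ga_request)
```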
See the CKAN Backend Development README for instructions on how to set up a local Docker CKAN backend development environment.
This extension lives in ckan-backend-dev/src/ckanext-wri. It is symlinked to the root of this repo for convenience and visibility.
Because it's part of this unified repo, if you need to make changes, you can do so directly in the ckanext-wri directory. There's no external repo to clone or fork (like other CKAN extensions), so you can just create a new branch off of dev, make your changes, and submit a PR.
The unit tests for this extension are run as part of the make unit-tests command in the ckan-backend-dev Docker development environment, but while developing, you can also run them on their own. To do so, open another terminal window, go to ckan-backend-dev, and follow the steps below.
If the environment is not already running, start it:
make up
Then enter the Docker shell:
make shell
Once in the shell, navigate to the extension directory:
cd src_extensions/ckanext-wri
Finally, run the tests:
pytest --ckan-ini=test.ini ckanext/wri/tests