A utility for loading data into CKAN from remote sources based on Python Pandas and ckanapi.
Ensure the required system libraries (libxml2-dev, libxslt-dev, python-dev) are installed. For example, on Ubuntu/Debian-based systems:
sudo apt-get install libxml2-dev libxslt-dev python-dev
Installation works like that of most other Python packages: either as a global Python package or within a virtual environment.
NOTE: Using a Python virtual environment is not mandatory but it is highly recommended.
Using pip
pip install git+https://github.com/WorldBank-Transport/ckan-loaddata
Or by downloading or cloning the source code and then running setup.py directly
git clone https://github.com/WorldBank-Transport/ckan-loaddata.git
cd ckan-loaddata
python setup.py install
Or for development installation
git clone https://github.com/WorldBank-Transport/ckan-loaddata.git
cd ckan-loaddata
python setup.py develop
ckan_loaddata <path-to-your-yaml-task-file>
To automate periodic publishing of new dataset resources, run the
ckan_loaddata command from a cron job.
Your YAML task file can use the following format:
---
address: <your-ckan-host>
apikey: <your-api-key>
resources:
  - url: '<your-data-source-url>'
    input:
      format: '<input-file-format>'
      # other input parameters
    output:
      format: <output-file-format>
      # other output parameters
      metadata:
        package_id: 'your-ckan-package-id'
        # resource-metadata
For example:
---
address: http://ckan.example.com
apikey: 'your-api-key'
resources:
  - url: 'http://remote.example.com/remote-data-source-file-url.xls'
    input:
      format: excel
    output:
      format: csv
      filename: '%Y-%m-your-target-resource-file-name.csv'
      metadata:
        package_id: 'your-package-id'
        name: '%Y-%m: Your target resource title'
        url: ''
        format: csv
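The `%Y` and `%m` placeholders in `filename` and `name` above look like strftime-style date directives. Assuming that is how they are expanded (the example does not state this explicitly), the following Python sketch shows what the resulting values would be for a given run date. The `expand_date_placeholders` helper is hypothetical and is not part of ckan_loaddata.

```python
from datetime import date


def expand_date_placeholders(template, run_date):
    """Expand strftime-style directives (e.g. %Y, %m) in a template string.

    Hypothetical helper for illustration only; it assumes the task file's
    placeholders follow Python's strftime conventions.
    """
    return run_date.strftime(template)


if __name__ == "__main__":
    # An arbitrary run date chosen for the illustration.
    run_date = date(2024, 3, 15)
    print(expand_date_placeholders("%Y-%m-your-target-resource-file-name.csv", run_date))
    print(expand_date_placeholders("%Y-%m: Your target resource title", run_date))
```

Under this assumption, a task that runs monthly would produce a distinctly named resource each month without any change to the task file.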
For more information about YAML file syntax, see the YAML documentation online.
| Argument | Description |
|---|---|
| address: | The root URL of the target CKAN instance. |
| apikey: | A CKAN API key. default: |
| user_agent: | The User Agent string. default: |
| resources: | A list of resources to be loaded. default: |
Each resource item in the resources collection may contain the following arguments:
| Argument | Description |
|---|---|
| url: | The full URL of the resource file to be loaded. |
| input: | Arguments to be used in processing the input resource file. |
| output: | Arguments to be used in uploading the resource to CKAN. |
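As a rough illustration of the arguments listed above, the following Python sketch checks an already parsed task mapping before it is handed to the loader. The `check_task` helper is hypothetical and is not part of ckan_loaddata. It assumes that `address` and each resource's `url` are required (no default is listed for them) and that `input` and `output`, when present, are mappings.

```python
def check_task(task):
    """Return a list of problems found in a parsed task mapping.

    Hypothetical helper for illustration only; the required/optional
    split is an assumption based on which arguments list a default.
    """
    problems = []
    if not task.get("address"):
        problems.append("missing 'address'")

    resources = task.get("resources") or []
    if not isinstance(resources, list):
        problems.append("'resources' must be a list")
        return problems

    for index, resource in enumerate(resources):
        if not isinstance(resource, dict) or not resource.get("url"):
            problems.append("resource %d: missing 'url'" % index)
            continue
        # 'input' and 'output' are optional here, but must be mappings if given.
        for section in ("input", "output"):
            value = resource.get(section)
            if value is not None and not isinstance(value, dict):
                problems.append("resource %d: '%s' must be a mapping" % (index, section))
    return problems


if __name__ == "__main__":
    task = {
        "address": "http://ckan.example.com",
        "apikey": "your-api-key",
        "resources": [
            {
                "url": "http://remote.example.com/remote-data-source-file-url.xls",
                "input": {"format": "excel"},
                "output": {"format": "csv"},
            }
        ],
    }
    print(check_task(task))
```

Returning a list of problems rather than raising on the first one lets a scheduled run log every issue in the task file at once.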