`README.md` (+11 −1)
```diff
@@ -32,6 +32,8 @@ Here's an example of a `pipeline-spec.yaml` file:
 worldbank-co2-emissions:
   title: CO2 emission data from the World Bank
   description: Data per year, provided in metric tons per capita.
+  environment:
+    DEBUG: true
   pipeline:
     -
       run: update_package
```
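The hunk above cuts off after the first step. For orientation, a fuller spec along these lines might look like the sketch below; the `parameters` blocks and their values (`name`, `from`, `types`, `out-file`, the World Bank URL) are illustrative assumptions, not lines from this diff.

```yaml
worldbank-co2-emissions:
  title: CO2 emission data from the World Bank
  description: Data per year, provided in metric tons per capita.
  environment:
    DEBUG: true
  pipeline:
    -
      run: update_package
      parameters:
        # Properties added to the initially empty datapackage descriptor.
        name: co2-emissions
        title: CO2 emissions (metric tons per capita)
    -
      run: load
      parameters:
        # Hypothetical remote source; `from` points at the data's location.
        from: http://api.worldbank.org/v2/en/indicator/EN.ATM.CO2E.PC?downloadformat=excel
        name: annual
    -
      run: set_types
      parameters:
        types:
          # Field headers that look like years (e.g. 1960, 2014) get `number`.
          "[12][0-9]{3}":
            type: number
    -
      run: dump_to_zip
      parameters:
        out-file: co2-emissions.zip
```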
```diff
@@ -61,12 +63,20 @@ worldbank-co2-emissions:
 
 In this example we see one pipeline called `worldbank-co2-emissions`. Its pipeline consists of 4 steps:
 
-- `metadata`: This is a library processor (see below), which modifies the data-package's descriptor (in our case: the initial, empty descriptor) - adding `name`, `title` and other properties to the datapackage.
+- `update_package`: This is a library processor (see below), which modifies the data-package's descriptor (in our case: the initial, empty descriptor) - adding `name`, `title` and other properties to the datapackage.
 - `load`: This is another library processor, which loads data into the data-package.
   This resource has a `name` and a `from` property, pointing to the remote location of the data.
 - `set_types`: This processor assigns data types to fields in the data. In this example, field headers looking like years will be assigned the `number` type.
 - `dump_to_zip`: Create a zipped and validated datapackage with the provided file name.
 
+Also, we have provided some metadata:
+
+- `title`: Title of a pipeline
+- `description`: Description of a pipeline
+- `environment`: Dictionary of environment variables to be set for all the pipeline's steps. For example, it can be used to change the behaviour of the underlying `requests` library - https://requests.readthedocs.io/en/master/user/advanced/#ssl-cert-verification
+
+> Full JSONSchema of the `pipeline-spec.yaml` file can be found [here](https://github.com/frictionlessdata/datapackage-pipelines/blob/master/datapackage_pipelines/specs/schemas/pipeline-spec.schema.json)
+
```
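As a concrete illustration of that `environment` entry: the `requests` documentation linked above describes a `REQUESTS_CA_BUNDLE` environment variable that controls SSL certificate verification, so a sketch like the following (the bundle path is a placeholder, not part of this diff) would apply it to every step of the pipeline:

```yaml
worldbank-co2-emissions:
  environment:
    DEBUG: true
    # Picked up by the `requests` library for SSL certificate verification
    # (see the link above); the path here is a placeholder.
    REQUESTS_CA_BUNDLE: /etc/ssl/certs/internal-ca.pem
```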
### Mechanics
An important aspect of how the pipelines are run is the fact that data is passed in streams from one processor to another. If we get "technical" here, then each processor is run in its own dedicated process, where the datapackage is read from its `stdin` and output to its `stdout`. The important thing to note here is that no processor holds the entire data set at any point.
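A minimal sketch of that streaming model, assuming a hypothetical line-delimited JSON framing (the library's real wire format and processor API are not shown in this diff): the processor copies the descriptor through, then transforms rows one at a time, so the full data set is never held in memory.

```python
import json
import sys

def run_processor(transform_row):
    # First line on stdin: the datapackage descriptor; pass it downstream.
    descriptor = json.loads(sys.stdin.readline())
    sys.stdout.write(json.dumps(descriptor) + "\n")
    # Each remaining line is one row; transform and emit immediately,
    # so memory use stays constant regardless of data size.
    for line in sys.stdin:
        row = transform_row(json.loads(line))
        sys.stdout.write(json.dumps(row) + "\n")
    sys.stdout.flush()

if __name__ == "__main__":
    # Example transform: uppercase every string value in each row.
    run_processor(lambda row: {k: v.upper() if isinstance(v, str) else v
                               for k, v in row.items()})
```

Chained with ordinary OS pipes (`step1 | step2 | step3`), each stage applies its own transform as the data flows through, which is the property the paragraph above describes.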