
Commit d78d139

Added support for setting environment variables in the pipeline-spec.yaml (#182)
* Added spec.environment
* Updated readme, schema
* Added documentation
* Fixed linting
Parent: 30f627e

6 files changed (+29, -4 lines)

README.md

Lines changed: 11 additions & 1 deletion
```diff
@@ -32,6 +32,8 @@ Here's an example of a `pipeline-spec.yaml` file:
 worldbank-co2-emissions:
   title: CO2 emission data from the World Bank
   description: Data per year, provided in metric tons per capita.
+  environment:
+    DEBUG: true
   pipeline:
     -
       run: update_package
@@ -61,12 +63,20 @@ worldbank-co2-emissions:
 
 In this example we see one pipeline called `worldbank-co2-emissions`. Its pipeline consists of 4 steps:
 
-- `metadata`: This is a library processor (see below), which modifies the data-package's descriptor (in our case: the initial, empty descriptor) - adding `name`, `title` and other properties to the datapackage.
+- `update_package`: This is a library processor (see below), which modifies the data-package's descriptor (in our case: the initial, empty descriptor) - adding `name`, `title` and other properties to the datapackage.
 - `load`: This is another library processor, which loads data into the data-package.
   This resource has a `name` and a `from` property, pointing to the remote location of the data.
 - `set_types`: This processor assigns data types to fields in the data. In this example, field headers looking like years will be assigned the `number` type.
 - `dump_to_zip`: Create a zipped and validated datapackage with the provided file name.
 
+Also, we have provided some metadata:
+
+- `title`: Title of a pipeline
+- `description`: Description of a pipeline
+- `environment`: Dictionary of environment variables to be set for all the pipeline's steps. For example, it can be used to change the behaviour of the underlying `requests` library - https://requests.readthedocs.io/en/master/user/advanced/#ssl-cert-verification
+
+> Full JSONSchema of the `pipeline-spec.yaml` file can be found [here](https://github.com/frictionlessdata/datapackage-pipelines/blob/master/datapackage_pipelines/specs/schemas/pipeline-spec.schema.json)
+
 ### Mechanics
 
 An important aspect of how the pipelines are run is the fact that data is passed in streams from one processor to another. If we get "technical" here, then each processor is run in its own dedicated process, where the datapackage is read from its `stdin` and output to its `stdout`. The important thing to note here is that no processor holds the entire data set at any point.
```
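Since the runner exports each value with `str(value)` (see `runner.py` below), a processor observes `DEBUG: true` as the string `'True'`, not a boolean. A minimal processor-side sketch of consuming the variable (the parsing logic here is illustrative, not part of the library):

```python
import os

# `DEBUG: true` in pipeline-spec.yaml arrives as the string 'True',
# because the runner exports str(value) rather than the YAML boolean.
debug = os.environ.get('DEBUG', '').lower() == 'true'

if debug:
    print('running with DEBUG enabled')
```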

datapackage_pipelines/manager/runner.py

Lines changed: 3 additions & 0 deletions
```diff
@@ -224,6 +224,9 @@ def run_pipelines(pipeline_id_pattern,
             continue
 
         if slave:
+            # Set environment variables for the pipeline
+            for key, value in spec.environment.items():
+                os.environ[key] = str(value)
             ps = status_manager.get(spec.pipeline_id)
             ps.init(spec.pipeline_details,
                     spec.source_details,
```
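Because each processor runs in its own dedicated process, exporting the variables on the parent is enough for every step to inherit them. An illustrative sketch of that inheritance (not the project's actual process-spawning code):

```python
import os
import subprocess

# Mirror the runner's behaviour: values are stringified before export.
os.environ['DEBUG'] = str(True)

# Child processes inherit os.environ by default, so any processor
# spawned after this point sees the variable.
subprocess.run(
    ['python', '-c', "import os; print(os.environ['DEBUG'])"],
    check=True,
)  # prints: True
```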

datapackage_pipelines/specs/parsers/base_parser.py

Lines changed: 3 additions & 1 deletion
```diff
@@ -7,7 +7,8 @@ def __init__(self,
                  validation_errors=None,
                  dependencies=None,
                  cache_hash='',
-                 schedule=None):
+                 schedule=None,
+                 environment=None):
         self.path = path
         self.pipeline_id = pipeline_id
         self.pipeline_details = pipeline_details
@@ -16,6 +17,7 @@ def __init__(self,
         self.dependencies = [] if dependencies is None else dependencies
         self.cache_hash = cache_hash
         self.schedule = schedule
+        self.environment = environment
 
     def __str__(self):
         return 'PipelineSpec({}, validation_errors={}, ' \
```
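A hypothetical construction of the extended spec object; the keyword names `path`, `pipeline_id` and `pipeline_details` are inferred from the assignments above and may not reflect the full signature:

```python
from datapackage_pipelines.specs.parsers.base_parser import PipelineSpec

# environment defaults to None so that process_environment (specs.py,
# below) can fill it in later from the pipeline details.
spec = PipelineSpec(
    path='.',
    pipeline_id='worldbank-co2-emissions',
    pipeline_details={'environment': {'DEBUG': True}},
)
assert spec.environment is None
```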

datapackage_pipelines/specs/schemas/pipeline-spec.schema.json

Lines changed: 4 additions & 1 deletion
```diff
@@ -11,6 +11,9 @@
     "description": {
       "type": "string"
     },
+    "environment": {
+      "type": "object"
+    },
     "schedule": {
       "type": "object",
       "properties": {
@@ -69,4 +72,4 @@
       }
     }
   }
-}
+}
```
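The schema only requires `environment` to be an object; its keys and values are otherwise unconstrained. A quick check using the third-party `jsonschema` package, against a reduced schema fragment (the fragment is ours, not the full file):

```python
import jsonschema

# Reduced fragment mirroring the change: `environment` must be an object.
schema = {
    'type': 'object',
    'properties': {
        'environment': {'type': 'object'},
    },
}

jsonschema.validate({'environment': {'DEBUG': True}}, schema)  # passes

try:
    jsonschema.validate({'environment': 'DEBUG=true'}, schema)
except jsonschema.ValidationError as err:
    print(err.message)  # "'DEBUG=true' is not of type 'object'"
```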

datapackage_pipelines/specs/specs.py

Lines changed: 7 additions & 0 deletions
```diff
@@ -37,6 +37,12 @@ def process_schedules(spec: PipelineSpec):
         spec.schedule = schedule
 
 
+def process_environment(spec: PipelineSpec):
+    if spec.environment is None:
+        environment = spec.pipeline_details.get('environment', {})
+        spec.environment = environment
+
+
 def find_specs(root_dir='.') -> PipelineSpec:
     for dirpath, dirnames, filenames in dirtools.Dir(root_dir,
                                                      exclude_file='.dpp_spec_ignore',
@@ -83,6 +89,7 @@ def pipelines(prefixes=None, ignore_missing_deps=False, root_dir='.', status_man
 
         resolve_processors(spec)
         process_schedules(spec)
+        process_environment(spec)
 
         try:
             hasher.calculate_hash(spec, status_manager, ignore_missing_deps)
```
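The precedence is: an `environment` already set on the spec wins; otherwise it is read from the pipeline's details, defaulting to `{}`. A small behavioural sketch (the stub class is ours; the function body is copied from the diff above):

```python
def process_environment(spec):
    # Copied from the diff above, type annotation dropped.
    if spec.environment is None:
        spec.environment = spec.pipeline_details.get('environment', {})


class StubSpec:
    """Stand-in exposing just the attributes process_environment touches."""
    def __init__(self, pipeline_details, environment=None):
        self.pipeline_details = pipeline_details
        self.environment = environment


spec = StubSpec({'environment': {'DEBUG': True}})
process_environment(spec)      # fills from pipeline_details
assert spec.environment == {'DEBUG': True}

preset = StubSpec({'environment': {'DEBUG': True}}, environment={})
process_environment(preset)    # a pre-set value (even {}) is kept
assert preset.environment == {}
```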

pylama.ini

Lines changed: 1 addition & 1 deletion
```diff
@@ -1,6 +1,6 @@
 [pylama]
 linters = pyflakes,mccabe,pep8
-ignore = E128,E301
+ignore = E128,E301,E741
 
 [pylama:pep8]
 max_line_length = 120
```
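For context (our note, not part of the commit): E741 is pycodestyle's "ambiguous variable name" check, which flags single-character names that are easily confused with digits. Suppressing it repo-wide permits code such as:

```python
# Each of these would otherwise be reported as E741:
l = [1, 2, 3]  # 'l' reads like '1'
I = 42         # 'I' reads like 'l' or '1'
O = 0          # 'O' reads like '0'
```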
