- Added a new CLI command `kedro jupyter convert` to facilitate converting Jupyter notebook cells into Kedro nodes.
- Added `KedroContext` base class which holds the configuration and Kedro's main functionality (catalog, pipeline, config).
- Added a new I/O module `ParquetS3DataSet` in `contrib` for usage with Pandas. (by @mmchougule)
- Added a new `--node` flag to `kedro run`, allowing users to run only the nodes with the specified names.
- Added `CSVHTTPDataSet` to load CSV using HTTP(S) links.
- Added new `--from-nodes` and `--to-nodes` run arguments, allowing users to run a range of nodes from the pipeline.
- Added prefix `params:` to the parameters specified in `parameters.yml`, which allows users to differentiate between their different parameter node inputs and outputs.
- Added `JSONBlobDataSet` to load JSON(-delimited) files from Azure Blob Storage.
- Jupyter Lab/Notebook now starts with only one kernel by default.
- Documentation improvements including instructions on how to initialise a Spark session using YAML configuration.
- `anyconfig` default log level changed from `INFO` to `WARNING`.
- Added information on installed plugins to `kedro info`.
- Added style sheets for project documentation, so the output of `kedro build-docs` will resemble the style of `kedro docs`.
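The new `params:` prefix can be pictured as namespacing parameters so their names cannot collide with dataset names. A minimal toy sketch of the idea (not Kedro's actual implementation; the helper name is made up):

```python
# Toy illustration of the "params:" prefix -- NOT Kedro's real code.
# Parameters loaded from parameters.yml are exposed to nodes under
# names prefixed with "params:", so they cannot collide with datasets.

def register_parameters(parameters):
    """Map each parameter to a 'params:'-prefixed catalog-style key."""
    return {f"params:{name}": value for name, value in parameters.items()}

parameters = {"learning_rate": 0.01, "epochs": 10}
entries = register_parameters(parameters)

# A node can now declare inputs=["params:learning_rate", ...] without
# clashing with a dataset that happens to be called "learning_rate".
assert entries["params:learning_rate"] == 0.01
```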
- Simplified the Kedro template in `run.py` with the introduction of the `KedroContext` class.
- Merged `FilepathVersionMixIn` and `S3VersionMixIn` under one abstract class `AbstractVersionedDataSet`, which extends `AbstractDataSet`.
- `name` changed to be a keyword-only argument for `Pipeline`.
- `CSVLocalDataSet` no longer supports URLs.
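The keyword-only `name` change means positional calls such as `Pipeline(nodes, "my_pipeline")` now fail. A stub illustrating the Python mechanism (the real signature lives in Kedro; the class below is a stand-in):

```python
# Stand-in class illustrating the keyword-only change -- not Kedro's Pipeline.
class Pipeline:
    def __init__(self, nodes, *, name=None):  # the bare "*" makes name keyword-only
        self.nodes = list(nodes)
        self.name = name

# The old positional style now raises TypeError...
try:
    Pipeline([], "data_engineering")
except TypeError:
    pass

# ...so callers must pass the name explicitly:
pipeline = Pipeline([], name="data_engineering")
assert pipeline.name == "data_engineering"
```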
This guide assumes that:
- The framework-specific code has not been altered significantly.
- Your project-specific code is stored in the dedicated Python package under `src/`.
The breaking changes were introduced in the following project template files:
- `<project-name>/.ipython/profile_default/startup/00-kedro-init.py`
- `<project-name>/kedro_cli.py`
- `<project-name>/src/tests/test_run.py`
- `<project-name>/src/<package-name>/run.py`
The easiest way to migrate your project from Kedro 0.14.* to Kedro 0.15.0 is to create a new project (by using `kedro new`) and move code and files bit by bit as suggested in the detailed guide below:
- Create a new project with the same name by running `kedro new`
- Copy the following folders to the new project:
  - `results/`
  - `references/`
  - `notebooks/`
  - `logs/`
  - `data/`
  - `conf/`
- If you customised your `src/<package>/run.py`, make sure you apply the same customisations to `src/<package>/run.py` in the new project.
- If you customised `get_config()`, you can override the `_create_config()` method in the `ProjectContext` derived class.
- If you customised `create_catalog()`, you can override the `_create_catalog()` method in the `ProjectContext` derived class.
- If you customised `run()`, you can override the `run()` method in the `ProjectContext` derived class.
- If you customised the default `env`, you can override it in the `ProjectContext` derived class or pass it at construction. By default, `env` is `local`.
- If you customised the default `root_conf`, you can override the `CONF_ROOT` attribute in the `ProjectContext` derived class. By default, the `KedroContext` base class has its `CONF_ROOT` attribute set to `conf`.
- The following syntax changes are introduced in IPython or Jupyter notebook/labs:
  - `proj_dir` -> `context.project_path`
  - `proj_name` -> `context.project_name`
  - `conf` -> `context.config_loader`
  - `io` -> `context.catalog` (e.g., `io.load()` -> `context.catalog.load()`)
- If you customised your `kedro_cli.py`, you need to apply the same customisations to your `kedro_cli.py` in the new project.
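Put together, the `ProjectContext` overrides described in the steps above might look like the following schematic. The base class here is a minimal stand-in for Kedro's `KedroContext` so the sketch is self-contained and runnable; method bodies are placeholders, and the customised values are examples:

```python
# Schematic of the new ProjectContext overrides -- the real base class
# ships with Kedro; a minimal stand-in is defined here so the sketch runs.
class KedroContext:                      # stand-in for Kedro's KedroContext
    CONF_ROOT = "conf"                   # default configuration root

    def __init__(self, project_path, env="local"):
        self.project_path = project_path
        self.env = env                   # defaults to "local"

class ProjectContext(KedroContext):
    """Project-specific context, mirroring the migration steps above."""
    CONF_ROOT = "config"                 # example: customised root_conf

    def _create_config(self):            # a customised get_config() goes here
        ...

    def _create_catalog(self):           # a customised create_catalog() goes here
        ...

context = ProjectContext("/path/to/project", env="dev")
assert context.CONF_ROOT == "config" and context.env == "dev"
```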
If you defined any custom dataset classes which support versioning in your project, you need to apply the following changes:
- Make sure your dataset inherits from `AbstractVersionedDataSet` only.
- Call `super().__init__()` with the appropriate arguments in the dataset's `__init__`. If storing on local filesystem, providing the filepath and the version is enough. Otherwise, you should also pass in an `exists_function` and a `glob_function` that emulate `exists` and `glob` in a different filesystem (see `CSVS3DataSet` as an example).
- Remove setting of the `_filepath` and `_version` attributes in the dataset's `__init__`, as this is taken care of in the base abstract class.
- Any calls to the `_get_load_path` and `_get_save_path` methods should take no arguments.
- Ensure you convert the output of `_get_load_path` and `_get_save_path` appropriately, as these now return `PurePath`s instead of strings.
- Make sure `_check_paths_consistency` is called with `PurePath`s as input arguments, instead of strings.
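A condensed sketch of a migrated dataset, with a minimal stand-in for `AbstractVersionedDataSet` so it runs here without Kedro installed. The points being illustrated are that `__init__` delegates to `super().__init__()` instead of setting `_filepath`/`_version` itself, and that the `PurePath` returned by `_get_load_path` is converted with `str()` before use:

```python
from pathlib import PurePosixPath

# Stand-in for Kedro's AbstractVersionedDataSet, so the sketch is
# self-contained. It stores _filepath/_version for you and returns
# PurePath objects from _get_load_path.
class AbstractVersionedDataSet:
    def __init__(self, filepath, version=None,
                 exists_function=None, glob_function=None):
        self._filepath = PurePosixPath(filepath)
        self._version = version

    def _get_load_path(self):            # note: takes no arguments any more
        return self._filepath

class MyCSVDataSet(AbstractVersionedDataSet):
    def __init__(self, filepath, version=None):
        # _filepath/_version are set by the base class -- do not set them here
        super().__init__(filepath, version)

    def _load(self):
        # _get_load_path now returns a PurePath; convert before use
        load_path = str(self._get_load_path())
        return load_path                 # a real dataset would read the file

dataset = MyCSVDataSet("data/01_raw/cars.csv")
assert dataset._load() == "data/01_raw/cars.csv"
```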
These steps should have brought your project to Kedro 0.15.0. There might be some more minor tweaks needed as every project is unique, but now you have a pretty solid base to work with. If you run into any problems, please consult the Kedro documentation.
Dmitry Vukolov, Jo Stichbury, Angus Williams, Deepyaman Datta, Mayur Chougule, Marat Kopytjuk
- Tab completion for catalog datasets in `ipython` or `jupyter` sessions. (Thank you @datajoely and @WaylonWalker)
- Added support for transcoding, an ability to decouple loading/saving mechanisms of a dataset from its storage location, denoted by adding `@` to the dataset name.
- Datasets have a new `release` function that instructs them to free any cached data. The runners will call this when the dataset is no longer needed downstream.
- Add support for pipeline nodes made up from partial functions.
- Expand user home directory `~` for `TextLocalDataSet` (see issue #19).
- Add a `short_name` property to `Node`s for a display-friendly (but not necessarily unique) name.
- Add Kedro project loader for IPython: `extras/kedro_project_loader.py`.
- Fix source file encoding issues with Python 3.5 on Windows.
- Fix local project source not having priority over the same source installed as a package, leading to local updates not being recognised.
- Remove the `max_loads` argument from the `MemoryDataSet` constructor and from the `AbstractRunner.create_default_data_set` method.
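The partial-function support for pipeline nodes mentioned above typically involves `functools.partial`. One detail worth knowing (illustrated below without Kedro) is that a partial object has no `__name__` attribute, which is why a display-friendly name such as the `short_name` property matters:

```python
from functools import partial

def scale(data, factor):
    return [x * factor for x in data]

# A partial freezes some arguments so the resulting callable takes
# only the pipeline inputs.
double = partial(scale, factor=2)

# partial objects have no __name__, which is why a human-readable
# display name has to come from somewhere else.
assert not hasattr(double, "__name__")
assert double([1, 2, 3]) == [2, 4, 6]
```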
Joel Schwarzmann, Alex Kalmikov
- Added Data Set transformer support in the form of `AbstractTransformer` and `DataCatalog.add_transformer`.
- Merged the `ExistsMixin` into `AbstractDataSet`.
- `Pipeline.node_dependencies` returns a dictionary keyed by node, with sets of parent nodes as values; `Pipeline` and `ParallelRunner` were refactored to make use of this for topological sort for node dependency resolution and running pipelines, respectively.
- `Pipeline.grouped_nodes` returns a list of sets, rather than a list of lists.
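The `node_dependencies`/`grouped_nodes` refactor amounts to a topological sort that batches nodes into sets of mutually independent nodes, which a parallel runner can then execute concurrently. A toy version of the idea (not Kedro's code; the function and variable names are made up):

```python
# Toy topological grouping in the spirit of Pipeline.grouped_nodes:
# given node_dependencies (node -> set of parent nodes), produce a list
# of sets where every node in a set depends only on earlier sets.
def grouped_nodes(node_dependencies):
    remaining = dict(node_dependencies)
    done, groups = set(), []
    while remaining:
        # nodes whose parents have all been scheduled already
        group = {n for n, parents in remaining.items() if parents <= done}
        if not group:
            raise ValueError("cycle detected in node dependencies")
        groups.append(group)
        done |= group
        for n in group:
            del remaining[n]
    return groups

deps = {"load": set(), "clean": {"load"}, "train": {"clean"}, "report": {"clean"}}
assert grouped_nodes(deps) == [{"load"}, {"clean"}, {"train", "report"}]
```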
- New I/O module `HDFS3DataSet`.
- Improved API docs.
- Template `run.py` will throw a warning instead of an error if `credentials.yml` is not present.
None
The initial release of Kedro.
Nikolaos Tsaousis, Ivan Danov, Dmitrii Deriabin, Gordon Wrigley, Yetunde Dada, Nasef Khan, Kiyohito Kunii, Nikolaos Kaltsas, Meisam Emamjome, Peteris Erins, Lorena Balan, Richard Westenra
Jo Stichbury, Aris Valtazanos, Fabian Peters, Guilherme Braccialli, Joel Schwarzmann, Miguel Beltre, Mohammed ElNabawy, Deepyaman Datta, Shubham Agrawal, Oleg Andreyev, Mayur Chougule, William Ashford, Ed Cannon, Nikhilesh Nukala, Sean Bailey, Vikram Tegginamath, Thomas Huijskens, Musa Bilal
We are also grateful to everyone who advised and supported us, filed issues or helped resolve them, asked and answered questions and were part of inspiring discussions.