Merged
5 changes: 5 additions & 0 deletions .github/styles/Kedro/ignore.txt
Original file line number Diff line number Diff line change
@@ -12,6 +12,10 @@ doxenix
fsspec
globals
Globals
iPython
ipython
jupyter
Jupyter
Kaggle
namespace
namespaces
@@ -49,6 +53,7 @@ Claypot
ethanknights
Aneira
Printify
show_source
Pandera
Dask
Polars
2 changes: 1 addition & 1 deletion docs/about/telemetry.md
@@ -3,7 +3,7 @@ To help the [Kedro Project maintainers](../about/technical_steering_committee.md
Kedro can capture anonymised telemetry.
This data is collected with the sole purpose of improving Kedro by understanding feature usage.
Importantly, we do not store personal information about you or sensitive data from your project,
and this process is never utilized for marketing or promotional purposes.
and this process is never utilised for marketing or promotional purposes.
Participation in this program is optional, and it is enabled by default. Kedro will continue working as normal if you opt out.

The Kedro Project's telemetry has been reviewed and approved under the
2 changes: 1 addition & 1 deletion docs/api/framework/kedro.framework.cli.md
@@ -8,7 +8,7 @@
|------------------------------------|-----------------------------------------------------------------------------|
| [`kedro.framework.cli.catalog`](#kedro.framework.cli.catalog) | A collection of CLI commands for working with Kedro catalog. |
| [`kedro.framework.cli.cli`](#kedro.framework.cli.cli) | `kedro` is a CLI for managing Kedro projects. |
| [`kedro.framework.cli.hooks`](#kedro.framework.cli.hooks) | Provides primitives to use hooks to extend KedroCLI's behavior. |
| [`kedro.framework.cli.hooks`](#kedro.framework.cli.hooks) | Provides primitives to use hooks to extend KedroCLI's behaviour. |
| [`kedro.framework.cli.jupyter`](#kedro.framework.cli.jupyter) | A collection of helper functions to integrate with Jupyter/IPython. |
| [`kedro.framework.cli.pipeline`](#kedro.framework.cli.pipeline) | A collection of CLI commands for working with Kedro pipelines. |
| [`kedro.framework.cli.project`](#kedro.framework.cli.project) | A collection of CLI commands for working with Kedro projects. |
2 changes: 1 addition & 1 deletion docs/api/framework/kedro.framework.md
@@ -8,7 +8,7 @@
|---------------------------------|-----------------------------------------------------------------------------|
| [`kedro.framework.cli`](kedro.framework.cli.md) | Implements commands available from Kedro's CLI. |
| [`kedro.framework.context`](kedro.framework.context.md) | Provides functionality for loading Kedro project context. |
| [`kedro.framework.hooks`](kedro.framework.hooks.md) | Provides primitives to use hooks to extend KedroContext's behavior. |
| [`kedro.framework.hooks`](kedro.framework.hooks.md) | Provides primitives to use hooks to extend KedroContext's behaviour. |
| [`kedro.framework.project`](kedro.framework.project.md) | Provides utilities to configure a Kedro project and access its settings.|
| [`kedro.framework.session`](kedro.framework.session.md) | Provides access to `KedroSession` responsible for project lifecycle. |
| [`kedro.framework.startup`](kedro.framework.startup.md) | Provides metadata for a Kedro project. |
12 changes: 6 additions & 6 deletions docs/catalog-data/advanced_data_catalog_usage.md
@@ -11,7 +11,7 @@
- [How to configure the Data Catalog](#how-to-configure-the-data-catalog)
- [How to access datasets in the catalog](#how-to-access-datasets-in-the-catalog)
- [How to add datasets to the catalog](#how-to-add-datasets-to-the-catalog)
- [How to iterate through datasets in the catalog](#how-to-iterate-trough-datasets-in-the-catalog)
- [How to iterate through datasets in the catalog](#how-to-iterate-through-datasets-in-the-catalog)
- [How to get the number of datasets in the catalog](#how-to-get-the-number-of-datasets-in-the-catalog)
- [How to print the full catalog and individual datasets](#how-to-print-the-full-catalog-and-individual-datasets)
- [How to load datasets programmatically](#how-to-load-datasets-programmatically)
@@ -82,7 +82,7 @@
```

- Both methods retrieve a dataset by name from the catalog’s internal collection.
- If the dataset isn’t materialized but matches a configured pattern, it's instantiated and returned.
- If the dataset isn’t materialised but matches a configured pattern, it's instantiated and returned.
- The `.get()` method accepts:
- `fallback_to_runtime_pattern` (bool): If True, unresolved names fall back to `MemoryDataset` or `SharedMemoryDataset` (in `SharedMemoryDataCatalog`).
- `version`: Specify dataset version if versioning is enabled.
@@ -103,7 +103,7 @@
```
When raw data is added, it's automatically wrapped in a `MemoryDataset`.

## How to iterate trough datasets in the catalog
## How to iterate through datasets in the catalog

`DataCatalog` supports iteration over dataset names (keys), datasets (values), and both (items). Iteration defaults to dataset names, similar to standard Python dictionaries:

@@ -341,7 +341,7 @@

## How to save catalog to config

You can serialize a `DataCatalog` into configuration format (e.g., for saving to a YAML file) using `.to_config()`:
You can serialise a `DataCatalog` into configuration format (e.g., for saving to a YAML file) using `.to_config()`:

```python
from kedro.io import DataCatalog
@@ -362,7 +362,7 @@
```

!!! note
This method only works for datasets with static, serializable parameters. For example, you can serialize credentials passed as dictionaries, but not as actual credential objects (like `google.auth.credentials.Credentials)`. In-memory datasets are excluded.
This method only works for datasets with static, serialisable parameters. For example, you can serialise credentials passed as dictionaries, but not as actual credential objects (like `google.auth.credentials.Credentials)`. In-memory datasets are excluded.

## How to filter catalog datasets

@@ -382,7 +382,7 @@
```

## How to get dataset type
You can check the dataset type without materializing or adding it to the catalog:
You can check the dataset type without materialising or adding it to the catalog:

```python
from kedro.io import DataCatalog, MemoryDataset
14 changes: 7 additions & 7 deletions docs/catalog-data/kedro_dataset_factories.md
@@ -93,7 +93,7 @@
These patterns enable automatic creation of in-memory or shared-memory datasets during execution.

## Patterns resolution order
When the `DataCatalog` is initialized, it scans the configuration to extract and validate any dataset patterns and user catch-all pattern.
When the `DataCatalog` is initialised, it scans the configuration to extract and validate any dataset patterns and user catch-all pattern.

When resolving a dataset name, Kedro uses the following order of precedence:

@@ -104,7 +104,7 @@
A general fallback pattern (e.g., `{default_dataset}`) that is matched if no dataset patterns apply. Only one user catch-all pattern is allowed; defining more than one raises a `DatasetError`.

3. **Default runtime patterns:**
Internal fallback behavior provided by Kedro. These patterns are built into the catalog and automatically used at runtime to create datasets (e.g., `MemoryDataset` or `SharedMemoryDataset`) when none of the above match.
Internal fallback behaviour provided by Kedro. These patterns are built into the catalog and automatically used at runtime to create datasets (e.g., `MemoryDataset` or `SharedMemoryDataset`) when none of the above match.

## How resolution works in practice

@@ -153,10 +153,10 @@
Out[2]: CSVDataset(filepath=.../data/nonexistent.csv)
```

**Default vs runtime behavior**
**Default vs runtime behaviour**

- Default behavior: `DataCatalog` resolves dataset patterns and user catch-all patterns only.
- Runtime behavior (e.g. during `kedro run`): Default runtime patterns are automatically enabled to resolve intermediate datasets not defined in `catalog.yml`.
- Default behaviour: `DataCatalog` resolves dataset patterns and user catch-all patterns only.

- Runtime behaviour (e.g. during `kedro run`): Default runtime patterns are automatically enabled to resolve intermediate datasets not defined in `catalog.yml`.

!!! note
Enabling `fallback_to_runtime_pattern=True` is recommended only for advanced users with specific use cases. In most scenarios, Kedro handles it automatically during runtime.
@@ -475,7 +475,7 @@

**Why use a mixin?**

The goal was to keep pipeline logic decoupled from the core `DataCatalog`, while still providing seamless access to helpful methods utilizing pipelines.
The goal was to keep pipeline logic decoupled from the core `DataCatalog`, while still providing seamless access to helpful methods utilising pipelines.

This mixin approach allows these commands to be injected only when needed, avoiding unnecessary overhead in simpler catalog use cases.

@@ -486,7 +486,7 @@
- You're using Kedro via CLI, or
- Working inside an interactive environment (e.g. IPython, Jupyter Notebook).

Kedro automatically composes the catalog with `CatalogCommandsMixin` behind the scenes when initializing the session.
Kedro automatically composes the catalog with `CatalogCommandsMixin` behind the scenes when initialising the session.

If you're working outside a Kedro session and want to access the extra catalog commands, you have two options:

14 changes: 7 additions & 7 deletions docs/catalog-data/lazy_loading.md
@@ -1,14 +1,14 @@
# Lazy loading

From Kedro version **`0.19.10`** `DataCatalog` introduces a helper class called `_LazyDataset` to improve performance and optimize dataset loading.
From Kedro version **`0.19.10`** `DataCatalog` introduces a helper class called `_LazyDataset` to improve performance and optimise dataset loading.

## What is `_LazyDataset`?
`_LazyDataset` is a lightweight internal class that stores the configuration and versioning information of a dataset without immediately instantiating it. This allows the catalog to defer actual dataset creation (also called materialization) until it is explicitly accessed.
This approach reduces startup overhead, especially when working with large catalogs, since only the datasets you actually use are initialized.
`_LazyDataset` is a lightweight internal class that stores the configuration and versioning information of a dataset without immediately instantiating it. This allows the catalog to defer actual dataset creation (also called materialisation) until it is explicitly accessed.

This approach reduces startup overhead, especially when working with large catalogs, since only the datasets you actually use are initialised.

## When is `_LazyDataset` used?
When you instantiate a `DataCatalog` from a config file (such as `catalog.yml`), Kedro doesn't immediately create all the underlying dataset objects. Instead, it wraps each dataset in a `_LazyDataset` and registers it in the catalog.
These placeholders are automatically materialized when a dataset is accessed for the first time, either directly or during pipeline execution.
These placeholders are automatically materialised when a dataset is accessed for the first time, either directly or during pipeline execution.

```bash
In [1]: catalog
@@ -27,7 +27,7 @@
writer_args={'engine': 'openpyxl'}
)

# Accessing the dataset triggers materialization.
# Accessing the dataset triggers materialisation.

In [3]: catalog
Out[3]: {
@@ -42,10 +42,10 @@
```

## When is this useful?
This lazy loading mechanism is especially beneficial before runtime, during the warm-up phase of a pipeline. You can force materialization of all datasets early on to:
This lazy loading mechanism is especially beneficial before runtime, during the warm-up phase of a pipeline. You can force materialisation of all datasets early on to:

- Catch configuration or import errors
- Validate external dependencies
- Ensure all datasets can be created before execution begins

Although `_LazyDataset` is not exposed to end users and doesn't affect your usual catalog usage, it's a useful concept to understand when debugging catalog behavior or troubleshooting dataset instantiation issues.
Although `_LazyDataset` is not exposed to end users and doesn't affect your usual catalog usage, it's a useful concept to understand when debugging catalog behaviour or troubleshooting dataset instantiation issues.

4 changes: 2 additions & 2 deletions docs/create/minimal_kedro_project.md
@@ -50,12 +50,12 @@ This informs Kedro where to look for the source code, `settings.py` and `pipelin
The `settings.py` file is an important configuration file in a Kedro project that allows you to define various settings and hooks for your project. Here’s a breakdown of its purpose and functionality:
- Project Settings: This file is where you can configure project-wide settings, such as defining the logging level, setting environment variables, or specifying paths for data and outputs.
- Hooks Registration: You can register custom hooks in `settings.py`, which are functions that can be executed at specific points in the Kedro pipeline lifecycle (e.g., before or after a node runs). This is useful for adding additional functionality, such as logging or monitoring.
- Integration with Plugins: If you are using Kedro plugins, `settings.py` can also be utilized to configure them appropriately.
- Integration with Plugins: If you are using Kedro plugins, `settings.py` can also be utilised to configure them appropriately.

Even if you do not have any settings, an empty `settings.py` is still required. Typically, it is stored at `src/<package_name>/settings.py`.

#### `pipeline_registry.py`
The `pipeline_registry.py` file is essential for managing the pipelines within your Kedro project. It provides a centralized way to register and access all pipelines defined in the project. Here are its key features:
The `pipeline_registry.py` file is essential for managing the pipelines within your Kedro project. It provides a centralised way to register and access all pipelines defined in the project. Here are its key features:
- Pipeline Registration: The file must contain a top-level function called `register_pipelines()` that returns a mapping from pipeline names to Pipeline objects. This function is crucial because it enables the Kedro CLI and other tools to discover and run the defined pipelines.
- Autodiscovery of Pipelines: Since Kedro 0.18.3, you can use the [`find_pipeline`](../build/pipeline_registry.md#pipeline-autodiscovery) function to automatically discover pipelines defined in your project without manually updating the registry each time you create a new pipeline.

2 changes: 1 addition & 1 deletion docs/create/new_project.md
@@ -130,7 +130,7 @@ kedro run
```

```{warning}
`kedro run` requires at least one pipeline with nodes. Please define a pipeline before running this command and ensure it is registred in `pipeline_registry.py`.
`kedro run` requires at least one pipeline with nodes. Please define a pipeline before running this command and ensure it is registered in `pipeline_registry.py`.
```

## Visualise a Kedro project
2 changes: 1 addition & 1 deletion docs/deploy/distributed.md
@@ -40,4 +40,4 @@ We encourage you to play with different ways of parameterising your runs as you

## 4. (Optional) Create starters

You may opt to [build your own Kedro starter](../tutorials/settings.md) if you regularly have to deploy in a similar environment or to a similar platform. The starter enables you to re-use any deployment scripts written as part of step 2.
You may opt to [build your own Kedro starter](../tutorials/settings.md) if you regularly have to deploy in a similar environment or to a similar platform. The starter enables you to reuse any deployment scripts written as part of step 2.
8 changes: 4 additions & 4 deletions docs/deploy/supported-platforms/amazon_emr_serverless.md
@@ -5,7 +5,7 @@ can be used to manage and execute distributed computing workloads using Apache S
independently allocates the resources needed for each job and releases them at completion.

EMR Serverless is typically used for pipelines that are either fully or partially dependent on PySpark.
For other parts of the pipeline such as modeling, where a non-distributed computing approach may be suitable, EMR Serverless might not be needed.
For other parts of the pipeline such as modelling, where a non-distributed computing approach may be suitable, EMR Serverless might not be needed.


## Context
@@ -58,7 +58,7 @@ ENV PYENV_ROOT /usr/.pyenv
ENV PATH $PYENV_ROOT/shims:$PYENV_ROOT/bin:$PATH
ENV PYTHON_VERSION=3.9.16

# Install pyenv, initialize it, install desired Python version and set as global
# Install pyenv, initialise it, install desired Python version and set as global
RUN curl https://pyenv.run | bash
RUN eval "$(pyenv init -)"
RUN pyenv install ${PYTHON_VERSION} && pyenv global ${PYTHON_VERSION}
@@ -94,7 +94,7 @@ For more details, see the following resources:

- [Package a Kedro project](https://docs.kedro.org/en/stable/deploy/package_a_project/#package-a-kedro-project)
- [Run a packaged project](https://docs.kedro.org/en/stable/deploy/package_a_project/#run-a-packaged-project)
- [Customizing an EMR Serverless image](https://docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/application-custom-image.html)
- [Customising an EMR Serverless image](https://docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/application-custom-image.html)
- [Using custom images with EMR Serverless](https://docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/using-custom-images.html)

## Setup
@@ -154,7 +154,7 @@ for more details.
!!! note
On making changes to the custom image, and rebuilding and pushing to ECR, be sure to restart the EMR Serverless application before submitting a job if your application is **already started**. Otherwise, new changes may not be reflected in the job run.

This may be due to the fact that when the application has started, EMR Serverless keeps a pool of warm resources (also referred to as [pre-initialized capacity](https://docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/pre-init-capacity.html)) ready to run a job, and the nodes may have already used the previous version of the ECR image.
This may be due to the fact that when the application has started, EMR Serverless keeps a pool of warm resources (also referred to as [pre-initialised capacity](https://docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/pre-init-capacity.html)) ready to run a job, and the nodes may have already used the previous version of the ECR image.

See details on [how to run a Spark job on EMR Serverless](https://docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/jobs-spark.html).

2 changes: 1 addition & 1 deletion docs/deploy/supported-platforms/kubeflow.md
@@ -8,7 +8,7 @@ Here are the main reasons to use Kubeflow Pipelines:

- It is cloud-agnostic and can run on any Kubernetes cluster
- Kubeflow is tailored towards machine learning workflows for model deployment, experiment tracking, and hyperparameter tuning
- You can re-use components and pipelines to create E2E solutions
- You can reuse components and pipelines to create E2E solutions


## The `kedro-kubeflow` plugin
4 changes: 2 additions & 2 deletions docs/deploy/supported-platforms/prefect.md
@@ -118,7 +118,7 @@ def my_flow(pipeline_name: str, env: str):
metadata = bootstrap_project(project_path)
logger.info("Project name: %s", metadata.project_name)

logger.info("Initializing Kedro...")
logger.info("Initialising Kedro...")
execution_config = kedro_init(
pipeline_name=pipeline_name, project_path=project_path, env=env
)
@@ -142,7 +142,7 @@ def kedro_init(
env: str,
):
"""
Initializes a Kedro session and returns the DataCatalog and
Initialises a Kedro session and returns the DataCatalog and
KedroSession
"""
# bootstrap project within task / flow scope