@@ -86,7 +86,7 @@ This can be done by passing an extra dictionary to the MLPipeline when it is cre
8686 'n_estimators': 100
8787 }
8888 }
89- pipeline = MLPipeline(primitives, init_params)
89+ pipeline = MLPipeline(primitives, init_params=init_params)
9090
9191 This dictionary must have as keys the names of the blocks that the arguments belong to, and
9292as values the dictionaries that contain the argument names and their values.
@@ -271,7 +271,7 @@ Like primitives, Pipelines can also be annotated and stored as dicts or JSON fil
271271the different arguments expected by the ``MLPipeline `` class, as well as the set hyperparameters
272272and tunable hyperparameters.
273273
274- Representing a Pipeline as a dict
274+ Representing a Pipeline as a dict
275275~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
276276
277277The dict representation of a Pipeline can be obtained directly from an ``MLPipeline`` instance,
@@ -344,6 +344,86 @@ that allows loading the pipeline directly from a JSON file:
344344
345345 pipeline = MLPipeline.load('pipeline.json')
346346
347+
348+ Intermediate Outputs and Partial Execution
349+ ------------------------------------------
350+
351+ Sometimes we might be interested in capturing an intermediate output within a
352+ pipeline execution in order to inspect it, for debugging purposes, or to reuse
353+ it later on in order to speed up a tuning process where the pipeline needs
354+ to be executed multiple times over the same data.
355+
356+ For this purpose, the ``fit`` and ``predict`` methods of an MLPipeline accept
357+ two special arguments:
358+
359+ output\_
360+ ~~~~~~~~
361+
362+ The ``output_`` argument indicates which block within the pipeline we want to take
363+ the output values from. Implicitly, this also indicates up to which block the
364+ pipeline needs to be executed within ``fit`` and ``predict`` before returning.
365+
366+ The ``output_`` argument is optional, and it can be ``None``, which is the default,
367+ an integer or a string.
368+
369+ It is interpreted as follows:
370+
371+ * If it is ``None`` (default), the ``fit`` method will return nothing and the
372+ ``predict`` method will return the output of the last block in the pipeline.
373+ * If an integer is given, it is interpreted as the block index, starting at 0,
374+ and the whole context after executing the specified block will be returned.
375+ In the case of ``fit``, this means that the context is returned after fitting
376+ the block and then producing its outputs on the same data.
377+ * If it is a string, it can be interpreted in three ways:
378+
379+ * **block name**: If the string matches a block name exactly, including
380+ its hash and counter number ``#n`` at the end, the whole context will be
381+ returned after that block is produced.
382+ * **variable name**: If the string does not match any block name and does
383+ not contain any dot character, ``'.'``, it will be considered a variable
384+ name. In this case, the indicated variable will be extracted from the
385+ context and returned after the last block has been produced.
386+ * **block name + variable name**: If the complete string does not match a
387+ block name but it contains at least one dot, ``'.'``, it will be split
388+ in two parts on the last dot. If the first part of the string matches a
389+ block name exactly, the second part of the string will be considered a
390+ variable name, assuming the format ``{block_name}.{variable_name}``, and
391+ the indicated variable will be extracted from the context and returned
392+ after the block has been produced. Otherwise, if the extracted
393+ ``block_name`` does not match a block name exactly, a ``ValueError``
394+ will be raised.
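
The rules above can be sketched as a small standalone function. This is only an
illustration of the documented behaviour, not the code used inside MLBlocks: the
function name ``resolve_output`` and its return format are hypothetical, and the
block names in the usage note are just examples.

```python
def resolve_output(output_, block_names):
    """Illustrative helper (hypothetical, not part of MLBlocks).

    Returns ``None`` for the default behaviour, or a tuple
    ``(block_name, variable_name)`` where ``block_name`` is the block after
    which execution stops and ``variable_name`` is the context variable to
    extract (``None`` means "return the whole context").
    """
    if output_ is None:
        # Default: fit returns nothing, predict returns the last block output.
        return None

    if isinstance(output_, int):
        # Integer: zero-based block index, whole context is returned.
        return block_names[output_], None

    if output_ in block_names:
        # Exact block name, including its ``#n`` counter.
        return output_, None

    if '.' not in output_:
        # Plain variable name: extracted after the last block is produced.
        return block_names[-1], output_

    # ``{block_name}.{variable_name}``: split on the last dot only, because
    # block names usually contain dots themselves.
    block_name, variable_name = output_.rsplit('.', 1)
    if block_name not in block_names:
        raise ValueError('Unknown block name: {}'.format(block_name))

    return block_name, variable_name
```

For instance, with the block names ``sklearn.preprocessing.StandardScaler#1`` and
``sklearn.ensemble.RandomForestClassifier#1``, the string
``sklearn.preprocessing.StandardScaler#1.X`` would resolve to the first block and
the variable ``X``, while the plain string ``X`` would resolve to the last block.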
395+
396+ start\_
397+ ~~~~~~~
398+
399+ The ``start_`` argument indicates which block within the pipeline we want to start
400+ the computation from when executing ``fit`` and ``predict``, allowing us
401+ to skip some of the initial blocks.
402+
403+ The ``start_`` argument is optional, and it can be ``None``, which is the default,
404+ an integer or a string.
405+
406+ It is interpreted as follows:
407+
408+ * If it is ``None``, the execution will start on the first block.
409+ * If it is an integer, it is interpreted as the block index.
410+ * If it is a string, it is expected to be the name of the block, including the counter
411+ number at the end.
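
As a rough sketch of these three cases, the ``start_`` value can be mapped to the
index of the first block to execute. The helper name ``resolve_start`` is
hypothetical, and raising ``ValueError`` for an unknown block name is an assumption
of this sketch rather than documented MLBlocks behaviour.

```python
def resolve_start(start_, block_names):
    """Illustrative helper (hypothetical, not part of MLBlocks).

    Returns the index of the first block that should be executed.
    """
    if start_ is None:
        # Default: start on the first block.
        return 0

    if isinstance(start_, int):
        # Integer: used directly as the block index.
        return start_

    # String: full block name, including the ``#n`` counter.
    # ASSUMPTION: list.index raises ValueError when the name is unknown.
    return block_names.index(start_)
```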
412+
413+ The ``start_`` argument is especially useful when combined with the ``output_`` argument,
414+ as together they allow us both to capture intermediate outputs for debugging purposes and
415+ to reuse intermediate states of the pipeline to accelerate tuning processes.
416+
417+ An example of this situation, where we want to reuse the output of the first block, could be::
418+
419+ context_0 = pipeline.fit(X_train, y_train, output_=0)
420+
421+ # Afterwards, within the tuning loop
422+ pipeline.fit(start_=1, **context_0)
423+ predictions = pipeline.predict(X_test)
424+ score = compute_score(y_test, predictions)
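
To make the idea of reusing an intermediate context more concrete, here is a
self-contained toy sketch that does not use MLBlocks at all. The ``run_blocks``
function and the ``scale`` and ``total`` blocks are hypothetical stand-ins; they
only mimic how integer ``start_`` and ``output_`` values select the range of
blocks to execute over a shared context dictionary.

```python
def run_blocks(blocks, context, start_=0, output_=None):
    """Toy stand-in for a pipeline run (hypothetical, not MLBlocks code).

    Each block is a ``(name, function)`` pair. The function receives the
    context dict and returns a dict of variables to merge back into it.
    """
    context = dict(context)
    for index, (name, func) in enumerate(blocks):
        if index < start_:
            # Skipped block: its outputs must already be in the context.
            continue
        context.update(func(context))
        if output_ == index:
            # Stop early and return the whole context.
            return context
    return context


calls = []  # records which blocks were actually executed


def scale(context):
    calls.append('scale')
    return {'X': [x / 10 for x in context['X']]}


def total(context):
    calls.append('total')
    return {'y': sum(context['X']) * context.get('factor', 1)}


blocks = [('scale#1', scale), ('total#1', total)]

# Run the first block once and keep the resulting context.
context_0 = run_blocks(blocks, {'X': [10, 20, 30]}, output_=0)

# Inside the loop only the second block runs, starting from the cached context.
results = [
    run_blocks(blocks, dict(context_0, factor=factor), start_=1)['y']
    for factor in (1, 2, 3)
]
```

In this toy run the ``scale`` block is executed once, while ``total`` is executed
once per loop iteration, which is the saving that the ``start_`` and ``output_``
combination is meant to provide.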
425+
426+
347427.. _API Reference: ../api_reference.html
348428.. _primitives: ../primitives.html
349429.. _mlblocks.MLPipeline: ../api_reference.html#mlblocks.MLPipeline