[Modular] Add Custom Blocks guide to doc #12339
**Open**: DN6 wants to merge 13 commits into `main` from `custom-blocks-guide`
+335 −0
Commits (13):

- `e3f111a` update
- `c67dda4` update
- `ed3f885` Update docs/source/en/modular_diffusers/custom_blocks.md
- `830603e` Update docs/source/en/modular_diffusers/custom_blocks.md
- `180c9ea` Update docs/source/en/_toctree.yml
- `e78aa54` Update docs/source/en/modular_diffusers/custom_blocks.md
- `ddaf986` Apply suggestion from @stevhliu
- `df67d52` Apply suggestion from @stevhliu
- `c194bf1` update
- `a4815ab` update
- `c918079` update
- `28d3856` Apply suggestion from @stevhliu
- `331a7a1` Apply suggestion from @stevhliu
**docs/source/en/modular_diffusers/custom_blocks.md** (new file):
<!--Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Building custom blocks

[ModularPipelineBlocks](./pipeline_block) are the fundamental building blocks of a [`ModularPipeline`]. As long as a block defines the appropriate inputs, outputs, and computation logic, you can customize it to create your own custom blocks.

This guide shows you how to create and use a custom block.

The project should be structured as shown below. The custom block code lives in `block.py`, and the `modular_config.json` file contains the metadata needed to load the block.

```shell
.
├── block.py
└── modular_config.json
```
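
You don't need to write `modular_config.json` by hand; it is generated when you share the block with the CLI command shown later in this guide. As a purely hypothetical illustration of the kind of metadata it holds (the actual schema may differ), it records which class in `block.py` implements the block:

```json
{
  "_class_name": "Florence2ImageAnnotatorBlock",
  "auto_map": {
    "ModularPipelineBlocks": "block.Florence2ImageAnnotatorBlock"
  }
}
```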

This example creates a custom block that uses the [Florence 2](https://huggingface.co/docs/transformers/model_doc/florence2) model to process an input image and generate a mask for inpainting. It accepts several inputs, such as the image, annotation task type, annotation prompt, and output type, and produces a mask image. The block is kept as a single snippet so it is easy to copy and use as a template.

```py
from typing import List, Union

import numpy as np
import torch
from PIL import Image, ImageDraw
from transformers import AutoProcessor, Florence2ForConditionalGeneration

from diffusers.modular_pipelines import (
    ComponentSpec,
    InputParam,
    ModularPipelineBlocks,
    OutputParam,
    PipelineState,
)


class Florence2ImageAnnotatorBlock(ModularPipelineBlocks):
    # Models and processors the block needs. The pipeline loads them from the
    # given repos and exposes them through `components`.
    @property
    def expected_components(self):
        return [
            ComponentSpec(
                name="image_annotator",
                type_hint=Florence2ForConditionalGeneration,
                repo="florence-community/Florence-2-base-ft",
            ),
            ComponentSpec(
                name="image_annotator_processor",
                type_hint=AutoProcessor,
                repo="florence-community/Florence-2-base-ft",
            ),
        ]

    # User-facing inputs the block reads from the shared pipeline state.
    @property
    def inputs(self) -> List[InputParam]:
        return [
            InputParam(
                "image",
                type_hint=Union[Image.Image, List[Image.Image]],
                required=True,
                description="Image(s) to annotate",
            ),
            InputParam(
                "annotation_task",
                type_hint=Union[str, List[str]],
                required=True,
                default="<REFERRING_EXPRESSION_SEGMENTATION>",
                description="""Annotation task to perform on the image.
                Supported tasks:

                <OD>
                <REFERRING_EXPRESSION_SEGMENTATION>
                <CAPTION>
                <DETAILED_CAPTION>
                <MORE_DETAILED_CAPTION>
                <DENSE_REGION_CAPTION>
                <CAPTION_TO_PHRASE_GROUNDING>
                <OPEN_VOCABULARY_DETECTION>
                """,
            ),
            InputParam(
                "annotation_prompt",
                type_hint=Union[str, List[str]],
                required=True,
                description="""Annotation prompt to provide more context to the task.
                Can be used to detect or segment out specific elements in the image.
                """,
            ),
            InputParam(
                "annotation_output_type",
                type_hint=str,
                required=True,
                default="mask_image",
                description="""Output type from annotation predictions. Available options are
                mask_image:
                    - black and white mask image for the given image based on the task type
                mask_overlay:
                    - mask overlaid on the original image
                bounding_box:
                    - bounding boxes drawn on the original image
                """,
            ),
            InputParam(
                "annotation_overlay",
                type_hint=bool,
                required=True,
                default=False,
                description="Whether to overlay the annotation on the original image",
            ),
        ]

    # Values the block writes back to the pipeline state for downstream blocks.
    @property
    def intermediate_outputs(self) -> List[OutputParam]:
        return [
            OutputParam(
                "mask_image",
                type_hint=Image.Image,
                description="Inpainting mask for the input image(s)",
            ),
            OutputParam(
                "annotations",
                type_hint=dict,
                description="Annotation predictions for the input image(s)",
            ),
            OutputParam(
                "image",
                type_hint=Image.Image,
                description="Annotated input image(s)",
            ),
        ]

    def get_annotations(self, components, images, prompts, task):
        # Florence 2 expects the task token to be prepended to the text prompt
        task_prompts = [task + prompt for prompt in prompts]

        inputs = components.image_annotator_processor(
            text=task_prompts, images=images, return_tensors="pt"
        ).to(components.image_annotator.device, components.image_annotator.dtype)

        generated_ids = components.image_annotator.generate(
            input_ids=inputs["input_ids"],
            pixel_values=inputs["pixel_values"],
            max_new_tokens=1024,
            early_stopping=False,
            do_sample=False,
            num_beams=3,
        )
        annotations = components.image_annotator_processor.batch_decode(
            generated_ids, skip_special_tokens=False
        )
        outputs = []
        for image, annotation in zip(images, annotations):
            outputs.append(
                components.image_annotator_processor.post_process_generation(
                    annotation, task=task, image_size=(image.width, image.height)
                )
            )
        return outputs

    def prepare_mask(self, images, annotations, overlay=False, fill="white"):
        # Draw the predicted polygons or bounding boxes either on a blank canvas
        # (a standard inpainting mask) or on top of the original image
        masks = []
        for image, annotation in zip(images, annotations):
            mask_image = image.copy() if overlay else Image.new("L", image.size, 0)
            draw = ImageDraw.Draw(mask_image)

            for _, _annotation in annotation.items():
                if "polygons" in _annotation:
                    for polygon in _annotation["polygons"]:
                        polygon = np.array(polygon).reshape(-1, 2)
                        if len(polygon) < 3:
                            continue
                        polygon = polygon.reshape(-1).tolist()
                        draw.polygon(polygon, fill=fill)

                elif "bbox" in _annotation:
                    bbox = _annotation["bbox"]
                    draw.rectangle(bbox, fill=fill)

            masks.append(mask_image)

        return masks

    def prepare_bounding_boxes(self, images, annotations):
        outputs = []
        for image, annotation in zip(images, annotations):
            image_copy = image.copy()
            draw = ImageDraw.Draw(image_copy)
            for _, _annotation in annotation.items():
                bbox = _annotation["bbox"]
                label = _annotation["label"]

                draw.rectangle(bbox, outline="red", width=3)
                draw.text((bbox[0], bbox[1] - 20), label, fill="red")

            outputs.append(image_copy)

        return outputs

    def prepare_inputs(self, images, prompts):
        prompts = prompts or ""

        if isinstance(images, Image.Image):
            images = [images]
        if isinstance(prompts, str):
            prompts = [prompts]

        if len(images) != len(prompts):
            raise ValueError("Number of images and annotation prompts must match.")

        return images, prompts

    @torch.no_grad()
    def __call__(self, components, state: PipelineState) -> PipelineState:
        # Read this block's inputs from the shared pipeline state
        block_state = self.get_block_state(state)
        images, annotation_task_prompt = self.prepare_inputs(
            block_state.image, block_state.annotation_prompt
        )
        task = block_state.annotation_task

        annotations = self.get_annotations(
            components, images, annotation_task_prompt, task
        )
        block_state.annotations = annotations
        if block_state.annotation_output_type == "mask_image":
            block_state.mask_image = self.prepare_mask(images, annotations)
        else:
            block_state.mask_image = None

        if block_state.annotation_output_type == "mask_overlay":
            block_state.image = self.prepare_mask(images, annotations, overlay=True)

        elif block_state.annotation_output_type == "bounding_box":
            block_state.image = self.prepare_bounding_boxes(images, annotations)

        # Write the results back so downstream blocks can consume them
        self.set_block_state(state, block_state)

        return components, state
```
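
Before sharing the block, it's worth sanity-checking its interface. Below is a minimal sketch, assuming the usual `ModularPipelineBlocks` introspection helpers apply here (printing a block summarizes its components, inputs, and outputs, and the `doc` property renders the docstring generated from the specs):

```py
from block import Florence2ImageAnnotatorBlock

block = Florence2ImageAnnotatorBlock()

# Summary of the block's expected components, inputs, and outputs
print(block)

# Docstring auto-generated from the input and output specs
print(block.doc)
```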

Save the custom block to the Hub, either from the CLI or with the [`push_to_hub`] method, so you can easily share and reuse it.

<hfoptions id="share">
<hfoption id="hf CLI">

```shell
# In the folder with the `block.py` file, run:
diffusers-cli custom_block
```

This generates the `modular_config.json` metadata for the block. Then upload the block to the Hub:

```shell
hf upload <your repo id> . .
```

</hfoption>
<hfoption id="push_to_hub">

```py
from block import Florence2ImageAnnotatorBlock

block = Florence2ImageAnnotatorBlock()
block.push_to_hub("<your repo id>")
```

</hfoption>
</hfoptions>

Load the custom block with [`~ModularPipelineBlocks.from_pretrained`] and set `trust_remote_code=True`.

```py
import torch
from diffusers.modular_pipelines import ModularPipelineBlocks, SequentialPipelineBlocks
from diffusers.modular_pipelines.stable_diffusion_xl import INPAINT_BLOCKS
from diffusers.utils import load_image

# Fetch the Florence 2 image annotator block that will create our mask
image_annotator_block = ModularPipelineBlocks.from_pretrained(
    "diffusers/florence-2-custom-block", trust_remote_code=True
)

my_blocks = INPAINT_BLOCKS.copy()
# Insert the annotation block before the image encoding step
my_blocks.insert("image_annotator", image_annotator_block, 1)

# Create our initial set of inpainting blocks
blocks = SequentialPipelineBlocks.from_blocks_dict(my_blocks)

repo_id = "diffusers/modular-stable-diffusion-xl-base-1.0"
pipe = blocks.init_pipeline(repo_id)
pipe.load_components(torch_dtype=torch.float16, device_map="cuda", trust_remote_code=True)

image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg?download=true")
image = image.resize((1024, 1024))

prompt = ["A red car"]
annotation_task = "<REFERRING_EXPRESSION_SEGMENTATION>"
annotation_prompt = ["the car"]

output = pipe(
    prompt=prompt,
    image=image,
    annotation_task=annotation_task,
    annotation_prompt=annotation_prompt,
    annotation_output_type="mask_image",
    num_inference_steps=35,
    guidance_scale=7.5,
    strength=0.95,
    output="images"
)
output[0].save("florence-inpainting.png")
```
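
Because `mask_image` is one of the block's intermediate outputs, you can also ask the pipeline for it directly to inspect the mask the annotator produced. Continuing from the example above, this is a sketch that assumes the `output` argument accepts any state value by name, just as it accepts `images`:

```py
# Hypothetical: request the intermediate `mask_image` instead of the final images
mask = pipe(
    prompt=prompt,
    image=image,
    annotation_task=annotation_task,
    annotation_prompt=annotation_prompt,
    annotation_output_type="mask_image",
    num_inference_steps=35,
    guidance_scale=7.5,
    strength=0.95,
    output="mask_image",
)
mask[0].save("florence-mask.png")
```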

## Editing Custom Blocks

By default, custom blocks are saved in your cache directory. To download and edit a custom block, use the `local_dir` argument to save the block to a specific folder instead.

```py
import torch
from diffusers.modular_pipelines import ModularPipelineBlocks, SequentialPipelineBlocks
from diffusers.modular_pipelines.stable_diffusion_xl import INPAINT_BLOCKS
from diffusers.utils import load_image

# Fetch the Florence 2 image annotator block that will create our mask
image_annotator_block = ModularPipelineBlocks.from_pretrained(
    "diffusers/florence-2-custom-block", trust_remote_code=True, local_dir="/my-local-folder"
)
```

Any changes made to the block files in this folder will be reflected when you load the block again.
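
To pick up your edits, point [`~ModularPipelineBlocks.from_pretrained`] at the local folder. A minimal sketch, assuming local directory paths are accepted in place of a Hub repo id, as is usual for `from_pretrained` methods:

```py
from diffusers.modular_pipelines import ModularPipelineBlocks

# Load the edited block from the local folder instead of the Hub
image_annotator_block = ModularPipelineBlocks.from_pretrained(
    "/my-local-folder", trust_remote_code=True
)
```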
---

**Review discussion**

It'd be nice to break up this big code example into several smaller chunks that explain what's happening, which would also nicely mirror https://huggingface.co/docs/diffusers/main/en/modular_diffusers/pipeline_block#inputs-and-outputs

Would explanations using inline comments in the code work? The reason I wanted to keep it as a single block was to make it easier to just copy and use as a template.

Oh I see! In that case, feel free to keep it as a single block, but I would at least still add some text before it that very briefly explains what's going on. Maybe even something like the below would be fine, and you won't even need to add inline comments :)

> The example below creates a custom block with the Florence 2 model. It accepts several inputs like image, annotation task type, prompt, and output type, and outputs a mask image.