# Move IP Adapter Scripts to research project #9960

**Status:** Merged (20 commits)

## Commits
- `3996b29` Move files to research-projects. (ParagEkbote)
- `14b3c1f` docs: add IP Adapter training instructions (AMohamedAakhil)
- `2b91a16` Delete venv (AMohamedAakhil)
- `886fc59` Update examples/ip_adapter/tutorial_train_sdxl.py (AMohamedAakhil)
- `a8968d1` Cherry-picked commits and re-moved files (ParagEkbote)
- `e84e1a9` Merge branch 'main' into Move-files (ParagEkbote)
- `85a7583` make style. (ParagEkbote)
- `39c3cea` Merge branch 'Move-files' of https://github.com/ParagEkbote/diffusers… (ParagEkbote)
- `13da9cf` Merge branch 'main' into Move-files (ParagEkbote)
- `bfdf57b` Update toctree and delete ip_adapter. (ParagEkbote)
- `9486a5e` Merge branch 'Move-files' of https://github.com/ParagEkbote/diffusers… (ParagEkbote)
- `6f2718e` Nit Fix (ParagEkbote)
- `a2194eb` Fix nit. (ParagEkbote)
- `4d22b0d` Fix nit. (ParagEkbote)
- `3f0bdbb` Merge branch 'main' into Move-files (sayakpaul)
- `9179302` Create training script for single GPU and set (ParagEkbote)
- `8231773` Add sample inference script and restore _toctree (ParagEkbote)
- `5d8e6d7` Restore toctree.yaml (ParagEkbote)
- `237b17a` fix spacing. (ParagEkbote)
- `6f64798` Update toctree.yaml (ParagEkbote)
  
    
## Files changed

The PR adds a README for the IP Adapter training example:

# IP Adapter Training Example
[IP Adapter](https://arxiv.org/abs/2308.06721) is an approach for enhancing text-to-image models such as Stable Diffusion by enabling them to generate images from image prompts rather than text prompts alone, leveraging the idea that "an image is worth a thousand words." By decoupling the cross-attention layers for text and image features, IP Adapter integrates image prompts into the generation process without extensive fine-tuning or large compute.
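For intuition, the "decoupled cross-attention" amounts to giving the image features their own key/value projections and adding that attention branch to the text branch. A minimal PyTorch sketch of the idea (function and tensor names here are illustrative, not the library's actual modules):

```python
import torch
import torch.nn.functional as F

def decoupled_cross_attention(q, text_k, text_v, image_k, image_v, scale=1.0):
    # Text branch: standard cross-attention against text features.
    text_out = F.scaled_dot_product_attention(q, text_k, text_v)
    # Image branch: the same queries attend to separately projected
    # image features (the part IP Adapter learns).
    image_out = F.scaled_dot_product_attention(q, image_k, image_v)
    # The two branches are summed; `scale` weights the image prompt.
    return text_out + scale * image_out

# Toy shapes: batch=1, heads=8, 64 query tokens, 77 text / 4 image tokens
q = torch.randn(1, 8, 64, 40)
tk, tv = torch.randn(2, 1, 8, 77, 40)
ik, iv = torch.randn(2, 1, 8, 4, 40)
out = decoupled_cross_attention(q, tk, tv, ik, iv, scale=0.6)
print(out.shape)  # torch.Size([1, 8, 64, 40])
```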
## Training locally with PyTorch

### Installing the dependencies

Before running the scripts, make sure to install the library's training dependencies:

**Important**

To make sure you can successfully run the latest versions of the example scripts, we highly recommend **installing from source** and keeping the installation up to date, since we update the example scripts frequently and install some example-specific requirements. To do this, execute the following steps in a new virtual environment:

```bash
git clone https://github.com/huggingface/diffusers
cd diffusers
pip install -e .
```

Then cd into the example folder and run:

```bash
pip install -r requirements.txt
```

And initialize an [🤗Accelerate](https://github.com/huggingface/accelerate/) environment with:

```bash
accelerate config
```

Or, for a default Accelerate configuration without answering questions about your environment:

```bash
accelerate config default
```

Or, if your environment doesn't support an interactive shell (e.g. a notebook):

```python
from accelerate.utils import write_basic_config
write_basic_config()
```

### Accelerate Launch Command Documentation

#### Description:
The `accelerate launch` command trains the model on multiple GPUs with mixed precision. It launches the training script `tutorial_train_ip-adapter.py` with the specified parameters and configuration.

#### Usage Example:
```bash
accelerate launch --num_processes 8 --multi_gpu --mixed_precision "fp16" \
  tutorial_train_ip-adapter.py \
  --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
  --image_encoder_path="{image_encoder_path}" \
  --data_json_file="{data.json}" \
  --data_root_path="{image_path}" \
  --mixed_precision="fp16" \
  --resolution=512 \
  --train_batch_size=8 \
  --dataloader_num_workers=4 \
  --learning_rate=1e-04 \
  --weight_decay=0.01 \
  --output_dir="{output_dir}" \
  --save_steps=10000
```

#### Parameters:
- `--num_processes`: Number of processes to launch for distributed training (8 in this example).
- `--multi_gpu`: Flag indicating the use of multiple GPUs for training.
- `--mixed_precision "fp16"`: Enables mixed precision training with 16-bit floating-point precision.
- `tutorial_train_ip-adapter.py`: Name of the training script to execute.
- `--pretrained_model_name_or_path`: Path or Hub identifier of the pretrained model.
- `--image_encoder_path`: Path to the CLIP image encoder.
- `--data_json_file`: Path to the training data in JSON format (see the sketch after this list).
- `--data_root_path`: Root path where the training images are located.
- `--resolution`: Resolution of input images (512x512 in this example).
- `--train_batch_size`: Batch size for training data (8 in this example).
- `--dataloader_num_workers`: Number of subprocesses for data loading (4 in this example).
- `--learning_rate`: Learning rate for training (1e-04 in this example).
- `--weight_decay`: Weight decay for regularization (0.01 in this example).
- `--output_dir`: Directory to save model checkpoints and predictions.
- `--save_steps`: Frequency of saving checkpoints during training (every 10000 steps in this example).
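The file passed via `--data_json_file` pairs each training image with its caption. A minimal sketch of producing such a file (the `image_file`/`text` key names are an assumption based on the upstream IP Adapter training scripts; check them against the script you actually run):

```python
import json

# Hypothetical layout: one record per image, with paths relative to
# --data_root_path. Key names are an assumption, not a verified schema.
records = [
    {"image_file": "images/0001.jpg", "text": "a corgi running on the beach"},
    {"image_file": "images/0002.jpg", "text": "an oil painting of a lighthouse"},
]

with open("data.json", "w") as f:
    json.dump(records, f, indent=2)
```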
### Inference

#### Description:
The inference code below loads a trained checkpoint and extracts the components related to image projection and the IP (image prompt) adapter. These components are then saved into a binary file for later use during inference.

#### Usage Example:
```python
import torch

# Load the trained model checkpoint
ckpt = "checkpoint-50000/pytorch_model.bin"
sd = torch.load(ckpt, map_location="cpu")

# Extract the image projection and IP adapter components,
# skipping the UNet weights
image_proj_sd = {}
ip_sd = {}
for k in sd:
    if k.startswith("unet"):
        pass
    elif k.startswith("image_proj_model"):
        image_proj_sd[k.replace("image_proj_model.", "")] = sd[k]
    elif k.startswith("adapter_modules"):
        ip_sd[k.replace("adapter_modules.", "")] = sd[k]

# Save the components into a binary file
torch.save({"image_proj": image_proj_sd, "ip_adapter": ip_sd}, "ip_adapter.bin")
```

#### Parameters:
- `ckpt`: Path to the trained model checkpoint file.
- `map_location="cpu"`: Loads the checkpoint onto the CPU.
- `image_proj_sd`: Dictionary holding the image projection components.
- `ip_sd`: Dictionary holding the IP adapter components.
- `"unet"`, `"image_proj_model"`, `"adapter_modules"`: Key prefixes identifying the model's components.
  
    
  
    
**New file: `requirements.txt`**
```
accelerate
torchvision
transformers>=4.25.1
ip_adapter
```
      
  
    
  
    
## Review comments

**Review comment:** We should not be changing this.

**Reply:** I have updated toctree.yml to reset the changes.