Generate your very own "Leichte Sprache" images by creating a dataset and fine-tuning SDXL. This repository walks you through dataset creation, fine-tuning, image generation, and evaluation. It consists of four directories:
- dataset-preparation: Scrapes "Leichte Sprache" images together with their descriptions and prepares the folder structure for fine-tuning. In total, four different dataset variants are created.
- image-generation: Applies the LoRAs to SDXL via diffusers and generates images using the scraped image descriptions as prompts.
- evaluation: Applies ImageReward and FID to the generated images.
- storage: All datasets and generated images are stored here, together with an example configuration.toml for LoRA fine-tuning. Feel free to change the hyperparameters.
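
For orientation, a LoRA config for kohya-ss sd-scripts typically contains hyperparameters like the following. This is an illustrative excerpt with example values, not the repository's actual settings; the real ones live in storage/basic-lora-config.toml:

```toml
# Illustrative excerpt of a kohya-ss sd-scripts LoRA config.
# Values are examples only, not the repository's actual settings.
pretrained_model_name_or_path = "stabilityai/stable-diffusion-xl-base-1.0"
train_batch_size = 1        # reduce if CUDA runs out of memory
learning_rate = 1e-4
network_dim = 32            # LoRA rank
network_alpha = 16
max_train_epochs = 10
```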
- Install the necessary packages
  ```shell
  pip install -r requirements.txt
  ```
- Create the basic dataset by following the notebook at dataset-preparation/create-dataset.ipynb
- Process the basic dataset and create different variations of it by following the notebook at dataset-preparation/process-dataset.ipynb
- (optional) You can find some suggested visualization steps in dataset-preparation/visualize-dataset.ipynb
- Fill in the missing absolute paths in the fine-tuning configuration file storage/basic-lora-config.toml. You can vary the hyperparameters (e.g. reduce the batch size if your CUDA device runs out of memory).
- Clone the fine-tuning scripts repository for Stable Diffusion
  ```shell
  cd ..
  git clone https://github.com/kohya-ss/sd-scripts.git
  ```
  and follow the instructions in its README.md to install the necessary requirements
- Start fine-tuning with
  ```shell
  python sdxl_train_network.py --config_file=<absolute-path-to>/basic-lora-config.toml
  ```
- Run generate_images in image-generation/generate-images.py with the base directory of one of the four dataset variants.
It will generate an image for each test image (<dataset-path>/test) and LoRA checkpoint (<dataset-path>/loras). The images are saved in <dataset-path>/output/<checkpoint-name>.
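
The directory layout that step works with can be sketched as follows. This is a hypothetical helper, not the repository's actual code; it only mirrors the pairing of checkpoints and test images described above:

```python
from pathlib import Path

def plan_outputs(dataset_path):
    """Pair every LoRA checkpoint in <dataset-path>/loras with every test
    image in <dataset-path>/test and return (checkpoint, image, output-path)
    triples, where the output path is <dataset-path>/output/<checkpoint-name>.
    Hypothetical helper mirroring the layout described above."""
    base = Path(dataset_path)
    plans = []
    for ckpt in sorted((base / "loras").glob("*.safetensors")):
        out_dir = base / "output" / ckpt.stem  # <checkpoint-name> = file stem
        for img in sorted((base / "test").iterdir()):
            plans.append((ckpt, img, out_dir / img.name))
    return plans
```

With one checkpoint and one test image, this yields a single planned output under output/<checkpoint-name>/.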
- Calculate the FID score by running calculate_fid in image-evaluation/frechet-inception-distance.py with the base directory of one of the four dataset variants.
Keep in mind that you need generated images in <dataset-path>/output/<checkpoint-name> and <dataset-path>/test-images-only/<checkpoint-name>.
- Calculate the ImageReward score by running calculate_image_reward in image-evaluation/image-reward.py with the base directory of one of the four dataset variants.
The same directory requirements apply: generated images must exist in <dataset-path>/output/<checkpoint-name> and <dataset-path>/test-images-only/<checkpoint-name>.
- (optional) Check the example visualization of ImageReward and FID in image-evaluation/visualize-evaluation.ipynb
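
For intuition on the FID metric: it measures the Fréchet distance between Gaussian statistics of Inception features extracted from real and generated images. In one dimension the formula reduces to the snippet below; this is an illustration of the metric only, not the repository's implementation, which operates on high-dimensional feature means and covariance matrices:

```python
import math

def fid_1d(mu1, var1, mu2, var2):
    """Frechet distance between two univariate Gaussians:
    (mu1 - mu2)^2 + var1 + var2 - 2*sqrt(var1*var2).
    The real FID applies the matrix analogue of this formula to
    Inception feature statistics of the two image sets."""
    return (mu1 - mu2) ** 2 + var1 + var2 - 2.0 * math.sqrt(var1 * var2)
```

Identical distributions give a distance of 0; lower FID means the generated images are statistically closer to the test images.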
This repository has been created as part of a bachelor's thesis titled 'Leichte Sprache und generative KI: Bilder als Unterstützung für Texte in Leichter Sprache.'
The dataset names in this repository follow a slightly different naming strategy than the thesis. Below is a mapping of the dataset names:
| Name in repository | Name in thesis |
|---|---|
| translated-description_random-split | RS_D |
| generated-description_random-split | RS_IC |
| translated-description_category-split | CS_D |
| generated-description_category-split | CS_IC |
The original output is available at storage/CS_D/output, storage/CS_IC/output, storage/RS_D/output and storage/RS_IC/output.