Commit 5865b5e

refactor: split SDParams to SDCliParams/SDContextParams/SDGenerationParams (leejet#1032)
1 parent edf2cb3 commit 5865b5e

4 files changed: +1556 -1412 lines changed

examples/cli/README.md

Lines changed: 57 additions & 53 deletions
@@ -3,7 +3,21 @@
 ```
 usage: ./bin/sd [options]
 
-Options:
+CLI Options:
+-o, --output <string> path to write result image to (default: ./output.png)
+--preview-path <string> path to write preview image to (default: ./preview.png)
+--preview-interval <int> interval in denoising steps between consecutive updates of the image preview file (default is 1, meaning updating at
+every step)
+--canny apply canny preprocessor (edge detection)
+-v, --verbose print extra info
+--color colors the logging tags according to level
+--taesd-preview-only prevents usage of taesd for decoding the final image. (for use with --preview tae)
+--preview-noisy enables previewing noisy inputs of the models rather than the denoised outputs
+-M, --mode run mode, one of [img_gen, vid_gen, upscale, convert], default: img_gen
+--preview preview method. must be one of the following [none, proj, tae, vae] (default is none)
+-h, --help show this help message and exit
+
+Context Options:
 -m, --model <string> path to full model
 --clip_l <string> path to the clip-l text encoder
 --clip_g <string> path to the clip-g text encoder
@@ -20,39 +34,64 @@ Options:
 --control-net <string> path to control net model
 --embd-dir <string> embeddings directory
 --lora-model-dir <string> lora model directory
--i, --init-img <string> path to the init image
---end-img <string> path to the end image, required by flf2v
 --tensor-type-rules <string> weight type per tensor pattern (example: "^vae\.=f16,model\.=q8_0")
 --photo-maker <string> path to PHOTOMAKER model
---pm-id-images-dir <string> path to PHOTOMAKER input id images dir
---pm-id-embed-path <string> path to PHOTOMAKER v2 id embed
+--upscale-model <string> path to esrgan model.
+-t, --threads <int> number of threads to use during computation (default: -1). If threads <= 0, then threads will be set to the number of
+CPU physical cores
+--chroma-t5-mask-pad <int> t5 mask pad size of chroma
+--vae-tile-overlap <float> tile overlap for vae tiling, in fraction of tile size (default: 0.5)
+--flow-shift <float> shift value for Flow models like SD3.x or WAN (default: auto)
+--vae-tiling process vae in tiles to reduce memory usage
+--force-sdxl-vae-conv-scale force use of conv scale on sdxl vae
+--offload-to-cpu place the weights in RAM to save VRAM, and automatically load them into VRAM when needed
+--control-net-cpu keep controlnet in cpu (for low vram)
+--clip-on-cpu keep clip in cpu (for low vram)
+--vae-on-cpu keep vae in cpu (for low vram)
+--diffusion-fa use flash attention in the diffusion model
+--diffusion-conv-direct use ggml_conv2d_direct in the diffusion model
+--vae-conv-direct use ggml_conv2d_direct in the vae model
+--chroma-disable-dit-mask disable dit mask for chroma
+--chroma-enable-t5-mask enable t5 mask for chroma
+--type weight type (examples: f32, f16, q4_0, q4_1, q5_0, q5_1, q8_0, q2_K, q3_K, q4_K). If not specified, the default is the
+type of the weight file
+--rng RNG, one of [std_default, cuda, cpu], default: cuda(sd-webui), cpu(comfyui)
+--sampler-rng sampler RNG, one of [std_default, cuda, cpu]. If not specified, use --rng
+--prediction prediction type override, one of [eps, v, edm_v, sd3_flow, flux_flow, flux2_flow]
+--lora-apply-mode the way to apply LoRA, one of [auto, immediately, at_runtime], default is auto. In auto mode, if the model weights
+contain any quantized parameters, the at_runtime mode will be used; otherwise,
+immediately will be used.The immediately mode may have precision and
+compatibility issues with quantized parameters, but it usually offers faster inference
+speed and, in some cases, lower memory usage. The at_runtime mode, on the
+other hand, is exactly the opposite.
+--vae-tile-size tile size for vae tiling, format [X]x[Y] (default: 32x32)
+--vae-relative-tile-size relative tile size for vae tiling, format [X]x[Y], in fraction of image size if < 1, in number of tiles per dim if >=1
+(overrides --vae-tile-size)
+
+Generation Options:
+-p, --prompt <string> the prompt to render
+-n, --negative-prompt <string> the negative prompt (default: "")
+-i, --init-img <string> path to the init image
+--end-img <string> path to the end image, required by flf2v
 --mask <string> path to the mask image
 --control-image <string> path to control image, control net
 --control-video <string> path to control video frames, It must be a directory path. The video frames inside should be stored as images in
 lexicographical (character) order. For example, if the control video path is
 `frames`, the directory contain images such as 00.png, 01.png, ... etc.
--o, --output <string> path to write result image to (default: ./output.png)
--p, --prompt <string> the prompt to render
--n, --negative-prompt <string> the negative prompt (default: "")
---preview-path <string> path to write preview image to (default: ./preview.png)
---upscale-model <string> path to esrgan model.
--t, --threads <int> number of threads to use during computation (default: -1). If threads <= 0, then threads will be set to the number of
-CPU physical cores
---upscale-repeats <int> Run the ESRGAN upscaler this many times (default: 1)
+--pm-id-images-dir <string> path to PHOTOMAKER input id images dir
+--pm-id-embed-path <string> path to PHOTOMAKER v2 id embed
 -H, --height <int> image height, in pixel space (default: 512)
 -W, --width <int> image width, in pixel space (default: 512)
 --steps <int> number of sample steps (default: 20)
 --high-noise-steps <int> (high noise) number of sample steps (default: -1 = auto)
 --clip-skip <int> ignore last layers of CLIP network; 1 ignores none, 2 ignores one layer (default: -1). <= 0 represents unspecified,
 will be 1 for SD1.x, 2 for SD2.x
 -b, --batch-count <int> batch count
---chroma-t5-mask-pad <int> t5 mask pad size of chroma
 --video-frames <int> video frames (default: 1)
 --fps <int> fps (default: 24)
 --timestep-shift <int> shift timestep for NitroFusion models (default: 0). recommended N for NitroSD-Realism around 250 and 500 for
 NitroSD-Vibrant
---preview-interval <int> interval in denoising steps between consecutive updates of the image preview file (default is 1, meaning updating at
-every step)
+--upscale-repeats <int> Run the ESRGAN upscaler this many times (default: 1)
 --cfg-scale <float> unconditional guidance scale: (default: 7.0)
 --img-cfg-scale <float> image guidance scale for inpaint or instruct-pix2pix models: (default: same as --cfg-scale)
 --guidance <float> distilled guidance scale for models with guidance input (default: 3.5)
@@ -72,53 +111,18 @@ Options:
 --pm-style-strength <float>
 --control-strength <float> strength to apply Control Net (default: 0.9). 1.0 corresponds to full destruction of information in init image
 --moe-boundary <float> timestep boundary for Wan2.2 MoE model. (default: 0.875). Only enabled if `--high-noise-steps` is set to -1
---flow-shift <float> shift value for Flow models like SD3.x or WAN (default: auto)
 --vace-strength <float> wan vace strength
---vae-tile-overlap <float> tile overlap for vae tiling, in fraction of tile size (default: 0.5)
---vae-tiling process vae in tiles to reduce memory usage
---force-sdxl-vae-conv-scale force use of conv scale on sdxl vae
---offload-to-cpu place the weights in RAM to save VRAM, and automatically load them into VRAM when needed
---control-net-cpu keep controlnet in cpu (for low vram)
---clip-on-cpu keep clip in cpu (for low vram)
---vae-on-cpu keep vae in cpu (for low vram)
---diffusion-fa use flash attention in the diffusion model
---diffusion-conv-direct use ggml_conv2d_direct in the diffusion model
---vae-conv-direct use ggml_conv2d_direct in the vae model
---canny apply canny preprocessor (edge detection)
--v, --verbose print extra info
---color colors the logging tags according to level
---chroma-disable-dit-mask disable dit mask for chroma
---chroma-enable-t5-mask enable t5 mask for chroma
 --increase-ref-index automatically increase the indices of references images based on the order they are listed (starting with 1).
 --disable-auto-resize-ref-image disable auto resize of ref images
---taesd-preview-only prevents usage of taesd for decoding the final image. (for use with --preview tae)
---preview-noisy enables previewing noisy inputs of the models rather than the denoised outputs
--M, --mode run mode, one of [img_gen, vid_gen, upscale, convert], default: img_gen
---type weight type (examples: f32, f16, q4_0, q4_1, q5_0, q5_1, q8_0, q2_K, q3_K, q4_K). If not specified, the default is the
-type of the weight file
---rng RNG, one of [std_default, cuda, cpu], default: cuda(sd-webui), cpu(comfyui)
---sampler-rng sampler RNG, one of [std_default, cuda, cpu]. If not specified, use --rng
 -s, --seed RNG seed (default: 42, use random seed for < 0)
 --sampling-method sampling method, one of [euler, euler_a, heun, dpm2, dpm++2s_a, dpm++2m, dpm++2mv2, ipndm, ipndm_v, lcm, ddim_trailing,
 tcd] (default: euler for Flux/SD3/Wan, euler_a otherwise)
---prediction prediction type override, one of [eps, v, edm_v, sd3_flow, flux_flow, flux2_flow]
---lora-apply-mode the way to apply LoRA, one of [auto, immediately, at_runtime], default is auto. In auto mode, if the model weights
-contain any quantized parameters, the at_runtime mode will be used; otherwise,
-immediately will be used.The immediately mode may have precision and
-compatibility issues with quantized parameters, but it usually offers faster inference
-speed and, in some cases, lower memory usage. The at_runtime mode, on the
-other hand, is exactly the opposite.
+--high-noise-sampling-method (high noise) sampling method, one of [euler, euler_a, heun, dpm2, dpm++2s_a, dpm++2m, dpm++2mv2, ipndm, ipndm_v, lcm,
+ddim_trailing, tcd] default: euler for Flux/SD3/Wan, euler_a otherwise
 --scheduler denoiser sigma scheduler, one of [discrete, karras, exponential, ays, gits, smoothstep, sgm_uniform, simple, lcm],
 default: discrete
 --skip-layers layers to skip for SLG steps (default: [7,8,9])
---high-noise-sampling-method (high noise) sampling method, one of [euler, euler_a, heun, dpm2, dpm++2s_a, dpm++2m, dpm++2mv2, ipndm, ipndm_v, lcm,
-ddim_trailing, tcd] default: euler for Flux/SD3/Wan, euler_a otherwise
 --high-noise-skip-layers (high noise) layers to skip for SLG steps (default: [7,8,9])
 -r, --ref-image reference image for Flux Kontext models (can be used multiple times)
--h, --help show this help message and exit
---vae-tile-size tile size for vae tiling, format [X]x[Y] (default: 32x32)
---vae-relative-tile-size relative tile size for vae tiling, format [X]x[Y], in fraction of image size if < 1, in number of tiles per dim if >=1
-(overrides --vae-tile-size)
---preview preview method. must be one of the following [none, proj, tae, vae] (default is none)
 --easycache enable EasyCache for DiT models with optional "threshold,start_percent,end_percent" (default: 0.2,0.15,0.95)
 ```
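
The regrouped help text mirrors the split named in the commit title. As a rough illustration only, the three groups could be modelled as below. The struct names come from the commit title and the defaults from the help text above; every field name is hypothetical, only a handful of options are shown, and the assumption that boolean flags default to off is mine. The actual definitions in the repository may look quite different.

```cpp
#include <cstdint>
#include <string>

// Hypothetical sketch: field names are illustrative, not taken from the repository.

// "CLI Options": where results go and how the tool itself behaves.
struct SDCliParams {
    std::string output_path      = "./output.png";   // -o, --output
    std::string preview_path     = "./preview.png";  // --preview-path
    int         preview_interval = 1;                // --preview-interval
    std::string mode             = "img_gen";        // -M, --mode
    bool        verbose          = false;            // -v, --verbose (assumed off by default)
};

// "Context Options": model loading and backend configuration.
struct SDContextParams {
    std::string model_path;                          // -m, --model
    int         n_threads        = -1;               // -t, --threads (<= 0 means physical core count)
    float       vae_tile_overlap = 0.5f;             // --vae-tile-overlap
    bool        vae_tiling       = false;            // --vae-tiling (assumed off by default)
    bool        offload_to_cpu   = false;            // --offload-to-cpu (assumed off by default)
};

// "Generation Options": per-run inputs and sampling settings.
struct SDGenerationParams {
    std::string  prompt;                             // -p, --prompt
    std::string  negative_prompt;                    // -n, --negative-prompt (default: "")
    int          width     = 512;                    // -W, --width
    int          height    = 512;                    // -H, --height
    int          steps     = 20;                     // --steps
    float        cfg_scale = 7.0f;                   // --cfg-scale
    std::int64_t seed      = 42;                     // -s, --seed (< 0 means random)
};
```

Grouping the options this way keeps settings that only matter at model load time apart from settings that can change between generations, which matches how the help text now lists `--threads` and `--vae-tiling` under Context Options and `--steps` and `--seed` under Generation Options.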
