Commit 7ac9672 (merge of parents ae44801 + 97ece7d): "Best quality generation and more options"

12 files changed: +120 −71 lines

README.md (120 additions, 71 deletions)
Ten AI-powered lyrics generation nodes supporting various LLM providers:

### Ace-Step OpenAI Lyrics

Lyrics generation using OpenAI GPT models.

**Supported Models (December 2025):**
- `gpt-5.1` - Reasoning model (latest)
- `gpt-5.1-codex` - Coding-optimized
- `gpt-5` - High performance
- `gpt-5-pro` - Professional variant
- `gpt-4o` - Multimodal (recommended)
- `gpt-4o-mini` - Fast variant
- `gpt-4-turbo` - High performance
- `gpt-4` - Stable base
- `o3` - Reasoning model
- `o3-mini` - Compact reasoning
- `o1` - Advanced reasoning
- `o1-mini` - Compact advanced reasoning

**Category:** `JK AceStep Nodes/Lyrics`
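Inside ComfyUI the node handles the API call, but the underlying request is an ordinary chat completion. A minimal sketch with the OpenAI Python SDK (the prompt wording and the `[verse]`/`[chorus]` tag convention are illustrative assumptions, not the node's actual template):

```python
# Sketch of a lyrics request via the OpenAI Python SDK (assumed usage).
def build_lyrics_messages(theme: str, genre: str) -> list:
    """Chat prompt asking for section-tagged lyrics."""
    return [
        {"role": "system",
         "content": "You write song lyrics, tagging sections as [verse] and [chorus]."},
        {"role": "user", "content": f"Write {genre} lyrics about: {theme}"},
    ]

if __name__ == "__main__":
    # SDK imported lazily so the helper above has no dependencies.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=build_lyrics_messages("a night drive", "synthwave"),
        temperature=0.9,
    )
    print(resp.choices[0].message.content)
```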

### Ace-Step Claude Lyrics

Lyrics generation using Anthropic Claude models.

**Supported Models (December 2025):**
- `claude-opus-4.5` - Latest flagship (recommended)
- `claude-opus-4.1` - Previous flagship
- `claude-sonnet-4.5` - Latest balanced
- `claude-sonnet-4` - Previous balanced
- `claude-haiku-4.5` - Latest fast
- `claude-haiku-3.5` - Previous fast
- `claude-3-5-sonnet-20241022` - Snapshot variant
- `claude-3-5-haiku-20241022` - Snapshot variant
- `claude-3-opus-20240229` - Dated variant

**Category:** `JK AceStep Nodes/Lyrics`
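The Claude node uses the Anthropic Messages API under the hood. A minimal sketch (model ID and prompt wording are illustrative assumptions; check Anthropic's current model list):

```python
# Sketch of a lyrics request via the Anthropic SDK (assumed usage).
def build_lyrics_prompt(theme: str, structure: str = "verse-chorus-verse") -> str:
    """Single user prompt describing the desired song."""
    return (f"Write song lyrics about '{theme}' in a {structure} structure, "
            "tagging sections as [verse] and [chorus].")

if __name__ == "__main__":
    import anthropic  # lazy import; pip install anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY
    msg = client.messages.create(
        model="claude-sonnet-4-5",  # illustrative; verify against Anthropic's docs
        max_tokens=1024,            # required by the Messages API
        messages=[{"role": "user",
                   "content": build_lyrics_prompt("city rain at midnight")}],
    )
    print(msg.content[0].text)
```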

### Ace-Step Gemini Lyrics

Lyrics generation using Google Gemini API.

**Supported Models (December 2025):**
- `gemini-3-pro` - Latest pro model (recommended)
- `gemini-2.5-flash` - Fast with latest capabilities
- `gemini-2.5-flash-lite` - Ultra-fast variant
- `gemini-2.5-pro` - High quality
- `gemini-2.0-flash` - Previous generation
- `gemini-2.0-flash-lite` - Previous generation lite

**Category:** `JK AceStep Nodes/Lyrics`
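A minimal sketch of the same request against the Gemini API via the `google-generativeai` SDK (prompt wording is an illustrative assumption):

```python
# Sketch of a lyrics request via the google-generativeai SDK (assumed usage).
import os

def lyrics_instruction(theme: str, mood: str) -> str:
    """Gemini accepts a plain-text prompt, so one string is enough."""
    return (f"Write {mood} song lyrics about {theme}. "
            "Tag sections as [verse] and [chorus].")

if __name__ == "__main__":
    import google.generativeai as genai  # lazy import

    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
    model = genai.GenerativeModel("gemini-2.5-flash")
    print(model.generate_content(
        lyrics_instruction("first snow", "melancholic")).text)
```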

### Ace-Step Groq Lyrics

High-speed lyrics generation using Groq API.

**Supported Models (December 2025):**
- `llama-3.3-70b-versatile` - Meta Llama 3.3 70B (best quality)
- `llama-3.1-8b-instant` - Meta Llama 3.1 8B (fast)
- `llama-guard-4-12b` - Meta Guard model
- `deepseek-v3` - DeepSeek V3
- `mistral-small-3` - Mistral Small v3
- `gpt-oss-120b` - OpenAI OSS 120B
- `gpt-oss-20b` - OpenAI OSS 20B
- Plus additional production models

**Category:** `JK AceStep Nodes/Lyrics`
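Groq's SDK mirrors the OpenAI chat-completions interface, so the call shape is familiar. A sketch of the speed/quality trade-off between the two main production models (helper name is illustrative):

```python
# Sketch: choosing between Groq's fast and best-quality models (assumed usage).
def pick_groq_model(prefer_speed: bool) -> str:
    """Trade quality for latency using the two main production models."""
    return "llama-3.1-8b-instant" if prefer_speed else "llama-3.3-70b-versatile"

if __name__ == "__main__":
    from groq import Groq  # lazy import; pip install groq

    client = Groq()  # reads GROQ_API_KEY
    resp = client.chat.completions.create(
        model=pick_groq_model(prefer_speed=False),
        messages=[{"role": "user",
                   "content": "Write upbeat pop lyrics about summer, "
                              "with [verse]/[chorus] tags."}],
    )
    print(resp.choices[0].message.content)
```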

### Ace-Step Perplexity Lyrics

Lyrics generation using Perplexity Sonar models.

**Supported Models (December 2025):**
- `sonar` - Standard model
- `sonar-pro` - Professional variant
- `sonar-reasoning` - Reasoning-focused
- `sonar-reasoning-pro` - Advanced reasoning
- `sonar-deep-research` - Deep research variant

**Category:** `JK AceStep Nodes/Lyrics`
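Perplexity exposes an OpenAI-compatible endpoint, so the OpenAI SDK can be pointed at it. A sketch (prompt wording is an illustrative assumption):

```python
# Sketch: driving Perplexity Sonar through the OpenAI SDK (assumed usage).
import os

PPLX_BASE_URL = "https://api.perplexity.ai"

def sonar_messages(theme: str) -> list:
    """Single-turn chat request for tagged lyrics."""
    return [{"role": "user",
             "content": f"Write song lyrics about {theme} "
                        "with [verse]/[chorus] tags."}]

if __name__ == "__main__":
    from openai import OpenAI  # lazy import

    client = OpenAI(base_url=PPLX_BASE_URL,
                    api_key=os.environ["PERPLEXITY_API_KEY"])
    resp = client.chat.completions.create(
        model="sonar", messages=sonar_messages("old lighthouses"))
    print(resp.choices[0].message.content)
```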

### Ace-Step Cohere Lyrics

Lyrics generation using Cohere Command models.

**Supported Models (December 2025):**
- `command-a-03-2025` - Latest Command A
- `command-r7b-12-2024` - December 2024 variant
- `command-r-plus-08-2024` - R+ August 2024
- `command-r-08-2024` - R August 2024
- `command-a-translate` - Translation specialist
- `command-a-reasoning` - Reasoning-focused
- `command-a-vision` - Vision capabilities
- `aya-expanse-32b` - Aya Expanse 32B
- `aya-expanse-8b` - Aya Expanse 8B
- `aya-vision` - Aya with vision
- `aya-translate` - Aya translation specialist

**Category:** `JK AceStep Nodes/Lyrics`
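A sketch via the Cohere SDK's v2 chat interface (assumed usage; since Aya models are multilingual, the target language is made explicit in the prompt):

```python
# Sketch of a lyrics request via the Cohere SDK (assumed usage).
def cohere_chat_messages(theme: str, language: str = "English") -> list:
    """Chat request that pins the output language for multilingual models."""
    return [{"role": "user",
             "content": f"Write {language} song lyrics about {theme}, "
                        "tagging sections as [verse] and [chorus]."}]

if __name__ == "__main__":
    import cohere  # lazy import; pip install cohere

    co = cohere.ClientV2()  # reads the Cohere API key from the environment
    resp = co.chat(model="command-a-03-2025",
                   messages=cohere_chat_messages("desert highways"))
    print(resp.message.content[0].text)
```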

### Ace-Step Replicate Lyrics

Lyrics generation using Replicate API models.

**Supported Models (December 2025):**
- `meta/llama-3.1-405b-instruct` - 405B instruction-tuned
- `meta/llama-3.1-70b-instruct` - 70B instruction-tuned
- `meta/llama-3.1-8b-instruct` - 8B instruction-tuned
- … *(remaining entries unchanged in this commit; elided between diff hunks)*
### Ace-Step HuggingFace Lyrics

Lyrics generation using HuggingFace Inference API.

**Supported Models (December 2025):**
- `meta-llama/Llama-3.1-405B-Instruct` - Large instruction-tuned
- `meta-llama/Llama-3.3-70B-Instruct-Turbo` - Llama 3.3 70B turbo
- `meta-llama/Llama-3.1-70B-Instruct` - 70B instruction-tuned
- `mistralai/Mistral-Large` - Large Mistral variant
- `microsoft/Phi-4` - Phi-4 model
- `deepseek-ai/deepseek-v3` - DeepSeek V3
- `Qwen/Qwen2.5-72B-Instruct` - Qwen 2.5 72B
- `google/gemma-2-27b` - Gemma 2 27B
- `tiiuae/falcon-180b` - Falcon 180B

**Category:** `JK AceStep Nodes/Lyrics`
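A sketch via `huggingface_hub`'s `InferenceClient` chat interface (assumed usage; prompt wording and token budget are illustrative):

```python
# Sketch of a lyrics request via the HuggingFace Inference API (assumed usage).
import os

def hf_request(theme: str) -> dict:
    """Bundle the model choice and chat messages for one request."""
    return {
        "model": "meta-llama/Llama-3.1-70B-Instruct",
        "messages": [{"role": "user",
                      "content": f"Write song lyrics about {theme} "
                                 "with [verse]/[chorus] tags."}],
        "max_tokens": 512,
    }

if __name__ == "__main__":
    from huggingface_hub import InferenceClient  # lazy import

    client = InferenceClient(token=os.environ["HF_TOKEN"])
    out = client.chat_completion(**hf_request("abandoned factories"))
    print(out.choices[0].message.content)
```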

### Ace-Step Together AI Lyrics

Lyrics generation using Together AI serverless models.

**Supported Models (December 2025):**
- `meta-llama/Llama-3.3-70B-Instruct-Turbo` - Llama 3.3 70B turbo
- `meta-llama/Llama-3.1-405B-Instruct-Turbo` - Llama 3.1 405B turbo
- `mistralai/Mistral-Small-24B-Instruct-2501` - Mistral Small 24B
- `Qwen/Qwen2.5-72B-Instruct` - Qwen 2.5 72B
- `deepseek-ai/DeepSeek-V3` - DeepSeek V3
- `moonshotai/Kimi-K2-Instruct` - Kimi K2
- `GLM-4-Plus` - GLM 4 Plus
- `Nous-Hermes-3-70B` - Nous Hermes 3 70B
- Plus 100+ additional models available

**Category:** `JK AceStep Nodes/Lyrics`
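The Together SDK also mirrors OpenAI's interface. A sketch that folds a musical hint (tempo) into the lyric request (helper and prompt wording are illustrative assumptions):

```python
# Sketch of a lyrics request via the Together SDK (assumed usage).
def together_messages(theme: str, tempo_bpm: int) -> list:
    """Chat request carrying a tempo hint alongside the theme."""
    return [{"role": "user",
             "content": f"Write lyrics about {theme} suited to ~{tempo_bpm} BPM, "
                        "with [verse]/[chorus] tags."}]

if __name__ == "__main__":
    from together import Together  # lazy import; pip install together

    client = Together()  # reads TOGETHER_API_KEY
    resp = client.chat.completions.create(
        model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
        messages=together_messages("neon city nights", 120),
    )
    print(resp.choices[0].message.content)
```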

### Ace-Step Fireworks Lyrics

Lyrics generation using Fireworks AI models (100+ available).

**Supported Models (December 2025):**
- `deepseek-ai/deepseek-v3` - DeepSeek V3 (latest)
- `deepseek-ai/deepseek-r1` - DeepSeek R1 reasoning
- `Qwen/Qwen3-235B-A22B-Instruct` - Qwen 3 235B
- `Qwen/Qwen2.5-72B-Instruct-Turbo` - Qwen 2.5 72B turbo
- `meta-llama/Llama-4-Maverick-17B` - Llama 4 Maverick
- `meta-llama/Llama-4-Scout-17B` - Llama 4 Scout
- `meta-llama/Llama-3.3-70B-Instruct` - Llama 3.3 70B
- `meta-llama/Llama-3.1-405B-Instruct` - Llama 3.1 405B
- `mistralai/Mistral-Large-3-675B-Instruct` - Mistral Large 675B
- `mistralai/Mistral-Small-24B-Instruct-2501` - Mistral Small 24B
- `zai-org/GLM-4.6` - GLM 4.6
- `moonshotai/Kimi-K2` - Kimi K2
- `google/Gemma-3-27b` - Gemma 3 27B
- Plus 90+ additional models available

**Category:** `JK AceStep Nodes/Lyrics`
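Fireworks serves an OpenAI-compatible endpoint; deployed model IDs use an account-scoped path form. A sketch (the exact model ID below is an assumption; check the Fireworks model catalog):

```python
# Sketch: driving Fireworks through the OpenAI SDK (assumed usage).
import os

FIREWORKS_BASE_URL = "https://api.fireworks.ai/inference/v1"

def fireworks_model_path(short_name: str) -> str:
    """Expand a short model name to the account-scoped path Fireworks expects."""
    return f"accounts/fireworks/models/{short_name}"

if __name__ == "__main__":
    from openai import OpenAI  # lazy import

    client = OpenAI(base_url=FIREWORKS_BASE_URL,
                    api_key=os.environ["FIREWORKS_API_KEY"])
    resp = client.chat.completions.create(
        model=fireworks_model_path("llama-v3p3-70b-instruct"),  # illustrative ID
        messages=[{"role": "user",
                   "content": "Write folk lyrics about rivers, "
                              "with [verse]/[chorus] tags."}],
    )
    print(resp.choices[0].message.content)
```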
Select your preferred variant from any sampler dropdown (default: `jkass_quality`).

- **CFG:** 4.0-4.5
- **Anti-Autotune:** 0.25-0.35 (vocals), 0.0-0.15 (instruments)

### ✅ Extra settings for reducing 'AI'-sounding female vocals
- **Sampler:** `jkass_quality` (best quality) or `jkass_fast` (speed)
- **Frequency Damping:** 0.15-0.5 for female vocals to reduce metallic sizzle (0 = disabled)
- **Temporal Smoothing:** 0.02-0.12 to reduce pitch quantization and temporal discontinuities
- **Beat Stability:** 0.05-0.2 to keep rhythmic hits stable and avoid per-frame jitter
- **Anti-Autotune:** 0.25-0.35 (vocals), 0.0-0.15 (instruments)
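The recommendations above can be captured as a parameter bundle. A hypothetical sketch (the key names mirror this README's labels, not necessarily the node's real input names):

```python
# Hypothetical settings bundle for natural-sounding female vocals,
# mirroring the recommended ranges above (key names are illustrative).
NATURAL_FEMALE_VOCALS = {
    "sampler": "jkass_quality",
    "frequency_damping": 0.3,    # 0.15-0.5; 0 disables
    "temporal_smoothing": 0.06,  # 0.02-0.12
    "beat_stability": 0.1,       # 0.05-0.2
    "anti_autotune": 0.3,        # 0.25-0.35 for vocals
}

def within_recommended(settings: dict) -> bool:
    """Check each value against the README's recommended ranges."""
    ranges = {
        "frequency_damping": (0.15, 0.5),
        "temporal_smoothing": (0.02, 0.12),
        "beat_stability": (0.05, 0.2),
        "anti_autotune": (0.25, 0.35),
    }
    return all(lo <= settings[k] <= hi for k, (lo, hi) in ranges.items())
```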
---

## 🎯 Quality Check Feature

Automatically tests multiple step counts to find optimal settings for your prompt.

- **Word cutting/stuttering:** Use `jkass_quality` sampler, disable advanced optimizations
- **Metallic voice:** Increase `anti_autotune_strength` to 0.3-0.4
- **AI-sounding female voice:** Try the following sequence:
  1. Use `jkass_quality` with 80-120 steps, CFG 4.0-4.5, and APG enabled
  2. Set Anti-Autotune (0.25-0.35), Frequency Damping (0.15-0.4), and Temporal Smoothing (0.02-0.06)
  3. Use the Prompt Gen with `voice_style` -> `natural_female` and add 'breathy, micro pitch variation' to the extra prompt
  4. Decode using a high-quality VAE/vocoder (HiFi-GAN, or a validated VAE) for improved timbre
  5. If still metallic: apply a de-esser and a mild EQ cut at 7-12 kHz; add subtle formant correction and a breath overlay

  **Optional:** Use the `Ace-Step Post Process` node to apply quick de-essing (reduce 6-10 kHz energy), spectral smoothing, and a subtle breath overlay to further humanize the vocal.
- **Poor quality:** Increase steps (80-120), use CFG 4.0-4.5, enable APG, try `jkass_quality` sampler
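The EQ-cut idea from the troubleshooting list can be prototyped outside ComfyUI. A crude static sketch that attenuates the 7-12 kHz sibilance band (a real de-esser acts dynamically, only when sibilance is present; the function and defaults are illustrative):

```python
# Sketch: a static FFT-based band cut over the 7-12 kHz sibilance range.
import numpy as np

def band_cut(audio, sr, lo_hz=7000.0, hi_hz=12000.0, gain=0.5):
    """Scale the lo_hz-hi_hz band of a mono signal by `gain` (0.5 = -6 dB)."""
    spectrum = np.fft.rfft(audio)
    freqs = np.fft.rfftfreq(len(audio), d=1.0 / sr)
    band = (freqs >= lo_hz) & (freqs <= hi_hz)
    spectrum[band] *= gain
    return np.fft.irfft(spectrum, n=len(audio))

# Example: a 9 kHz sine (inside the band) is attenuated; 1 kHz passes intact.
sr = 44100
t = np.arange(sr) / sr
sibilant = np.sin(2 * np.pi * 9000 * t)
processed = band_cut(sibilant, sr)
```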
---

JK-AceStep-Nodes/

- **OpenAI** - gpt-5.1, gpt-5, gpt-4o, o3/o1, and more
- **Anthropic Claude** - Claude Opus 4.5, Sonnet 4.5, Haiku 4.5, plus dated snapshots
- **Google Gemini** - gemini-3-pro, gemini-2.5-flash/pro, gemini-2.0-flash
- **Groq** - Llama 3.3 70B, Llama 3.1 8B, Llama Guard 4, GPT-OSS (120B/20B), and Llama 4 preview models
- **Perplexity** - Sonar, Sonar Pro, Sonar Reasoning (with 128k context)
- **Cohere** - Command A/R+ (with reasoning & vision), Aya (multilingual)
- **Replicate** - Llama 3.1 (405B/70B/8B), Mistral Small/Nemo, Mixtral

---

## 🗣️ How to Use the Vocoder (ADaMoSHiFiGAN)

To enable audio conversion with the vocoder (for improved final audio quality):

1. **Obtain the vocoder files:**
   - `diffusion_pytorch_model.safetensors` (vocoder model)
   - `config.json` (vocoder configuration)

2. **Place both files in the folder `JK-AceStep-Nodes/vocoder/`.** The final paths should be:

   ```
   JK-AceStep-Nodes/vocoder/diffusion_pytorch_model.safetensors
   JK-AceStep-Nodes/vocoder/config.json
   ```

3. **Done!**
   - The system automatically detects these files when a node has the vocoder enabled.
   - If the files are not present, audio is generated without the vocoder.

> **Tip:** Always use the correct file pair (model + config) to avoid artifacts or loading errors.
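The detection step above amounts to a simple file-pair check. A sketch of how it could work (illustrative; not the extension's actual code):

```python
# Sketch: detecting the vocoder model + config pair (illustrative).
from pathlib import Path

VOCODER_FILES = ("diffusion_pytorch_model.safetensors", "config.json")

def vocoder_available(node_root) -> bool:
    """True only when the complete model + config pair is present."""
    vocoder_dir = Path(node_root) / "vocoder"
    return all((vocoder_dir / name).is_file() for name in VOCODER_FILES)
```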
---
## 📄 License

MIT License
Binary files changed (not shown): 1.95 KB, 53.7 KB, 39.3 KB, 3.6 KB, 10.7 KB, 7.22 KB, 31.9 KB, 5.1 KB, 7.32 KB.
