Commit fb77429

Merge pull request #23 from IAHispano/docs-overhaul
docs: Complete overhaul of Astro Starlight documentation
2 parents 4cf411e + bfc21c2

File tree

18 files changed

+850
-1060
lines changed

Lines changed: 31 additions & 19 deletions
@@ -1,34 +1,46 @@
 ---
-title: Audio Analyzer
-description: Audio Analyzer is a tool designed to obtain detailed information about audio files.
+title: "Audio Analyzer"
+description: "Learn how to use the Audio Analyzer to get detailed information about your audio files."
 ---
 
-![Audio Analyzer Interface](/images/audio-analyzer.png)
+import { Aside, Steps } from '@astrojs/starlight/components';
 
-## On what kind of occasion can audio analyzer be useful?
+The **Audio Analyzer** is a powerful tool that provides detailed information about your audio files, including sample rate, frequency distribution, and more. This information is crucial for training high-quality voice models.
 
-If you want to perform a training session correctly, it is advisable to know the frequency (Sample Rate) of the audio that is being used. Currently applio is compatible and has pretraineds in `32k, 40k and 48k`, these values refer to the hertz rate at which the pretraineds are created to use (32000hz, 40000hz, 48000hz). This clearly means that you will have to use audio in the mentioned frequencies to have an adequate result, especially when you have clean and quality audio.
-- You can observe the audio frequency in a reliable software such as audacity, fl studio, [Spek](https://github.com/alexkay/spek/releases/download/v0.8.5/spek-0.8.5-beta.zip) etc, But if you need to have more precise details about it, use the tool.
+![The Audio Analyzer interface in Applio, showing the audio upload section.](/images/audio-analyzer.png)
 
-## Use Audio Analyzer Tool
+## Why is the Sample Rate Important?
 
-### Upload your Audio
-To proceed to use the Analyzer tool, go to the extra section, upload your audio, and click "get information about audio".
+Applio's pre-trained models are available in three sample rates: **32k**, **40k**, and **48k** (corresponding to 32,000 Hz, 40,000 Hz, and 48,000 Hz). For the best training results, the sample rate of your dataset should match the sample rate of the pre-trained model you are using.
 
-### Check the information given
+While you can check the sample rate in audio editors like Audacity, the Audio Analyzer provides a more detailed analysis of your audio's frequency content.
 
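The sample-rate check the new text recommends can also be scripted outside Applio. As a minimal sketch (not part of Applio; the function name is illustrative, and Python's standard-library `wave` module handles WAV files only):

```python
import wave

def get_sample_rate(path: str) -> int:
    """Return the sample rate (Hz) of a WAV file."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate()
```

A dataset recorded at 48,000 Hz should report 48000, matching the 48k pre-trained model.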
-When you get your audio you will see several information about it
+## How to Use the Audio Analyzer
 
-![Audio Analyzer Result](/images/audio-analyzer-result.png)
+<Steps>
+1. Navigate to the **Extras** tab in Applio.
+2. Upload your audio file using the **Upload Audio** box.
+3. Click the **Get Information About Audio** button.
+</Steps>
 
-## How these graphs can help
+Once the analysis is complete, you will see a detailed breakdown of your audio file, including a spectrogram and several spectral feature graphs.
 
-These three graphs provide valuable information about the audio you are about to use, allowing you to fine-tune your settings prior to training for optimal results, like the Spectrogram and Spectral Features, these provide crucial information. The spectrogram displays the full set of frequencies present in the audio, allowing you to identify unwanted sounds, such as background noise or unwanted frequencies.
+![The results of an audio analysis in Applio, showing the spectrogram and spectral feature graphs.](/images/audio-analyzer-result.png)
 
-This also applies to the Spectral Features, with the three data thresholds that are provided, a lot of information is shared about the audio, which helps to further examine its characteristics in low, mid and high frequencies, for example;
+## Understanding the Graphs
 
-- **Spectral Centroid:** This graph shows both low and high frequencies within the audio, _represented as mentioned_, in the graph the higher frequencies will be upwards, while the lower frequencies will be downwards.
-- **Spectral Bandwidth:** This can somewhat represent the "variety of things (in this case taking context of the general content of the audio)", normally this would not take on much importance except for convert something other than a voice.
-- **Spectral Rolloff:** Basically, the rolloff takes all the audio context of the above-mentioned graphs under a specific volume threshold (in this case of the audio).
+The graphs provided by the Audio Analyzer can help you identify issues with your audio and fine-tune your training settings for optimal results.
 
-Finally, you can also get the frequency using the audio analyzer, both for the spectrogram section as the Spectral Features, in the spectrogram you can observe the values and duplicate them, as with the Spectral Features, the numbers shown around the graph will help determine the frequency.
+### Spectrogram
+
+The spectrogram is a visual representation of the frequencies in your audio over time. It can help you identify unwanted noise, such as background hiss or electrical hum, which you can then remove using an audio editor.
+
+### Spectral Features
+
+The spectral feature graphs provide a more detailed look at the frequency content of your audio.
+
+- **Spectral Centroid:** This graph represents the "center of mass" of the spectrum. A higher spectral centroid indicates that the audio has more high-frequency content, while a lower spectral centroid indicates more low-frequency content. This can help you understand the overall brightness or darkness of the audio.
+- **Spectral Bandwidth:** This graph shows the range of frequencies in the audio. A wider bandwidth indicates a more complex sound with a wider range of frequencies, while a narrower bandwidth indicates a simpler sound.
+- **Spectral Rolloff:** This graph shows the frequency below which a certain percentage of the total spectral energy lies. It's another way to measure the "skewness" of the spectral distribution and can be useful for distinguishing between different types of sounds.
+
+By understanding these graphs, you can make more informed decisions about your audio processing and training settings, leading to better voice models.
Lines changed: 23 additions & 21 deletions
@@ -1,42 +1,44 @@
 ---
-title: Embedders
-description: Learn about embedders and how to use them in voice conversion
+title: "Understanding Embedders"
+description: "Learn what embedders are and how to use them effectively in your voice conversion projects."
 ---
 
 import { Aside, Steps } from '@astrojs/starlight/components';
 
-## What are embedders?
+## What is an Embedder?
 
-Embedders are neural network models that convert raw audio input into high-dimensional vector representations. These representations capture essential acoustic and linguistic features of the audio, making them crucial for various audio processing tasks, including voice conversion.
+An **embedder** is a crucial component in the voice conversion process. It's a neural network that analyzes an audio file and converts it into a set of numerical representations, called "embeddings." These embeddings capture the essential acoustic and linguistic features of the audio, such as the speaker's tone, pitch, and accent.
 
-## How to use embedders?
+Think of an embedder as a translator that turns complex audio waves into a simplified language that the voice conversion model can understand and work with.
 
-Embedders are used in two main stages of the voice conversion process:
+## How to Use Embedders in Applio
 
-- **Training:** Select the embedder in the extraction settings.
-- **Inference:** Choose the same embedder in the advanced settings.
+You'll interact with embedders at two key stages of the voice conversion process:
 
-<Aside type="caution">
-It is critical to use the same embedder for both training and inference. The embedder used to train the pretrained model must be consistent throughout the entire process.
+- **Training:** When you're training a new voice model, you'll need to select an embedder in the **Extraction Settings**.
+- **Inference:** When you're using a trained model to convert a voice, you must select the *same* embedder in the **Advanced Settings**.
+
+<Aside type="danger" title="Critical Information">
+It is absolutely essential to use the same embedder for both training and inference. Using different embedders will result in poor-quality output or errors.
 </Aside>
 
-## Where to find embedders?
+## Where to Find Embedders
 
-You can find a variety of embedders on [Hugging Face](https://huggingface.co/models?pipeline_tag=feature-extraction&sort=trending&search=Hubert). To narrow down your search:
+You can find a wide variety of pre-trained embedders on [Hugging Face](https://huggingface.co/models?pipeline_tag=feature-extraction&sort=trending&search=Hubert). Here's how to find them:
 
 <Steps>
-1. Visit the Hugging Face model hub.
-2. Apply the "Feature Extraction" filter.
-3. Search for specific embedder types (e.g., "HuBERT", "Contentvec").
-4. Sort by trending or other relevant metrics to find popular and well-maintained models.
+1. Go to the Hugging Face model hub.
+2. In the sidebar, filter by **Task > Feature Extraction**.
+3. Use the search bar to find specific embedder types, such as "HuBERT" or "ContentVec".
</Steps>
 
 <Aside type="note">
-When choosing an embedder, consider factors such as model size, supported languages, and community adoption.
+When choosing an embedder, consider factors like the model's size, the languages it was trained on, and its popularity within the community.
 </Aside>
 
-## Best practices
+## Best Practices for Using Embedders
 
-- Experiment with different embedders to find the best fit for your specific voice conversion task.
-- Keep track of which embedder you use for each model to ensure consistency.
-- Stay updated with the latest developments in audio embedders, as new models may offer improved performance.
+- **Consistency is Key:** Always use the same embedder for a given model, from training all the way through to inference.
+- **Keep Track:** If you're working with multiple models, keep a record of which embedder you used for each one.
+- **Experiment:** Don't be afraid to experiment with different embedders to see which one works best for your specific use case. Some embedders may be better suited for singing, while others excel at speech.
+- **Stay Updated:** The field of audio processing is constantly evolving. Keep an eye out for new and improved embedders that may offer better performance.
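The "audio in, vectors out" pipeline the embedder page describes can be illustrated with a toy sketch. This is emphatically not how HuBERT or ContentVec work internally (those are learned neural networks producing high-dimensional vectors); the framing and the per-frame feature vector here are invented for illustration only:

```python
import math

def frame_signal(samples, frame_size=400, hop=160):
    """Split audio into overlapping frames (25 ms windows, 10 ms hop at 16 kHz)."""
    return [samples[i:i + frame_size]
            for i in range(0, len(samples) - frame_size + 1, hop)]

def toy_embed(frame):
    """Map one frame to a tiny feature vector: [RMS energy, zero-crossing rate, peak]."""
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    zcr = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0) / len(frame)
    peak = max(abs(s) for s in frame)
    return [rms, zcr, peak]

def embed_audio(samples):
    """A toy 'embedder': audio in, one small feature vector per frame out."""
    return [toy_embed(f) for f in frame_signal(samples)]
```

A real embedder follows the same shape of pipeline but replaces the hand-picked features with hundreds of learned dimensions per frame, which is why swapping embedders between training and inference produces mismatched representations.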
Lines changed: 92 additions & 0 deletions
@@ -0,0 +1,92 @@
+---
+title: "Google Colab Guide"
+description: "Learn how to use Applio in the cloud with Google Colab."
+---
+
+import { Aside, Steps } from '@astrojs/starlight/components';
+
+Google Colab provides a convenient way to use Applio without needing a powerful local computer. However, it's important to be aware of the risks and limitations.
+
+<Aside type="danger" title="Important Notice">
+Launching graphical user interfaces (UIs) like Applio on Google Colab is against their Terms of Service. Doing so may result in limitations being placed on your Google account. If you understand and accept this risk, you may proceed.
+
+As a safer alternative, we recommend using the official [Applio No UI Colab Notebook](https://colab.research.google.com/github/iahispano/applio/blob/main/assets/Applio_NoUI.ipynb), which is designed to be used without a graphical interface.
+</Aside>
+
+## Getting Started with the Applio UI Colab
+
+If you choose to proceed with the UI version, here's how to get it running.
+
+<Steps>
+1. **Open the Colab Notebook:** Launch the [Applio UI Colab Notebook](https://colab.research.google.com/github/iahispano/applio/blob/main/assets/Applio.ipynb).
+2. **Install Applio:** Run the first cell, labeled "Install Applio," by clicking the play button. This will install Applio and all its dependencies.
+3. **Launch the Interface:** Run the second cell. This will launch the Applio interface and provide you with a URL to access it. We recommend using the `localtunnel` sharing method for a more stable connection.
+4. **Access the UI:** Open the provided URL. You will be prompted for a password, which is the IP address displayed in the Colab cell output.
+</Steps>
+
+![A screenshot showing the two main cells to run in the Applio Colab notebook.](/images/colab.png)
+
+## Training on Colab
+
+Training models on Colab requires a bit of extra setup to ensure you don't lose your progress.
+
+### Syncing with Google Drive
+
+We highly recommend syncing your Colab instance with Google Drive. This will save your trained models to a folder called `ApplioBackup` in your Google Drive and allow you to resume training from a previously saved model.
+
+To do this, run the **Sync with Google Drive** cell in the Colab notebook.
+
+![A screenshot of the "Sync with Google Drive" cell in the Applio Colab notebook.](/images/extra-colab.png)
+
+### Resuming Training
+
+To resume training a model that you've previously saved to Google Drive:
+
+<Steps>
+1. Run all the initial cells, including **Install Applio** and **Sync with Google Drive**.
+2. In the Applio UI, go to the **Train** tab.
+3. Enter the name of your model.
+4. Select the same sample rate you used previously.
+5. Load your custom pretrained model if you used one.
+6. Increase the number of epochs and click **Train** to continue training.
+</Steps>
+
+## Managing Models on Colab
+
+### Exporting Your Final Model
+
+Once your model is fully trained, you can export it to your Google Drive.
+
+<Steps>
+1. Go to the **Train** tab and click the **Export Model** sub-tab.
+2. Click the **Refresh** button.
+3. Select the `.pth` and `.index` files for your model.
+4. Click the **Upload** button. Your model will be saved to a folder named `ApplioExported` in your Google Drive.
+</Steps>
+
+## Keeping Colab Active
+
+Google Colab will automatically disconnect idle notebooks. To prevent this from happening during a long training session, you can run a small script in your browser's developer console.
+
+<Steps>
+1. Press `Ctrl + Shift + i` to open the developer tools.
+2. Go to the **Console** tab.
+3. Type `allow pasting` and press Enter.
+4. Paste the following code into the console and press Enter:
+```js
+function ClickConnect() {
+  var iconElement = document.getElementById("toggle-header-button");
+  if (iconElement) {
+    var clickEvent = new MouseEvent("click", {
+      bubbles: true,
+      cancelable: true,
+      view: window,
+    });
+    iconElement.dispatchEvent(clickEvent);
+  }
+}
+setInterval(ClickConnect, 60000);
+```
+</Steps>
+
+This script will simulate a click every minute, keeping your Colab session active.

0 commit comments
