Tip
Please check out LavaSR; it has several major benefits over NovaSR:
- NovaSR: 16 kHz input only, ~52 kB, ~3000× realtime speed, low-to-medium quality
- LavaSR: supports any input rate from 8 to 48 kHz, ~5000× realtime speed, 50 MB, and significantly better quality (surpasses 6 GB models)
If you were using the old fast model, try switching to the new one: you'll get much higher fidelity, flexible input rates, and faster speeds.
This is the repository for NovaSR, a tiny 50 kB audio upsampling model that upscales muffled 16 kHz audio into clear, crisp 48 kHz audio at over 3500x realtime speed.
- Speed: Can reach 3600x realtime speed on a single A100 GPU.
- Quality: On par with models 5,000x larger.
- Size: Just 52 kB, several thousand times smaller than most comparable models.
- Enhancing models: NovaSR can considerably improve TTS model output quality at nearly zero computational cost.
- Real-time enhancement: NovaSR allows on-device enhancement of low-quality calls, audio, etc. while using almost no memory.
- Restoring datasets: NovaSR can enhance audio quality of any audio dataset.
Comparisons were run on an A100 GPU. A higher realtime factor means faster processing.
| Model | Speed (Real-Time) | Model Size |
|---|---|---|
| NovaSR | 3600x realtime | ~52 KB |
| FlowHigh | 20x realtime | ~450 MB |
| FlashSR | 14x realtime | ~1000 MB |
| AudioSR | 0.6x realtime | ~2000 MB |
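To make the realtime figures concrete: a realtime factor of N means N seconds of audio are processed per second of compute, so processing time is simply audio duration divided by N. The sketch below copies the speeds from the table above; the helper name `processing_seconds` is hypothetical and only illustrates the arithmetic.

```python
# Realtime factors copied from the comparison table (A100 GPU).
SPEEDUPS = {
    "NovaSR": 3600,
    "FlowHigh": 20,
    "FlashSR": 14,
    "AudioSR": 0.6,
}


def processing_seconds(audio_seconds: float, realtime_factor: float) -> float:
    """Hypothetical helper: compute time needed to upsample `audio_seconds` of audio."""
    if realtime_factor <= 0:
        raise ValueError("realtime_factor must be positive")
    # N x realtime -> N seconds of audio per second of compute.
    return audio_seconds / realtime_factor


if __name__ == "__main__":
    one_hour = 3600.0
    for name, rtf in SPEEDUPS.items():
        print(f"{name}: {processing_seconds(one_hour, rtf):.1f} s per hour of audio")
```

At 3600x realtime, one hour of audio takes about one second; at 0.6x (AudioSR), the same hour takes about 6000 s, roughly 1 hour 40 minutes.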
Please check the Hugging Face model page for a few examples.
You can try it on Hugging Face Spaces or locally.
Simple one-line installation:
pip install git+https://github.com/ysharma3501/NovaSR.git
Load model
from NovaSR import FastSR
upsampler = FastSR() ## downloads from hf
## On CPU, use this instead; it gives a 3-4x speedup.
# upsampler = FastSR(half=False)
Run model
from IPython.display import Audio, display
## replace audio_path.wav with your wav/mp3 file
lowres_audio = upsampler.load_audio('audio_path.wav')
## run inference (output is 48 kHz)
highres_audio = upsampler.infer(lowres_audio).cpu()
display(Audio(highres_audio, rate=48000))
Please check out the Kaggle notebook for further training the model on custom datasets: https://www.kaggle.com/code/yatharthsharma888/novasr-training
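For restoring a whole dataset, the `load_audio` / `infer` calls above can be looped over a folder. This is a minimal sketch, assuming the upsampler behaves like the `FastSR` object shown above; `enhance_folder` and the caller-supplied `save_fn(path, audio, rate)` are hypothetical names, not part of the NovaSR package.

```python
from pathlib import Path
from typing import Callable, List

TARGET_RATE = 48000  # NovaSR outputs 48 kHz audio, per this README


def enhance_folder(upsampler, in_dir, out_dir,
                   save_fn: Callable[[Path, object, int], None],
                   pattern: str = "*.wav") -> List[Path]:
    """Upsample every file in `in_dir` matching `pattern` and save it under `out_dir`.

    Assumes `upsampler` exposes load_audio(path) and infer(audio) -> object with .cpu(),
    as in the FastSR example. `save_fn` is a hypothetical writer supplied by the caller.
    """
    in_dir, out_dir = Path(in_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(in_dir.glob(pattern)):
        lowres = upsampler.load_audio(str(src))
        highres = upsampler.infer(lowres).cpu()
        dst = out_dir / src.name  # keep the original file name
        save_fn(dst, highres, TARGET_RATE)
        written.append(dst)
    return written
```

With the real model, `save_fn` could wrap `soundfile.write(str(path), audio.float().squeeze().numpy(), rate)`. The `.float().squeeze()` step assumes the output is a (possibly half-precision) PyTorch tensor with a leading channel dimension, which this README does not confirm. Change `pattern` to `"*.mp3"` for mp3 inputs.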
Q: How much data was this trained on?
A: Just 100 hours of data (mls_sidon along with VCTK).
Q: How is it so small?
A: It uses fewer than 10 tiny Conv1d layers together with Snake activations (as used in BigVGAN) to maximize quality while keeping the model small.
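For readers unfamiliar with Snake: it computes x + sin²(αx)/α with a learnable α per channel, adding a periodic component that suits audio waveforms. The sketch below is a NumPy illustration of that formula plus a Conv1d parameter count to show why a stack of narrow layers stays tiny. The layer sizes in `example_stack` are made up for illustration and are not NovaSR's actual configuration.

```python
import numpy as np


def snake(x: np.ndarray, alpha: np.ndarray, eps: float = 1e-9) -> np.ndarray:
    """Snake activation x + sin^2(alpha * x) / alpha, applied per channel.

    x has shape (batch, channels, time); alpha has one entry per channel.
    eps guards against division by zero, similar to BigVGAN's implementation.
    """
    a = np.asarray(alpha, dtype=x.dtype).reshape(1, -1, 1)
    return x + np.sin(a * x) ** 2 / (a + eps)


def conv1d_param_count(layers) -> int:
    """Weights + biases for a stack of Conv1d layers given as (in_ch, out_ch, kernel)."""
    return sum(cin * cout * k + cout for cin, cout, k in layers)


# Hypothetical stack for illustration only; NOT NovaSR's real layer sizes.
example_stack = [(1, 24, 7)] + [(24, 24, 7)] * 6 + [(24, 1, 7)]

if __name__ == "__main__":
    n = conv1d_param_count(example_stack)
    print(f"{n} params, ~{n * 2 / 1000:.1f} kB at fp16")
```

With these made-up sizes the stack has 24,697 parameters, about 49 kB at 2 bytes per parameter (ignoring Snake's per-channel α values), which shows how a handful of narrow Conv1d layers stays in the tens-of-kilobytes range.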
Q: Will benchmarks come?
A: Yes, I am still training it further and will benchmark it later.
If you found this helpful, repo stars and model likes would be appreciated. Thank you!
Email: yatharthsharma3501@gmail.com