ysharma3501/NovaSR

NovaSR: Pushing the Limits of Extreme Efficiency in Audio Super-Resolution

Tip

Please check out LavaSR; it has several major benefits over NovaSR:

  • NovaSR: 16 kHz input only, ~52 KB, ~3000× realtime speed, low-to-medium quality
  • LavaSR: supports any input rate from 8 to 48 kHz, ~5000× realtime speed, 50 MB, and significantly better quality (surpasses 6 GB models)

If you were using the old fast model, try switching to the new one. You'll get much higher fidelity, flexible input rates, and faster speeds.

Hugging Face Model   Hugging Face Space   Kaggle Notebook

This is the repository for NovaSR, a tiny ~52 KB audio upsampling model that upscales muffled 16 kHz audio into clear, crisp 48 kHz audio at speeds over 3500x realtime.

NovaSR.mp4

Key benefits

  • Speed: Can reach 3600x realtime speed on a single A100 GPU.
  • Quality: On par with models 5,000x larger.
  • Size: Just 52 KB, several thousand times smaller than most comparable models.

Why is this even useful?

  • Enhancing models: NovaSR can enhance TTS model quality considerably with nearly 0 computational cost.
  • Real-time enhancement: NovaSR allows on-device enhancement of low-quality calls, recordings, and other audio while using almost no memory.
  • Restoring datasets: NovaSR can enhance audio quality of any audio dataset.

Comparisons

Comparisons were done on an A100 GPU. A higher realtime factor means faster processing.

| Model | Speed (Real-Time) | Model Size |
|---|---|---|
| NovaSR | 3600x realtime | ~52 KB |
| FlowHigh | 20x realtime | ~450 MB |
| FlashSR | 14x realtime | ~1000 MB |
| AudioSR | 0.6x realtime | ~2000 MB |
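To make the realtime factors above concrete, the sketch below converts them into estimated processing time: audio duration divided by the realtime factor. The factors are copied from the table; the function name `processing_seconds` and the 100-hour dataset size are illustrative assumptions, not part of NovaSR.

```python
# Realtime factors copied from the comparison table above (A100 GPU).
REALTIME_FACTORS = {
    "NovaSR": 3600.0,
    "FlowHigh": 20.0,
    "FlashSR": 14.0,
    "AudioSR": 0.6,
}


def processing_seconds(audio_seconds: float, realtime_factor: float) -> float:
    """Estimate wall-clock seconds needed to process `audio_seconds` of audio."""
    if realtime_factor <= 0:
        raise ValueError("realtime_factor must be positive")
    return audio_seconds / realtime_factor


if __name__ == "__main__":
    dataset_seconds = 100 * 3600  # hypothetical 100-hour dataset
    for name, factor in REALTIME_FACTORS.items():
        hours = processing_seconds(dataset_seconds, factor) / 3600
        print(f"{name}: about {hours:.2f} hours")
```

These are rough estimates only; real throughput also depends on batch size, file loading, and hardware.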

Examples

Please check the Hugging Face model page for a few examples.

Usage

You can try it on Hugging Face Spaces or run it locally.

Simple one-line installation:

pip install git+https://github.com/ysharma3501/NovaSR.git

Load model

from NovaSR import FastSR

upsampler = FastSR() ## downloads the weights from Hugging Face

## On CPU, use this instead; it gives a 3-4x speedup.
# upsampler = FastSR(half=False)

Run model

from IPython.display import Audio, display

## replace audio_path.wav with your wav/mp3 file
lowres_audio = upsampler.load_audio('audio_path.wav') 

## infer with model
highres_audio = upsampler.infer(lowres_audio).cpu()

display(Audio(highres_audio, rate=48000))
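If you want to check the realtime factor on your own hardware, a minimal timing sketch could look like the following. The helper `measure_realtime_factor` is hypothetical and not part of the NovaSR package; it times any zero-argument callable, and you supply the duration of the input audio in seconds yourself.

```python
import time
from typing import Callable


def measure_realtime_factor(
    infer_fn: Callable[[], object], audio_seconds: float, runs: int = 5
) -> float:
    """Return audio duration divided by the average wall-clock time per call."""
    if audio_seconds <= 0 or runs <= 0:
        raise ValueError("audio_seconds and runs must be positive")
    infer_fn()  # warm-up call, excluded from the timing
    start = time.perf_counter()
    for _ in range(runs):
        infer_fn()
    elapsed = time.perf_counter() - start
    seconds_per_call = max(elapsed / runs, 1e-12)  # guard against a zero reading
    return audio_seconds / seconds_per_call


# Assumed usage with the objects from the snippet above; replace 10.0 with the
# real duration of your clip. Calling .cpu() makes the timing include the GPU work.
# rtf = measure_realtime_factor(
#     lambda: upsampler.infer(lowres_audio).cpu(), audio_seconds=10.0
# )
# print(f"{rtf:.0f}x realtime")
```

Results will vary with hardware, clip length, and precision settings, so expect numbers that differ from the A100 figures.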

Training

Please check out the kaggle notebook for training the model further on custom datasets: https://www.kaggle.com/code/yatharthsharma888/novasr-training

Info

Q: How much data was this trained on?

A: Just 100 hours of data (mls_sidon along with VCTK).

Q: How is it so small?

A: It uses fewer than 10 tiny conv1d layers along with snake activations, as in BigVGAN, to get the most quality out of a minimal size.
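For readers unfamiliar with the snake activation mentioned above, it is commonly defined as x + sin²(αx)/α, where α controls the frequency of the periodic term (BigVGAN learns α per channel). The scalar sketch below only illustrates that formula; it is not taken from the NovaSR code, and whether NovaSR uses this exact form is an assumption.

```python
import math


def snake(x: float, alpha: float = 1.0) -> float:
    """Snake activation: x + sin^2(alpha * x) / alpha.

    Scalar version for illustration; in a model it is applied elementwise
    to tensors, with alpha typically a learned parameter.
    """
    if alpha == 0:
        raise ValueError("alpha must be non-zero")
    return x + math.sin(alpha * x) ** 2 / alpha


if __name__ == "__main__":
    for value in (-1.0, 0.0, 0.5, 2.0):
        print(value, snake(value, alpha=1.0))
```

The identity term keeps the function monotonic overall, while the periodic term gives the network an inductive bias towards periodic signals such as audio.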

Q: Will benchmarks come?

A: Yes, I am still training it further and will benchmark it later.

Final Notes

If you find this helpful, repo stars and model likes would be appreciated. Thank you.

Email: yatharthsharma3501@gmail.com

About

A lightning fast audio upsampler.
