GitHub - ysharma3501/NovaSR: A lightning fast audio upsampler.

NovaSR: Pushing the Limits of Extreme Efficiency in Audio Super-Resolution

Tip

Please check out LavaSR, which has several major benefits over NovaSR:

NovaSR: only 16 kHz input, ~52 KB, ~3000× realtime speed, low-to-medium quality
LavaSR: supports any input from 8–48 kHz, ~5000× realtime speed, 50 MB, and significantly better quality (surpasses 6 GB models)

If you were using the old fast model, try switching to the new one: you'll get much higher fidelity, flexible input rates, and faster speeds.

This is the repository for NovaSR, a tiny ~52 KB audio upsampling model that upscales muffled 16 kHz audio into clear, crisp 48 kHz audio at speeds over 3500x realtime.

NovaSR.mp4

Key benefits

Speed: Can reach 3600x realtime speed on a single A100 GPU.
Quality: On par with models 5,000x larger.
Size: Just 52 KB, several thousand times smaller than most.

Why is this even useful?

Enhancing models: NovaSR can considerably improve TTS model output quality at nearly zero computational cost.
Real-time enhancement: NovaSR allows on-device enhancement of low-quality calls, recordings, etc. while using almost no memory.
Restoring datasets: NovaSR can enhance the audio quality of any audio dataset.

Comparisons

Comparisons were done on an A100 GPU. A higher realtime factor means faster processing.

Model	Speed (Real-Time)	Model Size
NovaSR	3600x realtime	~52 KB
FlowHigh	20x realtime	~450 MB
FlashSR	14x realtime	~1000 MB
AudioSR	0.6x realtime	~2000 MB
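To make the realtime factors in the table concrete, the short sketch below converts them into wall-clock time per hour of audio and computes how many times larger each model is than NovaSR. It only uses the figures from the table; the helper name `processing_seconds` is made up for illustration, and sizes are treated as decimal units (1 MB = 1000 KB), which is an assumption.

```python
# Figures copied from the comparison table above.
# Speed is the realtime factor; size is in KB (decimal units assumed).
models = {
    "NovaSR": {"speed": 3600, "size_kb": 52},
    "FlowHigh": {"speed": 20, "size_kb": 450_000},
    "FlashSR": {"speed": 14, "size_kb": 1_000_000},
    "AudioSR": {"speed": 0.6, "size_kb": 2_000_000},
}


def processing_seconds(audio_seconds, realtime_factor):
    """Wall-clock seconds needed to process `audio_seconds` of audio."""
    return audio_seconds / realtime_factor


ONE_HOUR = 3600.0
for name, info in models.items():
    seconds = processing_seconds(ONE_HOUR, info["speed"])
    size_ratio = info["size_kb"] / models["NovaSR"]["size_kb"]
    print(f"{name}: {seconds:.1f} s per hour of audio, {size_ratio:.0f}x the size of NovaSR")
```

At 3600x realtime, one hour of audio takes about one second to process, while a model below 1x realtime takes longer than the audio itself.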

Examples

Please check the Hugging Face model page for a few examples.

Usage

You can try it on Hugging Face Spaces or locally.

Simple one-line installation:

pip install git+https://github.com/ysharma3501/NovaSR.git

Load model

from NovaSR import FastSR

upsampler = FastSR()  # downloads the weights from Hugging Face

# On CPUs, use this instead: it leads to a 3-4x speedup.
# upsampler = FastSR(half=False)

Run model

from IPython.display import Audio, display

# Replace audio_path.wav with your wav/mp3 file
lowres_audio = upsampler.load_audio('audio_path.wav')

# Run inference and move the result to the CPU
highres_audio = upsampler.infer(lowres_audio).cpu()

# Play the 48 kHz output in a notebook
display(Audio(highres_audio, rate=48000))
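If you want to check the realtime factor on your own hardware, a small timing harness like the one below may help. This is only a sketch: `measure_realtime_factor` and `fake_infer` are hypothetical names, not part of NovaSR, and the stand-in workload exists only so the snippet runs without the package installed.

```python
import time


def measure_realtime_factor(infer_fn, audio, audio_seconds, repeats=3):
    """Return audio duration divided by the best wall-clock time of infer_fn(audio)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        infer_fn(audio)
        best = min(best, time.perf_counter() - start)
    # Guard against a zero reading from a very fast call.
    return audio_seconds / max(best, 1e-12)


def fake_infer(samples):
    """Stand-in workload (NOT the real model): emits 3 samples per input sample,
    mirroring the 16 kHz -> 48 kHz length change."""
    return [s for s in samples for _ in range(3)]


samples = [0.0] * 16000  # one second of audio at 16 kHz
factor = measure_realtime_factor(fake_infer, samples, audio_seconds=1.0)
print(f"{factor:.0f}x realtime")
```

To time the real model, you would pass `upsampler.infer` and the loaded audio in place of the stand-ins. On a GPU, calls are asynchronous, so you would likely need to call `torch.cuda.synchronize()` before reading the clock for the number to be meaningful.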

Training

Please check out the Kaggle notebook for training the model further on custom datasets: https://www.kaggle.com/code/yatharthsharma888/novasr-training

Info

Q: How much data was this trained on?

A: Just 100 hours of data (mls_sidon along with VCTK).

Q: How is it so small?

A: It uses fewer than 10 tiny Conv1d layers along with Snake activations based on BigVGAN, chosen to maximize quality at minimal size.
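For readers unfamiliar with the Snake activation mentioned above, here is a minimal pure-Python sketch of the function as it is commonly defined, snake(x) = x + sin²(αx)/α. This is not taken from the NovaSR source; in BigVGAN-style models α is typically a learned per-channel parameter, whereas here it is a fixed argument for illustration.

```python
import math


def snake(x, alpha=1.0):
    """Snake activation: x + sin^2(alpha * x) / alpha.

    The identity term keeps the function monotonic, while the periodic
    term gives the network a bias toward periodic signals such as audio.
    """
    return x + math.sin(alpha * x) ** 2 / alpha


# Apply the activation elementwise to a few sample values.
inputs = [-1.0, 0.0, 0.5, math.pi / 2]
outputs = [snake(v) for v in inputs]
print(outputs)
```

Because the periodic term adds no parameters beyond α, an activation like this costs very little in model size, which fits the small-footprint goal described in the answer above.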

Q: Will benchmarks come?

A: Yes, I am still training it further and will benchmark it later.

Final Notes

Repo stars and model likes would be appreciated if you find this helpful. Thank you.

Email: yatharthsharma3501@gmail.com
