This repository contains the code for adapting OpenAI's Whisper model to typical and aphasic English (Singapore) speech.
You can access the model on Hugging Face here.
Click this link to create the instance with the custom template. By default, Vast.ai exposes only one port for the Docker container.
Create a GPU instance by clicking the 'RENT' button on any GPU. To reduce latency as much as possible, click the Planet Earth dropdown and select Asia.
Check on your instances by clicking the Instances button on the left sidebar. It will take some time for your instance to start up; once it is running, the Open button turns blue.
Once you have entered the instance, open a terminal by clicking File -> New -> Terminal. In the terminal, run:
git clone https://github.com/Aphasia-Chatter/aphasia-whisper-fine-tuning.git
Then, move into the repository:
cd aphasia-whisper-fine-tuning
The run.sh script in the repository does the following:
- Creates the virtual environment.
- Installs the necessary dependencies and packages.
- Runs the service.
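As a rough illustration, the steps above could be sketched as a script. This is only a sketch of the likely shape of run.sh, not its actual contents: the virtual-environment path, requirements.txt, the app.py entry point, and the --port flag are all assumptions. It is written to a scratch file here so it can be syntax-checked without installing anything.

```shell
# Write a sketch of run.sh to a scratch file (contents are illustrative;
# requirements.txt, app.py, and the venv path are assumed names).
cat > run_sketch.sh <<'EOF'
#!/bin/bash
set -e                               # stop on the first failing step
python3 -m venv venv                 # 1. create the virtual environment
source venv/bin/activate
pip install -r requirements.txt      # 2. install dependencies and packages
python app.py --port 50001           # 3. run the service on port 50001
EOF

# Check the sketch is valid shell without executing it.
bash -n run_sketch.sh && echo "syntax OK"
```

The actual run.sh in the repository is authoritative; consult it before relying on any of the names above.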
Make the script executable:
chmod +x run.sh
Then run it:
./run.sh
You will be prompted for your Hugging Face credentials to gain access to the model. Consult @farhanazmiCS for details.
The application runs on port 50001. In your Vast.ai instance, you can see which external port is mapped to internal port 50001 by clicking the public IP of your running instance. You will then see something like this:
Open Ports
172.81.127.5:63678 -> 22/tcp
172.81.127.5:63952 -> 50001/tcp
<PUBLIC_IP>:<EXTERNAL_PORT> -> <INTERNAL_PORT>/tcp
As shown, external port 63952 is mapped to internal port 50001, which is the port our application runs on. This is just an example; the external port and public IP vary between instances. You can then access the endpoint at:
http://<PUBLIC_IP>:<EXTERNAL_PORT_THAT_MAPS_TO_INTERNAL_PORT_50001>
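Given a port listing like the one above, a small one-liner can pick out the external port mapped to 50001. The listing is embedded here as the example text; on a real instance you would read it from the UI instead.

```shell
# The example "Open Ports" listing from above, embedded as text.
PORTS='172.81.127.5:63678 -> 22/tcp
172.81.127.5:63952 -> 50001/tcp'

# Keep the line for internal port 50001 and extract the external port
# (the number between the last ':' and the ' ->').
EXTERNAL_PORT=$(printf '%s\n' "$PORTS" | grep '50001/tcp' | sed 's/.*:\([0-9]*\) ->.*/\1/')
echo "$EXTERNAL_PORT"   # 63952
```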
Use these parameters to transcribe:
Request Type: POST
Request Endpoint:
http://<PUBLIC_IP>:<EXTERNAL_PORT_THAT_MAPS_TO_INTERNAL_PORT_50001>/transcribe
Request Headers:
Accept: */*
Content-Type: multipart/form-data
Request Body:
audiofile: <Audio File (.m4a, .wav, etc.)>
Expected Response:
{
    "status": "SUCCESS",
    "transcription": "! I don't know how to smile, Smiling Face, I think."
}
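Putting the parameters together, the request can be issued with curl. The IP and external port below are the example values from the mapping above, and sample.wav is a placeholder audio file, so the sketch prints the command rather than executing it; substitute your instance's values and run it directly. Note that curl's -F flag sends multipart/form-data and sets the boundary header automatically, so there is no need to set Content-Type by hand.

```shell
# Example values from the port mapping above; replace with your own.
PUBLIC_IP="172.81.127.5"
EXTERNAL_PORT="63952"
AUDio_FILE="sample.wav"; AUDIO_FILE="sample.wav"   # placeholder audio file

# Printed rather than executed here, since the values are placeholders.
echo curl -X POST \
     -H "Accept: */*" \
     -F "audiofile=@${AUDIO_FILE}" \
     "http://${PUBLIC_IP}:${EXTERNAL_PORT}/transcribe"
```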