| 1 | +# Riva App Backend |
| 2 | + |
| 3 | +This is a Node.js proxy server that connects to the Riva API server. It provides API endpoints for automatic speech recognition (ASR) and text-to-speech (TTS) services. |
| 4 | + |
| 5 | +## Features |
| 6 | + |
| 7 | +- Direct connection to Riva server using official proto files |
| 8 | +- ASR (Automatic Speech Recognition) endpoint |
| 9 | +- TTS (Text-to-Speech) endpoint |
| 10 | +- WAV file support with header analysis and proper processing |
| 11 | +- Configurable via environment variables |
| 12 | +- WebSocket support for real-time streaming recognition |
| 13 | + |
| 14 | +## Setup |
| 15 | + |
| 16 | +1. Ensure you have Node.js installed (v14 or higher recommended) |
| 17 | + |
| 18 | +2. Install dependencies: |
| 19 | + ```bash |
| 20 | + npm install |
| 21 | + ``` |
| 22 | + |
| 23 | +3. Download the proto files: |
| 24 | + ```bash |
| 25 | + npm run download-protos |
| 26 | + ``` |
| 27 | + This script will clone the nvidia-riva/common repository and copy the necessary proto files to the `riva/proto` directory. |
| 28 | + |
| 29 | +## Configuration |
| 30 | + |
| 31 | +Create a `.env` file in the root directory with the following variables: |
| 32 | + |
| 33 | +``` |
| 34 | +PORT=3002 |
| 35 | +RIVA_API_URL=localhost:50051 |
| 36 | +``` |
| 37 | + |
| 38 | +- `PORT`: The port on which the proxy server will run |
| 39 | +- `RIVA_API_URL`: The address of the Riva API server, given as `host:port` without a URL scheme (as in the example above) |
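As an illustration of how these two variables might be consumed, here is a minimal sketch. The `loadConfig` helper is hypothetical and not taken from this repository, and the fallback values simply mirror the example `.env` above; whether the real server applies the same defaults is an assumption.

```javascript
// Hypothetical helper: reads the two documented variables from an env-like object.
// The fallbacks mirror the example .env file; the real server's defaults may differ.
function loadConfig(env = process.env) {
  const port = Number.parseInt(env.PORT ?? '3002', 10);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT: ${env.PORT}`);
  }
  // host:port of the Riva API server, with no URL scheme
  const rivaApiUrl = env.RIVA_API_URL ?? 'localhost:50051';
  return { port, rivaApiUrl };
}
```

Validating `PORT` at startup makes a typo in `.env` fail immediately rather than surfacing later as a bind error.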
| 40 | + |
| 41 | +## Running the Server |
| 42 | + |
| 43 | +Start the server: |
| 44 | + |
| 45 | +```bash |
| 46 | +npm start |
| 47 | +``` |
| 48 | + |
| 49 | +If the proto files are not already present, `npm start` runs the `download-protos` script automatically before starting the server. |
| 50 | + |
| 51 | +## Testing the Application |
| 52 | + |
| 53 | +### Prerequisites |
| 54 | + |
| 55 | +Before testing: |
| 56 | +1. Ensure the Riva API server is running at the configured URL |
| 57 | +2. Verify that the proto files have been downloaded successfully |
| 58 | +3. Make sure the Node.js server is running (check for "Server listening on port 3002" message) |
| 59 | +4. Have sample audio files available for testing |
| 60 | + |
| 61 | +### Testing the API Endpoints Directly |
| 62 | + |
| 63 | +#### Testing the Health Endpoint |
| 64 | + |
| 65 | +```bash |
| 66 | +curl http://localhost:3002/health |
| 67 | +``` |
| 68 | + |
| 69 | +Expected response: |
| 70 | +```json |
| 71 | +{ |
| 72 | + "status": "ok", |
| 73 | + "services": { |
| 74 | + "asr": { |
| 75 | + "available": true |
| 76 | + }, |
| 77 | + "tts": { |
| 78 | + "available": true |
| 79 | + } |
| 80 | + } |
| 81 | +} |
| 82 | +``` |
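The same check can be scripted from Node.js. The sketch below assumes the response shape shown above and Node.js 18 or later for the global `fetch`; `isHealthy` and `checkHealth` are hypothetical names, not part of this repository.

```javascript
// Returns true only when the payload reports status "ok" and every listed
// service is marked available. The shape follows the expected response above.
function isHealthy(health) {
  if (!health || health.status !== 'ok') return false;
  const services = Object.values(health.services ?? {});
  return services.length > 0 && services.every((s) => s && s.available === true);
}

// Hypothetical wrapper around GET /health; relies on the global fetch in Node.js 18+.
async function checkHealth(baseUrl = 'http://localhost:3002') {
  const res = await fetch(`${baseUrl}/health`);
  if (!res.ok) return false;
  return isHealthy(await res.json());
}
```

Calling `checkHealth().then(console.log)` against a running server prints a single boolean, which is convenient in a readiness script.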
| 83 | + |
| 84 | +#### Testing ASR with a WAV File |
| 85 | + |
| 86 | +You can use the included test script: |
| 87 | + |
| 88 | +```bash |
| 89 | +# If you have a sample WAV file |
| 90 | +node test-asr.js /path/to/your/audio.wav |
| 91 | +``` |
| 92 | + |
| 93 | +Or test manually with curl: |
| 94 | + |
| 95 | +```bash |
| 96 | +# Convert WAV to base64 first (-w 0 disables line wrapping in GNU base64; other base64 implementations may use different flags) |
| 97 | +base64 -w 0 /path/to/your/audio.wav > audio.b64 |
| 98 | + |
| 99 | +# Send the request |
| 100 | +curl -X POST http://localhost:3002/api/recognize \ |
| 101 | + -H "Content-Type: application/json" \ |
| 102 | + -d @- << EOF |
| 103 | +{ |
| 104 | + "audio": "$(cat audio.b64)", |
| 105 | + "config": { |
| 106 | + "encoding": "LINEAR_PCM", |
| 107 | + "sampleRateHertz": 16000, |
| 108 | + "languageCode": "en-US", |
| 109 | + "enableAutomaticPunctuation": true |
| 110 | + } |
| 111 | +} |
| 112 | +EOF |
| 113 | +``` |
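For reference, a rough Node.js equivalent of the curl request above might look like the following. This is a sketch, not the bundled `test-asr.js`: the function names are hypothetical, the config values are copied from the curl example, and the global `fetch` requires Node.js 18 or later.

```javascript
// Builds the JSON body for POST /api/recognize from raw WAV bytes.
// Config defaults are copied from the curl example; overrides are merged on top.
function buildRecognizeRequest(wavBuffer, overrides = {}) {
  return {
    audio: wavBuffer.toString('base64'),
    config: {
      encoding: 'LINEAR_PCM',
      sampleRateHertz: 16000,
      languageCode: 'en-US',
      enableAutomaticPunctuation: true,
      ...overrides,
    },
  };
}

// Hypothetical sender; relies on the global fetch in Node.js 18+.
async function recognize(wavBuffer, baseUrl = 'http://localhost:3002') {
  const res = await fetch(`${baseUrl}/api/recognize`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(buildRecognizeRequest(wavBuffer)),
  });
  if (!res.ok) throw new Error(`Recognition failed with HTTP ${res.status}`);
  return res.json();
}
```

A typical call would read the file with `fs.readFileSync('/path/to/your/audio.wav')` and pass the resulting buffer to `recognize`.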
| 114 | + |
| 115 | +### Testing with the Frontend |
| 116 | + |
| 117 | +The best way to test the complete functionality is to use the provided frontend application: |
| 118 | + |
| 119 | +1. Start this backend server |
| 120 | +2. Start the Riva frontend application |
| 121 | +3. Use the frontend to upload audio files or test streaming recognition |
| 122 | + |
| 123 | +### Debugging and Log Information |
| 124 | + |
| 125 | +The server provides detailed logging for audio processing. When processing WAV files, it will: |
| 126 | + |
| 127 | +1. Log detection of WAV headers |
| 128 | +2. Display information about: |
| 129 | + - Sample rate |
| 130 | + - Number of channels |
| 131 | + - Bits per sample |
| 132 | + - Audio format |
| 133 | + |
| 134 | +When issues occur, check the console output for detailed error messages. |
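To make the logged fields concrete, the sketch below reads the same four values from a RIFF/WAVE header. It illustrates the standard `fmt ` chunk layout and is not the server's actual implementation; `parseWavHeader` is a hypothetical name.

```javascript
// Reads the fields of the "fmt " chunk from a RIFF/WAVE buffer.
// Returns null when the buffer is not a WAV file or has no complete fmt chunk.
function parseWavHeader(buf) {
  if (
    buf.length < 12 ||
    buf.toString('ascii', 0, 4) !== 'RIFF' ||
    buf.toString('ascii', 8, 12) !== 'WAVE'
  ) {
    return null;
  }
  let offset = 12; // first chunk starts after "RIFF", the size field, and "WAVE"
  while (offset + 8 <= buf.length) {
    const id = buf.toString('ascii', offset, offset + 4);
    const size = buf.readUInt32LE(offset + 4);
    if (id === 'fmt ') {
      const body = offset + 8;
      if (body + 16 > buf.length) return null;
      return {
        audioFormat: buf.readUInt16LE(body), // 1 means uncompressed PCM
        numChannels: buf.readUInt16LE(body + 2),
        sampleRate: buf.readUInt32LE(body + 4),
        bitsPerSample: buf.readUInt16LE(body + 14),
      };
    }
    offset += 8 + size + (size % 2); // chunk bodies are padded to an even length
  }
  return null;
}
```

Comparing these values with the `config` sent in the request (for example `sampleRateHertz`) is a quick way to spot a mismatch between the file and the declared format.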
| 135 | + |
| 136 | +## Troubleshooting Proto Files Download |
| 137 | + |
| 138 | +If you encounter issues downloading proto files: |
| 139 | + |
| 140 | +1. Check your internet connection |
| 141 | +2. Verify that git is installed and accessible |
| 142 | +3. Look for specific errors in the console output |
| 143 | +4. Make sure `riva_common.proto` is included in the download script's file filter (the current version of the script already includes it) |
| 144 | +5. Try running the download script manually: |
| 145 | + ```bash |
| 146 | + node download-protos.js |
| 147 | + ``` |
| 148 | +6. If problems persist, you can manually clone the repository and copy the proto files: |
| 149 | + ```bash |
| 150 | + git clone https://github.com/nvidia-riva/common.git |
| 151 | + mkdir -p riva/proto |
| 152 | + cp common/riva/proto/*.proto riva/proto/ |
| 153 | + ``` |
| 154 | + |
| 155 | +## API Endpoints |
| 156 | + |
| 157 | +### Status |
| 158 | + |
| 159 | +- **GET** `/health` |
| 160 | + - Returns the status of the ASR and TTS services |
| 161 | + |
| 162 | +### Speech Recognition (ASR) |
| 163 | + |
| 164 | +- **POST** `/api/recognize` |
| 165 | + - Request body: |
| 166 | + ```json |
| 167 | + { |
| 168 | + "audio": "<base64-encoded audio data>", |
| 169 | + "config": { |
| 170 | + "encoding": "LINEAR_PCM", |
| 171 | + "sampleRateHertz": 16000, |
| 172 | + "languageCode": "en-US", |
| 173 | + "maxAlternatives": 1, |
| 174 | + "enableAutomaticPunctuation": true, |
| 175 | + "audioChannelCount": 1 |
| 176 | + } |
| 177 | + } |
| 178 | + ``` |
| 179 | + - Response: |
| 180 | + ```json |
| 181 | + { |
| 182 | + "results": [ |
| 183 | + { |
| 184 | + "alternatives": [ |
| 185 | + { |
| 186 | + "transcript": "recognized text", |
| 187 | + "confidence": 0.98 |
| 188 | + } |
| 189 | + ] |
| 190 | + } |
| 191 | + ], |
| 192 | + "text": "recognized text", |
| 193 | + "confidence": 0.98 |
| 194 | + } |
| 195 | + ``` |
| 196 | + |
| 197 | +### WebSocket Streaming (ASR) |
| 198 | + |
| 199 | +- **WebSocket** `/streaming/asr` |
| 200 | + - First message (config): |
| 201 | + ```json |
| 202 | + { |
| 203 | + "sampleRate": 16000, |
| 204 | + "encoding": "LINEAR_PCM", |
| 205 | + "languageCode": "en-US", |
| 206 | + "maxAlternatives": 1, |
| 207 | + "enableAutomaticPunctuation": true |
| 208 | + } |
| 209 | + ``` |
| 210 | + - Subsequent messages: Binary audio data (16-bit PCM) |
| 211 | + - Server responses (`isPartial` is a boolean, either `true` or `false`): |
| 212 | + ```json |
| 213 | + { |
| 214 | + "results": [ |
| 215 | + { |
| 216 | + "alternatives": [ |
| 217 | + { |
| 218 | + "transcript": "recognized text" |
| 219 | + } |
| 220 | + ] |
| 221 | + } |
| 222 | + ], |
| 223 | + "isPartial": true |
| 224 | + } |
| 225 | + ``` |
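The message order described above (one JSON config message, then binary audio) can be sketched as follows. `streamPcm` is a hypothetical helper; `socket` is assumed to be any already-open object with a `send` method, such as a browser `WebSocket` or a client from the `ws` package. The 3200-byte chunk size (100 ms of 16 kHz, 16-bit mono audio) is an assumption, and how the server expects the end of the stream to be signaled is not covered here.

```javascript
// Config fields copied from the first-message example above.
const defaultStreamingConfig = {
  sampleRate: 16000,
  encoding: 'LINEAR_PCM',
  languageCode: 'en-US',
  maxAlternatives: 1,
  enableAutomaticPunctuation: true,
};

// Sends the config as a JSON text message, then the PCM data in fixed-size
// binary chunks. The socket must already be open.
function streamPcm(socket, pcm, config = defaultStreamingConfig, chunkBytes = 3200) {
  socket.send(JSON.stringify(config));
  for (let offset = 0; offset < pcm.length; offset += chunkBytes) {
    socket.send(pcm.subarray(offset, offset + chunkBytes));
  }
}
```

Taking the socket as a parameter keeps the helper independent of any particular WebSocket library and makes it easy to exercise with a stub.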
| 226 | + |
| 227 | +### Text to Speech (TTS) |
| 228 | + |
| 229 | +- **POST** `/api/synthesize` |
| 230 | + - Request body: |
| 231 | + ```json |
| 232 | + { |
| 233 | + "text": "Text to be synthesized", |
| 234 | + "voice": "en-US-Scarlett", |
| 235 | + "language": "en-US" |
| 236 | + } |
| 237 | + ``` |
| 238 | + - Response: |
| 239 | + ```json |
| 240 | + { |
| 241 | + "audio": "<base64-encoded audio data>" |
| 242 | + } |
| 243 | + ``` |
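On the client side, the `audio` field has to be base64-decoded before it can be played or saved. A minimal sketch, assuming the response shape above; `decodeSynthesisAudio` is a hypothetical name, and the container format of the decoded bytes (WAV or raw PCM) is not stated in this document.

```javascript
// Decodes the base64 "audio" field of a /api/synthesize response into a Buffer.
function decodeSynthesisAudio(response) {
  if (!response || typeof response.audio !== 'string') {
    throw new TypeError('Expected a response object with a base64 "audio" string');
  }
  return Buffer.from(response.audio, 'base64');
}
```

The resulting buffer can be written to disk with `fs.writeFileSync`, using a file name and extension that match whatever format the server actually returns.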