
Commit 2d878b5

Node.js backend server and riva-frontend app for testing
1 parent 232c737 commit 2d878b5

28 files changed: 20376 additions, 0 deletions

.gitignore

Lines changed: 19 additions & 0 deletions
@@ -186,3 +186,22 @@ tests/integration/tts/outputs
riva/client/proto/*_pb2.py
riva/client/proto/*_pb2_grpc.py


# Downloaded/generated proto files and repositories
app-backend/common/
app-backend/riva/proto/

# Node.js specific ignores for app-backend
app-backend/node_modules/
app-backend/coverage/
app-backend/.nyc_output/
app-backend/logs/
app-backend/*.log
app-backend/.env
app-backend/.env.local
app-backend/.env.development.local
app-backend/.env.test.local
app-backend/.env.production.local
app-backend/npm-debug.log*
app-backend/yarn-debug.log*
app-backend/yarn-error.log*

app-backend/README.md

Lines changed: 362 additions & 0 deletions
@@ -0,0 +1,362 @@
# Riva App Backend

This is a Node.js proxy server that connects to the Riva API server. It provides API endpoints for automatic speech recognition (ASR) and text-to-speech (TTS) services.

## Features

- Direct connection to Riva server using official proto files
- ASR (Automatic Speech Recognition) endpoint
- TTS (Text-to-Speech) endpoint
- WAV file support with header analysis and proper processing
- Configurable via environment variables
- WebSocket support for real-time streaming recognition

## Setup

1. Ensure you have Node.js installed (v14 or higher recommended)

2. Install dependencies:
   ```
   npm install
   ```

3. Download the proto files:
   ```
   npm run download-protos
   ```
   This script will clone the nvidia-riva/common repository and copy the necessary proto files to the `riva/proto` directory.

## Configuration

Create a `.env` file in the root directory with the following variables:

```
PORT=3002
RIVA_API_URL=localhost:50051
```

- `PORT`: The port on which the proxy server will run
- `RIVA_API_URL`: The address (`host:port`) of the Riva API server
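
As an illustration of how these two variables might be consumed at startup, here is a minimal sketch. The function name `loadConfig` and the fallback values (copied from the example `.env` above) are assumptions made for this example; the server's own configuration code may differ.

```javascript
// Hypothetical helper: read PORT and RIVA_API_URL from an environment object.
// The defaults mirror the example .env above and are assumptions, not
// necessarily what the server itself falls back to.
function loadConfig(env = process.env) {
  const port = Number.parseInt(env.PORT ?? '3002', 10);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT: ${env.PORT}`);
  }
  const rivaApiUrl = env.RIVA_API_URL || 'localhost:50051';
  return { port, rivaApiUrl };
}

console.log(loadConfig({ PORT: '3002', RIVA_API_URL: 'localhost:50051' }));
```

Validating the port early means a typo in `.env` fails at startup rather than surfacing later as a confusing bind error.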

## Running the Server

Start the server:

```
npm start
```

This will automatically run the `download-protos` script before starting the server if the proto files are not already present.

## Testing the Application

### Prerequisites

Before testing:
1. Ensure the Riva API server is running at the configured URL
2. Verify that the proto files have been downloaded successfully
3. Make sure the Node.js server is running (check for "Server listening on port 3002" message)
4. Have sample audio files available for testing

### Testing the API Endpoints Directly

#### Testing the Health Endpoint

```bash
curl http://localhost:3002/health
```

Expected response:
```json
{
  "status": "ok",
  "services": {
    "asr": {
      "available": true
    },
    "tts": {
      "available": true
    }
  }
}
```
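
When scripting this check, it can help to reduce the health response to the list of services that are down. The sketch below assumes the response shape shown above; `unavailableServices` is a hypothetical helper name, not part of the backend.

```javascript
// Hypothetical helper: list the services a /health response reports as
// unavailable. Assumes the response shape shown in the expected output.
function unavailableServices(health) {
  const services = (health && health.services) || {};
  return Object.keys(services).filter((name) => !services[name]?.available);
}

const sample = {
  status: 'ok',
  services: { asr: { available: true }, tts: { available: false } },
};
console.log(unavailableServices(sample)); // [ 'tts' ]
```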

#### Testing ASR with a WAV File

You can use the included test script:

```bash
# If you have a sample WAV file
node test-asr.js /path/to/your/audio.wav
```

Or test manually with curl:

```bash
# Convert WAV to base64 first (-w 0 disables line wrapping in GNU base64)
base64 -w 0 /path/to/your/audio.wav > audio.b64

# Send the request
curl -X POST http://localhost:3002/api/recognize \
  -H "Content-Type: application/json" \
  -d @- << EOF
{
  "audio": "$(cat audio.b64)",
  "config": {
    "encoding": "LINEAR_PCM",
    "sampleRateHertz": 16000,
    "languageCode": "en-US",
    "enableAutomaticPunctuation": true
  }
}
EOF
```
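
The same request body can also be assembled in Node.js instead of with `base64` and a heredoc. This is a sketch only: `buildRecognizeRequest` is a hypothetical name, and the default config values are assumptions for 16 kHz PCM audio taken from the curl example.

```javascript
// Hypothetical helper: build the JSON body for POST /api/recognize from raw
// audio bytes. Field names follow the curl example; the default config values
// are assumptions and can be overridden per call.
function buildRecognizeRequest(audioBytes, configOverrides = {}) {
  return {
    audio: Buffer.from(audioBytes).toString('base64'),
    config: {
      encoding: 'LINEAR_PCM',
      sampleRateHertz: 16000,
      languageCode: 'en-US',
      enableAutomaticPunctuation: true,
      ...configOverrides,
    },
  };
}

// In practice the bytes would come from fs.readFileSync('/path/to/your/audio.wav').
const body = buildRecognizeRequest(Buffer.from([0x52, 0x49, 0x46, 0x46])); // "RIFF"
console.log(JSON.stringify(body));
```

The resulting object can be passed through `JSON.stringify` and sent with any HTTP client.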

### Testing with the Frontend

The best way to test the complete functionality is to use the provided frontend application:

1. Start this backend server
2. Start the Riva frontend application
3. Use the frontend to upload audio files or test streaming recognition

### Debugging and Log Information

The server provides detailed logging for audio processing. When processing WAV files, it will:

1. Log detection of WAV headers
2. Display information about:
   - Sample rate
   - Number of channels
   - Bits per sample
   - Audio format

When issues occur, check the console output for detailed error messages.
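
To make the logged fields concrete, the following sketch reads them from a WAV header. It is not the server's code: `parseWavHeader` is a hypothetical function, and it assumes the canonical 44-byte layout in which the `fmt ` chunk immediately follows the `RIFF`/`WAVE` identifiers.

```javascript
// Sketch of WAV header inspection. Assumes the canonical layout where the
// "fmt " chunk starts at byte 12; files with extra chunks before "fmt "
// would need a chunk scan instead.
function parseWavHeader(buf) {
  if (buf.length < 44 ||
      buf.toString('ascii', 0, 4) !== 'RIFF' ||
      buf.toString('ascii', 8, 12) !== 'WAVE' ||
      buf.toString('ascii', 12, 16) !== 'fmt ') {
    return null; // not a WAV file in the layout this sketch understands
  }
  return {
    audioFormat: buf.readUInt16LE(20), // 1 = uncompressed PCM
    numChannels: buf.readUInt16LE(22),
    sampleRate: buf.readUInt32LE(24),
    bitsPerSample: buf.readUInt16LE(34),
  };
}

// Build a header for 16 kHz, mono, 16-bit PCM to exercise the parser.
const header = Buffer.alloc(44);
header.write('RIFF', 0, 'ascii');
header.write('WAVE', 8, 'ascii');
header.write('fmt ', 12, 'ascii');
header.writeUInt32LE(16, 16);    // fmt chunk size
header.writeUInt16LE(1, 20);     // audio format: PCM
header.writeUInt16LE(1, 22);     // channels
header.writeUInt32LE(16000, 24); // sample rate
header.writeUInt16LE(16, 34);    // bits per sample
console.log(parseWavHeader(header));
```

Comparing these values against the `config` sent with a request is a quick way to spot a sample-rate or channel-count mismatch.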

## Troubleshooting Proto Files Download

If you encounter issues downloading proto files:

1. Check your internet connection
2. Verify that git is installed and accessible
3. Look for specific errors in the console output
4. Make sure the `riva_common.proto` file is included in the filter (the download script now includes this file)
5. Try running the download script manually:
   ```
   node download-protos.js
   ```
6. If problems persist, you can manually clone the repository and copy the proto files:
   ```
   git clone https://github.com/nvidia-riva/common.git
   mkdir -p riva/proto
   cp common/riva/proto/*.proto riva/proto/
   ```
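
After any of the steps above, a short check can confirm that the files landed where the server expects them. This is a sketch under assumptions: only `riva_common.proto` is named in this README, so the other file names in the default list, and the helper name `missingProtoFiles`, are illustrative guesses.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical check: report which expected proto files are absent from a
// directory. Only riva_common.proto is named in this README; the other
// default names are assumptions about what the download script copies.
function missingProtoFiles(
  protoDir,
  expected = ['riva_common.proto', 'riva_asr.proto', 'riva_tts.proto'],
) {
  return expected.filter((name) => !fs.existsSync(path.join(protoDir, name)));
}

// Demonstration against a temporary directory that contains one file.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), 'protos-'));
fs.writeFileSync(path.join(demoDir, 'riva_common.proto'), '');
console.log(missingProtoFiles(demoDir)); // [ 'riva_asr.proto', 'riva_tts.proto' ]
```

In the backend directory, the call would presumably be `missingProtoFiles('riva/proto')`.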

## API Endpoints

### Status

- **GET** `/health`
  - Returns the status of the ASR and TTS services

### Speech Recognition (ASR)

- **POST** `/api/recognize`
  - Request body:
    ```json
    {
      "audio": "<base64-encoded audio data>",
      "config": {
        "encoding": "LINEAR_PCM",
        "sampleRateHertz": 16000,
        "languageCode": "en-US",
        "maxAlternatives": 1,
        "enableAutomaticPunctuation": true,
        "audioChannelCount": 1
      }
    }
    ```
  - Response:
    ```json
    {
      "results": [
        {
          "alternatives": [
            {
              "transcript": "recognized text",
              "confidence": 0.98
            }
          ]
        }
      ],
      "text": "recognized text",
      "confidence": 0.98
    }
    ```

### WebSocket Streaming (ASR)

- **WebSocket** `/streaming/asr`
  - First message (config):
    ```json
    {
      "sampleRate": 16000,
      "encoding": "LINEAR_PCM",
      "languageCode": "en-US",
      "maxAlternatives": 1,
      "enableAutomaticPunctuation": true
    }
    ```
  - Subsequent messages: Binary audio data (16-bit PCM)
  - Server responses (`isPartial` is `true` for partial results and `false` for final results):
    ```json
    {
      "results": [
        {
          "alternatives": [
            {
              "transcript": "recognized text"
            }
          ]
        }
      ],
      "isPartial": true
    }
    ```
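
One way a client might combine these messages into display text is sketched below. It assumes that each partial result supersedes the previous partial and that a final result closes the current segment; that behaviour, and the class name `TranscriptAccumulator`, are assumptions for illustration rather than documented server semantics.

```javascript
// Hypothetical accumulator for streaming responses in the shape shown above:
// final results are appended, partial results replace the pending text.
class TranscriptAccumulator {
  constructor() {
    this.finalSegments = [];
    this.pending = '';
  }

  // Accepts one parsed server message and returns the text to display.
  update(message) {
    const transcript = message?.results?.[0]?.alternatives?.[0]?.transcript;
    if (typeof transcript === 'string') {
      if (message.isPartial) {
        this.pending = transcript;
      } else {
        this.finalSegments.push(transcript.trim());
        this.pending = '';
      }
    }
    return [...this.finalSegments, this.pending].filter(Boolean).join(' ');
  }
}

const acc = new TranscriptAccumulator();
const msg = (transcript, isPartial) => ({
  results: [{ alternatives: [{ transcript }] }],
  isPartial,
});
acc.update(msg('hello', true));
acc.update(msg('hello world', false));
console.log(acc.update(msg('how are', true))); // hello world how are
```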

## Integrating with a New Frontend Application

If you want to create a new frontend application that uses this backend server, follow these guidelines:

### REST API Integration

1. **Server URL Configuration**
   - Configure your frontend to connect to the backend at `http://localhost:3002` (or your custom port)
   - Ensure your application can handle CORS if the frontend is hosted on a different domain/port

2. **Health Check**
   - Implement a health check on application startup:
     ```javascript
     fetch('http://localhost:3002/health')
       .then(response => response.json())
       .then(data => {
         // Check if services are available
         const asrAvailable = data.services.asr.available;
         const ttsAvailable = data.services.tts.available;
         // Update UI accordingly
       })
       .catch(error => {
         // Backend unreachable, or the response was not valid JSON
         console.error('Health check failed:', error);
       });
     ```

3. **File Upload for Speech Recognition**
   - Read the audio file as ArrayBuffer
   - Convert to base64
   - Send to the `/api/recognize` endpoint:
     ```javascript
     // Example in JavaScript
     // `audioFile` is a File object, e.g. from an <input type="file"> element
     const fileReader = new FileReader();
     fileReader.onload = async (event) => {
       const arrayBuffer = event.target.result;
       const base64Audio = arrayBufferToBase64(arrayBuffer);

       const response = await fetch('http://localhost:3002/api/recognize', {
         method: 'POST',
         headers: { 'Content-Type': 'application/json' },
         body: JSON.stringify({
           audio: base64Audio,
           config: {
             encoding: 'LINEAR_PCM',
             sampleRateHertz: 16000,
             languageCode: 'en-US',
             enableAutomaticPunctuation: true
           }
         })
       });
       if (!response.ok) {
         throw new Error(`Recognition request failed with status ${response.status}`);
       }

       const result = await response.json();
       // Process transcription result
     };
     fileReader.readAsArrayBuffer(audioFile);

     // Helper function to convert ArrayBuffer to base64
     function arrayBufferToBase64(buffer) {
       let binary = '';
       const bytes = new Uint8Array(buffer);
       for (let i = 0; i < bytes.byteLength; i++) {
         binary += String.fromCharCode(bytes[i]);
       }
       return window.btoa(binary);
     }
     ```
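
For large audio files, the per-byte string concatenation in the `arrayBufferToBase64` helper above can become slow. A possible variant, sketched here under assumptions, builds the binary string in chunks. The name `arrayBufferToBase64Chunked` and the 32 KiB chunk size are illustrative choices, and the code relies on the global `btoa` available in browsers and in Node.js 16 or later.

```javascript
// Sketch of a chunked ArrayBuffer-to-base64 conversion. Building the binary
// string 32 KiB at a time avoids one string concatenation per byte; the chunk
// size is an arbitrary assumption, kept small enough for Function.apply.
function arrayBufferToBase64Chunked(buffer, chunkSize = 0x8000) {
  const bytes = new Uint8Array(buffer);
  let binary = '';
  for (let i = 0; i < bytes.length; i += chunkSize) {
    binary += String.fromCharCode.apply(null, bytes.subarray(i, i + chunkSize));
  }
  return btoa(binary);
}

// The four bytes of "RIFF", the start of every WAV file.
console.log(arrayBufferToBase64Chunked(new Uint8Array([82, 73, 70, 70]).buffer)); // UklGRg==
```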

### WebSocket Integration for Streaming ASR

1. **Create WebSocket Connection**
   ```javascript
   const ws = new WebSocket('ws://localhost:3002/streaming/asr');
   ```

2. **Send Configuration on Connection**
   ```javascript
   ws.onopen = () => {
     const config = {
       sampleRate: 16000,
       encoding: 'LINEAR_PCM',
       languageCode: 'en-US',
       maxAlternatives: 1,
       enableAutomaticPunctuation: true
     };
     ws.send(JSON.stringify(config));
   };
   ```

3. **Capture and Send Audio**
   ```javascript
   // Assuming you have access to audio data as Int16Array
   // This could be from a microphone input or processed audio data
   function sendAudioChunk(audioData) {
     if (ws.readyState === WebSocket.OPEN) {
       // Send the typed array itself: audioData.buffer would transmit the
       // whole underlying buffer if audioData is a subarray of a larger one
       ws.send(audioData);
     }
   }
   ```

4. **Process Recognition Results**
   ```javascript
   ws.onmessage = (event) => {
     const response = JSON.parse(event.data);
     if (response.results && response.results.length > 0) {
       const result = response.results[0];
       if (result.alternatives && result.alternatives.length > 0) {
         const transcript = result.alternatives[0].transcript;
         const isPartial = response.isPartial;
         // Update UI with transcript
         // Treat partial results differently from final results
       }
     }
   };
   ```

5. **Handle Errors and Connection Close**
   ```javascript
   ws.onerror = (error) => {
     console.error('WebSocket error:', error);
     // Show error in UI
   };

   ws.onclose = (event) => {
     console.log(`WebSocket closed with code ${event.code}`);
     // Handle reconnection or update UI
   };
   ```
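
Step 3 assumes the audio is already an `Int16Array`, whereas the Web Audio API delivers samples as `Float32Array` values in the range -1 to 1. The conversion could look like the sketch below; `floatTo16BitPCM` is a hypothetical name, and the code does not resample, so the capture rate is assumed to match the `sampleRate` sent in the configuration message.

```javascript
// Sketch: convert Float32 samples in [-1, 1] to 16-bit signed PCM, the binary
// format the streaming endpoint expects. No resampling is performed.
function floatTo16BitPCM(float32Samples) {
  const pcm = new Int16Array(float32Samples.length);
  for (let i = 0; i < float32Samples.length; i++) {
    // Clamp out-of-range samples so they cannot wrap around after scaling
    const s = Math.max(-1, Math.min(1, float32Samples[i]));
    pcm[i] = s < 0 ? Math.round(s * 0x8000) : Math.round(s * 0x7fff);
  }
  return pcm;
}

console.log(floatTo16BitPCM(new Float32Array([0, 1, -1, 0.5])));
```

The returned array can be handed directly to the `sendAudioChunk` function from step 3.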

### CORS Considerations

The backend server is configured to allow cross-origin requests. If you encounter CORS issues:

1. Ensure the backend is properly configured with CORS headers
2. Check that your frontend is using the correct protocol (http/https)
3. Avoid mixing secure and insecure contexts
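
For reference when checking item 1, the response headers a browser looks for can be sketched as a small function. This is not the backend's implementation, which is not shown in this README: `corsHeaders` and the default allow-list are assumptions for illustration.

```javascript
// Hypothetical helper: compute CORS response headers for a request origin.
// The allow-list default is an assumption; adjust it to the frontend's origin.
function corsHeaders(origin, allowedOrigins = ['http://localhost:3000']) {
  if (!origin || !allowedOrigins.includes(origin)) {
    return {}; // no CORS headers: the browser will block the cross-origin read
  }
  return {
    'Access-Control-Allow-Origin': origin,
    'Access-Control-Allow-Methods': 'GET, POST, OPTIONS',
    'Access-Control-Allow-Headers': 'Content-Type',
    Vary: 'Origin',
  };
}

console.log(corsHeaders('http://localhost:3000'));
console.log(corsHeaders('http://example.com')); // {}
```

If the `Access-Control-Allow-Origin` value in the browser's network tab does not match the page's origin exactly, including scheme and port, the request is rejected.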

### Example Implementation

For a complete example of frontend integration, refer to the companion `riva-frontend` repository, which demonstrates both file upload and streaming implementations.
