Adding Realtime ASR Client #120

Open: wants to merge 6 commits into main
.circleci/config.yml: 2 changes (1 addition, 1 deletion)
@@ -8,7 +8,7 @@ jobs:
 steps:
 - run:
 name: "Install build dependencies"
-command: "sudo apt-get --allow-releaseinfo-change update && sudo apt-get install -y wget libasound2-dev libopus-dev libopusfile-dev"
+command: "sudo apt-get --allow-releaseinfo-change update && sudo apt-get install -y wget libasound2-dev libopus-dev libopusfile-dev libboost-all-dev"
 - run:
 name: "Install bazel"
 command: "wget https://github.com/bazelbuild/bazelisk/releases/download/v1.11.0/bazelisk-linux-amd64 && sudo mv bazelisk-linux-amd64 /usr/local/bin/bazelisk && sudo chmod +x /usr/local/bin/bazelisk"
Dockerfile: 4 changes (3 additions, 1 deletion)
@@ -10,7 +10,8 @@ RUN apt-get update && apt-get install -y \
 libasound2t64 \
 libogg0 \
 openssl \
-ca-certificates
+ca-certificates \
+libboost-all-dev
 
 FROM base AS builddep
 ARG BAZEL_VERSION
@@ -67,4 +68,5 @@ COPY --from=builder /opt/riva/clients/nlp/riva_nlp_punct /usr/local/bin/
 COPY --from=builder /opt/riva/clients/nmt/riva_nmt_t2t_client /usr/local/bin/
 COPY --from=builder /opt/riva/clients/nmt/riva_nmt_streaming_s2t_client /usr/local/bin/
 COPY --from=builder /opt/riva/clients/nmt/riva_nmt_streaming_s2s_client /usr/local/bin/
+COPY --from=builder /opt/riva/clients/realtime/riva_realtime_asr_client /usr/local/bin/
 COPY examples /work/examples
README.md: 6 changes (6 additions, 0 deletions)
@@ -8,6 +8,7 @@ NVIDIA Riva is a GPU-accelerated SDK for building Speech AI applications that ar
 - **Automatic Speech Recognition (ASR)**
 - `riva_streaming_asr_client`
 - `riva_asr_client`
+- `riva_realtime_asr_client`
 - **Speech Synthesis (TTS)**
 - `riva_tts_client`
 - `riva_tts_perf_client`
@@ -73,6 +74,7 @@ You can find the built binaries in `bazel-bin/riva/clients`
 Riva comes with 2 ASR clients:
 1. `riva_asr_client` for offline usage. Using this client, the server will wait until it receives the full audio file before transcribing it and sending it back to the client.
 2. `riva_streaming_asr_client` for online usage. Using this client, the server will start transcribing after it receives a sufficient amount of audio data, "streaming" intermediate transcripts as it goes on back to the client. By default, it is set to transcribe after every `100ms`, this can be changed using the `--chunk_duration_ms` command line flag.
+3. `riva_realtime_asr_client` for realtime (websocket) usage. This client establishes a persistent websocket connection to the server, allowing for bidirectional real-time communication. The server will start transcribing after it receives a sufficient amount of audio data and continuously stream intermediate transcripts back to the client as it processes the audio. By default, it is set to transcribe after every `100ms`, which can be changed using the `--chunk_duration_ms` command line flag.
 
 To use the clients, simply pass in a folder containing audio files or an individual audio file name with the `audio_file` flag:
 ```
@@ -82,6 +84,10 @@ or
 ```
 $ riva_asr_client --audio_file audio_folder
 ```
+or
+```
+$ riva_realtime_asr_client --audio_file individual_audio_file.wav
+```
 
 Note that only single-channel audio files in the `.wav` format are currently supported.
 
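The README restricts all three clients to single-channel `.wav` input. A caller could reject other files before opening a connection; the following is a minimal sketch of such a pre-flight check. `IsSingleChannelWav` is a hypothetical helper, not the WAV reader the Riva clients actually use, and it only assumes the standard RIFF/WAVE layout (a `fmt ` chunk whose channel count follows the 2-byte format code).

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Read an unsigned little-endian integer of `num_bytes` bytes starting at `p`.
static uint32_t ReadLittleEndian(const uint8_t* p, int num_bytes) {
  uint32_t value = 0;
  for (int i = num_bytes - 1; i >= 0; --i) value = (value << 8) | p[i];
  return value;
}

// Hypothetical pre-flight check (not part of the Riva clients): returns true
// only if `data` is a RIFF/WAVE file whose "fmt " chunk declares one channel.
bool IsSingleChannelWav(const std::vector<uint8_t>& data) {
  // 12-byte RIFF header: "RIFF", 4-byte size, "WAVE".
  if (data.size() < 12 || std::memcmp(data.data(), "RIFF", 4) != 0 ||
      std::memcmp(data.data() + 8, "WAVE", 4) != 0) {
    return false;
  }
  std::size_t pos = 12;
  // Walk the sub-chunks: 4-byte ID, 4-byte little-endian size, then payload.
  while (pos + 8 <= data.size()) {
    const uint8_t* chunk = data.data() + pos;
    const uint32_t chunk_size = ReadLittleEndian(chunk + 4, 4);
    if (std::memcmp(chunk, "fmt ", 4) == 0) {
      // fmt payload starts with the format code (2 bytes), then channels (2 bytes).
      if (pos + 12 > data.size()) return false;
      return ReadLittleEndian(chunk + 10, 2) == 1;
    }
    // Skip the payload; RIFF pads odd-sized chunks with one extra byte.
    pos += 8 + static_cast<std::size_t>(chunk_size) + (chunk_size & 1u);
  }
  return false;  // no "fmt " chunk found
}
```

Walking the chunk list instead of reading a fixed offset keeps the check correct for files that place `LIST` or other metadata chunks before `fmt `.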
WORKSPACE: 8 changes (8 additions, 0 deletions)
@@ -102,3 +102,11 @@ http_archive(
 strip_prefix = "platforms-1.0.0",
 sha256 = "852b71bfa15712cec124e4a57179b6bc95d59fdf5052945f5d550e072501a769",
 )
+
+http_archive(
+name = "websocketpp",
+urls = ["https://github.com/zaphoyd/websocketpp/archive/refs/tags/0.8.2.tar.gz"],
+sha256 = "6ce889d85ecdc2d8fa07408d6787e7352510750daa66b5ad44aacb47bea76755",
+strip_prefix = "websocketpp-0.8.2",
+build_file = "//third_party:BUILD.websocketpp"
+)
riva/clients/realtime/BUILD: 59 changes (59 additions, 0 deletions)
@@ -0,0 +1,59 @@
"""
Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
NVIDIA CORPORATION and its licensors retain all intellectual property
and proprietary rights in and to this software, related documentation
and any modifications thereto. Any use, reproduction, disclosure or
distribution of this software and related documentation without an express
license agreement from NVIDIA CORPORATION is strictly prohibited.
"""

package(
default_visibility = ["//visibility:public"],
)

cc_library(
name = "realtime_audio_client_lib",
srcs = [
"audio_chunks.cpp",
"base_client.cpp",
"realtime_client.cpp",
],
hdrs = [
"audio_chunks.h",
"base_client.h",
"realtime_client.h",
],
deps = [
"//riva/utils/wav:reader",
"//riva/utils/stats_builder:stats_builder_lib",
"@websocketpp//:websocketpp",
"@rapidjson//:rapidjson",
"@glog//:glog",
"@com_github_gflags_gflags//:gflags",
],
)

cc_binary(
name = "riva_realtime_asr_client",
srcs = ["riva_realtime_asr_client.cc"],
includes = ["."],
deps = [
":realtime_audio_client_lib",
"@websocketpp//:websocketpp",
"@rapidjson//:rapidjson",
"//riva/utils/stats_builder:stats_builder_lib",
"//riva/utils/wav:reader",
] + select({
"@platforms//cpu:aarch64": [
"@alsa_aarch64//:libasound"
],
"//conditions:default": [
"@alsa//:libasound"
],
}),
linkopts = [
"-lssl",
"-lcrypto",
"-lboost_system",
]
)
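The library above compiles an `audio_chunks.cpp`, and the README describes a `100ms` default for the realtime client that can be changed with `--chunk_duration_ms`. The following is a minimal sketch of how a buffer of audio could be cut into chunks of that duration. It assumes 16-bit mono PCM and uses hypothetical helpers `ChunkSizeBytes` and `SplitIntoChunks`; the client's actual chunking code may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: bytes in one chunk of 16-bit mono PCM that lasts
// chunk_duration_ms milliseconds at sample_rate_hz.
std::size_t ChunkSizeBytes(int sample_rate_hz, int chunk_duration_ms) {
  assert(sample_rate_hz > 0 && chunk_duration_ms > 0);
  const std::size_t samples_per_chunk =
      static_cast<std::size_t>(sample_rate_hz) *
      static_cast<std::size_t>(chunk_duration_ms) / 1000;
  return samples_per_chunk * sizeof(int16_t);  // 2 bytes per mono sample
}

// Hypothetical helper: split raw PCM bytes into consecutive chunks of
// chunk_duration_ms each; the last chunk may be shorter than the others.
std::vector<std::vector<uint8_t>> SplitIntoChunks(const std::vector<uint8_t>& pcm,
                                                  int sample_rate_hz,
                                                  int chunk_duration_ms) {
  std::vector<std::vector<uint8_t>> chunks;
  const std::size_t chunk_bytes = ChunkSizeBytes(sample_rate_hz, chunk_duration_ms);
  if (chunk_bytes == 0) return chunks;  // duration too short for one sample
  for (std::size_t offset = 0; offset < pcm.size(); offset += chunk_bytes) {
    const std::size_t end = std::min(offset + chunk_bytes, pcm.size());
    chunks.emplace_back(pcm.begin() + static_cast<std::ptrdiff_t>(offset),
                        pcm.begin() + static_cast<std::ptrdiff_t>(end));
  }
  return chunks;
}
```

For example, at 16 kHz a `100ms` chunk is 1600 samples × 2 bytes = 3200 bytes, so one second of audio (32000 bytes) yields 10 chunks. The sample rate and sample width are assumptions here; a real client would take them from the WAV header rather than hard-coding them.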