Commit 32a4ad1

ChrisJar, kheiss-uwzoo, and greptile-apps[bot] authored
Add docs for segment_audio param (#1818)
Co-authored-by: Kurt Heiss <kheiss@nvidia.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
1 parent 588c472 commit 32a4ad1

4 files changed, +19 -3 lines changed

client/src/nv_ingest_client/client/interface.py

Lines changed: 3 additions & 0 deletions
@@ -1070,6 +1070,9 @@ def extract(self, **kwargs: Any) -> "Ingestor":
   - extract_charts: bool, extract charts (default True)
   - extract_infographics: bool, extract infographics (default False)
   - extract_page_as_image: bool, extract full page as image (default False)
+  - extract_audio_params: dict, audio extraction options such as
+    endpoint settings and `segment_audio` for sentence-like ASR
+    segmentation when using a hosted Parakeet service
   - table_output_format: str, format for table output (default "markdown")
   - auto_dedup: bool, auto-enable bbox deduplication when extracting both
     structured elements and images (default True). Set to False to disable.

docs/docs/extraction/audio.md

Lines changed: 5 additions & 1 deletion
@@ -82,10 +82,13 @@ Use the following procedure to run the NIM locally.
     .extract(
         document_type="wav", # Ingestor should detect type automatically in most cases
         extract_method="audio",
+        extract_audio_params={
+            "segment_audio": True,
+        },
     )
 )
 ```
-
+To generate one extracted element for each sentence-like ASR segment, include `extract_audio_params={"segment_audio": True}` when calling `.extract(...)`. This option applies when audio extraction runs with a Parakeet NIM (either locally through Docker or remotely via NVCF) but has no effect when using the local Hugging Face Parakeet model.

 !!! tip

@@ -117,6 +120,7 @@ Instead of running the pipeline locally, you can use NVCF to perform inference b
             "auth_token": "<API key>",
             "function_id": "<function ID>",
             "use_ssl": True,
+            "segment_audio": True,
         },
     )
 )

docs/docs/extraction/nv-ingest-python-api.md

Lines changed: 3 additions & 2 deletions
@@ -549,11 +549,13 @@ ingestor = Ingestor().files("audio_file.mp3")

 ingestor = ingestor.extract(
     document_type="mp3",
+    extract_method="audio",
     extract_text=True,
     extract_tables=False,
     extract_charts=False,
     extract_images=False,
     extract_infographics=False,
+    extract_audio_params={"segment_audio": True},
 ).split(
     tokenizer="meta-llama/Llama-3.2-1B",
     chunk_size=150,
@@ -563,8 +565,7 @@ ingestor = ingestor.extract(

 results = ingestor.ingest()
 ```
-
-
+Set extract_audio_params={"segment_audio": True} to output sentence-like audio segments as distinct extracted elements. This setting applies only when audio extraction runs through a hosted Parakeet endpoint—such as the Parakeet ASR NIM or NVCF—and has no effect when using the local Hugging Face Parakeet model.

 ## Related Topics


docs/docs/extraction/python-api-reference.md

Lines changed: 8 additions & 0 deletions
@@ -641,11 +641,13 @@ ingestor = Ingestor().files("audio_file.mp3")

 ingestor = ingestor.extract(
     document_type="mp3",
+    extract_method="audio",
     extract_text=True,
     extract_tables=False,
     extract_charts=False,
     extract_images=False,
     extract_infographics=False,
+    extract_audio_params={"segment_audio": True},
 ).split(
     tokenizer="meta-llama/Llama-3.2-1B",
     chunk_size=150,
@@ -656,6 +658,12 @@ ingestor = ingestor.extract(
 results = ingestor.ingest()
 ```

+Set `extract_audio_params={"segment_audio": True}` to emit sentence-like
+audio segments as separate extracted elements. This option only takes effect
+when audio extraction is performed through a hosted Parakeet endpoint--such as the
+Parakeet ASR NIM or NVCF--and does not affect behavior when using the local Hugging
+Face Parakeet model.
+


 ## Related Topics
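
Note (not part of the commit): the diffs above add `segment_audio` both on its own and alongside NVCF endpoint settings in the same `extract_audio_params` dict. The sketch below shows one way to assemble those keyword arguments so the flag and the endpoint settings stay together. `build_audio_extract_kwargs` is a hypothetical helper invented for illustration and does not exist in `nv_ingest_client`. Only the keys that appear in the diffs (`document_type`, `extract_method`, `extract_audio_params`, `segment_audio`, `auth_token`, `function_id`, `use_ssl`) are used.

```python
from typing import Any, Dict, Optional


def build_audio_extract_kwargs(
    document_type: str,
    segment_audio: bool = True,
    endpoint_params: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: assemble keyword arguments for Ingestor.extract()."""
    # Copy the endpoint settings so the caller's dict is not mutated.
    audio_params: Dict[str, Any] = dict(endpoint_params or {})
    # Per the docs in this commit, the flag only matters for a hosted
    # Parakeet endpoint; it is still passed explicitly here.
    audio_params["segment_audio"] = segment_audio
    return {
        "document_type": document_type,
        "extract_method": "audio",
        "extract_audio_params": audio_params,
    }


# NVCF-style settings; the placeholders are copied from the audio.md diff.
nvcf_kwargs = build_audio_extract_kwargs(
    "mp3",
    endpoint_params={
        "auth_token": "<API key>",
        "function_id": "<function ID>",
        "use_ssl": True,
    },
)

# Intended use with the client shown in the diffs (not executed here):
#   ingestor = Ingestor().files("audio_file.mp3").extract(**nvcf_kwargs)
```

Copying `endpoint_params` before adding the flag keeps a shared settings dict reusable across calls that do and do not want segmentation.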
