
Commit fbc1d4b: add documents (parent 422d52a)

2 files changed: 47 additions, 0 deletions


# AutoClipper Service

The AutoClipper service consumes clip requests from Kafka, normalizes the audio, transcribes it with Azure Speech, and segments the transcript into clips using a boundary-aware LLM workflow supplemented by station-specific heuristics. Key concepts:

- **Station profiles** (`Config/Stations/*.yml`) define the language, sample rate, heuristic keywords, custom prompts, and category mappings for weather, traffic, and ads.
- **Pipeline** (`ClipProcessingPipeline`) normalizes audio, transcribes it via `AzureSpeechTranscriptionService`, and passes the transcript plus the station config to `ClipSegmentationService`.
- **Segmentation** uses Azure OpenAI to score story boundaries, merges in regex-based heuristics, snaps clip edges to transcript sentence boundaries, and tags each clip with a category; `AutoClipperManager` then creates the content and uploads the media.
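
A minimal Python sketch of the segmentation steps described above, to make the data flow concrete. It is not the `ClipSegmentationService` code: the `Sentence` type, the score threshold and heuristic bonus, the keyword-based category matching, and the `general` fallback category are all hypothetical stand-ins for what Azure OpenAI and the station profile actually supply.

```python
import re
from dataclasses import dataclass


@dataclass
class Sentence:
    index: int    # position in the numbered transcript (0-based, in order)
    start: float  # start time in seconds
    end: float    # end time in seconds
    text: str


def merge_boundaries(llm_scores: dict[int, float], sentences: list[Sentence],
                     heuristic_patterns: list[str], threshold: float = 0.6,
                     heuristic_bonus: float = 0.3) -> list[int]:
    """Return the indices of sentences that start a new story.

    llm_scores maps sentence index -> boundary score in [0, 1] (hypothetical
    format). A sentence matching any station regex gets a fixed bonus.
    """
    boundaries = {0}  # the first sentence always opens a clip
    for s in sentences:
        score = llm_scores.get(s.index, 0.0)
        if any(re.search(p, s.text, re.IGNORECASE) for p in heuristic_patterns):
            score += heuristic_bonus
        if score >= threshold:
            boundaries.add(s.index)
    return sorted(boundaries)


def build_clips(sentences: list[Sentence], boundaries: list[int],
                category_keywords: dict[str, list[str]]) -> list[dict]:
    """Cut clips at sentence edges and tag each with the first matching category."""
    if not sentences:
        return []
    edges = boundaries + [len(sentences)]
    clips = []
    for first, stop in zip(edges, edges[1:]):
        chunk = sentences[first:stop]
        text = " ".join(s.text for s in chunk)
        lowered = text.lower()
        # Naive substring match against the station's category keywords.
        category = next(
            (name for name, words in category_keywords.items()
             if any(w.lower() in lowered for w in words)),
            "general",  # hypothetical fallback category
        )
        # Clip times come from sentence timings, so an edge never splits a sentence.
        clips.append({"start": chunk[0].start, "end": chunk[-1].end,
                      "category": category, "text": text})
    return clips
```

With these numbers a heuristic match on its own (0.3) cannot create a boundary, but it can push a borderline LLM score over the threshold; whether the real service weighs the two signals this way is an assumption.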

## Development

1. Update the station YAML files under `Config/Stations` (copy `CKNW.yml` as a starting point).
2. Run `dotnet build services/net/auto-clipper/TNO.Services.AutoClipper.csproj` to verify that the changes compile.
3. Use the harness (see `tools/auto-clipper-harness/README.md`) to validate segmentation manually on sample audio.

## Configuration

Important `Service__` environment variables:

- `Service__AzureSpeechKey` / `Service__AzureSpeechRegion`
- `Service__LlmApiUrl`, `Service__LlmApiKey`, `Service__LlmDeployment`, `Service__LlmApiVersion`
- `Service__StationConfigPath` (optional override for the station YAML directory)

# AutoClipper Harness

The harness is a standalone console app that mirrors the AutoClipper pipeline for manual validation. It normalizes a local media file, transcribes it with Azure Speech, feeds the transcript and station heuristics to the segmenter, and writes clips, transcripts, and prompt-debug files for inspection.
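
The README does not spell out what normalization involves. As an illustration only, one common approach is to convert the input to mono 16-bit PCM WAV at the station profile's sample rate with `ffmpeg`, which the harness already depends on. The helper names, the 16 kHz default, and the `_normalized.wav` suffix below are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def normalize_command(src: str, dst: str, sample_rate: int = 16000) -> list[str]:
    """Build an ffmpeg command that re-encodes src as mono 16-bit PCM WAV."""
    return [
        "ffmpeg", "-y",           # overwrite dst if it already exists
        "-i", src,
        "-vn",                    # drop any video stream
        "-ac", "1",               # downmix to mono
        "-ar", str(sample_rate),  # resample to the station's rate
        "-c:a", "pcm_s16le",      # 16-bit little-endian PCM
        dst,
    ]


def normalize(src: str, out_dir: str, sample_rate: int = 16000) -> Path:
    """Run the normalization and return the path of the WAV it wrote."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    dst = Path(out_dir) / (Path(src).stem + "_normalized.wav")
    subprocess.run(normalize_command(src, str(dst), sample_rate), check=True)
    return dst
```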

## Usage

```
dotnet run --project tools/auto-clipper-harness -- <path-to-media> [language] [outputDir]
```

- Configure the Azure keys and LLM settings via `.env` (see `.env.sample`).
- Station profiles are loaded from `services/net/auto-clipper/Config/Stations` by default; override this with `AUTOCLIP_HARNESS_STATION_PATH` / `AUTOCLIP_HARNESS_STATION`.
- Outputs: `clip_XX.*` media slices, `clip_XX.txt` transcripts, `transcript_full.txt`, and `llm_prompt_debug.txt` (which shows the numbered transcript, the heuristics, and the final prompt).
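
When reviewing a run, it helps to line up each media slice with its transcript. A small Python sketch, assuming the `XX` in `clip_XX.*` is a clip number and that a clip's media file and `.txt` transcript share the same `clip_XX` stem; `collect_clips` and `print_summary` are hypothetical helpers, not part of the harness.

```python
import re
from pathlib import Path

# Matches names such as clip_01.mp3 or clip_01.txt (numbering format assumed).
CLIP_RE = re.compile(r"^clip_(\d+)\.(.+)$")


def collect_clips(output_dir: str) -> dict[int, dict[str, Path]]:
    """Group harness outputs by clip number: {n: {"media": path, "transcript": path}}."""
    clips: dict[int, dict[str, Path]] = {}
    for path in sorted(Path(output_dir).iterdir()):
        match = CLIP_RE.match(path.name)
        if not match:
            continue  # skips transcript_full.txt, llm_prompt_debug.txt, etc.
        number, ext = int(match.group(1)), match.group(2)
        kind = "transcript" if ext == "txt" else "media"
        clips.setdefault(number, {})[kind] = path
    return clips


def print_summary(output_dir: str) -> None:
    """Print one line per clip: media file name plus a transcript preview."""
    for number, files in sorted(collect_clips(output_dir).items()):
        media = files.get("media")
        transcript = files.get("transcript")
        preview = transcript.read_text(encoding="utf-8")[:80] if transcript else "(no transcript)"
        print(f"clip {number:02d}: {media.name if media else '(no media)'} | {preview}")
```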

## Notes

- The harness shares the segmentation logic with the service, so validate any change to `ClipSegmentationService` here first.
- Ensure `ffmpeg` is available on `PATH`; the harness shells out to `ffmpeg` to produce the media clips.
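
A rough Python sketch of what shelling out to `ffmpeg` for each clip can look like: confirm the binary is on `PATH`, then cut time spans from the source. The argument layout (input-side `-ss` seek plus an output `-t` duration, with the codec inferred from the output extension), the helper names, and the `clip_XX` numbering are assumptions; the harness's actual command may differ.

```python
import shutil
import subprocess
from pathlib import Path


def clip_command(src: str, start: float, end: float, dst: str) -> list[str]:
    """Build an ffmpeg command that writes the [start, end) seconds of src to dst."""
    if end <= start:
        raise ValueError("clip end must be after its start")
    return [
        "ffmpeg", "-y",
        "-ss", f"{start:.3f}",       # input-side seek to the clip start
        "-i", src,
        "-t", f"{end - start:.3f}",  # keep only the clip's duration
        dst,                         # encoder chosen from dst's extension
    ]


def cut_clips(src: str, spans: list[tuple[float, float]], out_dir: str,
              ext: str = ".mp3") -> list[Path]:
    """Cut each (start, end) span into clip_01<ext>, clip_02<ext>, ..."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg is not on PATH")
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, (start, end) in enumerate(spans, start=1):
        dst = out / f"clip_{i:02d}{ext}"
        subprocess.run(clip_command(src, start, end, str(dst)),
                       check=True, capture_output=True)
        paths.append(dst)
    return paths
```

Seeking with `-ss` before `-i` is fast, and because the clip is re-encoded rather than stream-copied with `-c copy`, the cut is not forced onto a keyframe.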
