add documents

AlessiaYChen · AlessiaYChen · commit fbc1d4bf70c1 · 2025-12-19T13:57:55.000-08:00
diff --git a/services/net/auto-clipper/README.md b/services/net/auto-clipper/README.md
@@ -0,0 +1,24 @@
+# AutoClipper Service
+
+The AutoClipper service consumes clip requests from Kafka, normalizes audio, transcribes it with Azure Speech, and
+segments the transcript into clips using a boundary-aware LLM workflow boosted by station heuristics. Key concepts:
+
+- **Station profiles** (`Config/Stations/*.yml`) define language, sample rate, heuristic keywords, custom prompts, and
+  category mappings for weather/traffic/ads.
+- **Pipeline** (`ClipProcessingPipeline`) normalizes audio, transcribes via `AzureSpeechTranscriptionService`, and feeds
+  transcripts plus station config into `ClipSegmentationService`.
+- **Segmentation** uses Azure OpenAI to score story boundaries, merges in regex-based heuristics, snaps clips to transcript
+  sentences, and tags each clip with a category before `AutoClipperManager` creates content and uploads the media.
+
+## Development
+
+1. Update the station YAML files under `Config/Stations` (copy `CKNW.yml` as a starting point).
+2. Run `dotnet build services/net/auto-clipper/TNO.Services.AutoClipper.csproj` to verify that your changes compile.
+3. Use the harness (see `tools/auto-clipper-harness/README.md`) to manually validate segmentation on sample audio.
+
+## Configuration
+
+Important `Service__` environment variables:
+- `Service__AzureSpeechKey` / `Service__AzureSpeechRegion`
+- `Service__LlmApiUrl`, `Service__LlmApiKey`, `Service__LlmDeployment`, `Service__LlmApiVersion`
+- `Service__StationConfigPath` (optional override for the station YAML directory)
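
The variables listed above could be set for a local run roughly as follows. This is a minimal sketch, not the project's documented setup: every value is a placeholder, the URL is a made-up example, and it assumes the service picks these names up from the process environment.

```shell
# Placeholder values only; substitute the credentials of your own Azure resources.
export Service__AzureSpeechKey="your-speech-key"
export Service__AzureSpeechRegion="your-speech-region"

# LLM settings used by the segmentation step (all values are hypothetical).
export Service__LlmApiUrl="https://example.invalid/"
export Service__LlmApiKey="your-llm-key"
export Service__LlmDeployment="your-deployment-name"
export Service__LlmApiVersion="your-api-version"

# Optional: point the service at a different station YAML directory.
export Service__StationConfigPath="services/net/auto-clipper/Config/Stations"
```

Double underscores are legal in shell variable names, so the names can be exported exactly as the README lists them.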
diff --git a/tools/auto-clipper-harness/README.md b/tools/auto-clipper-harness/README.md
@@ -0,0 +1,23 @@
+# AutoClipper Harness
+
+The harness is a standalone console app that mirrors the AutoClipper pipeline for manual validation. It
+normalizes a local media file, runs Azure Speech transcription, feeds the transcript and station heuristics to the
+segmenter, and writes clips/transcripts/prompt debug files for inspection.
+
+## Usage
+
+```
+dotnet run --project tools/auto-clipper-harness -- <path-to-media> [language] [outputDir]
+```
+
+- Configure Azure keys and LLM settings via `.env` (see `.env.sample`).
+- Station profiles are loaded from `services/net/auto-clipper/Config/Stations` by default; override with
+  `AUTOCLIP_HARNESS_STATION_PATH` / `AUTOCLIP_HARNESS_STATION`.
+- Outputs: `clip_XX.*` media slices, `clip_XX.txt` transcripts, `transcript_full.txt`, and
+  `llm_prompt_debug.txt` (shows numbered transcript, heuristics, and the final prompt).
+
+## Notes
+
+- The harness shares the segmentation logic with the service, so any changes in `ClipSegmentationService`
+  should be validated here first.
+- Ensure `ffmpeg` is available on `PATH`; the harness shells out to `ffmpeg` to produce media clips.
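
Because the harness depends on `ffmpeg` being on `PATH`, a small wrapper can check for it before invoking `dotnet run`. The sketch below is illustrative only: the helper names `require_tool` and `run_harness` are hypothetical and not part of the repository, and it passes all three arguments explicitly even though the README marks `language` and `outputDir` as optional.

```shell
# Sketch only: require_tool and run_harness are hypothetical helper names.

# Return non-zero, with a message on stderr, when a required tool is not on PATH.
require_tool() {
  if ! command -v "$1" >/dev/null 2>&1; then
    echo "missing required tool: $1" >&2
    return 1
  fi
}

# Check prerequisites, then invoke the harness with the documented argument order:
# <path-to-media> [language] [outputDir]
run_harness() {
  local media="$1" language="$2" output_dir="$3"
  require_tool ffmpeg || return 1
  require_tool dotnet || return 1
  dotnet run --project tools/auto-clipper-harness -- "$media" "$language" "$output_dir"
}
```

A call might look like `run_harness sample.mp3 en-US ./harness-out`, where the file name, the output directory, and the assumption that the language argument takes a locale such as `en-US` are all illustrative.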