| 1 | +# Analyze and Label Link |
| 2 | + |
| 3 | +## Overview |
| 4 | + |
| 5 | +The `analyze_and_label` link is a vCon server component that analyzes dialog content and generates labels for categorization. It uses OpenAI's language models to process several dialog formats (transcripts, messages, chats, emails) and applies the extracted labels as tags on the vCon. |
| 6 | + |
| 7 | +## How It Works |
| 8 | + |
| 9 | +1. The link retrieves a vCon from Redis storage |
| 10 | +2. For each dialog in the vCon, it checks if a source analysis (typically of type "transcript") is present |
| 11 | +3. It extracts the text content from the source analysis (from the specified location in the configuration) |
| 12 | +4. It sends the text to OpenAI's API with a customizable prompt |
| 13 | +5. It processes the API response to extract labels |
| 14 | +6. It adds the analysis as a new analysis object to the vCon |
| 15 | +7. It applies each extracted label as a tag to the vCon |
| 16 | + |
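Steps 2 through 7 above can be sketched on a plain-dict vCon. This is a minimal illustration, not the link's actual code: the Redis fetch in step 1 is omitted, `ask_model` is a hypothetical stand-in for the OpenAI call, and the top-level `tags` list and the shape of the analysis dict are assumptions made for the example.

```python
import json
from typing import Any, Callable


def get_by_path(obj: Any, dotted_path: str) -> Any:
    """Follow a dotted path such as 'body.paragraphs.transcript' through nested dicts."""
    current = obj
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def label_vcon(
    vcon: dict,
    ask_model: Callable[[str], str],  # hypothetical stand-in for the OpenAI request
    source_type: str = "transcript",
    text_location: str = "body.paragraphs.transcript",
    analysis_type: str = "labeled_analysis",
) -> dict:
    """Apply steps 2-7 to a vCon represented as a plain dict."""
    new_analyses = []
    for index, _dialog in enumerate(vcon.get("dialog", [])):
        # Step 2: look for a source analysis attached to this dialog.
        source = next(
            (a for a in vcon.get("analysis", [])
             if a.get("dialog") == index and a.get("type") == source_type),
            None,
        )
        if source is None:
            continue
        # Step 3: pull the text from the configured location.
        text = get_by_path(source, text_location)
        if not text:
            continue
        # Steps 4-5: query the model and read the labels from its JSON reply.
        labels = json.loads(ask_model(text)).get("labels", [])
        # Step 6: record the result as a new analysis object.
        new_analyses.append(
            {"type": analysis_type, "dialog": index, "body": {"labels": labels}}
        )
        # Step 7: apply each label as a tag (assumed here to be a simple list).
        tags = vcon.setdefault("tags", [])
        for label in labels:
            if label not in tags:
                tags.append(label)
    vcon.setdefault("analysis", []).extend(new_analyses)
    return vcon
```

Passing the model call in as a function keeps the sketch runnable without network access; a lambda returning a fixed JSON string is enough to exercise it.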
| 17 | +## Supported Dialog Formats |
| 18 | + |
| 19 | +The link is designed to handle various text formats that might appear in dialogs, including: |
| 20 | + |
| 21 | +- **Standard Transcripts**: Plain text transcripts of conversations |
| 22 | +- **Email Format**: Text with headers, subject, body, etc. |
| 23 | +- **Chat Format**: Text with timestamps and speaker identification |
| 24 | +- **Message Format**: Text with headers and body |
| 25 | + |
| 26 | +The link handles all of these formats and extracts labels regardless of which one the source text uses. |
| 27 | + |
| 28 | +## Configuration Options |
| 29 | + |
| 30 | +The link accepts the following configuration options: |
| 31 | + |
| 32 | +| Option | Description | Default | |
| 33 | +|--------|-------------|--------| |
| 34 | +| `prompt` | The prompt sent to OpenAI for analysis | "Analyze this transcript and provide a list of relevant labels for categorization..." | |
| 35 | +| `analysis_type` | The type assigned to the analysis output | "labeled_analysis" | |
| 36 | +| `model` | The OpenAI model to use | "gpt-4-turbo" | |
| 37 | +| `sampling_rate` | Rate at which to run the analysis (1 = 100%, 0.5 = 50%, etc.) | 1 | |
| 38 | +| `temperature` | The temperature parameter for the OpenAI API | 0.2 | |
| 39 | +| `source.analysis_type` | The type of analysis to use as source | "transcript" | |
| 40 | +| `source.text_location` | The JSON path to the text within the source analysis | "body.paragraphs.transcript" | |
| 41 | +| `response_format` | Format specification for the OpenAI API response | `{"type": "json_object"}` | |
| 42 | +| `OPENAI_API_KEY` | The OpenAI API key (required but not defined in defaults) | None | |
| 43 | + |
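To show how overrides might combine with the defaults in the table, here is a small sketch. It assumes the dotted names `source.analysis_type` and `source.text_location` denote keys of a nested `source` dict, and `merge_options` and `should_run` are hypothetical helpers, not functions exported by the link. The default prompt is left truncated exactly as it appears in the table.

```python
import copy
import random

# Defaults copied from the table above; the prompt is truncated there as well.
DEFAULT_OPTIONS = {
    "prompt": "Analyze this transcript and provide a list of relevant labels for categorization...",
    "analysis_type": "labeled_analysis",
    "model": "gpt-4-turbo",
    "sampling_rate": 1,
    "temperature": 0.2,
    "source": {  # assumed nesting for the dotted option names
        "analysis_type": "transcript",
        "text_location": "body.paragraphs.transcript",
    },
    "response_format": {"type": "json_object"},
}


def merge_options(overrides: dict) -> dict:
    """Return the defaults with `overrides` applied; nested dicts are merged one level deep."""
    merged = copy.deepcopy(DEFAULT_OPTIONS)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key].update(value)
        else:
            merged[key] = value
    return merged


def should_run(sampling_rate: float, rng: random.Random) -> bool:
    """Return True for roughly `sampling_rate` of calls (1 always runs, 0 never runs)."""
    return rng.random() < sampling_rate
```

With this merge, overriding only `source.analysis_type` leaves `source.text_location` at its default, and a `sampling_rate` of 0.5 lets about half of the vCons through.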
| 44 | +## Usage Example |
| 45 | + |
| 46 | +```python |
| 47 | +from server.links.analyze_and_label import run |
| 48 | + |
| 49 | +# Run the link (OPENAI_API_KEY must be supplied in opts) |
| 50 | +run( |
| 51 | +    vcon_uuid="your-vcon-uuid", |
| 52 | +    link_name="analyze_and_label", |
| 53 | +    opts={ |
| 54 | +        "OPENAI_API_KEY": "your-openai-api-key", |
| 55 | +        # Optionally override other defaults |
| 56 | +        "prompt": "Identify key topics, sentiments, and issues in this conversation. Return your response as a JSON object with a single key 'labels' containing an array of strings.", |
| 57 | +        "model": "gpt-3.5-turbo" |
| 58 | +    } |
| 59 | +) |
| 60 | +``` |
| 61 | + |
| 62 | +## Customizing Label Generation |
| 63 | + |
| 64 | +To customize label generation, change the `prompt` option. The prompt should instruct the model to return labels in a specific format: a JSON object with a "labels" key containing an array of strings. |
| 65 | + |
| 66 | +Example specialized prompts: |
| 67 | + |
| 68 | +- **Support Issues**: "Analyze this transcript and identify the specific support issues mentioned. Return your response as a JSON object with a single key 'labels' containing an array of issue categories." |
| 69 | +- **Sentiment Analysis**: "Analyze this conversation and identify the customer's sentiments and emotional states. Return your response as a JSON object with a single key 'labels' containing an array of sentiment descriptors." |
| 70 | +- **Product Mentions**: "Identify all products or services mentioned in this transcript. Return your response as a JSON object with a single key 'labels' containing an array of product names." |
| 71 | + |
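Whatever prompt is used, the reply has to be a JSON object with a `labels` array. A defensive parser for that shape could look like the sketch below; `parse_labels` is a hypothetical helper, and the choices to trim whitespace, drop duplicates, and skip non-string entries are assumptions rather than documented behavior of the link.

```python
import json
from typing import List


def parse_labels(response_text: str) -> List[str]:
    """Extract the 'labels' array from a model reply, returning [] on any mismatch."""
    try:
        payload = json.loads(response_text)
    except json.JSONDecodeError:
        return []
    if not isinstance(payload, dict):
        return []
    labels = payload.get("labels", [])
    if not isinstance(labels, list):
        return []
    cleaned: List[str] = []
    for label in labels:
        # Keep only non-empty strings, trimmed, in first-seen order.
        if isinstance(label, str) and label.strip() and label.strip() not in cleaned:
            cleaned.append(label.strip())
    return cleaned
```

For example, `parse_labels('{"labels": ["billing", " refund "]}')` yields `["billing", "refund"]`, while a reply that is not valid JSON yields an empty list.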
| 72 | +## Error Handling |
| 73 | + |
| 74 | +The link handles errors in the following ways: |
| 75 | + |
| 76 | +- Exponential backoff retry mechanism for API calls |
| 77 | +- JSON parsing error handling |
| 78 | +- Logging of errors and performance metrics |
| 79 | + |
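The retry behavior can be illustrated with a generic exponential backoff wrapper. This is a sketch of the general technique only: the attempt count, base delay, and the decision to retry on any exception are assumptions, and `call_with_backoff` is a hypothetical name, not the link's own retry code.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    func: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `func`, doubling the wait after each failure: base_delay, 2x, 4x, ..."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            # Out of attempts: let the last error propagate to the caller.
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    raise ValueError("max_attempts must be at least 1")
```

The `sleep` parameter is injectable so the wrapper can be tested without real delays; production code would usually also restrict the retried exception types to transient API errors.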
| 80 | +## Testing |
| 81 | + |
| 82 | +The link ships with a test suite. To run the tests against the real OpenAI API (optional): |
| 83 | + |
| 84 | +```bash |
| 85 | +# Set environment variables |
| 86 | +export OPENAI_API_KEY="your-api-key" |
| 87 | +export RUN_OPENAI_ANALYZE_LABEL_TESTS=1 |
| 88 | + |
| 89 | +# Run the tests |
| 90 | +pytest server/links/analyze_and_label/tests/test_analyze_and_label.py |
| 91 | +``` |
| 92 | + |
| 93 | +If `RUN_OPENAI_ANALYZE_LABEL_TESTS=1` is not set, the tests run against mocked API responses. |
| 94 | + |
| 95 | +## Metrics and Monitoring |
| 96 | + |
| 97 | +The link emits several metrics for monitoring: |
| 98 | + |
| 99 | +- `conserver.link.openai.labels_added`: Number of labels added per run |
| 100 | +- `conserver.link.openai.analysis_time`: Time taken for analysis |
| 101 | +- `conserver.link.openai.json_parse_failures`: Count of JSON parsing failures |
| 102 | +- `conserver.link.openai.analysis_failures`: Count of overall analysis failures |
| 103 | + |
| 104 | +## Integration with vCon Structure |
| 105 | + |
| 106 | +The link integrates with the vCon structure in two ways: |
| 107 | + |
| 108 | +1. It adds a new analysis object with the `labeled_analysis` type (or the configured type) |
| 109 | +2. It adds tags to the vCon based on the extracted labels |
| 110 | + |
| 111 | +This allows for both structured access to the full analysis and quick filtering/categorization using the applied tags. |