Commit be4d7ac

UPDATE
1 parent fc79a00 commit be4d7ac

16 files changed (+662, -107 lines changed)


config/README.md

Lines changed: 92 additions & 0 deletions
@@ -0,0 +1,92 @@
1+
# Custom YAML Schema Configuration
2+
3+
The `schemas/` subdirectory contains YAML schema files for custom data extraction. Each YAML file defines both the extraction schema and the system prompt in one place.
4+
5+
## Schema Structure
6+
7+
Each YAML schema file must contain the following fields:
8+
9+
```yaml
10+
# Schema metadata
11+
name: "Schema Name"
12+
description: "Brief description of what this schema extracts"
13+
14+
# System prompt for the LLM
15+
system_prompt: |
16+
  Detailed instructions for the LLM on how to extract data.
17+
  This can be multiple lines and should provide clear guidance
18+
  on what information to look for and how to handle edge cases.
19+
20+
# JSON Schema definition
21+
schema:
22+
  type: object
23+
  properties:
24+
    field_name:
25+
      type: string
26+
      description: "Description of this field"
27+
    # Add more properties as needed
28+
  required: ["field_name"]
29+
  additionalProperties: false
30+
```
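
As a rough illustration of the structure above, the sketch below represents a parsed schema file as a plain Python dict and checks that every name listed under `required` is also declared under `properties`. The helper `undeclared_required` and the `example_config` dict are hypothetical and not part of this project; how the tool itself loads these files is not shown in this README.

```python
# Hypothetical parsed form of the example schema file above (assumed to be
# what a YAML loader would return for it).
example_config = {
    "name": "Schema Name",
    "description": "Brief description of what this schema extracts",
    "system_prompt": "Detailed instructions for the LLM on how to extract data.\n",
    "schema": {
        "type": "object",
        "properties": {
            "field_name": {
                "type": "string",
                "description": "Description of this field",
            },
        },
        "required": ["field_name"],
        "additionalProperties": False,
    },
}


def undeclared_required(schema):
    """Return names listed in `required` that are missing from `properties`."""
    declared = schema.get("properties", {})
    return [name for name in schema.get("required", []) if name not in declared]


# An empty list means `required` and `properties` agree.
print(undeclared_required(example_config["schema"]))
```

A check like this can catch a renamed property that was left behind in the `required` list.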
31+
32+
## Available Commands
33+
34+
### List Available Custom Schemas
35+
```bash
36+
structured-output list-schemas
37+
```
38+
39+
### Extract Using Custom YAML Schema
40+
```bash
41+
structured-output extract-custom SCHEMA_NAME --text "Your text here"
42+
```
43+
44+
Or from a file:
45+
```bash
46+
structured-output extract-custom SCHEMA_NAME --input-file input.txt
47+
```
48+
49+
### Save Results
50+
```bash
51+
structured-output extract-custom SCHEMA_NAME --input-file input.txt --output results.json
52+
```
53+
54+
### List Predefined Templates
55+
```bash
56+
structured-output list-templates
57+
```
58+
59+
### Extract Using Predefined Template
60+
```bash
61+
structured-output extract job --text "Your job description here"
62+
structured-output extract recipe --text "Your recipe text here"
63+
```
64+
65+
## Example Usage
66+
67+
1. Create a new YAML schema file in `config/schemas/` (e.g., `my_schema.yaml`)
68+
2. Define your schema structure following the format above
69+
3. List available schemas: `structured-output list-schemas`
70+
4. Use your schema: `structured-output extract-custom my_schema --text "Sample text"`
71+
72+
## Pre-built Custom Schemas
73+
74+
The following example schemas are included:
75+
76+
- `news_article.yaml` - Extract structured information from news articles
77+
- `product_review.yaml` - Extract structured information from product reviews
78+
- `customer_support.yaml` - Extract structured information from support tickets
79+
80+
## Schema Validation
81+
82+
The system automatically validates:
83+
- Required fields (`name`, `description`, `system_prompt`, `schema`)
84+
- JSON schema structure (must be `type: object` with `properties`)
85+
- YAML syntax correctness
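
The first two checks in the list above could look roughly like the following sketch. It operates on an already-parsed config (a plain dict), so YAML syntax checking is left to whichever YAML loader is used. The function name `validate_schema_config` and the error messages are assumptions for illustration, not the project's actual validator.

```python
REQUIRED_FIELDS = ("name", "description", "system_prompt", "schema")


def validate_schema_config(config):
    """Return a list of problems found in a parsed schema config (hypothetical helper)."""
    if not isinstance(config, dict):
        return ["top-level YAML value must be a mapping"]

    errors = []
    # Check 1: all required top-level fields are present.
    for field in REQUIRED_FIELDS:
        if field not in config:
            errors.append(f"missing required field: {field}")

    # Check 2: the schema must be `type: object` with a `properties` mapping.
    schema = config.get("schema")
    if schema is not None:
        if not isinstance(schema, dict):
            errors.append("schema must be a mapping")
        else:
            if schema.get("type") != "object":
                errors.append("schema.type must be 'object'")
            if not isinstance(schema.get("properties"), dict):
                errors.append("schema.properties must be a mapping")
    return errors
```

Returning a list of messages rather than raising on the first problem lets a caller report every issue in a schema file at once.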
86+
87+
## Configuration Options
88+
89+
- `--config-dir`: Specify a different directory for YAML schemas (default: `config/schemas`)
90+
- `--pretty`: Pretty print JSON output
91+
- `--no-save`: Don't save results to file, only print to stdout
92+
- `--output`: Specify custom output file path
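
For scripted use, the command lines documented above can be assembled in Python before being handed to `subprocess.run`. This is a minimal sketch: the helper `build_extract_custom_cmd` is hypothetical, it only uses flags that appear in this README, and it assumes that `--text` and `--input-file` are alternatives (the README does not state this explicitly).

```python
import shlex


def build_extract_custom_cmd(schema_name, text=None, input_file=None,
                             output=None, pretty=False):
    """Build the argv list for `structured-output extract-custom` (hypothetical helper)."""
    # Assumption: exactly one input source is given.
    if (text is None) == (input_file is None):
        raise ValueError("pass exactly one of text or input_file")

    cmd = ["structured-output", "extract-custom", schema_name]
    if text is not None:
        cmd += ["--text", text]
    else:
        cmd += ["--input-file", input_file]
    if output is not None:
        cmd += ["--output", output]
    if pretty:
        cmd.append("--pretty")
    return cmd


cmd = build_extract_custom_cmd("my_schema", input_file="input.txt",
                               output="results.json")
# Show the equivalent shell command line.
print(shlex.join(cmd))
```

Building the command as a list avoids shell-quoting problems when the `--text` value contains spaces or quotes.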
config/schemas/customer_support.yaml

Lines changed: 62 additions & 0 deletions
@@ -0,0 +1,62 @@
1+
# Customer Support Ticket Extraction Schema
2+
name: "Customer Support Ticket"
3+
description: "Extract structured information from customer support tickets or emails"
4+
5+
# System prompt for extraction
6+
system_prompt: |
7+
  Extract structured information from the following customer support ticket or email.
8+
  Focus on identifying:
9+
  - Customer information and contact details
10+
  - Issue category and priority level
11+
  - Product or service affected
12+
  - Detailed problem description
13+
  - Resolution status and actions taken
14+
15+
  If information is not explicitly mentioned, leave fields as null or empty.
16+
17+
# JSON Schema definition
18+
schema:
19+
  type: object
20+
  properties:
21+
    ticket_id:
22+
      type: ["string", "null"]
23+
      description: "Support ticket ID if mentioned"
24+
    customer_name:
25+
      type: ["string", "null"]
26+
      description: "Customer's name"
27+
    customer_email:
28+
      type: ["string", "null"]
29+
      description: "Customer's email address"
30+
    issue_category:
31+
      type: ["string", "null"]
32+
      description: "Category of the issue (billing, technical, account, etc.)"
33+
    priority_level:
34+
      type: ["string", "null"]
35+
      description: "Priority level (low, medium, high, urgent)"
36+
    product_service:
37+
      type: ["string", "null"]
38+
      description: "Product or service the issue relates to"
39+
    issue_summary:
40+
      type: string
41+
      description: "Brief summary of the issue"
42+
    detailed_description:
43+
      type: string
44+
      description: "Detailed description of the problem"
45+
    steps_to_reproduce:
46+
      type: array
47+
      items:
48+
        type: string
49+
      description: "Steps to reproduce the issue if mentioned"
50+
    resolution_status:
51+
      type: ["string", "null"]
52+
      description: "Current status (open, in-progress, resolved, closed)"
53+
    actions_taken:
54+
      type: array
55+
      items:
56+
        type: string
57+
      description: "Actions taken to resolve the issue"
58+
    escalation_needed:
59+
      type: ["boolean", "null"]
60+
      description: "Whether the issue needs escalation"
61+
  required: ["ticket_id", "customer_name", "customer_email", "issue_category", "priority_level", "product_service", "issue_summary", "detailed_description", "steps_to_reproduce", "resolution_status", "actions_taken", "escalation_needed"]
62+
  additionalProperties: false

config/schemas/news_article.yaml

Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
1+
# News Article Extraction Schema
2+
name: "News Article"
3+
description: "Extract structured information from news articles"
4+
5+
# System prompt for extraction
6+
system_prompt: |
7+
  Extract structured information from the following news article.
8+
  Focus on identifying:
9+
  - Main headline and brief summary
10+
  - Publication details (date and author if mentioned)
11+
  - Geographic location mentioned in the news
12+
  - Key people and organizations mentioned
13+
  - News category and sentiment
14+
15+
  If information is not explicitly mentioned, leave the field as null.
16+
17+
# JSON Schema definition
18+
schema:
19+
  type: object
20+
  properties:
21+
    headline:
22+
      type: string
23+
      description: "Main headline of the news article"
24+
    summary:
25+
      type: string
26+
      description: "Brief summary of the article"
27+
    publication_date:
28+
      type: ["string", "null"]
29+
      description: "Publication date if mentioned"
30+
    author:
31+
      type: ["string", "null"]
32+
      description: "Author name if mentioned"
33+
    location:
34+
      type: ["string", "null"]
35+
      description: "Geographic location mentioned in the news"
36+
    key_people:
37+
      type: array
38+
      items:
39+
        type: string
40+
      description: "Names of key people mentioned in the article"
41+
    organizations:
42+
      type: array
43+
      items:
44+
        type: string
45+
      description: "Organizations or companies mentioned"
46+
    category:
47+
      type: ["string", "null"]
48+
      description: "News category (politics, technology, sports, etc.)"
49+
    sentiment:
50+
      type: ["string", "null"]
51+
      description: "Overall sentiment of the article (positive, negative, neutral)"
52+
  required: ["headline", "summary", "publication_date", "author", "location", "key_people", "organizations", "category", "sentiment"]
53+
  additionalProperties: false

config/schemas/product_review.yaml

Lines changed: 59 additions & 0 deletions
@@ -0,0 +1,59 @@
1+
# Product Review Extraction Schema
2+
name: "Product Review"
3+
description: "Extract structured information from product reviews"
4+
5+
# System prompt for extraction
6+
system_prompt: |
7+
  Extract structured information from the following product review.
8+
  Focus on identifying:
9+
  - Product name and brand
10+
  - Reviewer information and rating
11+
  - Key features mentioned (pros and cons)
12+
  - Purchase details if mentioned
13+
  - Overall sentiment and recommendation
14+
15+
  If information is not available, leave fields as null or empty arrays.
16+
17+
# JSON Schema definition
18+
schema:
19+
  type: object
20+
  properties:
21+
    product_name:
22+
      type: string
23+
      description: "Name of the product being reviewed"
24+
    brand:
25+
      type: ["string", "null"]
26+
      description: "Brand or manufacturer name"
27+
    rating:
28+
      type: ["number", "null"]
29+
      description: "Numerical rating (e.g., out of 5 stars)"
30+
    reviewer_name:
31+
      type: ["string", "null"]
32+
      description: "Name of the reviewer if mentioned"
33+
    review_date:
34+
      type: ["string", "null"]
35+
      description: "Date of the review if mentioned"
36+
    verified_purchase:
37+
      type: ["boolean", "null"]
38+
      description: "Whether this is a verified purchase"
39+
    pros:
40+
      type: array
41+
      items:
42+
        type: string
43+
      description: "Positive aspects mentioned in the review"
44+
    cons:
45+
      type: array
46+
      items:
47+
        type: string
48+
      description: "Negative aspects mentioned in the review"
49+
    overall_sentiment:
50+
      type: ["string", "null"]
51+
      description: "Overall sentiment (positive, negative, neutral, mixed)"
52+
    would_recommend:
53+
      type: ["boolean", "null"]
54+
      description: "Whether the reviewer would recommend the product"
55+
    price_mentioned:
56+
      type: ["string", "null"]
57+
      description: "Price mentioned in the review if any"
58+
  required: ["product_name", "brand", "rating", "reviewer_name", "review_date", "verified_purchase", "pros", "cons", "overall_sentiment", "would_recommend", "price_mentioned"]
59+
  additionalProperties: false

examples/README.md

Lines changed: 56 additions & 0 deletions
@@ -0,0 +1,56 @@
1+
# Examples
2+
3+
This directory contains example files demonstrating the usage of the Structured Output Cookbook.
4+
5+
## Files
6+
7+
### `example_usage.py`
8+
A comprehensive Python script showing how to use the library programmatically with:
9+
- Predefined templates (job descriptions, recipes)
10+
- Custom YAML schemas
11+
- Schema validation
12+
13+
### `job_description.txt`
14+
Sample job description for testing the job extraction template.
15+
16+
### `news_article.txt`
17+
Sample news article for testing the custom news extraction schema.
18+
19+
### `recipe.txt`
20+
Sample recipe for testing the recipe extraction template.
21+
22+
## Running Examples
23+
24+
### CLI Usage
25+
```bash
26+
# List available predefined templates
27+
structured-output list-templates
28+
29+
# List available custom schemas
30+
structured-output list-schemas
31+
32+
# Extract using predefined template
33+
structured-output extract recipe --input-file examples/recipe.txt --pretty
34+
35+
# Extract using custom YAML schema
36+
structured-output extract-custom news_article --input-file examples/news_article.txt --pretty
37+
```
38+
39+
### Programmatic Usage
40+
```bash
41+
# Run the comprehensive example
42+
python examples/example_usage.py
43+
```
44+
45+
## Creating Custom Schemas
46+
47+
1. Create a new YAML file in `config/schemas/`
48+
2. Follow the structure defined in `config/README.md`
49+
3. Test your schema with the CLI or programmatically
50+
51+
## Environment Setup
52+
53+
Make sure you have your OpenAI API key set:
54+
```bash
55+
export OPENAI_API_KEY="your-api-key-here"
56+
```
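
Before running the examples from a script, it can help to fail early when the key is missing. The sketch below is one way to do that; the helper name `has_openai_key` is hypothetical, and the only detail taken from this README is the `OPENAI_API_KEY` variable name.

```python
import os


def has_openai_key(env=None):
    """Return True if OPENAI_API_KEY is set to a non-blank value (hypothetical helper)."""
    # Accept an explicit mapping so the check can be exercised without
    # touching the real process environment.
    env = os.environ if env is None else env
    return bool(env.get("OPENAI_API_KEY", "").strip())


if not has_openai_key():
    print("OPENAI_API_KEY is not set; export it before running the examples.")
```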
