Commit a8f1b53

Enhance ask2api with image support and update README
- Added functions to handle image input, including URL validation, MIME type detection, and base64 encoding.
- Updated the CLI to accept an image argument, allowing multimodal prompts combining text and images.
- Enhanced README.md to document the new vision modality feature and provide usage examples for image analysis.
1 parent 965a5f5 commit a8f1b53

File tree

2 files changed (+77, -5 lines)

README.md

Lines changed: 16 additions & 4 deletions
@@ -1,7 +1,7 @@
 # ask2api
 
 [![CI](https://github.com/atasoglu/ask2api/actions/workflows/pre-commit.yml/badge.svg)](https://github.com/atasoglu/ask2api/actions/workflows/pre-commit.yml)
-[![PyPI version](https://badge.fury.io/py/ask2api.svg)](https://badge.fury.io/py/ask2api)
+[![PyPI version](https://img.shields.io/pypi/v/ask2api)](https://pypi.org/project/ask2api/)
 [![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)
 [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
 
@@ -21,6 +21,7 @@ Key features:
 - CLI first
 - Prompt → API behavior
 - No markdown, no explanations, only valid JSON
+- Vision modality support
 - Designed for automation pipelines and AI-driven backend workflows
 
 ## Installation
@@ -37,9 +38,11 @@ export OPENAI_API_KEY="your_api_key"
 
 ## Usage
 
+### Text-only prompts
+
 Instead of asking:
 
-> *Where is the capital of France?*
+> *"Where is the capital of France?"*
 
 and receiving free-form text, you can do this:
 
@@ -56,12 +59,21 @@ And get a structured API response:
 }
 ```
 
+### Vision modality
+
+You can also analyze images and get structured JSON responses:
+
+```bash
+ask2api -p "Where is this place?" -sf schema.json -i https://upload.wikimedia.org/wikipedia/commons/6/64/Lesdeuxmagots.jpg
+```
+
 ## How it works
 
 1. You define the desired output structure using a JSON Schema.
-2. The schema is passed to the model using OpenAIs `json_schema` structured output format.
+2. The schema is passed to the model using OpenAI's `json_schema` structured output format.
 3. The system prompt enforces strict JSON-only responses.
-4. The CLI prints the API-ready JSON output.
+4. For vision tasks, images are automatically encoded (base64 for local files) or passed as URLs.
+5. The CLI prints the API-ready JSON output.
 
 The model is treated as a deterministic API function.
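
A side note on step 4 of the updated "How it works" section: a local image file is embedded in the request as a base64 data URL, while an http(s) input is forwarded as a plain URL (see `prepare_image_content` in the ask2api.py diff below). A rough sketch of the content part this produces for a hypothetical local `photo.png`:

```python
import base64

# Hypothetical local file; an http(s) string would be passed through unchanged.
with open("photo.png", "rb") as f:
    encoded = base64.b64encode(f.read()).decode("utf-8")

# Shape of the image content part sent to the Chat Completions endpoint:
image_part = {
    "type": "image_url",
    "image_url": {"url": f"data:image/png;base64,{encoded}"},
}
```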

ask2api.py

Lines changed: 61 additions & 1 deletion
@@ -1,16 +1,65 @@
 import argparse
+import base64
 import json
+import mimetypes
 import os
 import requests
+from urllib.parse import urlparse
 
 API_KEY = os.getenv("OPENAI_API_KEY")
 OPENAI_URL = "https://api.openai.com/v1/chat/completions"
 
 
+def is_url(path):
+    """Check if the given path is a URL."""
+    try:
+        result = urlparse(path)
+        return all([result.scheme, result.netloc])
+    except Exception:
+        return False
+
+
+def get_image_mime_type(image_path):
+    """Get MIME type for an image file."""
+    mime_type, _ = mimetypes.guess_type(image_path)
+    if mime_type and mime_type.startswith("image/"):
+        return mime_type
+    # Fallback for common image extensions
+    ext = os.path.splitext(image_path)[1].lower()
+    mime_map = {
+        ".jpg": "image/jpeg",
+        ".jpeg": "image/jpeg",
+        ".png": "image/png",
+        ".gif": "image/gif",
+        ".webp": "image/webp",
+    }
+    return mime_map.get(ext, "image/jpeg")
+
+
+def encode_image(image_path):
+    """Encode image file to base64."""
+    with open(image_path, "rb") as image_file:
+        return base64.b64encode(image_file.read()).decode("utf-8")
+
+
+def prepare_image_content(image_path):
+    """Prepare image content for OpenAI API (either URL or base64 encoded)."""
+    if is_url(image_path):
+        return {"type": "image_url", "image_url": {"url": image_path}}
+    else:
+        base64_image = encode_image(image_path)
+        mime_type = get_image_mime_type(image_path)
+        return {
+            "type": "image_url",
+            "image_url": {"url": f"data:{mime_type};base64,{base64_image}"},
+        }
+
+
 def main():
     parser = argparse.ArgumentParser()
     parser.add_argument("-p", "--prompt", required=True)
     parser.add_argument("-sf", "--schema-file", required=True)
+    parser.add_argument("-i", "--image")
     args = parser.parse_args()
 
     with open(args.schema_file, "r", encoding="utf-8") as f:
@@ -25,11 +74,22 @@ def main():
     Never return markdown, comments or extra text.
     """
 
+    # Build user message content
+    if args.image:
+        # Multimodal content: text + image
+        user_content = [
+            {"type": "text", "text": args.prompt},
+            prepare_image_content(args.image),
+        ]
+    else:
+        # Text-only content
+        user_content = args.prompt
+
     payload = {
         "model": "gpt-4.1",
         "messages": [
             {"role": "system", "content": system_prompt},
-            {"role": "user", "content": args.prompt},
+            {"role": "user", "content": user_content},
         ],
         "response_format": {
             "type": "json_schema",

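For reference, the user message that `main()` now sends when `-i` is supplied looks like this; a rough sketch with the prompt and image URL borrowed from the README example (the system prompt is elided):

```python
# Sketch: the multimodal message main() builds with -i, paralleling prepare_image_content().
prompt = "Where is this place?"
image = "https://upload.wikimedia.org/wikipedia/commons/6/64/Lesdeuxmagots.jpg"

user_content = [
    {"type": "text", "text": prompt},
    # URL input is passed through as-is; a local path would instead become a
    # "data:<mime>;base64,..." URL via encode_image() and get_image_mime_type().
    {"type": "image_url", "image_url": {"url": image}},
]

messages = [
    {"role": "system", "content": "..."},  # the JSON-only system prompt
    {"role": "user", "content": user_content},
]
print(messages)
```

Without `-i`, `user_content` stays a plain string, so existing text-only invocations are unaffected.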