Commit a8f1b53

Enhance ask2api with image support and update README
- Added functions to handle image input, including URL validation, MIME type detection, and base64 encoding.
- Updated the CLI to accept an image argument, allowing multimodal prompts combining text and images.
- Enhanced README.md to document the new vision modality feature and provide usage examples for image analysis.
1 parent 965a5f5 commit a8f1b53

File tree

2 files changed (+77, -5 lines)

README.md

Lines changed: 16 additions & 4 deletions
@@ -1,7 +1,7 @@
 # ask2api
 
 [![CI](https://github.com/atasoglu/ask2api/actions/workflows/pre-commit.yml/badge.svg)](https://github.com/atasoglu/ask2api/actions/workflows/pre-commit.yml)
-[![PyPI version](https://badge.fury.io/py/ask2api.svg)](https://badge.fury.io/py/ask2api)
+[![PyPI version](https://img.shields.io/pypi/v/ask2api)](https://pypi.org/project/ask2api/)
 [![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)
 [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
 
@@ -21,6 +21,7 @@ Key features:
 - CLI first
 - Prompt → API behavior
 - No markdown, no explanations, only valid JSON
+- Vision modality support
 - Designed for automation pipelines and AI-driven backend workflows
 
 ## Installation
@@ -37,9 +38,11 @@ export OPENAI_API_KEY="your_api_key"
 
 ## Usage
 
+### Text-only prompts
+
 Instead of asking:
 
-> *Where is the capital of France?*
+> *"Where is the capital of France?"*
 
 and receiving free-form text, you can do this:
 
@@ -56,12 +59,21 @@ And get a structured API response:
 }
 ```
 
+### Vision modality
+
+You can also analyze images and get structured JSON responses:
+
+```bash
+ask2api -p "Where is this place?" -sf schema.json -i https://upload.wikimedia.org/wikipedia/commons/6/64/Lesdeuxmagots.jpg
+```
+
 ## How it works
 
 1. You define the desired output structure using a JSON Schema.
-2. The schema is passed to the model using OpenAIs `json_schema` structured output format.
+2. The schema is passed to the model using OpenAI's `json_schema` structured output format.
 3. The system prompt enforces strict JSON-only responses.
-4. The CLI prints the API-ready JSON output.
+4. For vision tasks, images are automatically encoded (base64 for local files) or passed as URLs.
+5. The CLI prints the API-ready JSON output.
 
 The model is treated as a deterministic API function.
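
A side note on step 4 of the updated "How it works" section: a local image file is embedded in the request as a base64 data URL, while an http(s) input is forwarded as a plain URL (see `prepare_image_content` in the ask2api.py diff below). A rough sketch of the content part this produces for a hypothetical local `photo.png`:

```python
import base64

# Hypothetical local file; an http(s) string would be passed through unchanged.
with open("photo.png", "rb") as f:
    encoded = base64.b64encode(f.read()).decode("utf-8")

# Shape of the image content part sent to the Chat Completions endpoint:
image_part = {
    "type": "image_url",
    "image_url": {"url": f"data:image/png;base64,{encoded}"},
}
```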

ask2api.py

Lines changed: 61 additions & 1 deletion
@@ -1,16 +1,65 @@
 import argparse
+import base64
 import json
+import mimetypes
 import os
 import requests
+from urllib.parse import urlparse
 
 API_KEY = os.getenv("OPENAI_API_KEY")
 OPENAI_URL = "https://api.openai.com/v1/chat/completions"
 
 
+def is_url(path):
+    """Check if the given path is a URL."""
+    try:
+        result = urlparse(path)
+        return all([result.scheme, result.netloc])
+    except Exception:
+        return False
+
+
+def get_image_mime_type(image_path):
+    """Get MIME type for an image file."""
+    mime_type, _ = mimetypes.guess_type(image_path)
+    if mime_type and mime_type.startswith("image/"):
+        return mime_type
+    # Fallback for common image extensions
+    ext = os.path.splitext(image_path)[1].lower()
+    mime_map = {
+        ".jpg": "image/jpeg",
+        ".jpeg": "image/jpeg",
+        ".png": "image/png",
+        ".gif": "image/gif",
+        ".webp": "image/webp",
+    }
+    return mime_map.get(ext, "image/jpeg")
+
+
+def encode_image(image_path):
+    """Encode image file to base64."""
+    with open(image_path, "rb") as image_file:
+        return base64.b64encode(image_file.read()).decode("utf-8")
+
+
+def prepare_image_content(image_path):
+    """Prepare image content for OpenAI API (either URL or base64 encoded)."""
+    if is_url(image_path):
+        return {"type": "image_url", "image_url": {"url": image_path}}
+    else:
+        base64_image = encode_image(image_path)
+        mime_type = get_image_mime_type(image_path)
+        return {
+            "type": "image_url",
+            "image_url": {"url": f"data:{mime_type};base64,{base64_image}"},
+        }
+
+
 def main():
     parser = argparse.ArgumentParser()
     parser.add_argument("-p", "--prompt", required=True)
     parser.add_argument("-sf", "--schema-file", required=True)
+    parser.add_argument("-i", "--image")
     args = parser.parse_args()
 
     with open(args.schema_file, "r", encoding="utf-8") as f:
@@ -25,11 +74,22 @@ def main():
     Never return markdown, comments or extra text.
     """
 
+    # Build user message content
+    if args.image:
+        # Multimodal content: text + image
+        user_content = [
+            {"type": "text", "text": args.prompt},
+            prepare_image_content(args.image),
+        ]
+    else:
+        # Text-only content
+        user_content = args.prompt
+
     payload = {
         "model": "gpt-4.1",
         "messages": [
             {"role": "system", "content": system_prompt},
-            {"role": "user", "content": args.prompt},
+            {"role": "user", "content": user_content},
         ],
         "response_format": {
             "type": "json_schema",

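For reference, the user message that `main()` now sends when `-i` is supplied looks like this; a rough sketch with the prompt and image URL borrowed from the README example (the system prompt is elided):

```python
# Sketch: the multimodal message main() builds with -i, paralleling prepare_image_content().
prompt = "Where is this place?"
image = "https://upload.wikimedia.org/wikipedia/commons/6/64/Lesdeuxmagots.jpg"

user_content = [
    {"type": "text", "text": prompt},
    # URL input is passed through as-is; a local path would instead become a
    # "data:<mime>;base64,..." URL via encode_image() and get_image_mime_type().
    {"type": "image_url", "image_url": {"url": image}},
]

messages = [
    {"role": "system", "content": "..."},  # the JSON-only system prompt
    {"role": "user", "content": user_content},
]
print(messages)
```

Without `-i`, `user_content` stays a plain string, so existing text-only invocations are unaffected.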