Commit fccf2fa

ujpdate doc
1 parent 586324a commit fccf2fa

File tree

1 file changed (+36 −2 lines)


website/blog/vision.md

Lines changed: 36 additions & 2 deletions
@@ -1,5 +1,5 @@
 ---
-title: Support vision input for the Planner
+title: Vision input for the Planner
 authors: liqli
 date: 2025-03-13
 ---
@@ -31,7 +31,7 @@ To have this new role, you need to include it in the project configure file as f
 The ImageReader role takes the path or the URL of the image as input and prepares a response Post for the Planner role. As described [here](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/gpt-with-vision?tabs=rest) for the Azure OpenAI API, if the image is local, ImageReader needs to encode the image in base64 and pass it to the API. If the image is remote, ImageReader needs to provide the URL of the image.
 The Planner role can then use the image information for various tasks.

-## Example
+## An example

 Let's ask the agent to describe any uploaded image.

@@ -51,6 +51,40 @@ In the example above, the User talks to the agent in Web UI and uploads an image
 TaskWeaver also supports providing the image path in console mode, either using the `/load` command or just including
 the image path in the input message.

+## Extension
+
+If you look into the implementation of the ImageReader role, you will find that it is quite simple.
+The key logic is shown in the following code snippet:
+
+```python
+if image_url.startswith("http"):
+    image_content = image_url
+    attachment_message = f"Image from {image_url}."
+else:
+    if os.path.isabs(image_url):
+        image_content = local_image_to_data_url(image_url)
+    else:
+        image_content = local_image_to_data_url(os.path.join(self.session_metadata.execution_cwd, image_url))
+    attachment_message = f"Image from {image_url} encoded as a Base64 data URL."
+
+post_proxy.update_attachment(
+    message=attachment_message,
+    type=AttachmentType.image_url,
+    extra={"image_url": image_content},
+    is_end=True,
+)
+```
+
+After the image URL is obtained, the ImageReader role encodes the image in base64 if the image is local.
+Then, it creates an attachment in the response Post and passes the image content to the Planner role.
+To achieve this, the attachment is created with the type `AttachmentType.image_url` and the image content is
+passed as extra data with the key `image_url`.
+
+Therefore, if we want to support other scenarios with vision input, we can extend the ImageReader role by adding more logic
+to handle different types of content. One example is to support reading a document with text and images.
+We can add an attachment for each image in the document and pass the list of attachments to the Planner role.
+
+


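For context on where that `image_url` extra ultimately ends up: vision-capable chat APIs such as Azure OpenAI accept both plain HTTPS URLs and base64 data URLs in the same `image_url` content part of a user message. A hedged sketch of such a payload (the helper name, question text, and structure of the surrounding code are illustrative, not TaskWeaver's actual implementation):

```python
def build_vision_message(image_content: str, question: str) -> dict:
    """Assemble a chat message with text and an image part (illustrative sketch)."""
    # image_content may be a remote HTTPS URL or a base64 data URL;
    # the API accepts either in an "image_url" content part.
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url", "image_url": {"url": image_content}},
        ],
    }

msg = build_vision_message("https://example.com/cat.png", "Describe this image.")
```

This is why ImageReader only needs to normalize the input into a URL or data URL: downstream, both forms travel through the same message field.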