website/blog/vision.md
+36 -2 (36 additions, 2 deletions)
@@ -1,5 +1,5 @@
 ---
-title: Support vision input for the Planner
+title: Vision input for the Planner
 authors: liqli
 date: 2025-03-13
 ---
@@ -31,7 +31,7 @@ To have this new role, you need to include it in the project configure file as f
 The ImageReader role takes the path or URL of an image as input and prepares a response Post for the Planner role. As described [here](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/gpt-with-vision?tabs=rest) for the Azure OpenAI API, if the image is local, ImageReader needs to encode it in base64 and pass it to the API; if the image is remote, ImageReader needs to provide its URL.
 The Planner role can then use the image information for various tasks.

-## Example
+## An example

 Let's ask the agent to describe any uploaded image.
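
To make the local-versus-remote handling concrete, here is a minimal sketch of how the image part of a chat message can be built. This is illustrative rather than TaskWeaver's actual code: the helper name `image_content` is invented here, and it assumes the standard OpenAI-style message format for vision input.

```python
# Illustrative sketch (not TaskWeaver's implementation): build the image part
# of a vision chat message. Local files are embedded as base64 data URLs;
# remote images are passed through as plain URLs.
import base64
import mimetypes


def image_content(path_or_url: str) -> dict:
    if path_or_url.startswith(("http://", "https://")):
        # Remote image: the API fetches it from the URL directly.
        url = path_or_url
    else:
        # Local image: encode the bytes as a base64 data URL.
        mime = mimetypes.guess_type(path_or_url)[0] or "image/png"
        with open(path_or_url, "rb") as f:
            url = f"data:{mime};base64,{base64.b64encode(f.read()).decode()}"
    return {"type": "image_url", "image_url": {"url": url}}
```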
@@ -51,6 +51,40 @@ In the example above, the User talks to the agent in Web UI and uploads an image
 TaskWeaver also supports providing the image path in console mode, either with the `/load` command
 or by including the image path in the input message.

+## Extension
+
+If you look into the implementation of the ImageReader role, you will find that it is quite simple.
+The key logic is shown in the following code snippet:
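
The snippet itself falls outside this excerpt of the diff. As a loose reconstruction from the description above (hypothetical code, not the actual ImageReader implementation), the key step might look like the following, reusing the `image_content` helper sketched earlier:

```python
# Hypothetical reconstruction of the ImageReader's key step (the real snippet
# is truncated from this diff excerpt). It sends the image to a vision-capable
# model and returns the text summary that would go back to the Planner as a
# response Post.
from openai import AzureOpenAI  # assumes the openai>=1.x SDK


def read_image(client: AzureOpenAI, deployment: str, path_or_url: str) -> str:
    response = client.chat.completions.create(
        model=deployment,
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in detail."},
                image_content(path_or_url),  # helper sketched above
            ],
        }],
    )
    return response.choices[0].message.content
```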
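
For the console mode mentioned above, usage reduces to loading the image and then asking about it. An illustrative session follows (the launch command is taken from TaskWeaver's README; the exact console prompts may differ):

```
$ python -m taskweaver -p ./project/
/load ./images/photo.png
What is shown in the image I just loaded?
```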