Project idea #688
Hey @Radiant690, you could use a combination of computer vision and LLMs, or even a multi-modal LLM (also called a VLM, for Vision Language Model).

One approach is to have a computer vision model recognize what's in the image and then use its output to build the prompt for the LLM. For example, say you had a picture of a coke drink, you could go:

Picture -> Computer vision model -> Output: "coke drink" -> Input to LLM: "Is it safe for a pregnant woman to consume {coke_drink}?" -> Output

This could all be done through an interface built with Gradio: https://www.gradio.app/

See this example using the LLaVA model with an image/chat interface: https://llava.hliu.cc/ (made with Gradio).
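To make the Picture -> CV -> LLM chain concrete, here's a minimal sketch. The two model functions are placeholders (assumptions on my part); you'd swap in a real classifier (e.g. a fine-tuned torchvision model) and a real LLM API call. The Gradio wiring at the bottom is left as comments so the core logic runs standalone.

```python
def classify_food(image_path: str) -> str:
    """Placeholder computer-vision step: return a label for the image.

    In a real app this would run your trained model (e.g. a ResNet
    or CLIP classifier) on the image and return the top label.
    """
    return "coke drink"  # hardcoded stand-in for the model's prediction


def build_prompt(label: str, question: str) -> str:
    """Insert the CV label into the user's question for the LLM."""
    return f"A pregnant woman is asking about '{label}'. {question}"


def ask_llm(prompt: str) -> str:
    """Placeholder LLM step: replace with your LLM API of choice."""
    return f"(LLM answer for: {prompt})"


def pipeline(image_path: str, question: str) -> str:
    """CV output becomes part of the LLM prompt, as described above."""
    label = classify_food(image_path)
    return ask_llm(build_prompt(label, question))


# Gradio wiring (untested sketch; requires `pip install gradio`):
# import gradio as gr
# demo = gr.Interface(
#     fn=pipeline,
#     inputs=[gr.Image(type="filepath"), gr.Textbox(label="Your question")],
#     outputs="text",
# )
# demo.launch()
```

The key design point is that the CV model and the LLM never talk to each other directly; the CV label is just a string that gets templated into the prompt, so you can develop and test each half independently.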
I am working on my major project for my BTech and want to build an innovative product (a Gradio website) using computer vision and deep learning with LLM functionality. To make it more concrete: the idea is that a pregnant woman takes or uploads a photo of a food item or beverage. The LLM then provides a Q&A interface so the user can ask questions about that item. Example: a pregnant woman shopping for a coke drink feels unsure, so she opens our application, takes or uploads a photo, and in the later LLM segment asks questions like "What quantity of this item is safe?" or "What is an alternative to this item?"
I would love more information on how to connect the CV module with the LLM part.
I have taken both the PyTorch and LLM workshop projects, but lack some clarity.
Would highly appreciate some advice 😊