Welcome to OmniParser Discussions! #203
Hello, everyone. How are you? Can you point me in the right direction for integrating OmniParser with ChatGPT? Thanks!

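Not an official answer, but one way this could look: a minimal sketch, assuming the repo's omniparserserver is running locally and exposes a `/parse/` endpoint that accepts a base64-encoded screenshot and returns a `parsed_content_list` (the URL, endpoint path, and JSON field names here are assumptions and may differ in your setup). The parsed UI elements are then handed to an OpenAI chat model as plain text.

```python
import base64
import requests
from openai import OpenAI

# Assumption: omniparserserver (from omnitool/omniparserserver) is running locally.
# The port, path, and field names below are hypothetical and may differ in your version.
OMNIPARSER_URL = "http://localhost:8000/parse/"


def parse_screenshot(path: str) -> list:
    """Send a screenshot to omniparserserver and return the parsed UI elements."""
    with open(path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")
    resp = requests.post(OMNIPARSER_URL, json={"base64_image": image_b64})
    resp.raise_for_status()
    return resp.json().get("parsed_content_list", [])


def ask_chatgpt_about_screen(path: str, question: str) -> str:
    """Describe the parsed screen to a chat model and ask a question about it."""
    elements = parse_screenshot(path)
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    completion = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system",
             "content": "You are given a list of UI elements parsed from a screenshot."},
            {"role": "user",
             "content": f"UI elements: {elements}\n\nQuestion: {question}"},
        ],
    )
    return completion.choices[0].message.content


# Example: print(ask_chatgpt_about_screen("screen.png", "Which button submits the form?"))
```
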
Hi!

I built a connector to use omniparserserver with this project, which uses AI to assist with video games: https://github.com/ShipBit/wingman-ai. Are you open to a PR that makes it possible to use OmniParser with the host computer (instead of a VM)? I assume not, since it would have been easy enough for you to offer this option from the start, but let me know if I'm wrong.

Are there any hardware configuration requirements for running OmniParser V2? What is the minimum configuration?

I just want to express how sad I am that you are using Conda.

Hi, I have deployed OmniParser V2 + OmniTool for a production PoC at my company, but the OS automation does not seem to be working ideally. For example, with the instruction "enter gmail.com in the URL bar and hit enter to access the website", it cannot finish the command. I guess I may not be giving it a good prompt for this task? Any suggestions? Otherwise, I think we do need a prompt-refinement process to improve the app.

Great, great work! One question: can the model also detect activation states in a UI? Specifically, activation indicated by highlighting, icons (like dots, arrows, etc.), color, edges, and so on? I ran a couple of tests and, interestingly, the model always managed to assign the same coloured box to the activated button. Is this a coincidence, or can I somehow access this in the output?

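For what it's worth, activation state is not (as far as I know) an explicit field in the output, but you can inspect what the parser does return per element. A minimal sketch, assuming you already have a `parsed_content_list` (e.g. from the demo or omniparserserver) and assuming per-element fields named `type`, `bbox`, `interactivity`, and `content`; these names are based on the demo output and may differ in your version.

```python
# Minimal sketch: inspect OmniParser's parsed output for a particular element.
# `parsed_content_list` is assumed to come from the demo or omniparserserver;
# the field names used below are assumptions and may differ in your version.

def find_elements(parsed_content_list: list, label: str) -> list:
    """Return parsed elements whose caption/OCR text mentions `label`."""
    return [
        el for el in parsed_content_list
        if label.lower() in str(el.get("content", "")).lower()
    ]


def describe(el: dict) -> str:
    """Summarize the fields OmniParser attaches to an element (no explicit activation state)."""
    return (
        f"type={el.get('type')} "
        f"interactive={el.get('interactivity')} "
        f"bbox={el.get('bbox')} "
        f"content={el.get('content')!r}"
    )


# Example usage with a result already obtained from the parser:
# for el in find_elements(parsed_content_list, "Save"):
#     print(describe(el))
```

If any activation signal exists, it would most likely show up in the caption text or in your own pixel-level check inside the element's bounding box, rather than as a dedicated field.
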
Hi, I'm a computer science student at KAIST. First of all, thank you for hosting this discussion session! It really motivated me to start writing down my small idea. As I recall, OmniParser v1 has three key limitations. Regarding the first two, would it be possible to address them by incorporating the context of the UI screen during training? From my perspective, each UI element is similar to a word in natural language: it varies in position and meaning. Given that modern LLMs effectively capture word context, could a similar approach be applied to UI elements? Specifically, I was thinking of using this kind of contextual information as input data for training. Would this approach help mitigate these limitations? I'd love to hear your thoughts!

👋 Welcome!
We're using Discussions as a place to connect with other members of our community. We hope that you:
- Ask questions you're wondering about.
- Share ideas.
- Engage with other community members.
- Welcome others and are open-minded. Remember that this is a community we build together 💪.
To get started, comment below with an introduction of yourself and tell us about what you do with this community.