vimGPT.js

A TypeScript port of vimGPT.

Getting Started

Clone this repo
Run npm i
Run ./setup.sh
Run npx tsx index.ts

How it works

This tool uses Vimium and Playwright to give GPT-4V (gpt-4-vision-preview) a visual browsing interface to act on.

By sending screenshots of the browser (with the Vimium overlay) to GPT, we can skip all the DOM parsing normally required when building web automations.
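To make the screenshot-instead-of-DOM idea concrete, here is a rough sketch of how one screenshot could be wrapped in a Chat Completions-style request body. The helper name buildVisionRequest, the prompt wording, and the max_tokens value are assumptions for illustration, not necessarily what vision.ts does. The screenshot is assumed to already be a base64-encoded PNG (for example, the Buffer returned by Playwright's page.screenshot() converted with toString("base64")).

```typescript
// Hypothetical shape of the request body; only the fields used here are typed.
interface VisionRequest {
  model: string;
  max_tokens: number;
  messages: Array<{
    role: "user";
    content: Array<
      | { type: "text"; text: string }
      | { type: "image_url"; image_url: { url: string } }
    >;
  }>;
}

// Hypothetical helper: pairs the task description with one screenshot.
// No DOM or HTML is included; the model only sees the rendered page.
function buildVisionRequest(
  objective: string,
  screenshotBase64: string,
): VisionRequest {
  return {
    model: "gpt-4-vision-preview",
    max_tokens: 300, // assumed value, tune as needed
    messages: [
      {
        role: "user",
        content: [
          // Assumed prompt wording, for illustration only.
          { type: "text", text: `Objective: ${objective}\nReply with the next action.` },
          // The image is passed inline as a data URL.
          {
            type: "image_url",
            image_url: { url: `data:image/png;base64,${screenshotBase64}` },
          },
        ],
      },
    ],
  };
}
```

The returned object could then be sent with whichever HTTP or SDK client you prefer.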

You start by specifying a task (e.g. "find a 30 watt lightbulb on Amazon"). A series of prompts, each with an updated screenshot of the browser, is then passed to GPT-4V, which determines the best next step (or "action").
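The task loop described above might look roughly like the sketch below. Everything here is hypothetical: the Action variants, the BrowserDriver interface, the decide callback, and the maxSteps cap are illustrative names, and the real index.ts and vimbot.ts may be organized differently. The browser and the model are injected as plain functions so the loop can be exercised without Playwright or an API key.

```typescript
// Hypothetical action vocabulary the model is asked to choose from.
type Action =
  | { kind: "navigate"; url: string }
  | { kind: "click"; hint: string } // Vimium hint label from the overlay, e.g. "AF"
  | { kind: "type"; text: string }
  | { kind: "done" };

// Hypothetical wrapper around the browser.
interface BrowserDriver {
  capture(): Promise<string>; // base64 screenshot with the overlay visible
  perform(action: Action): Promise<void>;
}

// Hypothetical wrapper around the model call.
type Decide = (objective: string, screenshot: string) => Promise<Action>;

// Screenshot, ask for the next action, perform it, repeat.
// Stops when the model reports "done" or after maxSteps iterations.
async function runTask(
  objective: string,
  browser: BrowserDriver,
  decide: Decide,
  maxSteps = 20,
): Promise<Action[]> {
  const history: Action[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const screenshot = await browser.capture();
    const action = await decide(objective, screenshot);
    history.push(action);
    if (action.kind === "done") break;
    await browser.perform(action);
  }
  return history;
}
```

Returning the action history makes it easier to log or inspect a run afterwards.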

Next Steps

The initial setup works OK, but it's easy for GPT-4V to get stuck in a loop. This is certainly far from a production-ready implementation.
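One possible tweak for the looping problem is a simple guard that notices when the model keeps returning the same action. The sketch below is not part of the project; isStuck and its repeats parameter are hypothetical, and it assumes each action has been serialized to a string (for example with JSON.stringify) before being recorded.

```typescript
// Hypothetical guard: true when the last `repeats` recorded actions are identical.
function isStuck(actions: string[], repeats = 3): boolean {
  if (actions.length < repeats) return false;
  const recent = actions.slice(-repeats);
  return recent.every((a) => a === recent[0]);
}
```

A caller could check this after each step and then abort, or re-prompt with a note that the previous action had no effect. It only catches exact repetition, so longer cycles (A, B, A, B) would need a different check.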

With that said, try it out and tweak it to your heart's content!

Recognition

Huge props to Ishan Shah for building the Python version of this utility 🙏