Commit a910bfb

Merge branch 'main' into add-openai-cua

1 parent: af91984

70 files changed (+13781, -303 lines)


.github/workflows/code_format.yml

Lines changed: 11 additions & 9 deletions
@@ -18,17 +18,19 @@ jobs:
       - name: Checkout Repository
         uses: actions/checkout@v4
 
-      - name: Set up Python
-        uses: actions/setup-python@v5
+      - name: Install uv
+        uses: astral-sh/setup-uv@v4
         with:
-          python-version: '3.11'
-          cache: 'pip' # caching pip dependencies
+          enable-cache: true
+
+      - name: Set up Python
+        run: uv python install 3.11
 
-      - name: Pip install
-        run: pip install -r requirements.txt
+      - name: Install dependencies
+        run: uv sync --frozen --extra dev
 
-      - name: Pip list
-        run: pip list
+      - name: List packages
+        run: uv pip list
 
       - name: Code Formatting
-        run: black . --check --diff
+        run: uv run black src/ --check --diff

.github/workflows/darglint.yml

Lines changed: 11 additions & 9 deletions
@@ -18,17 +18,19 @@ jobs:
       - name: Checkout Repository
         uses: actions/checkout@v4
 
-      - name: Set up Python
-        uses: actions/setup-python@v5
+      - name: Install uv
+        uses: astral-sh/setup-uv@v4
         with:
-          python-version: '3.12'
-          cache: 'pip' # caching pip dependencies
+          enable-cache: true
+
+      - name: Set up Python
+        run: uv python install 3.12 # this fails in 3.11
 
-      - name: Pip install
-        run: pip install darglint
+      - name: Install dependencies
+        run: uv sync --frozen --extra dev
 
-      - name: Pip list
-        run: pip list
+      - name: List packages
+        run: uv pip list
 
       - name: Darglint checks
-        run: darglint -v 2 -z short src/
+        run: uv run darglint -v 2 -z short src/
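
The darglint step checks that docstrings under `src/` agree with each function's actual parameters, return value, and raised exceptions. As a minimal sketch, assuming Google-style docstrings (darglint's default style), both functions below should pass `darglint -v 2 -z short`: per darglint's documentation, strictness `short` accepts a one-line docstring as is, but fully checks any longer docstring, and `-v 2` only makes the error messages more verbose. The function names are hypothetical.

```python
def clamp(value: float, low: float, high: float) -> float:
    """Clamp a value to the closed interval [low, high].

    Args:
        value: The number to clamp.
        low: Lower bound of the interval.
        high: Upper bound of the interval.

    Returns:
        low if value < low, high if value > high, otherwise value.

    Raises:
        ValueError: If low is greater than high.
    """
    if low > high:
        raise ValueError("low must not exceed high")
    return max(low, min(value, high))


# A one-line docstring: accepted without Args/Returns sections under -z short.
def is_even(n: int) -> bool:
    """Return True if n is divisible by 2."""
    return n % 2 == 0
```

Under the default `full` strictness, `is_even` would instead be reported for its missing `Args` and `Returns` sections.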
Lines changed: 52 additions & 0 deletions
@@ -0,0 +1,52 @@
+name: Deploy Landing Page to GitHub Pages
+
+on:
+  # Runs on pushes targeting the default branch
+  push:
+    branches: ["main", "master"]
+    paths:
+      - 'docs/landing_page/**'
+
+  # Allows you to run this workflow manually from the Actions tab
+  workflow_dispatch:
+
+# Sets permissions of the GITHUB_TOKEN to allow deployment to GitHub Pages
+permissions:
+  contents: read
+  pages: write
+  id-token: write
+
+# Allow only one concurrent deployment, skipping runs queued between the run in-progress and latest queued.
+# However, do NOT cancel in-progress runs as we want to allow these production deployments to complete.
+concurrency:
+  group: "pages"
+  cancel-in-progress: false
+
+jobs:
+  # Build job
+  build:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v4
+
+      - name: Setup Pages
+        uses: actions/configure-pages@v4
+
+      - name: Upload artifact
+        uses: actions/upload-pages-artifact@v3
+        with:
+          # Upload the landing page directory
+          path: './docs/landing_page'
+
+  # Deployment job
+  deploy:
+    environment:
+      name: github-pages
+      url: ${{ steps.deployment.outputs.page_url }}
+    runs-on: ubuntu-latest
+    needs: build
+    steps:
+      - name: Deploy to GitHub Pages
+        id: deployment
+        uses: actions/deploy-pages@v4
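
The `on:` block of this deployment workflow starts a run for pushes to `main` or `master` that change something under `docs/landing_page/`, and for manual runs from the Actions tab. A minimal sketch of that trigger logic, assuming GitHub's documented meaning of `**` (any depth of nested paths); `should_deploy` and `touches_landing_page` are hypothetical helpers for illustration, not part of GitHub Actions or AgentLab:

```python
from pathlib import PurePosixPath
from typing import Iterable

# Mirrors `branches: ["main", "master"]` and `paths: ['docs/landing_page/**']`.
DEPLOY_BRANCHES = {"main", "master"}
LANDING_PAGE_DIR = PurePosixPath("docs/landing_page")


def touches_landing_page(changed_files: Iterable[str]) -> bool:
    """True if any changed file lives somewhere below docs/landing_page/."""
    return any(LANDING_PAGE_DIR in PurePosixPath(f).parents for f in changed_files)


def should_deploy(event: str, branch: str, changed_files: Iterable[str]) -> bool:
    """Approximate whether the workflow would start for a given event."""
    if event == "workflow_dispatch":
        # The branch and path filters in the workflow apply only to push events.
        return True
    return (
        event == "push"
        and branch in DEPLOY_BRANCHES
        and touches_landing_page(changed_files)
    )
```

Comparing whole path components (rather than string prefixes) keeps a sibling directory such as `docs/landing_page_old/` from matching.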

.github/workflows/python_version_compatibility.yml

Lines changed: 9 additions & 6 deletions
@@ -32,9 +32,12 @@ jobs:
       - name: Check Python ${{ matrix.python-version }}
         continue-on-error: true
         run: |
-          export PATH="$HOME/.cargo/bin:$PATH"
-          if uvx --python ${{ matrix.python-version }} --from python --with-requirements requirements.txt python -c "print('✅ Compatible')"; then
-            echo "✅ Python ${{ matrix.python-version }} works"
-          else
-            echo "❌ Python ${{ matrix.python-version }} incompatible"
-          fi
+          export PATH="$HOME/.cargo/bin:$PATH"
+          uv python install ${{ matrix.python-version }}
+          if uv sync --frozen --python ${{ matrix.python-version }}; then
+            uv run -p ${{ matrix.python-version }} python -c "import sys; print('✅ Compatible:', sys.version)"
+            echo "✅ Python ${{ matrix.python-version }} works"
+          else
+            echo "❌ Python ${{ matrix.python-version }} incompatible"
+            exit 1
+          fi
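
The rewritten check installs the requested interpreter, resolves the locked environment with `uv sync --frozen`, and only then runs Python to print its version; a failed sync now ends with `exit 1`. To replay the same sequence locally, a minimal sketch might look like the following. `check_commands` and `check_version` are hypothetical helpers, and the injectable `run` parameter exists only so the logic can be exercised without `uv` installed.

```python
import subprocess
from typing import Callable, List, Sequence

Runner = Callable[[Sequence[str]], subprocess.CompletedProcess]


def check_commands(version: str) -> List[List[str]]:
    """The uv commands from the CI step, in order, for one Python version."""
    return [
        ["uv", "python", "install", version],
        ["uv", "sync", "--frozen", "--python", version],
        ["uv", "run", "-p", version, "python", "-c",
         "import sys; print('Compatible:', sys.version)"],
    ]


def check_version(version: str, run: Runner = subprocess.run) -> bool:
    """Run each command in turn and stop at the first non-zero exit code."""
    for cmd in check_commands(version):
        if run(cmd).returncode != 0:
            return False
    return True
```

For example, `check_version("3.12")` runs the three commands against a real `uv` installation. The workflow's actual version matrix is not shown in this diff.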

.github/workflows/unit_tests.yml

Lines changed: 13 additions & 11 deletions
@@ -23,23 +23,25 @@ jobs:
       - name: Set up Git user
         run: git config --global user.email "[email protected]" && git config --global user.name "GitHub Actions"
 
+      - name: Install uv
+        uses: astral-sh/setup-uv@v4
+        with:
+          enable-cache: true
+
       - name: Set up Python
-        uses: actions/setup-python@v5
-        with: # python at least 3.11
-          python-version: '3.11'
-          cache: 'pip' # caching pip dependencies
+        run: uv python install 3.11
 
       - name: Install AgentLab
-        run: pip install -e .
+        run: uv sync --frozen --extra dev
 
-      - name: Pip list
-        run: pip list
+      - name: List packages
+        run: uv pip list
 
       - name: Install Playwright
-        run: playwright install chromium --with-deps
+        run: uv run playwright install chromium --with-deps
 
       - name: Download WebArena / VisualWebArena ressource files
-        run: python -c 'import nltk; nltk.download("punkt_tab")'
+        run: uv run python -c 'import nltk; nltk.download("punkt_tab")'
 
       - name: Fetch MiniWob
         uses: actions/checkout@v4
@@ -59,9 +61,9 @@ jobs:
         run: curl -I "http://localhost:8080/miniwob/" || echo "MiniWob not reachable"
 
       - name: Pre-download nltk ressources
-        run: python -c "import nltk; nltk.download('punkt_tab')"
+        run: uv run python -c "import nltk; nltk.download('punkt_tab')"
 
       - name: Run AgentLab Unit Tests
         env:
           MINIWOB_URL: "http://localhost:8080/miniwob/"
-        run: pytest -n 5 --durations=10 -m 'not pricy' -v tests/
+        run: uv run pytest -n 5 --durations=10 -m 'not pricy' -v tests/

.readthedocs.yaml

Lines changed: 4 additions & 0 deletions
@@ -32,4 +32,8 @@ sphinx:
 # See https://docs.readthedocs.io/en/stable/guides/reproducible-builds.html
 python:
   install:
+    - method: pip
+      path: .
+      extra_requirements:
+        - dev
     - requirements: docs/source/requirements.txt

README.md

Lines changed: 39 additions & 1 deletion
@@ -61,7 +61,7 @@ AgentLab Features:
 | [GAIA](https://huggingface.co/spaces/gaia-benchmark/leaderboard) (soon) | - | - | None | - | - | live web | soon |
 | [Mind2Web-live](https://huggingface.co/datasets/iMeanAI/Mind2Web-Live) (soon) | - | - | None | - | - | live web | soon |
 | [MiniWoB](https://miniwob.farama.org/index.html) | [setup](https://github.com/ServiceNow/BrowserGym/blob/main/browsergym/miniwob/README.md) | 125 | Medium | 10 | no | self hosted (static files) | soon |
-| [OSWorld](https://os-world.github.io/) | [setup](https://github.com/ServiceNow/AgentLab/blob/main/src/agentlab/benchmarks/setup.md) | 369 | None | - | - | self hosted | soon |
+| [OSWorld](https://os-world.github.io/) | [setup](https://github.com/ServiceNow/AgentLab/blob/main/src/agentlab/benchmarks/osworld.md) | 369 | None | - | - | self hosted | soon |
 
 
 ## 🛠️ Setup AgentLab
@@ -294,3 +294,41 @@ pip install hf-transfer
 pip install torch
 export HF_HUB_ENABLE_HF_TRANSFER=1
 ```
+
+
+## 📝 Citing This Work
+
+Please use the two following bibtex entries if you wish to cite AgentLab:
+
+```tex
+@article{
+chezelles2025browsergym,
+title={The BrowserGym Ecosystem for Web Agent Research},
+author={Thibault Le Sellier de Chezelles and Maxime Gasse and Alexandre Lacoste and Massimo Caccia and Alexandre Drouin and L{\'e}o Boisvert and Megh Thakkar and Tom Marty and Rim Assouel and Sahar Omidi Shayegan and Lawrence Keunho Jang and Xing Han L{\`u} and Ori Yoran and Dehan Kong and Frank F. Xu and Siva Reddy and Graham Neubig and Quentin Cappart and Russ Salakhutdinov and Nicolas Chapados},
+journal={Transactions on Machine Learning Research},
+issn={2835-8856},
+year={2025},
+url={https://openreview.net/forum?id=5298fKGmv3},
+note={Expert Certification}
+}
+
+@inproceedings{workarena2024,
+title = {{W}ork{A}rena: How Capable are Web Agents at Solving Common Knowledge Work Tasks?},
+author = {Drouin, Alexandre and Gasse, Maxime and Caccia, Massimo and Laradji, Issam H. and Del Verme, Manuel and Marty, Tom and Vazquez, David and Chapados, Nicolas and Lacoste, Alexandre},
+booktitle = {Proceedings of the 41st International Conference on Machine Learning},
+pages = {11642--11662},
+year = {2024},
+editor = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
+volume = {235},
+series = {Proceedings of Machine Learning Research},
+month = {21--27 Jul},
+publisher = {PMLR},
+url = {https://proceedings.mlr.press/v235/drouin24a.html},
+}
+```
+
+Here is an example of how they can be used:
+
+```tex
+We use the AgentLab framework to run and manage our experiments \cite{workarena2024,chezelles2025browsergym}.
+```
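
The usage line cites both entries by their keys. A minimal Python sketch of deriving that `\cite{...}` argument from the BibTeX source instead of typing the keys by hand; `citation_keys` and `cite` are hypothetical helpers, and the regular expression allows the key to start on the line after `@article{`, as it does in the first entry above.

```python
import re
from typing import List

# "@<type>{", optional whitespace (including a newline), then the key up to the first comma.
ENTRY_KEY = re.compile(r"@\w+\s*\{\s*([^,\s]+)\s*,")


def citation_keys(bibtex: str) -> List[str]:
    """Return entry keys in the order they appear in the BibTeX source."""
    return ENTRY_KEY.findall(bibtex)


def cite(keys: List[str]) -> str:
    """Build a LaTeX \\cite command for the given keys."""
    return "\\cite{" + ",".join(keys) + "}"


# Abbreviated copies of the two entries' opening lines.
bib = """@article{
chezelles2025browsergym,
title={The BrowserGym Ecosystem for Web Agent Research},
}

@inproceedings{workarena2024,
title = {{W}ork{A}rena: How Capable are Web Agents at Solving Common Knowledge Work Tasks?},
}"""

print(cite(citation_keys(bib)))  # \cite{chezelles2025browsergym,workarena2024}
```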

docs/landing_page/README.md

Lines changed: 130 additions & 0 deletions
@@ -0,0 +1,130 @@
+# AgentLab Landing Page
+
+This is a research landing page for AgentLab built using the [Academic Project Page Template](https://github.com/eliahuhorwitz/Academic-project-page-template).
+
+## Structure
+
+```
+docs/landing_page/
+├── index.html              # Main landing page
+├── projects/               # Individual project pages
+│   ├── browsergym.html     # BrowserGym Ecosystem page
+│   ├── webarena.html       # WebArena Evaluation page
+│   └── workarena.html      # WorkArena Benchmark page
+└── static/                 # Static assets
+    ├── css/                # Stylesheets
+    ├── js/                 # JavaScript files
+    └── images/             # Images and icons
+```
+
+## Features
+
+- **Responsive Design**: Built with Bulma CSS framework for mobile-friendly layouts
+- **Project Navigation**: Dropdown menu linking to individual project pages
+- **Academic Template**: Uses the popular academic project page template
+- **Interactive Elements**: Smooth scrolling, animations, and hover effects
+- **Multiple Projects**: Separate pages for BrowserGym, WebArena, and WorkArena
+- **Social Media Ready**: Includes meta tags for social sharing
+
+## Usage
+
+### Viewing Locally
+
+1. Open `index.html` in a web browser
+2. Navigate between project pages using the dropdown menu
+3. All links to external resources (GitHub, arXiv, etc.) are functional
+
+### Hosting
+
+This page can be hosted on:
+- GitHub Pages
+- Netlify
+- Vercel
+- Any static site hosting service
+
+### Customization
+
+1. **Update Content**: Edit the HTML files to update project information
+2. **Add Images**: Replace placeholder images in `static/images/`
+3. **Add Projects**: Create new HTML files in `projects/` directory
+4. **Styling**: Modify `static/css/index.css` for custom styling
+
+## Required Images
+
+The following images should be added to `static/images/`:
+
+1. **favicon.ico** - Site favicon (16x16 or 32x32 px)
+2. **agentlab_overview.png** - Main overview diagram for landing page
+3. **social_preview.png** - Social media preview image (1200x630 px)
+
+Current placeholder images are provided as SVG files.
+
+## Dependencies
+
+The page uses CDN links for:
+- Bulma CSS Framework
+- FontAwesome Icons
+- jQuery
+- Academic Icons
+
+No build process or installation required.
+
+## Project Pages
+
+### BrowserGym Ecosystem (`projects/browsergym.html`)
+- Paper: https://arxiv.org/abs/2412.05467
+- Code: https://github.com/ServiceNow/BrowserGym
+- Focus: Unified web agent research framework
+
+### WebArena Evaluation (`projects/webarena.html`)
+- Website: https://webarena.dev/
+- Setup: BrowserGym integration
+- Focus: 812 realistic web tasks
+
+### WorkArena Benchmark (`projects/workarena.html`)
+- Repository: https://github.com/ServiceNow/WorkArena
+- Focus: Enterprise-focused web agent evaluation
+- Levels: L1 (33 tasks), L2/L3 (341 tasks each)
+
+## Deployment
+
+### GitHub Pages (Automatic)
+
+The landing page is automatically deployed to GitHub Pages when changes are pushed to the main branch. The deployment is handled by the GitHub Actions workflow in `.github/workflows/deploy-landing-page.yml`.
+
+**Setup Steps:**
+
+1. Go to your GitHub repository settings
+2. Navigate to "Pages" in the left sidebar
+3. Under "Source", select "GitHub Actions"
+4. The site will be available at: `https://[username].github.io/AgentLab/`
+
+**Manual Trigger:**
+
+You can manually trigger the deployment by going to the "Actions" tab in your GitHub repository and running the "Deploy Landing Page to GitHub Pages" workflow.
+
+### Local Development Server
+
+For local testing:
+
+```bash
+cd docs/landing_page
+python3 -m http.server 8000
+# Visit http://localhost:8000
+```
+
+## Contributing
+
+To add a new project page:
+
+1. Create a new HTML file in `projects/` directory
+2. Use existing project pages as templates
+3. Update the dropdown menu in `index.html`
+4. Add a project card to the main landing page
+5. Include appropriate links and metadata
+
+## License
+
+This template follows the Academic Project Page Template license (Creative Commons Attribution-ShareAlike 4.0 International License).
+
+AgentLab is developed by ServiceNow Research and follows its respective licensing terms.
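
The new README previews the page locally with `python3 -m http.server 8000` run from inside `docs/landing_page`. When the preview has to be started from a script or a smoke test instead, the same standard-library handler can be pointed at the directory explicitly. A minimal sketch; `serve_directory` is a hypothetical helper, it binds to loopback only, and port 0 asks the OS for any free port:

```python
import functools
import http.server
import threading


def serve_directory(directory: str, port: int = 8000) -> http.server.ThreadingHTTPServer:
    """Serve `directory` over HTTP on 127.0.0.1 from a background thread."""
    # SimpleHTTPRequestHandler accepts the directory to serve as a keyword argument.
    handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory=directory)
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    # Daemon thread: it will not keep the interpreter alive on exit.
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

For example, `server = serve_directory("docs/landing_page")` makes the page available at `http://localhost:8000` without changing the working directory; `server.shutdown()` followed by `server.server_close()` stops it and releases the port.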
