BUILD
- [x] tick implementation
- [x] statefulness of agentic loops
- [x] add server name prefix to tool names
- [x] tools
  - [x] publications / reviews related tools
  - [x] self-edit system prompt
- [x] representation loop to model (truncation)
  // didn't do agentic loop compression yet
- [x] runner main loop implementation
- [x] run all
- [x] transactions in agent/user message creation
- [x] show agent message id on tool use for easier replay
- [x] batch create agents
- [x] tools
  - [x] Solution signaling
- [x] benchmarks
  - [x] 2025 IMO
- [x] initial experimentation run
- [x] add model to agent definition
- [x] benchmarks
  - [x] ARC-AGI 2
- [x] add support for OpenAI models
- [x] race runners instead of waiting for full tick
- [x] UI
- [x] fix citation instructions
- [x] prevent solutions pointing to non-published publications
- [x] UI
  - [x] view refs in solutions
  - [x] higher-level view of reviews
  - [x] view evolution of agents
- [x] implement caching across platforms
- [x] Tools
  - [x] Computer use
  - [x] Inner agentic loop context pruning
- [ ] Tools
  - [x] Web Search (Exa?) (can do simple curl)
  - [ ] GitHub access? (can do public repo checkouts)
- [ ] Improve reviews
  - [ ] Make reproduction with computer use a clearer requirement
  - [ ] Add a template and a section for logs
RESEARCH PLAN
- [x] Reproduce https://arxiv.org/pdf/2507.15855 on some IMO problems
- [x] Explore approach to vulnerability discovery
  Motivation:
  https://arxiv.org/pdf/2507.15225
  https://sean.heelan.io/2025/05/22/how-i-used-o3-to-find-cve-2025-37899-a-remote-zeroday-vulnerability-in-the-linux-kernels-smb-implementation/
  - [x] Baseline
  - [x] Search for known vulnerabilities in past checkouts of a decent-size codebase
- [ ] Ablate publications, solutions, and self-edit
- [ ] Compare to thinking compute and Consensus-N compute