# How to develop code with AI agents?

Developing with AI agents is similar to being a team lead or senior developer. You are responsible for researching the problem and exploring potential solutions, as well as for ensuring that the AI agents deliver exceptional quality by guiding them through code reviews and designing appropriate safeguards.

## What we aim to get out of the AI agents

- **Development Velocity**: Developing with AI agents is significantly faster. We want to be more ambitious about what we can ship in a given time frame.

- **Higher Codebase Quality**: With the agents being so fast at adding/refactoring code and writing documentation, all software projects can now attain a significantly higher codebase quality at all times. There are no excuses for not having:

  * Extremely well-factored code.
  * Comprehensive READMEs.
  * Fully-specified development environment (with support for GitHub Codespaces/devcontainers).
  * Great CI setup for tracking both correctness and performance over time.
  * Fully automated processes for shipping software with great release notes.

  Your primary goal as a software engineer in this new world is to develop your understanding of all aspects of software quality and to act as an uncompromising supervisor for the AI agents. Please read the following as a start:

  [Guide on Software Quality from OpenAI DeepResearch](https://chatgpt.com/share/68332152-0750-8009-846d-1e1dd017fac3#:~:text=Report%20content)

  All tech debt should be addressed immediately upon being recognised. You may argue that our human-centric understanding of quality is no longer relevant when AI agents write the software, but we are taking the safer bet: we want to keep the code understandable and easy to debug ourselves when things go wrong.

- **Extremely Well-Tested Software**: Having safety nets is crucial for making the agents effective. They work with limited context and they have to build their understanding of the code from scratch for every task.

  The agents will routinely produce incorrect code. The best way to get productivity out of them is to let them run through the compile-test-fix cycle in a sandbox environment, without interruption, all the way until they get working code ready for review.

  The throughput of the agents and the probability that they produce correct code then largely depend on two factors:

  * **Modularity of the software and its test suite**: The agents will work faster when they have clear instructions on how to test smaller components that fit in their context.
  * **Extent of test coverage**: We want only correct code to pass the tests. Thus, the test suite should have high [Mutation Coverage](https://en.wikipedia.org/wiki/Mutation_testing) (see the short Python sketch below, after this list).

  You have to find creative ways to design tests that cover the entirety of the software. With the speed of coding agents, certain tests that used to be too laborious to create are now practical. By relying on mocks, synthesized data, vision-enabled MCP servers and other emerging techniques, we should be able to overcome this challenge (the mocking sketch below shows the basic idea).

  Besides correctness, test suites should also detect regressions in dimensions such as performance, security and accessibility.

  To succeed, you must familiarize yourself with a wide range of techniques for detecting defects and other software quality issues. Exploring the following resources will help identify gaps in your knowledge:

  https://en.wikipedia.org/wiki/Static_program_analysis
  https://en.wikipedia.org/wiki/Software_testing
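
To make mutation coverage concrete, here is a minimal, self-contained Python sketch. The function and the "test suites" are invented purely for illustration; a mutation-testing tool such as `mutmut` generates mutants like this automatically. The point is that only a test exercising the boundary value kills the mutant:

```python
def is_adult(age: int) -> bool:
    """Original implementation."""
    return age >= 18


def is_adult_mutant(age: int) -> bool:
    """A typical mutant: the tool flips `>=` into `>`."""
    return age > 18


def weak_suite(impl) -> bool:
    # Only checks values far from the boundary, so the mutant passes too.
    return impl(30) is True and impl(5) is False


def strong_suite(impl) -> bool:
    # Checks the boundary value itself, so the mutant fails (is "killed").
    return impl(18) is True and impl(17) is False


if __name__ == "__main__":
    assert weak_suite(is_adult) and weak_suite(is_adult_mutant)          # mutant survives the weak suite
    assert strong_suite(is_adult) and not strong_suite(is_adult_mutant)  # mutant is killed by the strong suite
    print("Only the strong suite kills the mutant.")
```

A surviving mutant is a concrete, actionable signal that the test suite would also let an incorrect agent-written change slip through unnoticed.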
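
The mocking technique mentioned above can be equally small. In the hypothetical sketch below, the HTTP client is replaced by a `unittest.mock.Mock`, so the error-handling path is tested without any network access. The `PriceFeed` class and its API are invented for illustration, not taken from any of our projects:

```python
from unittest.mock import Mock


class PriceFeed:
    """Toy component that depends on an injected HTTP client."""

    def __init__(self, client):
        self.client = client

    def latest(self, symbol: str) -> float:
        response = self.client.get(f"/prices/{symbol}")
        if response.status_code != 200:
            raise RuntimeError(f"price feed unavailable for {symbol}")
        return float(response.json()["price"])


def test_latest_raises_on_server_error():
    # The mock stands in for a real HTTP client and simulates a 503 response.
    client = Mock()
    client.get.return_value = Mock(status_code=503)
    feed = PriceFeed(client)
    try:
        feed.latest("BTC")
    except RuntimeError:
        pass  # expected
    else:
        raise AssertionError("expected RuntimeError on a non-200 response")


if __name__ == "__main__":
    test_latest_raises_on_server_error()
    print("ok")
```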

## How does it work in practice right now?

Coding agents work best when you ask them to make small, incremental changes to your codebase. Please read the following guide to get the big picture:

https://harper.blog/2025/02/16/my-llm-codegen-workflow-atm/

We use our own workflow that records most of the interactions with the agents in the git history of the project, so you can learn from the prompting strategies of your teammates. See the README below for more details:

https://github.com/blocksense-network/agents-workflow

As an example, you can see the prompts that were used to develop `agents-workflow` itself [here](https://github.com/blocksense-network/agents-workflow/tree/main/.agents/tasks/2025/05).

Your project should have a defined workspace in Codex and a devcontainer setup that will be used when invoking agents locally (e.g. Codex CLI, GitHub Copilot, Goose, OpenHands, etc.).
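
As a rough starting point, a devcontainer definition can be as small as the sketch below. The base image, setup script and extension list are illustrative placeholders to adapt per project, not a prescribed configuration:

```jsonc
// .devcontainer/devcontainer.json (illustrative sketch only)
{
  "name": "my-project",
  "image": "mcr.microsoft.com/devcontainers/base:ubuntu",

  // Placeholder: run the project's own environment bootstrap script here.
  "postCreateCommand": "./scripts/setup-dev-env.sh",

  "customizations": {
    "vscode": {
      "extensions": ["github.copilot"]
    }
  }
}
```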

### Planning phase

1) Research

For quicker/simpler questions, have a conversation with a thinking model that has access to web search (e.g. Gemini 2.5 Pro, OpenAI gpt-5-thinking, Grok 4, etc.) and ask questions related to the problem that you are solving.

For deeper problems where a lot of details have to be considered, use Gemini Deep Research (available when you log in with your company email) or ChatGPT DeepResearch.

2) Create a Development Plan

TBD

### Implementation phase

1) Ask a thinking model to generate prompts according to the plan or start writing them yourself.

2) Submit tasks with `aw task`.