Discussion topics #4

@Sophist-UK

Description

These are really discussion topics, but Discussions aren't enabled in this repo. And apologies in advance that this is a bit of a brain dump.

I have my own vision of what I am looking for, which in simple terms is a pre-existing open-source combination of three functionalities:

  1. Agentic implementation of an SDLC - I am less bothered whether it is Waterfall or Agile, just that it is best practice and scripted for AI to follow. This is about the process of taking a one-line requirement, fleshing it out into a detailed requirements doc, turning that into a detailed technical design, and then planning how to break all of that into smaller steps that can be built and tested before bringing everything back together (recomposition) into the system as a whole.

  2. A workflow CYCLE that starts with a single piece of the decomposition and creates it using defined steps, typically the following:

    a. Analysis & design - take the requirements and technical design for this piece and work out how to deliver it using TDD. If it cannot be delivered and tested in a single step, the analysis step can decompose it into further smaller separate parts that are executed recursively. In essence, it defines the tests that will need to pass (all of which fail at this point, because the code that will pass them hasn't been written yet) and the code that will need to be written to make those tests pass.

    b. Build tests

    c. Verify that these tests all fail but that code reviews pass - loop back to b. to fix any issues, but back out and try again with a different strategy if they can't be fixed within a few cycles, or escalate for human input.

    d. Build code

    e. Verify that these tests, and e.g. automated code review checks, now all pass - loop back to d. to fix any issues, but back out and try again with a different strategy if they can't be fixed within a few cycles, or escalate for human input.

    f. Clean-up - anything that doesn't change the code: linting, documentation generation, git commits, reflection on whether there are lessons that can be learned and fed back into the process, etc.

    NOTE: This is only vaguely a Ralph Wiggum loop - it is, in the sense that it accepts the first run may not be perfect; but it isn't, in the sense that it uses several different strategies to avoid doom loops - because you cannot have a Dark Factory if you have any doom loops, and because any sort of loop can soak up resources and hugely increase AI costs.
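The bounded cycle above (steps b-f, with escalation) can be sketched as a small state machine. This is a minimal illustration, not any existing tool's API - all names (`run_cycle`, `Outcome`, `MAX_RETRIES`) are made up for the sketch:

```python
from enum import Enum

MAX_RETRIES = 3  # bound every loop so a doom loop cannot soak up AI budget


class Outcome(Enum):
    PASS = "pass"
    FAIL = "fail"


def run_cycle(piece, build_tests, verify_tests_fail, build_code,
              verify_tests_pass, escalate):
    """One workflow cycle for a single decomposed piece (steps b-e)."""
    # Steps b/c: build tests, verify they all fail (and reviews pass).
    for _ in range(MAX_RETRIES):
        build_tests(piece)
        if verify_tests_fail(piece) is Outcome.PASS:
            break  # tests correctly fail against the not-yet-written code
    else:
        # Bounded retries exhausted: back out / escalate rather than loop.
        return escalate(piece, "tests did not fail as expected")

    # Steps d/e: build code, verify all tests now pass.
    for _ in range(MAX_RETRIES):
        build_code(piece)
        if verify_tests_pass(piece) is Outcome.PASS:
            return "clean-up"  # proceed to step f
    return escalate(piece, "code never passed its tests")
```

The point of the sketch is that every loop has a hard bound and a defined exit (back out or escalate), so the cycle can never become a doom loop.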

    I have a few thoughts on how to make this workflow cycle efficient:

    A. AI should only be used where needed. If something can be done using normal scripts / algorithmic code, it should be, to conserve AI resources.

    B. Every step should produce structured output that can be used for control / decisions, and for auditing what was done and why. You shouldn't need to run an extra AI step just to make decisions - any inference needed to produce this structured output should be part of the step itself, so a separate decision-making AI call is avoided in the interests of efficiency. The structured output should include enumerated fields with a limited set of pre-defined, known values that tell the workflow engine what to run next.
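For illustration, one shape such structured output could take - an enum field with a closed vocabulary the engine routes on algorithmically, with no second AI call. Everything here (`StepResult`, `StepOutput`, `next_action`) is a hypothetical sketch, not a defined schema:

```python
from dataclasses import dataclass, field
from enum import Enum


class StepResult(Enum):
    # Closed, pre-defined vocabulary the workflow engine can branch on.
    SUCCESS = "success"
    RETRYABLE_FAILURE = "retryable_failure"
    NEEDS_HUMAN = "needs_human"


@dataclass
class StepOutput:
    step_name: str
    result: StepResult            # drives the next transition
    rationale: str                # audit trail: what was done and why
    artifacts: list = field(default_factory=list)


def next_action(output: StepOutput) -> str:
    """Purely algorithmic routing - no extra AI invocation needed."""
    return {
        StepResult.SUCCESS: "advance",
        StepResult.RETRYABLE_FAILURE: "retry",
        StepResult.NEEDS_HUMAN: "escalate",
    }[output.result]
```

Because the decision is a table lookup over an enum, the control flow is auditable and costs nothing in tokens.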

    C. Each AI invocation should aim to create perfection in the first shot - because IME having an AI do rework on generated code or documents is way way harder than having it do the generation in the first place. But if there has to be rework, then this needs a special workflow to make such rework efficient and avoid doom loops.

    D. We need tight context control to get great quality at each step. We need specialised workflows for different types of tasks and different steps in each task.

    E. To keep context under tight control, you need to avoid feeding the AI the entire code base each time. So you need a tool that can answer questions about the existing code base, about how one piece of the design is supposed to interface with another, etc. You probably want to record decisions so that later workflows have access to them. For each workflow you want to save the detailed output for later analysis, but when you do that analysis you might want to focus on the summaries first and only deep-dive when necessary. For all of these, *.md flat files don't feel like the right solution - you probably want some type of associative memory, i.e. graph or vector based. But equally, you need these memories to be reviewable by humans using a tool. And the key status of an individual workflow probably needs to be human-viewable from the start, and the AI probably needs to understand the full context of these files, so those are suitable for *.md.

    F. We will want excellent observability and reviewability - because both AI and human review of efficiencies and effectiveness will be absolutely key to reducing the proportion of human input required.

    G. If we want parallelism for agents then we will need to plan for it from the start, specifically: i) giving each parallel agent its own version of the codebase, ideally using an overlay file system or a file system with block cloning; ii) giving each parallel agent its own git branch; iii) having separate agent control files for each agent; etc.

    H. I am unclear exactly what memory is needed for this. I can see no reason why the control documentation couldn't be in human-readable .md files. Access to the code base, language details and coding standards should IMO be via tools. I have a gut feeling that a graph/vector memory would also be useful (though much more so if you are creating an AI assistant like OpenClaw), but I haven't identified a concrete need - and if you go looking, there are far more AI memory systems already written than you could imagine there would be.

    I. I suspect that some prompt / context engineering expertise will need to be applied to the workflows: to keep the AI focused, stop it hallucinating, find the optimum way for each workflow to create quality results, and avoid going past the point of maximum quality / value by "worrying" too much about code it has just written but not yet output, etc. I am as yet unclear whether different models will need different prompt styles and, if so, how this would be handled.

  3. Separately, a standardised set of tools (MCP, LSP, etc.) for specific types of coding, e.g. Laravel.

So we are separating out the agentic orchestration from the SDLC from the coding language.

I believe that this is what the DarkFactory is aiming to achieve - we have shared visions of what is needed, and I believe there are many, many other people out there who want the same things: some of whom are also trying to invent this by themselves, and some of whom want to spend their time creating their actual app, not creating the tools to create their app.

I believe that if everyone worked on a single solution together, we could get a result much faster.

I also believe that there are a lot of potential partial solutions out there if only you can find them.

For example, Pi.dev is a highly extensive agent orchestration infrastructure. But whilst it might provide a foundation, it isn't yet exactly what is needed.

There are several sets of workflows / agents / skills for doing some of this stuff, but none architected to work together across the whole lifecycle:

  • Pi.dev and its extensions
  • BMAD
  • Goose
  • Beads
  • GSD
  • OpenSpec
  • Claude skills and plugins
  • MCP servers
  • etc. etc. etc.

But right now the choice of these bits and pieces is overwhelming and they are not integrated into a whole - and each user needs to go through the selection and integration process themselves if they want an integrated environment.

TL;DR: How can the ChrisRoyse team and all the other people wanting something similar work together to create this once?

P.S. Pi.dev does seem to me to be a foundational base to build upon.
