This part of the library is under development. There are three kinds of agents: step agents, helpers, and managers.

The design is simple in that each agent responds to a state of error vs. success. In the case of a step agent, the return code determines whether to continue or try again. In the case of a helper, the input is typically an erroneous response (or something that needs changing) with respect to a goal.

For a manager, we are making a choice based on a previous erroneous step.
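
As a rough illustration of that return-code contract (a sketch of the idea only, not how the library drives retries internally), a step could be re-run from the shell until it exits zero:

```bash
# Illustrative only: fractale's step agents retry internally, feeding the error
# back to the model; this loop just shows the "return code decides" idea.
until fractale agent build lammps --environment "google cloud CPU" --outfile Dockerfile; do
    echo "step failed, trying again"
done
```
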
#### To do items

- Refactor the manager to not handle the prompt, just get the step when retries come back.
  - Then we need to decide how to handle the Kubernetes Job creating additional structures.
- Get a basic runner working.
- Add in the ability to get the log and optimize - the manager will need to use the goal.
- We likely want the manager to be able to edit the prompt.
  - Should it be provided with the entire prompt?
- When a pod is pending, it can be due to resource issues (and it will never start). Right now we will time out, but we should be able to catch that earlier (see the sketch after this list).

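
One possible way to catch the pending-for-resources case earlier than a timeout (an idea, not something the agent does today) is to watch for scheduling events:

```bash
# A pod that can never be scheduled (e.g., "Insufficient cpu") emits a
# FailedScheduling event long before any timeout fires.
kubectl get events --field-selector reason=FailedScheduling
kubectl describe pod <pod-name>   # the Events section explains why it is stuck
```
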
#### Ideas

- The manager agent is currently generating an updated prompt AND choosing the step.
- Arguably we should have a separation of responsibility so a step can ask to fix an error without a manager.
- I think we need one more level of agent - a step agent should have helper agents that can:
  - take an error message and analyze it to get a fix.

See [examples/agent](examples/agent) for an example, along with observations, research questions, ideas, and experiment brainstorming!

```bash
fractale agent build lammps --environment "google cloud CPU" --outfile Dockerfile --details "Ensure all globbed files from examples/reaxff/HNS from the root of the lammps codebase are in the WORKDIR. Clone the latest branch of LAMMPS."
```

Note that we are specific about the data and using CPU, which is something the builder agent would have to guess.

That might generate the [Dockerfile](Dockerfile) here, and a container that defaults to the application name "lammps".

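
For example, the generated Dockerfile could then be built locally and tagged with the application name (matching the `kind load` step in the next section):

```bash
# Build the generated Dockerfile and tag the image with the application name.
docker build -t lammps .
```
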
### Kubernetes Job

```bash
kind load docker-image lammps
```


To start, we will assume a kind cluster is running and tell the agent the image is loaded into it (and so the pull policy will be never).


```bash
fractale agent kubernetes-job lammps --environment "google cloud CPU" --context-file ./Dockerfile --no-pull --details "Run in.reaxff.hns in the pwd with lmp" --outfile ./job.yaml
```

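
A minimal sketch of submitting the generated job, assuming kubectl is pointed at the kind cluster and that the job is named after the application:

```bash
# Submit the generated job and follow its logs. The job name "lammps" is an
# assumption here; check the generated job.yaml for the actual name.
kubectl apply -f ./job.yaml
kubectl logs -f job/lammps
```
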
## With Cache

The same steps can be run using a cache. This will save to a deterministic path in the present working directory, which means you can run steps a la carte and then run a workflow later to re-use the context (and not wait again).

Note that when you save a cache, you often don't need to save the output file, because it will be the result in the context.

```bash
fractale agent build lammps --environment "google cloud CPU" --details "Ensure all globbed files from examples/reaxff/HNS from the root of the lammps codebase are in the WORKDIR. Clone the latest branch of LAMMPS." --use-cache
```

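
And a sketch of running the next step against that cache. This assumes (based on the description above) that the cached context replaces the need for `--context-file`, so treat the exact flags as illustrative:

```bash
# Re-use the cached build context from the previous step instead of passing the
# Dockerfile explicitly (an assumption based on the cache description above).
fractale agent kubernetes-job lammps --environment "google cloud CPU" --no-pull \
  --details "Run in.reaxff.hns in the pwd with lmp" --use-cache
```
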

Then try running with the manager (below) using the cache to see it being used.

## Manager

Let's run with a manager. Using a manager means we provide a plan along with a goal. The manager itself takes on a similar structure to a step agent, but it has a high-level goal. The manager will follow the high-level structure of the plan and step in when a step fails.


For this first design, we are taking an approach where we only re-assess the state and go back to a previous step given a failure of the last step. The assumption is that if a previous step fails, we keep trying until it succeeds. We only need to backtrack if the last step in a sequence is not successful, and it is due to failure at some earlier stage in the process. But I do think we have a few options:

1. Allow the manager to decide what to do on _every_ step (likely not ideal)
2. Allow step agents to execute until success, always (too much of an issue if a step is failing because of a dependency)
3. Allow step agents to execute until success unless a limit is set, and then let the manager take over (in other words, too many failures means we hand it back to the manager to look)

We haven't hit the case yet where the manager needs to take over - that needs further development, along with being goal oriented (e.g., parsing a log and getting an output).
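
As a rough sketch of option 3 above (illustrative shell pseudocode; `run_step` is a hypothetical stand-in for executing a step agent, not a real command):

```bash
# Let a step retry on its own up to a limit, then hand control back to the manager.
max_retries=5
attempt=0
until run_step; do
    attempt=$((attempt + 1))
    if [ "$attempt" -ge "$max_retries" ]; then
        echo "step keeps failing, handing back to the manager"
        break
    fi
done
```
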
## Notes
#### To do items

- Figure out the optimization agent (with some goal)

#### Research Questions
**And experiment ideas**

- How do we define stability?
- What are the increments of change (e.g., "adding a library")? We should be able to keep track of times for each stage and what changed, and an analyzer LLM can look at the result and understand (categorize) the most salient contributions to change.
- We can also measure the time it takes to do subsequent changes, when relevant. For example, if we are building, we should be able to use cached layers (and the build times speed up) if the LLM is changing content later in the Dockerfile.
- We can also save the successful results (Dockerfile builds, for example) and compare them for similarity. How consistent is the LLM? (A toy comparison is sketched after this list.)
- How does specificity of the prompt influence the result?
- For an experiment, we would want to do a build -> deploy and successful run for a series of apps and get distributions of attempts, reasons for failure, and a general sense of similarity / differences.
- For the optimization experiment, we'd want to do the same, but understand gradients of change that led to improvement.

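
As a toy example of that similarity comparison (the run directories are hypothetical), two successful results could be diffed directly:

```bash
# Line-level diff of the Dockerfiles produced by two successful runs; a small
# diff suggests the LLM is being consistent.
git diff --no-index run1/Dockerfile run2/Dockerfile
```
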
#### Observations

- Specifying CPU seems important - if you don't, it wants to use GPU.
- If you ask for a specific example, it sometimes tries to download data (tell it where the data is).
- There are issues that result from not enough information. E.g., if you don't tell it what to run or where the data is, it can only guess, and it will loop forever.
- As an example, we know where in a git clone the data of interest is. The LLM can only guess, so it's easier to tell it exactly.
- An LLM has no sense of time with respect to versions. For example, the reax data changed from reaxc to reaxff in the same path, and which you get depends on the clone. Depending on when the LLM was trained on how to build LAMMPS, it might select an older (or the latest) branch. Instead of a juggling or guessing game that (again) would result in an infinite loop, we need to tell it the branch and data file explicitly.
- Always include common issues in the initial prompt.
- If you are too specific about instance types, it adds node selectors/affinity, and that often doesn't work.
