Commit bdfe02f

Merge pull request #2 from BNLNPPS/infra/baseline-v11
Infrastructure improvements (baseline-v11)
2 parents ea5a771 + 046e206 commit bdfe02f

19 files changed, with 791 additions and 123 deletions

.gitignore

Lines changed: 10 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -196,4 +196,13 @@ cython_debug/
 # Supervisord logs and runtime files
 logs/
 *.pid
-supervisord.pid
+supervisord.pid
+
+# macOS
+.DS_Store
+.DS_Store?
+._*
+.Spotlight-V100
+.Trashes
+ehthumbs.db
+Thumbs.db

.vscode/mcp.json

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+{
+  "servers": {
+    "hf-mcp-server": {
+      "url": "https://huggingface.co/mcp",
+      "headers": {
+        "Authorization": "Bearer $HUGGINGFACE_API_KEY"
+      }
+    }
+  }
+}

CLAUDE-toplevel.md

Lines changed: 15 additions & 8 deletions
@@ -57,29 +57,33 @@ cd swf-testbed && ./run_all_tests.sh
 
 ### System Initialization
 ```bash
-cd swf-testbed
+cd $SWF_PARENT_DIR/swf-testbed
 source .venv/bin/activate
-pip install -e ../swf-common-lib ../swf-monitor .
-swf-testbed init
+pip install -e $SWF_PARENT_DIR/swf-common-lib $SWF_PARENT_DIR/swf-monitor .
+# CRITICAL: Set up Django environment
+cp $SWF_PARENT_DIR/swf-monitor/.env.example $SWF_PARENT_DIR/swf-monitor/.env
+# Edit .env to set DB_PASSWORD='your_db_password' and SECRET_KEY
+cd $SWF_PARENT_DIR/swf-monitor/src && python manage.py migrate
+cd $SWF_PARENT_DIR/swf-testbed && swf-testbed init
 ```
 
 ### Infrastructure Services
 ```bash
 # Start with Docker (recommended)
-cd swf-testbed && swf-testbed start
+cd $SWF_PARENT_DIR/swf-testbed && swf-testbed start
 
 # Or start locally (requires PostgreSQL/ActiveMQ installed)
-cd swf-testbed && swf-testbed start-local
+cd $SWF_PARENT_DIR/swf-testbed && swf-testbed start-local
 ```
 
 ### Testing
 ```bash
 # Test entire ecosystem
-cd swf-testbed && ./run_all_tests.sh
+cd $SWF_PARENT_DIR/swf-testbed && ./run_all_tests.sh
 
 # Test individual components
-cd swf-monitor && ./run_tests.sh
-cd swf-common-lib && ./run_tests.sh
+cd $SWF_PARENT_DIR/swf-monitor && ./run_tests.sh
+cd $SWF_PARENT_DIR/swf-common-lib && ./run_tests.sh
 ```
 
 ## Repository-Specific Guidance
@@ -151,6 +155,9 @@ Each repository contains its own CLAUDE.md with detailed, repository-specific gu
 ## Troubleshooting
 
 ### Common Issues
+- **Virtual Environment Persistence**: The shell environment, including the activated virtual environment, does **not** persist between `run_shell_command` calls. You **MUST** chain environment setup and the command that requires it in a single call.
+- **Correct**: `cd swf-testbed && source ./install.sh && cd ../swf-monitor && python3 src/manage.py migrate`
+- **Incorrect**: Running `source ./install.sh` in one call and `python3 src/manage.py migrate` in another.
 - **Core repository structure**: Ensure swf-testbed, swf-monitor, and swf-common-lib are siblings
 - **Environment variables**: Check SWF_HOME is set correctly (auto-configured by CLI)
 - **Database connections**: Verify PostgreSQL is running and accessible
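
The "Virtual Environment Persistence" bullet in this hunk can be illustrated with a minimal sketch. It assumes each `run_shell_command` call behaves like a fresh shell process. The helper `run_in_fresh_shell` and the variable `SWF_DEMO_FLAG` are hypothetical names used only for this illustration.

```python
import subprocess


def run_in_fresh_shell(command: str) -> str:
    """Run a command string in a brand-new shell and return its stdout."""
    result = subprocess.run(
        ["sh", "-c", command], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


# Two separate calls: the export happens in a shell that exits right away,
# so the second shell never sees the variable.
run_in_fresh_shell("export SWF_DEMO_FLAG=active")
separate = run_in_fresh_shell('echo "${SWF_DEMO_FLAG:-unset}"')

# One chained call: setup and use share the same shell process.
chained = run_in_fresh_shell(
    'export SWF_DEMO_FLAG=active && echo "${SWF_DEMO_FLAG:-unset}"'
)

print("separate calls:", separate)
print("chained call:  ", chained)
```

An activated virtual environment is, in effect, a set of exported variables such as `PATH`, so it is lost between calls in the same way.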

CLAUDE.md

Lines changed: 83 additions & 1 deletion
@@ -2,6 +2,47 @@
 
 This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
 
+## Critical Thinking Requirements
+
+Before implementing ANY solution, Claude must explain:
+
+1. **Complete Data Flow Analysis**
+- Where does data come from?
+- Where does it get stored?
+- Where does it get used?
+- What persists between runs?
+- What gets cached or reused?
+
+2. **Problem Definition**
+- What is the actual problem vs what I think it is?
+- What assumptions am I making?
+- What evidence do I have that my understanding is correct?
+
+3. **Solution Validation**
+- Why will this solution work?
+- What could go wrong?
+- How can I verify it worked?
+- What side effects might occur?
+
+## DO NOT CODE UNTIL:
+- You can trace the complete data flow
+- You can explain why the current behavior is happening
+- You can explain exactly what needs to change
+- You have stated all assumptions explicitly
+
+## Common Failure Patterns to Avoid:
+- Jumping to implementation without understanding the system
+- Assuming data behaves as expected without verification
+- Ignoring data persistence between script runs
+- Making changes without understanding their scope
+- Failing to clear cached/persistent data
+
+## When Stuck:
+1. Stop coding
+2. Explain what you think is happening
+3. Ask for verification of your understanding
+4. Only proceed when understanding is confirmed
+
 ## Development Environment
 
 ### Claude Code Setup
@@ -15,6 +56,9 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
 - `./run_tests.sh` - Run tests for swf-testbed only (uses pytest)
 - `./run_all_tests.sh` - Run tests across all swf-* repositories in parent directory
 - Tests are located in `tests/` directory and use pytest framework
+- **Auto-activation**: Test scripts automatically activate the virtual environment if needed
+- Just run `./run_all_tests.sh` directly - no manual setup required!
+- Scripts set up their own environment variables internally
 
 ### Testbed Management
 - `swf-testbed init` - Initialize environment (creates logs/ directory and supervisord.conf)
@@ -32,6 +76,18 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
 - `source .venv/bin/activate && pip install .[test]` - Install test dependencies
 - Virtual environment located at `.venv/` - ALWAYS activate before any Python commands
 
+**Initial Setup**
+- Run `source install.sh` once when setting up the development environment
+- This installs all dependencies and creates the virtual environment
+- After initial setup, test scripts handle their own environment activation
+
+**CRITICAL: Django .env Configuration Required**
+- Copy `.env.example` to `.env` in swf-monitor directory: `cp ../swf-monitor/.env.example ../swf-monitor/.env`
+- Update database password in `.env` to match Docker: `DB_PASSWORD='your_db_password'`
+- Set Django secret key: `SECRET_KEY='django-insecure-dev-key-for-testing-only-change-for-production-12345678901234567890'`
+- Run Django migrations: `cd ../swf-monitor/src && python manage.py migrate`
+- Without proper .env setup, Django tests will fail with authentication errors
+
 ## Architecture Overview
 
 ### Multi-Repository Structure
@@ -67,6 +123,10 @@ The system implements loosely coupled agents that communicate via ActiveMQ messa
 ### Multi-Repository Development
 - **Always use infrastructure branches**: `infra/baseline-v1`, `infra/baseline-v2`, etc. for all development
 - Create coordinated branches with same name across all affected repositories
+- **CRITICAL: Always push with `-u` flag on first push**: `git push -u origin branch-name`
+- This sets up branch tracking which is essential for VS Code and git status
+- Without `-u`, branches appear "unpublished" even after pushing
+- Example: `git push -u origin infra/baseline-v10`
 - Document specific features and changes through descriptive commit messages
 - Never push directly to main - always use branches and pull requests
 - Run `./run_all_tests.sh` before merging infrastructure changes
@@ -110,4 +170,26 @@ This maintenance should be part of any commit that involves adding, removing, or
 - **Rucio**: Distributed data management system
 - **ActiveMQ**: Message broker for agent communication
 - **PostgreSQL**: Database for monitoring and metadata storage
-- **supervisord**: Process management for Python agents
+- **supervisord**: Process management for Python agents
+
+## AI Development Guidelines
+
+### Directory Awareness (Critical for Claude)
+- **ALWAYS use $SWF_PARENT_DIR for navigation** - Never use relative paths like `../swf-monitor`
+- **ALWAYS run `pwd` before any file operations** - Claude frequently loses track of current directory
+- **NEVER assume your location** - explicitly verify with `pwd` at start of file access attempts
+- **Use absolute paths**: `cd $SWF_PARENT_DIR/swf-testbed` not `cd swf-testbed`
+- **For file operations**: Use `$SWF_PARENT_DIR/swf-monitor/.env` not `../swf-monitor/.env`
+- This is a recurring Claude issue that causes confusion and wasted time
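
The absolute-path rule in the "Directory Awareness" list can be sketched in Python as follows. The helper `repo_path` is a hypothetical name, not part of the testbed. The only thing taken from the guidance is that `SWF_PARENT_DIR` points at the directory holding the sibling `swf-*` repositories.

```python
import os
from pathlib import Path


def repo_path(repo: str, *parts: str) -> Path:
    """Build an absolute path under $SWF_PARENT_DIR, failing loudly if it is unset."""
    parent = os.environ.get("SWF_PARENT_DIR")
    if not parent:
        raise RuntimeError("SWF_PARENT_DIR is not set")
    # resolve() yields an absolute path that does not depend on the current directory
    return Path(parent).expanduser().resolve().joinpath(repo, *parts)
```

For example, `repo_path("swf-monitor", ".env")` names the same file regardless of where the calling process happens to be, which is the property the guidance asks for.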
+
+### Git Branch Management
+- **ALWAYS use `git push -u origin branch-name` on first push** - this is non-negotiable
+- After pushing, verify tracking with `git branch -vv` - should show `[origin/branch-name]`
+- If tracking is missing, fix immediately with: `git branch --set-upstream-to=origin/branch-name branch-name`
+- VS Code "Publish branch" button indicates missing tracking - this must be resolved
+
+### Commit and Push Workflow
+1. Create commits with descriptive messages including Claude Code attribution
+2. First push: `git push -u origin branch-name` (sets up tracking)
+3. Subsequent pushes: `git push` (tracking already established)
+4. Always verify tracking is set up correctly before proceeding
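
The tracking check in step 4 can also be scripted. The sketch below is one possible approach, not the project's tooling. It uses `git for-each-ref` rather than `git branch -vv`, because its output is easier to parse reliably. The function names `parse_tracking` and `list_tracking` are hypothetical.

```python
import subprocess
from typing import Dict, Optional


def parse_tracking(for_each_ref_output: str) -> Dict[str, Optional[str]]:
    """Map each local branch to its upstream, or None when no tracking is set.

    Expects one tab-separated "branch<TAB>upstream" pair per line, as produced
    by the format string used in list_tracking() below.
    """
    tracking: Dict[str, Optional[str]] = {}
    for line in for_each_ref_output.splitlines():
        if not line.strip():
            continue
        branch, _, upstream = line.partition("\t")
        tracking[branch] = upstream or None
    return tracking


def list_tracking(repo_dir: str) -> Dict[str, Optional[str]]:
    """Ask git which local branches in repo_dir have an upstream configured."""
    output = subprocess.run(
        [
            "git", "for-each-ref",
            # %09 is git's hex escape for a TAB character
            "--format=%(refname:short)%09%(upstream:short)",
            "refs/heads",
        ],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    ).stdout
    return parse_tracking(output)
```

Any branch that maps to `None` still needs `git push -u origin branch-name` or `git branch --set-upstream-to=...`.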

GEMINI.md

Lines changed: 61 additions & 0 deletions
@@ -0,0 +1,61 @@
+# Gemini Guidance
+
+This file provides critical operational guidance for the Gemini agent working within the SWF testbed ecosystem.
+
+## **CRITICAL: Command Execution in the Virtual Environment**
+
+**1. Virtual Environment Directory:**
+The virtual environment for this project is named `.venv` (a hidden directory), not `venv`. Always use this correct path.
+
+**2. Execution Method:**
+To ensure commands run reliably, you **MUST** use the full, absolute path to the python executable within the `.venv` directory. This is the most robust method and avoids issues with shell environment persistence.
+
+- **Python Executable Path:** `/Users/wenaus/github/swf-testbed/.venv/bin/python3`
+
+First, ensure all dependencies are installed by running the `install.sh` script once.
+```bash
+# Run this once to set up or update dependencies
+cd /Users/wenaus/github/swf-testbed && source ./install.sh
+```
+
+### Correct Procedure for Subsequent Commands:
+
+Directly execute commands using the venv's python.
+
+**Example: Running a Django migration in `swf-monitor`**
+```bash
+/Users/wenaus/github/swf-testbed/.venv/bin/python3 /Users/wenaus/github/swf-monitor/src/manage.py migrate
+```
+
+**Example: Running the Django development server**
+```bash
+/Users/wenaus/github/swf-testbed/.venv/bin/python3 /Users/wenaus/github/swf-monitor/src/manage.py runserver 8001 &
+```
+
+---
+
+## **CRITICAL: Checklist for Renaming Components**
+
+Renaming components has far-reaching side effects. A simple rename requires a systematic, multi-step check to ensure the application remains stable. The following checklist is based on recent failures and must be followed for any renaming task.
+
+**Example Scenario:** Renaming a view from `old_name` to `new_name`.
+
+1. **Rename the View Function:**
+* In `views.py`, change `def old_name(request):` to `def new_name(request):`.
+
+2. **Update URL Configuration (`urls.py`):**
+* **Update Import:** Change `from .views import old_name` to `from .views import new_name`.
+* **Update `path()`:** Change `path('...', old_name, ...)` to `path('...', new_name, ...)`.
+* **Update URL Name:** Change `name='old_name'` to `name='new_name'`. This is critical for template tags.
+* **Check URL Parameters:** Ensure any captured URL parameters (e.g., `<str:table_name>`) match the arguments in the new view function's signature.
+
+3. **Update Templates (`*.html`):**
+* **Find and Replace `{% url %}` tags:** Search all templates for `{% url 'monitor_app:old_name' %}` and replace it with `{% url 'monitor_app:new_name' %}`.
+* **Rename Template File:** If the view renders a template with a corresponding name (e.g., `old_name.html`), rename the file to `new_name.html`.
+
+4. **Global Code Search:**
+* Perform a project-wide search for the string `"old_name"` to find any other references in Python code, JavaScript, or comments.
+
+5. **Verification:**
+* **Run `manage.py check`:** This is the most important step. It will catch most `ImportError`, `NameError`, and `NoReverseMatch` issues without needing to run the server.
+* **Restart and Test:** Only after the check passes, restart the server and manually test the affected pages.
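
Step 4 of the checklist (the project-wide search) could be sketched as below. The function `find_references`, the suffix list, and the skipped directory names are all assumptions chosen for illustration. Adjust them to the file types and layout the project actually uses.

```python
from pathlib import Path
from typing import Iterator, Tuple

SEARCHED_SUFFIXES = {".py", ".html", ".js"}  # assumption: file types worth scanning
SKIPPED_DIRS = {".git", ".venv", "node_modules"}  # assumption: directories to ignore


def find_references(root: Path, name: str) -> Iterator[Tuple[Path, int, str]]:
    """Yield (file, line number, line text) for each line under root that mentions name."""
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in SEARCHED_SUFFIXES:
            continue
        # Skip anything that lives inside an ignored directory
        if any(part in SKIPPED_DIRS for part in path.relative_to(root).parts):
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if name in line:
                yield path, lineno, line.strip()
```

Running `list(find_references(Path("."), "old_name"))` after a rename should return an empty list. Any remaining hit is a reference the earlier steps missed.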

README.md

Lines changed: 21 additions & 5 deletions
@@ -612,6 +612,11 @@ cd swf-testbed && git checkout -b infra/baseline-v1
 cd ../swf-monitor && git checkout -b infra/baseline-v1
 cd ../swf-common-lib && git checkout -b infra/baseline-v1
 
+# CRITICAL: Push branches to origin immediately to make them available remotely
+cd swf-testbed && git push origin infra/baseline-v1
+cd ../swf-monitor && git push origin infra/baseline-v1
+cd ../swf-common-lib && git push origin infra/baseline-v1
+
 # Work freely across repositories
 # Commit frequently with descriptive messages
 # Let commit messages document the nature and progression of changes
@@ -631,18 +636,22 @@ For features that primarily affect a single repository:
 # Create feature branch in the primary repository
 git checkout -b feature/your-feature-name
 
+# CRITICAL: Push branch to origin immediately to make it available remotely
+git push origin feature/your-feature-name
+
 # Work, commit, and create pull request as normal
 # If cross-repo changes are needed, coordinate with infrastructure approach
 ```
 
 #### Development Guidelines
 
 1. **Never push directly to main** - Always use branches and pull requests
-2. **Coordinate cross-repo changes** - Use matching branch names for related work
-3. **Test system integration** - Run `./run_all_tests.sh` before merging infrastructure changes
-4. **Maintain test coverage** - As you add functionality, extend the tests to ensure `./run_all_tests.sh` reliably evaluates system integrity
-5. **Document through commits** - Use descriptive commit messages to explain the progression of work
-6. **Maintain sibling structure** - Keep all `swf-*` repositories as siblings in the same parent directory
+2. **Push branches to origin immediately** - Always run `git push origin branch-name` right after creating a branch to make it available across all development machines
+3. **Coordinate cross-repo changes** - Use matching branch names for related work
+4. **Test system integration** - Run `./run_all_tests.sh` before merging infrastructure changes
+5. **Maintain test coverage** - As you add functionality, extend the tests to ensure `./run_all_tests.sh` reliably evaluates system integrity
+6. **Document through commits** - Use descriptive commit messages to explain the progression of work
+7. **Maintain sibling structure** - Keep all `swf-*` repositories as siblings in the same parent directory
 
 #### Pull Request Process
 
@@ -655,6 +664,13 @@ git checkout -b feature/your-feature-name
 This workflow ensures that the testbed remains stable and integrated while
 allowing for rapid infrastructure development and feature additions.
 
+### Example Agent Implementations
+
+For developers looking to create new agents or understand how to interact with
+the testbed's messaging and API services, standalone examples are provided in
+the `example_agents/` directory. These provide a clear, modern blueprint for
+agent development.
+
 ## Participants
 
 At present the testbed is a project of the Nuclear and Particle Physics

current_issue.md

Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
+# Summary of Current Issue: `example_daqsim_agent.py` Hangs on Connection
+
+## 1. High-Level Goal
+
+The objective is to create a set of standalone, example agents in the `swf-testbed/example_agents/` directory. These agents should serve as a blueprint for real agents, communicating with the `swf-monitor` application via its REST API for logging/heartbeats and with ActiveMQ for messaging.
+
+We began by implementing the `example_daqsim_agent.py` as the first example.
+
+## 2. The Problem
+
+The `example_daqsim_agent.py` script hangs indefinitely when executed. The script successfully starts, but it never proceeds past the ActiveMQ connection logic, and no errors are printed to the console.
+
+## 3. Current State of the Code
+
+### `swf-monitor` Repository (`infra/baseline-v10` branch)
+- A REST API has been established at `/api/v1/`.
+- Endpoints exist for `/logs/` and `/systemagents/`.
+- The `/systemagents/heartbeat/` endpoint was created to allow agents to register/update their status.
+- The API requires token authentication. A user `gemini` with token `39a564f5d3a2952813affa2146b9f4f6587e5273` was created for testing.
+- The API and its authentication have been successfully tested with `curl`.
+
+### `swf-testbed` Repository (`infra/baseline-v10` branch)
+- A new directory `example_agents/` has been created.
+- `base_agent.py`: Contains a reusable `ExampleAgent` class to handle common logic.
+- `example_daqsim_agent.py`: A simple agent that inherits from `ExampleAgent`. Its purpose is to connect to ActiveMQ and (eventually) produce messages.
+- `requirements.txt`: Contains `requests` and `stomp.py`.
+
+## 4. Debugging Steps Taken & Results
+
+The following steps were taken to diagnose the hanging issue:
+
+1. **Initial Run (Background):** The script was run in the background. **Result:** The agent never appeared in the API's list of system agents. No logs were visible.
+2. **Foreground Run:** The script was run in the foreground to observe errors. **Result:** The script hangs silently with no output and must be manually interrupted.
+3. **Verify ActiveMQ Service:** Checked if the ActiveMQ Docker container was running using `docker ps`. **Result:** The container is running correctly.
+4. **Verify Port Accessibility:** Used `nc -zv localhost 61616` to check if the STOMP port was open. **Result:** The port is open and the connection succeeds, ruling out firewall or port mapping issues.
+5. **Add STOMP Heartbeats:** Modified `base_agent.py` to include `heartbeats=(10000, 10000)` in the `stomp.Connection` constructor, as a lack of heartbeating is a common cause of hangs. **Result:** The script still hangs.
+
+## 5. Current Hypothesis
+
+- The issue is not with the `swf-monitor` API or basic network connectivity.
+- The problem lies specifically within the `stomp.py` connection logic in `base_agent.py`.
+- The hang occurs at the `self.conn.connect(self.mq_user, self.mq_password, wait=True)` line.
+- Since heartbeats did not solve it, the issue is likely a more subtle STOMP protocol-level problem (e.g., a version mismatch, an issue with vhost, or another parameter disagreement between the client and the Artemis broker) that is causing the handshake to never complete.
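
One way to test the handshake hypothesis without `stomp.py` is to send a raw STOMP `CONNECT` frame over a plain socket with a timeout. The sketch below uses only the standard library. The function names `build_connect_frame` and `probe_stomp` are hypothetical, and the header set is a minimal assumption based on the STOMP 1.2 frame layout, not on what `base_agent.py` sends.

```python
import socket


def build_connect_frame(host: str, login: str, passcode: str) -> bytes:
    """Encode a STOMP CONNECT frame: command, headers, blank line, NUL terminator."""
    headers = {
        "accept-version": "1.0,1.1,1.2",
        "host": host,
        "login": login,
        "passcode": passcode,
    }
    lines = ["CONNECT"] + [f"{key}:{value}" for key, value in headers.items()]
    return ("\n".join(lines) + "\n\n").encode("utf-8") + b"\x00"


def probe_stomp(host: str, port: int, login: str, passcode: str,
                timeout: float = 5.0) -> str:
    """Send CONNECT and return the command line of the broker's reply frame.

    Raises socket.timeout if the broker does not answer within `timeout` seconds,
    so a stalled handshake surfaces as an error instead of a silent hang.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_connect_frame(host, login, passcode))
        reply = b""
        while b"\x00" not in reply:
            chunk = sock.recv(4096)
            if not chunk:  # broker closed the connection
                break
            reply += chunk
    first_line = reply.decode("utf-8", errors="replace").lstrip("\r\n").split("\n", 1)[0]
    return first_line.strip()
```

A reply of `CONNECTED` means the handshake completes at the protocol level; `ERROR` or a timeout narrows the problem down. Comparing `probe_stomp("localhost", 61613, ...)` with `probe_stomp("localhost", 61616, ...)` may be informative, since the `docker-compose.yml` change in this commit labels 61613 as the STOMP port. Whether 61616 also accepts STOMP depends on the broker's acceptor configuration, which is not shown here.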
+
+## 6. Last Proposed Action
+
+My last action before being stopped was to propose enabling the `stomp.py` library's internal debug logging to print the raw STOMP frames being sent and received. This would provide a low-level view of the handshake and reveal exactly where it is failing.
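
The proposed action might look like the sketch below. It assumes `stomp.py` reports through Python's standard `logging` module, and that its logger is named `"stomp.py"`. Both points should be checked against the installed version. Setting the root logger to `DEBUG` as well means records are still shown if the library uses a different logger name.

```python
import logging


def enable_stomp_debug_logging(logger_name: str = "stomp.py") -> logging.Logger:
    """Route DEBUG-level records from the named logger (and all others) to stderr."""
    logging.basicConfig(
        level=logging.DEBUG,
        format="%(asctime)s %(name)s %(levelname)s %(message)s",
        force=True,  # replace any handlers configured earlier (Python 3.8+)
    )
    # logger_name is an assumption; the root-level setting above is the fallback
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.DEBUG)
    return logger
```

Calling `enable_stomp_debug_logging()` before the connection object is created and `connect(...)` is invoked gives the library a chance to log each step of the handshake attempt.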

docker-compose.yml

Lines changed: 2 additions & 1 deletion
@@ -4,7 +4,8 @@ services:
     image: apache/activemq-artemis:latest-alpine
     ports:
       - "8161:8161" # Web console
-      - "61616:61616" # Broker port
+      - "61616:61616" # Core protocol port
+      - "61613:61613" # STOMP protocol port
     environment:
       - ARTEMIS_USER=admin
       - ARTEMIS_PASSWORD=admin
