In this framework, expert agents are created dynamically from a dictionary of the form `{AGENT_NAME: SPECIALTY}` (see `build_specialist_agents` in `agents.py`) at agent creation time. There is also a fallback mechanism: if the agent responsible for selecting an expert fails, the question is directed to the fallback agent as a backup plan. Once the question reaches the chosen agent, that agent generates the response, which is forwarded to the personality layer before being sent to the user.
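For illustration, a minimal sketch of that pattern (agent objects are stubbed as plain functions here; the real implementation lives in `build_specialist_agents` in `agents.py`):

```python
# Illustrative sketch only: the real logic lives in build_specialist_agents
# in agents.py. Agents are stubbed as plain functions for brevity.
from typing import Callable, Dict

def make_agent(specialty: str) -> Callable[[str], str]:
    # Stand-in for the real LLM-backed agent constructor.
    return lambda question: f"[{specialty}] answer to: {question}"

def build_specialist_agents(specialties: Dict[str, str]) -> Dict[str, Callable[[str], str]]:
    """Create one expert agent per {AGENT_NAME: SPECIALTY} entry."""
    return {name: make_agent(spec) for name, spec in specialties.items()}

agents = build_specialist_agents({
    "MAQUININHA": "POS terminals",
    "CONTA_DIGITAL": "digital accounts, Pix, boleto, cards",
})

def ask(question: str, chosen_agent: str) -> str:
    # If the router's choice is missing or invalid, fall back to a generalist.
    agent = agents.get(chosen_agent, make_agent("generalist fallback"))
    return agent(question)  # the result then goes through the personality layer
```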
Expert agents consult a FAISS vector store built from a set of web pages (see `URL_LIST` in `config.yaml`). This vector store is created at each initialization (`REBUILD_VECTOR_STORE=True` by default) within a few moments, so it contains the most recent information. To use static content instead, simply set `REBUILD_VECTOR_STORE=False` in `config.yaml`. For each input, k documents are retrieved from the vector store and used as context; the agent then uses this information to respond to the user's input.
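A rough sketch of that flow using LangChain utilities (package layout assumes the recent split into `langchain-community`, `langchain-huggingface`, and `langchain-text-splitters`; the embedding model name and URL are illustrative, and the real rebuild logic lives in `functions.py`):

```python
# Sketch of the vector-store rebuild and retrieval flow; see functions.py
# for the real implementation driven by config.yaml.
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import FAISS
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

URL_LIST = ["https://example.com/docs"]  # the real URLs come from config.yaml

# Load the pages and split them into chunks (CHUNK_SIZE / CHUNK_OVERLAP).
docs = WebBaseLoader(URL_LIST).load()
chunks = RecursiveCharacterTextSplitter(
    chunk_size=500, chunk_overlap=50
).split_documents(docs)

# Embed the chunks and persist the FAISS index (index.faiss / index.pkl).
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vs = FAISS.from_documents(chunks, embeddings)
vs.save_local("vector_store/vs_base")

# At query time, the k most similar chunks become the agent's context.
context = vs.similarity_search("Como posso receber pagamentos?", k=4)
```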
The following agents were defined:
- ROUTER: Receives the customer's message (input) and directs it to the most appropriate agent.
- GENERIC: Generic agent for InfinityPay.
- MAQUININHA: Specialist in POS terminals.
- ONLINE_COBRANCA: Specialist in online billing.
- PDV_ECOMMERCE: Specialist in POS and ecommerce.
- CONTA_DIGITAL: Specialist in digital accounts, Pix, boleto, cards, etc.
- WEB_SEACHER: Agent that answers questions using the internet.
- Fallback: Generalist agent, used when the ROUTER fails to direct the question to another agent.
- The project development deadline was SEVEN HOURS.
Faced with GPU restrictions for implementation, I preferred to use Small Language Models; after testing, their results were reasonable. In addition, agent swarm frameworks are quite bloated, and their documentation is often incomplete or inconsistent. Given this, I implemented the main functionality myself (swarm creation of agents, router, etc.) and used only the necessary utilities from LangChain.
The following subsections describe the project structure, how to use the application, how to access it, and the initial parameters of the application.
```
cloudwalk_swarm/
│
├── agents.py            # Main logic for creating and routing agents
├── api.py               # FastAPI HTTP endpoint exposing the agents
├── config.yaml          # Application configuration (models, chunk size, etc.)
├── functions.py         # Helper functions (e.g., vector store rebuilding)
├── requirements.txt     # Python dependencies list
├── Dockerfile           # Docker image build file
├── docker-compose.yml   # Container orchestration
├── .env                 # Environment variables (tokens, keys, configs)
│
├── models/              # Stores LLMs and embedding models
│   ├── models--Qwen--Qwen3-0.6B/
│   ├── models--google--gemma-2b-it/
│   └── etc.../
│
├── vector_store/        # Stores vector data
│   └── vs_base/         # Persistent FAISS vector database
│       ├── index.faiss
│       └── index.pkl
│
└── README.md            # Project documentation
```
- Requirements:
  - Python >= 3.9.
  - All dependencies are pinned to fixed versions in the `requirements.txt` file.
- Clone this repository into your local development environment.
- Install `uv` with pip: `pip install uv`.
- Create a `.env` file with the environment variables `HF_TOKEN` and `SERPAPI_API_KEY` (see the example after this list).
- Create your virtual environment with `uv venv .venv`.
- Initialize your virtual environment:
  - Linux/macOS: `source .venv/bin/activate`
  - Windows (cmd.exe): `.venv\Scripts\activate`
  - Windows (PowerShell): `.venv\Scripts\Activate.ps1`
- Install the required libraries with `uv pip install -r requirements.txt`.
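A minimal `.env` file for the step above (the values shown are placeholders, not real credentials):

```
HF_TOKEN=hf_your_token_here
SERPAPI_API_KEY=your_serpapi_key_here
```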
- Access the deployed version on Hugging Face Spaces.
- Initialize the API using the `uvicorn api:app --reload` command in the terminal.
- There's an interactive interface (Swagger UI) where you can submit questions to the `POST /ask` route (a minimal sketch of this endpoint appears after the request examples below). Example of input JSON:
```json
{
  "question": "Como posso receber pagamentos com maquininha?",
  "user_id": "user_123"
}
```
The answer will be a JSON object containing the agent workflow and the answer:
```json
{
  "response": "<personality layer response>",
  "source_agent_response": "<raw agent response>",
  "agent_workflow": [
    {
      "agent_name": "MAQUININHA",
      "tool_calls": {
        "MAQUININHA": "Você pode usar a maquininha com Wi-Fi ou chip..."
      }
    }
  ]
}
```
- To access via Postman, send an HTTP `POST` request to `http://127.0.0.1:8000/ask` with the following body:
```json
{
  "question": "Quais são os benefícios da conta digital?",
  "user_id": "usuario_123"
}
```
- To ask via terminal (curl), execute the command:
```bash
curl -X POST http://127.0.0.1:8000/ask \
  -H "Content-Type: application/json" \
  -d '{"question": "Quero saber sobre maquininhas", "user_id": "usuario_123"}'
```
- Build the image by executing `docker-compose build`, then initialize the API with `docker-compose up`. The API will be available at `http://localhost:8000/`;
- To watch the logs in real time, use `docker-compose logs -f`;
- To stop the container, use `docker-compose down`.
- Run the app locally with `python3 app-gradio.py`.
- LLM: The default model is `"Qwen/Qwen3-0.6B"`; to change it, just change the `LLM_MODEL` parameter in the `config.yaml` file.
- Vector Store: The `REBUILD_VECTOR_STORE` parameter, which controls whether the vector store (the knowledge base) is rebuilt, is set to `True` by default. That is, every time the application is deployed or started locally, the vector store is created and stored from scratch. To learn more, check the `functions.py` file.
- Other vector-store parameters, such as the source sites (`URL_LIST`), `CHUNK_SIZE`, and `CHUNK_OVERLAP`, can be found in the `config.yaml` file.
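Putting these parameters together, a `config.yaml` along these lines is expected (values shown are illustrative, not the repository's exact defaults):

```yaml
LLM_MODEL: "Qwen/Qwen3-0.6B"
REBUILD_VECTOR_STORE: true      # set to false to keep the existing index
URL_LIST:
  - "https://example.com/page-1"
  - "https://example.com/page-2"
CHUNK_SIZE: 500
CHUNK_OVERLAP: 50
```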
If you want to contribute to this project with improvements or suggestions, feel free to open a pull request.