Skip to content

Latest commit

 

History

History
224 lines (158 loc) · 10.9 KB

File metadata and controls

224 lines (158 loc) · 10.9 KB

Strands Example

Complexity: 🟨 Intermediate

A minimal example showcasing a Strands agent that answers questions about Strands documentation using a curated URL knowledge base and the native Strands http_request tool.

Note

The CLI optimize workflow at the end of this example can take 10-20 minutes to run.

Table of Contents

Key Features

  • Strands framework integration: Demonstrates support for Strands Agents in the NeMo Agent Toolkit.
  • AgentCore Integration: Demonstrates an agent that can be run on Amazon Bedrock AgentCore runtime.
  • Evaluation and Performance Metrics: Runs dataset-driven evaluation and performance analysis via nat eval.
  • Support for Model Providers: Configuration includes NIM, OpenAI, and AWS Bedrock options.

Prerequisites

Local Development Tools

  • uv with Python 3.11-3.13: Python environment manager. After installing uv, run: uv pip install setuptools setuptools-scm
  • git: Version control
  • git Large File Storage (LFS): For handling large files in the repository

NeMo Agent Toolkit

Follow the official NeMo Agent Toolkit installation guide

Or see the Install Guide for installing from source.

API Keys

API keys as required by your chosen models. See Set Up API Keys below.

Installation and Setup

Install NeMo Agent Toolkit and Workflow

This command installs the workflow along with its dependencies, including the Strands Agents SDK:

uv pip install -e . # at NeMo-Agent-Toolkit root
uv pip install -e examples/frameworks/strands_demo

Set Up API Keys

Note

The NVIDIA_API_KEY is required only when using NVIDIA-hosted NIM endpoints (default configuration). If you are using a self-hosted NVIDIA NIM or model with OAI compatible endpoint and a custom base_url specified in your configuration file (such as in examples/frameworks/strands_demo/configs/sizing_config.yml), you do not need to set the NVIDIA_API_KEY.

export NVIDIA_API_KEY=<YOUR_NVIDIA_API_KEY>

Optional: Set these only if you switch to different LLM providers in the config:

# For OpenAI models
export OPENAI_API_KEY=<YOUR_OPENAI_API_KEY>

# For AWS Bedrock models
export AWS_ACCESS_KEY_ID=<YOUR_AWS_ACCESS_KEY_ID>
export AWS_SECRET_ACCESS_KEY=<YOUR_AWS_SECRET_ACCESS_KEY>
export AWS_DEFAULT_REGION=us-east-1

Run the Workflow locally

The configs/ directory contains five ready-to-use configurations. Use the commands below.

1) Run the workflow (config.yml)

nat run --config_file examples/frameworks/strands_demo/configs/config.yml \
  --input "Use the provided tools and cite information about how to use the Strands API from the tool call results"

Expected Workflow Output The workflow produces a large amount of output, the end of the output should contain something similar to the following:

Workflow Result:
-----------------------------
Workflow Result:
['The provided information is about the Strands API and its usage. The Strands API is a platform for building conversational AI models, and it provides a range of tools and features for developers to create and deploy their own conversational AI models.\n\nTo use the Strands API, developers can start by creating an account on the Strands website and obtaining an API key. They can then use the API key to authenticate their requests to the Strands API.\n\nThe Strands API provides a range of endpoints for different tasks, such as creating and managing models, training and testing models, and deploying models to production. Developers can use these endpoints to build and deploy their own conversational AI models using the Strands API.\n\nIn addition to the API endpoints, the Strands API also provides a range of tools and features for developers, such as a model builder, a testing framework, and a deployment platform. These tools and features can help developers to build, test, and deploy their conversational AI models more efficiently and effectively.\n\nOverall, the Strands API is a powerful platform for building conversational AI models, and it provides a range of tools and features for developers to create and deploy their own conversational AI models.']
--------------------------------------------------

2) Evaluate accuracy and performance (eval_config.yml)

Runs the workflow over a dataset and computes evaluation and performance metrics. Refer to the evaluation and profiling guides in the documentation for more information.

nat eval --config_file examples/frameworks/strands_demo/configs/eval_config.yml

Note

If you hit rate limits, lower concurrency: --override eval.general.max_concurrency 1 Refer to the evaluation guide for more details on evaluation metrics and configuration options.

3) Optimize workflow parameters (optimizer_config.yml)

Automatically finds optimal LLM parameters (temperature, top_p, max_tokens) through systematic experimentation. The optimizer evaluates multiple parameter combinations across multiple trials and repetitions, balancing accuracy, groundedness, relevance, trajectory correctness, latency, and token efficiency.

nat optimize --config_file examples/frameworks/strands_demo/configs/optimizer_config.yml

What it optimizes:

  • temperature: Tests values from 0.0 to 0.6 (step: 0.2)
  • max_tokens: Tests values from 4096 to 8192 (step: 2048)

The optimizer runs a grid search with 3 repetitions each combination for statistical stability and generates a report showing the best parameter combination based on weighted multi-objective scoring.

Note

Optimization can take significant time. Reduce n_trials or adjust the search space in the config for faster experimentation. Refer to the optimizer guide for more details on optimization metrics and configuration options.

4) Determine GPU cluster sizing (sizing_config.yml)

Determines GPU cluster sizing requirements based on target users and workflow runtime. This configuration requires updating the base_url parameter to point to your self-hosted NVIDIA NIM or model with OAI compatible endpoint.

Step 1: Collect profiling data

First, update the base_url in examples/frameworks/strands_demo/configs/sizing_config.yml to point to your self-hosted NVIDIA NIM or model endpoint, then run the sizing profiler to collect performance metrics at different concurrency levels:

nat sizing calc --config_file examples/frameworks/strands_demo/configs/sizing_config.yml \
  --calc_output_dir /tmp/strands_demo/sizing_calc_run1/ \
  --concurrencies 1,2,4,8,16,32 \
  --num_passes 2

This command profiles the workflow at multiple concurrency levels (1, 2, 4, 8, 16, and 32 concurrent requests) with 2 passes for each level to establish baseline performance characteristics.

Step 2: Calculate GPU sizing for target workload

Use the profiling data to determine GPU requirements for your target user count and workflow runtime:

# For 100 concurrent users with 20-second target runtime
nat sizing calc --offline_mode \
  --calc_output_dir /tmp/strands_demo/sizing_calc_run1/ \
  --test_gpu_count 8 \
  --target_workflow_runtime 20 \
  --target_users 100

# For 25 concurrent users with 20-second target runtime
nat sizing calc --offline_mode \
  --calc_output_dir /tmp/strands_demo/sizing_calc_run1/ \
  --test_gpu_count 8 \
  --target_workflow_runtime 20 \
  --target_users 25

Parameters:

  • --offline_mode: Uses previously collected profiling data
  • --calc_output_dir: Directory containing the profiling results
  • --test_gpu_count: Number of GPUs used during profiling (8 in this example)
  • --target_workflow_runtime: Desired workflow completion time in seconds
  • --target_users: Number of concurrent users to support

The sizing calculator will output the recommended GPU count needed to meet your performance targets.

5) Test and serve AgentCore-compatible endpoints locally (agentcore_config.yml)

This configuration serves the workflow locally with the endpoints required by Amazon Bedrock AgentCore. This configuration is a general requirement for any workflow, regardless of whether it uses the Strands Agents framework.

nat serve --config_file examples/frameworks/strands_demo/configs/agentcore_config.yml

Test the endpoints:

In a separate terminal, verify the service is running with the health check endpoint:

curl http://localhost:8080/ping

Call the main workflow via the /invocations endpoint:

curl -X POST http://localhost:8080/invocations \
  -H "Content-Type: application/json" \
  -d '{"inputs": "What is the Strands agent loop?"}'

Next, to deploy the AgentCore-compatible NeMo Agent Toolkit workflow on Amazon Bedrock AgentCore, follow Running Strands with NeMo Agent Toolkit on AWS AgentCore.