Complexity: 🟨 Intermediate
A minimal example showcasing a Strands agent that answers questions about Strands documentation using a curated URL knowledge base and the native Strands http_request tool.
Note
The CLI optimize workflow at the end of this example can take 10-20 minutes to run.
- Strands framework integration: Demonstrates support for Strands Agents in the NeMo Agent Toolkit.
- AgentCore Integration: Demonstrates an agent that can be run on the Amazon Bedrock AgentCore runtime.
- Evaluation and Performance Metrics: Runs dataset-driven evaluation and performance analysis via nat eval.
- Support for Model Providers: Configuration includes NIM, OpenAI, and AWS Bedrock options.
- uv with Python 3.11-3.13: Python environment manager. After installing uv, run:
uv pip install setuptools setuptools-scm
- git: Version control
- git Large File Storage (LFS): For handling large files in the repository
Follow the official NeMo Agent Toolkit installation guide
Or see the Install Guide for installing from source.
API keys as required by your chosen models. See Set Up API Keys below.
This command installs the workflow along with its dependencies, including the Strands Agents SDK:
uv pip install -e . # at NeMo-Agent-Toolkit root
uv pip install -e examples/frameworks/strands_demo
Note
The NVIDIA_API_KEY is required only when using NVIDIA-hosted NIM endpoints (the default configuration). If you are using a self-hosted NVIDIA NIM, or a model behind an OpenAI-compatible endpoint, with a custom base_url specified in your configuration file (such as in examples/frameworks/strands_demo/configs/sizing_config.yml), you do not need to set the NVIDIA_API_KEY.
export NVIDIA_API_KEY=<YOUR_NVIDIA_API_KEY>
Optional: Set these only if you switch to different LLM providers in the config:
# For OpenAI models
export OPENAI_API_KEY=<YOUR_OPENAI_API_KEY>
# For AWS Bedrock models
export AWS_ACCESS_KEY_ID=<YOUR_AWS_ACCESS_KEY_ID>
export AWS_SECRET_ACCESS_KEY=<YOUR_AWS_SECRET_ACCESS_KEY>
export AWS_DEFAULT_REGION=us-east-1
The configs/ directory contains five ready-to-use configurations. Use the commands below.
nat run --config_file examples/frameworks/strands_demo/configs/config.yml \
--input "Use the provided tools and cite information about how to use the Strands API from the tool call results"
Expected Workflow Output
The workflow produces a large amount of output; the end of the output should contain something similar to the following:
Workflow Result:
-----------------------------
Workflow Result:
['The provided information is about the Strands API and its usage. The Strands API is a platform for building conversational AI models, and it provides a range of tools and features for developers to create and deploy their own conversational AI models.\n\nTo use the Strands API, developers can start by creating an account on the Strands website and obtaining an API key. They can then use the API key to authenticate their requests to the Strands API.\n\nThe Strands API provides a range of endpoints for different tasks, such as creating and managing models, training and testing models, and deploying models to production. Developers can use these endpoints to build and deploy their own conversational AI models using the Strands API.\n\nIn addition to the API endpoints, the Strands API also provides a range of tools and features for developers, such as a model builder, a testing framework, and a deployment platform. These tools and features can help developers to build, test, and deploy their conversational AI models more efficiently and effectively.\n\nOverall, the Strands API is a powerful platform for building conversational AI models, and it provides a range of tools and features for developers to create and deploy their own conversational AI models.']
--------------------------------------------------
Runs the workflow over a dataset and computes evaluation and performance metrics. Refer to the evaluation and profiling guides in the documentation for more information.
nat eval --config_file examples/frameworks/strands_demo/configs/eval_config.yml
Note
If you hit rate limits, lower concurrency: --override eval.general.max_concurrency 1
Refer to the evaluation guide for more details on evaluation metrics and configuration options.
Automatically finds optimal LLM parameters (temperature, top_p, max_tokens) through systematic experimentation. The optimizer evaluates multiple parameter combinations across multiple trials and repetitions, balancing accuracy, groundedness, relevance, trajectory correctness, latency, and token efficiency.
nat optimize --config_file examples/frameworks/strands_demo/configs/optimizer_config.yml
What it optimizes:
- temperature: Tests values from 0.0 to 0.6 (step: 0.2)
- max_tokens: Tests values from 4096 to 8192 (step: 2048)
The optimizer runs a grid search with 3 repetitions of each combination for statistical stability, then generates a report showing the best parameter combination based on weighted multi-objective scoring.
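To get a feel for why the optimizer takes a while, the size of the search grid can be worked out from the ranges listed above. The following is a minimal sketch, assuming both endpoints of each range are inclusive and that only temperature and max_tokens are searched; neither assumption is confirmed here, so check optimizer_config.yml for the actual search space. The variable names are illustrative only.

```shell
#!/bin/sh
# Assumption: ranges are inclusive at both ends; verify against optimizer_config.yml.

# Temperature is counted in tenths (0..6 step 2) to stay in integer arithmetic,
# which corresponds to 0.0..0.6 with step 0.2.
temperature_points=$(( (6 - 0) / 2 + 1 ))

# max_tokens: 4096..8192 with step 2048.
max_tokens_points=$(( (8192 - 4096) / 2048 + 1 ))

# Repetitions per combination, as stated in the text above.
repetitions=3

combinations=$(( temperature_points * max_tokens_points ))
total_runs=$(( combinations * repetitions ))

echo "temperature values: $temperature_points"
echo "max_tokens values:  $max_tokens_points"
echo "combinations:       $combinations"
echo "evaluation runs:    $total_runs"
```

Under these assumptions the grid has 12 combinations and 36 evaluation runs. If further parameters such as top_p are part of the search space, the total grows multiplicatively.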
Note
Optimization can take significant time. Reduce n_trials or adjust the search space in the config for faster experimentation.
Refer to the optimizer guide for more details on optimization metrics and configuration options.
Determines GPU cluster sizing requirements based on target users and workflow runtime. This configuration requires updating the base_url parameter to point to your self-hosted NVIDIA NIM or to a model behind an OpenAI-compatible endpoint.
Step 1: Collect profiling data
First, update the base_url in examples/frameworks/strands_demo/configs/sizing_config.yml to point to your self-hosted NVIDIA NIM or model endpoint, then run the sizing profiler to collect performance metrics at different concurrency levels:
nat sizing calc --config_file examples/frameworks/strands_demo/configs/sizing_config.yml \
--calc_output_dir /tmp/strands_demo/sizing_calc_run1/ \
--concurrencies 1,2,4,8,16,32 \
--num_passes 2
This command profiles the workflow at multiple concurrency levels (1, 2, 4, 8, 16, and 32 concurrent requests) with 2 passes for each level to establish baseline performance characteristics.
Step 2: Calculate GPU sizing for target workload
Use the profiling data to determine GPU requirements for your target user count and workflow runtime:
# For 100 concurrent users with 20-second target runtime
nat sizing calc --offline_mode \
--calc_output_dir /tmp/strands_demo/sizing_calc_run1/ \
--test_gpu_count 8 \
--target_workflow_runtime 20 \
--target_users 100
# For 25 concurrent users with 20-second target runtime
nat sizing calc --offline_mode \
--calc_output_dir /tmp/strands_demo/sizing_calc_run1/ \
--test_gpu_count 8 \
--target_workflow_runtime 20 \
--target_users 25
Parameters:
- --offline_mode: Uses previously collected profiling data
- --calc_output_dir: Directory containing the profiling results
- --test_gpu_count: Number of GPUs used during profiling (8 in this example)
- --target_workflow_runtime: Desired workflow completion time in seconds
- --target_users: Number of concurrent users to support
The sizing calculator will output the recommended GPU count needed to meet your performance targets.
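When comparing several target workloads, it can be convenient to generate the offline sizing commands from one place. The sketch below uses only the flags shown above; the helper name sizing_cmd, the CALC_DIR variable, and the extra user count of 50 are hypothetical and exist purely for illustration. The helper prints each command instead of running it, so it can be reviewed first.

```shell
#!/bin/sh
# Directory holding the profiling data collected in Step 1.
CALC_DIR=/tmp/strands_demo/sizing_calc_run1/

# Hypothetical helper: print the offline sizing command for one workload.
#   $1 = target concurrent users
#   $2 = target workflow runtime in seconds
sizing_cmd() {
    printf 'nat sizing calc --offline_mode --calc_output_dir %s --test_gpu_count 8 --target_workflow_runtime %s --target_users %s\n' \
        "$CALC_DIR" "$2" "$1"
}

# Print one command per target user count, all with a 20-second target runtime.
for users in 25 50 100; do
    sizing_cmd "$users" 20
done
```

To execute a generated command rather than print it, pipe it to a shell, for example sizing_cmd 100 20 | sh.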
This configuration serves the workflow locally with the endpoints required by Amazon Bedrock AgentCore. These endpoints are required for any workflow that targets AgentCore, regardless of whether it uses the Strands Agents framework.
nat serve --config_file examples/frameworks/strands_demo/configs/agentcore_config.yml
Test the endpoints:
In a separate terminal, verify the service is running with the health check endpoint:
curl http://localhost:8080/ping
Call the main workflow via the /invocations endpoint:
curl -X POST http://localhost:8080/invocations \
-H "Content-Type: application/json" \
-d '{"inputs": "What is the Strands agent loop?"}'
Next, to deploy the AgentCore-compatible NeMo Agent Toolkit workflow on Amazon Bedrock AgentCore, follow Running Strands with NeMo Agent Toolkit on AWS AgentCore.
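The two local endpoint checks shown earlier (/ping and /invocations) can be combined into a small script that waits for the server to come up before sending a request. This is a minimal sketch, not part of the toolkit: the function names wait_for_ping and invoke, the BASE_URL variable, and the retry count are hypothetical, and the script assumes curl is installed and that nat serve is listening on port 8080 as in the commands above.

```shell
#!/bin/sh
# Base URL of the locally served workflow; override via the environment if needed.
BASE_URL=${BASE_URL:-http://localhost:8080}

# Poll the health endpoint until it responds or the attempts run out.
#   $1 = maximum number of attempts, spaced one second apart
wait_for_ping() {
    attempts=$1
    while [ "$attempts" -gt 0 ]; do
        # -f: exit non-zero on HTTP errors; -s: suppress progress output
        if curl -fs "$BASE_URL/ping" >/dev/null 2>&1; then
            return 0
        fi
        attempts=$((attempts - 1))
        sleep 1
    done
    return 1
}

# Send one question to the workflow.
#   $1 = question text; no JSON escaping is done here, so it must not
#        contain double quotes or backslashes
invoke() {
    curl -fs -X POST "$BASE_URL/invocations" \
        -H "Content-Type: application/json" \
        -d "{\"inputs\": \"$1\"}"
}
```

A typical use, run in a second terminal after starting nat serve, would be: wait_for_ping 30 && invoke "What is the Strands agent loop?"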