Commit dd9f327

[ate]: Enable client-side load balancing and update docs
This change introduces gRPC client-side load balancing to the ATE client library and documents its usage.

ATE Client Library:
- The client options now accept a `pa_target` in gRPC name-syntax format (e.g., "ipv4:host1:port,host2:port") to enable connecting to multiple server instances.
- A `load_balancing_policy` option has been added to allow callers to select policies like "round_robin".
- The client creation logic was updated to use `grpc::CreateCustomChannel` to apply the specified load balancing configuration.
- Test programs and integration test scripts were updated to use the new `--pa_target` and `--load_balancing_policy` flags.

Documentation (`docs/ate.md`):
- Added a section on "Client-Side Load Balancing and Failover," explaining how to configure it and what to expect during partial and total server outages.
- Added a "Client Lifecycle and Resource Management" section outlining best practices for creating and destroying the client instance.
- Added a "Monitoring and Debugging" section that instructs users on how to enable gRPC tracing via environment variables to debug connection health and load balancer behavior.

Signed-off-by: Miguel Osorio <miguelosorio@google.com>
1 parent 65d370d commit dd9f327

8 files changed: +200 -39 lines changed


docs/ate.md

Lines changed: 132 additions & 18 deletions
Original file line number | Diff line number | Diff line change
@@ -1,35 +1,149 @@
11
# Automated Test Equipment (ATE) Client
22

3-
Standard ATE is used in various silicion manufacturing stages, as well as
4-
for device provisioning. The ATE system generally runs on a PC system running
5-
the Windows operating system.
3+
The Automated Test Equipment (ATE) client library and associated test programs
4+
are used to drive provisioning flows for OpenTitan devices. The client
5+
communicates with one or more
6+
[Provisioning Appliance (PA)](https://github.com/lowRISC/opentitan-provisioning/wiki/pa)
7+
servers to perform secure provisioning operations.
68

7-
The ATE client connects to the [Provisioning Appliance](https://github.com/lowRISC/opentitan-provisioning/wiki/pa) to perform
8-
provisioning operations.
9+
## Client-Side Load Balancing and Failover
910

10-
## Developer Notes
11+
The ATE client library supports gRPC client-side load balancing, allowing it
12+
to distribute requests across multiple Provisioning Appliance (PA) server
13+
instances. This enhances reliability and scalability.
1114

12-
## Run ATE Client (Linux)
15+
### Enabling Load Balancing
1316

14-
Run the following steps before proceeding.
17+
To enable load balancing, you must provide a list of server addresses in a
18+
gRPC-compliant format via the `--pa_target` command-line argument when running
19+
a test program (e.g., `cp` or `ft`).
1520

16-
* Generate [enpoint certificates](https://github.com/lowRISC/opentitan-provisioning/wiki/auth#endpoint-certificates).
17-
* Start [PA server](https://github.com/lowRISC/opentitan-provisioning/wiki/pa#start-pa-server).
21+
* **Target URI Format**: The target should be specified using gRPC's
22+
name-syntax.
23+
* For IPv4: `ipv4:<ip_addr1>:<port1>,<ip_addr2>:<port2>,...`
24+
* For IPv6: `ipv6:[<ip_addr1>]:<port1>,[<ip_addr2>]:<port2>,...`
1825

19-
Take note of the PA server target address and port number. In the following
20-
command we start the client pointing to `localhost:5001`.
26+
Example: `--pa_target="ipv4:10.0.0.1:50051,10.0.0.2:50051"`
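The target format above can be assembled programmatically. The following is a minimal sketch, assuming a helper named `BuildPaTarget` (hypothetical, not part of the ATE library) that joins already-formatted `address:port` entries under a single scheme. It does not validate the addresses, and IPv6 entries are assumed to be bracketed by the caller.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not part of the ATE client library.
// Joins "address:port" entries into a gRPC name-syntax target such as
// "ipv4:10.0.0.1:50051,10.0.0.2:50051". IPv6 entries must already be
// bracketed, e.g. "[::1]:5000". Returns an empty string for an empty list.
std::string BuildPaTarget(const std::string& scheme,
                          const std::vector<std::string>& endpoints) {
  if (endpoints.empty()) {
    return std::string();
  }
  std::string target = scheme + ":";
  for (std::size_t i = 0; i < endpoints.size(); ++i) {
    if (i > 0) {
      target += ",";  // Entries are comma-separated with no whitespace.
    }
    target += endpoints[i];
  }
  return target;
}
```

The resulting string could then be passed as the value of `--pa_target` or assigned to the `pa_target` option.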
27+
28+
### Load Balancing Policies
29+
30+
You can select a load balancing policy using the `--load_balancing_policy`
31+
argument. If unspecified, gRPC's default (`pick_first`) is used.
32+
33+
* `pick_first` (Default): The client attempts to connect to the first
34+
address in the list. All RPCs are sent to this single server. If the
35+
connection fails, it will try the next address in the list. This policy
36+
provides basic failover but does not distribute load.
37+
* `round_robin`: The client connects to all servers in the list and
38+
distributes RPCs across them in a round-robin fashion. This policy
39+
provides both load balancing and high-availability failover.
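To make the difference between the two policies concrete, here is a toy model of endpoint selection. It is only an illustration of the behavior described above and is not how gRPC implements these policies internally; the names `Endpoint`, `PickFirst`, and `RoundRobinPicker` are invented for this sketch.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy model only. All names here are hypothetical.
struct Endpoint {
  std::string address;
  bool healthy;
};

// pick_first: use the first healthy endpoint in list order.
// Returns an empty string when no endpoint is healthy.
std::string PickFirst(const std::vector<Endpoint>& pool) {
  for (const Endpoint& e : pool) {
    if (e.healthy) return e.address;
  }
  return std::string();
}

// round_robin: rotate through the pool, skipping unhealthy endpoints.
class RoundRobinPicker {
 public:
  explicit RoundRobinPicker(std::vector<Endpoint> pool)
      : pool_(std::move(pool)), next_(0) {}

  void SetHealthy(std::size_t index, bool healthy) {
    pool_.at(index).healthy = healthy;
  }

  // Returns the next healthy address, or an empty string on total outage.
  std::string Pick() {
    for (std::size_t tried = 0; tried < pool_.size(); ++tried) {
      const Endpoint& e = pool_[next_];
      next_ = (next_ + 1) % pool_.size();
      if (e.healthy) return e.address;
    }
    return std::string();
  }

 private:
  std::vector<Endpoint> pool_;
  std::size_t next_;
};
```

In this model, `PickFirst` keeps returning the same address until that endpoint is marked unhealthy, while `RoundRobinPicker::Pick` spreads consecutive calls across every healthy endpoint.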
40+
41+
### Failover Scenarios
42+
43+
The behavior of the client during server outages depends on the configured
44+
policy.
45+
46+
* **Partial Outage (with `round_robin`)**: If one server in the pool becomes
47+
unavailable, the gRPC runtime will automatically detect the failed
48+
connection and temporarily remove it from the pool of healthy endpoints.
49+
Subsequent API calls will be transparently routed to the remaining healthy
50+
servers. From the caller's perspective, the operations will continue to
51+
succeed, although an RPC already in flight on the failed connection may still return an error.
52+
53+
* **Total Outage**: If all server endpoints become unavailable, any API call
54+
made through the library will fail.
55+
* The C API functions (e.g., `InitSession`, `DeriveTokens`) will return a
56+
non-zero status code. This code will correspond to the gRPC status
57+
code `UNAVAILABLE` (14).
58+
* Callers must check the return value of every function call to handle
59+
this scenario gracefully. A persistent failure with this status code
60+
indicates that the client cannot reach any of the configured
61+
provisioning servers.
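One way to tell a brief disruption from a persistent outage is to retry a bounded number of times before giving up. The sketch below assumes a hypothetical helper, `CallWithRetry`, that wraps any callable returning an integer status; `kGrpcUnavailable` is a local constant for the `UNAVAILABLE` (14) code mentioned above. Whether a given provisioning operation is safe to repeat is not covered here and should be checked per operation.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Local constant for the gRPC status code UNAVAILABLE (14).
constexpr int kGrpcUnavailable = 14;

// Hypothetical helper, not part of the ATE client library.
// Invokes `op` until it returns anything other than UNAVAILABLE or until
// `max_attempts` calls have been made, doubling the wait between attempts.
// Returns the status of the last call (UNAVAILABLE if `op` was never run).
int CallWithRetry(const std::function<int()>& op, int max_attempts,
                  std::chrono::milliseconds initial_backoff) {
  int status = kGrpcUnavailable;
  std::chrono::milliseconds backoff = initial_backoff;
  for (int attempt = 1; attempt <= max_attempts; ++attempt) {
    status = op();
    if (status != kGrpcUnavailable) {
      return status;  // Success, or an error that retrying will not fix.
    }
    if (attempt < max_attempts) {
      std::this_thread::sleep_for(backoff);
      backoff *= 2;
    }
  }
  return status;
}
```

A caller would wrap an individual API call in a lambda and treat a final `UNAVAILABLE` result as the persistent failure described above.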
62+
63+
## Client Lifecycle and Resource Management
64+
65+
The ATE client is designed to be a long-lived object that manages the
66+
underlying gRPC channel, including all network connections and load balancing
67+
state. To ensure optimal performance and efficient resource use, follow these
68+
best practices.
69+
70+
### Singleton Client Instance
71+
72+
It is strongly recommended to treat the `ate_client_ptr` as a singleton within
73+
your application. You should call `CreateClient` once when your program
74+
initializes and reuse that same client instance for all subsequent gRPC calls.
75+
76+
Repeatedly calling `CreateClient` and `DestroyClient` for different operations
77+
is an anti-pattern. Each call to `CreateClient` initializes a new gRPC channel,
78+
which involves setting up new TCP connections, performing TLS handshakes (if
79+
enabled), and resolving server addresses. This process is computationally
80+
expensive and introduces significant latency.
81+
82+
### When to Call `DestroyClient`
83+
84+
The `DestroyClient` function should only be called when you are certain that no
85+
more gRPC calls will be made for the remainder of the program's lifetime,
86+
typically during application shutdown. Calling `DestroyClient` will tear down
87+
all underlying network connections, and any subsequent attempt to use the
88+
client instance will result in an error.
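In C++ callers, the create-once, destroy-at-shutdown pattern can be expressed with a small scope guard. This is a sketch under stated assumptions: `ScopedClient` is a hypothetical wrapper, `Handle` stands in for `ate_client_ptr`, and the `destroy` callable is expected to wrap `DestroyClient`. The real function signatures live in `ate_api.h` and are not reproduced here.

```cpp
#include <functional>
#include <utility>

// Hypothetical wrapper, not part of the ATE client library.
// Owns one client handle and releases it exactly once at end of scope.
template <typename Handle>
class ScopedClient {
 public:
  ScopedClient(Handle handle, std::function<void(Handle)> destroy)
      : handle_(handle), destroy_(std::move(destroy)) {}

  // Tear down the client when the owner goes out of scope, typically at
  // application shutdown.
  ~ScopedClient() {
    if (destroy_) destroy_(handle_);
  }

  // Non-copyable so two owners can never destroy the same client.
  ScopedClient(const ScopedClient&) = delete;
  ScopedClient& operator=(const ScopedClient&) = delete;

  // Borrow the handle for API calls; ownership stays with this object.
  Handle get() const { return handle_; }

 private:
  Handle handle_;
  std::function<void(Handle)> destroy_;
};
```

Creating one such object near the start of `main` and passing `get()` to every call keeps a single channel alive for the life of the program.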
89+
90+
## Monitoring and Debugging
91+
92+
While the ATE client library does not expose a direct API to query the health
93+
of individual server endpoints, it is possible to monitor the underlying gRPC
94+
channel's behavior using gRPC's built-in tracing capabilities. This is an
95+
effective method for debugging connection issues and observing the load
96+
balancer's real-time behavior.
97+
98+
### Enabling gRPC Tracing
99+
100+
You can enable detailed logging by setting environment variables in your shell
101+
before launching the application that uses the ATE client library.
102+
103+
```bash
104+
# Enable tracing for connectivity state, resolvers, and load balancing
105+
export GRPC_TRACE=connectivity_state,resolver,load_balancer
106+
107+
# Set the logging verbosity for maximum detail
108+
export GRPC_VERBOSITY=DEBUG
109+
```
110+
111+
### Interpreting the Output
112+
113+
When tracing is enabled, the gRPC runtime will print detailed logs to `stderr`.
114+
If a server in the load balancing pool becomes unavailable, you will see log
115+
entries showing the subchannel's state changing from `READY` to `CONNECTING`
116+
and then to `TRANSIENT_FAILURE`. When the server becomes available again, the
117+
logs will show the state transitioning back to `READY`.
118+
119+
This provides a definitive, real-time view of the connection health from the
120+
client's perspective and is a useful tool in active debugging sessions.
121+
122+
## Running an ATE Test Program
123+
124+
Before running, ensure you have:
125+
* Generated the required
126+
[endpoint certificates](https://github.com/lowRISC/opentitan-provisioning/wiki/auth#endpoint-certificates).
127+
* Started one or more
128+
[PA servers](https://github.com/lowRISC/opentitan-provisioning/wiki/pa#start-pa-server).
129+
130+
The following example shows how to run the `cp` test program with load
131+
balancing enabled against two PA servers.
21132

22133
```console
23-
bazelisk build //src/ate:ate_main
24-
bazel-bin/src/ate/ate_main \
25-
--target=localhost:5001 \
134+
# The specific test program can be :cp or :ft
135+
bazelisk run //src/ate/test_programs:cp -- \
136+
--pa_target="ipv4:127.0.0.1:5001,127.0.0.1:5002" \
137+
--load_balancing_policy="round_robin" \
26138
--enable_mtls \
27139
--client_key=$(pwd)/config/certs/out/ate-client-key.pem \
28140
--client_cert=$(pwd)/config/certs/out/ate-client-cert.pem \
29-
--ca_root_certs=$(pwd)/config/certs/out/ca-cert.pem
141+
--ca_root_certs=$(pwd)/config/certs/out/ca-cert.pem \
142+
--sku="sival" \
143+
--sku_auth_pw="test_password"
30144
```
31145

32146
## Read More
33147

34-
* [Provisioning Appliance](https://github.com/lowRISC/opentitan-provisioning/wiki/pa)
35-
* [Documentation index](https://github.com/lowRISC/opentitan-provisioning/wiki/Home)
148+
* [Provisioning Appliance](https://github.com/lowRISC/opentitan-provisioning/wiki/pa)
149+
* [Documentation index](https://github.com/lowRISC/opentitan-provisioning/wiki/Home)

run_integration_tests.sh

Lines changed: 2 additions & 2 deletions
@@ -67,7 +67,7 @@ for OTSKU in "${FPGA_SKUS[@]}"; do
6767
--client_cert="${DEPLOYMENT_DIR}/certs/out/ate-client-cert.pem" \
6868
--client_key="${DEPLOYMENT_DIR}/certs/out/ate-client-key.pem" \
6969
--ca_root_certs=${DEPLOYMENT_DIR}/certs/out/ca-cert.pem \
70-
--pa_socket="ipv4:${OTPROV_IP_PA}:${OTPROV_PORT_PA}" \
70+
--pa_target="ipv4:${OTPROV_IP_PA}:${OTPROV_PORT_PA}" \
7171
--sku="${OTSKU}" \
7272
--sku_auth_pw="test_password" \
7373
--fpga="${FPGA}" \
@@ -82,7 +82,7 @@ for OTSKU in "${FPGA_SKUS[@]}"; do
8282
--client_cert="${DEPLOYMENT_DIR}/certs/out/ate-client-cert.pem" \
8383
--client_key="${DEPLOYMENT_DIR}/certs/out/ate-client-key.pem" \
8484
--ca_root_certs=${DEPLOYMENT_DIR}/certs/out/ca-cert.pem \
85-
--pa_socket="ipv4:${OTPROV_IP_PA}:${OTPROV_PORT_PA}" \
85+
--pa_target="ipv4:${OTPROV_IP_PA}:${OTPROV_PORT_PA}" \
8686
--sku="${OTSKU}" \
8787
--sku_auth_pw="test_password" \
8888
--fpga="${FPGA}" \

src/ate/ate_api.h

Lines changed: 13 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -78,6 +78,9 @@ enum {
7878
* provisioning_data.h in the lowRISC/opentitan repo.
7979
*/
8080
kPersoBlobMaxSize = 8192,
81+
82+
/** Maximum length of an endpoint address string. */
83+
kEndpointAddressMaxSize = 256,
8184
};
8285

8386
/**
@@ -87,9 +90,16 @@ typedef struct {
8790
} * ate_client_ptr;
8891

8992
typedef struct {
90-
// Endpoint address in IP or DNS format including port number. For example:
91-
// "localhost:5000".
92-
const char* pa_socket;
93+
// Endpoint address in gRPC name-syntax format, including port number. For
94+
// example: "localhost:5000", "ipv4:127.0.0.1:5000,127.0.0.2:5000", or
95+
// "ipv6:[::1]:5000,[::1]:5001".
96+
// Using a single address will disable load balancing.
97+
const char* pa_target;
98+
99+
// gRPC load balancing policy. If not set, it will be selected by the gRPC
100+
// library. For example: "round_robin" or "pick_first". Leaving this field
101+
// empty will use the default policy.
102+
const char* load_balancing_policy;
93103

94104
// File containing the Client certificate in PEM format. Required when
95105
// `enable_mtls` set to true.

src/ate/ate_client.cc

Lines changed: 9 additions & 2 deletions
@@ -92,7 +92,12 @@ std::unique_ptr<AteClient> AteClient::Create(AteClient::Options options) {
9292
credentials = BuildCredentials(options);
9393
}
9494
// 2. create the grpc channel between the client and the targeted server
95-
auto channel = grpc::CreateChannel(options.pa_socket, credentials);
95+
grpc::ChannelArguments args;
96+
if (!options.load_balancing_policy.empty()) {
97+
args.SetLoadBalancingPolicyName(options.load_balancing_policy);
98+
}
99+
auto channel =
100+
grpc::CreateCustomChannel(options.pa_target, credentials, args);
96101
auto ate = absl::make_unique<AteClient>(
97102
ProvisioningApplianceService::NewStub(channel));
98103

@@ -189,7 +194,9 @@ Status AteClient::RegisterDevice(RegistrationRequest& request,
189194
// overloads operator<< for AteClient::Options objects printouts
190195
std::ostream& operator<<(std::ostream& os, const AteClient::Options& options) {
191196
// write obj to stream
192-
os << std::endl << "options.pa_socket = " << options.pa_socket << std::endl;
197+
os << std::endl << "options.pa_target = " << options.pa_target << std::endl;
198+
os << "options.load_balancing_policy = " << options.load_balancing_policy
199+
<< std::endl;
193200
os << "options.enable_mtls = " << options.enable_mtls << std::endl;
194201
os << "options.pem_cert_chain = " << options.pem_cert_chain << std::endl;
195202
os << "options.pem_private_key = " << options.pem_private_key << std::endl;

src/ate/ate_client.h

Lines changed: 8 additions & 3 deletions
@@ -20,9 +20,14 @@ namespace ate {
2020
class AteClient {
2121
public:
2222
struct Options {
23-
// Endpoint address in IP or DNS format including port number. For example:
24-
// "localhost:5000".
25-
std::string pa_socket;
23+
// Endpoint address in gRPC name-syntax format, including port number. For
24+
// example: "localhost:5000", "ipv4:127.0.0.1:5000,127.0.0.2:5000", or
25+
// "ipv6:[::1]:5000,[::1]:5001".
26+
std::string pa_target;
27+
28+
// gRPC load balancing policy. If not set, it will be selected by the gRPC
29+
// library. For example: "round_robin" or "pick_first".
30+
std::string load_balancing_policy;
2631

2732
// Set to true to enable mTLS connection. When set to false, the connection
2833
// is established with insecure credentials.

src/ate/ate_dll.cc

Lines changed: 4 additions & 1 deletion
@@ -178,7 +178,10 @@ DLLEXPORT int CreateClient(
178178

179179
// convert from ate_client_ptr to AteClient::Options
180180
o.enable_mtls = options->enable_mtls;
181-
o.pa_socket = options->pa_socket;
181+
o.pa_target = options->pa_target;
182+
if (options->load_balancing_policy != nullptr) {
183+
o.load_balancing_policy = options->load_balancing_policy;
184+
}
182185
if (o.enable_mtls) {
183186
// Load the PEM data from the pointed files
184187
absl::Status s =

src/ate/test_programs/cp.cc

Lines changed: 16 additions & 5 deletions
@@ -39,7 +39,15 @@ ABSL_FLAG(std::string, cp_sram_elf, "", "CP SRAM ELF (device binary).");
3939
/**
4040
* PA configuration flags.
4141
*/
42-
ABSL_FLAG(std::string, pa_socket, "", "host:port of the PA server.");
42+
ABSL_FLAG(std::string, pa_target, "",
43+
"Endpoint address in gRPC name-syntax format, including port "
44+
"number. For example: \"localhost:5000\", "
45+
"\"ipv4:127.0.0.1:5000,127.0.0.2:5000\", or "
46+
"\"ipv6:[::1]:5000,[::1]:5001\".");
47+
ABSL_FLAG(std::string, load_balancing_policy, "",
48+
"gRPC load balancing policy. If not set, it will be selected by "
49+
"the gRPC library. For example: \"round_robin\" or "
50+
"\"pick_first\".");
4351
ABSL_FLAG(std::string, sku, "", "SKU string to initialize the PA session.");
4452
ABSL_FLAG(std::string, sku_auth_pw, "",
4553
"SKU authorization password string to initialize the PA session.");
@@ -62,14 +70,17 @@ using provisioning::test_programs::DutLib;
6270
absl::StatusOr<ate_client_ptr> AteClientNew(void) {
6371
client_options_t options;
6472

65-
std::string pa_socket = absl::GetFlag(FLAGS_pa_socket);
66-
if (pa_socket.empty()) {
73+
std::string pa_target = absl::GetFlag(FLAGS_pa_target);
74+
if (pa_target.empty()) {
6775
return absl::InvalidArgumentError(
68-
"--pa_socket not set. This is a required argument.");
76+
"--pa_target not set. This is a required argument.");
6977
}
70-
options.pa_socket = pa_socket.c_str();
78+
options.pa_target = pa_target.c_str();
7179
options.enable_mtls = absl::GetFlag(FLAGS_enable_mtls);
7280

81+
std::string lb_policy = absl::GetFlag(FLAGS_load_balancing_policy);
82+
options.load_balancing_policy = lb_policy.c_str();
83+
7384
std::string pem_private_key = absl::GetFlag(FLAGS_client_key);
7485
std::string pem_cert_chain = absl::GetFlag(FLAGS_client_cert);
7586
std::string pem_root_certs = absl::GetFlag(FLAGS_ca_root_certs);

src/ate/test_programs/ft.cc

Lines changed: 16 additions & 5 deletions
@@ -42,7 +42,15 @@ ABSL_FLAG(std::string, ft_fw_bundle_bin, "",
4242
/**
4343
* PA configuration flags.
4444
*/
45-
ABSL_FLAG(std::string, pa_socket, "", "host:port of the PA server.");
45+
ABSL_FLAG(std::string, pa_target, "",
46+
"Endpoint address in gRPC name-syntax format, including port "
47+
"number. For example: \"localhost:5000\", "
48+
"\"ipv4:127.0.0.1:5000,127.0.0.2:5000\", or "
49+
"\"ipv6:[::1]:5000,[::1]:5001\".");
50+
ABSL_FLAG(std::string, load_balancing_policy, "",
51+
"gRPC load balancing policy. If not set, it will be selected by "
52+
"the gRPC library. For example: \"round_robin\" or "
53+
"\"pick_first\".");
4654
ABSL_FLAG(std::string, sku, "", "SKU string to initialize the PA session.");
4755
ABSL_FLAG(std::string, sku_auth_pw, "",
4856
"SKU authorization password string to initialize the PA session.");
@@ -65,14 +73,17 @@ using provisioning::test_programs::DutLib;
6573
absl::StatusOr<ate_client_ptr> AteClientNew(void) {
6674
client_options_t options;
6775

68-
std::string pa_socket = absl::GetFlag(FLAGS_pa_socket);
69-
if (pa_socket.empty()) {
76+
std::string pa_target = absl::GetFlag(FLAGS_pa_target);
77+
if (pa_target.empty()) {
7078
return absl::InvalidArgumentError(
71-
"--pa_socket not set. This is a required argument.");
79+
"--pa_target not set. This is a required argument.");
7280
}
73-
options.pa_socket = pa_socket.c_str();
81+
options.pa_target = pa_target.c_str();
7482
options.enable_mtls = absl::GetFlag(FLAGS_enable_mtls);
7583

84+
std::string lb_policy = absl::GetFlag(FLAGS_load_balancing_policy);
85+
options.load_balancing_policy = lb_policy.c_str();
86+
7687
std::string pem_private_key = absl::GetFlag(FLAGS_client_key);
7788
std::string pem_cert_chain = absl::GetFlag(FLAGS_client_cert);
7889
std::string pem_root_certs = absl::GetFlag(FLAGS_ca_root_certs);
