This class provides the ability to make remote calls to the backing service through method calls that map to API methods. Sample code to get started:
{@code
+ * // This snippet has been automatically generated and should be regarded as a code template only.
+ * // It will require modifications to work:
+ * // - It may require correct/in-range values for request initialization.
+ * // - It may require specifying regional endpoints when creating the service client as shown in
+ * // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
+ * try (GkeInferenceQuickstartClient gkeInferenceQuickstartClient =
+ * GkeInferenceQuickstartClient.create()) {
+ * GenerateOptimizedManifestRequest request =
+ * GenerateOptimizedManifestRequest.newBuilder()
+ * .setModelServerInfo(ModelServerInfo.newBuilder().build())
+ * .setAcceleratorType("acceleratorType-82462651")
+ * .setKubernetesNamespace("kubernetesNamespace-1862862667")
+ * .setPerformanceRequirements(PerformanceRequirements.newBuilder().build())
+ * .setStorageConfig(StorageConfig.newBuilder().build())
+ * .build();
+ * GenerateOptimizedManifestResponse response =
+ * gkeInferenceQuickstartClient.generateOptimizedManifest(request);
+ * }
+ * }
+ *
+ * Note: close() needs to be called on the GkeInferenceQuickstartClient object to clean up resources such as threads. In the example above, try-with-resources is used, which automatically calls close().
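As a rough illustration of the note above, the following sketch shows how try-with-resources triggers close() automatically. `DemoClient` and `TryWithResourcesDemo` are hypothetical stand-ins written for this example; they are not part of the generated library and do not contact any service.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a client that owns resources such as threads. */
class DemoClient implements AutoCloseable {
  // Records lifecycle events so the order of operations can be inspected.
  static final List<String> EVENTS = new ArrayList<>();

  static DemoClient create() {
    EVENTS.add("create");
    return new DemoClient();
  }

  String call(String request) {
    EVENTS.add("call:" + request);
    return "response-for-" + request;
  }

  @Override
  public void close() {
    // A real client would release channels and threads here.
    EVENTS.add("close");
  }
}

class TryWithResourcesDemo {
  /** Uses the client inside try-with-resources and returns the recorded events. */
  static List<String> run() {
    DemoClient.EVENTS.clear();
    try (DemoClient client = DemoClient.create()) {
      client.call("ping");
    } // close() is invoked here, even if call() had thrown.
    return new ArrayList<>(DemoClient.EVENTS);
  }

  public static void main(String[] args) {
    System.out.println(run());
  }
}
```

The same pattern should apply to any class that implements `AutoCloseable`; the event list exists only to make the call order visible.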
| Method | Description | Method Variants |
|---|---|---|
| FetchModels | Fetches available models. Open-source models follow the Huggingface Hub `owner/model_name` format. | Request object method variants only take one parameter, a request object, which must be constructed before the call. Callable method variants take no parameters and return an immutable API callable object, which can be used to initiate calls to the service. |
| FetchModelServers | Fetches available model servers. Open-source model servers use simplified, lowercase names (e.g., `vllm`). | Request object method variants only take one parameter, a request object, which must be constructed before the call. Callable method variants take no parameters and return an immutable API callable object, which can be used to initiate calls to the service. |
| FetchModelServerVersions | Fetches available model server versions. Open-source servers use their own versioning schemas (e.g., `vllm` uses semver like `v1.0.0`). Some model servers have different versioning schemas depending on the accelerator. For example, `vllm` uses semver on GPUs, but returns nightly build tags on TPUs. All available versions will be returned when different schemas are present. | Request object method variants only take one parameter, a request object, which must be constructed before the call. Callable method variants take no parameters and return an immutable API callable object, which can be used to initiate calls to the service. |
| FetchProfiles | Fetches available profiles. A profile contains performance metrics and cost information for a specific model server setup. Profiles can be filtered by parameters. If no filters are provided, all profiles are returned. Profiles display a single value per performance metric based on the provided performance requirements. If no requirements are given, the metrics represent the inflection point. See [Run best practice inference with GKE Inference Quickstart recipes](https://cloud.google.com/kubernetes-engine/docs/how-to/machine-learning/inference/inference-quickstart#how) for details. | Request object method variants only take one parameter, a request object, which must be constructed before the call. Callable method variants take no parameters and return an immutable API callable object, which can be used to initiate calls to the service. |
| GenerateOptimizedManifest | Generates an optimized deployment manifest for a given model and model server, based on the specified accelerator, performance targets, and configurations. See [Run best practice inference with GKE Inference Quickstart recipes](https://cloud.google.com/kubernetes-engine/docs/how-to/machine-learning/inference/inference-quickstart) for deployment details. | Request object method variants only take one parameter, a request object, which must be constructed before the call. Callable method variants take no parameters and return an immutable API callable object, which can be used to initiate calls to the service. |
| FetchBenchmarkingData | Fetches all of the benchmarking data available for a profile. Benchmarking data returns all of the performance metrics available for a given model server setup on a given instance type. | Request object method variants only take one parameter, a request object, which must be constructed before the call. Callable method variants take no parameters and return an immutable API callable object, which can be used to initiate calls to the service. |
See the individual methods for example code.
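To make the difference between the two method variants in the table more concrete, here is a minimal sketch. `SketchCallable` and `SketchClient` are hypothetical types invented for this illustration (the real library uses its own callable type), and the "RPC" is just a local function; the point is only that the request object variant can delegate to the callable variant.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

/** Hypothetical stand-in for an immutable callable object. */
final class SketchCallable<ReqT, RespT> {
  private final Function<ReqT, RespT> rpc;

  SketchCallable(Function<ReqT, RespT> rpc) {
    this.rpc = rpc;
  }

  /** Blocking call. */
  RespT call(ReqT request) {
    return rpc.apply(request);
  }

  /** Asynchronous call that completes on another thread. */
  CompletableFuture<RespT> futureCall(ReqT request) {
    return CompletableFuture.supplyAsync(() -> rpc.apply(request));
  }
}

/** Hypothetical client exposing both variants of one method. */
class SketchClient {
  private final SketchCallable<String, String> echoCallable =
      new SketchCallable<>(request -> "echo:" + request);

  /** Callable variant: takes no parameters and returns the callable. */
  SketchCallable<String, String> echoCallable() {
    return echoCallable;
  }

  /** Request object variant: takes the request and delegates to the callable. */
  String echo(String request) {
    return echoCallable().call(request);
  }
}

class VariantsDemo {
  public static void main(String[] args) {
    SketchClient client = new SketchClient();
    System.out.println(client.echo("a"));
    System.out.println(client.echoCallable().futureCall("b").join());
  }
}
```

The callable variant is useful when the caller wants to choose between blocking and asynchronous invocation; the request object variant is the shorter path for a simple blocking call.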
Many parameters require resource names to be formatted in a particular way. To assist with these names, this class includes a format method for each type of name, and additionally a parse method to extract the individual identifiers contained within names that are returned.
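The format/parse idea can be sketched as follows. The pattern `projects/{project}/locations/{location}` and the class `LocationNameDemo` are assumptions chosen purely for illustration; the source does not state which resource name patterns this API uses.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LocationNameDemo {
  // Hypothetical resource name pattern, assumed for this sketch only.
  private static final Pattern NAME =
      Pattern.compile("projects/([^/]+)/locations/([^/]+)");

  /** Builds a resource name from its individual identifiers. */
  static String format(String project, String location) {
    return "projects/" + project + "/locations/" + location;
  }

  /** Extracts {project, location} from a resource name, or throws if it does not match. */
  static String[] parse(String name) {
    Matcher matcher = NAME.matcher(name);
    if (!matcher.matches()) {
      throw new IllegalArgumentException("Unexpected resource name: " + name);
    }
    return new String[] {matcher.group(1), matcher.group(2)};
  }

  public static void main(String[] args) {
    String name = format("my-project", "us-central1");
    String[] parts = parse(name);
    System.out.println(name + " -> " + parts[0] + ", " + parts[1]);
  }
}
```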
This class can be customized by passing in a custom instance of GkeInferenceQuickstartSettings to create(). For example:
To customize credentials:
{@code
+ * // This snippet has been automatically generated and should be regarded as a code template only.
+ * // It will require modifications to work:
+ * // - It may require correct/in-range values for request initialization.
+ * // - It may require specifying regional endpoints when creating the service client as shown in
+ * // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
+ * GkeInferenceQuickstartSettings gkeInferenceQuickstartSettings =
+ * GkeInferenceQuickstartSettings.newBuilder()
+ * .setCredentialsProvider(FixedCredentialsProvider.create(myCredentials))
+ * .build();
+ * GkeInferenceQuickstartClient gkeInferenceQuickstartClient =
+ * GkeInferenceQuickstartClient.create(gkeInferenceQuickstartSettings);
+ * }
+ *
+ * To customize the endpoint:
{@code
+ * // This snippet has been automatically generated and should be regarded as a code template only.
+ * // It will require modifications to work:
+ * // - It may require correct/in-range values for request initialization.
+ * // - It may require specifying regional endpoints when creating the service client as shown in
+ * // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
+ * GkeInferenceQuickstartSettings gkeInferenceQuickstartSettings =
+ * GkeInferenceQuickstartSettings.newBuilder().setEndpoint(myEndpoint).build();
+ * GkeInferenceQuickstartClient gkeInferenceQuickstartClient =
+ * GkeInferenceQuickstartClient.create(gkeInferenceQuickstartSettings);
+ * }
+ *
+ * To use REST (HTTP1.1/JSON) transport (instead of gRPC) for sending and receiving requests over the wire:
{@code
+ * // This snippet has been automatically generated and should be regarded as a code template only.
+ * // It will require modifications to work:
+ * // - It may require correct/in-range values for request initialization.
+ * // - It may require specifying regional endpoints when creating the service client as shown in
+ * // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
+ * GkeInferenceQuickstartSettings gkeInferenceQuickstartSettings =
+ * GkeInferenceQuickstartSettings.newHttpJsonBuilder().build();
+ * GkeInferenceQuickstartClient gkeInferenceQuickstartClient =
+ * GkeInferenceQuickstartClient.create(gkeInferenceQuickstartSettings);
+ * }
+ *
+ * Please refer to the GitHub repository's samples for more quickstart code snippets.
+ */
+@Generated("by gapic-generator-java")
+public class GkeInferenceQuickstartClient implements BackgroundResource {
+ private final GkeInferenceQuickstartSettings settings;
+ private final GkeInferenceQuickstartStub stub;
+
+ /** Constructs an instance of GkeInferenceQuickstartClient with default settings. */
+ public static final GkeInferenceQuickstartClient create() throws IOException {
+ return create(GkeInferenceQuickstartSettings.newBuilder().build());
+ }
+
+ /**
+ * Constructs an instance of GkeInferenceQuickstartClient, using the given settings. The channels
+ * are created based on the settings passed in, or defaults for any settings that are not set.
+ */
+ public static final GkeInferenceQuickstartClient create(GkeInferenceQuickstartSettings settings)
+ throws IOException {
+ return new GkeInferenceQuickstartClient(settings);
+ }
+
+ /**
+ * Constructs an instance of GkeInferenceQuickstartClient, using the given stub for making calls.
+ * This is for advanced usage - prefer using create(GkeInferenceQuickstartSettings).
+ */
+ public static final GkeInferenceQuickstartClient create(GkeInferenceQuickstartStub stub) {
+ return new GkeInferenceQuickstartClient(stub);
+ }
+
+ /**
+ * Constructs an instance of GkeInferenceQuickstartClient, using the given settings. This is
+ * protected so that it is easy to make a subclass, but otherwise, the static factory methods
+ * should be preferred.
+ */
+ protected GkeInferenceQuickstartClient(GkeInferenceQuickstartSettings settings)
+ throws IOException {
+ this.settings = settings;
+ this.stub = ((GkeInferenceQuickstartStubSettings) settings.getStubSettings()).createStub();
+ }
+
+ protected GkeInferenceQuickstartClient(GkeInferenceQuickstartStub stub) {
+ this.settings = null;
+ this.stub = stub;
+ }
+
+ public final GkeInferenceQuickstartSettings getSettings() {
+ return settings;
+ }
+
+ public GkeInferenceQuickstartStub getStub() {
+ return stub;
+ }
+
+ // AUTO-GENERATED DOCUMENTATION AND METHOD.
+ /**
+ * Fetches available models. Open-source models follow the Huggingface Hub `owner/model_name`
+ * format.
+ *
+ * Sample code:
+ *
{@code
+ * // This snippet has been automatically generated and should be regarded as a code template only.
+ * // It will require modifications to work:
+ * // - It may require correct/in-range values for request initialization.
+ * // - It may require specifying regional endpoints when creating the service client as shown in
+ * // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
+ * try (GkeInferenceQuickstartClient gkeInferenceQuickstartClient =
+ * GkeInferenceQuickstartClient.create()) {
+ * FetchModelsRequest request =
+ * FetchModelsRequest.newBuilder()
+ * .setPageSize(883849137)
+ * .setPageToken("pageToken873572522")
+ * .build();
+ * for (String element : gkeInferenceQuickstartClient.fetchModels(request).iterateAll()) {
+ * // doThingsWith(element);
+ * }
+ * }
+ * }
+ *
+ * @param request The request object containing all of the parameters for the API call.
+ * @throws com.google.api.gax.rpc.ApiException if the remote call fails
+ */
+ public final FetchModelsPagedResponse fetchModels(FetchModelsRequest request) {
+ return fetchModelsPagedCallable().call(request);
+ }
+
+ // AUTO-GENERATED DOCUMENTATION AND METHOD.
+ /**
+ * Fetches available models. Open-source models follow the Huggingface Hub `owner/model_name`
+ * format.
+ *
+ * Sample code:
+ *
{@code
+ * // This snippet has been automatically generated and should be regarded as a code template only.
+ * // It will require modifications to work:
+ * // - It may require correct/in-range values for request initialization.
+ * // - It may require specifying regional endpoints when creating the service client as shown in
+ * // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
+ * try (GkeInferenceQuickstartClient gkeInferenceQuickstartClient =
+ * GkeInferenceQuickstartClient.create()) {
+ * FetchModelsRequest request =
+ * FetchModelsRequest.newBuilder()
+ * .setPageSize(883849137)
+ * .setPageToken("pageToken873572522")
+ * .build();
+ * ApiFuture<FetchModelsPagedResponse> future =
+ * gkeInferenceQuickstartClient.fetchModelsPagedCallable().futureCall(request);
+ * // Do something.
+ * for (String element : future.get().iterateAll()) {
+ * // doThingsWith(element);
+ * }
+ * }
+ * }
+ */
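+ public final UnaryCallable<FetchModelsRequest, FetchModelsPagedResponse>
+ fetchModelsPagedCallable() {
+ return stub.fetchModelsPagedCallable();
+ }
+
+ // AUTO-GENERATED DOCUMENTATION AND METHOD.
+ /**
+ * Fetches available models. Open-source models follow the Huggingface Hub `owner/model_name`
+ * format.
+ *
+ * Sample code:
+ *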
{@code
+ * // This snippet has been automatically generated and should be regarded as a code template only.
+ * // It will require modifications to work:
+ * // - It may require correct/in-range values for request initialization.
+ * // - It may require specifying regional endpoints when creating the service client as shown in
+ * // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
+ * try (GkeInferenceQuickstartClient gkeInferenceQuickstartClient =
+ * GkeInferenceQuickstartClient.create()) {
+ * FetchModelsRequest request =
+ * FetchModelsRequest.newBuilder()
+ * .setPageSize(883849137)
+ * .setPageToken("pageToken873572522")
+ * .build();
+ * while (true) {
+ * FetchModelsResponse response =
+ * gkeInferenceQuickstartClient.fetchModelsCallable().call(request);
+ * for (String element : response.getModelsList()) {
+ * // doThingsWith(element);
+ * }
+ * String nextPageToken = response.getNextPageToken();
+ * if (!Strings.isNullOrEmpty(nextPageToken)) {
+ * request = request.toBuilder().setPageToken(nextPageToken).build();
+ * } else {
+ * break;
+ * }
+ * }
+ * }
+ * }
+ */
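+ public final UnaryCallable<FetchModelsRequest, FetchModelsResponse> fetchModelsCallable() {
+ return stub.fetchModelsCallable();
+ }
+
+ // AUTO-GENERATED DOCUMENTATION AND METHOD.
+ /**
+ * Fetches available model servers. Open-source model servers use simplified, lowercase names
+ * (e.g., `vllm`).
+ *
+ * Sample code:
+ *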
{@code
+ * // This snippet has been automatically generated and should be regarded as a code template only.
+ * // It will require modifications to work:
+ * // - It may require correct/in-range values for request initialization.
+ * // - It may require specifying regional endpoints when creating the service client as shown in
+ * // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
+ * try (GkeInferenceQuickstartClient gkeInferenceQuickstartClient =
+ * GkeInferenceQuickstartClient.create()) {
+ * FetchModelServersRequest request =
+ * FetchModelServersRequest.newBuilder()
+ * .setModel("model104069929")
+ * .setPageSize(883849137)
+ * .setPageToken("pageToken873572522")
+ * .build();
+ * for (String element : gkeInferenceQuickstartClient.fetchModelServers(request).iterateAll()) {
+ * // doThingsWith(element);
+ * }
+ * }
+ * }
+ *
+ * @param request The request object containing all of the parameters for the API call.
+ * @throws com.google.api.gax.rpc.ApiException if the remote call fails
+ */
+ public final FetchModelServersPagedResponse fetchModelServers(FetchModelServersRequest request) {
+ return fetchModelServersPagedCallable().call(request);
+ }
+
+ // AUTO-GENERATED DOCUMENTATION AND METHOD.
+ /**
+ * Fetches available model servers. Open-source model servers use simplified, lowercase names
+ * (e.g., `vllm`).
+ *
+ * Sample code:
+ *
{@code
+ * // This snippet has been automatically generated and should be regarded as a code template only.
+ * // It will require modifications to work:
+ * // - It may require correct/in-range values for request initialization.
+ * // - It may require specifying regional endpoints when creating the service client as shown in
+ * // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
+ * try (GkeInferenceQuickstartClient gkeInferenceQuickstartClient =
+ * GkeInferenceQuickstartClient.create()) {
+ * FetchModelServersRequest request =
+ * FetchModelServersRequest.newBuilder()
+ * .setModel("model104069929")
+ * .setPageSize(883849137)
+ * .setPageToken("pageToken873572522")
+ * .build();
+ * ApiFuture<FetchModelServersPagedResponse> future =
+ * gkeInferenceQuickstartClient.fetchModelServersPagedCallable().futureCall(request);
+ * // Do something.
+ * for (String element : future.get().iterateAll()) {
+ * // doThingsWith(element);
+ * }
+ * }
+ * }
+ */
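+ public final UnaryCallable<FetchModelServersRequest, FetchModelServersPagedResponse>
+ fetchModelServersPagedCallable() {
+ return stub.fetchModelServersPagedCallable();
+ }
+
+ // AUTO-GENERATED DOCUMENTATION AND METHOD.
+ /**
+ * Fetches available model servers. Open-source model servers use simplified, lowercase names
+ * (e.g., `vllm`).
+ *
+ * Sample code:
+ *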
{@code
+ * // This snippet has been automatically generated and should be regarded as a code template only.
+ * // It will require modifications to work:
+ * // - It may require correct/in-range values for request initialization.
+ * // - It may require specifying regional endpoints when creating the service client as shown in
+ * // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
+ * try (GkeInferenceQuickstartClient gkeInferenceQuickstartClient =
+ * GkeInferenceQuickstartClient.create()) {
+ * FetchModelServersRequest request =
+ * FetchModelServersRequest.newBuilder()
+ * .setModel("model104069929")
+ * .setPageSize(883849137)
+ * .setPageToken("pageToken873572522")
+ * .build();
+ * while (true) {
+ * FetchModelServersResponse response =
+ * gkeInferenceQuickstartClient.fetchModelServersCallable().call(request);
+ * for (String element : response.getModelServersList()) {
+ * // doThingsWith(element);
+ * }
+ * String nextPageToken = response.getNextPageToken();
+ * if (!Strings.isNullOrEmpty(nextPageToken)) {
+ * request = request.toBuilder().setPageToken(nextPageToken).build();
+ * } else {
+ * break;
+ * }
+ * }
+ * }
+ * }
+ */
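+ public final UnaryCallable<FetchModelServersRequest, FetchModelServersResponse>
+ fetchModelServersCallable() {
+ return stub.fetchModelServersCallable();
+ }
+
+ // AUTO-GENERATED DOCUMENTATION AND METHOD.
+ /**
+ * Fetches available model server versions. Open-source servers use their own versioning schemas
+ * (e.g., `vllm` uses semver like `v1.0.0`).
+ *
+ * Some model servers have different versioning schemas depending on the accelerator. For
+ * example, `vllm` uses semver on GPUs, but returns nightly build tags on TPUs. All available
+ * versions will be returned when different schemas are present.
+ *
+ * Sample code:
+ *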
{@code
+ * // This snippet has been automatically generated and should be regarded as a code template only.
+ * // It will require modifications to work:
+ * // - It may require correct/in-range values for request initialization.
+ * // - It may require specifying regional endpoints when creating the service client as shown in
+ * // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
+ * try (GkeInferenceQuickstartClient gkeInferenceQuickstartClient =
+ * GkeInferenceQuickstartClient.create()) {
+ * FetchModelServerVersionsRequest request =
+ * FetchModelServerVersionsRequest.newBuilder()
+ * .setModel("model104069929")
+ * .setModelServer("modelServer475157452")
+ * .setPageSize(883849137)
+ * .setPageToken("pageToken873572522")
+ * .build();
+ * for (String element :
+ * gkeInferenceQuickstartClient.fetchModelServerVersions(request).iterateAll()) {
+ * // doThingsWith(element);
+ * }
+ * }
+ * }
+ *
+ * @param request The request object containing all of the parameters for the API call.
+ * @throws com.google.api.gax.rpc.ApiException if the remote call fails
+ */
+ public final FetchModelServerVersionsPagedResponse fetchModelServerVersions(
+ FetchModelServerVersionsRequest request) {
+ return fetchModelServerVersionsPagedCallable().call(request);
+ }
+
+ // AUTO-GENERATED DOCUMENTATION AND METHOD.
+ /**
+ * Fetches available model server versions. Open-source servers use their own versioning schemas
+ * (e.g., `vllm` uses semver like `v1.0.0`).
+ *
+ * Some model servers have different versioning schemas depending on the accelerator. For
+ * example, `vllm` uses semver on GPUs, but returns nightly build tags on TPUs. All available
+ * versions will be returned when different schemas are present.
+ *
+ * Sample code:
+ *
{@code
+ * // This snippet has been automatically generated and should be regarded as a code template only.
+ * // It will require modifications to work:
+ * // - It may require correct/in-range values for request initialization.
+ * // - It may require specifying regional endpoints when creating the service client as shown in
+ * // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
+ * try (GkeInferenceQuickstartClient gkeInferenceQuickstartClient =
+ * GkeInferenceQuickstartClient.create()) {
+ * FetchModelServerVersionsRequest request =
+ * FetchModelServerVersionsRequest.newBuilder()
+ * .setModel("model104069929")
+ * .setModelServer("modelServer475157452")
+ * .setPageSize(883849137)
+ * .setPageToken("pageToken873572522")
+ * .build();
+ * ApiFuture<FetchModelServerVersionsPagedResponse> future =
+ * gkeInferenceQuickstartClient.fetchModelServerVersionsPagedCallable().futureCall(request);
+ * // Do something.
+ * for (String element : future.get().iterateAll()) {
+ * // doThingsWith(element);
+ * }
+ * }
+ * }
+ */
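+ public final UnaryCallable<FetchModelServerVersionsRequest, FetchModelServerVersionsPagedResponse>
+ fetchModelServerVersionsPagedCallable() {
+ return stub.fetchModelServerVersionsPagedCallable();
+ }
+
+ // AUTO-GENERATED DOCUMENTATION AND METHOD.
+ /**
+ * Fetches available model server versions. Open-source servers use their own versioning schemas
+ * (e.g., `vllm` uses semver like `v1.0.0`).
+ *
+ * Some model servers have different versioning schemas depending on the accelerator. For
+ * example, `vllm` uses semver on GPUs, but returns nightly build tags on TPUs. All available
+ * versions will be returned when different schemas are present.
+ *
+ * Sample code:
+ *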
{@code
+ * // This snippet has been automatically generated and should be regarded as a code template only.
+ * // It will require modifications to work:
+ * // - It may require correct/in-range values for request initialization.
+ * // - It may require specifying regional endpoints when creating the service client as shown in
+ * // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
+ * try (GkeInferenceQuickstartClient gkeInferenceQuickstartClient =
+ * GkeInferenceQuickstartClient.create()) {
+ * FetchModelServerVersionsRequest request =
+ * FetchModelServerVersionsRequest.newBuilder()
+ * .setModel("model104069929")
+ * .setModelServer("modelServer475157452")
+ * .setPageSize(883849137)
+ * .setPageToken("pageToken873572522")
+ * .build();
+ * while (true) {
+ * FetchModelServerVersionsResponse response =
+ * gkeInferenceQuickstartClient.fetchModelServerVersionsCallable().call(request);
+ * for (String element : response.getModelServerVersionsList()) {
+ * // doThingsWith(element);
+ * }
+ * String nextPageToken = response.getNextPageToken();
+ * if (!Strings.isNullOrEmpty(nextPageToken)) {
+ * request = request.toBuilder().setPageToken(nextPageToken).build();
+ * } else {
+ * break;
+ * }
+ * }
+ * }
+ * }
+ */
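+ public final UnaryCallable<FetchModelServerVersionsRequest, FetchModelServerVersionsResponse>
+ fetchModelServerVersionsCallable() {
+ return stub.fetchModelServerVersionsCallable();
+ }
+
+ // AUTO-GENERATED DOCUMENTATION AND METHOD.
+ /**
+ * Fetches available profiles. A profile contains performance metrics and cost information for a
+ * specific model server setup. Profiles can be filtered by parameters. If no filters are
+ * provided, all profiles are returned.
+ *
+ * Profiles display a single value per performance metric based on the provided performance
+ * requirements. If no requirements are given, the metrics represent the inflection point. See
+ * [Run best practice inference with GKE Inference Quickstart
+ * recipes](https://cloud.google.com/kubernetes-engine/docs/how-to/machine-learning/inference/inference-quickstart#how)
+ * for details.
+ *
+ * Sample code:
+ *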
{@code
+ * // This snippet has been automatically generated and should be regarded as a code template only.
+ * // It will require modifications to work:
+ * // - It may require correct/in-range values for request initialization.
+ * // - It may require specifying regional endpoints when creating the service client as shown in
+ * // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
+ * try (GkeInferenceQuickstartClient gkeInferenceQuickstartClient =
+ * GkeInferenceQuickstartClient.create()) {
+ * FetchProfilesRequest request =
+ * FetchProfilesRequest.newBuilder()
+ * .setModel("model104069929")
+ * .setModelServer("modelServer475157452")
+ * .setModelServerVersion("modelServerVersion77054828")
+ * .setPerformanceRequirements(PerformanceRequirements.newBuilder().build())
+ * .setPageSize(883849137)
+ * .setPageToken("pageToken873572522")
+ * .build();
+ * for (Profile element : gkeInferenceQuickstartClient.fetchProfiles(request).iterateAll()) {
+ * // doThingsWith(element);
+ * }
+ * }
+ * }
+ *
+ * @param request The request object containing all of the parameters for the API call.
+ * @throws com.google.api.gax.rpc.ApiException if the remote call fails
+ */
+ public final FetchProfilesPagedResponse fetchProfiles(FetchProfilesRequest request) {
+ return fetchProfilesPagedCallable().call(request);
+ }
+
+ // AUTO-GENERATED DOCUMENTATION AND METHOD.
+ /**
+ * Fetches available profiles. A profile contains performance metrics and cost information for a
+ * specific model server setup. Profiles can be filtered by parameters. If no filters are
+ * provided, all profiles are returned.
+ *
+ * Profiles display a single value per performance metric based on the provided performance
+ * requirements. If no requirements are given, the metrics represent the inflection point. See
+ * [Run best practice inference with GKE Inference Quickstart
+ * recipes](https://cloud.google.com/kubernetes-engine/docs/how-to/machine-learning/inference/inference-quickstart#how)
+ * for details.
+ *
+ * Sample code:
+ *
{@code
+ * // This snippet has been automatically generated and should be regarded as a code template only.
+ * // It will require modifications to work:
+ * // - It may require correct/in-range values for request initialization.
+ * // - It may require specifying regional endpoints when creating the service client as shown in
+ * // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
+ * try (GkeInferenceQuickstartClient gkeInferenceQuickstartClient =
+ * GkeInferenceQuickstartClient.create()) {
+ * FetchProfilesRequest request =
+ * FetchProfilesRequest.newBuilder()
+ * .setModel("model104069929")
+ * .setModelServer("modelServer475157452")
+ * .setModelServerVersion("modelServerVersion77054828")
+ * .setPerformanceRequirements(PerformanceRequirements.newBuilder().build())
+ * .setPageSize(883849137)
+ * .setPageToken("pageToken873572522")
+ * .build();
+ * ApiFuture<FetchProfilesPagedResponse> future =
+ * gkeInferenceQuickstartClient.fetchProfilesPagedCallable().futureCall(request);
+ * // Do something.
+ * for (Profile element : future.get().iterateAll()) {
+ * // doThingsWith(element);
+ * }
+ * }
+ * }
+ */
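+ public final UnaryCallable<FetchProfilesRequest, FetchProfilesPagedResponse>
+ fetchProfilesPagedCallable() {
+ return stub.fetchProfilesPagedCallable();
+ }
+
+ // AUTO-GENERATED DOCUMENTATION AND METHOD.
+ /**
+ * Fetches available profiles. A profile contains performance metrics and cost information for a
+ * specific model server setup. Profiles can be filtered by parameters. If no filters are
+ * provided, all profiles are returned.
+ *
+ * Profiles display a single value per performance metric based on the provided performance
+ * requirements. If no requirements are given, the metrics represent the inflection point. See
+ * [Run best practice inference with GKE Inference Quickstart
+ * recipes](https://cloud.google.com/kubernetes-engine/docs/how-to/machine-learning/inference/inference-quickstart#how)
+ * for details.
+ *
+ * Sample code:
+ *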
{@code
+ * // This snippet has been automatically generated and should be regarded as a code template only.
+ * // It will require modifications to work:
+ * // - It may require correct/in-range values for request initialization.
+ * // - It may require specifying regional endpoints when creating the service client as shown in
+ * // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
+ * try (GkeInferenceQuickstartClient gkeInferenceQuickstartClient =
+ * GkeInferenceQuickstartClient.create()) {
+ * FetchProfilesRequest request =
+ * FetchProfilesRequest.newBuilder()
+ * .setModel("model104069929")
+ * .setModelServer("modelServer475157452")
+ * .setModelServerVersion("modelServerVersion77054828")
+ * .setPerformanceRequirements(PerformanceRequirements.newBuilder().build())
+ * .setPageSize(883849137)
+ * .setPageToken("pageToken873572522")
+ * .build();
+ * while (true) {
+ * FetchProfilesResponse response =
+ * gkeInferenceQuickstartClient.fetchProfilesCallable().call(request);
+ * for (Profile element : response.getProfileList()) {
+ * // doThingsWith(element);
+ * }
+ * String nextPageToken = response.getNextPageToken();
+ * if (!Strings.isNullOrEmpty(nextPageToken)) {
+ * request = request.toBuilder().setPageToken(nextPageToken).build();
+ * } else {
+ * break;
+ * }
+ * }
+ * }
+ * }
+ */
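+ public final UnaryCallable<FetchProfilesRequest, FetchProfilesResponse> fetchProfilesCallable() {
+ return stub.fetchProfilesCallable();
+ }
+
+ // AUTO-GENERATED DOCUMENTATION AND METHOD.
+ /**
+ * Generates an optimized deployment manifest for a given model and model server, based on the
+ * specified accelerator, performance targets, and configurations. See [Run best practice
+ * inference with GKE Inference Quickstart
+ * recipes](https://cloud.google.com/kubernetes-engine/docs/how-to/machine-learning/inference/inference-quickstart)
+ * for deployment details.
+ *
+ * Sample code:
+ *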
{@code
+ * // This snippet has been automatically generated and should be regarded as a code template only.
+ * // It will require modifications to work:
+ * // - It may require correct/in-range values for request initialization.
+ * // - It may require specifying regional endpoints when creating the service client as shown in
+ * // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
+ * try (GkeInferenceQuickstartClient gkeInferenceQuickstartClient =
+ * GkeInferenceQuickstartClient.create()) {
+ * GenerateOptimizedManifestRequest request =
+ * GenerateOptimizedManifestRequest.newBuilder()
+ * .setModelServerInfo(ModelServerInfo.newBuilder().build())
+ * .setAcceleratorType("acceleratorType-82462651")
+ * .setKubernetesNamespace("kubernetesNamespace-1862862667")
+ * .setPerformanceRequirements(PerformanceRequirements.newBuilder().build())
+ * .setStorageConfig(StorageConfig.newBuilder().build())
+ * .build();
+ * GenerateOptimizedManifestResponse response =
+ * gkeInferenceQuickstartClient.generateOptimizedManifest(request);
+ * }
+ * }
+ *
+ * @param request The request object containing all of the parameters for the API call.
+ * @throws com.google.api.gax.rpc.ApiException if the remote call fails
+ */
+ public final GenerateOptimizedManifestResponse generateOptimizedManifest(
+ GenerateOptimizedManifestRequest request) {
+ return generateOptimizedManifestCallable().call(request);
+ }
+
+ // AUTO-GENERATED DOCUMENTATION AND METHOD.
+ /**
+ * Generates an optimized deployment manifest for a given model and model server, based on the
+ * specified accelerator, performance targets, and configurations. See [Run best practice
+ * inference with GKE Inference Quickstart
+ * recipes](https://cloud.google.com/kubernetes-engine/docs/how-to/machine-learning/inference/inference-quickstart)
+ * for deployment details.
+ *
+ * Sample code:
+ *
{@code
+ * // This snippet has been automatically generated and should be regarded as a code template only.
+ * // It will require modifications to work:
+ * // - It may require correct/in-range values for request initialization.
+ * // - It may require specifying regional endpoints when creating the service client as shown in
+ * // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
+ * try (GkeInferenceQuickstartClient gkeInferenceQuickstartClient =
+ * GkeInferenceQuickstartClient.create()) {
+ * GenerateOptimizedManifestRequest request =
+ * GenerateOptimizedManifestRequest.newBuilder()
+ * .setModelServerInfo(ModelServerInfo.newBuilder().build())
+ * .setAcceleratorType("acceleratorType-82462651")
+ * .setKubernetesNamespace("kubernetesNamespace-1862862667")
+ * .setPerformanceRequirements(PerformanceRequirements.newBuilder().build())
+ * .setStorageConfig(StorageConfig.newBuilder().build())
+ * .build();
+ * ApiFuture<GenerateOptimizedManifestResponse> future =
+ * gkeInferenceQuickstartClient.generateOptimizedManifestCallable().futureCall(request);
+ * // Do something.
+ * GenerateOptimizedManifestResponse response = future.get();
+ * }
+ * }
+ */
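+ public final UnaryCallable<GenerateOptimizedManifestRequest, GenerateOptimizedManifestResponse>
+ generateOptimizedManifestCallable() {
+ return stub.generateOptimizedManifestCallable();
+ }
+
+ // AUTO-GENERATED DOCUMENTATION AND METHOD.
+ /**
+ * Fetches all of the benchmarking data available for a profile. Benchmarking data returns all of
+ * the performance metrics available for a given model server setup on a given instance type.
+ *
+ * Sample code:
+ *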
{@code
+ * // This snippet has been automatically generated and should be regarded as a code template only.
+ * // It will require modifications to work:
+ * // - It may require correct/in-range values for request initialization.
+ * // - It may require specifying regional endpoints when creating the service client as shown in
+ * // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
+ * try (GkeInferenceQuickstartClient gkeInferenceQuickstartClient =
+ * GkeInferenceQuickstartClient.create()) {
+ * FetchBenchmarkingDataRequest request =
+ * FetchBenchmarkingDataRequest.newBuilder()
+ * .setModelServerInfo(ModelServerInfo.newBuilder().build())
+ * .setInstanceType("instanceType-737655441")
+ * .setPricingModel("pricingModel1050892035")
+ * .build();
+ * FetchBenchmarkingDataResponse response =
+ * gkeInferenceQuickstartClient.fetchBenchmarkingData(request);
+ * }
+ * }
+ *
+ * @param request The request object containing all of the parameters for the API call.
+ * @throws com.google.api.gax.rpc.ApiException if the remote call fails
+ */
+ public final FetchBenchmarkingDataResponse fetchBenchmarkingData(
+ FetchBenchmarkingDataRequest request) {
+ return fetchBenchmarkingDataCallable().call(request);
+ }
+
+ // AUTO-GENERATED DOCUMENTATION AND METHOD.
+ /**
+ * Fetches all of the benchmarking data available for a profile. Benchmarking data returns all of
+ * the performance metrics available for a given model server setup on a given instance type.
+ *
+ * Sample code:
+ *
+ * {@code
+ * // This snippet has been automatically generated and should be regarded as a code template only.
+ * // It will require modifications to work:
+ * // - It may require correct/in-range values for request initialization.
+ * // - It may require specifying regional endpoints when creating the service client as shown in
+ * // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
+ * try (GkeInferenceQuickstartClient gkeInferenceQuickstartClient =
+ * GkeInferenceQuickstartClient.create()) {
+ * FetchBenchmarkingDataRequest request =
+ * FetchBenchmarkingDataRequest.newBuilder()
+ * .setModelServerInfo(ModelServerInfo.newBuilder().build())
+ * .setInstanceType("instanceType-737655441")
+ * .setPricingModel("pricingModel1050892035")
+ * .build();
+ * ApiFuture<FetchBenchmarkingDataResponse> future =
+ * gkeInferenceQuickstartClient.fetchBenchmarkingDataCallable().futureCall(request);
+ * // Do something.
+ * FetchBenchmarkingDataResponse response = future.get();
+ * }
+ * }
+ */
+ public final UnaryCallable
+ * The default instance has everything set to sensible defaults:
+ *
+ * The builder of this class is recursive, so contained classes are themselves builders. When
+ * build() is called, the tree of builders is called to create the complete settings object.
+ *
+ * For example, to set the
+ * [RetrySettings](https://cloud.google.com/java/docs/reference/gax/latest/com.google.api.gax.retrying.RetrySettings)
+ * of generateOptimizedManifest:
+ *
+ * {@code
+ * // This snippet has been automatically generated and should be regarded as a code template only.
+ * // It will require modifications to work:
+ * // - It may require correct/in-range values for request initialization.
+ * // - It may require specifying regional endpoints when creating the service client as shown in
+ * // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
+ * GkeInferenceQuickstartSettings.Builder gkeInferenceQuickstartSettingsBuilder =
+ * GkeInferenceQuickstartSettings.newBuilder();
+ * gkeInferenceQuickstartSettingsBuilder
+ * .generateOptimizedManifestSettings()
+ * .setRetrySettings(
+ * gkeInferenceQuickstartSettingsBuilder
+ * .generateOptimizedManifestSettings()
+ * .getRetrySettings()
+ * .toBuilder()
+ * .setInitialRetryDelayDuration(Duration.ofSeconds(1))
+ * .setInitialRpcTimeoutDuration(Duration.ofSeconds(5))
+ * .setMaxAttempts(5)
+ * .setMaxRetryDelayDuration(Duration.ofSeconds(30))
+ * .setMaxRpcTimeoutDuration(Duration.ofSeconds(60))
+ * .setRetryDelayMultiplier(1.3)
+ * .setRpcTimeoutMultiplier(1.5)
+ * .setTotalTimeoutDuration(Duration.ofSeconds(300))
+ * .build());
+ * GkeInferenceQuickstartSettings gkeInferenceQuickstartSettings =
+ * gkeInferenceQuickstartSettingsBuilder.build();
+ * }
+ *
+ * Please refer to the [Client Side Retry
+ * Guide](https://github.com/googleapis/google-cloud-java/blob/main/docs/client_retries.md) for
+ * additional support in setting retries.
+ */
+@Generated("by gapic-generator-java")
+public class GkeInferenceQuickstartSettings extends ClientSettings
+ * Note: This method does not support applying settings to streaming methods.
+ */
+ public Builder applyToAllUnaryMethods(
+ ApiFunction
+ * The interfaces provided are listed below, along with usage samples.
+ *
+ * ======================= GkeInferenceQuickstartClient =======================
+ *
+ * Service Description: GKE Inference Quickstart (GIQ) service provides profiles with performance
+ * metrics for popular models and model servers across multiple accelerators. These profiles help
+ * generate optimized best practices for running inference on GKE.
+ *
+ * Sample for GkeInferenceQuickstartClient:
+ *
+ * This class is for advanced usage and reflects the underlying API directly.
+ */
+@Generated("by gapic-generator-java")
+public abstract class GkeInferenceQuickstartStub implements BackgroundResource {
+
+ public UnaryCallable
+ * The default instance has everything set to sensible defaults:
+ *
+ * The builder of this class is recursive, so contained classes are themselves builders. When
+ * build() is called, the tree of builders is called to create the complete settings object.
+ *
+ * For example, to set the
+ * [RetrySettings](https://cloud.google.com/java/docs/reference/gax/latest/com.google.api.gax.retrying.RetrySettings)
+ * of generateOptimizedManifest:
+ *
+ * Note: This method does not support applying settings to streaming methods.
+ */
+ public Builder applyToAllUnaryMethods(
+ ApiFunction
+ * This class is for advanced usage.
+ */
+@Generated("by gapic-generator-java")
+public class GrpcGkeInferenceQuickstartCallableFactory implements GrpcStubCallableFactory {
+
+ @Override
+ public
+ * This class is for advanced usage and reflects the underlying API directly.
+ */
+@Generated("by gapic-generator-java")
+public class GrpcGkeInferenceQuickstartStub extends GkeInferenceQuickstartStub {
+ private static final MethodDescriptor
+ * This class is for advanced usage.
+ */
+@Generated("by gapic-generator-java")
+public class HttpJsonGkeInferenceQuickstartCallableFactory
+ implements HttpJsonStubCallableFactory
+ * This class is for advanced usage and reflects the underlying API directly.
+ */
+@Generated("by gapic-generator-java")
+public class HttpJsonGkeInferenceQuickstartStub extends GkeInferenceQuickstartStub {
+ private static final TypeRegistry typeRegistry = TypeRegistry.newBuilder().build();
+
+ private static final ApiMethodDescriptor
+ * {@code
+ * // This snippet has been automatically generated and should be regarded as a code template only.
+ * // It will require modifications to work:
+ * // - It may require correct/in-range values for request initialization.
+ * // - It may require specifying regional endpoints when creating the service client as shown in
+ * // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
+ * try (GkeInferenceQuickstartClient gkeInferenceQuickstartClient =
+ * GkeInferenceQuickstartClient.create()) {
+ * GenerateOptimizedManifestRequest request =
+ * GenerateOptimizedManifestRequest.newBuilder()
+ * .setModelServerInfo(ModelServerInfo.newBuilder().build())
+ * .setAcceleratorType("acceleratorType-82462651")
+ * .setKubernetesNamespace("kubernetesNamespace-1862862667")
+ * .setPerformanceRequirements(PerformanceRequirements.newBuilder().build())
+ * .setStorageConfig(StorageConfig.newBuilder().build())
+ * .build();
+ * GenerateOptimizedManifestResponse response =
+ * gkeInferenceQuickstartClient.generateOptimizedManifest(request);
+ * }
+ * }
+ */
+@Generated("by gapic-generator-java")
+package com.google.cloud.gkerecommender.v1;
+
+import javax.annotation.Generated;
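The RetrySettings samples in this change use an initial retry delay of 1 second, a delay multiplier of 1.3, a 30-second delay cap, and at most 5 attempts. As a rough illustration of what those numbers imply, the sketch below computes the nominal delay before each retry. It is a simplified model written for this note: `BackoffSketch` and `retryDelays` are hypothetical names, and the calculation ignores jitter, RPC timeouts, and the total timeout, all of which can shorten or cut off the real schedule.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

/** Simplified exponential backoff model; an assumption-laden sketch, not gax code. */
class BackoffSketch {

  /** Returns the nominal (un-jittered) delay before each retry. */
  static List<Duration> retryDelays(
      Duration initialDelay, double multiplier, Duration maxDelay, int maxAttempts) {
    List<Duration> delays = new ArrayList<>();
    long delayMillis = Math.min(initialDelay.toMillis(), maxDelay.toMillis());
    // Attempt 1 is the original call; attempts 2..maxAttempts are retries.
    for (int attempt = 2; attempt <= maxAttempts; attempt++) {
      delays.add(Duration.ofMillis(delayMillis));
      // Grow the delay geometrically, never exceeding the configured cap.
      delayMillis = Math.min(Math.round(delayMillis * multiplier), maxDelay.toMillis());
    }
    return delays;
  }

  public static void main(String[] args) {
    // Values taken from the samples: 1 s initial, x1.3, 30 s cap, 5 attempts.
    List<Duration> delays =
        retryDelays(Duration.ofSeconds(1), 1.3, Duration.ofSeconds(30), 5);
    for (int i = 0; i < delays.size(); i++) {
      System.out.println("retry " + (i + 1) + " after " + delays.get(i).toMillis() + " ms");
    }
  }
}
```

With five attempts there are at most four retries, so under this model the 30-second cap is never reached; it only matters for larger multipliers or attempt counts.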
diff --git a/java-gkerecommender/google-cloud-gkerecommender/src/main/java/com/google/cloud/gkerecommender/v1/stub/GkeInferenceQuickstartStub.java b/java-gkerecommender/google-cloud-gkerecommender/src/main/java/com/google/cloud/gkerecommender/v1/stub/GkeInferenceQuickstartStub.java
new file mode 100644
index 000000000000..8742a3d47fef
--- /dev/null
+++ b/java-gkerecommender/google-cloud-gkerecommender/src/main/java/com/google/cloud/gkerecommender/v1/stub/GkeInferenceQuickstartStub.java
@@ -0,0 +1,99 @@
+/*
+ * Copyright 2025 Google LLC
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * https://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package com.google.cloud.gkerecommender.v1.stub;
+
+import static com.google.cloud.gkerecommender.v1.GkeInferenceQuickstartClient.FetchModelServerVersionsPagedResponse;
+import static com.google.cloud.gkerecommender.v1.GkeInferenceQuickstartClient.FetchModelServersPagedResponse;
+import static com.google.cloud.gkerecommender.v1.GkeInferenceQuickstartClient.FetchModelsPagedResponse;
+import static com.google.cloud.gkerecommender.v1.GkeInferenceQuickstartClient.FetchProfilesPagedResponse;
+
+import com.google.api.gax.core.BackgroundResource;
+import com.google.api.gax.rpc.UnaryCallable;
+import com.google.cloud.gkerecommender.v1.FetchBenchmarkingDataRequest;
+import com.google.cloud.gkerecommender.v1.FetchBenchmarkingDataResponse;
+import com.google.cloud.gkerecommender.v1.FetchModelServerVersionsRequest;
+import com.google.cloud.gkerecommender.v1.FetchModelServerVersionsResponse;
+import com.google.cloud.gkerecommender.v1.FetchModelServersRequest;
+import com.google.cloud.gkerecommender.v1.FetchModelServersResponse;
+import com.google.cloud.gkerecommender.v1.FetchModelsRequest;
+import com.google.cloud.gkerecommender.v1.FetchModelsResponse;
+import com.google.cloud.gkerecommender.v1.FetchProfilesRequest;
+import com.google.cloud.gkerecommender.v1.FetchProfilesResponse;
+import com.google.cloud.gkerecommender.v1.GenerateOptimizedManifestRequest;
+import com.google.cloud.gkerecommender.v1.GenerateOptimizedManifestResponse;
+import javax.annotation.Generated;
+
+// AUTO-GENERATED DOCUMENTATION AND CLASS.
+/**
+ * Base stub class for the GkeInferenceQuickstart service API.
+ *
+ *
+ *
+ *
+ * {@code
+ * // This snippet has been automatically generated and should be regarded as a code template only.
+ * // It will require modifications to work:
+ * // - It may require correct/in-range values for request initialization.
+ * // - It may require specifying regional endpoints when creating the service client as shown in
+ * // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
+ * GkeInferenceQuickstartStubSettings.Builder gkeInferenceQuickstartSettingsBuilder =
+ * GkeInferenceQuickstartStubSettings.newBuilder();
+ * gkeInferenceQuickstartSettingsBuilder
+ * .generateOptimizedManifestSettings()
+ * .setRetrySettings(
+ * gkeInferenceQuickstartSettingsBuilder
+ * .generateOptimizedManifestSettings()
+ * .getRetrySettings()
+ * .toBuilder()
+ * .setInitialRetryDelayDuration(Duration.ofSeconds(1))
+ * .setInitialRpcTimeoutDuration(Duration.ofSeconds(5))
+ * .setMaxAttempts(5)
+ * .setMaxRetryDelayDuration(Duration.ofSeconds(30))
+ * .setMaxRpcTimeoutDuration(Duration.ofSeconds(60))
+ * .setRetryDelayMultiplier(1.3)
+ * .setRpcTimeoutMultiplier(1.5)
+ * .setTotalTimeoutDuration(Duration.ofSeconds(300))
+ * .build());
+ * GkeInferenceQuickstartStubSettings gkeInferenceQuickstartSettings =
+ * gkeInferenceQuickstartSettingsBuilder.build();
+ * }
+ *
+ * Please refer to the [Client Side Retry
+ * Guide](https://github.com/googleapis/google-cloud-java/blob/main/docs/client_retries.md) for
+ * additional support in setting retries.
+ */
+@Generated("by gapic-generator-java")
+public class GkeInferenceQuickstartStubSettings
+ extends StubSettings