triton-inference-server
diff --git a/README.md b/README.md
Lines changed: 5 additions & 2 deletions
diff --git a/docs/config.md b/docs/config.md
Lines changed: 6 additions & 0 deletions
diff --git a/docs/config_search.md b/docs/config_search.md
Lines changed: 24 additions & 7 deletions
diff --git a/model_analyzer/config/generate/base_model_config_generator.py b/model_analyzer/config/generate/base_model_config_generator.py
Lines changed: 16 additions & 9 deletions
diff --git a/model_analyzer/config/generate/model_profile_spec.py b/model_analyzer/config/generate/model_profile_spec.py
Lines changed: 4 additions & 0 deletions
diff --git a/model_analyzer/config/generate/quick_plus_concurrency_sweep_run_config_generator.py b/model_analyzer/config/generate/quick_plus_concurrency_sweep_run_config_generator.py
Lines changed: 6 additions & 6 deletions
@@ -18,7 +18,7 @@ limitations under the License.
 
 # Triton Model Analyzer
 
-Triton Model Analyzer is a CLI tool which can help you find a more optimal configuration, on a given piece of hardware, for single, multiple, or ensemble models running on a [Triton Inference Server](https://github.com/triton-inference-server/server/). Model Analyzer will also generate reports to help you better understand the trade-offs of the different configurations along with their compute and memory requirements.
+Triton Model Analyzer is a CLI tool which can help you find a more optimal configuration, on a given piece of hardware, for single, multiple, ensemble, or BLS models running on a [Triton Inference Server](https://github.com/triton-inference-server/server/). Model Analyzer will also generate reports to help you better understand the trade-offs of the different configurations along with their compute and memory requirements.
 <br><br>
 
 # Features
@@ -40,7 +40,10 @@ Triton Model Analyzer is a CLI tool which can help you find a more optimal confi
 ### Model Types
 
 - [Ensemble Model Search](docs/config_search.md#ensemble-model-search): Model Analyzer can help you find the optimal
-  settings when profiling a non-BLS ensemble model, utilizing the [Quick Search](docs/config_search.md#quick-search-mode) algorithm
+  settings when profiling an ensemble model, utilizing the [Quick Search](docs/config_search.md#quick-search-mode) algorithm
+
+- [BLS Model Search](docs/config_search.md#bls-model-search): Model Analyzer can help you find the optimal
+  settings when profiling a BLS model, utilizing the [Quick Search](docs/config_search.md#quick-search-mode) algorithm
 
 - [Multi-Model Search](docs/config_search.md#multi-model-search-mode): **EARLY ACCESS** - Model Analyzer can help you
   find the optimal settings when profiling multiple concurrent models, utilizing the [Quick Search](docs/config_search.md#quick-search-mode) algorithm
 
@@ -89,6 +89,9 @@ model_repository: <string>
 # List of the model names to be profiled
 profile_models: <comma-delimited-string-list>
 
+# List of composing models for BLS models
+bls_composing_models: <comma-delimited-string-list>
+
 # Full path to directory to which to read and write checkpoints and profile data
 [ checkpoint_directory: <string> | default: './checkpoints' ]
 
@@ -252,6 +255,9 @@ The following config options are supported **only by the YAML** config file.
 # YAML config section for each model to be profiled
 profile_models: <comma-delimited-string-list|list|profile_model>
 
+# List of composing models for BLS models
+bls_composing_models: <comma-delimited-string-list>
+
 # List of constraints placed on the config search results
 [ constraints: <constraint> ]
 
 
@@ -23,6 +23,7 @@ limitations under the License.
   - [Manual Brute Search](#manual-brute-search)
 - [Quick Search Mode](#quick-search-mode)
 - [Ensemble Model Search](#ensemble-model-search)
+- [BLS Model Search](#bls-model-search)
 - [Multi-Model Search Mode](#multi-model-search-mode)
 
 <br>
@@ -36,13 +37,14 @@ Model Analyzer's `profile` subcommand supports multiple modes when searching to
 - [Brute Force Search](config_search.md#brute-search-mode)
   - **Search type:** Brute-force sweep of the cross product of all possible configurations
   - **Default for:**
-    - Single non-ensemble models
+    - Single models that are neither ensemble nor BLS
     - Multiple models being profiled sequentially
   - **Command:** `--run-config-search-mode brute`
 - [Quick Search](config_search.md#quick-search-mode)
   - **Search type:** Heuristic sweep using a hill-climbing algorithm to find an optimal configuration
   - **Default for:**
     - Single ensemble models
+    - Single BLS models
     - Multiple models being profiled concurrently
   - **Command:** `--run-config-search-mode quick`
 
@@ -54,19 +56,19 @@ Model Analyzer's default search mode depends on the type of model and if you are
 
 - [Sequential (single or multi-model) Search](config_search.md#brute-search-mode)
   - **Default Search type:** [Brute Force Search](config_search.md#brute-search-mode)
-  - **Command:** N/A
 - [Concurrent / Multi-model Search](config_search.md#multi-model-search-mode)
   - **Default Search type:** [Quick Search](config_search.md#quick-search-mode)
   - **Command:** `--run-config-profile-models-concurrently-enable`
 - [Ensemble Model Search](config_search.md#ensemble-model-search):
   - **Default Search type:** [Quick Search](config_search.md#quick-search-mode)
-  - **Command:** N/A
+- [BLS Model Search](config_search.md#bls-model-search):
+  - **Default Search type:** [Quick Search](config_search.md#quick-search-mode)
 
 ---
 
 ## Brute Search Mode
 
-**Default search mode when profiling non-ensemble models sequentially**
+**Default search mode when sequentially profiling models that are neither ensemble nor BLS**
 
 Model Analyzer's brute search mode will do a brute-force sweep of the cross product of all possible configurations. <br>
 It has two modes:
@@ -225,7 +227,7 @@ manual sweep:
 
 ## Quick Search Mode
 
-**Default search mode when profiling ensemble models or multiple models concurrently**
+**Default search mode when profiling ensemble models, BLS models, or multiple models concurrently**
 
 This mode uses a hill climbing algorithm to search the configuration space, looking for
 the maximal objective value within the specified constraints. In the majority of cases
@@ -278,8 +280,23 @@ _This mode has the following limitations:_
 - Can only be run in `quick` search mode
 - Only supports up to four composing models
 - Does not support `cpu_only` option for composing models
+- Composing models cannot be ensemble or BLS models
+
+Ensemble models can be optimized using the Quick Search mode's hill climbing algorithm, which searches the composing models' configuration spaces in parallel, looking for the maximal objective value within the specified constraints. This approach has shown positive results in finding the maximum objective value, with runtimes under one hour (compared to the days a brute-force run would take) for ensembles that contain up to four composing models.
+
+After Model Analyzer has found the best config(s), it will then sweep the top-N configurations found (specified by `--num-configs-per-model`) over the concurrency range before generation of the summary reports.
+
+---
+
+## BLS Model Search
+
+_This mode has the following limitations:_
+
+- Can only be run in `quick` search mode
+- Only supports up to four composing models
+- Composing models cannot be ensemble or BLS models
 
-Ensemble models can be optimized using the Quick Search mode's hill climbing algorithm to search the ensemble sub-model's configuration spaces in parallel, looking for the maximal objective value within the specified constraints. Model Analyzer has observed positive outcomes towards finding the maximum objective value; with runtimes under one hour (compared to the days it would take a brute force run to complete) for ensembles with up to four composing models.
+BLS models can be optimized using the Quick Search mode's hill climbing algorithm, which searches the BLS composing models' configuration spaces and the BLS model's instance count in parallel, looking for the maximal objective value within the specified constraints. This approach has shown positive results in finding the maximum objective value, with runtimes under one hour (compared to the days a brute-force run would take) for BLS models that contain up to four composing models.
 
 After Model Analyzer has found the best config(s), it will then sweep the top-N configurations found (specified by `--num-configs-per-model`) over the concurrency range before generation of the summary reports.
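
The hill-climbing idea described in the ensemble and BLS sections can be illustrated with a minimal sketch. This is not Model Analyzer's implementation: `hill_climb`, the integer-grid coordinates, and the `toy_throughput` objective are hypothetical names chosen for illustration. Each coordinate dimension can be read as one tunable setting (for example, one composing model's instance count), and the search moves to the best neighboring coordinate until nothing improves.

```python
from typing import Callable, Tuple

Coordinate = Tuple[int, ...]


def hill_climb(start: Coordinate,
               bounds: Tuple[int, ...],
               objective: Callable[[Coordinate], float]) -> Coordinate:
    """Greedy hill climb on an integer grid: move to the best neighbor
    until no neighbor improves the objective."""
    current = start
    current_score = objective(current)
    while True:
        best, best_score = current, current_score
        # Try one step down and one step up in every dimension
        for dim in range(len(current)):
            for step in (-1, 1):
                candidate = list(current)
                candidate[dim] += step
                if not 0 <= candidate[dim] <= bounds[dim]:
                    continue  # stay inside the search space
                score = objective(tuple(candidate))
                if score > best_score:
                    best, best_score = tuple(candidate), score
        if best == current:
            return current  # local maximum reached
        current, current_score = best, best_score


def toy_throughput(coord: Coordinate) -> float:
    # Hypothetical objective with a single peak at (3, 2)
    x, y = coord
    return 100.0 - (x - 3) ** 2 - (y - 2) ** 2


best = hill_climb(start=(0, 0), bounds=(5, 5), objective=toy_throughput)
```

A greedy climb like this only evaluates a handful of neighbors per step instead of the full cross product, which is why it can finish far sooner than a brute-force sweep. The trade-off is that it may stop at a local maximum.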
 
@@ -318,7 +335,7 @@ profile_models:
 
 ### **Model Weighting**
 
-In additon to setting a model's objectives or constraints, in multi-model search mode, you have the ability to set a model's weighting. By default each model is set for equal weighting (value of 1), but in the YAML you can specify `weighting: <int>` which will bias that model's objectives when evaluating for an optimal result.
+In addition to setting a model's objectives or constraints, in multi-model search mode, you have the ability to set a model's weighting. By default each model is set for equal weighting (value of 1), but in the YAML you can specify `weighting: <int>` which will bias that model's objectives when evaluating for an optimal result.
 
 ---
 
 
@@ -22,6 +22,7 @@
 from model_analyzer.constants import LOGGER_NAME
 from model_analyzer.triton.model.model_config import ModelConfig
 from .model_profile_spec import ModelProfileSpec
+from copy import deepcopy
 import abc
 import logging
 
@@ -233,15 +234,6 @@ def make_ensemble_model_config(
         model_config_dict['name'] = variant_name
         model_config = ModelConfig.create_from_dictionary(model_config_dict)
 
-        for composing_model_config in ensemble_composing_model_configs:
-            variant_name = composing_model_config.get_field("name")
-            composing_model_name = BaseModelConfigGenerator.extract_model_name_from_variant_name(
-                variant_name)
-
-            model_config.set_composing_model_variant_name(
-                composing_model_name=composing_model_name,
-                variant_name=variant_name)
-
         return model_config
 
     @staticmethod
@@ -283,6 +275,21 @@ def extract_model_name_from_variant_name(variant_name: str) -> str:
         """
         return variant_name[:variant_name.find("_config_")]
 
+    @staticmethod
+    def create_original_config_from_variant(
+            variant_config: ModelConfig) -> ModelConfig:
+        """
+        Removes 'config_#/default' from the variant config and returns
+        a new model config
+        """
+        original_config = deepcopy(variant_config)
+
+        original_config.set_model_name(
+            BaseModelConfigGenerator.extract_model_name_from_variant_name(
+                variant_config.get_field("name")))
+
+        return original_config
+
     @staticmethod
     def _apply_value_to_dict(key: Any, value: Any, dict_in: Dict) -> None:
         """
 
@@ -52,3 +52,7 @@ def supports_dynamic_batching(self) -> bool:
         if "sequence_batching" in self._default_model_config:
             supports_dynamic_batching = False
         return supports_dynamic_batching
+
+    def is_ensemble(self) -> bool:
+        """ Returns true if the model is an ensemble """
+        return ("ensemble_scheduling" in self._default_model_config)
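
The two helpers added in this change, `create_original_config_from_variant` and `is_ensemble`, can be exercised in isolation with a minimal sketch. It assumes a hypothetical stand-in class, `FakeModelConfig`, that offers only the `get_field` and `set_model_name` calls used in the diff; the real `ModelConfig` is not reproduced here, and the model name `resnet50` is made up for the example.

```python
from copy import deepcopy


class FakeModelConfig:
    """Hypothetical stand-in for ModelConfig, holding only a name."""

    def __init__(self, name: str) -> None:
        self._fields = {"name": name}

    def get_field(self, key: str) -> str:
        return self._fields[key]

    def set_model_name(self, name: str) -> None:
        self._fields["name"] = name


def extract_model_name_from_variant_name(variant_name: str) -> str:
    # Same slicing as in the diff: keep everything before "_config_"
    return variant_name[:variant_name.find("_config_")]


def create_original_config_from_variant(
        variant: FakeModelConfig) -> FakeModelConfig:
    # Copy first so the variant config is left untouched
    original = deepcopy(variant)
    original.set_model_name(
        extract_model_name_from_variant_name(variant.get_field("name")))
    return original


def is_ensemble(default_model_config: dict) -> bool:
    # Mirrors the key-presence check added to ModelProfileSpec
    return "ensemble_scheduling" in default_model_config


variant = FakeModelConfig("resnet50_config_3")
original = create_original_config_from_variant(variant)
```

One detail worth keeping in mind: `str.find` returns `-1` when `"_config_"` is absent, so the slice would silently drop the last character of a name that is not a variant name. The helper is only safe to call on names that contain `_config_`.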
@@ -47,8 +47,8 @@ class QuickPlusConcurrencySweepRunConfigGenerator(ConfigGeneratorInterface):
     def __init__(self, search_config: SearchConfig,
                  config: ConfigCommandProfile, gpus: List[GPUDevice],
                  models: List[ModelProfileSpec],
-                 ensemble_composing_models: Dict[str, List[ModelProfileSpec]],
-                 client: TritonClient, result_manager: ResultManager,
+                 composing_models: List[ModelProfileSpec], client: TritonClient,
+                 result_manager: ResultManager,
                  model_variant_name_manager: ModelVariantNameManager):
         """
         Parameters
@@ -60,8 +60,8 @@ def __init__(self, search_config: SearchConfig,
         gpus: List of GPUDevices
         models: List of ModelProfileSpec
             List of models to profile
-        ensemble_composing_models: Dict of List of ModelProfileSpec
-            Dict indexed by model name of list of composing models to profile
+        composing_models: List of ModelProfileSpec
+            List of composing models that exist inside of the supplied models
         client: TritonClient
         result_manager: ResultManager
             The object that handles storing and sorting the results from the perf analyzer
@@ -74,7 +74,7 @@ def __init__(self, search_config: SearchConfig,
         self._config = config
         self._gpus = gpus
         self._models = models
-        self._ensemble_composing_models = ensemble_composing_models
+        self._composing_models = composing_models
         self._client = client
         self._result_manager = result_manager
         self._model_variant_name_manager = model_variant_name_manager
@@ -118,7 +118,7 @@ def _create_quick_run_config_generator(self) -> QuickRunConfigGenerator:
             config=self._config,
             gpus=self._gpus,
             models=self._models,
-            ensemble_composing_models=self._ensemble_composing_models,
+            composing_models=self._composing_models,
             client=self._client,
             model_variant_name_manager=self._model_variant_name_manager)