I was trying to run Model Analyzer with the default Triton launch mode (`triton_launch_mode: local`).

The command below runs the container (image name: `model-analyzer`):

```
docker run -it --rm --gpus all -v $(pwd):/workspace --net=host model-analyzer
```

My `sweep.yaml` is given below:

```yaml
model_repository: /workspace/model_repositories
triton_launch_mode: local
profile_models:
  - minilm
perf_analyzer_flags:
  input-data: "random"
triton_server_flags:
  log_verbose: True
exit_timeout_secs: 120
```
And I use this command to run Model Analyzer inside the container:

```
model-analyzer profile -f sweep.yaml
```
ISSUE
```
[Model Analyzer] Initializing GPUDevice handles
[Model Analyzer] Using GPU 0 NVIDIA RTX A4000 with UUID GPU-3fca6544-2e5c-de67-d283-a37b68e716bb
[Model Analyzer] Using GPU 1 NVIDIA RTX A4000 with UUID GPU-6dced96e-d063-1bf2-dcb8-f5d94e67f6a9
Traceback (most recent call last):
  File "/workspace/model_analyzer/entrypoint.py", line 198, in create_output_model_repository
    os.mkdir(config.output_model_repository_path)
FileExistsError: [Errno 17] File exists: '/workspace/output_model_repository'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/model-analyzer", line 8, in <module>
    sys.exit(main())
  File "/workspace/model_analyzer/entrypoint.py", line 266, in main
    create_output_model_repository(config)
  File "/workspace/model_analyzer/entrypoint.py", line 201, in create_output_model_repository
    raise TritonModelAnalyzerException(
model_analyzer.model_analyzer_exceptions.TritonModelAnalyzerException: Path "/workspace/output_model_repository" already exists. Please set or modify "--output-model-repository-path" flag or remove this directory. You can also allow overriding of the output directory using the "--override-output-model-repository" flag.
```
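For context on this first error: judging from the traceback, the entrypoint calls `os.mkdir` on the output path and converts `FileExistsError` into the exception above. Below is a minimal sketch of that check (the function body and the `override` parameter are my assumptions reconstructed from the traceback and the error text, not the project's actual code):

```python
import os
import tempfile

def create_output_model_repository(path, override=False):
    # Sketch of the behavior seen in the traceback: os.mkdir raises
    # FileExistsError when the output repository already exists, and
    # that is surfaced as an error unless overriding is allowed
    # (in the real CLI, via "--override-output-model-repository").
    try:
        os.mkdir(path)
    except FileExistsError:
        if not override:
            raise RuntimeError(
                f'Path "{path}" already exists. Remove it or allow '
                'overriding of the output directory.'
            )
        # With override enabled, the existing directory is reused.

# Demonstration in a throwaway temp directory:
base = tempfile.mkdtemp()
repo = os.path.join(base, "output_model_repository")
create_output_model_repository(repo)            # first run: creates it
try:
    create_output_model_repository(repo)        # second run: fails
except RuntimeError as e:
    print("raised:", e)
create_output_model_repository(repo, override=True)  # override: accepted
```

So the two remedies named in the error message are: delete the stale `/workspace/output_model_repository` directory, or pass the override flag on the next run.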
```
root@test-MS-7D70:/workspace# rm -rf /workspace/output_model_repository
root@test-MS-7D70:/workspace# model-analyzer profile -m examples/quick-start --profile-models add_sub
[Model Analyzer] Initializing GPUDevice handles
[Model Analyzer] Using GPU 0 NVIDIA RTX A4000 with UUID GPU-3fca6544-2e5c-de67-d283-a37b68e716bb
[Model Analyzer] Using GPU 1 NVIDIA RTX A4000 with UUID GPU-6dced96e-d063-1bf2-dcb8-f5d94e67f6a9
[Model Analyzer] Starting a local Triton Server
[Model Analyzer] Loaded checkpoint from file /workspace/checkpoints/6.ckpt
[Model Analyzer] GPU devices match checkpoint - skipping server metric acquisition
[Model Analyzer]
[Model Analyzer] Starting automatic brute search
[Model Analyzer]
[Model Analyzer] Creating model config: add_sub_config_default
[Model Analyzer]
[Model Analyzer] Saved checkpoint to /workspace/checkpoints/7.ckpt
Traceback (most recent call last):
  File "/workspace/model_analyzer/triton/client/client.py", line 60, in wait_for_server_ready
    if self._client.is_server_ready():
  File "/usr/local/lib/python3.10/dist-packages/tritonclient/grpc/_client.py", line 344, in is_server_ready
    raise_error_grpc(rpc_error)
  File "/usr/local/lib/python3.10/dist-packages/tritonclient/grpc/_utils.py", line 77, in raise_error_grpc
    raise get_error_grpc(rpc_error) from None
tritonclient.utils.InferenceServerException: [StatusCode.UNAVAILABLE] failed to connect to all addresses; last error: UNKNOWN: ipv6:%5B::1%5D:8001: Failed to connect to remote host: Timeout occurred: FD Shutdown

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/model-analyzer", line 8, in <module>
    sys.exit(main())
  File "/workspace/model_analyzer/entrypoint.py", line 278, in main
    analyzer.profile(
  File "/workspace/model_analyzer/analyzer.py", line 131, in profile
    self._profile_models()
  File "/workspace/model_analyzer/analyzer.py", line 251, in _profile_models
    self._model_manager.run_models(models=[model])
  File "/workspace/model_analyzer/model_manager.py", line 154, in run_models
    measurement = self._metrics_manager.execute_run_config(run_config)
  File "/workspace/model_analyzer/record/metrics_manager.py", line 238, in execute_run_config
    if not self._load_model_variants(run_config):
  File "/workspace/model_analyzer/record/metrics_manager.py", line 452, in _load_model_variants
    if not self._load_model_variant(variant_config=mrc.model_config_variant()):
  File "/workspace/model_analyzer/record/metrics_manager.py", line 467, in _load_model_variant
    retval = self._do_load_model_variant(variant_config)
  File "/workspace/model_analyzer/record/metrics_manager.py", line 474, in _do_load_model_variant
    self._client.wait_for_server_ready(
  File "/workspace/model_analyzer/triton/client/client.py", line 72, in wait_for_server_ready
    raise TritonModelAnalyzerException(e)
model_analyzer.model_analyzer_exceptions.TritonModelAnalyzerException: [StatusCode.UNAVAILABLE] failed to connect to all addresses; last error: UNKNOWN: ipv6:%5B::1%5D:8001: Failed to connect to remote host: Timeout occurred: FD Shutdown
```
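For what it's worth, the traceback shows the failure happens while `wait_for_server_ready` polls the Triton gRPC client's `is_server_ready()` on `localhost:8001` and the connection never succeeds before the timeout. The polling pattern can be sketched as below; this is a generic illustration with a stand-in client, not the real `tritonclient` class or Model Analyzer's actual implementation:

```python
import time

def wait_for_server_ready(client, num_retries=10, sleep_secs=0.01):
    # Poll the client until it reports ready, or give up after
    # num_retries attempts. A connection error just means the server
    # is not up yet, so we swallow it and retry.
    for _ in range(num_retries):
        try:
            if client.is_server_ready():
                return True
        except ConnectionError:
            pass  # server not reachable yet; fall through to retry
        time.sleep(sleep_secs)
    return False

class FakeClient:
    # Stand-in that raises until it has been asked a few times,
    # mimicking a server that takes a moment to come up.
    def __init__(self, ready_after):
        self.calls = 0
        self.ready_after = ready_after
    def is_server_ready(self):
        self.calls += 1
        if self.calls < self.ready_after:
            raise ConnectionError("failed to connect to all addresses")
        return True

print(wait_for_server_ready(FakeClient(ready_after=3)))  # True
```

In my case the retries apparently run out because nothing ever accepts the connection on port 8001, even though the log says a local Triton Server was started.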
I need help fixing this issue.