```diff
 # It allocates the model to every machine learning node.
 #
-# @option arguments [String] :model_id The unique identifier of the trained model. (*Required*)
-# @option arguments [String] :cache_size A byte-size value for configuring the inference cache size. For example, 20mb.
-# @option arguments [String] :deployment_id The Id of the new deployment. Defaults to the model_id if not set.
-# @option arguments [Integer] :number_of_allocations The total number of allocations this model is assigned across machine learning nodes.
-# @option arguments [Integer] :threads_per_allocation The number of threads used by each model allocation during inference.
+# @option arguments [String] :model_id The unique identifier of the trained model. Currently, only PyTorch models are supported. (*Required*)
+# @option arguments [Integer, String] :cache_size The inference cache size (in memory outside the JVM heap) per node for the model.
+#   The default value is the same size as the +model_size_bytes+. To disable the cache,
+#   +0b+ can be provided.
+# @option arguments [String] :deployment_id A unique identifier for the deployment of the model.
+# @option arguments [Integer] :number_of_allocations The number of model allocations on each node where the model is deployed.
+#   All allocations on a node share the same copy of the model in memory but use
+#   a separate set of threads to evaluate the model.
+#   Increasing this value generally increases the throughput.
+#   If this setting is greater than the number of hardware threads
+#   it will automatically be changed to a value less than the number of hardware threads.
+#   If adaptive_allocations is enabled, do not set this value, because it's automatically set. Server default: 1.
 # @option arguments [String] :priority The deployment priority.
-# @option arguments [Integer] :queue_capacity Controls how many inference requests are allowed in the queue at a time.
-# @option arguments [Time] :timeout Controls the amount of time to wait for the model to deploy.
-# @option arguments [String] :wait_for The allocation status for which to wait (options: starting, started, fully_allocated)
+# @option arguments [Integer] :queue_capacity Specifies the number of inference requests that are allowed in the queue. After the number of requests exceeds
+#   this value, new requests are rejected with a 429 error. Server default: 1024.
+# @option arguments [Integer] :threads_per_allocation Sets the number of threads used by each model allocation during inference. This generally increases
+#   the inference speed. The inference process is a compute-bound process; any number
+#   greater than the number of available hardware threads on the machine does not increase the
+#   inference speed. If this setting is greater than the number of hardware threads
+#   it will automatically be changed to a value less than the number of hardware threads. Server default: 1.
+# @option arguments [Time] :timeout Specifies the amount of time to wait for the model to deploy. Server default: 20s.
+# @option arguments [String] :wait_for Specifies the allocation status to wait for before returning. Server default: started.
```
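For context, here is a minimal sketch of how these options might be passed through the Ruby client. The cluster connection, model ID, deployment ID, and sizing values below are illustrative assumptions, not defaults from this change:

```ruby
require 'elasticsearch'

# Assumes a locally reachable cluster with machine learning nodes.
client = Elasticsearch::Client.new

response = client.machine_learning.start_trained_model_deployment(
  model_id: 'my-pytorch-model',    # required; only PyTorch models are supported
  deployment_id: 'my-deployment',  # unique identifier for this deployment
  cache_size: '512mb',             # per-node inference cache outside the JVM heap; '0b' disables it
  number_of_allocations: 2,        # omit when adaptive_allocations is enabled
  threads_per_allocation: 4,       # capped at the available hardware threads
  queue_capacity: 1024,            # further requests are rejected with a 429 error
  timeout: '30s',                  # how long to wait for the model to deploy
  wait_for: 'started'              # or 'starting' / 'fully_allocated'
)
puts response.body
```

Note how `wait_for` trades call latency for readiness: `starting` returns as soon as the deployment is accepted, while `fully_allocated` blocks until every allocation is running.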