- * This class provides core computational operations such as RMS normalization and
- * forward passes through model layers. It supports both CPU and GPU implementations.
+ * This class provides core computational operations such as RMS normalization and forward passes through model layers. It supports both CPU and GPU implementations.
  * </p>
  *
  * <p>
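For reference, the RMS normalization named in the class doc above divides each element of a vector by the root-mean-square of the whole vector and applies a learned per-element gain. A minimal sketch on plain float[] arrays follows; the repository's implementation operates on FloatTensor with separate CPU and GPU paths, so the helper name and signature here are illustrative only.

// Illustrative RMS normalization on plain arrays (not the repository's API):
// out[i] = weight[i] * x[i] / sqrt(mean(x^2) + eps)
static void rmsNorm(float[] out, float[] x, float[] weight, float eps) {
    float sumOfSquares = 0f;
    for (float v : x) {
        sumOfSquares += v * v;          // accumulate x^2
    }
    float scale = (float) (1.0 / Math.sqrt(sumOfSquares / x.length + eps));
    for (int i = 0; i < x.length; i++) {
        out[i] = weight[i] * (scale * x[i]);  // normalize, then apply gain
    }
}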
@@ -308,6 +310,117 @@ public static FloatTensor forwardJavaQwen3(Model model, State state, int token,
- * Orchestrates the complete inference process: ingests prompt tokens, then generates
- * new tokens until a stop condition is met. Supports both CPU and GPU execution.
+ * Orchestrates the complete inference process: ingests prompt tokens, then generates new tokens until a stop condition is met. Supports both CPU and GPU execution.
  * </p>
  *
  * <p>
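In outline, the two-phase flow described above (ingest the prompt, then generate) could look like the following. This is a hedged sketch: Model.forward and Sampler.sampleToken are assumed method names not confirmed by this diff.

import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Sketch of prompt ingestion followed by autoregressive generation.
// Model.forward and Sampler.sampleToken are assumed names.
static List<Integer> generateSketch(Model model, State state, List<Integer> promptTokens,
                                    Set<Integer> stopTokens, int maxTokens, Sampler sampler) {
    List<Integer> generated = new ArrayList<>();
    int pos = 0;
    int token = -1;
    for (int t : promptTokens) {         // phase 1: ingest all prompt tokens
        model.forward(state, t, pos++);  // fills key/value caches; logits unused
        token = t;
    }
    while (pos < maxTokens) {            // phase 2: generate until stop or limit
        FloatTensor logits = model.forward(state, token, pos++);
        token = sampler.sampleToken(logits);
        generated.add(token);            // the stop token itself is included
        if (stopTokens.contains(token)) {
            break;                       // stop condition met
        }
    }
    return generated;
}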
@@ -42,19 +42,26 @@ private InferenceEngine() {
  * LLM generation entry point: ingests prompt tokens and generates new tokens.
  *
  * <p>
- * All prompt tokens are ingested first, then inference starts, until a stop token is found.
- * The returned tokens only include generated/inferred tokens.
+ * All prompt tokens are ingested first, then inference starts, until a stop token is found. The returned tokens only include generated/inferred tokens.
  *
- * @param model model to run inference (including weights, configuration, tokenizer ...)
- * @param state state of the model e.g. key/value caches ... this is mutated by this call
- * @param startPosition start prompt ingestion + inference at this position in the context e.g. useful if state was kept across calls (chained generation). 0 implies run with no previous context.
- * @param promptTokens prompt tokens to ingest, all the prompt tokens will be ingested, given there's enough capacity left in the context
- * @param stopTokens set of tokens that abort generation during inference, stop tokens do not affect prompt ingestion
- * @param maxTokens maximum number of tokens (can go up to {@link Configuration#contextLength context length}
- *                  if this value is negative or greater than {@link Configuration#contextLength context length}
- * @param sampler {@link Sampler strategy} used to select tokens
- * @param echo debugging flag, prints ALL, prompt and inferred tokens, to {@link System#err stderr}
- * @param onTokenGenerated callback, if non-null, it's called every time a token is inferred e.g. it's not called when ingesting prompt tokens
+ * @param model
+ *            model to run inference (including weights, configuration, tokenizer ...)
+ * @param state
+ *            state of the model, e.g. key/value caches; this is mutated by this call
+ * @param startPosition
+ *            start prompt ingestion + inference at this position in the context, e.g. useful if state was kept across calls (chained generation); 0 implies run with no previous context
+ * @param promptTokens
+ *            prompt tokens to ingest; all the prompt tokens will be ingested, given there's enough capacity left in the context
+ * @param stopTokens
+ *            set of tokens that abort generation during inference; stop tokens do not affect prompt ingestion
+ * @param maxTokens
+ *            maximum number of tokens to generate (capped at {@link Configuration#contextLength context length} if this value is negative or greater than {@link Configuration#contextLength context length})
+ * @param sampler
+ *            {@link Sampler strategy} used to select tokens
+ * @param echo
+ *            debugging flag; prints ALL tokens, prompt and inferred, to {@link System#err stderr}
+ * @param onTokenGenerated
+ *            callback; if non-null, it's called every time a token is inferred, e.g. it's not called when ingesting prompt tokens
  * @return list of generated/inferred tokens, including the stop token, if any; does not include any token from the prompt
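A possible call site for this entry point, assuming it is exposed as InferenceEngine.generateTokens with the parameter order documented above; the method name and the tokenizer accessors are assumptions, not confirmed by this diff.

// Hedged usage sketch; generateTokens, tokenizer(), eosTokenId() are assumed names.
List<Integer> promptTokens = model.tokenizer().encode("Why is the sky blue?");
Set<Integer> stopTokens = Set.of(model.tokenizer().eosTokenId());
List<Integer> generated = InferenceEngine.generateTokens(
        model, state,
        0,             // startPosition: no previous context
        promptTokens,
        stopTokens,
        256,           // maxTokens; a negative value falls back to the context length
        sampler,
        false,         // echo: no debug printing to stderr
        token -> System.out.print(model.tokenizer().decode(List.of(token))));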