
Commit 2f904c2

Authored by wqerrewetw, ngxson, JohannesGaessler, Acly, and slaren

up (#10)
* model : add LightOnOCR-1B model (ggml-org#16764)
  * add test

* HIP : fix AMDGPU_TARGETS, update documentation (ggml-org#16803)

* ggml : fix interpolate with align-corners and ne=1 (ggml-org#16700) (see the sketch after this list)
  * avoid division by zero if one of the spatial dimensions is 1
  * cpu, cuda, and opencl returned the correct result anyway due to clamping
  * vulkan did not clamp for align-corners, so results were broken
  * fix clang warning

* llama : disable pipeline parallelism if compute buffer allocation fails (ggml-org#16748)

* mtmd : fix idefics3 preprocessing (ggml-org#16806)
  * disable granite test, then fix the test for granite

* chat : add LFM2 tool handling (ggml-org#16763)
  * fmt
  * apply suggestion from @ykhrustalev

* sycl : add SSM_CONV operation support (ggml-org#16800)
  * add ggml-sycl/ssm_conv.cpp and ssm_conv.hpp, implementing state-space-model 1D convolution (sliding-window algorithm) with parallel work distribution
  * support various tensor dimensions, batch sizes, and strides; proper type checking for F32 tensors in backend support
  * numerical correctness matches the CPU implementation; all test-backend-ops SSM_CONV tests pass (14490/14490)
  * eliminate a blocking queue.wait() for better async performance
  * cleanup: remove inline comments and duplicated content in ssm_conv.hpp, strip trailing whitespace, add final newlines, add minor safety casts, fix C++17 compliance
  * Co-authored-by: tamarPal <[email protected]>

* CUDA : add unused vars to mmvf and mmvq (ggml-org#16807)

* CANN : improve device ID handling and aclnnArange checks (ggml-org#16752)
  * stop relying on CANN's internal device ID retrieval; use a global (later thread-local) variable instead
  * enforce stricter dimension validation in aclnnArange for better compatibility across CANN versions

* grammar : support array references in json schema (ggml-org#16792)
  * update json-schema-to-grammar.cpp
  * improve the regex used when naming ref-derived rules
  * replace the non-conformant definitions array with an anyOf test case
  * Co-authored-by: Sigbjørn Skjæret <[email protected]>

* llama : consistent ctx <-> buf order for KV cache (ggml-org#16746)

* embedding : add "raw" option for --embd-output-format (ggml-org#16541)
  * outputs embeddings as raw space-separated floats, without JSON or "embedding N:" prefixes; useful for downstream vector pipelines and scripting
  * move raw output handling into the else-if block with the other format handlers
  * use LOG instead of printf for raw embedding output
  * docs: document the "raw" embedding output format in arg.cpp and the README

---------

Co-authored-by: Xuan-Son Nguyen <[email protected]>
Co-authored-by: Johannes Gäßler <[email protected]>
Co-authored-by: Acly <[email protected]>
Co-authored-by: Diego Devesa <[email protected]>
Co-authored-by: Yuri Khrustalev <[email protected]>
Co-authored-by: tamarPal <[email protected]>
Co-authored-by: Aman Gupta <[email protected]>
Co-authored-by: Chenguang Li <[email protected]>
Co-authored-by: Aldehir Rojas <[email protected]>
Co-authored-by: Sigbjørn Skjæret <[email protected]>
Co-authored-by: Sam Malayek <[email protected]>
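For the align-corners fix referenced above, the core issue is the coordinate mapping: with align_corners, the source coordinate is dst_index * (n_in - 1) / (n_out - 1), whose denominator is zero whenever an output axis has a single element. Below is a minimal sketch of the guarded mapping; names and structure are illustrative, not the actual ggml kernel code.

```cpp
// Sketch: align-corners source-coordinate mapping with the ne == 1 guard.
// If the output axis has one element, the (n_out - 1) denominator would be
// zero, so that case is short-circuited to the first input sample.
static float src_coord_align_corners(int i_out, int n_in, int n_out) {
    if (n_out <= 1) {
        return 0.0f;  // degenerate axis: everything maps to sample 0
    }
    float coord = (float) i_out * (float) (n_in - 1) / (float) (n_out - 1);
    // clamp into range; per the commit message, the CPU/CUDA/OpenCL paths
    // already clamped, which is why only Vulkan produced broken results
    if (coord < 0.0f) {
        coord = 0.0f;
    }
    if (coord > (float) (n_in - 1)) {
        coord = (float) (n_in - 1);
    }
    return coord;
}
```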
1 parent 23b877f · commit 2f904c2

39 files changed: +820 −114 lines

common/arg.cpp

Lines changed: 1 addition & 1 deletion
@@ -3248,7 +3248,7 @@ common_params_context common_params_parser_init(common_params & params, llama_ex
     ).set_examples({LLAMA_EXAMPLE_EMBEDDING}));
     add_opt(common_arg(
         {"--embd-output-format"}, "FORMAT",
-        "empty = default, \"array\" = [[],[]...], \"json\" = openai style, \"json+\" = same \"json\" + cosine similarity matrix",
+        "empty = default, \"array\" = [[],[]...], \"json\" = openai style, \"json+\" = same \"json\" + cosine similarity matrix, \"raw\" = plain whitespace-delimited output (one embedding per line)",
         [](common_params & params, const std::string & value) {
             params.embd_out = value;
         }
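The "raw" format is intentionally trivial to consume. A minimal sketch of the output shape it produces (an illustration only; the embedding example's actual code writes through LOG rather than printf):

```cpp
#include <cstdio>
#include <vector>

// Print embeddings the way the "raw" format is described above:
// space-separated floats, one embedding per line, no JSON wrapping
// and no "embedding N:" prefix.
static void print_embd_raw(const std::vector<std::vector<float>> & embds) {
    for (const auto & emb : embds) {
        for (size_t i = 0; i < emb.size(); i++) {
            std::printf(i == 0 ? "%f" : " %f", emb[i]);
        }
        std::printf("\n");
    }
}
```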

common/chat.cpp

Lines changed: 198 additions & 0 deletions
@@ -9,8 +9,11 @@
 #include <minja/chat-template.hpp>
 #include <minja/minja.hpp>
 
+#include <algorithm>
 #include <cstdio>
+#include <cctype>
 #include <exception>
+#include <functional>
 #include <iostream>
 #include <optional>
 #include <stdexcept>
@@ -640,6 +643,7 @@ const char * common_chat_format_name(common_chat_format format) {
         case COMMON_CHAT_FORMAT_SEED_OSS: return "Seed-OSS";
         case COMMON_CHAT_FORMAT_NEMOTRON_V2: return "Nemotron V2";
         case COMMON_CHAT_FORMAT_APERTUS: return "Apertus";
+        case COMMON_CHAT_FORMAT_LFM2_WITH_JSON_TOOLS: return "LFM2 with JSON tools";
         default:
             throw std::runtime_error("Unknown chat format");
     }
@@ -986,6 +990,126 @@ static common_chat_params common_chat_params_init_mistral_nemo(const common_chat
     return data;
 }
 
+
+// Case-insensitive find
+static size_t ifind_string(const std::string & haystack, const std::string & needle, size_t pos = 0) {
+    auto it = std::search(
+        haystack.begin() + pos, haystack.end(),
+        needle.begin(), needle.end(),
+        [](char a, char b) { return std::tolower(a) == std::tolower(b); }
+    );
+    return (it == haystack.end()) ? std::string::npos : std::distance(haystack.begin(), it);
+}
+
+static common_chat_params common_chat_params_init_lfm2(const common_chat_template & tmpl, const struct templates_params & inputs) {
+    common_chat_params data;
+    const auto is_json_schema_provided = !inputs.json_schema.is_null();
+    const auto is_grammar_provided = !inputs.grammar.empty();
+    const auto are_tools_provided = inputs.tools.is_array() && !inputs.tools.empty();
+
+    // the logic requires potentially modifying the messages
+    auto tweaked_messages = inputs.messages;
+
+    auto replace_json_schema_marker = [](json & messages) -> bool {
+        static std::string marker1 = "force json schema.\n";
+        static std::string marker2 = "force json schema.";
+
+        if (messages.empty() || messages.at(0).at("role") != "system") {
+            return false;
+        }
+
+        std::string content = messages.at(0).at("content");
+
+        for (const auto & marker : {marker1, marker2}) {
+            const auto pos = ifind_string(content, marker);
+            if (pos != std::string::npos) {
+                content.replace(pos, marker.length(), "");
+                // inject modified content back into the messages
+                messages.at(0).at("content") = content;
+                return true;
+            }
+        }
+
+        return false;
+    };
+
+    // Lfm2 model does not natively work with json, but can generally understand the tools structure
+    //
+    // Example of the pytorch dialog structure:
+    // <|startoftext|><|im_start|>system
+    // List of tools: <|tool_list_start|>[{"name": "get_candidate_status", "description": "Retrieves the current status of a candidate in the recruitment process", "parameters": {"type": "object", "properties": {"candidate_id": {"type": "string", "description": "Unique identifier for the candidate"}}, "required": ["candidate_id"]}}]<|tool_list_end|><|im_end|>
+    // <|im_start|>user
+    // What is the current status of candidate ID 12345?<|im_end|>
+    // <|im_start|>assistant
+    // <|tool_call_start|>[get_candidate_status(candidate_id="12345")]<|tool_call_end|>Checking the current status of candidate ID 12345.<|im_end|>
+    // <|im_start|>tool
+    // <|tool_response_start|>{"candidate_id": "12345", "status": "Interview Scheduled", "position": "Clinical Research Associate", "date": "2023-11-20"}<|tool_response_end|><|im_end|>
+    // <|im_start|>assistant
+    // The candidate with ID 12345 is currently in the "Interview Scheduled" stage for the position of Clinical Research Associate, with an interview date set for 2023-11-20.<|im_end|>
+    //
+    // For the llama server compatibility with json tools semantic,
+    // the client can add "Follow json schema." line into the system message prompt to force the json output.
+    //
+    if (are_tools_provided && (is_json_schema_provided || is_grammar_provided)) {
+        // server/utils.hpp prohibits that branch for the custom grammar anyways
+        throw std::runtime_error("Tools call must not use \"json_schema\" or \"grammar\", use non-tool invocation if you want to use custom grammar");
+    } else if (are_tools_provided && replace_json_schema_marker(tweaked_messages)) {
+        LOG_INF("%s: Using tools to build a grammar\n", __func__);
+
+        data.grammar = build_grammar([&](const common_grammar_builder & builder) {
+            auto schemas = json::array();
+            foreach_function(inputs.tools, [&](const json & tool) {
+                const auto & function = tool.at("function");
+                schemas.push_back({
+                    {"type", "object"},
+                    {"properties", {
+                        {"name", {
+                            {"type", "string"},
+                            {"const", function.at("name")},
+                        }},
+                        {"arguments", function.at("parameters")},
+                    }},
+                    {"required", json::array({"name", "arguments", "id"})},
+                });
+            });
+            auto schema = json {
+                {"type", "array"},
+                {"items", schemas.size() == 1 ? schemas[0] : json {{"anyOf", schemas}}},
+                {"minItems", 1},
+            };
+            if (!inputs.parallel_tool_calls) {
+                schema["maxItems"] = 1;
+            }
+
+            builder.add_rule("root", "\"<|tool_call_start|>\"" + builder.add_schema("tool_calls", schema) + "\"<|tool_call_end|>\"");
+        });
+        // model has no concept of tool selection mode choice,
+        // if the system prompt rendered correctly it will produce a tool call
+        // the grammar goes inside the tool call body
+        data.grammar_lazy = true;
+        data.grammar_triggers = {{COMMON_GRAMMAR_TRIGGER_TYPE_PATTERN_FULL, "\\s*<\\|tool_call_start\\|>\\s*\\["}};
+        data.preserved_tokens = {"<|tool_call_start|>", "<|tool_call_end|>"};
+        data.format = COMMON_CHAT_FORMAT_LFM2_WITH_JSON_TOOLS;
+    } else if (are_tools_provided && (!is_json_schema_provided && !is_grammar_provided)) {
+        LOG_INF("%s: Using tools without json schema or grammar\n", __func__);
+        // output those tokens
+        data.preserved_tokens = {"<|tool_call_start|>", "<|tool_call_end|>"};
+    } else if (is_json_schema_provided) {
+        LOG_INF("%s: Using provided json schema to build a grammar\n", __func__);
+        data.grammar = json_schema_to_grammar(inputs.json_schema);
+    } else if (is_grammar_provided) {
+        LOG_INF("%s: Using provided grammar\n", __func__);
+        data.grammar = inputs.grammar;
+    } else {
+        LOG_INF("%s: Using content relying on the template\n", __func__);
+    }
+
+    data.prompt = apply(tmpl, inputs, /* messages_override= */ tweaked_messages);
+    LOG_DBG("%s: Prompt: %s\n", __func__, data.prompt.c_str());
+
+    return data;
+}
+
 static common_chat_params common_chat_params_init_magistral(const common_chat_template & tmpl, const struct templates_params & inputs) {
     common_chat_params data;
     data.prompt = apply(tmpl, inputs);
@@ -2499,6 +2623,71 @@ static void common_chat_parse_apertus(common_chat_msg_parser & builder) {
     builder.add_content(builder.consume_rest());
 }
 
+
+static void common_chat_parse_lfm2(common_chat_msg_parser & builder) {
+    if (!builder.syntax().parse_tool_calls) {
+        builder.add_content(builder.consume_rest());
+        return;
+    }
+
+    // LFM2 format: <|tool_call_start|>[{"name": "get_current_time", "arguments": {"location": "Paris"}}]<|tool_call_end|>
+    static const common_regex tool_call_start_regex(regex_escape("<|tool_call_start|>"));
+    static const common_regex tool_call_end_regex(regex_escape("<|tool_call_end|>"));
+
+    // Loop through all tool calls
+    while (auto res = builder.try_find_regex(tool_call_start_regex, std::string::npos, /* add_prelude_to_content= */ true)) {
+        builder.move_to(res->groups[0].end);
+
+        // Parse JSON array format: [{"name": "...", "arguments": {...}}]
+        auto tool_calls_data = builder.consume_json();
+
+        // Consume end marker
+        builder.consume_spaces();
+        if (!builder.try_consume_regex(tool_call_end_regex)) {
+            throw common_chat_msg_partial_exception("Expected <|tool_call_end|>");
+        }
+
+        // Process each tool call in the array
+        if (tool_calls_data.json.is_array()) {
+            for (const auto & tool_call : tool_calls_data.json) {
+                if (!tool_call.is_object()) {
+                    throw common_chat_msg_partial_exception("Tool call must be an object");
+                }
+
+                if (!tool_call.contains("name")) {
+                    throw common_chat_msg_partial_exception("Tool call missing 'name' field");
+                }
+
+                std::string function_name = tool_call.at("name");
+                std::string arguments = "{}";
+
+                if (tool_call.contains("arguments")) {
+                    if (tool_call.at("arguments").is_object()) {
+                        arguments = tool_call.at("arguments").dump();
+                    } else if (tool_call.at("arguments").is_string()) {
+                        arguments = tool_call.at("arguments");
+                    }
+                }
+
+                if (!builder.add_tool_call(function_name, "", arguments)) {
+                    throw common_chat_msg_partial_exception("Incomplete tool call");
+                }
+            }
+        } else {
+            throw common_chat_msg_partial_exception("Expected JSON array for tool calls");
+        }
+
+        // Consume any trailing whitespace after this tool call
+        builder.consume_spaces();
+    }
+
+    // Consume any remaining content after all tool calls
+    auto remaining = builder.consume_rest();
+    if (!string_strip(remaining).empty()) {
+        builder.add_content(remaining);
+    }
+}
+
 static void common_chat_parse_seed_oss(common_chat_msg_parser & builder) {
     // Parse thinking tags first - this handles the main reasoning content
     builder.try_parse_reasoning("<seed:think>", "</seed:think>");
@@ -2748,6 +2937,12 @@ static common_chat_params common_chat_templates_apply_jinja(
         return common_chat_params_init_apertus(tmpl, params);
     }
 
+    // LFM2 (w/ tools)
+    if (src.find("List of tools: <|tool_list_start|>[") != std::string::npos &&
+        src.find("]<|tool_list_end|>") != std::string::npos) {
+        return common_chat_params_init_lfm2(tmpl, params);
+    }
+
     // Use generic handler when mixing tools + JSON schema.
     // TODO: support that mix in handlers below.
     if ((params.tools.is_array() && params.json_schema.is_object())) {
@@ -2926,6 +3121,9 @@ static void common_chat_parse(common_chat_msg_parser & builder) {
         case COMMON_CHAT_FORMAT_APERTUS:
             common_chat_parse_apertus(builder);
             break;
+        case COMMON_CHAT_FORMAT_LFM2_WITH_JSON_TOOLS:
+            common_chat_parse_lfm2(builder);
+            break;
         default:
             throw std::runtime_error(std::string("Unsupported format: ") + common_chat_format_name(builder.syntax().format));
     }
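To make the new parser concrete, this is the shape of assistant output that common_chat_parse_lfm2 consumes and what it extracts (the input string follows the format comment in the diff; the annotations are illustrative, not test-suite code):

```cpp
#include <string>

// A representative LFM2 assistant message: a JSON array of tool calls
// between <|tool_call_start|> and <|tool_call_end|>, followed by content.
const std::string lfm2_output =
    "<|tool_call_start|>[{\"name\": \"get_current_time\", "
    "\"arguments\": {\"location\": \"Paris\"}}]<|tool_call_end|>"
    "Checking the current time in Paris.";

// The parser yields one tool call with
//   name      = "get_current_time"
//   arguments = "{\"location\":\"Paris\"}"  (the arguments object dumped to a string)
// and adds "Checking the current time in Paris." as regular content.
```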

common/chat.h

Lines changed: 1 addition & 0 deletions
@@ -116,6 +116,7 @@ enum common_chat_format {
     COMMON_CHAT_FORMAT_SEED_OSS,
     COMMON_CHAT_FORMAT_NEMOTRON_V2,
     COMMON_CHAT_FORMAT_APERTUS,
+    COMMON_CHAT_FORMAT_LFM2_WITH_JSON_TOOLS,
 
     COMMON_CHAT_FORMAT_COUNT, // Not a format, just the # formats
 };

common/json-schema-to-grammar.cpp

Lines changed: 19 additions & 3 deletions
@@ -601,7 +601,10 @@ class SchemaConverter {
     }
 
     std::string _resolve_ref(const std::string & ref) {
-        std::string ref_name = ref.substr(ref.find_last_of('/') + 1);
+        auto it = ref.find('#');
+        std::string ref_fragment = it != std::string::npos ? ref.substr(it + 1) : ref;
+        static const std::regex nonalphanumeric_regex(R"([^a-zA-Z0-9-]+)");
+        std::string ref_name = "ref" + std::regex_replace(ref_fragment, nonalphanumeric_regex, "-");
         if (_rules.find(ref_name) == _rules.end() && _refs_being_resolved.find(ref) == _refs_being_resolved.end()) {
             _refs_being_resolved.insert(ref);
             json resolved = _refs[ref];
@@ -774,11 +777,24 @@ class SchemaConverter {
             std::vector<std::string> tokens = string_split(pointer, "/");
             for (size_t i = 1; i < tokens.size(); ++i) {
                 std::string sel = tokens[i];
-                if (target.is_null() || !target.contains(sel)) {
+                if (target.is_object() && target.contains(sel)) {
+                    target = target[sel];
+                } else if (target.is_array()) {
+                    size_t sel_index;
+                    try {
+                        sel_index = std::stoul(sel);
+                    } catch (const std::invalid_argument & e) {
+                        sel_index = target.size();
+                    }
+                    if (sel_index >= target.size()) {
+                        _errors.push_back("Error resolving ref " + ref + ": " + sel + " not in " + target.dump());
+                        return;
+                    }
+                    target = target[sel_index];
+                } else {
                     _errors.push_back("Error resolving ref " + ref + ": " + sel + " not in " + target.dump());
                     return;
                 }
-                target = target[sel];
             }
             _refs[ref] = target;
         }
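As an illustration of the array references this change enables, consider a schema whose $ref path ends in an array index. The resolver now tries std::stoul on the path segment when the target is an array, and _resolve_ref mangles the fragment "/anyOf/0" into the rule name "ref-anyOf-0". The schema below is a hypothetical example, not one from the test suite:

```cpp
// "#/anyOf/0" is a JSON pointer whose last segment indexes into an array;
// previously only object keys could be resolved, so this ref would fail.
static const char * schema_with_array_ref = R"json({
    "anyOf": [
        { "type": "string" },
        { "$ref": "#/anyOf/0" }
    ]
})json";
```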

convert_hf_to_gguf.py

Lines changed: 24 additions & 2 deletions
@@ -2460,18 +2460,21 @@ def set_gguf_parameters(self):
     )
 class LlavaVisionModel(MmprojModel):
     img_break_tok_id = -1
+    use_break_tok = True
 
     def __init__(self, *args, **kwargs):
         super().__init__(*args, **kwargs)
         if self.hparams.get("model_type") == "pixtral":
             # layer_norm_eps is not in config.json, it is hard-coded in modeling_pixtral.py
             self.hparams["layer_norm_eps"] = self.hparams.get("layer_norm_eps", 1e-5)
-            self.img_break_tok_id = self.get_token_id("[IMG_BREAK]")
+            if self.use_break_tok:
+                self.img_break_tok_id = self.get_token_id("[IMG_BREAK]")
         elif self.is_mistral_format:
             # hparams is already vision config here so norm_eps is only defined in global_config.
             self.hparams["norm_eps"] = self.global_config.get("norm_eps", None)
             assert self.hparams["norm_eps"] is not None, "norm_eps not found in params.json"
-            self.img_break_tok_id = self.find_vparam(["image_break_token_id"])
+            if self.use_break_tok:
+                self.img_break_tok_id = self.find_vparam(["image_break_token_id"])
         else:
             raise ValueError(f"Unsupported model type: {self.hparams['model_type']}")
         logger.info(f"Image break token id: {self.img_break_tok_id}")
@@ -3998,6 +4001,10 @@ def _get_cls_out_tensor(self, data_torch: Tensor) -> Tensor:
         return torch.stack([true_row, false_row], dim=0)
 
     def modify_tensors(self, data_torch: Tensor, name: str, bid: int | None) -> Iterable[tuple[str, Tensor]]:
+        if "model.vision_" in name:
+            # skip multimodal tensors
+            return []
+
         if self.is_rerank:
             is_tied_head = self.is_tied_embeddings and "embed_tokens" in name
             is_real_head = not self.is_tied_embeddings and "lm_head" in name
@@ -9666,6 +9673,21 @@ def map_tensor_name(self, name: str, try_suffixes: Sequence[str] = (".weight", "
         return super().map_tensor_name(name, try_suffixes)
 
 
+@ModelBase.register("LightOnOCRForConditionalGeneration")
+class LightOnOCRVisionModel(LlavaVisionModel):
+    is_mistral_format = False
+    use_break_tok = False
+
+    def set_gguf_parameters(self):
+        super().set_gguf_parameters()
+        self.gguf_writer.add_clip_projector_type(gguf.VisionProjectorType.LIGHTONOCR)
+
+    def modify_tensors(self, data_torch: Tensor, name: str, bid: int | None):
+        name = name.replace("model.vision_encoder.", "vision_tower.")
+        name = name.replace("model.vision_projection.", "multi_modal_projector.")
+        return super().modify_tensors(data_torch, name, bid)
+
+
 @ModelBase.register("KimiVLForConditionalGeneration")
 class KimiVLModel(MmprojModel):
     def __init__(self, *args, **kwargs):
docs/build.md

Lines changed: 6 additions & 4 deletions
@@ -261,10 +261,12 @@ You can download it from your Linux distro's package manager or from here: [ROCm
 - Using `CMake` for Linux (assuming a gfx1030-compatible AMD GPU):
   ```bash
   HIPCXX="$(hipconfig -l)/clang" HIP_PATH="$(hipconfig -R)" \
-      cmake -S . -B build -DGGML_HIP=ON -DAMDGPU_TARGETS=gfx1030 -DCMAKE_BUILD_TYPE=Release \
+      cmake -S . -B build -DGGML_HIP=ON -DGPU_TARGETS=gfx1030 -DCMAKE_BUILD_TYPE=Release \
       && cmake --build build --config Release -- -j 16
   ```
 
+  Note: `GPU_TARGETS` is optional, omitting it will build the code for all GPUs in the current system.
+
   To enhance flash attention performance on RDNA3+ or CDNA architectures, you can utilize the rocWMMA library by enabling the `-DGGML_HIP_ROCWMMA_FATTN=ON` option. This requires rocWMMA headers to be installed on the build system.
 
   The rocWMMA library is included by default when installing the ROCm SDK using the `rocm` meta package provided by AMD. Alternatively, if you are not using the meta package, you can install the library using the `rocwmma-dev` or `rocwmma-devel` package, depending on your system's package manager.
@@ -282,17 +284,17 @@ You can download it from your Linux distro's package manager or from here: [ROCm
   ```bash
   HIPCXX="$(hipconfig -l)/clang" HIP_PATH="$(hipconfig -p)" \
   HIP_DEVICE_LIB_PATH=<directory-you-just-found> \
-      cmake -S . -B build -DGGML_HIP=ON -DAMDGPU_TARGETS=gfx1030 -DCMAKE_BUILD_TYPE=Release \
+      cmake -S . -B build -DGGML_HIP=ON -DGPU_TARGETS=gfx1030 -DCMAKE_BUILD_TYPE=Release \
       && cmake --build build -- -j 16
   ```
 
 - Using `CMake` for Windows (using x64 Native Tools Command Prompt for VS, and assuming a gfx1100-compatible AMD GPU):
   ```bash
   set PATH=%HIP_PATH%\bin;%PATH%
-  cmake -S . -B build -G Ninja -DAMDGPU_TARGETS=gfx1100 -DGGML_HIP=ON -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DCMAKE_BUILD_TYPE=Release
+  cmake -S . -B build -G Ninja -DGPU_TARGETS=gfx1100 -DGGML_HIP=ON -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DCMAKE_BUILD_TYPE=Release
   cmake --build build
   ```
-  Make sure that `AMDGPU_TARGETS` is set to the GPU arch you want to compile for. The above example uses `gfx1100` that corresponds to Radeon RX 7900XTX/XT/GRE. You can find a list of targets [here](https://llvm.org/docs/AMDGPUUsage.html#processors)
+  If necessary, adapt `GPU_TARGETS` to the GPU arch you want to compile for. The above example uses `gfx1100` that corresponds to Radeon RX 7900XTX/XT/GRE. You can find a list of targets [here](https://llvm.org/docs/AMDGPUUsage.html#processors)
   Find your gpu version string by matching the most significant version information from `rocminfo | grep gfx | head -1 | awk '{print $2}'` with the list of processors, e.g. `gfx1035` maps to `gfx1030`.
 