Conversation

OuadiElfarouki (Owner)

Updated and fixed the SYCL README, with an emphasis on the multi-target capability (Intel GPU and Nvidia GPU so far) of the SYCL backend in llama.cpp.
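
For context, the multi-target selection that the README documents comes down to a CMake option at configure time. A rough sketch of the two build invocations (option names follow the `LLAMA_SYCL` / `LLAMA_SYCL_TARGET` conventions of that period and are recalled rather than quoted from this PR's diff):

```
# Intel GPU (default SYCL target)
cmake -B build -DLLAMA_SYCL=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx

# Nvidia GPU target (assumes the oneAPI plugin for Nvidia GPUs is installed)
cmake -B build -DLLAMA_SYCL=ON -DLLAMA_SYCL_TARGET=NVIDIA -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx

cmake --build build
```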

@Alcpz Alcpz (Collaborator) left a comment:

Looks great. Suggestions are mostly about spaces before colons and other nits.

@AidanBeltonS AidanBeltonS (Collaborator) left a comment:

Mostly nits.

OuadiElfarouki and others added 15 commits March 18, 2024 14:35
Co-authored-by: Alberto Cabrera Pérez <[email protected]>
Co-authored-by: Alberto Cabrera Pérez <[email protected]>
Co-authored-by: Alberto Cabrera Pérez <[email protected]>
Co-authored-by: Alberto Cabrera Pérez <[email protected]>
Co-authored-by: Alberto Cabrera Pérez <[email protected]>
Co-authored-by: Alberto Cabrera Pérez <[email protected]>
Co-authored-by: Alberto Cabrera Pérez <[email protected]>
Co-authored-by: Alberto Cabrera Pérez <[email protected]>
Co-authored-by: AidanBeltonS <[email protected]>
Co-authored-by: AidanBeltonS <[email protected]>
Co-authored-by: AidanBeltonS <[email protected]>
Co-authored-by: AidanBeltonS <[email protected]>
Co-authored-by: Alberto Cabrera Pérez <[email protected]>
Co-authored-by: Alberto Cabrera Pérez <[email protected]>
OuadiElfarouki pushed a commit that referenced this pull request Aug 6, 2024
* [example] batched-bench "segmentation fault"

When `llama-batched-bench` is invoked _without_ setting `-npl` ("number
of parallel prompts"), it segfaults.

The segfault is caused by invoking `max_element()` on a zero-length
vector, `n_pl`.

This commit addresses that by first checking whether the number of
parallel prompts is zero; if so, the maximum sequence size is set to 1,
otherwise it is set to the original value, the result of `max_element()`.

This fixes the following crash, observed when running `lldb build/bin/llama-batched-bench -- -m models/Meta-Llama-3-8B.gguf`:

```
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
    frame #0: 0x000000010000366c llama-batched-bench`main(argc=3, argv=0x000000016fdff268) at batched-bench.cpp:72:28
   69  	    llama_context_params ctx_params = llama_context_params_from_gpt_params(params);
   70
   71  	    // ensure enough sequences are available
-> 72  	    ctx_params.n_seq_max = *std::max_element(n_pl.begin(), n_pl.end());
```
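
As a sketch, the guard described above amounts to something like the following (reusing `ctx_params` and `n_pl` from the trace; the exact form of the change in `examples/batched-bench/batched-bench.cpp` may differ):

```
// ensure enough sequences are available:
// n_pl is empty when -npl is not given, so fall back to a single sequence
// instead of dereferencing max_element() of an empty range (which is end()).
ctx_params.n_seq_max = n_pl.empty() ? 1 : *std::max_element(n_pl.begin(), n_pl.end());
```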

* Update examples/batched-bench/batched-bench.cpp

Co-authored-by: compilade <[email protected]>

---------

Co-authored-by: Georgi Gerganov <[email protected]>
Co-authored-by: compilade <[email protected]>