Releases · ServeurpersoCom/llama.cpp

01 Oct 18:34

4201dea

b6660

common: introduce http.h for httplib-based client (#16373)

* common: introduce http.h for httplib-based client

This change moves cpp-httplib based URL parsing and client setup into
a new header `common/http.h`, and integrates it in `arg.cpp` and `run.cpp`.

It is an iteration towards removing libcurl, while intentionally
minimizing changes to existing code to guarantee the same behavior when
`LLAMA_CURL` is used.

Signed-off-by: Adrien Gallouët <[email protected]>

* tools : add missing WIN32_LEAN_AND_MEAN

Signed-off-by: Adrien Gallouët <[email protected]>

---------

Signed-off-by: Adrien Gallouët <[email protected]>
Signed-off-by: Adrien Gallouët <[email protected]>
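
As an illustration of the kind of URL handling the commit above describes, here is a minimal sketch of splitting a URL into scheme, host, and path before handing it to an HTTP client. The names `url_parts` and `split_url` are hypothetical and are not taken from `common/http.h`; the sketch uses only the standard library and does not handle ports, credentials, or query strings.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical container for the pieces of a URL (not the type used in common/http.h).
struct url_parts {
    std::string scheme;
    std::string host;
    std::string path;
};

// Hypothetical helper: split "scheme://host/path" into its three parts.
// Throws std::runtime_error when the scheme or host is missing.
static url_parts split_url(const std::string & url) {
    const std::string sep = "://";
    const size_t scheme_end = url.find(sep);
    if (scheme_end == std::string::npos || scheme_end == 0) {
        throw std::runtime_error("missing scheme: " + url);
    }

    url_parts parts;
    parts.scheme = url.substr(0, scheme_end);

    const size_t host_start = scheme_end + sep.size();
    const size_t path_start = url.find('/', host_start);
    if (path_start == std::string::npos) {
        // No path component: default to the root path.
        parts.host = url.substr(host_start);
        parts.path = "/";
    } else {
        parts.host = url.substr(host_start, path_start - host_start);
        parts.path = url.substr(path_start);
    }

    if (parts.host.empty()) {
        throw std::runtime_error("missing host: " + url);
    }
    return parts;
}
```

With this sketch, `split_url("https://ggml.ai/models/x.gguf")` yields the scheme `https`, the host `ggml.ai`, and the path `/models/x.gguf`.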

01 Oct 03:56

github-actions

b6653

e74c92e

model : support GLM 4.6 (make a few NextN/MTP tensors not required) (…

30 Sep 18:39

github-actions

b6651

bf6f3b3

common : disable progress bar without a tty (#16352)

* common : disable progress bar without a tty

Signed-off-by: Adrien Gallouët <[email protected]>

* Add missing headers

Signed-off-by: Adrien Gallouët <[email protected]>

---------

Signed-off-by: Adrien Gallouët <[email protected]>
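
The idea behind this change can be sketched as follows: check whether the output stream is attached to a terminal, and skip drawing the progress bar when it is not (for example when output is redirected to a file or a CI log). This is a minimal sketch, not the code from the commit; `stream_is_tty` and `print_progress` are hypothetical names. The terminal check uses `isatty` on POSIX systems and `_isatty` on Windows.

```cpp
#include <cstdio>

#if defined(_WIN32)
#include <io.h>      // _isatty, _fileno
#else
#include <unistd.h>  // isatty
#endif

// Hypothetical helper: true when the stream is attached to a terminal.
static bool stream_is_tty(FILE * stream) {
#if defined(_WIN32)
    return _isatty(_fileno(stream)) != 0;
#else
    return isatty(fileno(stream)) != 0;
#endif
}

// Hypothetical helper: draw a one-line progress indicator only when the
// caller reports a terminal. Returns true if something was printed.
static bool print_progress(FILE * out, bool is_tty, int percent) {
    if (!is_tty) {
        return false;  // no tty: stay silent to keep logs clean
    }
    fprintf(out, "\r[%3d%%]", percent);
    fflush(out);
    return true;
}
```

A caller would typically evaluate `stream_is_tty(stderr)` once and pass the result to `print_progress` on every update. Passing the flag explicitly, rather than querying the stream inside the function, keeps the drawing logic easy to test.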

30 Sep 17:49

github-actions

b6648

8d78cd2

ggml webgpu: support for rope,div,sub,glu,scale,cont operators (#16187)

* Work on rope

* Simplify inplace operation generation and combine mul/add generation

* Work on rope variants

* implement neox rope

* rope complete

* Add sub,div,glu operators

* implement scale op

* Update cpy shader to handle cont/more types

* formatting

* Update test vars printing for rope,rms_norm

* Avoid ROPE hardcoded constants

* Add TODO to change ROPE constants to enum

Co-authored-by: Georgi Gerganov <[email protected]>

* fix TODO comment

---------

Co-authored-by: Georgi Gerganov <[email protected]>
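
For readers unfamiliar with the two RoPE layouts mentioned above, the following CPU-side sketch shows how the "normal" and NeoX variants differ in which elements they pair for rotation. It is an illustration under stated assumptions, not the WebGPU shader from this commit: `rope_mode` and `rope_apply` are hypothetical names, the vector length is assumed even, and frequency scaling and other extensions are omitted. The mode is an enum rather than a hardcoded integer, in the spirit of the TODO noted in the commit list.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical enum for the two pairing layouts.
enum class rope_mode { normal, neox };

// Hypothetical helper: apply rotary position embedding to one vector.
// normal: rotates adjacent pairs (x[2i], x[2i+1])
// neox:   rotates split pairs    (x[i],  x[i + n/2])
// Pair i uses the angle theta_i = pos * freq_base^(-2i/n).
static std::vector<float> rope_apply(const std::vector<float> & x, int pos,
                                     rope_mode mode, float freq_base = 10000.0f) {
    const size_t n = x.size();  // assumed even
    std::vector<float> out(x);
    for (size_t i0 = 0; i0 + 1 < n; i0 += 2) {
        const float theta = pos * std::pow(freq_base, -float(i0) / float(n));
        const float c = std::cos(theta);
        const float s = std::sin(theta);

        // Select the two indices that form the rotated pair.
        const size_t a = mode == rope_mode::neox ? i0 / 2         : i0;
        const size_t b = mode == rope_mode::neox ? i0 / 2 + n / 2 : i0 + 1;

        out[a] = x[a] * c - x[b] * s;
        out[b] = x[a] * s + x[b] * c;
    }
    return out;
}
```

At position 0 every angle is zero, so both modes return the input unchanged. At other positions the two modes produce different results for vectors longer than two elements, because they rotate different pairs.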

30 Sep 16:30

github-actions

b6646

364a7a6

common : remove common_has_curl() (#16351)

`test-arg-parser.cpp` has been updated to work consistently,
regardless of whether CURL or SSL support is available, and
now always points to `ggml.ai`.

The previous timeout test has been removed, but it can be
added back by providing a dedicated URL under `ggml.ai`.

Signed-off-by: Adrien Gallouët <[email protected]>

30 Sep 11:23

github-actions

b6644

075c015

ggml : bump version to 0.9.4 (ggml/1363)

30 Sep 08:52

github-actions

b6643

a014310

cuda : Enable CUDA Graph usage for Nemotron Nano v2 (NemotronH) (#16328)

* Fix Nemotron Nano v2 9B not executing as CUDA Graph on NVIDIA GPUs

* fix to ensure test-backend-ops check passes

30 Sep 06:44

github-actions

b6639

de41f2b

codeowners: add codeowners for opencl backend (#16344)

29 Sep 15:16

github-actions

b6636

d72f5f7

ci : add AMD runners and workflows (#16249)

* ci : add AMD runners and workflows

* ci : move AMD jobs to separate workflow

* cont : fix paths

29 Sep 12:04

github-actions

b6628

02463ab

ggml-backend : add root cause in error message if loading backend lib…
