Workflow runs · ggml-org/llama.cpp

Workflows
- Build Actions Cache
- Build on Linux using cross-compiler
- Build on RISCV Linux Machine by Cloud-V
- Build relocatable cmake package
- Check Pre-Tokenizer Hashes
- Check vendor
- CI
- CI (AMD)
- Close inactive issues
- Copilot code review

Showing runs from all workflows

34,344 workflow run results

Add --no-op-offload to improve -ot pp perf in MoE models like llama4 400B Server #13662: Pull request #13386 opened by hjc4869

1h 18m 7s hjc4869:no_op_offload

Add --no-op-offload to improve -ot pp perf in MoE models like llama4 400B CI #22302: Pull request #13386 opened by hjc4869

2h 47m 28s hjc4869:no_op_offload

Add --no-op-offload to improve -ot pp perf in MoE models like llama4 400B EditorConfig Checker #25002: Pull request #13386 opened by hjc4869

16m 12s hjc4869:no_op_offload

Add --no-op-offload to improve -ot pp perf in MoE models like llama4 400B Pull Request Labeler #10952: Pull request #13386 opened by hjc4869

46m 0s

arg : add model catalog CI #22301: Pull request #13385 opened by ngxson

1h 58m 1s ngxson:xsn/arg_add_catalog

arg : add model catalog EditorConfig Checker #25001: Pull request #13385 opened by ngxson

46m 44s ngxson:xsn/arg_add_catalog

arg : add model catalog Server #13661: Pull request #13385 opened by ngxson

58m 33s ngxson:xsn/arg_add_catalog

arg : add model catalog Pull Request Labeler #10951: Pull request #13385 opened by ngxson

47m 45s

kv-cache : add SWA support EditorConfig Checker #25000: Pull request #13194 synchronize by ggerganov

41m 58s gg/swa

kv-cache : add SWA support CI #22300: Pull request #13194 synchronize by ggerganov

2h 33m 41s gg/swa

kv-cache : add SWA support Server #13660: Pull request #13194 synchronize by ggerganov

51m 23s gg/swa

kv-cache : add SWA support Pull Request Labeler #10950: Pull request #13194 synchronize by ggerganov

36m 17s

kv-cache : prepare for SWA (wip) Python check requirements.txt #3055: Commit 1c69466 pushed by ggerganov

10m 58s gg/swa

kv-cache : prepare for SWA (wip) Python Type-Check #2575: Commit 1c69466 pushed by ggerganov

52m 51s gg/swa

ggml-cpu: Integrate fp32=bf16xbf16 SME KleidiAI kernel Server #13659: Pull request #13053 synchronize by eddnjjn

44m 13s eddnjjn:feature/fp32_sme_kernel

ggml-cpu: Integrate fp32=bf16xbf16 SME KleidiAI kernel CI #22299: Pull request #13053 synchronize by eddnjjn

2h 18m 40s eddnjjn:feature/fp32_sme_kernel

ggml-cpu: Integrate fp32=bf16xbf16 SME KleidiAI kernel EditorConfig Checker #24999: Pull request #13053 synchronize by eddnjjn

7m 10s eddnjjn:feature/fp32_sme_kernel

ggml-cpu: Integrate fp32=bf16xbf16 SME KleidiAI kernel Pull Request Labeler #10949: Pull request #13053 synchronize by eddnjjn

12m 39s

CUDA: fix crash on large batch size for MoE models CI #22298: Pull request #13384 opened by JohannesGaessler

1h 51m 50s JohannesGaessler:cuda-fix-moe-max-ub

CUDA: fix crash on large batch size for MoE models EditorConfig Checker #24998: Pull request #13384 opened by JohannesGaessler

15m 18s JohannesGaessler:cuda-fix-moe-max-ub

CUDA: fix crash on large batch size for MoE models Server #13658: Pull request #13384 opened by JohannesGaessler

39m 57s JohannesGaessler:cuda-fix-moe-max-ub

CUDA: fix crash on large batch size for MoE models Pull Request Labeler #10948: Pull request #13384 opened by JohannesGaessler

15m 29s

vulkan: scalar flash attention implementation CI #22297: Pull request #13324 synchronize by jeffbolznv

1h 27m 40s jeffbolznv:scalar_fa_3

vulkan: scalar flash attention implementation Pull Request Labeler #10947: Pull request #13324 synchronize by jeffbolznv

18m 28s

vulkan: scalar flash attention implementation EditorConfig Checker #24997: Pull request #13324 synchronize by jeffbolznv

19m 8s jeffbolznv:scalar_fa_3
