scx.full: 1.0.12 -> 1.0.13 #416262

Merged
JohnRTitor merged 4 commits into NixOS:master from Gliczy:scx-1.0.13 on Jun 13, 2025
@Gliczy
Member

@Gliczy Gliczy commented Jun 12, 2025

https://github.com/sched-ext/scx/releases/tag/v1.0.13

  • Added llvmPackages.libllvm to cscheds' buildInputs because llvm-strip is now required
  • Enabled doCheck for scx_cscheds
  • Added myself as maintainer

scx_rustscheds' tests fail if doCheck is enabled, so I kept it disabled for now.
Here is the failing test output:

scx_rustscheds> running 7 tests
scx_rustscheds> libbpf: kernel BTF is missing at '/sys/kernel/btf/vmlinux', was CONFIG_DEBUG_INFO_BTF enabled?
scx_rustscheds> libbpf: failed to find valid kernel BTF
scx_rustscheds> test build_id::tests::test_cargo_ver ... ok
scx_rustscheds> test build_id::tests::test_full_ver ... ok
scx_rustscheds> test compat::tests::test_ksym_exists ... FAILED
scx_rustscheds> test compat::tests::test_struct_has_field ... FAILED
scx_rustscheds> test compat::tests::test_read_enum ... FAILED
scx_rustscheds> test bpf_builder::tests::test_vmlinux_h_ver_sha1 ... ok
scx_rustscheds> test bpf_builder::tests::test_bpf_builder_new ... ok
scx_rustscheds> 
scx_rustscheds> failures:
scx_rustscheds> 
scx_rustscheds> ---- compat::tests::test_ksym_exists stdout ----
scx_rustscheds> 
scx_rustscheds> thread 'compat::tests::test_ksym_exists' panicked at rust/scx_utils/src/compat.rs:48:9:
scx_rustscheds> btf__load_vmlinux_btf() returned NULL, was CONFIG_DEBUG_INFO_BTF enabled?
scx_rustscheds> note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
scx_rustscheds> 
scx_rustscheds> ---- compat::tests::test_struct_has_field stdout ----
scx_rustscheds> 
scx_rustscheds> thread 'compat::tests::test_struct_has_field' panicked at /build/scx_rustscheds-1.0.13-vendor/lazy_static-1.5.0/src/inline_lazy.rs:30:16:
scx_rustscheds> Once instance has previously been poisoned
scx_rustscheds> 
scx_rustscheds> ---- compat::tests::test_read_enum stdout ----
scx_rustscheds> 
scx_rustscheds> thread 'compat::tests::test_read_enum' panicked at /build/scx_rustscheds-1.0.13-vendor/lazy_static-1.5.0/src/inline_lazy.rs:30:16:
scx_rustscheds> Once instance has previously been poisoned
scx_rustscheds> 
scx_rustscheds> 
scx_rustscheds> failures:
scx_rustscheds>     compat::tests::test_ksym_exists
scx_rustscheds>     compat::tests::test_read_enum
scx_rustscheds>     compat::tests::test_struct_has_field
scx_rustscheds> 
scx_rustscheds> test result: FAILED. 4 passed; 3 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.08s
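
Reading the log above, the first failure comes from libbpf not finding kernel BTF at /sys/kernel/btf/vmlinux, and the two "Once instance has previously been poisoned" panics look like a follow-on effect of that first panic. A minimal sketch for checking whether that path is readable in a given environment follows; `has_kernel_btf` is a hypothetical helper name, not something from scx or nixpkgs, and the assumption that the build sandbox simply does not expose this file is mine.

```shell
#!/bin/sh
# Hypothetical helper: succeed if kernel BTF is readable at the given path.
# The default path is the one named in the libbpf message in the log above.
has_kernel_btf() {
    btf_path="${1:-/sys/kernel/btf/vmlinux}"
    [ -r "$btf_path" ]
}

# Report the result for the current environment.
if has_kernel_btf; then
    echo "kernel BTF found"
else
    echo "kernel BTF missing"
fi
```

Running this inside and outside the build environment would show whether the missing file explains the difference, without enabling doCheck first.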

Things done

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandboxing enabled in nix.conf? (See Nix manual)
    • sandbox = relaxed
    • sandbox = true
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • Nixpkgs 25.11 Release Notes (or backporting 24.11 and 25.05 Nixpkgs Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
  • NixOS 25.11 Release Notes (or backporting 24.11 and 25.05 NixOS Release notes)
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md, pkgs/README.md, maintainers/README.md and other contributing documentation in corresponding paths.

Add a 👍 reaction to pull requests you find important.

@github-actions github-actions bot added the labels 10.rebuild-darwin: 0 and 10.rebuild-linux: 1-10 on Jun 12, 2025
@nix-owners nix-owners bot requested a review from JohnRTitor June 12, 2025 21:43
@Gliczy
Member Author

Gliczy commented Jun 12, 2025

nixpkgs-review result

Generated using nixpkgs-review.

Command: nixpkgs-review pr 416262
Commit: 79d24a6e5c727bd60eadd213f697d58cc6d48223


x86_64-linux

✅ 7 packages built:
  • scx.cscheds
  • scx.cscheds.bin
  • scx.cscheds.dev
  • scx.full
  • scx.full.bin
  • scx.full.dev
  • scx.rustscheds

Member

@JohnRTitor JohnRTitor left a comment

Could you amend the commits such that:

  1. First one is only a version bump commit, with the required changes for it to build
  2. Second one is enabling/skipping tests
  3. This one is fine

Note that you should use the actual attribute names, such as scx.full, scx.cscheds, and scx.rustscheds, so that Ofborg picks them up for building.
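
As a rough illustration of the note above, the sketch below checks that a commit subject starts with one of the three attribute names followed by a colon. It assumes Ofborg reads the attribute from the commit subject prefix, in the same format as this PR's title; `subject_names_attr` is a hypothetical helper, not an Ofborg or nixpkgs tool.

```shell
#!/bin/sh
# Hypothetical check: succeed if a commit subject starts with one of the
# attribute names listed in the review comment, followed by a colon.
subject_names_attr() {
    case "$1" in
        scx.full:*|scx.cscheds:*|scx.rustscheds:*) return 0 ;;
        *) return 1 ;;
    esac
}

# The PR title itself passes the check.
subject_names_attr 'scx.full: 1.0.12 -> 1.0.13' && echo "ok"   # prints: ok
```

A subject such as `scx_cscheds: enable doCheck` would be rejected by this check, because it uses the derivation name rather than the attribute path.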

@JohnRTitor
Member

nixpkgs-review result

Generated using nixpkgs-review.

Command: nixpkgs-review pr 416262 --package scx.full.passthru.tests --package scx.full
Commit: 49256cbd429702f2c821f8d284bbddfaef14fb31


x86_64-linux

✅ 4 packages built:
  • scx.full
  • scx.full.bin (scx.full.bin.bin, scx.full.bin.dev)
  • scx.full.dev (scx.full.dev.bin, scx.full.dev.dev)
  • scx.full.passthru.tests.basic

Signed-off-by: John Titor <50095635+JohnRTitor@users.noreply.github.com>
@JohnRTitor
Member

I think I am getting some weird behaviour with scx_central, could you check on your end?

1st run:

Details
❯  sudo ./result-bin/bin/scx_central
libbpf: struct_ops central_ops: member priv not found in kernel, skipping it as it's set to zero
[SEQ 0]
total   :        13    local:         0   queued:         0  lost:         0
timer   :        45 dispatch:        29 mismatch:        11 retry:         0
overflow:         0

DEBUG DUMP
================================================================================

chrome[120529] triggered exit kind 1024:
  runtime error (SCX_DSQ_LOCAL[_ON] cannot move migration disabled chrome[120529] from CPU 3 to 0)

Backtrace:
  task_can_run_on_remote_rq+0xfe/0x160
  dispatch_to_local_dsq+0x5f/0x210
  flush_dispatch_buf+0x140/0x190
  balance_one+0x1ed/0x530
  balance_scx+0x37/0x160
  prev_balance+0x43/0xc0
  __schedule+0x952/0x23d0
  schedule+0x27/0xd0
  schedule_timeout+0x87/0x110
  rcu_gp_fqs_loop+0x119/0xfb0
  rcu_gp_kthread+0xdb/0x1a0
  kthread+0xf9/0x240
  ret_from_fork+0x31/0x50
  ret_from_fork_asm+0x1a/0x30

CPU states
----------

CPU 0   : nr_run=1 flags=0x3 cpu_rel=0 ops_qseq=563241 pnt_seq=1030857
          curr=chrome[120529] class=ext_sched_class

 *R chrome[120529] +0ms
      scx_state/flags=3/0x5 dsq_flags=0x0 ops_state/qseq=0/0
      sticky/holding_cpu=-1/-1 dsq_id=(n/a)
      dsq_vtime=1161345731 slice=18446744073709551615 weight=244
      cpus=fff

    scx_dump_state+0x750/0xa50
    scx_ops_error_irq_workfn+0x46/0x50
    irq_work_run_list+0x50/0x90
    irq_work_run+0x18/0x50
    __sysvec_irq_work+0x1c/0xd0
    sysvec_irq_work+0x6c/0x90
    asm_sysvec_irq_work+0x1a/0x20
    finish_task_switch.isra.0+0xa2/0x2f0
    irqentry_exit_to_user_mode+0x1c1/0x220
    asm_sysvec_apic_timer_interrupt+0x1a/0x20

Event counters
--------------
              SCX_EV_SELECT_CPU_FALLBACK:                0
       SCX_EV_DISPATCH_LOCAL_DSQ_OFFLINE:                0
               SCX_EV_DISPATCH_KEEP_LAST:                0
                 SCX_EV_ENQ_SKIP_EXITING:                0
      SCX_EV_ENQ_SKIP_MIGRATION_DISABLED:                0
                    SCX_EV_ENQ_SLICE_DFL:                8
                  SCX_EV_BYPASS_DURATION:         45439946
                  SCX_EV_BYPASS_DISPATCH:                8
                  SCX_EV_BYPASS_ACTIVATE:                1

================================================================================

EXIT: runtime error (SCX_DSQ_LOCAL[_ON] cannot move migration disabled chrome[120529] from CPU 3 to 0)

2nd run:

Details
[SEQ 96]
total   :    473028    local:      6358   queued:         0  lost:         0
timer   :     95914 dispatch:   1014634 mismatch:      1275 retry:         0
overflow:         0
[SEQ 97]
total   :    476328    local:      6411   queued:         0  lost:         0
timer   :     96912 dispatch:   1021117 mismatch:      1287 retry:         0
overflow:         0
[SEQ 98]
total   :    479587    local:      6437   queued:         0  lost:         0
timer   :     97911 dispatch:   1027805 mismatch:      1297 retry:         0
overflow:         0
[SEQ 99]
total   :    482710    local:      6475   queued:         0  lost:         0
timer   :     98909 dispatch:   1033958 mismatch:      1307 retry:         0
overflow:         0

DEBUG DUMP
================================================================================

threaded-ml[2760] triggered exit kind 1024:
  runtime error (SCX_DSQ_LOCAL[_ON] cannot move migration disabled threaded-ml[2760] from CPU 4 to 0)

Backtrace:
  task_can_run_on_remote_rq+0xfe/0x160
  dispatch_to_local_dsq+0x5f/0x210
  flush_dispatch_buf+0x140/0x190
  balance_one+0x1ed/0x530
  balance_scx+0x37/0x160
  prev_balance+0x43/0xc0
  __schedule+0x952/0x23d0
  schedule_idle+0x23/0x40
  cpu_startup_entry+0x29/0x30
  rest_init+0xcc/0xd0
  start_kernel+0x989/0x990
  x86_64_start_reservations+0x24/0x30
  x86_64_start_kernel+0x95/0xa0
  common_startup_64+0x13e/0x141

CPU states
----------

CPU 0   : nr_run=1 flags=0x3 cpu_rel=0 ops_qseq=957449 pnt_seq=1053705
          curr=threaded-ml[2760] class=ext_sched_class

 *R threaded-ml[2760] +0ms
      scx_state/flags=3/0x5 dsq_flags=0x0 ops_state/qseq=0/0
      sticky/holding_cpu=-1/-1 dsq_id=(n/a)
      dsq_vtime=764103569 slice=0 weight=244
      cpus=fff

    scx_dump_state+0x750/0xa50
    scx_ops_error_irq_workfn+0x46/0x50
    irq_work_run_list+0x50/0x90
    irq_work_run+0x18/0x50
    __sysvec_irq_work+0x1c/0xd0
    sysvec_irq_work+0x6c/0x90
    asm_sysvec_irq_work+0x1a/0x20
    finish_task_switch.isra.0+0xa2/0x2f0
    preempt_schedule_thunk+0x16/0x30
    unix_write_space+0x5b/0x90
    sock_wfree+0x6a/0x1d0
    unix_destruct_scm+0x88/0xb0
    skb_release_head_state+0x32/0xa0
    consume_skb+0x30/0xe0
    unix_stream_read_generic+0xd27/0xe40
    unix_stream_recvmsg+0x95/0xa0
    ____sys_recvmsg+0x21d/0x230
    ___sys_recvmsg+0x14a/0x200
    __sys_recvmsg+0x82/0xe0
    do_syscall_64+0x82/0x7c0
    entry_SYSCALL_64_after_hwframe+0x76/0x7e

Event counters
--------------
              SCX_EV_SELECT_CPU_FALLBACK:                0
       SCX_EV_DISPATCH_LOCAL_DSQ_OFFLINE:                0
               SCX_EV_DISPATCH_KEEP_LAST:                0
                 SCX_EV_ENQ_SKIP_EXITING:                3
      SCX_EV_ENQ_SKIP_MIGRATION_DISABLED:               53
                    SCX_EV_ENQ_SLICE_DFL:               56
                  SCX_EV_BYPASS_DURATION:         44966551
                  SCX_EV_BYPASS_DISPATCH:                0
                  SCX_EV_BYPASS_ACTIVATE:                1

================================================================================

EXIT: runtime error (SCX_DSQ_LOCAL[_ON] cannot move migration disabled threaded-ml[2760] from CPU 4 to 0)

It looks like the NixOS tests are still passing, i.e., they are not affected.
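
Both runs above end with an EXIT line of the same shape, so the task name, PID, and source and destination CPUs can be pulled out for comparison. The following is a minimal sketch using sed; `parse_scx_exit` is a hypothetical helper, and the pattern only covers the exact message wording seen in these dumps.

```shell
#!/bin/sh
# Hypothetical helper (not part of scx): reads scheduler output on stdin and
# prints one summary line for each EXIT line of the shape shown above.
parse_scx_exit() {
    sed -n 's/^EXIT: runtime error (SCX_DSQ_LOCAL\[_ON] cannot move migration disabled \(.*\)\[\([0-9][0-9]*\)] from CPU \([0-9][0-9]*\) to \([0-9][0-9]*\))$/task=\1 pid=\2 from_cpu=\3 to_cpu=\4/p'
}

# Example using the EXIT line from the first run.
printf '%s\n' 'EXIT: runtime error (SCX_DSQ_LOCAL[_ON] cannot move migration disabled chrome[120529] from CPU 3 to 0)' | parse_scx_exit
# prints: task=chrome pid=120529 from_cpu=3 to_cpu=0
```

In both runs the destination is CPU 0. That may correspond to the central CPU that scx_central dispatches from, but that reading is an assumption and not something confirmed in this thread.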

@JohnRTitor
Member

CC @JakeHillion an upstream dev.

And I noticed something else too, unrelated to the above issue.

❯  sudo ./result-bin/bin/scx_rusty
18:30:03 [INFO] Running scx_rusty (build ID: 1.0.13 x86_64-unknown-linux-gnu)
18:30:03 [INFO] NODE[00] mask= fff
18:30:03 [INFO]  DOM[00] mask= fff
18:30:03 [INFO] libbpf: struct_ops rusty: member priv not found in kernel, skipping it as it's so

18:30:03 [WARN] libbpf: map 'rusty': BPF skeleton version is old, skipping map auto-attachment...

I am seeing a "BPF skeleton version is old" warning, even though we fetch the same libbpf commit that upstream wants (as you can see in version.json) and build it with upstream's script.
https://github.com/sched-ext/scx/blob/f4f0e1804e40756507003ad395d0856e10789a72/meson.build#L141

@Gliczy
Member Author

Gliczy commented Jun 13, 2025

I'm getting the same output as you with scx_central. I tried it again with 1.0.12 and it crashes with the same output.

Member

@JohnRTitor JohnRTitor left a comment

Let's wait a bit for Jake's reply.

But I am leaning towards merging this. As far as I can see from a simple GitHub search, it doesn't look like people use scx_central on NixOS at all.

https://github.com/search?q=scx_central+path%3A*.nix&type=code

@JohnRTitor
Member

> I tried it again from 1.0.12 and it crashes with the same output.

Yeah, I just tried it as well. Weirdly enough, I remember this working fine when I tested the PR for 1.0.12. Perhaps something changed in the kernel implementation of scx?

@JakeHillion
Contributor

> CC @JakeHillion an upstream dev.
>
> And I noticed something else too, unrelated to the above issue.
>
> ❯  sudo ./result-bin/bin/scx_rusty
> 18:30:03 [INFO] Running scx_rusty (build ID: 1.0.13 x86_64-unknown-linux-gnu)
> 18:30:03 [INFO] NODE[00] mask= fff
> 18:30:03 [INFO]  DOM[00] mask= fff
> 18:30:03 [INFO] libbpf: struct_ops rusty: member priv not found in kernel, skipping it as it's so
>
> 18:30:03 [WARN] libbpf: map 'rusty': BPF skeleton version is old, skipping map auto-attachment...
>
> I am seeing BPF skeleton version is old warning, even though we are fetching the same libbpf commit as upstream wants (as you can see in version.json) and building it with upstream's script. https://github.com/sched-ext/scx/blob/f4f0e1804e40756507003ad395d0856e10789a72/meson.build#L141

Hey, thanks for the tag. I've been seeing this for a while on my dev machine too and haven't gotten to the bottom of it. I don't think it's a big deal other than being annoying, but will check with the other maintainers and get back to you.

Re scx_central: the cscheds generally have weaker maintenance than the Rust scheds, but we can take a look. Could we get your uname -r along with it? It won't be fixed quickly, though, so merging this as it is is fine.

@JohnRTitor
Member

> I remember this working fine when I tested the PR for 1.0.12. Perhaps something changed in the kernel implementation of scx?

Logs from 1.0.12.

Details
total   :    588437    local:     11354   queued:         0  lost:         0
timer   :    182796 dispatch:   1572277 mismatch:      2175 retry:         0
overflow:         0
[SEQ 184]
total   :    594908    local:     11407   queued:         0  lost:         0
timer   :    183795 dispatch:   1589391 mismatch:      2190 retry:         0
overflow:         0
[SEQ 185]
total   :    598416    local:     11449   queued:         0  lost:         0
timer   :    184794 dispatch:   1598695 mismatch:      2202 retry:         0
overflow:         0
[SEQ 186]
total   :    607791    local:     11585   queued:         0  lost:         0
timer   :    185792 dispatch:   1621502 mismatch:      2211 retry:         0
overflow:         0

DEBUG DUMP
================================================================================

chrome[2870] triggered exit kind 1024:
  runtime error (SCX_DSQ_LOCAL[_ON] cannot move migration disabled chrome[2870] from CPU 4 to 0)

Backtrace:
  task_can_run_on_remote_rq+0xfe/0x160
  dispatch_to_local_dsq+0x5f/0x210
  flush_dispatch_buf+0x140/0x190
  balance_one+0x1ed/0x530
  balance_scx+0x37/0x160
  prev_balance+0x43/0xc0
  __schedule+0x952/0x23d0
  schedule_idle+0x23/0x40
  cpu_startup_entry+0x29/0x30
  rest_init+0xcc/0xd0
  start_kernel+0x989/0x990
  x86_64_start_reservations+0x24/0x30
  x86_64_start_kernel+0x95/0xa0
  common_startup_64+0x13e/0x141

CPU states
----------

CPU 0   : nr_run=2 flags=0x3 cpu_rel=0 ops_qseq=1859240 pnt_seq=1317678
          curr=chrome[2870] class=ext_sched_class
  idle_to_kick   : 002

 *R chrome[2870] +0ms
      scx_state/flags=3/0x5 dsq_flags=0x0 ops_state/qseq=0/0
      sticky/holding_cpu=-1/-1 dsq_id=(n/a)
      dsq_vtime=17379802818 slice=0 weight=244
      cpus=fff

    scx_dump_state+0x750/0xa50
    scx_ops_error_irq_workfn+0x46/0x50
    irq_work_run_list+0x50/0x90
    irq_work_run+0x18/0x50
    __sysvec_irq_work+0x1c/0xd0
    sysvec_irq_work+0x6c/0x90
    asm_sysvec_irq_work+0x1a/0x20
    finish_task_switch.isra.0+0xa2/0x2f0
    asm_sysvec_apic_timer_interrupt+0x1a/0x20
    amdgpu_ttm_clear_buffer+0x7ae/0x7f0 [amdgpu]
    amdgpu_bo_create+0x451/0x520 [amdgpu]
    amdgpu_bo_create_user+0x3d/0x70 [amdgpu]
    amdgpu_gem_create_ioctl+0x17c/0x3d0 [amdgpu]
    drm_ioctl_kernel+0xb3/0x110
    drm_ioctl+0x294/0x500
    amdgpu_drm_ioctl+0x4b/0x90 [amdgpu]
    __x64_sys_ioctl+0xa0/0xe0
    do_syscall_64+0x82/0x7c0
    entry_SYSCALL_64_after_hwframe+0x76/0x7e

  R kworker/u48:2[135205] +0ms
      scx_state/flags=3/0x9 dsq_flags=0x0 ops_state/qseq=2/1859239
      sticky/holding_cpu=-1/-1 dsq_id=(n/a)
      dsq_vtime=12319911429 slice=18446744073709551615 weight=100
      cpus=fff

    kthread+0xf9/0x240
    ret_from_fork+0x31/0x50
    ret_from_fork_asm+0x1a/0x30

Event counters
--------------
              SCX_EV_SELECT_CPU_FALLBACK:                0
       SCX_EV_DISPATCH_LOCAL_DSQ_OFFLINE:                0
               SCX_EV_DISPATCH_KEEP_LAST:                0
                 SCX_EV_ENQ_SKIP_EXITING:             1676
      SCX_EV_ENQ_SKIP_MIGRATION_DISABLED:               45
                    SCX_EV_ENQ_SLICE_DFL:             1723
                  SCX_EV_BYPASS_DURATION:         53769036
                  SCX_EV_BYPASS_DISPATCH:                2
                  SCX_EV_BYPASS_ACTIVATE:                1

================================================================================

EXIT: runtime error (SCX_DSQ_LOCAL[_ON] cannot move migration disabled chrome[2870] from CPU 4 to 0)

uname output.

❯  uname -a
Linux Ainz-NIX 6.15.1-cachyos #1-NixOS SMP PREEMPT_DYNAMIC Wed Jun  4 12:46:27 UTC 2025 x86_64 GNU/Linux
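
The upstream request was for `uname -r`, while the output above is from `uname -a`. The kernel release is the third whitespace-separated field of that line, so it can be recovered as in the sketch below; `release_from_uname_a` is a hypothetical helper and assumes the hostname contains no spaces.

```shell
#!/bin/sh
# Hypothetical helper: print the kernel release (third field) from a
# `uname -a` line read on stdin. Assumes the hostname has no spaces.
release_from_uname_a() {
    awk '{ print $3 }'
}

# Example using the uname -a line posted above.
printf '%s\n' 'Linux Ainz-NIX 6.15.1-cachyos #1-NixOS SMP PREEMPT_DYNAMIC Wed Jun  4 12:46:27 UTC 2025 x86_64 GNU/Linux' | release_from_uname_a
# prints: 6.15.1-cachyos
```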

@JohnRTitor JohnRTitor merged commit 3c56f04 into NixOS:master Jun 13, 2025
8 of 12 checks passed
@Gliczy Gliczy deleted the scx-1.0.13 branch June 13, 2025 13:25
@github-actions github-actions bot added the label 12.approvals: 1 on Jun 13, 2025
@JohnRTitor
Member

JohnRTitor commented Jun 13, 2025 via email

@nixpkgs-ci
Contributor

nixpkgs-ci bot commented Jun 13, 2025

Successfully created backport PR for release-25.05:

@github-actions github-actions bot added the label 8.has: port to stable on Jun 13, 2025
Labels

  • 8.has: port to stable (This PR already has a backport to the stable release.)
  • 10.rebuild-darwin: 0 (This PR does not cause any packages to rebuild on Darwin.)
  • 10.rebuild-linux: 1-10 (This PR causes between 1 and 10 packages to rebuild on Linux.)
  • 12.approvals: 1 (This PR was reviewed and approved by one person.)
