Odd behaviour with socketcall multiplexer handling #476

@alip

Hello kind people,

I am the main author of syd, which thankfully uses libseccomp to provide a portable sandbox. In my testing I have noticed a few oddities on architectures that have both the socketcall(2) system call and the newer non-multiplexed versions of the same system calls. One example is ppc64:

$ syd-sys -a ppc64 socketcall
socketcall      102
$ syd-sys -a ppc64 send
send    334
sendto  335
sendmsg 341
sendmmsg        349
...

Now assume we want to install a portable filter that denies the MSG_OOB flag for the send(2) and recv(2) families. See the section "Denying MSG_OOB Flag in send/recv System Calls" for why this is relevant for a security boundary. For socketcall(2) we have no option but to divert the handling to userspace with the notify action, because the flags sit behind a userspace pointer that a seccomp filter cannot dereference, and that is completely fine. However, suppose you install a filter like this (excuse my Rust, but the idea should be fairly obvious):

            if restrict_oob {
                let oob = libc::MSG_OOB as u64;
                for (idx, sysname) in [
                    "recvmsg", "sendmsg", "send", "sendto", "sendmmsg", "recv", "recvfrom",
                    "recvmmsg",
                ]
                .iter()
                .enumerate()
                {
                    // MsgFlags is arg==2 for {recv,send}msg, and
                    //             arg==3 for send/recv, sendto/recvfrom, and sendmmsg/recvmmsg.
                    let sys = if let Ok(sys) = ScmpSyscall::from_name(sysname) {
                        sys
                    } else {
                        continue;
                    };
                    let idx = if idx <= 1 { 2 } else { 3 };
                    let err = ScmpAction::Errno(libc::EOPNOTSUPP);
                    let cmp = ScmpArgCompare::new(idx, ScmpCompareOp::MaskedEqual(oob), oob);
                    ctx.add_rule_conditional(err, sys, &[cmp])?;
                }
            }
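
For readers unfamiliar with the masked comparison, the rule above is meant to fire when `(arg & mask) == datum`, with both mask and datum set to MSG_OOB. Here is a minimal std-only sketch of that predicate and of the argument-index choice. `flags_arg_index` and `rule_matches` are hypothetical helper names used only for illustration, not part of libseccomp or syd, and the flag values are hardcoded to their usual Linux values so the example has no dependencies.

```rust
/// Linux value of MSG_OOB, hardcoded to avoid depending on the libc crate.
const MSG_OOB: u64 = 0x1;
/// Linux value of MSG_DONTWAIT, used here only as an unrelated flag.
const MSG_DONTWAIT: u64 = 0x40;

/// Hypothetical helper: index of the flags argument for each direct syscall.
fn flags_arg_index(name: &str) -> Option<u32> {
    match name {
        // (fd, msg, flags)
        "sendmsg" | "recvmsg" => Some(2),
        // (fd, buf, len, flags, ...) or (fd, msgvec, vlen, flags, ...)
        "send" | "recv" | "sendto" | "recvfrom" | "sendmmsg" | "recvmmsg" => Some(3),
        _ => None,
    }
}

/// Hypothetical model of a masked-equality compare: (arg & mask) == datum.
fn rule_matches(arg: u64, mask: u64, datum: u64) -> bool {
    (arg & mask) == datum
}

fn main() {
    for name in ["sendmsg", "sendto", "close"] {
        println!("{name}: flags argument index = {:?}", flags_arg_index(name));
    }
    // MSG_OOB combined with another flag still matches the rule.
    println!("{}", rule_matches(MSG_OOB | MSG_DONTWAIT, MSG_OOB, MSG_OOB));
    // Flags without MSG_OOB do not match.
    println!("{}", rule_matches(MSG_DONTWAIT, MSG_OOB, MSG_OOB));
}
```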

One would expect the non-multiplexed versions of the send(2) family to be included in the filter, but with the latest libseccomp they are not, and our MSG_OOB tests fail on such architectures (ppc64, x86, ...) because of this.
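
To make the two entry points concrete, here is a small std-only sketch of how one logical operation maps to both a direct system call and a socketcall(2) sub-call on ppc64. The direct numbers and the socketcall number come from the `syd-sys` output above, and the sub-call numbers are the `SYS_*` constants from `<linux/net.h>`. The `EntryPoint` type and both functions are hypothetical names for illustration only; this is not how libseccomp represents things internally.

```rust
/// Hypothetical description of how a socket operation can enter the kernel.
#[derive(Debug, PartialEq)]
enum EntryPoint {
    /// Direct syscall: the flags are a plain argument the filter can compare.
    Direct { nr: u32 },
    /// socketcall(2): the real arguments sit behind a userspace pointer.
    Multiplexed { nr: u32, subcall: u32 },
}

/// socketcall(2) sub-call numbers for the send family, from <linux/net.h>.
fn socketcall_subcall(name: &str) -> Option<u32> {
    match name {
        "send" => Some(9),      // SYS_SEND
        "sendto" => Some(11),   // SYS_SENDTO
        "sendmsg" => Some(16),  // SYS_SENDMSG
        "sendmmsg" => Some(20), // SYS_SENDMMSG
        _ => None,
    }
}

/// Entry points a portable filter would have to cover on ppc64.
fn entry_points_ppc64(name: &str) -> Vec<EntryPoint> {
    const SOCKETCALL: u32 = 102; // from `syd-sys -a ppc64 socketcall`
    let direct = match name {
        "send" => Some(334),
        "sendto" => Some(335),
        "sendmsg" => Some(341),
        "sendmmsg" => Some(349),
        _ => None,
    };
    let mut out = Vec::new();
    if let Some(nr) = direct {
        out.push(EntryPoint::Direct { nr });
    }
    if let Some(subcall) = socketcall_subcall(name) {
        out.push(EntryPoint::Multiplexed { nr: SOCKETCALL, subcall });
    }
    out
}

fn main() {
    for entry in entry_points_ppc64("sendto") {
        match entry {
            EntryPoint::Direct { nr } => println!("direct syscall {nr}"),
            EntryPoint::Multiplexed { nr, subcall } => {
                println!("socketcall {nr}, sub-call {subcall}")
            }
        }
    }
}
```

The expectation in this report is that a rule added by name covers the `Direct` entry, while the `Multiplexed` entry is handled separately via notify.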

I have also encountered a similar problem where it is not directly possible to add notify actions to the non-multiplexed versions of the socket system calls. That, however, was possible to work around:

    /// Insert a system call handler.
    #[expect(clippy::cognitive_complexity)]
    #[expect(clippy::disallowed_methods)]
    fn insert_handler(
        handlers: &mut HandlerMap,
        syscall_name: &'static str,
        handler: impl Fn(UNotifyEventRequest) -> ScmpNotifResp + Clone + Send + Sync + 'static,
    ) {
        for arch in SCMP_ARCH {
            if let Ok(sys) = ScmpSyscall::from_name_by_arch(syscall_name, *arch) {
                #[expect(clippy::disallowed_methods)]
                handlers
                    .insert(
                        Sydcall(sys, scmp_arch_raw(*arch)),
                        Arc::new(Box::new(handler.clone())),
                    )
                    .unwrap();
            } else {
                info!("ctx": "confine", "op": "hook_syscall",
                    "msg": format!("invalid or unsupported syscall {syscall_name}"));
            }

            // Support the new non-multiplexed ipc syscalls.
            if IPC_ARCH.contains(arch) {
                let sys_ipc = match syscall_name {
                    "shmat" => Some(397),
                    "msgctl" => Some(402),
                    "semctl" => Some(394),
                    "shmctl" => Some(396),
                    "msgget" => Some(399),
                    "semget" => Some(393),
                    "shmget" => Some(395),
                    _ => None,
                };

                if let Some(sys) = sys_ipc {
                    #[expect(clippy::disallowed_methods)]
                    handlers
                        .insert(
                            Sydcall(ScmpSyscall::from(sys), scmp_arch_raw(*arch)),
                            Arc::new(Box::new(handler.clone())),
                        )
                        .unwrap();
                    continue;
                }
            }

            // Support the new non-multiplexed network syscalls on M68k, MIPS, PPC, S390 & X86.
            let sys = match *arch {
                ScmpArch::M68k => match syscall_name {
                    "socket" => 356,
                    "bind" => 358,
                    // no accept on m68k.
                    "accept4" => 361,
                    "connect" => 359,
                    "getpeername" => 365,
                    "getsockname" => 364,
                    "getsockopt" => 362,
                    "recvfrom" => 368,
                    "sendto" => 366,
                    "sendmsg" => 367,
                    "sendmmsg" => 372,
                    _ => continue,
                },
                ScmpArch::Mips | ScmpArch::Mipsel => match syscall_name {
                    "socket" => 183,
                    "bind" => 169,
                    "accept" => 168,
                    "accept4" => 334,
                    "connect" => 170,
                    "getpeername" => 171,
                    "getsockname" => 172,
                    "getsockopt" => 173,
                    "recvfrom" => 176,
                    "sendto" => 180,
                    "sendmsg" => 179,
                    "sendmmsg" => 343,
                    _ => continue,
                },
                ScmpArch::Ppc | ScmpArch::Ppc64 | ScmpArch::Ppc64Le => match syscall_name {
                    "socket" => 326,
                    "bind" => 327,
                    "accept" => 330,
                    "accept4" => 344,
                    "connect" => 328,
                    "getpeername" => 332,
                    "getsockname" => 331,
                    "getsockopt" => 340,
                    "recvfrom" => 337,
                    "sendto" => 335,
                    "sendmsg" => 341,
                    "sendmmsg" => 349,
                    _ => continue,
                },
                ScmpArch::S390X | ScmpArch::S390 => match syscall_name {
                    "socket" => 359,
                    "bind" => 361,
                    // no accept on s390/s390x.
                    "accept4" => 364,
                    "connect" => 362,
                    "getpeername" => 368,
                    "getsockname" => 367,
                    "getsockopt" => 365,
                    "recvfrom" => 371,
                    "sendto" => 369,
                    "sendmsg" => 370,
                    "sendmmsg" => 358,
                    _ => continue,
                },
                ScmpArch::X86 => match syscall_name {
                    "socket" => 359,
                    "bind" => 361,
                    // no accept on x86.
                    "accept4" => 364,
                    "connect" => 362,
                    "getpeername" => 368,
                    "getsockname" => 367,
                    "getsockopt" => 365,
                    "recvfrom" => 371,
                    "sendto" => 369,
                    "sendmsg" => 370,
                    "sendmmsg" => 345,
                    _ => continue,
                },
                _ => continue,
            };

            handlers
                .insert(
                    Sydcall(ScmpSyscall::from(sys), scmp_arch_raw(*arch)),
                    Arc::new(Box::new(handler.clone())),
                )
                .unwrap();

            #[expect(clippy::arithmetic_side_effects)]
            if matches!(*arch, ScmpArch::Mips | ScmpArch::Mipsel) {
                // This is a libseccomp oddity,
                // it could be a bug in the syscall multiplexer.
                // TODO: Investigate and submit a bug report.
                handlers
                    .insert(
                        Sydcall(ScmpSyscall::from(sys + 4000), scmp_arch_raw(*arch)),
                        Arc::new(Box::new(handler.clone())),
                    )
                    .unwrap();
            }
        }
    }

Admittedly, it's a bit annoying to hardcode all these numbers, but it works.
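
For reference, the lookup can also be expressed as data tables, which makes the extra MIPS entry easier to see. The sketch below is std-only and uses a subset of the numbers from the code above. `Arch`, `MIPS_O32_BASE` and `candidate_numbers` are hypothetical names, not syd or libseccomp API. One assumption worth flagging: the `+ 4000` entries line up with the MIPS o32 ABI, whose syscall numbers start at 4000, so the table values look like offsets from that base. Whether libseccomp is expected to hide this is part of the question.

```rust
/// Hypothetical stand-in for the architectures of interest in this sketch.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Arch {
    Mips,
    X86,
}

/// Assumption: MIPS o32 syscall numbers are based at 4000.
const MIPS_O32_BASE: i32 = 4000;

/// Subset of the MIPS table from the handler code above.
const MIPS_NET: &[(&str, i32)] = &[
    ("socket", 183),
    ("bind", 169),
    ("connect", 170),
    ("sendto", 180),
    ("sendmsg", 179),
];

/// Subset of the x86 table from the handler code above.
const X86_NET: &[(&str, i32)] = &[
    ("socket", 359),
    ("bind", 361),
    ("connect", 362),
    ("sendto", 369),
    ("sendmsg", 370),
];

/// Every number a handler would be registered under for `name` on `arch`.
fn candidate_numbers(arch: Arch, name: &str) -> Vec<i32> {
    let table = match arch {
        Arch::Mips => MIPS_NET,
        Arch::X86 => X86_NET,
    };
    let nr = match table.iter().find(|(n, _)| *n == name) {
        Some(&(_, nr)) => nr,
        None => return Vec::new(),
    };
    match arch {
        // Register both the raw table value and the o32-based number.
        Arch::Mips => vec![nr, nr + MIPS_O32_BASE],
        Arch::X86 => vec![nr],
    }
}

fn main() {
    for arch in [Arch::Mips, Arch::X86] {
        println!("{arch:?} socket -> {:?}", candidate_numbers(arch, "socket"));
    }
}
```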

I do not know whether this oddity is a bug, but I would expect the socketcall(2) and ipc(2) multiplexing support in libseccomp to handle these cases transparently. Is this possible? Thank you in advance.
