[libclang] Python binding detects parsed function's arguments incorrectly #107097

@ghost

Description

Hi everyone,

I'm trying to parse some header files to extract function signatures using clang.cindex in Python. Sometimes it reports a parameter's type as the parameter's name.

The following code from musl is an example:

ssize_t recvfrom (int, void *__restrict, size_t, int, struct sockaddr *__restrict, socklen_t *__restrict);

from: https://git.musl-libc.org/cgit/musl/tree/include/sys/socket.h#n397

But size_t is reported as the name of the third argument, with type int:

ssize_t recvfrom(int , void *restrict , int size_t, int , struct sockaddr *restrict , socklen_t *restrict )

My code:

import sys
import clang.cindex


header_file = sys.argv[1]

index = clang.cindex.Index.create()
tu = index.parse(header_file)

for cursor in tu.cursor.get_children():
    if cursor.kind == clang.cindex.CursorKind.FUNCTION_DECL:
        # Unnamed parameters have an empty spelling, so their type is
        # followed by a trailing space in the printed signature.
        args = ', '.join(f'{arg.type.spelling} {arg.spelling}'
                         for arg in cursor.get_arguments())
        print(f'{cursor.result_type.spelling} {cursor.spelling}({args})')

My command:

python poc.py musl-1.2.5/include/sys/socket.h

The output:

__uint16_t __bswap_16(__uint16_t __bsx)
__uint32_t __bswap_32(__uint32_t __bsx)
__uint64_t __bswap_64(__uint64_t __bsx)
__uint16_t __uint16_identity(__uint16_t __x)
__uint32_t __uint32_identity(__uint32_t __x)
__uint64_t __uint64_identity(__uint64_t __x)
int select(int __nfds, fd_set *restrict __readfds, fd_set *restrict __writefds, fd_set *restrict __exceptfds, struct timeval *restrict __timeout)
int pselect(int __nfds, fd_set *restrict __readfds, fd_set *restrict __writefds, fd_set *restrict __exceptfds, const struct timespec *restrict __timeout, const __sigset_t *restrict __sigmask)
struct cmsghdr * __cmsg_nxthdr(struct msghdr * __mhdr, struct cmsghdr * __cmsg)
int socket(int , int , int )
int socketpair(int , int , int , int[2] )
int shutdown(int , int )
int bind(int , const struct sockaddr * , socklen_t )
int connect(int , const struct sockaddr * , socklen_t )
int listen(int , int )
int accept(int , struct sockaddr *restrict , socklen_t *restrict )
int accept4(int , struct sockaddr *restrict , socklen_t *restrict , int )
int getsockname(int , struct sockaddr *restrict , socklen_t *restrict )
int getpeername(int , struct sockaddr *restrict , socklen_t *restrict )
ssize_t send(int , const void * , int size_t, int )
ssize_t recv(int , void * , int size_t, int )
ssize_t sendto(int , const void * , int size_t, int , const struct sockaddr * , socklen_t )
ssize_t recvfrom(int , void *restrict , int size_t, int , struct sockaddr *restrict , socklen_t *restrict )
ssize_t sendmsg(int , const struct msghdr * , int )
ssize_t recvmsg(int , struct msghdr * , int )
int getsockopt(int , int , int , void *restrict , socklen_t *restrict )
int setsockopt(int , int , int , const void * , socklen_t )
int sockatmark(int )
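
To check whether libclang reported any problems while producing this output, the translation unit's diagnostics can be printed. The sketch below is a minimal example, not part of the original report: `format_diagnostics` and `main` are hypothetical helper names, while `tu.diagnostics`, `severity`, `spelling` and `location` are the attributes exposed by clang.cindex. If a type such as size_t was not declared when the prototype was parsed (an assumption, not something confirmed here), a diagnostic pointing at that line would be expected.

```python
def format_diagnostics(diagnostics):
    """Return one readable line per libclang diagnostic."""
    lines = []
    for diag in diagnostics:
        loc = diag.location
        # loc.file is None when the diagnostic is not tied to a source file.
        filename = loc.file.name if loc.file is not None else '<no file>'
        lines.append(f'{filename}:{loc.line}:{loc.column}: '
                     f'[severity {diag.severity}] {diag.spelling}')
    return lines


def main(header_file):
    # Imported here so format_diagnostics() can be used without libclang installed.
    import clang.cindex

    tu = clang.cindex.Index.create().parse(header_file)
    for line in format_diagnostics(tu.diagnostics):
        print(line)

# Usage: main('musl-1.2.5/include/sys/socket.h')
```

In clang.cindex the severity is an integer; values of 3 and above correspond to errors and fatal errors.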

The AST produced by clang itself looks correct when I run this command:

clang -Xclang -ast-dump=json -fsyntax-only include/sys/socket.h

The issue seems to be in clang_getCursorSpelling() in clang/tools/libclang/CIndex.cpp, not in the Python binding.
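
Since the clang driver gives a correct AST, one difference worth ruling out is the include search path: the script calls index.parse() with no arguments, and libclang may not locate the compiler's builtin headers (such as stddef.h) the way the driver does. This is an assumption, not a confirmed cause. The sketch below passes that directory explicitly; `builtin_include_args`, `clang_resource_dir` and `parse_with_builtin_headers` are hypothetical helper names, and it assumes a `clang` binary on PATH whose version matches the libclang in use.

```python
import os
import subprocess


def builtin_include_args(resource_dir):
    """Build parser arguments that add the compiler's builtin header directory."""
    return ['-isystem', os.path.join(resource_dir, 'include')]


def clang_resource_dir(clang='clang'):
    """Ask the clang driver where its resource directory is."""
    result = subprocess.run([clang, '-print-resource-dir'],
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()


def parse_with_builtin_headers(header_file):
    # Imported here so the helpers above work without libclang installed.
    import clang.cindex

    args = builtin_include_args(clang_resource_dir())
    return clang.cindex.Index.create().parse(header_file, args=args)
```

If the parameter names come out correctly with these arguments, the difference is in how the header was parsed rather than in clang_getCursorSpelling().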

Labels: clang:as-a-library (libclang and C++ API); question (A question, not bug report. Check out https://llvm.org/docs/GettingInvolved.html instead!)