Skip to content

Highlight error message at the end of log#71

Merged
theihor merged 1 commit intolibbpf:masterfrom
jordalgo:error_message
Sep 2, 2025
Merged

Highlight error message at the end of log#71
theihor merged 1 commit intolibbpf:masterfrom
jordalgo:error_message

Conversation

@jordalgo
Copy link
Copy Markdown
Collaborator

Screenshot 2025-08-29 at 9 07 06 PM

@theihor
Copy link
Copy Markdown
Collaborator

theihor commented Sep 2, 2025

@jordalgo Thanks for working on this.

I don't like that we are updating a ParsedLine type depending on the line position, doesn't look reliable.

I am thinking a better approach is to directly pattern-match known messages, similarly to how we determine helper function calls, with bpf-helpers.json. But that requires some C parsing.

I made a script (with LLM help) parsing current verifier.c with tree-sitter and printing an argument of verbose() call.

#!/usr/bin/env python3
"""
Python script to find all verbose() function calls in a C file using tree-sitter.
"""

import sys
import os
try:
    import tree_sitter_c
    from tree_sitter import Language, Parser
except ImportError:
    print("Error: tree-sitter library not installed. Install with: pip install tree-sitter tree-sitter-c")
    sys.exit(1)

def setup_c_parser():
    """Set up tree-sitter parser for C language."""
    try:
        # Get the C language
        c_language = tree_sitter_c.language()

        # Create a Language object from the PyCapsule
        lang_obj = Language(c_language)

        # Create parser
        parser = Parser()
        parser.language = lang_obj
        return parser
    except Exception as e:
        print(f"Error setting up C parser: {e}")
        return None

def find_verbose_calls(node, source_lines, file_path=""):
    """
    Recursively traverse the AST to find all verbose() function calls.

    Args:
        node: Current AST node
        source_lines: List of source code lines for context
        file_path: Path to the source file for error reporting

    Returns:
        List of dictionaries containing call information
    """
    verbose_calls = []

    def traverse(node):
        # Check if this is a function call
        if node.type == 'call_expression':
            # Get the function being called
            function_node = node.child_by_field_name('function')
            if function_node and function_node.type == 'identifier':
                function_name = function_node.text.decode('utf-8')
                if function_name == 'verbose':
                    arg_list = node.child_by_field_name('arguments')
                    literal = arg_list.children[3]
                    print(literal.text.decode('utf-8'))

                    # Found a verbose() call
                    start_point = node.start_point
                    end_point = node.end_point

                    # Get the full call text
                    call_text = node.text.decode('utf-8')

                    # Store the call information
                    verbose_calls.append({
                        'line': start_point.row + 1,  # 1-based line numbering
                        'column': start_point.column + 1,  # 1-based column numbering
                        'text': call_text,
                        'end_line': end_point.row + 1,
                        'end_column': end_point.column + 1
                    })

        # Recursively traverse child nodes
        for child in node.children:
            traverse(child)

    traverse(node)
    return verbose_calls

def main():
    """Main function to parse file and find verbose() calls."""
    file_path = '/home/isolodrai/kernels/bpf-next/kernel/bpf/verifier.c'

    if len(sys.argv) > 1:
        file_path = sys.argv[1]

    # Check if file exists
    if not os.path.exists(file_path):
        print(f"Error: File '{file_path}' not found.")
        return 1

    # Set up parser
    parser = setup_c_parser()
    if parser is None:
        return 1

    try:
        # Read the file
        with open(file_path, 'r', encoding='utf-8') as f:
            source_code = f.read()

        # Parse the file
        print(f"Parsing {file_path}...")
        tree = parser.parse(bytes(source_code, 'utf-8'))

        # Find verbose() calls
        source_lines = source_code.split('\n')
        verbose_calls = find_verbose_calls(tree.root_node, source_lines, file_path)

        # Print results to stdout
        print(f"Found {len(verbose_calls)} verbose() calls:\n")

        return 0

    except FileNotFoundError:
        print(f"Error: Could not read file '{file_path}'")
        return 1
    except Exception as e:
        print(f"Error processing file: {e}")
        return 1

It produced this list of strings: https://gist.github.com/theihor/b96de6993990ea70bf5863b4b3b3c015

It needs to be cleaned up a bit, but then we can use it to determine a verifier error message at any position and highlight it. And then the head/tail logic with respect to instruction wouldn't be necessary.

We could polish the script to produce a json, and put it in the repo. What do you think?

@jordalgo
Copy link
Copy Markdown
Collaborator Author

jordalgo commented Sep 2, 2025

@theihor Have we ever seen cases where the error message doesn't directly follow the last instruction in the verifier error log? If we haven't, then building this list of possible strings feels a little over-engineered and also possibly less reliable as the verifier can change and then our list will be out of date. We can also just land this dumb solution first and see if we encounter logs where it breaks.

@theihor
Copy link
Copy Markdown
Collaborator

theihor commented Sep 2, 2025

Have we ever seen cases where the error message doesn't directly follow the last instruction in the verifier error log?

I haven't, and usually the error message is printed right before return -EACCES; or something like that.

The issue is that bpfvv can receive any fragment of a log, or multiple logs (say from veristat). Of course we can't handle all possible cases of incomplete or modified input, but since the list of the verifier errors is known (from the kernel source code), why not use it to highlight important messages.

If we haven't, then building this list of possible strings feels a little over-engineered and also possibly less reliable as the verifier can change and then our list will be out of date. We can also just land this dumb solution first and see if we encounter logs where it breaks.

We have the same problem with the helpers list, and more general problem of parsing the log from different kernel versions, so storing a list of known error messages doesn't introduce "verifier might change" problem or make it worse. Also this kind of list will be necessary anyways for "common errors" hints based on the error, that we were thinking to implement.

I understand this is a bit more work, so maybe let's park it for later, I'll open an issue.

As for now, if we are going head/tail route, I would put the logic of highlighting in the UI, and wouldn't touch the analyzer or parser. In my mind, parser determines line kind by it's content, and analyzer builds up the array of states. Patching a type of a ParsedLine after the fact based on position doesn't look right to me (although I admit it's similar to global func message handling).

How about this: remember the index of first and last ParsedLineType.INSTRUCTION in the VerifierLogState and then look at the indices when rendering log line for highlighting. No need to introduce a new message type. Thoughts?

@jordalgo
Copy link
Copy Markdown
Collaborator Author

jordalgo commented Sep 2, 2025

so storing a list of known error messages doesn't introduce "verifier might change" problem or make it worse

Ah ok. If that's the case, then what you suggested sounds fine. Truth be told, I'd rather our identification of error messages was part of the parser/analyzer (and ParsedLine type) instead of in the UI. But I'm fine to do what you suggested as a stop-gap while we work on this script integration.

We could polish the script to produce a json, and put it in the repo. What do you think?

I think the script should be written in javascript/node. Even though python is pretty ubiquitous these days, I think it's helpful if the repo can stick to one language if possible. I'm sure we can just ask the LLM to re-write it in Node 😄

@jordalgo
Copy link
Copy Markdown
Collaborator Author

jordalgo commented Sep 2, 2025

@theihor PR updated with your suggestion.


for (let i = lines.length - 1; i >= 0; --i) {
if (lines[i].type == ParsedLineType.INSTRUCTION) {
lastInsIdx = lines[i].idx;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could track this in the previous loop:

if (parsedLine === ParsedLineType.INSTRUCTION) lastInsIdx = idx;

But given we are starting from the end it's a nit.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current way is probably more performant if you have a ton of log lines (no check and mutation) but it's less code - so....idk I'll probably just leave it considering it's temporary anyway

@theihor
Copy link
Copy Markdown
Collaborator

theihor commented Sep 2, 2025

@theihor PR updated with your suggestion.

This is much simpler, thanks!

@theihor theihor merged commit 2f70be7 into libbpf:master Sep 2, 2025
1 check passed
jordalgo added a commit to jordalgo/bpfvv that referenced this pull request Sep 6, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants