Highlight error message at the end of log#71
Conversation
jordalgo
commented
Aug 30, 2025
5a05904 to
6440617
Compare
|
@jordalgo Thanks for working on this. I don't like that we are updating a ParsedLine type depending on the line position, doesn't look reliable. I am thinking a better approach is to directly pattern-match known messages, similarly to how we determine helper function calls, with bpf-helpers.json. But that requires some C parsing. I made a script (with LLM help) parsing current verifier.c with tree-sitter and printing an argument of #!/usr/bin/env python3
"""
Python script to find all verbose() function calls in a C file using tree-sitter.
"""
import sys
import os
try:
import tree_sitter_c
from tree_sitter import Language, Parser
except ImportError:
print("Error: tree-sitter library not installed. Install with: pip install tree-sitter tree-sitter-c")
sys.exit(1)
def setup_c_parser():
"""Set up tree-sitter parser for C language."""
try:
# Get the C language
c_language = tree_sitter_c.language()
# Create a Language object from the PyCapsule
lang_obj = Language(c_language)
# Create parser
parser = Parser()
parser.language = lang_obj
return parser
except Exception as e:
print(f"Error setting up C parser: {e}")
return None
def find_verbose_calls(node, source_lines, file_path=""):
"""
Recursively traverse the AST to find all verbose() function calls.
Args:
node: Current AST node
source_lines: List of source code lines for context
file_path: Path to the source file for error reporting
Returns:
List of dictionaries containing call information
"""
verbose_calls = []
def traverse(node):
# Check if this is a function call
if node.type == 'call_expression':
# Get the function being called
function_node = node.child_by_field_name('function')
if function_node and function_node.type == 'identifier':
function_name = function_node.text.decode('utf-8')
if function_name == 'verbose':
arg_list = node.child_by_field_name('arguments')
literal = arg_list.children[3]
print(literal.text.decode('utf-8'))
# Found a verbose() call
start_point = node.start_point
end_point = node.end_point
# Get the full call text
call_text = node.text.decode('utf-8')
# Store the call information
verbose_calls.append({
'line': start_point.row + 1, # 1-based line numbering
'column': start_point.column + 1, # 1-based column numbering
'text': call_text,
'end_line': end_point.row + 1,
'end_column': end_point.column + 1
})
# Recursively traverse child nodes
for child in node.children:
traverse(child)
traverse(node)
return verbose_calls
def main():
"""Main function to parse file and find verbose() calls."""
file_path = '/home/isolodrai/kernels/bpf-next/kernel/bpf/verifier.c'
if len(sys.argv) > 1:
file_path = sys.argv[1]
# Check if file exists
if not os.path.exists(file_path):
print(f"Error: File '{file_path}' not found.")
return 1
# Set up parser
parser = setup_c_parser()
if parser is None:
return 1
try:
# Read the file
with open(file_path, 'r', encoding='utf-8') as f:
source_code = f.read()
# Parse the file
print(f"Parsing {file_path}...")
tree = parser.parse(bytes(source_code, 'utf-8'))
# Find verbose() calls
source_lines = source_code.split('\n')
verbose_calls = find_verbose_calls(tree.root_node, source_lines, file_path)
# Print results to stdout
print(f"Found {len(verbose_calls)} verbose() calls:\n")
return 0
except FileNotFoundError:
print(f"Error: Could not read file '{file_path}'")
return 1
except Exception as e:
print(f"Error processing file: {e}")
return 1It produced this list of strings: https://gist.github.com/theihor/b96de6993990ea70bf5863b4b3b3c015 It needs to be cleaned up a bit, but then we can use it to determine a verifier error message at any position and highlight it. And then the head/tail logic with respect to instruction wouldn't be necessary. We could polish the script to produce a json, and put it in the repo. What do you think? |
|
@theihor Have we ever seen cases where the error message doesn't directly follow the last instruction in the verifier error log? If we haven't, then building this list of possible strings feels a little over-engineered and also possibly less reliable as the verifier can change and then our list will be out of date. We can also just land this dumb solution first and see if we encounter logs where it breaks. |
I haven't, and usually the error message is printed right before The issue is that bpfvv can receive any fragment of a log, or multiple logs (say from veristat). Of course we can't handle all possible cases of incomplete or modified input, but since the list of the verifier errors is known (from the kernel source code), why not use it to highlight important messages.
We have the same problem with the helpers list, and more general problem of parsing the log from different kernel versions, so storing a list of known error messages doesn't introduce "verifier might change" problem or make it worse. Also this kind of list will be necessary anyways for "common errors" hints based on the error, that we were thinking to implement. I understand this is a bit more work, so maybe let's park it for later, I'll open an issue. As for now, if we are going head/tail route, I would put the logic of highlighting in the UI, and wouldn't touch the analyzer or parser. In my mind, parser determines line kind by it's content, and analyzer builds up the array of states. Patching a type of a ParsedLine after the fact based on position doesn't look right to me (although I admit it's similar to global func message handling). How about this: remember the index of first and last |
Ah ok. If that's the case, then what you suggested sounds fine. Truth be told, I'd rather our identification of error messages was part of the parser/analyzer (and ParsedLine type) instead of in the UI. But I'm fine to do what you suggested as a stop-gap while we work on this script integration.
I think the script should be written in javascript/node. Even though python is pretty ubiquitous these days, I think it's helpful if the repo can stick to one language if possible. I'm sure we can just ask the LLM to re-write it in Node 😄 |
6440617 to
6759585
Compare
|
@theihor PR updated with your suggestion. |
|
|
||
| for (let i = lines.length - 1; i >= 0; --i) { | ||
| if (lines[i].type == ParsedLineType.INSTRUCTION) { | ||
| lastInsIdx = lines[i].idx; |
There was a problem hiding this comment.
We could track this in the previous loop:
if (parsedLine === ParsedLineType.INSTRUCTION) lastInsIdx = idx;
But given we are starting from the end it's a nit.
There was a problem hiding this comment.
The current way is probably more performant if you have a ton of log lines (no check and mutation) but it's less code - so....idk I'll probably just leave it considering it's temporary anyway
This is much simpler, thanks! |