Add: get_callers & get_callees tools #58
- `get_callers`: Takes one or more function names/addresses and returns JSON
  `{"results":[{identifier,function,callers,caller_sites}], "errors":[...]}`,
  where each entry includes normalized function metadata, every caller,
  and HLIL/IL snippets for each call site.
- `get_callees`: Accepts the same identifier inputs and returns
  `{"results":[{identifier,function,callees,call_sites}], "errors":[...]}`,
  listing every outgoing callee plus per-site IL context, falling back to
  raw addresses when no symbol exists.
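As an illustration of the response envelope described above, here is a minimal sketch of how a client might consume a `get_callers` payload. Only the top-level keys (`results`, `errors`) and the per-entry keys (`identifier`, `function`, `callers`, `caller_sites`) come from the description; the helper name `summarize_callers` and all nested values in the sample are hypothetical placeholders.

```python
import json
from typing import Any


def summarize_callers(response_text: str) -> dict[str, Any]:
    """Reduce a get_callers response to identifier -> number of callers."""
    payload = json.loads(response_text)
    counts = {
        entry["identifier"]: len(entry.get("callers", []))
        for entry in payload.get("results", [])
    }
    # Per-identifier failures arrive alongside the successful results.
    return {"caller_counts": counts, "errors": payload.get("errors", [])}


# Hypothetical payload: the nested values are placeholders, not real output.
sample = json.dumps({
    "results": [
        {
            "identifier": "main",
            "function": {"name": "main"},
            "callers": [{"name": "_start"}],
            "caller_sites": [],
        }
    ],
    "errors": ["Function not found: sub_deadbeef"],
})
summary = summarize_callers(sample)
```

Because errors are reported per identifier rather than failing the whole request, a client should check both lists.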
Pull request overview
This PR adds two new tools for analyzing function call relationships in Binary Ninja: get_callers retrieves functions that call a given function along with call site details, and get_callees retrieves functions called by a given function. Both tools accept multiple identifiers (function names or addresses) and return JSON responses with normalized function metadata, caller/callee lists, and HLIL/IL snippets for each call site.
Key changes:
- Added identifier extraction and normalization utilities to handle comma/semicolon-separated lists across HTTP, core operations, and MCP bridge layers
- Implemented call graph traversal logic with fallback mechanisms for raw addresses when symbols are unavailable
- Extended HTTP API with `/getCallers` and `/getCallees` endpoints that accept flexible query parameter names
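
A request to these endpoints might be built roughly as follows. This is a sketch under assumptions: the `identifiers` query parameter and the endpoint names appear in the PR, but the helper `build_call_graph_url` and the base URL (host and port) are hypothetical.

```python
from urllib.parse import urlencode


def build_call_graph_url(base_url: str, endpoint: str, identifiers: list[str]) -> str:
    """Build a GET URL that passes identifiers as one comma-separated parameter."""
    query = urlencode({"identifiers": ",".join(identifiers)})
    return f"{base_url.rstrip('/')}/{endpoint}?{query}"


# The base URL below is a placeholder; use whatever address the plugin server listens on.
url = build_call_graph_url("http://localhost:9009", "getCallers", ["sub_401000", "main"])
```

`urlencode` percent-encodes the comma as `%2C`, which a standard query-string parser on the server side decodes back to `,`.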
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| plugin/server/http_server.py | Added _extract_identifiers method for flexible query parameter parsing and two new GET endpoint handlers for /getCallers and /getCallees |
| plugin/core/binary_operations.py | Implemented core call graph analysis logic with helper methods for identifier normalization, function reference formatting, related function collection, and call site summarization |
| plugin/api/endpoints.py | Added thin proxy methods get_callers and get_callees that delegate to BinaryOperations |
| bridge/binja_mcp_bridge.py | Added _normalize_identifier_input helper and two new MCP tools get_callers and get_callees with comma-separated identifier support |
| README.md | Updated documentation tables to include the new MCP functions and HTTP endpoints with usage descriptions |
        raw = value.replace(";", ",").split(",")
        tokens.extend([tok.strip() for tok in raw if tok.strip()])
    elif isinstance(value, (list, tuple, set)):
        for item in value:
            if item is None:
                continue
            tokens.extend(_normalize_identifier_input(str(item)))
Copilot
AI
Dec 24, 2025
The recursive call to _normalize_identifier_input(str(item)) on line 817 returns a list, which tokens.extend handles correctly. However, each list item is converted to a string and then re-parsed for comma/semicolon separators, so an input such as ["a,b", "c"] has its first element split into "a" and "b", which may be unexpected. The function should either not recurse for list items or handle them differently.
Suggested change:

    -     raw = value.replace(";", ",").split(",")
    -     tokens.extend([tok.strip() for tok in raw if tok.strip()])
    - elif isinstance(value, (list, tuple, set)):
    -     for item in value:
    -         if item is None:
    -             continue
    -         tokens.extend(_normalize_identifier_input(str(item)))
    +     # Top-level string: split on comma/semicolon and normalize whitespace.
    +     raw = value.replace(";", ",").split(",")
    +     tokens.extend([tok.strip() for tok in raw if tok.strip()])
    + elif isinstance(value, (list, tuple, set)):
    +     # Iterable input: treat each element as an atomic identifier unless it is
    +     # itself a nested iterable of identifiers.
    +     for item in value:
    +         if item is None:
    +             continue
    +         if isinstance(item, (list, tuple, set)):
    +             tokens.extend(_normalize_identifier_input(item))
    +         else:
    +             tok = str(item).strip()
    +             if tok:
    +                 tokens.append(tok)
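
To make the behavioral difference concrete, the following is a self-contained sketch of a normalizer that follows the suggested change. The parts of the function that the diff does not show (the `tokens` initialization, the `None` check, and the fallback branch for other scalar types) are assumptions, and the function is renamed `normalize_identifier_input` here to mark it as illustrative rather than the PR's actual code.

```python
from typing import Any


def normalize_identifier_input(value: Any) -> list[str]:
    """Flatten strings and iterables of identifiers into a list of clean tokens."""
    tokens: list[str] = []
    if value is None:
        return tokens
    if isinstance(value, str):
        # Top-level string: split on comma/semicolon and drop empty pieces.
        raw = value.replace(";", ",").split(",")
        tokens.extend(tok.strip() for tok in raw if tok.strip())
    elif isinstance(value, (list, tuple, set)):
        for item in value:
            if item is None:
                continue
            if isinstance(item, (list, tuple, set)):
                # Nested iterable: recurse so its elements are flattened.
                tokens.extend(normalize_identifier_input(item))
            else:
                # Atomic element: keep it whole, even if it contains a comma.
                tok = str(item).strip()
                if tok:
                    tokens.append(tok)
    else:
        # Assumed fallback for other scalars such as integer addresses.
        tok = str(value).strip()
        if tok:
            tokens.append(tok)
    return tokens


from_string = normalize_identifier_input("sub_401000; main ,")
from_list = normalize_identifier_input(["a,b", "c"])
from_nested = normalize_identifier_input([["x", "y"], None, 0x401000])
```

With this version a top-level string is still split on separators, while a list element such as `"a,b"` stays a single token.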
            func = None
            errors.append(f"{ident}: {exc}")
        if not func:
            errors.append(f"Function not found: {ident}")
            continue
Copilot
AI
Dec 24, 2025
When an exception is caught during function lookup, the error is added to the errors list, but then the code continues to check if func is None and adds another error "Function not found: {ident}". This results in duplicate error messages being added for the same identifier when an exception occurs.
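
One possible way to avoid the duplicate message is to `continue` inside the `except` branch, so the "not found" check only runs when the lookup returned normally. The sketch below isolates that loop; `resolve_functions` and `fake_lookup` are hypothetical stand-ins, with the lookup injected in place of `get_function_by_name_or_address`.

```python
from typing import Any, Callable, Optional


def resolve_functions(
    identifiers: list[str],
    lookup: Callable[[str], Optional[Any]],
) -> tuple[list[tuple[str, Any]], list[str]]:
    """Resolve identifiers, recording exactly one error per failed identifier."""
    resolved: list[tuple[str, Any]] = []
    errors: list[str] = []
    for ident in identifiers:
        try:
            func = lookup(ident)
        except Exception as exc:
            errors.append(f"{ident}: {exc}")
            continue  # Already reported; skip the "not found" check.
        if not func:
            errors.append(f"Function not found: {ident}")
            continue
        resolved.append((ident, func))
    return resolved, errors


def fake_lookup(ident: str) -> Optional[str]:
    """Hypothetical lookup: raises for 'bad', returns None for unknown names."""
    if ident == "bad":
        raise ValueError("invalid address")
    return {"main": "<func main>"}.get(ident)


resolved, errors = resolve_functions(["main", "bad", "missing"], fake_lookup)
```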
            func = None
            errors.append(f"{ident}: {exc}")
        if not func:
            errors.append(f"Function not found: {ident}")
            continue
Copilot
AI
Dec 24, 2025
When an exception is caught during function lookup, the error is added to the errors list, but then the code continues to check if func is None and adds another error "Function not found: {ident}". This results in duplicate error messages being added for the same identifier when an exception occurs.
    def get_callers(self, identifiers: Any) -> dict[str, Any]:
        """Collect caller information for the given function identifiers."""
        if not self._current_view:
            raise RuntimeError("No binary loaded")

        items = self._normalize_identifier_list(identifiers)
        if not items:
            raise ValueError("No function identifiers provided")

        results: list[dict[str, Any]] = []
        errors: list[str] = []
        for ident in items:
            try:
                func = self.get_function_by_name_or_address(ident)
            except Exception as exc:
                func = None
                errors.append(f"{ident}: {exc}")
            if not func:
                errors.append(f"Function not found: {ident}")
                continue
            entry = {
                "identifier": str(ident),
                "function": self._format_function_reference(func),
                "callers": self._collect_related_functions(func, "callers"),
                "caller_sites": self._summarize_call_sites(func, "callers"),
            }
            results.append(entry)

        return {"results": results, "errors": errors}

    def get_callees(self, identifiers: Any) -> dict[str, Any]:
        """Collect callee information for the given function identifiers."""
        if not self._current_view:
            raise RuntimeError("No binary loaded")

        items = self._normalize_identifier_list(identifiers)
        if not items:
            raise ValueError("No function identifiers provided")

        results: list[dict[str, Any]] = []
        errors: list[str] = []
        for ident in items:
            try:
                func = self.get_function_by_name_or_address(ident)
            except Exception as exc:
                func = None
                errors.append(f"{ident}: {exc}")
            if not func:
                errors.append(f"Function not found: {ident}")
                continue
            entry = {
                "identifier": str(ident),
                "function": self._format_function_reference(func),
                "callees": self._collect_related_functions(func, "callees"),
                "call_sites": self._summarize_call_sites(func, "callees"),
            }
            results.append(entry)

        return {"results": results, "errors": errors}
Copilot
AI
Dec 24, 2025
The logic in get_callers and get_callees is nearly identical with only minor differences in the relation attribute names ("callers" vs "callees", "caller_sites" vs "call_sites"). This duplication makes maintenance harder and increases the risk of inconsistent behavior. Consider extracting this into a common helper method that takes the relation type as a parameter.
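
A shared helper along these lines could look like the sketch below. It is not the PR's code: `CallGraphOps`, `_collect_call_relations`, `_RELATION_KEYS`, and `_lookup` are hypothetical names, and the helper methods are stubbed with plain dictionaries so the example runs without Binary Ninja. Only the output keys are taken from the PR.

```python
from typing import Any


class CallGraphOps:
    """Stand-in for BinaryOperations with stubbed helpers (hypothetical)."""

    # relation -> (key for related functions, key for call sites)
    _RELATION_KEYS = {
        "callers": ("callers", "caller_sites"),
        "callees": ("callees", "call_sites"),
    }

    def __init__(self, functions: dict[str, dict[str, list[str]]]) -> None:
        self._functions = functions

    def _lookup(self, ident: str) -> Any:
        return self._functions.get(ident)

    def _format_function_reference(self, ident: str) -> dict[str, str]:
        return {"name": ident}  # Stub: the real method formats a function object.

    def _collect_related_functions(self, func: dict, relation: str) -> list[str]:
        return list(func.get(relation, []))

    def _summarize_call_sites(self, func: dict, relation: str) -> list[Any]:
        return []  # Stub: the real method returns IL snippets per call site.

    def _collect_call_relations(self, identifiers: list[str], relation: str) -> dict[str, Any]:
        """Shared body for get_callers and get_callees."""
        related_key, sites_key = self._RELATION_KEYS[relation]
        results: list[dict[str, Any]] = []
        errors: list[str] = []
        for ident in identifiers:
            func = self._lookup(ident)
            if not func:
                errors.append(f"Function not found: {ident}")
                continue
            results.append({
                "identifier": str(ident),
                "function": self._format_function_reference(ident),
                related_key: self._collect_related_functions(func, relation),
                sites_key: self._summarize_call_sites(func, relation),
            })
        return {"results": results, "errors": errors}

    def get_callers(self, identifiers: list[str]) -> dict[str, Any]:
        return self._collect_call_relations(identifiers, "callers")

    def get_callees(self, identifiers: list[str]) -> dict[str, Any]:
        return self._collect_call_relations(identifiers, "callees")


ops = CallGraphOps({"main": {"callers": ["_start"], "callees": ["puts"]}})
callers = ops.get_callers(["main", "nope"])
callees = ops.get_callees(["main"])
```

Keeping the key names in one table means the two public methods cannot drift apart in error handling or output shape.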
    elif path == "/getCallers":
        identifiers = self._extract_identifiers(params)
        if not identifiers:
            self._send_json_response(
                {
                    "error": "Missing identifier parameter",
                    "help": "Provide ?identifier=<name|address> or comma-separated ?identifiers=a,b",
                },
                400,
            )
            return
        try:
            payload = self.binary_ops.get_callers(identifiers)
        except Exception as e:
            bn.log_error(f"Error handling getCallers: {e}")
            self._send_json_response({"error": str(e)}, 500)
        else:
            self._send_json_response(payload)

    elif path == "/getCallees":
        identifiers = self._extract_identifiers(params)
        if not identifiers:
            self._send_json_response(
                {
                    "error": "Missing identifier parameter",
                    "help": "Provide ?identifier=<name|address> or comma-separated ?identifiers=a,b",
                },
                400,
            )
            return
        try:
            payload = self.binary_ops.get_callees(identifiers)
        except Exception as e:
            bn.log_error(f"Error handling getCallees: {e}")
            self._send_json_response({"error": str(e)}, 500)
        else:
            self._send_json_response(payload)
Copilot
AI
Dec 24, 2025
The /getCallers and /getCallees endpoint handlers are nearly identical except for the method name they call. This creates maintenance overhead and increases the risk of inconsistent error handling or updates. Consider extracting the common logic into a helper method that takes the endpoint method name as a parameter.
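
The suggested extraction might be sketched as a route table plus one handler method. Everything here is illustrative: `HandlerSketch`, `FakeBinaryOps`, `CALL_GRAPH_ROUTES`, and `handle` are hypothetical, responses are recorded in a list instead of being written to a socket, and the `bn.log_error` call is omitted so the example runs outside Binary Ninja.

```python
from typing import Any


class FakeBinaryOps:
    """Hypothetical stand-in for the plugin's BinaryOperations object."""

    def get_callers(self, identifiers: list[str]) -> dict[str, Any]:
        return {"results": [{"identifier": i} for i in identifiers], "errors": []}

    def get_callees(self, identifiers: list[str]) -> dict[str, Any]:
        raise RuntimeError("No binary loaded")


class HandlerSketch:
    # Maps each path to the BinaryOperations method that serves it.
    CALL_GRAPH_ROUTES = {"/getCallers": "get_callers", "/getCallees": "get_callees"}

    def __init__(self, binary_ops: Any) -> None:
        self.binary_ops = binary_ops
        self.sent: list[tuple[int, dict[str, Any]]] = []

    def _send_json_response(self, payload: dict[str, Any], status: int = 200) -> None:
        self.sent.append((status, payload))  # Record instead of writing to a socket.

    def handle(self, path: str, identifiers: list[str]) -> None:
        """Single code path shared by both call graph endpoints."""
        method_name = self.CALL_GRAPH_ROUTES[path]
        if not identifiers:
            self._send_json_response(
                {
                    "error": "Missing identifier parameter",
                    "help": "Provide ?identifier=<name|address> or comma-separated ?identifiers=a,b",
                },
                400,
            )
            return
        try:
            payload = getattr(self.binary_ops, method_name)(identifiers)
        except Exception as e:
            self._send_json_response({"error": str(e)}, 500)
        else:
            self._send_json_response(payload)


handler = HandlerSketch(FakeBinaryOps())
handler.handle("/getCallers", ["main"])
handler.handle("/getCallees", ["main"])
handler.handle("/getCallers", [])
statuses = [status for status, _ in handler.sent]
```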
    @mcp.tool()
    def get_callers(identifiers: str) -> str:
        """
        List callers and caller sites for one or more function identifiers (name or address).
        Provide comma-separated identifiers like "sub_401000,main".
        """
        items = _normalize_identifier_input(identifiers)
        if not items:
            return "Error: provide at least one identifier"
        data = get_json("getCallers", {"identifiers": ",".join(items)}, timeout=None)
        if not data:
            return "Error: no response"
        import json as _json

        return _json.dumps(data, indent=2, ensure_ascii=False)


    @mcp.tool()
    def get_callees(identifiers: str) -> str:
        """
        List callees and call sites for one or more function identifiers (name or address).
        Provide comma-separated identifiers like "sub_401000,main".
        """
        items = _normalize_identifier_input(identifiers)
        if not items:
            return "Error: provide at least one identifier"
        data = get_json("getCallees", {"identifiers": ",".join(items)}, timeout=None)
        if not data:
            return "Error: no response"
        import json as _json

        return _json.dumps(data, indent=2, ensure_ascii=False)
Copilot
AI
Dec 24, 2025
The get_callers and get_callees functions are nearly identical, differing only in the endpoint name and docstring. This duplication increases maintenance burden. Consider extracting a common helper function that accepts the endpoint name as a parameter.
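
For the bridge layer, a shared helper could take the endpoint name as a parameter, leaving each MCP tool as a one-line wrapper. In this sketch `call_graph_tool` and `fake_get_json` are hypothetical, the HTTP fetch is injected so the example runs without a server, and the inline split is a simplified stand-in for `_normalize_identifier_input`.

```python
import json
from typing import Any, Callable, Optional

Fetch = Callable[[str, dict[str, str]], Optional[dict[str, Any]]]


def call_graph_tool(endpoint: str, identifiers: str, fetch: Fetch) -> str:
    """Shared body for the get_callers and get_callees bridge tools."""
    # Simplified normalization: split on comma/semicolon and drop empty tokens.
    items = [tok.strip() for tok in identifiers.replace(";", ",").split(",") if tok.strip()]
    if not items:
        return "Error: provide at least one identifier"
    data = fetch(endpoint, {"identifiers": ",".join(items)})
    if not data:
        return "Error: no response"
    return json.dumps(data, indent=2, ensure_ascii=False)


def fake_get_json(endpoint: str, params: dict[str, str]) -> Optional[dict[str, Any]]:
    """Hypothetical stand-in for the bridge's get_json helper."""
    if endpoint == "getCallees":
        return None  # Simulate a missing response.
    return {"endpoint": endpoint, "identifiers": params["identifiers"]}


ok = call_graph_tool("getCallers", " sub_401000 ; main ", fake_get_json)
empty = call_graph_tool("getCallers", " , ", fake_get_json)
no_response = call_graph_tool("getCallees", "main", fake_get_json)
```

Importing `json` once at module level also removes the need for the function-local `import json as _json` in each tool.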
    except Exception:
        pass
Copilot
AI
Dec 24, 2025
'except' clause does nothing but pass and there is no explanatory comment.
Suggested change:

    - except Exception:
    -     pass
    + except Exception as exc:
    +     # Swallow errors when iterating related functions, but log for diagnostics.
    +     bn.log_debug(
    +         f"BinaryOperations._collect_related_functions: failed while processing "
    +         f"{relation_attr} for function {getattr(func, 'name', '<unknown>')}: {exc}"
    +     )