Skip to content

Conversation

@visz11
Copy link
Collaborator

@visz11 visz11 commented Oct 14, 2025

CodeAnt-AI Description

Expose operand_offset (instruction address offset) to Sleigh language users

What Changed

  • Added a new built-in symbol named operand_offset that evaluates to the current instruction's address offset and can be used in Sleigh patterns, expressions, and pcode specifications.
  • The Sleigh compiler, parsers, assembler, and decompiler now recognize operand_offset in language files and grammar; it is persisted/restored in XML as offset_exp/offset_sym so specs using it parse and compile correctly.
  • When used in printing or symbol contexts, operand_offset is rendered as a hex value (e.g., 0x...) representing the instruction start address.

Impact

✅ Can reference instruction address offset in Sleigh patterns
✅ Fewer Sleigh spec parse/compile errors when using operand_offset
✅ Disassembly/decompiler can show instruction offset where specified

💡 Usage Guide

Checking Your Pull Request

Every time you make a pull request, our system automatically looks through it. We check for security issues, mistakes in how you're setting up your infrastructure, and common code problems. We do this to make sure your changes are solid and won't cause any trouble later.

Talking to CodeAnt AI

Got a question or need a hand with something in your pull request? You can easily get in touch with CodeAnt AI right here. Just type the following in a comment on your pull request, and replace "Your question here" with whatever you want to ask:

@codeant-ai ask: Your question here

This lets you have a chat with CodeAnt AI about your pull request, making it easier to understand and improve your code.

Retrigger review

Ask CodeAnt AI to review the PR again, by typing:

@codeant-ai: review

Check Your Repository Health

To analyze the health of your code repository, visit our dashboard at https://app.codeant.ai. This tool helps you identify potential issues and areas for improvement in your codebase, ensuring your repository maintains high standards of code health.

Summary by CodeRabbit

  • New Features

    • Added a built-in symbol to reference the current instruction’s operand offset in Sleigh, available in decompiler, assembler, and p-code snippet contexts.
    • Enabled parsing and evaluation of a new offset expression in pattern/expression definitions and XML import/export.
    • Extended symbol tables and restoration to recognize and persist the new offset symbol.
  • Style

    • Minor whitespace tidy-up in attributes configuration.

@codeant-ai
Copy link

codeant-ai bot commented Oct 14, 2025

CodeAnt AI is reviewing your PR.


Thanks for using CodeAnt! 🎉

We're free for open-source projects. if you're enjoying it, help us grow by sharing.

Share on X ·
Reddit ·
LinkedIn

@gemini-code-assist
Copy link

Summary of Changes

Hello @visz11, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces the 'operand_offset' symbol into Ghidra's Sleigh language, providing a new mechanism for Sleigh specifications to access the offset of the current instruction. This enhancement allows for more precise and flexible definition of instruction semantics and pattern matching, particularly useful in scenarios where the instruction's relative position is a critical factor. The changes involve comprehensive updates to the Sleigh parser, compiler, semantic representation, and symbol management across both the C++ and Java codebases.

Highlights

  • Introduced operand_offset Symbol: A new symbol, 'operand_offset', has been added to the Sleigh language, allowing direct reference to the offset of the current instruction within patterns and semantics.
  • Core Sleigh Parser and Compiler Integration: The Sleigh parser (pcodeparse.y, slghparse.y, slghscan.l) and compiler (slgh_compile.cc, SleighCompiler.g) have been updated to recognize, parse, and process this new 'operand_offset' symbol.
  • Semantic Representation (ConstTpl and PatternExpression): A new constant type 'j_offset' was added to ConstTpl to represent the instruction offset, and corresponding OperandOffsetValue classes were created in both C++ and Java to handle its pattern expression and value retrieval.
  • Cross-Language Implementation: The changes are consistently applied across both the C++ decompiler components and the Java-based Ghidra framework, ensuring full support for the new symbol in both environments.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@codeant-ai codeant-ai bot added the size:L This PR changes 100-499 lines, ignoring generated files label Oct 14, 2025
@coderabbitai
Copy link

coderabbitai bot commented Oct 14, 2025

Walkthrough

Adds a new OffsetSymbol (“operand_offset”) across Sleigh/decompiler: grammar tokens, lexers, parsers, pattern-expression nodes, symbol classes, XML (save/restore), predefined symbol registration, and p-code/assembler integration. Extends ConstTpl with j_offset and corresponding XML tags/handling. Minor whitespace tweak in .gitattributes.

Changes

Cohort / File(s) Summary
Grammar & Lexer (C++ Decompiler)
Ghidra/Features/Decompiler/src/decompile/cpp/pcodeparse.y, .../slghparse.y, .../slghscan.l
Introduces OFFSETSYM token and OffsetSymbol * in unions; lexer returns OFFSETSYM for offset_symbol; grammar accepts OFFSETSYM in symbol productions.
Sleigh Symbols (C++ Decompiler)
.../slghsymbol.hh, .../slghsymbol.cc
Adds OffsetSymbol class and integrates into SleighSymbol::symbol_type and SymbolTable::restoreSymbolHeader via "offset_sym_head".
Pattern Expressions (C++ Decompiler)
.../slghpatexpress.hh, .../slghpatexpress.cc
Adds OperandOffsetValue and supports "offset_exp" in PatternExpression::restoreExpression.
Semantics/ConstTpl (C++ Decompiler)
.../semantics.hh, .../semantics.cc
Adds ConstTpl::j_offset; handles fix/saveXml/restoreXml with operand offset tag.
Sleigh Compilation (C++ Decompiler)
.../slgh_compile.cc
Registers predefined OffsetSymbol named "operand_offset".
ANTLR Sleigh Grammar (Java)
Ghidra/Framework/SoftwareModeling/src/main/antlr/ghidra/sleigh/grammar/SleighCompiler.g
Permits offset_symbol in various symbol/use sites and expression derivations.
Assembler Builder (Java)
.../sleigh/SleighAssemblerBuilder.java
Skips OffsetSymbol during grammar build, mirroring StartSymbol.
Pattern Expressions (Java Sleigh)
.../sleigh/expression/PatternExpression.java, .../sleigh/expression/OffsetInstructionValue.java
Adds "offset_exp" handling; introduces OffsetInstructionValue to yield current operand offset.
Sleigh Symbols (Java Sleigh)
.../sleigh/symbol/OffsetSymbol.java, .../sleigh/symbol/SymbolTable.java
Adds Java OffsetSymbol with fixed-handle/print/XML restore; symbol-table recognizes "offset_sym_head".
pcodeCPort Pattern/Const/Symbols (Java)
.../pcodeCPort/semantics/ConstTpl.java, .../pcodeCPort/slghpatexpress/PatternExpression.java, .../pcodeCPort/slghpatexpress/OffsetInstructionValue.java, .../pcodeCPort/slghsymbol/OffsetSymbol.java, .../pcodeCPort/slghsymbol/SymbolTable.java, .../pcodeCPort/slghsymbol/symbol_type.java, .../slgh_compile/SleighCompile.java, .../slgh_compile/Yylval.java
Adds j_offset, "offset"/"offset_exp" XML handling, OffsetInstructionValue, OffsetSymbol, symbol-type enum value, predefined "operand_offset", Yylval field offsetsym, and symbol-table restoration for offset symbol.
P-code Parser (Java)
.../program/model/lang/PcodeParser.java
Predefines "operand_offset" OffsetSymbol in parser symbolMap.
Repo config
.gitattributes
Adds trailing newline in binary patterns section (whitespace only).

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Dev as Sleigh Author
  participant SC as SleighCompile
  participant ST as SymbolTable
  participant XML as XML (save/restore)
  note over SC,ST: Build time
  Dev->>SC: Compile SLEIGH
  SC->>ST: Register predefined "operand_offset" (OffsetSymbol)
  ST-->>Dev: Symbol available in language

  participant P as Parser/Lexer
  participant PE as PatternExpression
  participant RT as Runtime Eval
  note over P,RT: Parse + Evaluate
  Dev->>P: Use [operand_offset] in patterns
  P->>PE: Restore "offset_exp"
  PE-->>P: Offset(Operand) PatternValue
  P->>RT: Build instruction semantics
  RT->>RT: walker.getOffset(-1)
  RT-->>Dev: Concrete offset value
Loading
sequenceDiagram
  autonumber
  participant Cpp as Decompiler C++ side
  participant Java as Java pcodeCPort/Sleigh
  participant IO as XML
  note over Cpp,Java: Cross-language ConstTpl
  Cpp->>IO: saveXml type="operand_offset" (j_offset)
  Java->>IO: saveXml type="offset" (j_offset)
  IO-->>Cpp: restoreXml -> j_offset
  IO-->>Java: restoreXml -> j_offset
  note over Cpp,Java: Both sides understand operand offset constants
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

I nibble bytes and hop through code,
A fresh new symbol joins the road—
[operand_offset], neat and bright,
Guides my jumps by offset light.
I twitch my ears at XML,
j_offset sings its hex-byte spell.
Thump-thump—deploy! All’s well. 🥕✨

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
Check name Status Explanation Resolution
Title Check ⚠️ Warning The title includes an extraneous prefix “Clone” and uses awkward phrasing that does not clearly or concisely summarize the core change of adding operand offset symbol support. It fails to meet the requirement of a clear, specific single-sentence title that highlights the main contribution. The current phrasing may confuse reviewers scanning the project history. Rename the pull request to a concise title focused on the primary change, for example “Introduce operand offset symbol support,” removing unrelated or misleading words like “Clone.” This will improve clarity and ensure the project history accurately reflects the intent of the change.
Docstring Coverage ⚠️ Warning Docstring coverage is 20.29% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch clone-introduce-operand-offset

Warning

Review ran into problems

🔥 Problems

Git: Failed to clone repository. Please run the @coderabbitai full review command to re-trigger a full review. If the issue persists, set path_filters to include or exclude specific files.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new Sleigh symbol, operand_offset, which appears to be an alias for inst_start. The changes are extensive, touching many files across both the C++ and Java codebases, including parsers, semantic analysis, and serialization logic. While the overall approach of cloning existing inst_start logic seems sound, there are several critical and high-severity issues related to copy-paste errors, particularly in the Java serialization code, which could lead to incorrect behavior. I've also pointed out several medium-severity issues regarding misleading comments and Javadoc that should be addressed to improve code clarity and maintainability.

Comment on lines +89 to +93
public void saveXml(PrintStream s) {
s.append("<start_sym");
saveSleighSymbolXmlHeader(s);
s.println("/>");
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

This appears to be a copy-paste error from StartSymbol.java. The XML tag for saveXml should be <offset_sym... instead of <start_sym.... This is a serialization bug and will cause incorrect output.

Suggested change
public void saveXml(PrintStream s) {
s.append("<start_sym");
saveSleighSymbolXmlHeader(s);
s.println("/>");
}
public void saveXml(PrintStream s) {
s.append("<offset_sym");
saveSleighSymbolXmlHeader(s);
s.println("/>");
}

Comment on lines +96 to +100
public void saveXmlHeader(PrintStream s) {
s.append("<start_sym_head");
saveSleighSymbolXmlHeader(s);
s.append("/>\n");
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

Similar to saveXml, this saveXmlHeader method has a copy-paste error from StartSymbol.java. The XML tag should be <offset_sym_head... instead of <start_sym_head.... This will cause incorrect serialization.

Suggested change
public void saveXmlHeader(PrintStream s) {
s.append("<start_sym_head");
saveSleighSymbolXmlHeader(s);
s.append("/>\n");
}
public void saveXmlHeader(PrintStream s) {
s.append("<offset_sym_head");
saveSleighSymbolXmlHeader(s);
s.append("/>\n");
}

Comment on lines +301 to +303
case j_offset:
s.append("offset\"/>");
break;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

There is an inconsistency in the XML serialization between the Java and C++ implementations for j_offset. The Java code uses "offset", while the C++ code (semantics.cc) uses "operand_offset". This can cause compatibility issues. It's best to use the more descriptive "operand_offset" in both places.

Suggested change
case j_offset:
s.append("offset\"/>");
break;
case j_offset:
s.append("operand_offset\"/>");
break;

Comment on lines +358 to +360
else if (typestring.equals("offset")) {
type = const_type.j_offset;
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

To match the C++ implementation and ensure XML compatibility, restoreXml should expect "operand_offset" instead of "offset". This is consistent with the suggested change in saveXml.

Suggested change
else if (typestring.equals("offset")) {
type = const_type.j_offset;
}
else if (typestring.equals("operand_offset")) {
type = const_type.j_offset;
}

Comment on lines +124 to +125
case j_offset:
return walker.getAddr().getOffset(); // Fill in starting address placeholder with real address

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The comment for case j_offset appears to be a copy-paste from case j_start. This is misleading as it doesn't mention that this case handles the operand_offset symbol. Please update the comment for clarity.

  case j_offset:
    return walker.getAddr().getOffset(); // Fill in operand offset placeholder with instruction address

{
hand.space = walker.getCurSpace();
hand.offset_space = (AddrSpace *)0;
hand.offset_offset = walker.getAddr().getOffset(); // Get starting address of instruction

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This comment is a copy from StartSymbol and is a bit generic. To improve clarity, it would be better to make it specific to operand_offset.

  hand.offset_offset = walker.getAddr().getOffset(); // Get instruction offset for operand_offset symbol

Comment on lines +573 to +575
else if (sym instanceof OffsetSymbol) {
// Ignore. We handle inst_start in semantic processing
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The comment here seems to be a copy-paste error from the StartSymbol case. It should refer to operand_offset instead of inst_start to avoid confusion.

Suggested change
else if (sym instanceof OffsetSymbol) {
// Ignore. We handle inst_start in semantic processing
}
else if (sym instanceof OffsetSymbol) {
// Ignore. We handle operand_offset in semantic processing
}

Comment on lines +28 to +32
/**
*
*
* The offset value of the current instructions address
*/

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The Javadoc for this class has some formatting issues (e.g., an extra blank line) and is quite brief. Improving it would enhance code clarity.

/**
 * The offset value of the current instruction's address, for the operand_offset symbol
 */

Comment on lines +30 to +35
/**
*
*
* TripleSymbol with semantic value equal to offset of instruction's
* current address
*/

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The Javadoc for this class could be improved. It has an extra blank line and refers to TripleSymbol, which is a superclass but SpecificSymbol would be more precise. A more descriptive comment would be beneficial.

/**
 * Symbol representing the offset of the current instruction's address,
 * referred to as 'operand_offset' in Sleigh.
 */

else if (typestring=="start") {
type = j_start;
}
else if (typestring=="operand_offset") {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: Accept a legacy alias when restoring XML by recognizing both "operand_offset" and the older "offset" string to improve backward compatibility. [best practice]

Suggested change
else if (typestring=="operand_offset") {
else if (typestring=="operand_offset" || typestring=="offset") {
Why Change? ⭐

The change simply extends the existing string comparison to accept a legacy alias ("offset") in addition to the new "operand_offset" token.

  • Syntax: The expression (typestring == "operand_offset" || typestring == "offset") is valid C++ for std::string comparisons.
  • Context: restoreXml already uses typestring (const string&) to select the ConstTpl::type, and saveXml emits "operand_offset". Accepting "offset" only affects parsing and maps it to the same enum value j_offset.
  • Safety: No new dependencies or calls are introduced. There is no modification of surrounding logic or state, and no ambiguity introduced because both tokens now explicitly map to the same enum case.
  • Assumptions: typestring is std::string as shown in the surrounding code; there is no other branch that relies on "offset" meaning something different in this same restoreXml switch. Given the PR diff, this is a backward-compatibility improvement and does not change runtime behavior for existing "operand_offset" XML.
    Therefore the improved code is syntactically correct, executable, and safe to merge.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (1)
Ghidra/Framework/SoftwareModeling/src/main/java/ghidra/app/plugin/assembler/sleigh/SleighAssemblerBuilder.java (1)

573-575: Update comment to reflect offset handling.

The logic is correct to ignore OffsetSymbol during grammar construction, mirroring the treatment of StartSymbol. However, the comment should be updated for clarity.

Apply this diff to update the comment:

 				else if (sym instanceof OffsetSymbol) {
-					// Ignore. We handle inst_start in semantic processing
+					// Ignore. We handle operand_offset in semantic processing
 				}
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2abfd4e and 099fc98.

📒 Files selected for processing (26)
  • .gitattributes (1 hunks)
  • Ghidra/Features/Decompiler/src/decompile/cpp/pcodeparse.y (4 hunks)
  • Ghidra/Features/Decompiler/src/decompile/cpp/semantics.cc (3 hunks)
  • Ghidra/Features/Decompiler/src/decompile/cpp/semantics.hh (1 hunks)
  • Ghidra/Features/Decompiler/src/decompile/cpp/slgh_compile.cc (1 hunks)
  • Ghidra/Features/Decompiler/src/decompile/cpp/slghparse.y (4 hunks)
  • Ghidra/Features/Decompiler/src/decompile/cpp/slghpatexpress.cc (1 hunks)
  • Ghidra/Features/Decompiler/src/decompile/cpp/slghpatexpress.hh (1 hunks)
  • Ghidra/Features/Decompiler/src/decompile/cpp/slghscan.l (1 hunks)
  • Ghidra/Features/Decompiler/src/decompile/cpp/slghsymbol.cc (2 hunks)
  • Ghidra/Features/Decompiler/src/decompile/cpp/slghsymbol.hh (2 hunks)
  • Ghidra/Framework/SoftwareModeling/src/main/antlr/ghidra/sleigh/grammar/SleighCompiler.g (6 hunks)
  • Ghidra/Framework/SoftwareModeling/src/main/java/ghidra/app/plugin/assembler/sleigh/SleighAssemblerBuilder.java (1 hunks)
  • Ghidra/Framework/SoftwareModeling/src/main/java/ghidra/app/plugin/processors/sleigh/expression/OffsetInstructionValue.java (1 hunks)
  • Ghidra/Framework/SoftwareModeling/src/main/java/ghidra/app/plugin/processors/sleigh/expression/PatternExpression.java (1 hunks)
  • Ghidra/Framework/SoftwareModeling/src/main/java/ghidra/app/plugin/processors/sleigh/symbol/OffsetSymbol.java (1 hunks)
  • Ghidra/Framework/SoftwareModeling/src/main/java/ghidra/app/plugin/processors/sleigh/symbol/SymbolTable.java (1 hunks)
  • Ghidra/Framework/SoftwareModeling/src/main/java/ghidra/pcodeCPort/semantics/ConstTpl.java (3 hunks)
  • Ghidra/Framework/SoftwareModeling/src/main/java/ghidra/pcodeCPort/slgh_compile/SleighCompile.java (1 hunks)
  • Ghidra/Framework/SoftwareModeling/src/main/java/ghidra/pcodeCPort/slgh_compile/Yylval.java (1 hunks)
  • Ghidra/Framework/SoftwareModeling/src/main/java/ghidra/pcodeCPort/slghpatexpress/OffsetInstructionValue.java (1 hunks)
  • Ghidra/Framework/SoftwareModeling/src/main/java/ghidra/pcodeCPort/slghpatexpress/PatternExpression.java (1 hunks)
  • Ghidra/Framework/SoftwareModeling/src/main/java/ghidra/pcodeCPort/slghsymbol/OffsetSymbol.java (1 hunks)
  • Ghidra/Framework/SoftwareModeling/src/main/java/ghidra/pcodeCPort/slghsymbol/SymbolTable.java (1 hunks)
  • Ghidra/Framework/SoftwareModeling/src/main/java/ghidra/pcodeCPort/slghsymbol/symbol_type.java (1 hunks)
  • Ghidra/Framework/SoftwareModeling/src/main/java/ghidra/program/model/lang/PcodeParser.java (1 hunks)
🔇 Additional comments (23)
Ghidra/Framework/SoftwareModeling/src/main/java/ghidra/app/plugin/processors/sleigh/expression/PatternExpression.java (1)

52-53: LGTM!

The addition of offset_exp handling follows the established pattern for instruction-value expressions. The placement between start_exp and end_exp is logical and maintains consistency with similar expression types.

Ghidra/Framework/SoftwareModeling/src/main/java/ghidra/app/plugin/processors/sleigh/symbol/SymbolTable.java (1)

153-154: LGTM!

The addition of offset_sym_head handling is consistent with the existing symbol restoration pattern. The placement between start_sym_head and end_sym_head maintains logical grouping with other instruction-related symbols.

Ghidra/Features/Decompiler/src/decompile/cpp/semantics.hh (1)

37-37: LGTM!

The addition of j_offset=13 to the const_type enum follows the sequential numbering convention and extends the constant template types to support offset-based operands.

Ghidra/Features/Decompiler/src/decompile/cpp/slghpatexpress.cc (1)

481-482: LGTM!

The addition of offset_exp handling in the C++ implementation mirrors the Java version and follows the established pattern for restoring pattern expressions. The placement maintains consistency with the instruction-value expression hierarchy.

Ghidra/Framework/SoftwareModeling/src/main/antlr/ghidra/sleigh/grammar/SleighCompiler.g (6)

345-345: LGTM!

The addition of offset_symbol to the specific_symbol rule enables offset symbols to be recognized as valid specific symbols during parsing, consistent with other instruction-related symbols.


843-843: LGTM!

The addition of offset_symbol to pattern_symbol allows offset symbols to be used in pattern expressions, maintaining consistency with other instruction-value symbols.


877-877: LGTM!

The addition of offset_symbol to pattern_symbol2 ensures offset symbols can be used in all pattern expression contexts, completing the integration with the pattern expression framework.


949-949: LGTM!

The addition of offset_symbol to the globalset context enables offset symbols to participate in context-setting operations, consistent with other specific symbol types.


1177-1177: LGTM!

The addition of offset_symbol to assignment statement handling allows offset symbols to be used as lvalues, maintaining parity with other instruction-related symbols.


1494-1494: LGTM!

The addition of offset_symbol to the function application handling enables offset symbols to be used in subpiece operations, completing the semantic integration.

Ghidra/Framework/SoftwareModeling/src/main/java/ghidra/pcodeCPort/slghpatexpress/OffsetInstructionValue.java (1)

28-67: LGTM!

The OffsetInstructionValue class is well-structured and follows the established pattern for instruction-value classes. The implementation correctly:

  • Extends PatternValue with proper constructor delegation
  • Retrieves the offset via pos.getOffset(-1)
  • Returns appropriate min/max values (0) for a runtime-determined offset
  • Implements XML serialization with the offset_exp tag
Ghidra/Features/Decompiler/src/decompile/cpp/slghpatexpress.hh (1)

157-169: LGTM!

The OperandOffsetValue class is the C++ counterpart to the Java OffsetInstructionValue and follows the established pattern. The implementation correctly:

  • Derives from PatternValue with appropriate virtual method overrides
  • Retrieves offset via walker.getOffset(-1), consistent with the Java implementation
  • Returns appropriate min/max values (0) for a runtime-determined offset
  • Uses the offset_exp XML tag for serialization consistency
Ghidra/Framework/SoftwareModeling/src/main/java/ghidra/pcodeCPort/slghpatexpress/PatternExpression.java (1)

87-89: LGTM! Consistent pattern for offset expression handling.

The addition of the "offset_exp" branch follows the same pattern as other instruction value types (start_exp, end_exp, etc.), creating an OffsetInstructionValue and delegating restoration to the standard flow.

Ghidra/Framework/SoftwareModeling/src/main/java/ghidra/pcodeCPort/slghsymbol/SymbolTable.java (1)

300-302: LGTM! Proper symbol header restoration.

The "offset_sym_head" handling is correctly positioned between start_sym_head and end_sym_head, following the established pattern for symbol restoration.

Ghidra/Framework/SoftwareModeling/src/main/java/ghidra/pcodeCPort/semantics/ConstTpl.java (2)

49-50: LGTM! Enum extension is properly placed.

Adding j_offset at the end of the existing constants is safe and avoids breaking existing ordinal values.


301-303: LGTM! XML serialization/deserialization correctly implemented.

The XML handling for j_offset is consistent with other constant types, using "offset" as the type string in both save and restore operations.

Also applies to: 358-360

Ghidra/Features/Decompiler/src/decompile/cpp/pcodeparse.y (1)

42-42: LGTM! Parser integration follows established patterns.

The OFFSETSYM token is properly declared, added to the union, integrated into grammar rules, and mapped in the lexer—all consistent with how other symbol types (STARTSYM, ENDSYM, etc.) are handled.

Also applies to: 82-82, 230-230, 758-760

Ghidra/Features/Decompiler/src/decompile/cpp/slghsymbol.hh (2)

30-30: LGTM! Enum value logically positioned.

Placing offset_symbol between start_symbol and end_symbol is consistent with the semantic relationship of these instruction-relative symbols.


379-394: LGTM! Class declaration follows established patterns.

The OffsetSymbol class properly extends SpecificSymbol and declares all necessary methods and members consistent with similar classes (StartSymbol, EndSymbol).

Ghidra/Features/Decompiler/src/decompile/cpp/slghparse.y (1)

62-62: LGTM! Consistent parser integration.

The OFFSETSYM integration mirrors the pcodeparse.y changes and correctly includes it in both specificsymbol and anysymbol grammar rules, enabling proper recognition across different parsing contexts.

Also applies to: 127-127, 509-509, 585-585

Ghidra/Framework/SoftwareModeling/src/main/java/ghidra/app/plugin/processors/sleigh/symbol/OffsetSymbol.java (2)

44-46: LGTM! Core methods properly implemented.

The pattern expression, fixed handle retrieval, and printing methods are correctly implemented, providing the expected offset-based semantics.

Also applies to: 52-57, 63-66, 69-71


76-80: LGTM! XML restoration uses correct element name.

The restoreXml method correctly uses "offset_sym" which matches the expected element name in the restoration flow.

Note: This implementation appears to lack saveXml and saveXmlHeader methods. Verify whether these are inherited from the base class or if they need explicit implementation.

Ghidra/Framework/SoftwareModeling/src/main/java/ghidra/pcodeCPort/slghsymbol/OffsetSymbol.java (1)

52-57: LGTM! Core implementation is correct.

The constructor, VarnodeTpl generation, getFixedHandle, print, and restoreXml methods are properly implemented, providing the expected offset-based functionality consistent with other instruction-relative symbols.

Also applies to: 68-73, 75-80, 103-107

Comment on lines +124 to 126
case j_offset:
return walker.getAddr().getOffset(); // Fill in starting address placeholder with real address
case j_next:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

operand_offset currently resolves to the instruction start.

ConstTpl::fix() returns walker.getAddr().getOffset() for j_offset, which is identical to j_start. As a result the new operand_offset symbol never yields the operand’s byte offset; it just repeats the instruction start address. That breaks the feature everywhere this constant is consumed. Use the parser walker’s operand-offset accessor (same one used by OffsetInstructionValue) instead:

-  case j_offset:
-    return walker.getAddr().getOffset(); // Fill in starting address placeholder with real address
+  case j_offset:
+    return walker.getOffset(-1); // Fill in operand-offset placeholder with actual operand position
🤖 Prompt for AI Agents
In Ghidra/Features/Decompiler/src/decompile/cpp/semantics.cc around lines
124-126, ConstTpl::fix() currently returns walker.getAddr().getOffset() for
j_offset which yields the instruction start (same as j_start) instead of the
operand byte offset; replace that return to call the parser walker's
operand-offset accessor (the same accessor used by OffsetInstructionValue) so
operand_offset resolves to the operand’s actual byte offset rather than the
instruction start.

Comment on lines +1226 to +1239

{
hand.space = walker.getCurSpace();
hand.offset_space = (AddrSpace *)0;
hand.offset_offset = walker.getAddr().getOffset(); // Get starting address of instruction
hand.size = hand.space->getAddrSize();
}

void OffsetSymbol::print(ostream &s,ParserWalker &walker) const

{
intb val = (intb) walker.getAddr().getOffset();
s << "0x" << std::hex << val << std::dec;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Operand offset needs operand-relative displacement

OffsetSymbol::getFixedHandle() and print() still return the instruction start (walker.getAddr().getOffset()), so operand_offset collapses to start and never exposes the intended operand displacement. Use the operand-offset value (e.g., via patexp->getValue(walker) or by adding the operand’s relative offset to the instruction base) so the symbol actually reports the current operand’s offset bytes. Right now every use of operand_offset will be wrong.

🤖 Prompt for AI Agents
In Ghidra/Features/Decompiler/src/decompile/cpp/slghsymbol.cc around lines
1226-1239, both OffsetSymbol::getFixedHandle() and OffsetSymbol::print()
currently use walker.getAddr().getOffset() (instruction start) which causes
operand_offset to equal the instruction base; replace that with the
operand-relative displacement so the symbol reflects the operand offset: in
getFixedHandle() compute the offset by adding the operand's relative offset to
the instruction base (e.g. base = walker.getAddr().getOffset(); rel =
patexp->getValue(walker) or whatever mechanism provides the operand-relative
value; hand.offset_offset = base + rel) and in print() use the same computed
operand offset (not walker.getAddr().getOffset()) when formatting the hex value.

Comment on lines +89 to +100
public void saveXml(PrintStream s) {
s.append("<start_sym");
saveSleighSymbolXmlHeader(s);
s.println("/>");
}

@Override
public void saveXmlHeader(PrintStream s) {
s.append("<start_sym_head");
saveSleighSymbolXmlHeader(s);
s.append("/>\n");
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Critical: Wrong XML tags used for OffsetSymbol serialization.

The saveXml and saveXmlHeader methods emit "start_sym" and "start_sym_head" tags, but this is an OffsetSymbol, not a StartSymbol. This copy-paste error will cause serialization/deserialization failures:

  • SymbolTable.java (line 300-302) expects "offset_sym_head" during restoration
  • The Java implementation (sleigh.symbol OffsetSymbol.java, line 77) uses "offset_sym"
  • getType() at line 49 correctly returns offset_symbol, confirming this should not use start_sym tags

Apply this diff to fix the XML tag names:

 	@Override
 	public void saveXml(PrintStream s) {
-		s.append("<start_sym");
+		s.append("<offset_sym");
 		saveSleighSymbolXmlHeader(s);
 		s.println("/>");
 	}

 	@Override
 	public void saveXmlHeader(PrintStream s) {
-		s.append("<start_sym_head");
+		s.append("<offset_sym_head");
 		saveSleighSymbolXmlHeader(s);
 		s.append("/>\n");
 	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
public void saveXml(PrintStream s) {
s.append("<start_sym");
saveSleighSymbolXmlHeader(s);
s.println("/>");
}
@Override
public void saveXmlHeader(PrintStream s) {
s.append("<start_sym_head");
saveSleighSymbolXmlHeader(s);
s.append("/>\n");
}
public void saveXml(PrintStream s) {
s.append("<offset_sym");
saveSleighSymbolXmlHeader(s);
s.println("/>");
}
@Override
public void saveXmlHeader(PrintStream s) {
s.append("<offset_sym_head");
saveSleighSymbolXmlHeader(s);
s.append("/>\n");
}
🤖 Prompt for AI Agents
In
Ghidra/Framework/SoftwareModeling/src/main/java/ghidra/pcodeCPort/slghsymbol/OffsetSymbol.java
around lines 89 to 100, the XML tag names are incorrect (they use "start_sym"
and "start_sym_head"); change them to "offset_sym" and "offset_sym_head"
respectively so serialization matches SymbolTable and the Java sleigh.symbol
implementation — keep the existing self-closing syntax and newline behavior
(saveXml should emit "<offset_sym .../>" and saveXmlHeader should emit
"<offset_sym_head .../>\n") to ensure correct deserialization.

@visz11
Copy link
Collaborator Author

visz11 commented Oct 14, 2025

/refacto-visz

@refacto-visz
Copy link

refacto-visz bot commented Oct 14, 2025

Refacto is reviewing this PR. Please wait for the review comments to be posted.

@codeant-ai
Copy link

codeant-ai bot commented Oct 14, 2025

Pull Request Feedback 🔍

🔒 No security issues identified
⚡ Recommended areas for review

  • Token/union consistency
    The scanner now returns OFFSETSYM. Confirm OFFSETSYM is declared in the parser token enum and that the parser value union (sleighlval / SLEIGHSTYPE) contains the offsetsym field of type OffsetSymbol *. If either is missing, the parser/lexer build will fail or produce UB.

  • Lexer integration
    The grammar now recognizes an OFFSETSYM token and provides an OffsetSymbol *offsetsym semantic value, but the lexer must return OFFSETSYM and set yylval.offsetsym. If the lexer isn't updated (or the mapping to an existing OffsetSymbol instance isn't performed), parsing will fail or the symbol semantic value will be null. Also ensure the lexer uses the same identifier (e.g., operand_offset) and the same lifecycle (who allocates/frees) as other symbol tokens.

  • Token registration
    You added %token <offsetsym> OFFSETSYM. Ensure other parts of the build that depend on token enums or parser tables are regenerated and that there are no token-number collisions. Confirm scanner/parser artifacts are updated in the build system.

  • Token registration
    The new token declaration %token <offsetsym> OFFSETSYM was added. Verify all other places that depend on token ordering or token lists (including header files autogenerated by bison/yacc) are updated and regenerated; mismatched or stale generated headers can cause build errors.

  • Type availability / include
    The code constructs an OperandOffsetValue. Ensure the class is declared and included (or forward-declared) where this translation unit builds. Otherwise compilation will fail. Also verify the class implements the expected saveXml/restoreXml behavior for the "offset_exp" element.

  • Class availability / API contract
    The code instantiates OffsetInstructionValue. Confirm that OffsetInstructionValue exists, is included in this module, and implements restoreXml, getValue, and toString. If that class is missing or lacks required methods, this change will cause compile-time errors or runtime failures.

@codeant-ai
Copy link

codeant-ai bot commented Oct 14, 2025

CodeAnt AI finished reviewing your PR.

@refacto-visz
Copy link

refacto-visz bot commented Oct 14, 2025

Code Review: Symbol Integration Enhancement

👍 Well Done
Comprehensive Symbol Integration

Systematic implementation across all language components ensures consistency and maintains architectural patterns.

Cross-Platform Consistency

Proper integration across Java and C++ components maintains system coherence and interoperability.

📁 Selected files for review (26)
  • .gitattributes
  • Ghidra/Features/Decompiler/src/decompile/cpp/pcodeparse.y
  • Ghidra/Features/Decompiler/src/decompile/cpp/slghsymbol.hh
  • Ghidra/Framework/SoftwareModeling/src/main/antlr/ghidra/sleigh/grammar/SleighCompiler.g
  • Ghidra/Framework/SoftwareModeling/src/main/java/ghidra/app/plugin/assembler/sleigh/SleighAssemblerBuilder.java
  • Ghidra/Framework/SoftwareModeling/src/main/java/ghidra/app/plugin/processors/sleigh/expression/OffsetInstructionValue.java
  • Ghidra/Framework/SoftwareModeling/src/main/java/ghidra/app/plugin/processors/sleigh/expression/PatternExpression.java
  • Ghidra/Framework/SoftwareModeling/src/main/java/ghidra/app/plugin/processors/sleigh/symbol/OffsetSymbol.java
  • Ghidra/Framework/SoftwareModeling/src/main/java/ghidra/app/plugin/processors/sleigh/symbol/SymbolTable.java
  • Ghidra/Framework/SoftwareModeling/src/main/java/ghidra/pcodeCPort/semantics/ConstTpl.java
  • Ghidra/Framework/SoftwareModeling/src/main/java/ghidra/pcodeCPort/slgh_compile/SleighCompile.java
  • Ghidra/Framework/SoftwareModeling/src/main/java/ghidra/pcodeCPort/slgh_compile/Yylval.java
  • Ghidra/Features/Decompiler/src/decompile/cpp/semantics.cc
  • Ghidra/Framework/SoftwareModeling/src/main/java/ghidra/pcodeCPort/slghpatexpress/OffsetInstructionValue.java
  • Ghidra/Framework/SoftwareModeling/src/main/java/ghidra/pcodeCPort/slghpatexpress/PatternExpression.java
  • Ghidra/Framework/SoftwareModeling/src/main/java/ghidra/pcodeCPort/slghsymbol/OffsetSymbol.java
  • Ghidra/Framework/SoftwareModeling/src/main/java/ghidra/pcodeCPort/slghsymbol/SymbolTable.java
  • Ghidra/Framework/SoftwareModeling/src/main/java/ghidra/pcodeCPort/slghsymbol/symbol_type.java
  • Ghidra/Framework/SoftwareModeling/src/main/java/ghidra/program/model/lang/PcodeParser.java
  • Ghidra/Features/Decompiler/src/decompile/cpp/semantics.hh
  • Ghidra/Features/Decompiler/src/decompile/cpp/slgh_compile.cc
  • Ghidra/Features/Decompiler/src/decompile/cpp/slghparse.y
  • Ghidra/Features/Decompiler/src/decompile/cpp/slghpatexpress.cc
  • Ghidra/Features/Decompiler/src/decompile/cpp/slghpatexpress.hh
  • Ghidra/Features/Decompiler/src/decompile/cpp/slghscan.l
  • Ghidra/Features/Decompiler/src/decompile/cpp/slghsymbol.cc
🎯 Custom Instructions
✅ Applied Instructions
Organization Guidelines
  • Avoid commented code in actual codebases
  • Hardcoding of variables

Scope: All files

📝 Additional Comments
Ghidra/Features/Decompiler/src/decompile/cpp/slghsymbol.cc (2)
Missing Null Check

Destructor performs null check before releasing patexp but constructor may leave patexp uninitialized in error conditions. Uninitialized pointer access during destruction could cause segmentation fault. Memory corruption or application crash results from accessing invalid memory addresses.

Standards:

  • ISO-IEC-25010-Reliability-Fault-Tolerance
  • ISO-IEC-25010-Functional-Correctness-Appropriateness
String Concatenation Inefficiency

Stream manipulation with hex formatting creates temporary string objects during address printing. For high-frequency symbol printing operations, this causes allocation overhead. Consider pre-formatting or string buffer optimization for better memory efficiency.

Standards:

  • ISO-IEC-25010-Performance-Efficiency-Time-Behaviour
  • String-Processing-Optimization
Ghidra/Features/Decompiler/src/decompile/cpp/semantics.cc (1)
Magic Number Usage

The j_offset case duplicates the exact logic from j_start case without clear differentiation. This creates potential confusion about semantic differences between offset and start operations. Consider adding documentation or extracting shared logic if behavior is intentionally identical.

Standards:

  • Clean-Code-Comments
  • Clean-Code-Functions
  • Maintainability-Quality-Clarity
Ghidra/Framework/SoftwareModeling/src/main/java/ghidra/app/plugin/processors/sleigh/expression/OffsetInstructionValue.java (1)
Inconsistent Naming Convention

Hash constant uses "[operand_offset]" while toString method returns "[operand_offset]" creating naming inconsistency with similar classes that use different bracket conventions. This reduces code consistency and developer comprehension when working across related classes.

Standards:

  • Clean-Code-Naming
  • Clean-Code-Consistency
  • Maintainability-Quality-Uniformity
Ghidra/Framework/SoftwareModeling/src/main/java/ghidra/pcodeCPort/slghsymbol/OffsetSymbol.java (1)
Pattern Expression Caching

Each OffsetSymbol creates individual OffsetInstructionValue instances despite identical behavior. Multiple offset symbols in complex Sleigh specifications cause redundant pattern expression objects. Consider singleton pattern or shared instance pool for identical expression types.

Standards:

  • ISO-IEC-25010-Performance-Efficiency-Resource-Utilization
  • Object-Pooling-Pattern
Ghidra/pcodeCPort/slghsymbol/OffsetSymbol.java (1)
Template Pattern Opportunity

The getVarnode method follows identical pattern to StartSymbol and EndSymbol with only const_type parameter varying. This suggests opportunity for Template Method pattern to reduce duplication and improve maintainability when adding future symbol types.

Standards:

  • Design-Pattern-Template-Method
  • SOLID-OCP
  • Clean-Code-DRY

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size:L This PR changes 100-499 lines, ignoring generated files

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants