
For GSoC 2026: FLOSS: Extract Language Specific Strings (.NET, Swift, Zig, ...) #1222

Draft
utam-1 wants to merge 1 commit into mandiant:master from utam-1:master

Conversation


@utam-1 utam-1 commented Mar 3, 2026

The code presented here specifically implements extraction of strings from .NET binaries.
Issue: #718


google-cla bot commented Mar 3, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances FLOSS's capability to analyze .NET executables by introducing a dedicated mechanism for extracting language-specific strings. It focuses on recovering user strings embedded within the CLI metadata's #US heap, which are often overlooked by standard string scanning techniques. This addition provides a more complete picture of string artifacts present in .NET binaries, improving the depth of analysis.

Highlights

  • .NET String Extraction: Implemented a new module to extract user strings from .NET PE binaries by parsing the CLI metadata #US heap, enabling recovery of strings not found by generic scanners.
  • FLOSS Integration: Integrated the new .NET string extraction logic into the main FLOSS analysis workflow, allowing FLOSS to process .NET executables more comprehensively.
  • Global Prevalence Database Utility: Added an example script to generate a global prevalence database from extracted string data, with options to filter by .NET or native string types.
  • Comprehensive Testing: Included new unit tests to validate the functionality and accuracy of the .NET string extractor, ensuring robust operation.


Changelog
  • example_script/generate_gp_db.py
    • Added a script to generate a global prevalence database from extracted string data, supporting filtering by .NET or native types.
  • floss/language/dotnet/extract.py
    • Added a new module for extracting user strings from .NET PE binaries by parsing the CLI metadata #US heap.
    • Implemented functions for reading compressed unsigned integers, locating the CLI metadata root, and parsing the #US heap structure.
    • Provided a public API extract_dotnet_strings to retrieve StaticString instances from a given .NET PE file.
  • floss/main.py
    • Updated to import and integrate the new .NET string extraction logic into the main analysis flow.
    • Modified the .NET language handling to call extract_dotnet_strings when static string analysis is enabled.
    • Disabled deobfuscation algorithms (stack, tight, decoded strings) for .NET binaries, noting that they are not meaningful for pure managed assemblies.
  • tests/test_language_extract_dotnet.py
    • Added new unit tests to validate the functionality of the .NET string extractor.
    • Verified the extraction of known strings, the location of the metadata root and #US heap, UTF-16LE encoding, minimum length filtering, and overall string count using a sample .NET binary.
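
The changelog above mentions reading compressed unsigned integers and parsing the #US heap. As a rough illustration of what that involves (a self-contained sketch following the ECMA-335 encoding, with hypothetical function names — not the PR's actual implementation):

```python
def read_compressed_uint(buf: bytes, offset: int) -> tuple[int, int]:
    """Decode an ECMA-335 compressed unsigned integer.

    Returns (value, number of bytes consumed): 1-byte form 0xxxxxxx,
    2-byte form 10xxxxxx xxxxxxxx, 4-byte form 110xxxxx + 3 bytes.
    """
    b0 = buf[offset]
    if b0 & 0x80 == 0:
        return b0, 1
    if b0 & 0xC0 == 0x80:
        return ((b0 & 0x3F) << 8) | buf[offset + 1], 2
    return (
        ((b0 & 0x1F) << 24)
        | (buf[offset + 1] << 16)
        | (buf[offset + 2] << 8)
        | buf[offset + 3]
    ), 4


def iter_us_heap_strings(heap: bytes, min_length: int = 4):
    """Yield UTF-16LE strings from a raw #US heap blob.

    Each entry is a compressed length prefix followed by UTF-16LE data
    plus one trailing flag byte; offset 0 holds a single null entry.
    """
    offset = 1  # skip the leading null entry
    while offset < len(heap):
        length, consumed = read_compressed_uint(heap, offset)
        data_start = offset + consumed
        if length == 0:
            offset = data_start
            continue
        raw = heap[data_start : data_start + length - 1]  # drop the flag byte
        try:
            s = raw.decode("utf-16-le")
            if len(s) >= min_length:
                yield s
        except UnicodeDecodeError:
            pass
        offset = data_start + length
```

For example, a heap consisting of the null entry, one "hello" entry (10 bytes of UTF-16LE plus the flag byte, so a length prefix of 0x0B), and its flag byte yields exactly `"hello"`.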
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP 8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a significant new feature for extracting language-specific strings from .NET binaries by parsing the #US heap. However, several security vulnerabilities related to improper handling of user-supplied file paths and file contents have been identified, which could lead to information disclosure, arbitrary file write, and denial of service. Additionally, a potential bug in the string deduplication logic and some minor style issues require attention.

Comment on lines +46 to +47
with open(file_path, "r", encoding="utf-8") as f:
    d = json.load(f)

security-high

The generate_gp_db function processes a user-supplied path argument, iterating through all files within it and attempting to parse them as JSON. This design allows an attacker to specify sensitive directories or non-JSON files, leading to information disclosure of arbitrary file contents or denial of service through malformed input parsing. This is a critical security flaw as it allows reading of arbitrary files on the system.
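
One way to mitigate this (a defensive sketch, not the PR's code; the helper name is hypothetical) is to restrict traversal to `*.json` files and handle parse failures explicitly instead of crashing on arbitrary input:

```python
import json
import pathlib


def iter_json_files(root: str):
    """Yield parsed JSON documents from a directory tree.

    Defensive variant: only *.json files under the given root are
    considered, and malformed files are skipped with a message rather
    than aborting the whole run.
    """
    root_path = pathlib.Path(root).resolve()
    for file_path in sorted(root_path.rglob("*.json")):
        try:
            with open(file_path, "r", encoding="utf-8") as f:
                yield json.load(f)
        except (json.JSONDecodeError, UnicodeDecodeError, OSError) as e:
            print(f"skipping {file_path}: {e}")
```

This does not by itself prevent a user from pointing the script at a sensitive directory, but it narrows what gets read and keeps malformed input from taking down the process.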

Comment on lines +107 to +112
    with gzip.open(args.outfile, "w") as f:
        for k, v in sorted(gp.metadata_by_string.items(), key=lambda x: x[1][0].global_count, reverse=True):
            for e in v:
                f.write(msgspec.json.encode(e) + b"\n")
else:
    with open(args.outfile, "w", encoding="utf-8") as f:

security-high

The script takes a user-supplied outfile argument to determine where the generated database will be written. This allows an attacker to specify arbitrary file paths, including sensitive system files, leading to arbitrary file write. This could result in denial of service by corrupting critical files or even privilege escalation if specific configuration files are overwritten.
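
A possible mitigation (a sketch under the assumption that outputs should stay inside one designated directory; the function name is hypothetical) is to resolve the output path and refuse anything that escapes an allowed base directory:

```python
import pathlib


def safe_output_path(outfile: str, allowed_dir: str) -> pathlib.Path:
    """Resolve outfile and refuse paths escaping allowed_dir.

    Both paths are fully resolved (symlinks and '..' components
    collapsed) before comparison, so '../../etc/passwd' tricks fail.
    """
    out = pathlib.Path(outfile).resolve()
    base = pathlib.Path(allowed_dir).resolve()
    if base not in out.parents:
        raise ValueError(f"refusing to write outside {base}: {out}")
    return out
```

Whether such a restriction fits a CLI tool (where users legitimately choose output paths) is a design decision for the maintainers.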

Comment on lines +292 to +293
p = pathlib.Path(sample)
buf = p.read_bytes()

security-high

The extract_dotnet_strings function directly reads the entire content of a user-supplied sample file into memory. This poses a significant risk of path traversal, allowing an attacker to read sensitive files from the system. Additionally, if a very large file is provided, it could lead to memory exhaustion and a denial of service. While the floss/main.py caller has a size check, this function itself lacks such a safeguard, making it vulnerable if called directly.
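
The size-check concern could be addressed with a guard like the following (a sketch; the 64 MiB cap and function name are assumptions, not values taken from FLOSS):

```python
import pathlib


def read_sample_bytes(sample: str, max_size: int = 64 * 1024 * 1024) -> bytes:
    """Read a sample file, refusing oversized inputs.

    Checking st_size before read_bytes() avoids loading a huge file
    into memory only to reject it afterwards.
    """
    p = pathlib.Path(sample)
    size = p.stat().st_size
    if size > max_size:
        raise ValueError(f"sample too large: {size} bytes (limit {max_size})")
    return p.read_bytes()
```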

Comment on lines +332 to +335
key = (s, file_offset)
if key in seen:
    continue
seen.add(key)

high

This deduplication logic is a no-op. The file_offset is unique for each string yielded by iter_dotnet_user_strings, so the key (s, file_offset) will always be unique, and the if key in seen: condition will never be true.

If the intention is to report every occurrence of a string, this block (and the seen set initialization on line 316) should be removed for clarity. If the intention is to deduplicate by string value, the logic should be changed to use only s as the key.
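
The deduplicate-by-value variant would look like this (a sketch with a hypothetical helper name, keying the seen-set on the string alone since each offset is already unique):

```python
def dedupe_by_value(strings):
    """Yield (string, file_offset) pairs, keeping only the first
    occurrence of each distinct string value."""
    seen = set()
    for s, file_offset in strings:
        if s in seen:
            continue
        seen.add(s)
        yield s, file_offset
```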

buf = p.read_bytes()

try:
    pe = pefile.PE(data=buf, fast_load=True)

security-medium

The extract_dotnet_strings function passes the raw byte content of a user-supplied file to the pefile.PE constructor. Parsing complex, untrusted file formats like PE files can expose the application to vulnerabilities within the parsing library. A specially crafted, malformed PE file could cause pefile to consume excessive CPU or memory, leading to resource exhaustion and a denial of service for the script. Although PEFormatError is caught, other internal issues or resource exhaustion might occur.
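
One defensive pattern (a generic sketch, not FLOSS code; the wrapper name and size cap are assumptions) is to gate the parser call behind a size limit and treat any parser failure as "unparseable" rather than letting it propagate:

```python
def parse_untrusted(buf: bytes, parser, max_size: int = 64 * 1024 * 1024):
    """Defensively invoke a parser for an untrusted format.

    Caps input size and converts any parser exception into None so a
    crafted input cannot crash the extraction pipeline.
    """
    if len(buf) > max_size:
        return None
    try:
        return parser(buf)
    except Exception:
        # pefile raises PEFormatError for invalid PEs, but a crafted
        # file could trigger other errors; treat any failure as a miss.
        return None
```

Usage would be along the lines of `pe = parse_untrusted(buf, lambda b: pefile.PE(data=b, fast_load=True))`; note that a size cap mitigates but does not eliminate CPU-exhaustion risk inside the parser.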

try:
    s = raw.decode("utf-16-le")
except UnicodeDecodeError:
    pass

medium

Silently ignoring UnicodeDecodeError can hide issues with the string extraction logic or malformed input files. It is better to log these errors at a debug level to aid in future debugging.

Suggested change
- pass
+ logger.debug("unicode decode error in string entry at offset 0x%x", entry_file_offset)
+ pass


def test_metadata_root_found():
    """The BSJB metadata root must be locatable in the test binary."""
    import pefile

medium

According to PEP 8, imports should be at the top of the file, not inside functions. This import pefile statement (and the one on line 105) should be moved to the top of the file with the other imports.

