Check for python content was too broad #2949

Open
rkoumis wants to merge 3 commits into kevoreilly:master from rkoumis:python-check-is-too-broad

rkoumis commented Mar 18, 2026

  • too many files can contain 'import', for example .eml files
  • reverts part of baa0f5f

gemini-code-assist bot commented

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses an issue where the file type detection mechanism was incorrectly classifying various files as Python scripts due to an overly broad check for the presence of the 'import' keyword. By removing this generic content check, the system will now rely on more specific indicators like file extensions (.py, .pyc) or explicit 'Python script' type identification, leading to more accurate file categorization within the analyzer.

Highlights

  • Python File Detection Refinement: The logic for identifying Python files in the choose_package function was updated to remove the overly broad check for the byte string 'import' within file content.
  • Reversion of Overly Broad Check: This change partially reverts a previous commit (baa0f5f) that introduced the generic 'import' content check, which was causing incorrect file classifications.
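
To illustrate the false positive described above, here is a minimal sketch that contrasts the removed substring check with the extension and type checks. The helper names `old_is_python` and `new_is_python` are hypothetical and only mirror the conditions discussed in this PR; the real logic lives inside `choose_package`. The old condition is reconstructed from the summary, and the file type string used for the email is an assumed example value.

```python
def old_is_python(file_type: str, file_name: str, file_content: bytes) -> bool:
    # Assumed shape of the condition before this PR:
    # any occurrence of b"import" in the content counted as Python.
    return (
        file_name.endswith((".py", ".pyc"))
        or "Python script" in file_type
        or b"import" in file_content
    )


def new_is_python(file_type: str, file_name: str) -> bool:
    # Condition described in the summary: extension or reported file type only.
    return file_name.endswith((".py", ".pyc")) or "Python script" in file_type


# An email body that merely uses the English word "import".
eml_body = b"How's it going?\n\nDid you import that file I sent you last week?\n"

print(old_is_python("RFC 822 mail, ASCII text", "message.eml", eml_body))  # True
print(new_is_python("RFC 822 mail, ASCII text", "message.eml"))  # False
```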

gemini-code-assist bot left a comment

Code Review

This pull request addresses an issue where the check for Python content was too broad, leading to incorrect file type classifications. The change removes the problematic b"import" in file_content condition, which is a valid fix. I am suggesting an improvement to this change by replacing the removed condition with a more specific content-based check for a shebang line. This would allow for the detection of Python scripts that lack a standard file extension, but in a more reliable manner that avoids the original issue of false positives.
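
The shebang check suggested here could be factored into a small helper, sketched below. The name `has_python_shebang` is hypothetical and not part of the repository; the body only restates the condition proposed in the review.

```python
def has_python_shebang(file_content: bytes) -> bool:
    """Return True when the first line is a shebang that mentions python."""
    # A shebang must be the very first bytes of the file.
    if not file_content.startswith(b"#!/"):
        return False
    # Only inspect the first line, so "python" later in the file is ignored.
    first_line = file_content.split(b"\n", 1)[0]
    return b"python" in first_line


print(has_python_shebang(b"#!/usr/bin/env python3\nprint('hi')\n"))  # True
print(has_python_shebang(b"#!/bin/sh\nimport os\n"))  # False
print(has_python_shebang(b"Did you import that file I sent you last week?\n"))  # False
```

One caveat with this sketch: it matches any shebang line that contains the bytes `python` anywhere, including in a directory name, so it is a heuristic rather than a strict interpreter check.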

)

if sys.platform == "win32":
    from ctypes import windll

rkoumis (author) commented

Tried to run some of these windows tests under linux, which isn't really a thing, but this helped (a bit)
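
For context, a platform guard of this kind can be sketched as follows. The `else` branch and the `None` fallback are assumptions added for illustration; the diff above only shows the guarded import.

```python
import sys

# ctypes.windll only exists on Windows, so importing it unconditionally
# raises ImportError on Linux and macOS.
if sys.platform == "win32":
    from ctypes import windll
else:
    # Assumption for this sketch: callers check for None before using windll.
    windll = None
```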

 elif "Macromedia Flash" in file_type or file_name.endswith((".swf", ".fws")):
     return "swf"
-elif file_name.endswith((".py", ".pyc")) or "Python script" in file_type:
+elif file_name.endswith((".py", ".pyc")) or "Python script" in file_type or (file_content.startswith(b'#!/') and b'python' in file_content.split(b'\n', 1)[0]):

rkoumis (author) commented

As suggested by copilot


How's it going?

Did you import that file I sent you last week?

rkoumis (author) commented

I made sure to include the word "import" in this email
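
A regression test along these lines might look like the sketch below. Everything here is hypothetical: `looks_like_python` is a stand-in for the Python branch of `choose_package` after this PR, the addresses and subject are placeholders, and the file type string is an assumed example value. It only uses the standard library `email.message.EmailMessage` to build the sample.

```python
from email.message import EmailMessage


def looks_like_python(file_type: str, file_name: str, file_content: bytes) -> bool:
    # Hypothetical stand-in for the Python branch of choose_package after this PR.
    first_line = file_content.split(b"\n", 1)[0]
    return (
        file_name.endswith((".py", ".pyc"))
        or "Python script" in file_type
        or (file_content.startswith(b"#!/") and b"python" in first_line)
    )


def build_sample_eml() -> bytes:
    # Build a small plain-text email whose body contains the word "import".
    msg = EmailMessage()
    msg["From"] = "alice@example.com"
    msg["To"] = "bob@example.com"
    msg["Subject"] = "Quick question"
    msg.set_content("How's it going?\n\nDid you import that file I sent you last week?\n")
    return msg.as_bytes()


def test_eml_with_import_is_not_python():
    content = build_sample_eml()
    # The word that used to trigger the false positive is present...
    assert b"import" in content
    # ...but the email is no longer treated as a Python script.
    assert not looks_like_python("RFC 822 mail, ASCII text", "sample.eml", content)


test_eml_with_import_is_not_python()
```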

- slight tweaks to make it easier to run the tests under Linux
rkoumis force-pushed the python-check-is-too-broad branch from 4ad0682 to 8e9520f on March 18, 2026 20:45
josh-feather commented

@doomedraven FYI, this was causing problems for users who were submitting .eml files and relying on the package autodetection.
