[Fixes #13936] Support for XLSX File Uploads in GeoNode #13937

Open
Gpetrak wants to merge 18 commits into master from ISSUE_13936
Conversation

@Gpetrak
Member

@Gpetrak Gpetrak commented Feb 5, 2026

This PR was created according to this issue: #13936

Checklist

Reviewing is a process done by project maintainers, mostly on a volunteer basis. We try to keep the overhead as small as possible and appreciate if you help us to do so by completing the following items. Feel free to ask in a comment if you have troubles with any of them.

For all pull requests:

  • Confirm you have read the contribution guidelines
  • You have sent a Contribution Licence Agreement (CLA) as necessary (not required for small changes, e.g., fixing typos in the documentation)
  • Make sure the first PR targets the master branch, eventual backports will be managed later. This can be ignored if the PR is fixing an issue that only happens in a specific branch, but not in newer ones.

The following are required only for core and extension modules (they are welcomed, but not required, for contrib modules):

  • There is a ticket in https://github.com/GeoNode/geonode/issues describing the issue/improvement/feature (a notable exemption is, changes not visible to end-users)
  • The issue connected to the PR must have Labels and Milestone assigned
  • PR for bug fixes and small new features are presented as a single commit
  • PR title must be in the form "[Fixes #<issue_number>] Title of the PR"
  • New unit tests have been added covering the changes, unless there is an explanation on why the tests are not necessary/implemented

Submitting the PR does not require you to check all items, but by the time it gets merged, they should be either satisfied or inapplicable.

@Gpetrak Gpetrak added this to the 5.1.0 milestone Feb 5, 2026
@Gpetrak Gpetrak added the feature (A new feature to be added to the codebase) and enhancement labels Feb 5, 2026
@cla-bot cla-bot bot added the cla-signed CLA Bot: community license agreement signed label Feb 5, 2026
@gemini-code-assist
Contributor

Summary of Changes

Hello @Gpetrak, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances GeoNode's data upload capabilities by introducing native support for XLSX and XLS file formats. It provides a robust and secure mechanism for ingesting tabular geographic data from Excel spreadsheets, ensuring data integrity through strict validation of headers and coordinates, and integrating seamlessly with the existing upload orchestration. The feature is also made configurable via an environment variable, offering flexibility for deployment.

Highlights

  • XLSX/XLS File Upload Support: Introduced a new XLSXFileHandler to enable the upload of both .xlsx (OpenXML) and .xls (Binary/Legacy) Excel files, expanding GeoNode's data ingestion capabilities.
  • Internal Conversion to CSV: The handler internally converts Excel files to CSV format using the python-calamine library, leveraging existing CSV processing logic while ensuring robust parsing of Excel data.
  • Strict Data Validation: Implemented comprehensive validation checks including sheet validation (only the first sheet is processed), header validation (requiring unique, non-empty Latitude and Longitude columns), and high-speed coordinate data sense-checking to prevent invalid or malformed geographic data.
  • Configurable Feature Toggle: Added an XLSX_UPLOAD_ENABLED environment variable across all .env files, allowing administrators to easily enable or disable the XLSX/XLS upload functionality.
  • Enhanced Security and Testing: Included a dedicated security test to protect against XML Entity Expansion (Billion Laughs) attacks, alongside extensive unit tests covering various scenarios for file handling, validation, and error conditions.
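
The header and coordinate checks described above could look roughly like the sketch below. This is not the code from the PR: the function names, the case-insensitive header matching, and the ±90/±180 limits are assumptions made for illustration only.

```python
import math

# Assumed column names; the PR summary only says Latitude and Longitude are required.
REQUIRED_COLUMNS = ("latitude", "longitude")


def validate_headers(headers):
    """Return lower-cased headers, or raise ValueError if they are unusable."""
    normalized = ["" if h is None else str(h).strip().lower() for h in headers]
    if any(not h for h in normalized):
        raise ValueError("Empty header cell found")
    if len(set(normalized)) != len(normalized):
        raise ValueError("Duplicate header names found")
    for name in REQUIRED_COLUMNS:
        if name not in normalized:
            raise ValueError(f"Missing required column: {name}")
    return normalized


def is_valid_coordinate(value, limit):
    """Check that a cell is numeric, finite, and within +/- limit."""
    try:
        number = float(value)
    except (TypeError, ValueError):
        return False
    return math.isfinite(number) and abs(number) <= limit


def check_row(row, lat_idx, lon_idx):
    """Sense-check one data row (hypothetical helper)."""
    return is_valid_coordinate(row[lat_idx], 90.0) and is_valid_coordinate(row[lon_idx], 180.0)
```

Rejecting NaN and infinity explicitly matters because `float()` accepts strings such as `"nan"` and `"inf"` without raising.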

Changelog
  • .env.sample
    • Added XLSX_UPLOAD_ENABLED=True to enable or disable XLSX/XLS upload functionality.
  • .env_dev
    • Added XLSX_UPLOAD_ENABLED=True for development environments.
  • .env_local
    • Added XLSX_UPLOAD_ENABLED=True for local development environments.
  • .env_test
    • Added XLSX_UPLOAD_ENABLED=True for test environments.
  • geonode/upload/handlers/xlsx/handler.py
    • New file: Implements XLSXFileHandler for processing Excel files.
    • Inherits from CSVFileHandler and converts XLSX/XLS to CSV internally.
    • Includes methods for can_handle, is_valid, create_ogr2ogr_command, create_dynamic_model_fields, and pre_processing.
    • Adds strict validation for sheet presence, unique and non-empty headers, and valid Latitude/Longitude columns.
    • Incorporates _data_sense_check for robust coordinate validation (numeric, finite, magnitude).
  • geonode/upload/handlers/xlsx/tests.py
    • New file: Contains unit tests for the XLSXFileHandler.
    • Tests can_handle for both .xlsx and .xls extensions.
    • Includes tests for successful pre_processing and failure cases (missing lat/lon, wrong data types).
    • Features a security test (test_security_billion_laughs_protection) to prevent XML entity expansion attacks.
  • geonode/upload/settings.py
    • Registered geonode.upload.handlers.xlsx.handler.XLSXFileHandler in RESOURCE_HANDLERS to activate the new handler.
  • geonode/upload/tests/fixture/wrong_data.csv
    • New file: Added a CSV fixture for testing error conditions related to data validation.
  • pyproject.toml
    • Added python-calamine==0.6.1 as a new dependency for parsing Excel files.
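
The changelog does not show how `XLSX_UPLOAD_ENABLED` gates the handler. One plausible arrangement, sketched under that caveat, is to parse the variable as a boolean and append the handler path only when it is set. The helper names `env_flag` and `build_handlers` are hypothetical, as is the set of accepted truthy strings.

```python
import os

XLSX_HANDLER = "geonode.upload.handlers.xlsx.handler.XLSXFileHandler"


def env_flag(name, default=False, environ=None):
    """Read a boolean flag from the environment (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    raw = environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def build_handlers(base_handlers, environ=None):
    """Return the handler list, adding the XLSX handler only when enabled."""
    handlers = list(base_handlers)
    if env_flag("XLSX_UPLOAD_ENABLED", environ=environ):
        handlers.append(XLSX_HANDLER)
    return handlers
```

Passing `environ` explicitly keeps the function easy to test without mutating the real process environment.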
Activity
  • The pull request was created by Gpetrak to address issue #13936, "Support for XLSX File Uploads in GeoNode".
  • Initial code changes include the addition of a new file handler for XLSX/XLS, corresponding unit tests, and configuration updates.

        except Exception as e:
            logger.exception("XLSX Pre-processing failed")
            raise InvalidInputFileException(detail=f"Failed to securely parse Excel: {str(e)}")

Check warning

Code scanning / CodeQL

Information exposure through an exception Medium

Stack trace information flows to this location and may be exposed to an external user.

Copilot Autofix

AI 3 days ago

In general, to fix this class of problem you should avoid returning raw exception messages or stack traces to the client. Instead, log the full exception (stack trace and message) on the server for debugging, and send a generic, non-sensitive error message back to the user. This preserves observability for developers while preventing attackers from learning about internal structure or configuration.

For this specific code, the best fix without changing functionality is to keep the logger.exception("XLSX Pre-processing failed") call (so the complete exception is still captured in the logs), but change the InvalidInputFileException detail message to a constant, generic string that does not interpolate e. The location to change is in geonode/upload/handlers/xlsx/handler.py, within the XLSXFileHandler.pre_processing method, lines 211–213. We only need to modify line 213 to remove str(e) from the error message, e.g. raise InvalidInputFileException(detail="Failed to securely parse Excel file."). No extra imports or new methods are required, since we are only changing a literal string.
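
As a standalone illustration of this pattern (log the full exception server-side, return a constant message to the caller), here is a minimal sketch. `InvalidInputFileException` below is a stand-in class that mirrors only the `detail` keyword; it is not GeoNode's implementation, and `convert_safely` is a hypothetical wrapper.

```python
import logging

logger = logging.getLogger(__name__)


class InvalidInputFileException(Exception):
    """Stand-in for the real exception; only the `detail` keyword is mirrored."""

    def __init__(self, detail):
        super().__init__(detail)
        self.detail = detail


def convert_safely(convert, path):
    """Run `convert(path)`; on failure, log everything but expose nothing."""
    try:
        return convert(path)
    except Exception:
        # The full stack trace and message go to the server log only.
        logger.exception("XLSX Pre-processing failed")
        # `from None` drops the chained cause so the original message is not
        # carried along with the exception handed back to the caller.
        raise InvalidInputFileException(
            detail="Failed to securely parse the uploaded Excel file."
        ) from None
```

The `from None` clause goes slightly beyond the autofix patch; it is an optional extra, not part of the suggested change.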

Suggested changeset 1
geonode/upload/handlers/xlsx/handler.py

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/geonode/upload/handlers/xlsx/handler.py b/geonode/upload/handlers/xlsx/handler.py
--- a/geonode/upload/handlers/xlsx/handler.py
+++ b/geonode/upload/handlers/xlsx/handler.py
@@ -210,7 +210,7 @@
 
         except Exception as e:
             logger.exception("XLSX Pre-processing failed")
-            raise InvalidInputFileException(detail=f"Failed to securely parse Excel: {str(e)}")
+            raise InvalidInputFileException(detail="Failed to securely parse the uploaded Excel file.")
 
         # update the file path in the payload
         _data["files"]["base_file"] = output_file
EOF
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces support for uploading XLSX and XLS files by converting them to CSV during a pre-processing step and then utilizing the existing CSV handler pipeline. While the implementation includes some security considerations, a critical command injection vulnerability was identified in the ogr2ogr command construction and execution flow. This vulnerability could allow an authenticated attacker to achieve remote code execution by uploading a specially crafted XLSX file, and remediation is required to ensure all user-supplied data is properly sanitized before being used in shell commands. Furthermore, a critical issue was found in the is_valid method that incorrectly attempts to validate an XLSX file using a CSV driver, which would block all uploads of this type. There are also several medium-severity recommendations to improve error handling by using more specific exception types instead of generic ones, which will enhance maintainability and provide clearer feedback to users.
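
The review does not show the vulnerable code, so the following is only a generic sketch of one common mitigation: build the `ogr2ogr` invocation as an argument list (suitable for `subprocess.run` without `shell=True`) and validate any user-derived identifier against an allow-list pattern. The function names, the `PostgreSQL` output format, and the naming pattern are assumptions for illustration.

```python
import re
import shlex

# Assumed policy: layer names are plain identifiers (letters, digits, underscore).
_SAFE_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_ogr2ogr_args(source_path, destination, layer_name):
    """Return an argv list for ogr2ogr; never a single shell string."""
    if not _SAFE_NAME.fullmatch(layer_name):
        raise ValueError(f"Unsafe layer name: {layer_name!r}")
    # Each element is passed to the process verbatim, so shell metacharacters
    # in paths or names are never interpreted by a shell.
    return ["ogr2ogr", "-f", "PostgreSQL", destination, source_path, "-nln", layer_name]


def describe(args):
    """Quoted rendering for logs only; do not feed this back into a shell."""
    return shlex.join(args)
```

A caller would pass the list directly, for example `subprocess.run(args, check=True)`, rather than joining it into a string and enabling `shell=True`.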

@Gpetrak Gpetrak linked an issue Feb 5, 2026 that may be closed by this pull request
8 tasks
@Gpetrak Gpetrak marked this pull request as ready for review February 5, 2026 11:20
@codecov
codecov bot commented Feb 5, 2026

Codecov Report

❌ Patch coverage is 77.23214% with 51 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.25%. Comparing base (6a35d7e) to head (bdbd200).
⚠️ Report is 13 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #13937      +/-   ##
==========================================
+ Coverage   74.19%   74.25%   +0.05%     
==========================================
  Files         944      949       +5     
  Lines       56468    56841     +373     
  Branches     7651     7722      +71     
==========================================
+ Hits        41899    42209     +310     
- Misses      12885    12928      +43     
- Partials     1684     1704      +20     
Labels

  • cla-signed (CLA Bot: community license agreement signed)
  • enhancement
  • feature (A new feature to be added to the codebase)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support for XLSX File Uploads in GeoNode

2 participants