@inv-jishnu
Contributor

Description

This PR updates the column creation logic in createColumnFromValue() in the ColumnUtils class to correctly interpret the string "null" as a null value for non-TEXT column types.

Previously, when a CSV file contained null as a value, it was parsed as the string "null". That string passed the initial null check (value != null), so the behavior differed from the case where the same value (null) is passed via the JSON or JSON Lines file types.

This caused CSV-based imports to fail even though JSON and JSON Lines imports handled null values correctly.

The fix ensures consistent behavior across all file types by converting the string "null" to an actual null value for all column types except TEXT when it is provided via a CSV file.
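
The pre-check described above can be sketched in isolation as follows. This is a minimal illustration only, not the actual ColumnUtils code: the class NullPreCheckSketch, the helper normalizeCsvValue, and the reduced DataType enum are hypothetical stand-ins introduced for this example.

```java
public class NullPreCheckSketch {
  // Hypothetical stand-in for the project's DataType enum; only a few members are listed.
  enum DataType {
    TEXT,
    INT,
    DOUBLE,
    BOOLEAN
  }

  // Hypothetical helper: maps the string "null" to an actual null for non-TEXT types.
  // TEXT columns keep "null" as a literal string.
  static String normalizeCsvValue(DataType dataType, String value) {
    if (value != null && dataType != DataType.TEXT && value.equals("null")) {
      return null;
    }
    return value;
  }

  public static void main(String[] args) {
    // Non-TEXT column: the string "null" becomes an actual null.
    System.out.println(normalizeCsvValue(DataType.INT, "null") == null); // prints true
    // TEXT column: the string "null" is kept as-is.
    System.out.println(normalizeCsvValue(DataType.TEXT, "null") == null); // prints false
    // Ordinary values pass through unchanged.
    System.out.println(normalizeCsvValue(DataType.INT, "42")); // prints 42
  }
}
```

Exempting TEXT keeps it possible to store the four-character string "null" in a text column, at the cost of CSV having no way to express a null TEXT value through this token.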

Related issues and/or PRs

NA

Changes made

  • Updated createColumnFromValue() method in ColumnUtils class and added a pre-check:

  • Added Unit Tests in ColumnUtilsTest

Checklist

The following is a best-effort checklist. If any items in this checklist are not applicable to this PR or are dependent on other, unmerged PRs, please still mark the checkboxes after you have read and understood each item.

  • I have commented my code, particularly in hard-to-understand areas.
  • I have updated the documentation to reflect the changes.
  • I have considered whether similar issues could occur in other products, components, or modules if this PR is for bug fixes.
  • Any remaining open issues linked to this PR are documented and up-to-date (Jira, GitHub, etc.).
  • Tests (unit, integration, etc.) have been added for the changes.
  • My changes generate no new warnings.
  • Any dependent changes in other PRs have been merged and published.

Additional notes (optional)

NA

Release notes

Handle null as null value for non-TEXT columns in CSV import

@inv-jishnu inv-jishnu self-assigned this Nov 13, 2025
@gemini-code-assist
Contributor

Summary of Changes

Hello @inv-jishnu, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request standardizes the handling of "null" values across different data import formats. Previously, CSV imports treated the string "null" as a literal for non-TEXT columns, causing inconsistencies and potential failures compared to JSON and JSONLines. The changes modify the createColumnFromValue method to correctly convert the string "null" into a true null value for all non-TEXT data types, ensuring uniform data interpretation and improving the robustness of data loading processes.

Highlights

  • Consistent Null Handling: The pull request introduces a change to correctly interpret the string "null" as an actual null value for non-TEXT column types during CSV imports, aligning its behavior with JSON and JSONLines imports.
  • CSV Import Reliability: This fix resolves issues where CSV-based imports would fail due to the string "null" being incorrectly parsed as a literal string instead of a null value for non-TEXT fields.
  • Enhanced Test Coverage: New unit tests have been added to ColumnUtilsTest to ensure the correct handling of "null" strings for various data types, including specific tests for non-TEXT types being converted to null and TEXT types retaining "null" as a literal string.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request correctly handles string "null" values from CSV imports as null for non-TEXT columns, aligning behavior with other file formats. The implementation is sound, and new unit tests validate the change. My main feedback is to make the "null" string check case-insensitive (e.g., to handle "NULL" and "Null") for better robustness, and to update the corresponding unit tests to cover these cases. This will make the feature more user-friendly.

Comment on lines +86 to +88
if (value != null && !dataType.equals(DataType.TEXT) && value.equals("null")) {
  value = null;
}

medium

The current implementation handles only the lowercase string "null". It would be more robust to handle different cases, such as "NULL" or "Null", as users might provide null values in various formats in CSV files. Using equalsIgnoreCase would make the null handling more flexible and user-friendly.

Suggested change
- if (value != null && !dataType.equals(DataType.TEXT) && value.equals("null")) {
-   value = null;
- }
+ if (value != null && !dataType.equals(DataType.TEXT) && value.equalsIgnoreCase("null")) {
+   value = null;
+ }

Contributor Author

@ypeckstadt @thongdk8 Should I apply this change, given that null and NULL are handled differently in the JSON and JSON Lines file formats?

Comment on lines +232 to +266
  /**
   * Tests that the string "null" (lowercase) is correctly treated as null for numeric, boolean, and
   * date/time types.
   */
  @Test
  void createColumnFromValue_valueIsLowercaseNull_shouldReturnNullColumn()
      throws ColumnParsingException {
    String columnName = "testColumn";
    ColumnInfo columnInfo = ColumnInfo.builder().columnName(columnName).build();

    // Integer type
    Column<?> intColumn = ColumnUtils.createColumnFromValue(DataType.INT, columnInfo, "null");
    assertEquals(IntColumn.ofNull(columnName), intColumn);

    // Double type
    Column<?> doubleColumn = ColumnUtils.createColumnFromValue(DataType.DOUBLE, columnInfo, "null");
    assertEquals(DoubleColumn.ofNull(columnName), doubleColumn);

    // Boolean type
    Column<?> boolColumn = ColumnUtils.createColumnFromValue(DataType.BOOLEAN, columnInfo, "null");
    assertEquals(BooleanColumn.ofNull(columnName), boolColumn);

    // Date type
    Column<?> dateColumn = ColumnUtils.createColumnFromValue(DataType.DATE, columnInfo, "null");
    assertEquals(DateColumn.ofNull(columnName), dateColumn);

    // Time type
    Column<?> timeColumn = ColumnUtils.createColumnFromValue(DataType.TIME, columnInfo, "null");
    assertEquals(TimeColumn.ofNull(columnName), timeColumn);

    // Timestamp type
    Column<?> timestampColumn =
        ColumnUtils.createColumnFromValue(DataType.TIMESTAMP, columnInfo, "null");
    assertEquals(TimestampColumn.ofNull(columnName), timestampColumn);
  }

medium

To accompany the suggested change for case-insensitive null checking in ColumnUtils, this test should also be updated to verify that behavior. This can be done by parameterizing the test to run with different variations of "null" like "NULL" and "Null". This avoids code duplication and makes the test more robust.

You will need to add a MethodSource provider method like the following in the test class:

private static Stream<String> provideNullStrings() {
  return Stream.of("null", "NULL", "Null");
}
  /**
   * Tests that the string "null" (case-insensitive) is correctly treated as null for numeric,
   * boolean, and date/time types.
   */
  @org.junit.jupiter.params.ParameterizedTest
  @org.junit.jupiter.params.provider.MethodSource("provideNullStrings")
  void createColumnFromValue_valueIsNullString_shouldReturnNullColumn(String nullValue)
      throws ColumnParsingException {
    String columnName = "testColumn";
    ColumnInfo columnInfo = ColumnInfo.builder().columnName(columnName).build();

    // Integer type
    Column<?> intColumn = ColumnUtils.createColumnFromValue(DataType.INT, columnInfo, nullValue);
    assertEquals(IntColumn.ofNull(columnName), intColumn);

    // Double type
    Column<?> doubleColumn = ColumnUtils.createColumnFromValue(DataType.DOUBLE, columnInfo, nullValue);
    assertEquals(DoubleColumn.ofNull(columnName), doubleColumn);

    // Boolean type
    Column<?> boolColumn = ColumnUtils.createColumnFromValue(DataType.BOOLEAN, columnInfo, nullValue);
    assertEquals(BooleanColumn.ofNull(columnName), boolColumn);

    // Date type
    Column<?> dateColumn = ColumnUtils.createColumnFromValue(DataType.DATE, columnInfo, nullValue);
    assertEquals(DateColumn.ofNull(columnName), dateColumn);

    // Time type
    Column<?> timeColumn = ColumnUtils.createColumnFromValue(DataType.TIME, columnInfo, nullValue);
    assertEquals(TimeColumn.ofNull(columnName), timeColumn);

    // Timestamp type
    Column<?> timestampColumn =
        ColumnUtils.createColumnFromValue(DataType.TIMESTAMP, columnInfo, nullValue);
    assertEquals(TimestampColumn.ofNull(columnName), timestampColumn);
  }
