Conversation

@AsafMah AsafMah commented Feb 18, 2025

solves #567

github-actions bot commented Feb 18, 2025

Test Results

    6 files  ±0      6 suites  ±0   25m 38s ⏱️ -34s
  314 tests ±0    279 ✅ ±0   35 💤 ±0  0 ❌ ±0 
1 884 runs  ±0  1 674 ✅ ±0  210 💤 ±0  0 ❌ ±0 

Results for commit 18d85ce. ± Comparison against base commit c2e31d3.

♻️ This comment has been updated with latest results.

yogilad previously approved these changes Feb 25, 2025
https://docs.microsoft.com/en-us/azure/data-explorer/ingest-data-overview#ingestion-methods
:param pandas.DataFrame df: input dataframe to ingest.
:param azure.kusto.ingest.IngestionProperties ingestion_properties: Ingestion properties.
:param DataFormat data_format: Format to convert the dataframe to. If not specified, it will try to infer it from the mapping, if not found, it will default to JSON.
Please specify that the valid options are None, CSV, and JSON.

# If we are given CSV mapping, or the mapping format is explicitly set to CSV, we should use CSV
if not data_format:
    if ingestion_properties is not None and (ingestion_properties.ingestion_mapping_type == DataFormat.CSV):
        is_json = False

You should check explicitly for JSON mapping and throw if the mapping type is not CSV, JSON, or None.
https://learn.microsoft.com/en-us/kusto/management/mappings?view=microsoft-fabric#supported-mapping-types
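A sketch of what that validation might look like. The enum and helper below are standalone stand-ins, not the SDK's actual code, so the idea is runnable without azure-kusto-ingest installed:

```python
from enum import Enum


# Stand-in for the SDK's DataFormat enum; AVRO here represents any
# mapping kind the dataframe path cannot serialize to.
class DataFormat(Enum):
    CSV = "csv"
    JSON = "json"
    AVRO = "avro"


def infer_dataframe_format(ingestion_properties, data_format=None):
    """Pick the serialization format for dataframe ingestion.

    An explicit data_format wins; otherwise fall back to the mapping kind,
    raising for mapping kinds that are neither CSV, JSON, nor absent.
    """
    if data_format is not None:
        return data_format
    mapping_kind = getattr(ingestion_properties, "ingestion_mapping_type", None)
    if mapping_kind is None:
        return DataFormat.JSON  # default when nothing is specified
    if mapping_kind in (DataFormat.CSV, DataFormat.JSON):
        return mapping_kind
    raise ValueError(f"Mapping kind {mapping_kind} is not supported for dataframe ingestion")
```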

@ViaFerrata

Could it be that this has created a bug, @AsafMah (tested with the latest PyPI release)?
I think the data format is no longer inferred correctly from the IngestionProperties.

My Pandas ingestion from before the patch looked like this:

ingestion_props = IngestionProperties(database=kusto_db, table=table_name, data_format=DataFormat.CSV, ignore_first_record=True, report_level=ReportLevel.FailuresAndSuccesses)
result_pd = client_qi.ingest_from_dataframe(df_csv, ingestion_properties=ingestion_props)
print(result_pd)

Output (before patch):

IngestionResult(status=IngestionStatus.QUEUED, database=log, table=ssh_log_data, source_id=xxx, obfuscated_blob_uri=https://xxx/yyy.csv.gz)

With the same code after the patch, JSON is chosen even though CSV is explicitly selected within the IngestionProperties.
Output after patch:

IngestionResult(status=IngestionStatus.QUEUED, database=log, table=ssh_log_data, source_id=xxx, obfuscated_blob_uri=https://xxxx/yyy.json.gz)

Only noticed this, because the ingestion now fails since json and ignore_first_record=True is not compatible.
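A minimal standalone sketch of the behavior being described (the helper is hypothetical, not the SDK's code): if the inference no longer consults `data_format` on the IngestionProperties, JSON wins unless the format is passed explicitly to the dataframe ingestion call:

```python
from enum import Enum


class DataFormat(Enum):
    CSV = "csv"
    JSON = "json"


# Hypothetical model of the post-patch behavior: data_format on the
# IngestionProperties is not consulted, so without an explicit override
# the dataframe is serialized as JSON and the blob ends in .json.gz.
def blob_extension(properties_data_format, explicit_data_format=None):
    chosen = explicit_data_format or DataFormat.JSON
    return ".csv.gz" if chosen is DataFormat.CSV else ".json.gz"


print(blob_extension(DataFormat.CSV))                  # JSON despite CSV on the properties
print(blob_extension(DataFormat.CSV, DataFormat.CSV))  # explicit override restores CSV
```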

yogilad commented Mar 16, 2025

@ViaFerrata ,

IIUC, our impl. of the Pandas ingestion does not generate a header record.
It sounds like you were losing the first record of each ingestion.
Can you confirm?

@AsafMah, we should set this value (ignore_first_record) to False in all ingestion paths (JSON and CSV).

ViaFerrata commented Mar 16, 2025

@ViaFerrata ,

IIUC, our impl. of the Pandas ingestion does not generate a header record. It sounds like you were losing the first record of each ingestion. Can you confirm?

Looking at the source code (don't have access to my code right now), I think so, yes (due to header=False):

df.to_csv(temp_file_path, index=False, encoding="utf-8", header=False)
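For context, a small self-contained demo of what `header=False` does (the sample dataframe is made up): no header row is written, so an ingestion with `ignore_first_record=True` would drop a real data row:

```python
import io

import pandas as pd

df = pd.DataFrame({"name": ["a", "b"], "value": [1, 2]})
buf = io.StringIO()
# Mirrors the SDK call above: index=False and header=False mean the first
# line of the CSV is already data, not column names.
df.to_csv(buf, index=False, header=False)
print(buf.getvalue())  # two data rows, no "name,value" header line
```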

I remember that I was fiddling around with the various ingestion options and reused the same IngestionProperties out of laziness. I wanted to use the proper IngestionProperties for the pandas ingestion in the end, but it seems I forgot, thanks!


And regarding the data format, that is as intended, right? (I dug a bit into the source code now.)
-> data_format in IngestionProperties is not related to the JSON/CSV choice handed over to ingest_from_file; only the ingestion_mapping_kind plays a role.

I was just a bit confused since a month ago the above code would hand CSV files over to ingest_from_file instead of JSON (according to the blob_uri).

EDIT: Release notes make it kind of clear: "By default, the data will be serialized as json to avoid this issue."
