Fixed bug where nested json inside pandas wouldn't be ingested correctly #568
https://docs.microsoft.com/en-us/azure/data-explorer/ingest-data-overview#ingestion-methods
:param pandas.DataFrame df: input dataframe to ingest.
:param azure.kusto.ingest.IngestionProperties ingestion_properties: Ingestion properties.
:param DataFormat data_format: Format to convert the dataframe to. If not specified, it will try to infer it from the mapping, if not found, it will default to JSON.
Please specify that the valid options are None, CSV, and JSON.
# If we are given CSV mapping, or the mapping format is explicitly set to CSV, we should use CSV
if not data_format:
    if ingestion_properties is not None and (ingestion_properties.ingestion_mapping_type == DataFormat.CSV):
        is_json = False
You should check explicitly for a JSON mapping and throw if the mapping type is not CSV, JSON, or None.
https://learn.microsoft.com/en-us/kusto/management/mappings?view=microsoft-fabric#supported-mapping-types
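A minimal sketch of the check suggested above, not the library's actual code. The `DataFormat` enum here is a local stand-in with only three members, and `resolve_dataframe_format` is a hypothetical helper name. It assumes an explicit `data_format` takes precedence over the mapping's format and that JSON is the fallback.

```python
from enum import Enum
from typing import Optional


class DataFormat(Enum):
    """Local stand-in for the library's DataFormat (assumption: 3 members only)."""
    CSV = "csv"
    JSON = "json"
    AVRO = "avro"


# Formats a dataframe can be serialized to, per the review comment.
ALLOWED_FORMATS = (DataFormat.CSV, DataFormat.JSON)


def resolve_dataframe_format(
    data_format: Optional[DataFormat],
    mapping_format: Optional[DataFormat],
) -> DataFormat:
    """Hypothetical helper: choose the serialization format for a dataframe."""
    if data_format is not None:
        chosen = data_format        # explicit argument wins
    elif mapping_format is not None:
        chosen = mapping_format     # otherwise infer from the mapping
    else:
        chosen = DataFormat.JSON    # otherwise default to JSON
    if chosen not in ALLOWED_FORMATS:
        raise ValueError(
            f"Unsupported format for dataframe ingestion: {chosen.name}; "
            "expected CSV, JSON, or None"
        )
    return chosen
```

Raising on anything other than CSV, JSON, or None surfaces a misconfigured mapping at call time instead of silently falling back to JSON.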
Could it be that this has created a bug, @AsafMah (tested with the latest PyPI release)? My Pandas ingestion from before the patch looked like this:

ingestion_props = IngestionProperties(database=kusto_db, table=table_name, data_format=DataFormat.CSV, ignore_first_record=True, report_level=ReportLevel.FailuresAndSuccesses)
result_pd = client_qi.ingest_from_dataframe(df_csv, ingestion_properties=ingestion_props)
print(result_pd)

Output (before patch):

With the same code after the patch, JSON is chosen even though CSV is explicitly selected within the IngestionProperties. I only noticed this because the ingestion now fails since
IIUC, our implementation of the Pandas ingestion does not generate a header record. @AsafMah, we should set this value to False in all ingestion paths (JSON and CSV).
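To illustrate why skipping the first record matters when no header is written, here is a toy simulation using only the standard library. `simulate_ingest` is a hypothetical stand-in for the service-side behavior, not a real API, and the sample rows are made up.

```python
import csv
import io

# Three data rows, serialized WITHOUT a header line (assumed to mirror
# how the dataframe is written out before upload).
rows = [("a", 1), ("b", 2), ("c", 3)]

buf = io.StringIO()
csv.writer(buf).writerows(rows)
records = buf.getvalue().splitlines()


def simulate_ingest(records, ignore_first_record):
    """Toy stand-in: drop the first record when asked to ignore it."""
    return records[1:] if ignore_first_record else list(records)


kept_all = simulate_ingest(records, ignore_first_record=False)
lost_one = simulate_ingest(records, ignore_first_record=True)
```

With no header present, ignoring the first record discards a real data row (`a,1` in this sketch), which is the reason for forcing the value to False.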
Looking at the source code (don't have access to my code right now), I think so, yes (due to
I remember that I was fiddling around with the various ingestion options and reused the same IngestionProperties out of laziness. I meant to use the proper IngestionProperties for the pandas ingestion in the end, but it seems I forgot, thanks! And regarding the data format, that is as intended, right (I dug into the source code a bit now)? I was just a bit confused, since a month ago the above code would hand over CSV files to
EDIT: Release notes make it kind of clear: "By default, the data will be serialized as json to avoid this issue."
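A small standard-library sketch of why JSON serialization can be the safer default for nested values. It does not use pandas or the Kusto client. The sample record is made up, and the CSV path assumes the nested value is written via its plain string form, which is what `csv.writer` does for non-string cells.

```python
import csv
import io
import json

# Hypothetical record with a nested (dict) column value.
record = {"id": 1, "payload": {"tags": ["x", "y"], "meta": {"ok": True}}}

# CSV path: csv.writer stringifies the dict with str(), producing Python
# repr syntax (single quotes, True), which is not valid JSON.
buf = io.StringIO()
csv.writer(buf).writerow([record["id"], record["payload"]])
csv_cell = next(csv.reader(io.StringIO(buf.getvalue())))[1]

# JSON path: one JSON document per record keeps the nesting intact.
json_line = json.dumps(record)


def is_valid_json(text):
    """Return True if text parses as JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False
```

In this sketch the CSV cell cannot be parsed back as JSON, while the JSON line round-trips to the original nested structure.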
solves #567