-
Notifications
You must be signed in to change notification settings - Fork 231
Add Parquet export support to OtelTracesSqlEngine #43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
AstraBert
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change is ok, but one thing that is not super clear to me is the usefulness: the to_parquet method is not used within the Streamlit application: I imagined that you wanted to use it to download the observability data, but in this way it's just an additional method with no direct value whatsoever for the user
| # Add date column for partitioning if needed | ||
| if partition_cols and "date" in partition_cols: | ||
| df["date"] = pd.to_datetime(df["start_time"], unit="us").dt.date |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is adding the "date" column needed? Can't we just convert the start_time one to datetime?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Plus, there is no validation of the partition columns, meaning that they could include also columns that are not in the dataframe
| @@ -0,0 +1,32 @@ | |||
| import pandas as pd | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PR description says there are 10+ test cases, but here I only see one: is there another test file you did not commit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it will be in the next patch.
I wanted to get your opinion on the approach before I could add it to the Streamlit application. If the overall approach looks good, I will add the rest of the changes in the next patch. |
Summary
Adds
to_parquet()method toOtelTracesSqlEnginefor exporting trace data in Apache Parquet format, enabling efficient storage and high-performance analytics.Motivation
Currently, trace data can only be exported to SQL databases or kept in memory as pandas DataFrames. For long-term storage, archival, and sharing trace datasets, a performant columnar file format is needed.
Changes
to_parquet()method with: