Add Parquet export support to OtelTracesSqlEngine #43

nishanthp · 2026-02-08T06:30:17Z

Summary

Adds a to_parquet() method to OtelTracesSqlEngine for exporting trace data in Apache Parquet format, enabling efficient storage and high-performance analytics.

Motivation

Currently, trace data can only be exported to SQL databases or kept in memory as pandas DataFrames. For long-term storage, archival, and sharing trace datasets, a performant columnar file format is needed.

Changes

- Added to_parquet() method with:
  - Configurable compression algorithms (snappy, gzip, brotli, lz4, zstd)
  - Optional partitioning support for efficient filtering
  - Automatic date column extraction for time-based partitioning
- Comprehensive test suite with 10+ test cases covering:
  - Basic export functionality
  - Multiple compression algorithms
  - Partitioning by service and date
  - Data integrity validation
  - Edge cases (empty dataframes, file size efficiency)
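For readers unfamiliar with the options listed above, here is a minimal sketch of how they could map onto pandas' `DataFrame.to_parquet`. The function name `export_traces`, its signature, and the `SUPPORTED_COMPRESSION` constant are assumptions made for illustration; this is not the code in this PR, and writing the file requires the `pyarrow` package.

```python
from typing import Optional, Sequence

import pandas as pd

# Codec names taken from the PR description.
SUPPORTED_COMPRESSION = ("snappy", "gzip", "brotli", "lz4", "zstd")


def export_traces(
    df: pd.DataFrame,
    path: str,
    compression: str = "snappy",
    partition_cols: Optional[Sequence[str]] = None,
) -> None:
    """Hypothetical stand-in for to_parquet(); not the PR's actual code."""
    # Fail early on an unknown codec, before anything is written to disk.
    if compression not in SUPPORTED_COMPRESSION:
        raise ValueError(
            f"Unsupported compression {compression!r}; "
            f"expected one of {SUPPORTED_COMPRESSION}"
        )
    # With partition_cols set, `path` is treated as a root directory and one
    # subdirectory is written per distinct combination of partition values.
    df.to_parquet(
        path,
        engine="pyarrow",
        compression=compression,
        partition_cols=list(partition_cols) if partition_cols else None,
        index=False,
    )
```

A call such as `export_traces(df, "traces/", compression="zstd", partition_cols=["service_name"])` would then write one directory per service, where `service_name` is a placeholder column name.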

AstraBert

The change is OK, but its usefulness is not clear to me: the to_parquet method is not used within the Streamlit application. I imagined that you wanted to use it to download the observability data, but as it stands it's just an additional method with no direct value for the user.

AstraBert · 2026-02-08T08:35:27Z

src/notebookllama/instrumentation.py

+        # Add date column for partitioning if needed
+        if partition_cols and "date" in partition_cols:
+            df["date"] = pd.to_datetime(df["start_time"], unit="us").dt.date


Why is adding the "date" column needed? Can't we just convert the start_time column to datetime?

Also, the partition columns are not validated, so they could include columns that are not in the DataFrame.
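One way to address both points, sketched under assumptions: partitioning on the raw start_time value would produce roughly one partition per distinct timestamp, whereas a derived day-level column yields one partition per day, which is presumably why the diff adds "date". The helper below (the name `prepare_partitions` is hypothetical and not part of the PR) derives that column only when requested and rejects partition columns that do not exist. It assumes, as in the diff above, that start_time holds microseconds since the Unix epoch.

```python
from typing import Optional, Sequence

import pandas as pd


def prepare_partitions(
    df: pd.DataFrame, partition_cols: Optional[Sequence[str]]
) -> pd.DataFrame:
    """Hypothetical helper: derive "date" if requested, then validate columns."""
    if not partition_cols:
        return df
    out = df.copy()  # avoid mutating the caller's DataFrame
    if "date" in partition_cols and "date" not in out.columns:
        if "start_time" not in out.columns:
            raise ValueError('Partitioning by "date" requires a "start_time" column')
        # Assumption: start_time is microseconds since the Unix epoch.
        out["date"] = pd.to_datetime(out["start_time"], unit="us").dt.date
    # Reject anything that is still not a column after the optional derivation.
    missing = [col for col in partition_cols if col not in out.columns]
    if missing:
        raise ValueError(f"Unknown partition columns: {missing}")
    return out
```

Working on a copy keeps the extra "date" column out of the caller's DataFrame, so the export has no side effect on in-memory trace data.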

AstraBert · 2026-02-08T08:37:34Z

tests/test_instrumentation.py

@@ -0,0 +1,32 @@
+import pandas as pd


The PR description says there are 10+ test cases, but here I only see one: is there another test file you did not commit?

Yeah, it will be in the next patch.

nishanthp · 2026-02-09T21:24:13Z

> The change is OK, but its usefulness is not clear to me: the to_parquet method is not used within the Streamlit application. I imagined that you wanted to use it to download the observability data, but as it stands it's just an additional method with no direct value for the user.

I wanted to get your opinion on the approach before adding it to the Streamlit application.

If the overall approach looks good, I will add the rest of the changes in the next patch.

nishanthp added 2 commits February 7, 2026 15:02

Add support parquet

e38a392

Add relavent test cases

ce5371b

AstraBert reviewed Feb 8, 2026
