Skip to content

Conversation

BlakeOrth
Copy link
Contributor

Which issue does this PR close?

This does not fully close, but is an incremental building block component for:

The full context of how this code is likely to progress can be seen in the POC for this effort:

Rationale for this change

This change implements actual instrumentation for individual GET requests through an InstrumentedObjectStore and renders that information for users when profiling has been enabled.

What changes are included in this PR?

  • Adds request instrumentation to GET operations when using the InstrumentedObjectStore in datafusion-cli
  • Adds RequestDetails type to store and format details about instrumented requests
  • Adds tests for new functionality

Are these changes tested?

Yes, aside from the specific functionality in PrintOptions, however all methods executed there are tested via unit tests and functional test output can be seen below:

DataFusion CLI v50.1.0
> \object_store_profiling enabled
ObjectStore Profile mode set to Enabled
> CREATE EXTERNAL TABLE hits
STORED AS PARQUET
LOCATION 'https://datasets.clickhouse.com/hits_compatible/athena_partitioned/hits_1.parquet';
0 row(s) fetched.
Elapsed 0.208 seconds.

Object Store Profiling
Instrumented Object Store: instrument_mode: Enabled, inner: HttpStore
2025-10-10T23:21:49.102730971+00:00 operation=Get duration=0.022583s size=8 range: bytes=174965036-174965043 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-10-10T23:21:49.125493437+00:00 operation=Get duration=0.024233s size=34322 range: bytes=174930714-174965035 path=hits_compatible/athena_partitioned/hits_1.parquet

> \object_store_profiling disabled
ObjectStore Profile mode set to Disabled
> CREATE EXTERNAL TABLE hits2
STORED AS PARQUET
LOCATION 'https://datasets.clickhouse.com/hits_compatible/athena_partitioned/hits_2.parquet';
0 row(s) fetched.
Elapsed 0.274 seconds.

>

Are there any user-facing changes?

Yes? Clearly there is new user-facing output generated, however there are no new user options and/or breaking changes introduced in this PR.

cc @alamb

 - Adds request instrumentation to GET operations when using the
   InstrumentedObjectStore in datafusion-cli
 - Adds RequestDetails type to store and format details about
   instrumented requests
 - Adds tests for new functionality
@BlakeOrth BlakeOrth force-pushed the feature/cli_instrument_request branch from 7ff1403 to 855ab17 Compare October 10, 2025 23:34
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @BlakeOrth -- this is great.

I had one a small test suggestion, but I think we could also do it as a follow on PR

I also took the liberty of writing a cli integration test and pushing it to your branch to show how this was all connected up

}

async fn get_opts(&self, location: &Path, options: GetOptions) -> Result<GetResult> {
if self.instrument_mode.load(Ordering::Relaxed)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might make the code cleaner to move the load ordering into its own method -- like

Something like this maybe

impl InstrumentedObjectStore {
  /// return the current mode of this instrumented store
  fn enabled(&self) -> bool {
    self.instrument_mode.load(Ordering::Relaxed) != InstrumentedObjectStoreMode::Disabled
  }
}


let requests = instrumented.take_requests();
assert_eq!(requests.len(), 1);
assert!(instrumented.requests.lock().is_empty());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be good here to assert on the contents of the request too -- to make sure that all fields got captured

For example:

assert_eq!(requests[0].op, Operation::Get);
// check other fields, etc

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 3c3e0ef


Object Store Profiling
Instrumented Object Store: instrument_mode: Enabled, inner: AmazonS3(data)
<TIMESTAMP> operation=Get duration=[DURATION] size=1006 path=cars.csv
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is pretty cool to see

@alamb
Copy link
Contributor

alamb commented Oct 11, 2025

BTW this is really great -- you can really see the effect of the metadata cache from @nuno-faria now:

cargo run --bin datafusion-cli
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.17s
     Running `target/debug/datafusion-cli`
DataFusion CLI v50.1.0
> \object_store_profiling enabled
ObjectStore Profile mode set to Enabled
> CREATE EXTERNAL TABLE hits
STORED AS PARQUET
LOCATION 'https://datasets.clickhouse.com/hits_compatible/athena_partitioned/hits_1.parquet';
0 row(s) fetched.
Elapsed 0.342 seconds.

Object Store Profiling
Instrumented Object Store: instrument_mode: Enabled, inner: HttpStore
2025-10-11T12:15:30.713572+00:00 operation=Get duration=0.027873s size=8 range: bytes=174965036-174965043 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-10-11T12:15:30.741797+00:00 operation=Get duration=0.038220s size=34322 range: bytes=174930714-174965035 path=hits_compatible/athena_partitioned/hits_1.parquet

> select "Title", COUNT(*) FROM hits GROUP BY "Title" ORDER BY "Title";
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| Title
...
Object Store Profiling
Instrumented Object Store: instrument_mode: Enabled, inner: HttpStore
2025-10-11T12:15:35.467659+00:00 operation=Get duration=0.035913s size=21561876 range: bytes=82420716-103982591 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-10-11T12:15:35.467642+00:00 operation=Get duration=0.088730s size=255995 range: bytes=631794-887788 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-10-11T12:15:35.750944+00:00 operation=Get duration=0.024182s size=539555 range: bytes=11174906-11714460 path=hits_compatible/athena_partitioned/hits_1.parquet

But then disable the cache and you see many more requests

> set datafusion.runtime.metadata_cache_limit = '0M';
0 row(s) fetched.
Elapsed 0.005 seconds.


Object Store Profiling
> select "Title", COUNT(*) FROM hits GROUP BY "Title" ORDER BY "Title";
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| Title
...
Object Store Profiling
Instrumented Object Store: instrument_mode: Enabled, inner: HttpStore
2025-10-11T12:16:23.327461+00:00 operation=Get duration=0.029055s size=8 range: bytes=174965036-174965043 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-10-11T12:16:23.327451+00:00 operation=Get duration=0.034038s size=8 range: bytes=174965036-174965043 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-10-11T12:16:23.327506+00:00 operation=Get duration=0.065108s size=8 range: bytes=174965036-174965043 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-10-11T12:16:23.327588+00:00 operation=Get duration=0.076740s size=8 range: bytes=174965036-174965043 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-10-11T12:16:23.327754+00:00 operation=Get duration=0.100190s size=8 range: bytes=174965036-174965043 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-10-11T12:16:23.328033+00:00 operation=Get duration=0.107667s size=8 range: bytes=174965036-174965043 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-10-11T12:16:23.327736+00:00 operation=Get duration=0.108225s size=8 range: bytes=174965036-174965043 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-10-11T12:16:23.327998+00:00 operation=Get duration=0.110686s size=8 range: bytes=174965036-174965043 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-10-11T12:16:23.328019+00:00 operation=Get duration=0.110924s size=8 range: bytes=174965036-174965043 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-10-11T12:16:23.327818+00:00 operation=Get duration=0.112115s size=8 range: bytes=174965036-174965043 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-10-11T12:16:23.327855+00:00 operation=Get duration=0.115440s size=8 range: bytes=174965036-174965043 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-10-11T12:16:23.404359+00:00 operation=Get duration=0.044957s size=34322 range: bytes=174930714-174965035 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-10-11T12:16:23.327875+00:00 operation=Get duration=0.126352s size=8 range: bytes=174965036-174965043 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-10-11T12:16:23.361519+00:00 operation=Get duration=0.094653s size=34322 range: bytes=174930714-174965035 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-10-11T12:16:23.427970+00:00 operation=Get duration=0.028384s size=34322 range: bytes=174930714-174965035 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-10-11T12:16:23.392641+00:00 operation=Get duration=0.071319s size=34322 range: bytes=174930714-174965035 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-10-11T12:16:23.327762+00:00 operation=Get duration=0.136210s size=8 range: bytes=174965036-174965043 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-10-11T12:16:23.327890+00:00 operation=Get duration=0.136089s size=8 range: bytes=174965036-174965043 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-10-11T12:16:23.327823+00:00 operation=Get duration=0.141164s size=8 range: bytes=174965036-174965043 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-10-11T12:16:23.435977+00:00 operation=Get duration=0.036920s size=34322 range: bytes=174930714-174965035 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-10-11T12:16:23.328052+00:00 operation=Get duration=0.150241s size=8 range: bytes=174965036-174965043 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-10-11T12:16:23.438696+00:00 operation=Get duration=0.040188s size=34322 range: bytes=174930714-174965035 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-10-11T12:16:23.438959+00:00 operation=Get duration=0.043520s size=34322 range: bytes=174930714-174965035 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-10-11T12:16:23.435718+00:00 operation=Get duration=0.047369s size=34322 range: bytes=174930714-174965035 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-10-11T12:16:23.443307+00:00 operation=Get duration=0.045238s size=34322 range: bytes=174930714-174965035 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-10-11T12:16:23.439946+00:00 operation=Get duration=0.055710s size=34322 range: bytes=174930714-174965035 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-10-11T12:16:23.463988+00:00 operation=Get duration=0.043367s size=34322 range: bytes=174930714-174965035 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-10-11T12:16:23.454250+00:00 operation=Get duration=0.059977s size=34322 range: bytes=174930714-174965035 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-10-11T12:16:23.466805+00:00 operation=Get duration=0.049686s size=255995 range: bytes=631794-887788 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-10-11T12:16:23.478305+00:00 operation=Get duration=0.047092s size=34322 range: bytes=174930714-174965035 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-10-11T12:16:23.469006+00:00 operation=Get duration=0.059250s size=34322 range: bytes=174930714-174965035 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-10-11T12:16:23.463996+00:00 operation=Get duration=0.066627s size=34322 range: bytes=174930714-174965035 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-10-11T12:16:23.356563+00:00 operation=Get duration=0.176128s size=34322 range: bytes=174930714-174965035 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-10-11T12:16:23.535239+00:00 operation=Get duration=0.032856s size=21561876 range: bytes=82420716-103982591 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-10-11T12:16:23.599274+00:00 operation=Get duration=0.041027s size=539555 range: bytes=11174906-11714460 path=hits_compatible/athena_partitioned/hits_1.parquet

@alamb alamb added this pull request to the merge queue Oct 11, 2025
Merged via the queue into apache:main with commit a1acf77 Oct 11, 2025
30 checks passed
@BlakeOrth
Copy link
Contributor Author

@alamb thanks for making those test changes to get this across the finish line!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants