-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Adds instrumentation to get requests for datafusion-cli #18016
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- Adds request instrumentation to GET operations when using the InstrumentedObjectStore in datafusion-cli - Adds RequestDetails type to store and format details about instrumented requests - Adds tests for new functionality
7ff1403
to
855ab17
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @BlakeOrth -- this is great.
I had one a small test suggestion, but I think we could also do it as a follow on PR
I also took the liberty of writing a cli integration test and pushing it to your branch to show how this was all connected up
} | ||
|
||
async fn get_opts(&self, location: &Path, options: GetOptions) -> Result<GetResult> { | ||
if self.instrument_mode.load(Ordering::Relaxed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might make the code cleaner to move the load ordering into its own method -- like
Something like this maybe
impl InstrumentedObjectStore {
/// return the current mode of this instrumented store
fn enabled(&self) -> bool {
self.instrument_mode.load(Ordering::Relaxed) != InstrumentedObjectStoreMode::Disabled
}
}
|
||
let requests = instrumented.take_requests(); | ||
assert_eq!(requests.len(), 1); | ||
assert!(instrumented.requests.lock().is_empty()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be good here to assert on the contents of the request too -- to make sure that all fields got captured
For example:
assert_eq!(requests[0].op, Operation::Get);
// check other fields, etc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 3c3e0ef
|
||
Object Store Profiling | ||
Instrumented Object Store: instrument_mode: Enabled, inner: AmazonS3(data) | ||
<TIMESTAMP> operation=Get duration=[DURATION] size=1006 path=cars.csv |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is pretty cool to see
BTW this is really great -- you can really see the effect of the metadata cache from @nuno-faria now: cargo run --bin datafusion-cli
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.17s
Running `target/debug/datafusion-cli`
DataFusion CLI v50.1.0
> \object_store_profiling enabled
ObjectStore Profile mode set to Enabled
> CREATE EXTERNAL TABLE hits
STORED AS PARQUET
LOCATION 'https://datasets.clickhouse.com/hits_compatible/athena_partitioned/hits_1.parquet';
0 row(s) fetched.
Elapsed 0.342 seconds.
Object Store Profiling
Instrumented Object Store: instrument_mode: Enabled, inner: HttpStore
2025-10-11T12:15:30.713572+00:00 operation=Get duration=0.027873s size=8 range: bytes=174965036-174965043 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-10-11T12:15:30.741797+00:00 operation=Get duration=0.038220s size=34322 range: bytes=174930714-174965035 path=hits_compatible/athena_partitioned/hits_1.parquet
> select "Title", COUNT(*) FROM hits GROUP BY "Title" ORDER BY "Title";
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| Title
...
Object Store Profiling
Instrumented Object Store: instrument_mode: Enabled, inner: HttpStore
2025-10-11T12:15:35.467659+00:00 operation=Get duration=0.035913s size=21561876 range: bytes=82420716-103982591 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-10-11T12:15:35.467642+00:00 operation=Get duration=0.088730s size=255995 range: bytes=631794-887788 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-10-11T12:15:35.750944+00:00 operation=Get duration=0.024182s size=539555 range: bytes=11174906-11714460 path=hits_compatible/athena_partitioned/hits_1.parquet
But then disable the cache and you see many more requests > set datafusion.runtime.metadata_cache_limit = '0M';
0 row(s) fetched.
Elapsed 0.005 seconds.
Object Store Profiling
> select "Title", COUNT(*) FROM hits GROUP BY "Title" ORDER BY "Title";
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
| Title
...
Object Store Profiling
Instrumented Object Store: instrument_mode: Enabled, inner: HttpStore
2025-10-11T12:16:23.327461+00:00 operation=Get duration=0.029055s size=8 range: bytes=174965036-174965043 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-10-11T12:16:23.327451+00:00 operation=Get duration=0.034038s size=8 range: bytes=174965036-174965043 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-10-11T12:16:23.327506+00:00 operation=Get duration=0.065108s size=8 range: bytes=174965036-174965043 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-10-11T12:16:23.327588+00:00 operation=Get duration=0.076740s size=8 range: bytes=174965036-174965043 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-10-11T12:16:23.327754+00:00 operation=Get duration=0.100190s size=8 range: bytes=174965036-174965043 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-10-11T12:16:23.328033+00:00 operation=Get duration=0.107667s size=8 range: bytes=174965036-174965043 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-10-11T12:16:23.327736+00:00 operation=Get duration=0.108225s size=8 range: bytes=174965036-174965043 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-10-11T12:16:23.327998+00:00 operation=Get duration=0.110686s size=8 range: bytes=174965036-174965043 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-10-11T12:16:23.328019+00:00 operation=Get duration=0.110924s size=8 range: bytes=174965036-174965043 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-10-11T12:16:23.327818+00:00 operation=Get duration=0.112115s size=8 range: bytes=174965036-174965043 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-10-11T12:16:23.327855+00:00 operation=Get duration=0.115440s size=8 range: bytes=174965036-174965043 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-10-11T12:16:23.404359+00:00 operation=Get duration=0.044957s size=34322 range: bytes=174930714-174965035 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-10-11T12:16:23.327875+00:00 operation=Get duration=0.126352s size=8 range: bytes=174965036-174965043 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-10-11T12:16:23.361519+00:00 operation=Get duration=0.094653s size=34322 range: bytes=174930714-174965035 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-10-11T12:16:23.427970+00:00 operation=Get duration=0.028384s size=34322 range: bytes=174930714-174965035 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-10-11T12:16:23.392641+00:00 operation=Get duration=0.071319s size=34322 range: bytes=174930714-174965035 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-10-11T12:16:23.327762+00:00 operation=Get duration=0.136210s size=8 range: bytes=174965036-174965043 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-10-11T12:16:23.327890+00:00 operation=Get duration=0.136089s size=8 range: bytes=174965036-174965043 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-10-11T12:16:23.327823+00:00 operation=Get duration=0.141164s size=8 range: bytes=174965036-174965043 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-10-11T12:16:23.435977+00:00 operation=Get duration=0.036920s size=34322 range: bytes=174930714-174965035 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-10-11T12:16:23.328052+00:00 operation=Get duration=0.150241s size=8 range: bytes=174965036-174965043 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-10-11T12:16:23.438696+00:00 operation=Get duration=0.040188s size=34322 range: bytes=174930714-174965035 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-10-11T12:16:23.438959+00:00 operation=Get duration=0.043520s size=34322 range: bytes=174930714-174965035 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-10-11T12:16:23.435718+00:00 operation=Get duration=0.047369s size=34322 range: bytes=174930714-174965035 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-10-11T12:16:23.443307+00:00 operation=Get duration=0.045238s size=34322 range: bytes=174930714-174965035 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-10-11T12:16:23.439946+00:00 operation=Get duration=0.055710s size=34322 range: bytes=174930714-174965035 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-10-11T12:16:23.463988+00:00 operation=Get duration=0.043367s size=34322 range: bytes=174930714-174965035 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-10-11T12:16:23.454250+00:00 operation=Get duration=0.059977s size=34322 range: bytes=174930714-174965035 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-10-11T12:16:23.466805+00:00 operation=Get duration=0.049686s size=255995 range: bytes=631794-887788 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-10-11T12:16:23.478305+00:00 operation=Get duration=0.047092s size=34322 range: bytes=174930714-174965035 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-10-11T12:16:23.469006+00:00 operation=Get duration=0.059250s size=34322 range: bytes=174930714-174965035 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-10-11T12:16:23.463996+00:00 operation=Get duration=0.066627s size=34322 range: bytes=174930714-174965035 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-10-11T12:16:23.356563+00:00 operation=Get duration=0.176128s size=34322 range: bytes=174930714-174965035 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-10-11T12:16:23.535239+00:00 operation=Get duration=0.032856s size=21561876 range: bytes=82420716-103982591 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-10-11T12:16:23.599274+00:00 operation=Get duration=0.041027s size=539555 range: bytes=11174906-11714460 path=hits_compatible/athena_partitioned/hits_1.parquet
|
@alamb thanks for making those test changes to get this across the finish line! |
Which issue does this PR close?
This does not fully close, but is an incremental building block component for:
datafusion-cli
] Add a way to see what object store requests are made #17207The full context of how this code is likely to progress can be seen in the POC for this effort:
Rationale for this change
This change implements actual instrumentation for individual
GET
requests through anInstrumentedObjectStore
and renders that information for users when profiling has been enabled.What changes are included in this PR?
Are these changes tested?
Yes, aside from the specific functionality in
PrintOptions
, however all methods executed there are tested via unit tests and functional test output can be seen below:Are there any user-facing changes?
Yes? Clearly there is new user-facing output generated, however there are no new user options and/or breaking changes introduced in this PR.
cc @alamb