Commit 8adbe93

Add guidance on choosing a reader (#23)
1 parent 3220504 commit 8adbe93

1 file changed: +72 -1 lines changed

src/obspec_utils/obspec.py

Lines changed: 72 additions & 1 deletion
@@ -172,6 +172,29 @@ class BufferedStoreReader:
 
     The reader uses `get_range()` calls to fetch data on-demand, with optional
     read-ahead buffering for efficiency.
+
+    When to Use
+    -----------
+    Use BufferedStoreReader when:
+
+    - **Sequential reading with rare backward seeks**: Best for workloads that
+      mostly read forward through a file with rare backward seeks.
+    - **Simple use cases**: When you need a basic file-like interface without
+      caching or parallel fetching.
+    - **Streaming data**: Processing data as it arrives without loading the full
+      file into memory.
+
+    Consider alternatives when:
+
+    - You need to read the entire file anyway → use [EagerStoreReader][obspec_utils.obspec.EagerStoreReader]
+    - You have many non-contiguous reads → use [ParallelStoreReader][obspec_utils.obspec.ParallelStoreReader]
+    - You'll repeatedly access the same regions → use [EagerStoreReader][obspec_utils.obspec.EagerStoreReader]
+      or [ParallelStoreReader][obspec_utils.obspec.ParallelStoreReader]
+
+    See Also
+    --------
+    [EagerStoreReader][obspec_utils.obspec.EagerStoreReader] : Loads entire file into memory for fast random access.
+    [ParallelStoreReader][obspec_utils.obspec.ParallelStoreReader] : Uses parallel requests with LRU caching for sparse access.
     """
 
     def __init__(
@@ -334,6 +357,30 @@ class EagerStoreReader:
     protocol, the file size will be determined automatically via a HEAD request.
 
     Works with any ReadableStore protocol implementation.
+
+    When to Use
+    -----------
+    Use EagerStoreReader when:
+
+    - **Reading the entire file**: When you know you'll need most or all of the
+      file's contents.
+    - **Repeated random access**: After the initial load, any byte is accessible
+      with no network latency.
+    - **Small to medium files**: Files that fit comfortably in memory.
+    - **Parallel initial fetch**: With `chunk_size` set, the initial load uses
+      parallel requests for faster download.
+
+    Consider alternatives when:
+
+    - You only need a small portion of a large file → use [ParallelStoreReader][obspec_utils.obspec.ParallelStoreReader]
+    - Memory is constrained → use [ParallelStoreReader][obspec_utils.obspec.ParallelStoreReader] (bounded cache)
+      or [BufferedStoreReader][obspec_utils.obspec.BufferedStoreReader]
+    - You're streaming sequentially and won't revisit data → use [BufferedStoreReader][obspec_utils.obspec.BufferedStoreReader]
+
+    See Also
+    --------
+    [BufferedStoreReader][obspec_utils.obspec.BufferedStoreReader] : On-demand reads with read-ahead buffering.
+    [ParallelStoreReader][obspec_utils.obspec.ParallelStoreReader] : Uses parallel requests with LRU caching for sparse access.
     """
 
     def __init__(
@@ -447,9 +494,33 @@ class ParallelStoreReader:
     to avoid redundant fetches.
 
     This is particularly efficient for workloads that access multiple non-contiguous
-    regions of a file, such as reading Zarr/HDF5 datasets.
+    regions of a file.
 
     Works with any ReadableStore protocol implementation.
+
+    When to Use
+    -----------
+    Use ParallelStoreReader when:
+
+    - **Sparse access patterns**: Reading many non-contiguous regions of a file.
+    - **Large files with partial reads**: When you only need portions of a large
+      file and don't want to load it all into memory.
+    - **Memory-constrained environments**: The LRU cache has bounded memory usage
+      (`chunk_size * max_cached_chunks`), regardless of file size.
+    - **Unknown access patterns**: When you don't know upfront which parts of the
+      file you'll need.
+
+    Consider alternatives when:
+
+    - You'll read the entire file anyway → use [EagerStoreReader][obspec_utils.obspec.EagerStoreReader]
+    - Access is purely sequential → use [BufferedStoreReader][obspec_utils.obspec.BufferedStoreReader]
+    - You need repeated access to more data than fits in the cache → use
+      [EagerStoreReader][obspec_utils.obspec.EagerStoreReader] to avoid re-fetching evicted chunks
+
+    See Also
+    --------
+    [BufferedStoreReader][obspec_utils.obspec.BufferedStoreReader] : On-demand reads with read-ahead buffering.
+    [EagerStoreReader][obspec_utils.obspec.EagerStoreReader] : Loads entire file into memory for fast random access.
     """
 
     def __init__(
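
The guidance added across the three docstrings can be condensed into a small decision helper. The sketch below is illustrative only: `Workload` and `suggest_reader` are hypothetical names that do not exist in `obspec_utils`, and the function returns a reader's class name as a string rather than constructing a reader.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of an access pattern (not part of obspec_utils)."""

    reads_most_of_file: bool = False  # most or all bytes will be needed
    fits_in_memory: bool = True       # the whole file fits comfortably in RAM
    mostly_sequential: bool = False   # forward reads with rare backward seeks
    revisits_data: bool = False       # the same regions are read repeatedly


def suggest_reader(w: Workload) -> str:
    """Return the reader class name that the docstring guidance points to."""
    # Whole-file reads are served best by a single eager load,
    # provided the file fits in memory.
    if w.reads_most_of_file and w.fits_in_memory:
        return "EagerStoreReader"
    # Forward streaming without revisiting data needs no cache.
    if w.mostly_sequential and not w.revisits_data:
        return "BufferedStoreReader"
    # Sparse, repeated, or unknown access patterns fall back to the
    # bounded-cache reader.
    return "ParallelStoreReader"
```

For example, `suggest_reader(Workload(mostly_sequential=True))` yields `"BufferedStoreReader"`, while a default `Workload()` (unknown pattern) yields `"ParallelStoreReader"`.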

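
The memory bound quoted in the ParallelStoreReader docstring (`chunk_size * max_cached_chunks`) follows from caching fixed-size chunks under LRU eviction. Below is a minimal sketch of that idea, assuming a hypothetical `ChunkLRU` class and a `fetch(offset, length)` callable standing in for a store's range request. It is not the library's implementation and omits parallel fetching entirely.

```python
from collections import OrderedDict


class ChunkLRU:
    """Hypothetical fixed-size chunk cache with LRU eviction (illustration only)."""

    def __init__(self, fetch, chunk_size, max_cached_chunks):
        self.fetch = fetch                    # callable: (offset, length) -> bytes
        self.chunk_size = chunk_size
        self.max_cached_chunks = max_cached_chunks
        self.chunks = OrderedDict()           # chunk index -> bytes, oldest first
        self.fetch_count = 0                  # number of simulated range requests

    def _chunk(self, index):
        if index in self.chunks:
            self.chunks.move_to_end(index)    # mark as most recently used
            return self.chunks[index]
        data = self.fetch(index * self.chunk_size, self.chunk_size)
        self.fetch_count += 1
        self.chunks[index] = data
        if len(self.chunks) > self.max_cached_chunks:
            self.chunks.popitem(last=False)   # evict the least recently used chunk
        return data

    def read(self, offset, length):
        if length <= 0:
            return b""
        first = offset // self.chunk_size
        last = (offset + length - 1) // self.chunk_size
        data = b"".join(self._chunk(i) for i in range(first, last + 1))
        start = offset - first * self.chunk_size
        return data[start:start + length]

    def cached_bytes(self):
        return sum(len(c) for c in self.chunks.values())


# In-memory stand-in for a remote object.
blob = bytes(range(256)) * 4
cache = ChunkLRU(lambda off, n: blob[off:off + n], chunk_size=64, max_cached_chunks=2)
head = cache.read(0, 10)       # touches chunk 0
middle = cache.read(500, 100)  # touches chunks 7, 8, 9 and evicts older ones
head_again = cache.read(0, 10) # chunk 0 was evicted, so it is fetched again
```

The last read shows the trade-off named under "Consider alternatives when": once the working set exceeds the cache, evicted chunks are re-fetched, which is the case where loading the whole file eagerly can be cheaper.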