
Commit 1107096

Authored by: michalsos, ngabrys, Copilot, pitercl, normandy7

feat: fetch metric buckets (#46)
* feat: fetch_metrics_buckets output format + draft of the rest (#25)
  * feat: fetch_metrics_buckets [WIP]
  * fix: naming
  * fix: use IntervalIndex for buckets
  * fix: review
* feat: directly fetch run_attribute_definitions (#27)
  * wip
  * feat: directly fetch run_attribute_definitions
  * feat: apply fetch_run_attribute_definitions to fetch_metric_buckets
  * fix: revert changes to metrics
  * fix: add tests for fetch_run_attribute_definitions_single_filter
  * Deduplicate attribute values fetching logic (#30)
  * fix: review
  * fix: remove test
* PY-225 Retrieval code communicating to the backend for metric rendering API (#28)
  * Bump neptune-api to the dev version
  * Remove debugging left-overs
  * Code review notes addressed
  * Rename BucketMetric to TimeseriesBucket and populate all fields that backend sends
  * Fix protobuf_v4plus imports
  * Remove DEFAULT_MAX_BUCKETS and x_neptune_client_metadata
  * fix circular import
  * Fix unit tests too
* chore: rename cols/index (#32)
* feat: add logger (#33)
  * Update src/neptune_query/internal/retrieval/retry.py
* fix: sort create_metric_buckets_dataframe output df (#34)
* fix: change column name (#35)
* Bump neptune-api to 0.22.0 (#36)
* PY-235 Fix deserialization of optional fields of ProtoTimeseriesBucketsDTO (#40)
  * Update dataclass types too
* PY-233 Add @experimental decorator to fetch_metric_buckets (#42)
* Add docstring for fetch_metric_buckets() (#37)
* chore: add metric buckets tests (#31)
  * chore: add some bucket tests
  * fix: x range calcs
  * fix: add nan/inf tests
  * add different xs axes tests
  * fix
* feat: merge 1st and 2nd bucket (#44)
  * fix: add several tests
  * fix: limit 1000 and range ends equality
* fix: add proper typing to experimental decorator (#51)
* fix: take start_point in the first bucket of a series (#45)
  * take 1st point in the first series bucket
  * fix: return proper first/last x/y from generate_categorized_rows
* PY-236 Fix output format for all non finite rows (#43)
  * fix: change index->filter in create_metrics_dataframe. Use ind/col instead of 0/1
  * add a test with a bucket series of nans
* fix: experimental status of fetch_metric_buckets in docstrings (#52)
  * Copy the copy from docs
  * Some more changes

Co-authored-by: Michał Sośnicki <michal.sosnicki@neptune.ai>
Co-authored-by: Piotr Gabryjeluk <piotr.gabryjeluk@neptune.ai>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Piotr Łusakowski <piotr.lusakowskI@neptune.ai>
Co-authored-by: Sabine Ståhlberg <sabine.stahlberg@neptune.ai>
1 parent 5a9f757 commit 1107096

28 files changed: +1889 −123 lines

pyproject.toml

Lines changed: 1 addition & 1 deletion

```diff
@@ -12,7 +12,7 @@ pattern = "default-unprefixed"
 python = "^3.10"

 # Base neptune package
-neptune-api = ">=0.20.0,<0.22.0"
+neptune-api = ">=0.22.0,<0.23.0"
 azure-storage-blob = "^12.7.0"
 pandas = ">=1.4.0"
```
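The tightened constraint pins `neptune-api` to the 0.22.x line. As a quick sanity check, the `packaging` library (a common third-party dependency, not part of this repo) can evaluate whether a given version satisfies the new specifier:

```python
from packaging.specifiers import SpecifierSet
from packaging.version import Version

# The new constraint from pyproject.toml: any 0.22.x release qualifies.
spec = SpecifierSet(">=0.22.0,<0.23.0")

for candidate in ["0.21.5", "0.22.0", "0.22.3", "0.23.0"]:
    # 0.22.0 and 0.22.3 are in the set; 0.21.5 and 0.23.0 are not.
    print(candidate, Version(candidate) in spec)
```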

src/neptune_query/__init__.py

Lines changed: 76 additions & 0 deletions

````diff
@@ -20,6 +20,7 @@
     "list_attributes",
     "fetch_experiments_table",
     "fetch_metrics",
+    "fetch_metric_buckets",
     "fetch_series",
     "download_files",
 ]
@@ -45,16 +46,19 @@
     resolve_destination_path,
     resolve_experiments_filter,
     resolve_files,
+    resolve_metrics_y,
     resolve_sort_by,
 )
 from neptune_query.exceptions import NeptuneUserError
 from neptune_query.internal.composition import download_files as _download_files
+from neptune_query.internal.composition import fetch_metric_buckets as _fetch_metric_buckets
 from neptune_query.internal.composition import fetch_metrics as _fetch_metrics
 from neptune_query.internal.composition import fetch_series as _fetch_series
 from neptune_query.internal.composition import fetch_table as _fetch_table
 from neptune_query.internal.composition import list_attributes as _list_attributes
 from neptune_query.internal.composition import list_containers as _list_containers
 from neptune_query.internal.context import set_api_token
+from neptune_query.internal.experimental import experimental
 from neptune_query.internal.query_metadata_context import use_query_metadata
 from neptune_query.internal.retrieval import search as _search

@@ -424,3 +428,75 @@ def download_files(
         destination=destination_path,
         container_type=_search.ContainerType.EXPERIMENT,
     )
+
+
+@experimental
+@use_query_metadata(api_function="fetch_metric_buckets")
+def fetch_metric_buckets(
+    *,
+    project: Optional[str] = None,
+    experiments: Union[str, list[str], filters.Filter],
+    x: Union[Literal["step"]] = "step",
+    y: Union[str, list[str], filters.AttributeFilter],
+    limit: int = 1000,
+    lineage_to_the_root: bool = True,
+    include_point_previews: bool = False,
+) -> _pandas.DataFrame:
+    """Fetches a table of metric values split by X-axis buckets.
+
+    **Caution:** This function is experimental and might be changed or removed in a future minor release.
+    Use with caution in production code.
+
+    One point is returned from each bucket. To control the number of buckets, use the `limit` parameter.
+
+    Both the first and last points of each metric are always included:
+    - For every first bucket of a given series, the first point is returned.
+    - For the remaining buckets, the last point is returned.
+
+    Args:
+        project: Path of the Neptune project, as `WorkspaceName/ProjectName`.
+            If not provided, the NEPTUNE_PROJECT environment variable is used.
+        experiments: Filter specifying which experiments to include.
+            If a string is provided, it's treated as a regex pattern that the names must match.
+            If a list of strings is provided, it's treated as exact experiment names to match.
+            To provide a more complex condition on an arbitrary attribute value, pass a Filter object.
+        x: The X-axis series used for the bucketing. Only "step" is currently supported.
+        y: Filter specifying which metrics to include.
+            If a string is provided, it's treated as a regex pattern that the metric names must match.
+            If a list of strings is provided, it's treated as exact metric names to match.
+            To provide a more complex condition, pass an AttributeFilter object.
+        limit: Number of buckets to use. The default and maximum value is 1000.
+        lineage_to_the_root: If True (default), includes all values from the complete experiment history.
+            If False, only includes values from the most recent experiment in the lineage.
+        include_point_previews: If False (default), the returned results only contain committed
+            points. If True, the results also include preview points and the returned DataFrame will
+            have additional sub-columns with preview status: is_preview and preview_completion.
+
+    Example:
+        From two specific experiments, fetch training losses split into 10 buckets:
+        ```
+        import neptune_query as nq
+
+        nq.fetch_metric_buckets(
+            experiments=["seagull-week1", "seagull-week2"],
+            x="step",
+            y=r"^train/loss",
+            limit=10,  # Only 10 buckets for broad trends
+        )
+        ```
+    """
+    project_identifier = get_default_project_identifier(project)
+    experiments_filter = resolve_experiments_filter(experiments)
+    resolved_y = resolve_metrics_y(y)
+
+    return _fetch_metric_buckets.fetch_metric_buckets(
+        project_identifier=project_identifier,
+        filter_=experiments_filter,
+        x=x,
+        y=resolved_y,
+        limit=limit,
+        lineage_to_the_root=lineage_to_the_root,
+        include_point_previews=include_point_previews,
+        container_type=_search.ContainerType.EXPERIMENT,
+    )
````
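The bucketing semantics in the docstring (one point per bucket; the first point kept in a series' first bucket, the last point in every other bucket) can be sketched with plain pandas. This is an illustrative approximation only, not neptune_query's actual implementation; `bucket_metric` and its equal-width bucket edges are assumptions made for the sketch:

```python
import numpy as np
import pandas as pd


def bucket_metric(steps, values, n_buckets):
    """Split a metric series into equal-width step buckets and keep one
    representative point per bucket."""
    df = pd.DataFrame({"step": steps, "value": values}).sort_values("step")
    # Equal-width bucket edges spanning the full step range.
    edges = np.linspace(df["step"].min(), df["step"].max(), n_buckets + 1)
    df["bucket"] = pd.cut(df["step"], bins=edges, include_lowest=True)
    picked = []
    for i, (_, group) in enumerate(df.groupby("bucket", observed=True)):
        # First point of the series' first bucket, last point of every other bucket.
        picked.append(group.iloc[0] if i == 0 else group.iloc[-1])
    return pd.DataFrame(picked)[["step", "value"]].reset_index(drop=True)
```

With 10 points split into 2 buckets, this keeps the very first and very last point, matching the guarantee that both endpoints of each metric always appear in the output.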

src/neptune_query/_internal.py

Lines changed: 6 additions & 0 deletions

```diff
@@ -80,6 +80,12 @@ def resolve_attributes_filter(
     )


+def resolve_metrics_y(
+    attributes: Optional[Union[str, list[str], filters.AttributeFilter]],
+) -> _filters._BaseAttributeFilter:
+    return resolve_attributes_filter(attributes)
+
+
 def resolve_sort_by(sort_by: Union[str, filters.Attribute]) -> _filters._Attribute:
     if isinstance(sort_by, str):
         return filters.Attribute(sort_by)._to_internal()
```

src/neptune_query/internal/composition/attribute_components.py

Lines changed: 27 additions & 0 deletions

```diff
@@ -37,6 +37,7 @@
     split,
     util,
 )
+from . import attributes


 def fetch_attribute_definitions_split(
@@ -163,3 +164,29 @@ def fetch_attribute_values_split(
             downstream=downstream,
         ),
     )
+
+
+def fetch_attribute_values_by_filter_split(
+    client: AuthenticatedClient,
+    project_identifier: identifiers.ProjectIdentifier,
+    executor: Executor,
+    fetch_attribute_definitions_executor: Executor,
+    sys_ids: list[identifiers.SysId],
+    attribute_filter: filters._BaseAttributeFilter,
+    downstream: Callable[[util.Page[att_vals.AttributeValue]], concurrency.OUT],
+) -> concurrency.OUT:
+    return concurrency.generate_concurrently(
+        items=split.split_sys_ids(sys_ids),
+        executor=executor,
+        downstream=lambda split: concurrency.generate_concurrently(
+            items=attributes.fetch_attribute_values(
+                client=client,
+                project_identifier=project_identifier,
+                run_identifiers=[identifiers.RunIdentifier(project_identifier, s) for s in split],
+                attribute_filter=attribute_filter,
+                executor=fetch_attribute_definitions_executor,
+            ),
+            executor=executor,
+            downstream=downstream,
+        ),
+    )
```
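`fetch_attribute_values_by_filter_split` splits the run IDs into chunks and fans each chunk out on an executor. The shape of that pattern, reduced to a stand-alone sketch with a plain `ThreadPoolExecutor` and hypothetical `split_ids`/`fetch_chunk` helpers standing in for the internal `split` and `concurrency` modules:

```python
from concurrent.futures import ThreadPoolExecutor


def split_ids(ids, chunk_size=2):
    # Stand-in for split.split_sys_ids: yield fixed-size chunks.
    for i in range(0, len(ids), chunk_size):
        yield ids[i:i + chunk_size]


def fetch_chunk(chunk):
    # Hypothetical per-chunk fetch; the real code calls the Neptune backend.
    return [f"value-for-{run_id}" for run_id in chunk]


def fetch_all(ids, max_workers=4):
    # Fan each chunk out to the pool; map() preserves chunk order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(fetch_chunk, split_ids(ids))
        return [value for chunk_result in results for value in chunk_result]
```

The real implementation nests two `generate_concurrently` calls instead of `pool.map`, so pages stream to `downstream` as they arrive rather than being collected into a list.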

src/neptune_query/internal/composition/attributes.py

Lines changed: 53 additions & 1 deletion

```diff
@@ -31,7 +31,9 @@
 )
 from ..composition import concurrency
 from ..retrieval import attribute_definitions as att_defs
+from ..retrieval import attribute_values as att_vals
 from ..retrieval import util
+from ..retrieval.attribute_filter import split_attribute_filters
 from ..retrieval.attribute_types import TYPE_AGGREGATIONS


@@ -126,7 +128,7 @@ def go_fetch_single(
         batch_size=batch_size,
     )

-    filters_ = att_defs.split_attribute_filters(attribute_filter)
+    filters_ = split_attribute_filters(attribute_filter)

     output = concurrency.generate_concurrently(
         items=(filter_ for filter_ in filters_),
@@ -138,3 +140,53 @@ def go_fetch_single(
         ),
     )
     yield from concurrency.gather_results(output)
+
+
+def fetch_attribute_values(
+    client: AuthenticatedClient,
+    project_identifier: identifiers.ProjectIdentifier,
+    run_identifiers: Iterable[identifiers.RunIdentifier],
+    attribute_filter: filters._BaseAttributeFilter,
+    executor: Executor,
+    batch_size: int = env.NEPTUNE_QUERY_ATTRIBUTE_DEFINITIONS_BATCH_SIZE.get(),
+) -> Generator[util.Page[att_vals.AttributeValue], None, None]:
+    pages_filters = _fetch_attribute_values(
+        client, project_identifier, run_identifiers, attribute_filter, batch_size, executor
+    )
+
+    seen_items: set[att_vals.AttributeValue] = set()
+    for page in pages_filters:
+        new_items = [item for item in page.items if item not in seen_items]
+        seen_items.update(new_items)
+        yield util.Page(items=new_items)
+
+
+def _fetch_attribute_values(
+    client: AuthenticatedClient,
+    project_identifier: identifiers.ProjectIdentifier,
+    run_identifiers: Iterable[identifiers.RunIdentifier],
+    attribute_filter: filters._BaseAttributeFilter,
+    batch_size: int,
+    executor: Executor,
+) -> Generator[util.Page[att_vals.AttributeValue], None, None]:
+    def go_fetch_single(filter_: filters._AttributeFilter) -> Generator[util.Page[att_vals.AttributeValue], None, None]:
+        return att_vals.fetch_attribute_values(
+            client=client,
+            project_identifier=project_identifier,
+            run_identifiers=run_identifiers,
+            attribute_definitions=filter_,
+            batch_size=batch_size,
+        )
+
+    filters_ = split_attribute_filters(attribute_filter)
+
+    output = concurrency.generate_concurrently(
+        items=(filter_ for filter_ in filters_),
+        executor=executor,
+        downstream=lambda filter_: concurrency.generate_concurrently(
+            items=go_fetch_single(filter_),
+            executor=executor,
+            downstream=lambda _page: concurrency.return_value(_page),
+        ),
+    )
+    yield from concurrency.gather_results(output)
```
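The `seen_items` set in `fetch_attribute_values` performs page-level deduplication: the same attribute value can be produced by more than one split filter, so each page is re-emitted with only the items not yet seen. A minimal stand-alone sketch of that pattern, assuming hashable items and a simple `Page` stand-in for `util.Page`:

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Page:
    items: tuple


def dedup_pages(pages: Iterable[Page]) -> Iterator[Page]:
    """Re-yield each page with only the items not seen on earlier pages."""
    seen = set()
    for page in pages:
        new_items = [item for item in page.items if item not in seen]
        seen.update(new_items)
        yield Page(items=tuple(new_items))
```

Note that a page whose items were all seen before is still yielded, just empty, which keeps the page stream's shape intact for downstream consumers.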
