
Commit 9f4e139

Added checkpointing support to Neptune Scale Logger (#2)
* feat: Added checkpointing support, updated docs links
* tests: Updated tests
* Apply suggestions from code review
* feat: Update NeptuneScaleLogger to log model checkpoint paths instead of uploading checkpoints
* docs: Fix formatting of NeptuneScaleLogger API key and project placeholders in documentation
* Update src/lightning/pytorch/loggers/neptune.py

Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>
1 parent 3d01924 commit 9f4e139

6 files changed: +174 −46 lines

docs/source-pytorch/extensions/logging.rst

Lines changed: 1 addition & 0 deletions
@@ -31,6 +31,7 @@ The following are loggers we support:
     CSVLogger
     MLFlowLogger
     NeptuneLogger
+    NeptuneScaleLogger
     TensorBoardLogger
     WandbLogger

docs/source-pytorch/visualize/supported_exp_managers.rst

Lines changed: 6 additions & 6 deletions
@@ -60,9 +60,9 @@ Here's the full documentation for the :class:`~lightning.pytorch.loggers.MLFlowL
 
 ----
 
-Neptune.ai
+Neptune 2.x
 ==========
-To use `Neptune.ai <https://www.neptune.ai/>`_ first install the neptune package:
+To use `Neptune 2.x <https://docs-legacy.neptune.ai/>`_ first install the neptune package:
 
 .. code-block:: bash
 
@@ -101,9 +101,9 @@ Here's the full documentation for the :class:`~lightning.pytorch.loggers.Neptune
 
 ----
 
-Neptune Scale
+Neptune 3.x (Neptune Scale)
 ==========
-To use `Neptune Scale <https://docs-beta.neptune.ai/>`_ first install the neptune-scale package:
+To use `Neptune 3.x <https://docs.neptune.ai/>`_ first install the neptune-scale package:
 
 .. code-block:: bash
 
@@ -119,8 +119,8 @@ Configure the logger and pass it to the :class:`~lightning.pytorch.trainer.train
     from lightning.pytorch.loggers import NeptuneScaleLogger
 
     neptune_scale_logger = NeptuneScaleLogger(
-        api_key=<YOUR_NEPTUNE_SCALE_API_KEY>,  # replace with your own
-        project="common/pytorch-lightning-integration",  # format "<WORKSPACE/PROJECT>"
+        api_key="<YOUR_NEPTUNE_SCALE_API_KEY>",  # replace with your own
+        project="<YOUR_NEPTUNE_SCALE_WORKSPACE>/<YOUR_NEPTUNE_SCALE_PROJECT>",  # replace with your own
     )
     trainer = Trainer(logger=neptune_scale_logger)

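Putting the corrected placeholders together with the checkpointing feature this commit adds, a minimal configuration sketch might look as follows. This is illustrative only: it assumes `lightning` and `neptune-scale` are installed, and the placeholder strings must be replaced with real credentials before running.

```python
# Hypothetical configuration sketch combining the docs change above with the
# new `log_model_checkpoints` flag introduced by this commit.
from lightning.pytorch import Trainer
from lightning.pytorch.callbacks import ModelCheckpoint
from lightning.pytorch.loggers import NeptuneScaleLogger

neptune_scale_logger = NeptuneScaleLogger(
    api_key="<YOUR_NEPTUNE_SCALE_API_KEY>",  # replace with your own
    project="<YOUR_NEPTUNE_SCALE_WORKSPACE>/<YOUR_NEPTUNE_SCALE_PROJECT>",  # replace with your own
    log_model_checkpoints=True,  # default; set False to disable checkpoint path logging
)

# Checkpoint paths produced by ModelCheckpoint are logged under "model/checkpoints"
trainer = Trainer(
    logger=neptune_scale_logger,
    callbacks=[ModelCheckpoint(dirpath="checkpoints/", save_top_k=2)],
)
```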
requirements/pytorch/loggers.info

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 # all supported loggers. this list is here as a reference, but they are not installed in CI
 
 neptune >=1.0.0
-neptune-scale
+neptune-scale >= 0.12.0
 comet-ml >=3.31.0
 mlflow >=1.0.0
 wandb >=0.12.10

src/lightning/pytorch/loggers/neptune.py

Lines changed: 119 additions & 32 deletions
@@ -69,7 +69,7 @@ def wrapper(*args: Any, **kwargs: Any) -> Any:
 
 
 class NeptuneLogger(Logger):
-    r"""Log using `Neptune <https://docs.neptune.ai/integrations/lightning/>`_.
+    r"""Log using `Neptune <https://docs-legacy.neptune.ai/integrations/lightning/>`_.
 
     Install it with pip:
 
@@ -129,7 +129,7 @@ def any_lightning_module_function_or_hook(self):
     Note that the syntax ``self.logger.experiment["your/metadata/structure"].append(metadata)`` is specific to
     Neptune and extends the logger capabilities. It lets you log various types of metadata, such as
     scores, files, images, interactive visuals, and CSVs.
-    Refer to the `Neptune docs <https://docs.neptune.ai/logging/methods>`_
+    Refer to the `Neptune docs <https://docs-legacy.neptune.ai/logging/methods>`_
     for details.
     You can also use the regular logger methods ``log_metrics()``, and ``log_hyperparams()`` with NeptuneLogger.
 
@@ -184,7 +184,7 @@ def any_lightning_module_function_or_hook(self):
        )
        trainer = Trainer(max_epochs=3, logger=neptune_logger)
 
-    Check `run documentation <https://docs.neptune.ai/api/neptune/#init_run>`_
+    Check `run documentation <https://docs-legacy.neptune.ai/api/neptune/#init_run>`_
     for more info about additional run parameters.
 
     **Details about Neptune run structure**
@@ -196,18 +196,18 @@ def any_lightning_module_function_or_hook(self):
 
     See also:
         - Read about
-          `what objects you can log to Neptune <https://docs.neptune.ai/logging/what_you_can_log/>`_.
+          `what objects you can log to Neptune <https://docs-legacy.neptune.ai/logging/what_you_can_log/>`_.
        - Check out an `example run <https://app.neptune.ai/o/common/org/pytorch-lightning-integration/e/PTL-1/all>`_
          with multiple types of metadata logged.
        - For more detailed examples, see the
-         `user guide <https://docs.neptune.ai/integrations/lightning/>`_.
+         `user guide <https://docs-legacy.neptune.ai/integrations/lightning/>`_.
 
    Args:
        api_key: Optional.
            Neptune API token, found on https://www.neptune.ai upon registration.
            You should save your token to the `NEPTUNE_API_TOKEN`
            environment variable and leave the api_key argument out of your code.
-           Instructions: `Setting your API token <https://docs.neptune.ai/setup/setting_api_token/>`_.
+           Instructions: `Setting your API token <https://docs-legacy.neptune.ai/setup/setting_api_token/>`_.
        project: Optional.
            Name of a project in the form "workspace-name/project-name", for example "tom/mask-rcnn".
            If ``None``, the value of `NEPTUNE_PROJECT` environment variable is used.
@@ -377,7 +377,7 @@ def training_step(self, batch, batch_idx):
            is specific to Neptune and extends the logger capabilities.
            It lets you log various types of metadata, such as scores, files,
            images, interactive visuals, and CSVs. Refer to the
-           `Neptune docs <https://docs.neptune.ai/logging/methods>`_
+           `Neptune docs <https://docs-legacy.neptune.ai/logging/methods>`_
            for more detailed explanations.
            You can also use the regular logger methods ``log_metrics()``, and ``log_hyperparams()``
            with NeptuneLogger.
@@ -600,7 +600,7 @@ def version(self) -> Optional[str]:
 
 
 class NeptuneScaleLogger(Logger):
-    r"""Log using `Neptune Scale <https://docs-beta.neptune.ai/>`_.
+    r"""Log using `Neptune Scale <https://docs.neptune.ai/>`_.
 
     Install it with pip:
 
@@ -630,7 +630,6 @@ class NeptuneScaleLogger(Logger):
 
     .. code-block:: python
 
-        from neptune.types import File
         from lightning.pytorch import LightningModule
 
 
@@ -647,7 +646,7 @@ def any_lightning_module_function_or_hook(self):
 
     Note that the syntax ``self.logger.run.log_metrics(data={"your/metadata/structure": metadata}, step=step)``
     is specific to Neptune Scale.
-    Refer to the `Neptune Scale docs <https://docs-beta.neptune.ai/log_metadata>`_ for details.
+    Refer to the `Neptune Scale docs <https://docs.neptune.ai/log_metadata>`_ for details.
     You can also use the regular logger methods ``log_metrics()``, and ``log_hyperparams()`` with NeptuneScaleLogger.
 
     **Log after fitting or testing is finished**
@@ -670,6 +669,18 @@ def any_lightning_module_function_or_hook(self):
        neptune_logger.run.log_configs(data={"your/metadata/structure": metadata})
        neptune_logger.run.add_tags(["tag1", "tag2"])
 
+    **Log model checkpoint paths**
+
+    If you have :class:`~lightning.pytorch.callbacks.ModelCheckpoint` configured,
+    the Neptune logger can log model checkpoint paths.
+    Paths will be logged to the "model/checkpoints" namespace in the Neptune run.
+    You can disable this option with:
+
+    .. code-block:: python
+
+        neptune_logger = NeptuneScaleLogger(log_model_checkpoints=False)
+
+    Note: All model checkpoint paths will be logged. ``save_last`` and ``save_top_k`` are currently not supported.
 
    **Pass additional parameters to the Neptune run**
 
@@ -688,7 +699,7 @@ def any_lightning_module_function_or_hook(self):
        )
        trainer = Trainer(max_epochs=3, logger=neptune_scale_logger)
 
-    Check `run documentation <https://docs-beta.neptune.ai/run>`_ for more info about additional run
+    Check `run documentation <https://docs.neptune.ai/run>`_ for more info about additional run
     parameters.
 
     **Details about Neptune run structure**
**Details about Neptune run structure**
@@ -712,26 +723,30 @@ def any_lightning_module_function_or_hook(self):
712723
Neptune API token, found on https://scale.neptune.ai upon registration.
713724
You should save your token to the `NEPTUNE_API_TOKEN` environment variable and leave
714725
the api_token argument out of your code.
715-
Instructions: `Setting your API token <https://docs-beta.neptune.ai/setup#3-get-your-api-token>`_.
726+
Instructions: `Setting your API token <https://docs.neptune.ai/setup#3-get-your-api-token>`_.
716727
resume: Optional.
717728
If `False`, creates a new run.
718729
To continue an existing run, set to `True` and pass the ID of an existing run to the `run_id` argument.
719730
In this case, omit the `experiment_name` parameter.
720731
To fork a run, use `fork_run_id` and `fork_step` instead.
721732
mode: Optional.
722-
`Mode <https://docs-beta.neptune.ai/modes>`_ of operation.
733+
`Mode <https://docs.neptune.ai/modes>`_ of operation.
723734
If "disabled", the run doesn't log any metadata.
724-
If "offline", the run is only stored locally. For details, see `Offline logging <https://docs-beta.neptune.ai/offline>`_.
735+
If "offline", the run is only stored locally. For details, see `Offline logging <https://docs.neptune.ai/offline>`_.
725736
If this parameter and the
726-
`NEPTUNE_MODE <https://docs-beta.neptune.ai/environment_variables/neptune_scale#neptune_mode>`_
737+
`NEPTUNE_MODE <https://docs.neptune.ai/environment_variables/neptune_scale#neptune_mode>`_
727738
environment variable are not set, the default is "async".
728739
experiment_name: Optional.
729-
Name of the experiment <https://docs-beta.neptune.ai/experiments> to associate the run with.
740+
Name of the experiment <https://docs.neptune.ai/experiments> to associate the run with.
730741
Can't be used together with the `resume` parameter.
731742
To make the name easy to read in the app, ensure that it's at most 190 characters long.
732743
run: Optional. Default is ``None``. A Neptune ``Run`` object.
733744
If specified, this existing run will be used for logging, instead of a new run being created.
734745
prefix: Optional. Default is ``"training"``. Root namespace for all metadata logging.
746+
log_model_checkpoints: Optional. Default is ``True``. Log model checkpoint paths to Neptune.
747+
Works only if ``ModelCheckpoint`` is passed to the ``Trainer``.
748+
NOTE: All model checkpoint paths will be logged.
749+
``save_last`` and ``save_top_k`` are currently not supported.
735750
neptune_run_kwargs: Additional arguments like ``creation_time``, ``log_directory``,
736751
``fork_run_id``, ``fork_step``, ``*_callback``, etc. used when a run is created.
737752
@@ -757,6 +772,7 @@ def __init__(
757772
experiment_name: Optional[str] = None,
758773
run: Optional["Run"] = None,
759774
prefix: str = "training",
775+
log_model_checkpoints: Optional[bool] = True,
760776
**neptune_run_kwargs: Any,
761777
):
762778
if not _NEPTUNE_SCALE_AVAILABLE:
@@ -778,16 +794,12 @@ def __init__(
778794
self._run_id = run_id
779795
self._experiment_name = experiment_name
780796
self._prefix = prefix
797+
self._log_model_checkpoints = log_model_checkpoints
781798
self._neptune_run_kwargs = neptune_run_kwargs
782799
self._description = self._neptune_run_kwargs.pop("description", None)
783800
self._tags = self._neptune_run_kwargs.pop("tags", None)
784801
self._group_tags = self._neptune_run_kwargs.pop("group_tags", None)
785802

786-
if "log_model_checkpoints" in self._neptune_run_kwargs:
787-
log.warning("Neptune Scale does not support logging model checkpoints.")
788-
del self._neptune_run_kwargs["log_model_checkpoints"]
789-
self._log_model_checkpoints = False
790-
791803
if self._run_instance is not None:
792804
self._retrieve_run_data()
793805

@@ -887,7 +899,7 @@ def training_step(self, batch, batch_idx):
887899
888900
Note that the syntax ``self.logger.run.log_metrics(data={"your/metadata/structure": metadata}, step=step)``
889901
is specific to Neptune Scale. Refer to the
890-
`Neptune Scale docs <https://docs-beta.neptune.ai/log_metadata>`_
902+
`Neptune Scale docs <https://docs.neptune.ai/log_metadata>`_
891903
for more detailed explanations.
892904
You can also use the regular logger methods ``log_metrics()``, and ``log_hyperparams()``
893905
with NeptuneScaleLogger.
@@ -1004,7 +1016,7 @@ def finalize(self, status: str) -> None:
10041016
# initialized there
10051017
return
10061018
if status:
1007-
self.run._status = status
1019+
self.run.log_configs({self._construct_path_with_prefix("status"): status})
10081020

10091021
super().finalize(status)
10101022

@@ -1025,25 +1037,100 @@ def save_dir(self) -> Optional[str]:
10251037

10261038
@rank_zero_only
10271039
def log_model_summary(self, model: "pl.LightningModule", max_depth: int = -1) -> None:
1028-
"""Not implemented for Neptune Scale."""
1029-
log.warning("Neptune Scale does not support logging model summaries.")
1030-
return
1040+
"""Logs a summary of all layers in the model to Neptune as a text file."""
1041+
from neptune_scale.types import File
1042+
1043+
model_str = str(ModelSummary(model=model, max_depth=max_depth))
1044+
self.run.assign_files({
1045+
self._construct_path_with_prefix("model/summary"): File(
1046+
source=model_str.encode("utf-8"), mime_type="text/plain"
1047+
)
1048+
})
10311049

10321050
@override
10331051
@rank_zero_only
10341052
def after_save_checkpoint(self, checkpoint_callback: Checkpoint) -> None:
1035-
"""Not implemented for Neptune Scale."""
1036-
return
1053+
"""Automatically log checkpointed model's path. Called after model checkpoint callback saves a new checkpoint.
1054+
1055+
Args:
1056+
checkpoint_callback: the model checkpoint callback instance
1057+
1058+
"""
1059+
if not self._log_model_checkpoints:
1060+
return
1061+
1062+
file_names = set()
1063+
checkpoints_namespace = self._construct_path_with_prefix("model/checkpoints")
1064+
1065+
# save last model
1066+
if hasattr(checkpoint_callback, "last_model_path") and checkpoint_callback.last_model_path:
1067+
model_last_name = self._get_full_model_name(checkpoint_callback.last_model_path, checkpoint_callback)
1068+
file_names.add(model_last_name)
1069+
self.run.log_configs({
1070+
f"{checkpoints_namespace}/{model_last_name}": checkpoint_callback.last_model_path,
1071+
})
1072+
1073+
# save best k models
1074+
if hasattr(checkpoint_callback, "best_k_models"):
1075+
for key in checkpoint_callback.best_k_models:
1076+
model_name = self._get_full_model_name(key, checkpoint_callback)
1077+
file_names.add(model_name)
1078+
self.run.log_configs({
1079+
f"{checkpoints_namespace}/{model_name}": key,
1080+
})
1081+
1082+
# log best model path and checkpoint
1083+
if hasattr(checkpoint_callback, "best_model_path") and checkpoint_callback.best_model_path:
1084+
self.run.log_configs({
1085+
self._construct_path_with_prefix("model/best_model_path"): checkpoint_callback.best_model_path,
1086+
})
1087+
1088+
model_name = self._get_full_model_name(checkpoint_callback.best_model_path, checkpoint_callback)
1089+
file_names.add(model_name)
1090+
self.run.log_configs({
1091+
f"{checkpoints_namespace}/{model_name}": checkpoint_callback.best_model_path,
1092+
})
1093+
1094+
# remove old models logged to experiment if they are not part of best k models at this point
1095+
# TODO: Implement after Neptune Scale supports `del`
1096+
# if self.run.exists(checkpoints_namespace):
1097+
# exp_structure = self.run.get_structure()
1098+
# uploaded_model_names = self._get_full_model_names_from_exp_structure(
1099+
# exp_structure, checkpoints_namespace
1100+
# )
1101+
1102+
# for file_to_drop in list(uploaded_model_names - file_names):
1103+
# del self.run[f"{checkpoints_namespace}/{file_to_drop}"]
1104+
1105+
# log best model score
1106+
if hasattr(checkpoint_callback, "best_model_score") and checkpoint_callback.best_model_score:
1107+
self.run.log_configs({
1108+
self._construct_path_with_prefix("model/best_model_score"): float(
1109+
checkpoint_callback.best_model_score.cpu().detach().numpy()
1110+
),
1111+
})
10371112

10381113
@staticmethod
1039-
def _get_full_model_name(model_path: str, checkpoint_callback: Checkpoint) -> None:
1114+
def _get_full_model_name(model_path: str, checkpoint_callback: Checkpoint) -> str:
10401115
"""Returns model name which is string `model_path` appended to `checkpoint_callback.dirpath`."""
1041-
return
1116+
if hasattr(checkpoint_callback, "dirpath"):
1117+
model_path = os.path.normpath(model_path)
1118+
expected_model_path = os.path.normpath(checkpoint_callback.dirpath)
1119+
if not model_path.startswith(expected_model_path):
1120+
raise ValueError(f"{model_path} was expected to start with {expected_model_path}.")
1121+
# Remove extension from filepath
1122+
filepath, _ = os.path.splitext(model_path[len(expected_model_path) + 1 :])
1123+
return filepath.replace(os.sep, "/")
1124+
return model_path.replace(os.sep, "/")
10421125

10431126
@classmethod
1044-
def _get_full_model_names_from_exp_structure(cls, exp_structure: dict[str, Any], namespace: str) -> set[None]:
1127+
def _get_full_model_names_from_exp_structure(cls, exp_structure: dict[str, Any], namespace: str) -> set[str]:
10451128
"""Returns all paths to properties which were already logged in `namespace`"""
1046-
return set()
1129+
structure_keys: list[str] = namespace.split(cls.LOGGER_JOIN_CHAR)
1130+
for key in structure_keys:
1131+
exp_structure = exp_structure[key]
1132+
uploaded_models_dict = exp_structure
1133+
return set(cls._dict_paths(uploaded_models_dict))
10471134

10481135
@classmethod
10491136
def _dict_paths(cls, d: dict[str, Any], path_in_build: Optional[str] = None) -> Generator:

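The path handling in the new `_get_full_model_name` can be exercised in isolation. The sketch below replays the same normalization steps as the diff (strip the checkpoint directory prefix, drop the file extension, convert separators to forward slashes so the result is a valid namespace key); the bare `dirpath` string argument replaces the `Checkpoint` callback object and is an assumption for illustration only.

```python
import os


def full_model_name(model_path: str, dirpath: str) -> str:
    """Standalone sketch of the `_get_full_model_name` logic from the diff above."""
    model_path = os.path.normpath(model_path)
    expected = os.path.normpath(dirpath)
    if not model_path.startswith(expected):
        raise ValueError(f"{model_path} was expected to start with {expected}.")
    # Drop the directory prefix and the checkpoint file extension
    filepath, _ = os.path.splitext(model_path[len(expected) + 1 :])
    # Use forward slashes so the name works as a Neptune namespace key
    return filepath.replace(os.sep, "/")


print(full_model_name("checkpoints/epoch=2-step=300.ckpt", "checkpoints"))
```

Note that a checkpoint saved outside `dirpath` raises `ValueError`, mirroring the guard in the committed code.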
src/pytorch_lightning/README.md

Lines changed: 4 additions & 1 deletion
@@ -252,9 +252,12 @@ trainer = Trainer(logger=loggers.CometLogger())
 # mlflow
 trainer = Trainer(logger=loggers.MLFlowLogger())
 
-# neptune
+# neptune 2.x
 trainer = Trainer(logger=loggers.NeptuneLogger())
 
+# neptune 3.x
+trainer = Trainer(logger=loggers.NeptuneScaleLogger())
+
 # ... and dozens more
 ```

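To see what the `after_save_checkpoint` method added in `neptune.py` actually records, here is a self-contained simulation. The `FakeRun` collector and the `SimpleNamespace` stand-in for the checkpoint callback are hypothetical test scaffolding, not part of the commit; they only replay the logging branches shown in the diff.

```python
import os
from types import SimpleNamespace


class FakeRun:
    """In-memory stand-in for a Neptune Scale run: collects log_configs calls."""

    def __init__(self):
        self.configs = {}

    def log_configs(self, data):
        self.configs.update(data)


def full_model_name(model_path, dirpath):
    # Same normalization as `_get_full_model_name` in the diff
    model_path = os.path.normpath(model_path)
    expected = os.path.normpath(dirpath)
    filepath, _ = os.path.splitext(model_path[len(expected) + 1 :])
    return filepath.replace(os.sep, "/")


def after_save_checkpoint(run, cb, namespace="training/model/checkpoints"):
    # Mirrors the branches in the diff: last model, best k models, best model path
    if getattr(cb, "last_model_path", ""):
        run.log_configs({f"{namespace}/{full_model_name(cb.last_model_path, cb.dirpath)}": cb.last_model_path})
    for path in getattr(cb, "best_k_models", {}):
        run.log_configs({f"{namespace}/{full_model_name(path, cb.dirpath)}": path})
    if getattr(cb, "best_model_path", ""):
        run.log_configs({"training/model/best_model_path": cb.best_model_path})


run = FakeRun()
cb = SimpleNamespace(
    dirpath="ckpts",
    last_model_path="ckpts/last.ckpt",
    best_k_models={"ckpts/epoch=1.ckpt": 0.9},
    best_model_path="ckpts/epoch=1.ckpt",
)
after_save_checkpoint(run, cb)
print(sorted(run.configs))
```

Each entry maps a namespace key to a checkpoint path string, which is the key design change of this commit: paths are logged as run configs rather than the checkpoint files being uploaded.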