Getting ValueError when using ModelCheckpoint with auto_insert_metric_name=False #16385
-
Hi, I understand that `auto_insert_metric_name=False` is supposed to keep the metric name out of the checkpoint filename, but when I use it with `ModelCheckpoint` I run into a `ValueError`. Am I missing anything about this feature?
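For context, a minimal setup of the kind described here might look like the following (the filename template and monitored metric are assumed for illustration; the original snippet is not shown above):

```python
from pytorch_lightning.callbacks import ModelCheckpoint

# Hypothetical configuration; the actual template and metric name from the
# original report are not preserved, so these values are only illustrative.
checkpoint_callback = ModelCheckpoint(
    monitor="val_loss",
    filename="epoch{epoch:02d}-val_loss{val_loss:.2f}",
    auto_insert_metric_name=False,  # keep the metric name out of the filename
)
```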
Replies: 1 comment
-
Hi @dongchirua, I found this issue by debugging `pytorch_lightning/callbacks/model_checkpoint.py#L524`:

```python
def _format_checkpoint_name(
    cls,
    filename: Optional[str],
    metrics: Dict[str, Tensor],
    prefix: str = "",
    auto_insert_metric_name: bool = True,
) -> str:
    if not filename:
        # filename is not set, use default name
        filename = "{epoch}" + cls.CHECKPOINT_JOIN_CHAR + "{step}"

    # check and parse user passed keys in the string
    groups = re.findall(r"(\{.*?)[:\}]", filename)
    if len(groups) >= 0:
        for group in groups:
            name = group[1:]

            if auto_insert_metric_name:
                filename = filename.replace(group, name + "={" + name)

            # support for dots: https://stackoverflow.com/a/7934969
            filename = filename.replace(group, f"{{0[{name}]")

            if name not in metrics:
                metrics[name] = torch.tensor(0)
        filename = filename.format(metrics)

    if prefix:
        filename = cls.CHECKPOINT_JOIN_CHAR.join([prefix, filename])

    return filename
```

The regex `(\{.*?)[:\}]` is what parses the user-supplied keys out of the filename template. For me, I just worked around it by patching the pytorch-lightning source code locally.