Outline & Motivation
The usage of log_dir and save_dir is confusing. Imagine we are implementing a new logger; let's see how we could define the save_dir property.
```python
import os

from typing_extensions import override
from lightning.pytorch.loggers import Logger


class CustomLogger(Logger):
    def __init__(self, all_experiment_path):
        self._root_dir = all_experiment_path

    @property
    @override
    def save_dir(self) -> str:
        option1 = os.path.join(self._root_dir, self.name, str(self.version))
        # option2 = self._root_dir
        # option3 = os.path.join(self._root_dir, self.name)
        return option1  # or option2 or option3
```

The first option includes the experiment name and version as subdirectories, and the second one does not.
Since CustomLogger is not a subclass of TensorBoardLogger or CSVLogger, Trainer.log_dir is defined to be logger.save_dir when the first logger is a CustomLogger, according to here.
The config.yaml file will be saved to logger.save_dir/config.yaml. The checkpoint files will be saved to logger.save_dir/logger.name/logger.version, which expands to logger._root_dir/logger.name/logger.version/logger.name/logger.version, i.e. a duplicated directory hierarchy, according to the __resolve_ckpt_dir() function.
I DO NOT like this layout. The following tree view shows the duplication clearly.
<all_experiment_root>
└── experiment_name
├── version_0
│ ├── config.yaml
│ └── experiment_name
│ └── version_0
│ └── checkpoints
│ └── epoch=7-step=3000.ckpt
└── version_1
├── config.yaml
└── experiment_name
└── version_1
└── checkpoints
└── epoch=28-step=10875.ckpt
However, there is currently no way to remove the duplicated hierarchy. If the second or third option is adopted, the config.yaml file will be saved in a directory unrelated to the experiment version and will be overwritten by the next run. This is not acceptable.
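The duplication can be reproduced with plain path arithmetic. This is a minimal sketch, assuming option 1 from the example above; the join mirrors what __resolve_ckpt_dir() effectively does when save_dir already embeds the name and version:

```python
import os

# Placeholder values matching the tree view above.
root, name, version = "all_experiments", "experiment_name", "version_0"

# Option 1: save_dir already embeds name and version.
save_dir = os.path.join(root, name, version)

# The checkpoint directory appends name/version/checkpoints again,
# producing the duplicated hierarchy shown in the tree above.
ckpt_dir = os.path.join(save_dir, name, version, "checkpoints")
print(ckpt_dir)
# on POSIX: all_experiments/experiment_name/version_0/experiment_name/version_0/checkpoints
```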
Pitch
I expect the resulting hierarchy to look like this:
<all_experiment_root>
└── experiment_name
├── version_0
│ ├── config.yaml
│ └── checkpoints
│ └── epoch=7-step=3000.ckpt
└── version_1
├── config.yaml
└── checkpoints
└── epoch=28-step=10875.ckpt
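One way to get this layout would be for the logger to expose two distinct directories: a root shared by all experiments (save_dir) and a versioned directory (log_dir) that receives both config.yaml and checkpoints/. The following is a hypothetical sketch of just the directory properties, not a full Logger subclass; the version_{n} naming mirrors Lightning's convention, and whether the Trainer would actually consult log_dir for a custom logger is exactly the open question of this issue:

```python
import os


class CustomLoggerDirs:
    """Sketch of the two directory properties only (not a full Logger)."""

    def __init__(self, all_experiment_path, name="experiment_name", version=0):
        self._root_dir = all_experiment_path
        self._name = name
        self._version = version

    @property
    def save_dir(self) -> str:
        # Root under which every experiment lives.
        return self._root_dir

    @property
    def log_dir(self) -> str:
        # Versioned directory: config.yaml and checkpoints/ would go here.
        return os.path.join(self._root_dir, self._name, f"version_{self._version}")


dirs = CustomLoggerDirs("all_experiments")
print(os.path.join(dirs.log_dir, "config.yaml"))
print(os.path.join(dirs.log_dir, "checkpoints"))
# on POSIX:
# all_experiments/experiment_name/version_0/config.yaml
# all_experiments/experiment_name/version_0/checkpoints
```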
I tried to fix this problem and was willing to submit a PR, but fixing this issue turned out to be more complicated than I imagined.
Additional context
Because I'm new to Lightning, I'm not sure I fully understand the Logger design. In the current code, several attributes such as root_dir, log_dir, and save_dir are not clearly defined. For example, in the definition of the lightning.fabric.loggers.tensorboard.TensorBoardLogger class, the experiment name is not included in the root_dir path, but for lightning.pytorch.loggers.tensorboard.TensorBoardLogger, it is.
The fact that the root_dir and log_dir properties can be None introduces unnecessary complexity. The documentation of log_dir says:
> Return the root directory where all versions of an experiment get saved, or `None` if the logger does not save data locally.
If I understand it correctly, the properties root_dir and log_dir should never be None for any concrete logger. This is because config.yaml should always be saved somewhere like root_dir/name/version. Even if a logger does not save config.yaml locally, we could still initialize root_dir to the current directory ('.') and simply not use it.
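A sketch of that suggestion (hypothetical, not existing Lightning behavior): a logger that never writes locally still exposes a harmless, non-None root_dir.

```python
class NoLocalWriteDirs:
    """Sketch: a logger that doesn't save locally still has a valid root_dir."""

    def __init__(self, root_dir=None):
        # Fall back to the current directory instead of returning None.
        self._root_dir = root_dir if root_dir is not None else "."

    @property
    def root_dir(self) -> str:
        return self._root_dir


print(NoLocalWriteDirs().root_dir)  # prints "."
```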
I have two questions:
- Why do we need the save_dir property? Why is always setting root_dir to <root>/name and log_dir to <root>/name/version not sufficient?
- What makes TensorBoardLogger and CSVLogger special, such that Trainer.log_dir is logger.log_dir instead of logger.save_dir?