Commit b8bcfab

Merge pull request #835 from amas0/runset-cleanup
Standardizes file naming conventions across the range of possible output files.
2 parents 09f39de + 80a5fed commit b8bcfab

15 files changed, +230 -118 lines changed

cmdstanpy/model.py

Lines changed: 35 additions & 28 deletions
@@ -288,12 +288,10 @@ def optimize(
 or to a temporary directory which is deleted upon session exit.

 Output files are either written to a temporary directory or to the
-specified output directory. Output filenames correspond to the template
-'<model_name>-<YYYYMMDDHHMM>-<chain_id>' plus the file suffix which is
-either '.csv' for the CmdStan output or '.txt' for
-the console messages, e.g. 'bernoulli-201912081451-1.csv'.
-Output files written to the temporary directory contain an additional
-8-character random string, e.g. 'bernoulli-201912081451-1-5nm6as7u.csv'.
+specified output directory. Optimize output filenames correspond to
+the template '<model_name>-<YYYYMMDDHHMM>' plus the file suffix which is
+either '.csv' for the CmdStan output or '_stdout.txt' for
+the console messages, e.g. 'bernoulli-20251107142835.csv'.

 :param data: Values for all data variables in the model, specified
 either as a dictionary with entries matching the data variables,
@@ -328,7 +326,7 @@ def optimize(

 :param save_profile: Whether or not to profile auto-diff operations in
 labelled blocks of code. If ``True``, CSV outputs are written to
-file '<model_name>-<YYYYMMDDHHMM>-profile-<chain_id>'.
+file '<model_name>-<YYYYMMDDHHMM>_profile.csv'.
 Introduced in CmdStan-2.26.

 :param algorithm: Algorithm to use. One of: 'BFGS', 'LBFGS', 'Newton'
@@ -497,11 +495,15 @@ def sample(

 Output files are either written to a temporary directory or to the
 specified output directory. Ouput filenames correspond to the template
-'<model_name>-<YYYYMMDDHHMM>-<chain_id>' plus the file suffix which is
-either '.csv' for the CmdStan output or '.txt' for
-the console messages, e.g. 'bernoulli-201912081451-1.csv'.
-Output files written to the temporary directory contain an additional
-8-character random string, e.g. 'bernoulli-201912081451-1-5nm6as7u.csv'.
+'<model_name>-<YYYYMMDDHHMM>' plus additional bits to identify which
+output file it corresponds to. CmdStan output will suffix with
+'_<chain_id>.csv' if there is more than one chain, and simply'.csv'
+in the single-chain case. For example, 'bernoulli-20251107144515_1.csv'.
+Console message output is written to a text file suffixed
+`_stdout_<chain_id>.txt` if each chain executes in a separate process
+(default behavior) or simply `_stdout.txt` if done so in a single
+process, such as when STAN_THREADS is enabled and you are sampling
+more than one chain.

 :param data: Values for all data variables in the model, specified
 either as a dictionary with entries matching the data variables,
@@ -634,14 +636,17 @@ def sample(
 :param save_latent_dynamics: Whether or not to output the position and
 momentum information for the model parameters (unconstrained).
 If ``True``, CSV outputs are written to an output file
-'<model_name>-<YYYYMMDDHHMM>-diagnostic-<chain_id>',
-e.g. 'bernoulli-201912081451-diagnostic-1.csv', see
+'<model_name>-<YYYYMMDDHHMM>_diagnostic_<chain_id>',
+e.g. 'bernoulli-201912081451_diagnostic_1.csv', see
 https://mc-stan.org/docs/cmdstan-guide/stan_csv.html,
 section "Diagnostic CSV output file" for details.

 :param save_profile: Whether or not to profile auto-diff operations in
 labelled blocks of code. If ``True``, CSV outputs are written to
-file '<model_name>-<YYYYMMDDHHMM>-profile-<chain_id>'.
+file '<model_name>-<YYYYMMDDHHMM>_profile_<chain_id>.csv' if each
+chain runs in its own process, otherwise
+'<model_name>-<YYYYMMDDHHMM>_profile.csv' if all chains run in a
+single process.
 Introduced in CmdStan-2.26, see
 https://mc-stan.org/docs/cmdstan-guide/stan_csv.html,
 section "Profiling CSV output file" for details.
@@ -955,12 +960,16 @@ def generate_quantities(
 or to a temporary directory which is deleted upon session exit.

 Output files are either written to a temporary directory or to the
-specified output directory. Output filenames correspond to the template
-'<model_name>-<YYYYMMDDHHMM>-<chain_id>' plus the file suffix which is
-either '.csv' for the CmdStan output or '.txt' for
-the console messages, e.g. 'bernoulli-201912081451-1.csv'.
-Output files written to the temporary directory contain an additional
-8-character random string, e.g. 'bernoulli-201912081451-1-5nm6as7u.csv'.
+specified output directory. Ouput filenames correspond to the template
+'<model_name>-<YYYYMMDDHHMM>' plus additional bits to identify which
+output file it corresponds to. CmdStan output will suffix with
+'_<chain_id>.csv' if there is more than one chain, and simply'.csv'
+in the single-chain case. For example, 'bernoulli-20251107144515_1.csv'.
+Console message output is written to a text file suffixed
+`_stdout_<chain_id>.txt` if each chain executes in a separate process
+(default behavior) or simply `_stdout.txt` if done so in a single
+process, such as when STAN_THREADS is enabled and you are sampling
+more than one chain.

 :param data: Values for all data variables in the model, specified
 either as a dictionary with entries matching the data variables,
@@ -1146,11 +1155,9 @@ def variational(

 Output files are either written to a temporary directory or to the
 specified output directory. Output filenames correspond to the template
-'<model_name>-<YYYYMMDDHHMM>-<chain_id>' plus the file suffix which is
-either '.csv' for the CmdStan output or '.txt' for
-the console messages, e.g. 'bernoulli-201912081451-1.csv'.
-Output files written to the temporary directory contain an additional
-8-character random string, e.g. 'bernoulli-201912081451-1-5nm6as7u.csv'.
+'<model_name>-<YYYYMMDDHHMM>' plus the file suffix which is
+either '.csv' for the CmdStan output or '_stdout.txt' for
+the console messages, e.g. 'bernoulli-201912081451.csv'.

 :param data: Values for all data variables in the model, specified
 either as a dictionary with entries matching the data variables,
@@ -1429,7 +1436,7 @@ def pathfinder(

 :param save_profile: Whether or not to profile auto-diff operations in
 labelled blocks of code. If ``True``, CSV outputs are written to
-file '<model_name>-<YYYYMMDDHHMM>-profile-<path_id>'.
+file '<model_name>-<YYYYMMDDHHMM>_profile.csv'.
 Introduced in CmdStan-2.26, see
 https://mc-stan.org/docs/cmdstan-guide/stan_csv.html,
 section "Profiling CSV output file" for details.
@@ -1659,7 +1666,7 @@ def laplace_sample(

 :param save_profile: Whether or not to profile auto-diff operations in
 labelled blocks of code. If ``True``, CSV outputs are written to
-file '<model_name>-<YYYYMMDDHHMM>-profile-<path_id>'.
+file '<model_name>-<YYYYMMDDHHMM>_profile.csv'.
 Introduced in CmdStan-2.26, see
 https://mc-stan.org/docs/cmdstan-guide/stan_csv.html,
 section "Profiling CSV output file" for details.

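The `sample()` and `generate_quantities()` docstrings in this file spell out when a chain id appears in a file name, and the single-chain case gives the one-CSV, one-`_stdout.txt` layout described for `optimize()` and `variational()`. The sketch below restates those rules as a pure function that mirrors the `RunSet.__init__` logic in `cmdstanpy/stanfit/runset.py` from this commit. `expected_output_names` is a hypothetical helper, not part of the cmdstanpy API, and the timestamp is passed in as a plain string so that no particular date format is assumed.

```python
from __future__ import annotations


def expected_output_names(
    model_name: str,
    timestamp: str,
    chain_ids: list[int],
    one_process_per_chain: bool = True,
    save_profile: bool = False,
    save_latent_dynamics: bool = False,
) -> dict[str, list[str]]:
    """Hypothetical helper: output basenames a run would produce."""
    base = f"{model_name}-{timestamp}"
    multi_chain = len(chain_ids) > 1
    names: dict[str, list[str]] = {
        "csv": [], "diagnostic": [], "stdout": [], "profile": []
    }

    # Per-process files carry a chain id only when every chain
    # has its own process and there is more than one chain.
    if one_process_per_chain and multi_chain:
        names["stdout"] = [f"{base}_stdout_{cid}.txt" for cid in chain_ids]
        if save_profile:
            names["profile"] = [f"{base}_profile_{cid}.csv" for cid in chain_ids]
    else:
        names["stdout"] = [f"{base}_stdout.txt"]
        if save_profile:
            names["profile"] = [f"{base}_profile.csv"]

    # Per-chain files carry a chain id only when there is more than one chain.
    if multi_chain:
        names["csv"] = [f"{base}_{cid}.csv" for cid in chain_ids]
        if save_latent_dynamics:
            names["diagnostic"] = [
                f"{base}_diagnostic_{cid}.csv" for cid in chain_ids
            ]
    else:
        names["csv"] = [f"{base}.csv"]
        if save_latent_dynamics:
            names["diagnostic"] = [f"{base}_diagnostic.csv"]
    return names


# Default: four chains, one process per chain.
default = expected_output_names("bernoulli", "20251107144515", [1, 2, 3, 4])
print(default["csv"][0])     # bernoulli-20251107144515_1.csv
print(default["stdout"][0])  # bernoulli-20251107144515_stdout_1.txt

# All chains in one process (e.g. STAN_THREADS), with profiling enabled.
threaded = expected_output_names(
    "bernoulli", "20251107144515", [1, 2, 3, 4],
    one_process_per_chain=False, save_profile=True,
)
print(threaded["stdout"])   # ['bernoulli-20251107144515_stdout.txt']
print(threaded["profile"])  # ['bernoulli-20251107144515_profile.csv']
```

Because the sketch iterates over chain ids rather than positions, `chain_ids=[5, 6]` yields `_5` and `_6` suffixes, matching the new list comprehensions over `self._chain_ids`.
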
cmdstanpy/stanfit/gq.py

Lines changed: 1 addition & 4 deletions
@@ -725,10 +725,7 @@ def _previous_draws_pd(

 def save_csvfiles(self, dir: str | None = None) -> None:
 """
-Move output CSV files to specified directory. If files were
-written to the temporary session directory, clean filename.
-E.g., save 'bernoulli-201912081451-1-5nm6as7u.csv' as
-'bernoulli-201912081451-1.csv'.
+Move output CSV files to specified directory.

 :param dir: directory path

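Since file names are now final when a run starts, the shortened `save_csvfiles` docstrings (here and in the other `stanfit` modules below) describe a plain move with no renaming step. A minimal standard-library sketch of that behaviour follows; `move_outputs` is a hypothetical helper, not cmdstanpy's implementation, and the file content written in the demo is a dummy header.

```python
from __future__ import annotations

import os
import shutil
import tempfile


def move_outputs(paths: list[str], dest_dir: str) -> list[str]:
    """Hypothetical helper: move each file into dest_dir, keeping its basename."""
    os.makedirs(dest_dir, exist_ok=True)
    moved = []
    for path in paths:
        target = os.path.join(dest_dir, os.path.basename(path))
        shutil.move(path, target)
        moved.append(target)
    return moved


with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
    src_file = os.path.join(src, "bernoulli-20251107144515_1.csv")
    with open(src_file, "w") as fh:
        fh.write("lp__\n")  # dummy content for the demo
    moved = move_outputs([src_file], dst)
    moved_name = os.path.basename(moved[0])
    moved_exists = os.path.exists(moved[0])
    source_gone = not os.path.exists(src_file)

print(moved_name)  # bernoulli-20251107144515_1.csv
```
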
cmdstanpy/stanfit/laplace.py

Lines changed: 1 addition & 4 deletions
@@ -309,10 +309,7 @@ def column_names(self) -> tuple[str, ...]:

 def save_csvfiles(self, dir: str | None = None) -> None:
 """
-Move output CSV files to specified directory. If files were
-written to the temporary session directory, clean filename.
-E.g., save 'bernoulli-201912081451-1-5nm6as7u.csv' as
-'bernoulli-201912081451-1.csv'.
+Move output CSV files to specified directory.

 :param dir: directory path

cmdstanpy/stanfit/mcmc.py

Lines changed: 1 addition & 4 deletions
@@ -811,10 +811,7 @@ def method_variables(self) -> dict[str, np.ndarray]:

 def save_csvfiles(self, dir: str | None = None) -> None:
 """
-Move output CSV files to specified directory. If files were
-written to the temporary session directory, clean filename.
-E.g., save 'bernoulli-201912081451-1-5nm6as7u.csv' as
-'bernoulli-201912081451-1.csv'.
+Move output CSV files to specified directory.

 :param dir: directory path

cmdstanpy/stanfit/mle.py

Lines changed: 1 addition & 4 deletions
@@ -295,10 +295,7 @@ def stan_variables(

 def save_csvfiles(self, dir: str | None = None) -> None:
 """
-Move output CSV files to specified directory. If files were
-written to the temporary session directory, clean filename.
-E.g., save 'bernoulli-201912081451-1-5nm6as7u.csv' as
-'bernoulli-201912081451-1.csv'.
+Move output CSV files to specified directory.

 :param dir: directory path

cmdstanpy/stanfit/pathfinder.py

Lines changed: 1 addition & 4 deletions
@@ -216,10 +216,7 @@ def is_resampled(self) -> bool:

 def save_csvfiles(self, dir: str | None = None) -> None:
 """
-Move output CSV files to specified directory. If files were
-written to the temporary session directory, clean filename.
-E.g., save 'bernoulli-201912081451-1-5nm6as7u.csv' as
-'bernoulli-201912081451-1.csv'.
+Move output CSV files to specified directory.

 :param dir: directory path

cmdstanpy/stanfit/runset.py

Lines changed: 46 additions & 51 deletions
@@ -38,62 +38,59 @@ def __init__(
         self._args = args
         self._chains = chains
         self._one_process_per_chain = one_process_per_chain
-        if one_process_per_chain:
-            self._num_procs = chains
-        else:
-            self._num_procs = 1
+        self._num_procs = chains if one_process_per_chain else 1
         self._retcodes = [-1 for _ in range(self._num_procs)]
         self._timeout_flags = [False for _ in range(self._num_procs)]
         if chain_ids is None:
             chain_ids = [i + 1 for i in range(chains)]
         self._chain_ids = chain_ids

         if args.output_dir is not None:
-            self._output_dir = args.output_dir
-        else:
-            # make a per-run subdirectory of our master temp directory
-            self._output_dir = tempfile.mkdtemp(
-                prefix=args.model_name, dir=_TMPDIR
-            )
+            self._outdir = args.output_dir
+        else: # make a per-run subdirectory of our master temp directory
+            self._outdir = tempfile.mkdtemp(prefix=args.model_name, dir=_TMPDIR)

         # output files prefix: ``<model_name>-<YYYYMMDDHHMM>_<chain_id>``
         self._base_outfile = (
             f'{args.model_name}-{datetime.now().strftime(time_fmt)}'
         )
-        # per-process outputs
-        self._stdout_files = [''] * self._num_procs
-        self._profile_files = [''] * self._num_procs # optional
-        if one_process_per_chain:
-            for i in range(chains):
-                self._stdout_files[i] = self.file_path("-stdout.txt", id=i)
-                if args.save_profile:
-                    self._profile_files[i] = self.file_path(
-                        ".csv", extra="-profile", id=chain_ids[i]
-                    )
+        self._stdout_files, self._profile_files = [], []
+        self._csv_files, self._diagnostic_files = [], []
+
+        # per-process output files
+        if one_process_per_chain and chains > 1:
+            self._stdout_files = [
+                self.gen_file_name(".txt", extra="stdout", id=id)
+                for id in self._chain_ids
+            ]
+            if args.save_profile:
+                self._profile_files = [
+                    self.gen_file_name(".csv", extra="profile", id=id)
+                    for id in self._chain_ids
+                ]
         else:
-            self._stdout_files[0] = self.file_path("-stdout.txt")
+            self._stdout_files = [self.gen_file_name(".txt", extra="stdout")]
             if args.save_profile:
-                self._profile_files[0] = self.file_path(
-                    ".csv", extra="-profile"
-                )
+                self._profile_files = [
+                    self.gen_file_name(".csv", extra="profile")
+                ]

         # per-chain output files
-        self._csv_files: list[str] = [''] * chains
-        self._diagnostic_files = [''] * chains # optional
-
         if chains == 1:
-            self._csv_files[0] = self.file_path(".csv")
+            self._csv_files = [self.gen_file_name(".csv")]
             if args.save_latent_dynamics:
-                self._diagnostic_files[0] = self.file_path(
-                    ".csv", extra="-diagnostic"
-                )
+                self._diagnostic_files = [
+                    self.gen_file_name(".csv", extra="diagnostic")
+                ]
         else:
-            for i in range(chains):
-                self._csv_files[i] = self.file_path(".csv", id=chain_ids[i])
-                if args.save_latent_dynamics:
-                    self._diagnostic_files[i] = self.file_path(
-                        ".csv", extra="-diagnostic", id=chain_ids[i]
-                    )
+            self._csv_files = [
+                self.gen_file_name(".csv", id=id) for id in self._chain_ids
+            ]
+            if args.save_latent_dynamics:
+                self._diagnostic_files = [
+                    self.gen_file_name(".csv", extra="diagnostic", id=id)
+                    for id in self._chain_ids
+                ]

     def __repr__(self) -> str:
         repr = 'RunSet: chains={}, chain_ids={}, num_processes={}'.format(
@@ -173,14 +170,14 @@ def cmd(self, idx: int) -> list[str]:
         else:
             return self._args.compose_command(
                 idx,
-                csv_file=self.file_path('.csv'),
+                csv_file=self.gen_file_name('.csv'),
                 diagnostic_file=(
-                    self.file_path(".csv", extra="-diagnostic")
+                    self.gen_file_name(".csv", extra="diagnostic")
                     if self._args.save_latent_dynamics
                     else None
                 ),
                 profile_file=(
-                    self.file_path(".csv", extra="-profile")
+                    self.gen_file_name(".csv", extra="profile")
                     if self._args.save_profile
                     else None
                 ),
@@ -201,10 +198,7 @@ def stdout_files(self) -> list[str]:

     def _check_retcodes(self) -> bool:
         """Returns ``True`` when all chains have retcode 0."""
-        for code in self._retcodes:
-            if code != 0:
-                return False
-        return True
+        return all(retcode == 0 for retcode in self._retcodes)

     @property
     def diagnostic_files(self) -> list[str]:
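
The `_check_retcodes` change swaps an explicit loop for `all()`. Both forms stop at the first non-zero code and both return `True` for an empty list, so behaviour should be unchanged. The standalone copies below (function names are illustrative, not cmdstanpy API) check that equivalence on a few inputs, including the `-1` placeholder that `RunSet` stores for processes that have not yet reported.

```python
def check_retcodes_loop(retcodes: list) -> bool:
    # Pre-refactor form: return False at the first non-zero code.
    for code in retcodes:
        if code != 0:
            return False
    return True


def check_retcodes_all(retcodes: list) -> bool:
    # Post-refactor form: all() also short-circuits on the first False.
    return all(retcode == 0 for retcode in retcodes)


cases = [[], [0], [0, 0, 0], [-1, -1], [0, 1, 0]]
for codes in cases:
    assert check_retcodes_loop(codes) == check_retcodes_all(codes)
print([check_retcodes_all(c) for c in cases])  # [True, True, True, False, False]
```
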
@@ -216,16 +210,17 @@ def profile_files(self) -> list[str]:
         """List of paths to CmdStan profiler files."""
         return self._profile_files

-    # pylint: disable=invalid-name
-    def file_path(
+    def gen_file_name(
         self, suffix: str, *, extra: str = "", id: int | None = None
     ) -> str:
+        """Generate a standard file name according to CmdStan output pattern"""
+        file = self._base_outfile
+        if extra:
+            file += f"_{extra}"
         if id is not None:
-            suffix = f"_{id}{suffix}"
-        file = os.path.join(
-            self._output_dir, f"{self._base_outfile}{extra}{suffix}"
-        )
-        return file
+            file += f"_{id}"
+        file += suffix
+        return os.path.join(self._outdir, file)

     def _retcode(self, idx: int) -> int:
         """Get retcode for process[idx]."""

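The new `gen_file_name` builds every output name as `<base>[_<extra>][_<id>]<suffix>`. A standalone sketch of the same composition follows; unlike the method, it takes the output directory and base name as explicit arguments instead of reading `self._outdir` and `self._base_outfile`, and the directory name `out` in the demo is arbitrary.

```python
from __future__ import annotations

import os


def gen_file_name(
    outdir: str,
    base: str,
    suffix: str,
    *,
    extra: str = "",
    id: int | None = None,  # shadows the builtin, as in the original method
) -> str:
    """Standalone sketch of RunSet.gen_file_name: <base>[_<extra>][_<id>]<suffix>."""
    file = base
    if extra:
        file += f"_{extra}"
    if id is not None:  # `is not None` keeps an id of 0 from being dropped
        file += f"_{id}"
    file += suffix
    return os.path.join(outdir, file)


base = "bernoulli-20251107144515"
print(gen_file_name("out", base, ".csv", id=1))  # out/bernoulli-20251107144515_1.csv on POSIX
print(gen_file_name("out", base, ".txt", extra="stdout", id=2))
print(gen_file_name("out", base, ".csv", extra="profile"))
```

For comparison, the removed `file_path` folded the id into the suffix and appended `extra` verbatim, so `extra="-profile", id=1` produced `<base>-profile_1.csv`; the new method yields `<base>_profile_1.csv`, using an underscore as the only separator after the timestamp.
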
cmdstanpy/stanfit/vb.py

Lines changed: 1 addition & 4 deletions
@@ -249,10 +249,7 @@ def variational_sample_pd(self) -> pd.DataFrame:

 def save_csvfiles(self, dir: str | None = None) -> None:
 """
-Move output CSV files to specified directory. If files were
-written to the temporary session directory, clean filename.
-E.g., save 'bernoulli-201912081451-1-5nm6as7u.csv' as
-'bernoulli-201912081451-1.csv'.
+Move output CSV files to specified directory.

 :param dir: directory path

cmdstanpy_tutorial.ipynb

Lines changed: 4 additions & 3 deletions
@@ -92,6 +92,7 @@
 "CmdStanPy will use the following optional packages, if installed:\n",
 "\n",
 "* `xarray`, an n-dimension labeled dataset package which can be used for outputs\n",
+"* `polars`, a highly-optimized data manipulation library, which can speed up processing outputs of large Stan models\n",
 "\n",
 "To install CmdStanPy with all the optional packages:\n",
 "\n",
@@ -402,7 +403,7 @@
 "hash": "d31ce8e45781476cfd394e192e0962028add96ff436d4fd4e560a347d206b9cb"
 },
 "kernelspec": {
-"display_name": "Python 3",
+"display_name": "Python 3 (ipykernel)",
 "language": "python",
 "name": "python3"
 },
@@ -416,9 +417,9 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.8.5"
+"version": "3.10.19"
 }
 },
 "nbformat": 4,
-"nbformat_minor": 2
+"nbformat_minor": 4
 }

docsrc/users-guide/outputs.rst

Lines changed: 3 additions & 3 deletions
@@ -8,9 +8,9 @@ CSV File Outputs

 Underlyingly, the CmdStan outputs are a set of per-chain
 `Stan CSV files <https://mc-stan.org/docs/cmdstan-guide/stan_csv_apdx.html#mcmc-sampler-csv-output>`__.
-The filenames follow the template '<model_name>-<YYYYMMDDHHMMSS>-<chain_id>'
-plus the file suffix '.csv'.
-CmdStanPy also captures the per-chain console and error messages.
+The filenames follow the template '<model_name>-<YYYYMMDDHHMMSS>_<chain_id>'
+plus the file suffix '.csv'. CmdStanPy also captures the per-chain console and
+error messages.

 .. ipython:: python

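The updated template in `outputs.rst` separates the chain id with an underscore. Scripts that scan an output directory may want to split a name back into its parts. The sketch below uses a hypothetical regular expression built from the patterns in this commit; it accepts 12- or 14-digit timestamps because both lengths appear in the docstring examples, and it treats `stdout`, `profile` and `diagnostic` as the only extra tags. These are assumptions, not a format guarantee from cmdstanpy.

```python
from __future__ import annotations

import re

# Hypothetical pattern: <model_name>-<timestamp>[_<extra>][_<chain_id>]<suffix>
OUTPUT_NAME = re.compile(
    r"^(?P<model>.+)-(?P<timestamp>\d{12,14})"
    r"(?:_(?P<extra>stdout|profile|diagnostic))?"
    r"(?:_(?P<chain>\d+))?"
    r"(?P<suffix>\.csv|\.txt)$"
)


def parse_output_name(name: str) -> dict[str, str | None] | None:
    """Split an output file name into its parts, or return None if it doesn't match."""
    match = OUTPUT_NAME.match(name)
    return match.groupdict() if match else None


print(parse_output_name("bernoulli-20251107144515_1.csv"))
# {'model': 'bernoulli', 'timestamp': '20251107144515', 'extra': None, 'chain': '1', 'suffix': '.csv'}
print(parse_output_name("bernoulli-201912081451-1.csv"))  # None (old hyphenated style)
```

The greedy `.+` for the model name backtracks to the last hyphen that is followed by a run of digits, so model names that themselves contain hyphens (for example `my-model`) still parse.
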