Commit 6d501e5

Updating dump_env_file.py and README.md (#14)
This PR:
- adds a `--pyarrow_secure` command-line option to `pyrosettacluster/dump_env_file.py` to support `PyRosettaCluster` scorefiles written using `pandas` versions `>=3.0.0`
- updates documentation strings and the `README.md` file
1 parent 2f95183 commit 6d501e5

3 files changed, with 35 additions and 15 deletions


pyrosettacluster/README.md

Lines changed: 6 additions & 5 deletions
@@ -130,17 +130,17 @@ Please refer to the following table to select _one_ environment file extraction
 | --- | --- | --- | --- |
 | `.pdb` | Decoy | Read file → Copy → Paste into new file | Run `dump_env_file.py` helper |
 | `.pdb.bz2` | Decoy | Unzip with `bzip2` → Read file → Copy → Paste into new file | Run `dump_env_file.py` helper |
-| `.pkl_pose`, `.pkl_pose.bz2`, `.b64_pose`, `.b64_pose.bz2` | Decoy | | Run `dump_env_file.py` helper |
+| `.pkl_pose`, `.pkl_pose.bz2`, `.b64_pose`, `.b64_pose.bz2` | Decoy | | Run `dump_env_file.py` helper _(requires an identical PyRosetta build signature to that used to save the original file)_ |
 | `.json` | Full-record scorefile | Read file → Copy → Paste into new file | Run `dump_env_file.py` helper |
 | Pickled `pandas.DataFrame`<br>(`.gz`, `.xz`, `.tar`, etc.) | Full-record scorefile | | Run `dump_env_file.py` helper |
-| `.init`, `.init.bz2` | PyRosetta initialization file | | Run `dump_env_file.py` helper |
+| `.init`, `.init.bz2` | PyRosetta initialization file | | Run `dump_env_file.py` helper _(requires an identical PyRosetta build signature to that used to save the original file)_ |

 > [!NOTE]
 > **Extraction method #1:** If copy/pasting into a new file, the environment file string is located in the `record["instance"]["environment"]` nested key value of the PyRosettaCluster full record. Please paste it into one of the following file names (as expected in the next step) in a new folder, depending on the environment manager you're using to recreate the environment:
 > | Environment manager | New file name |
 > | --- | --- |
 > | `pixi` | `pixi.lock` |
-> | `uv` | `requirements.txt` |
+> | `uv` | `uv.lock` |
 > | `conda` | `environment.yml` |
 > | `mamba` | `environment.yml` |
 >
@@ -153,7 +153,7 @@ Please refer to the following table to select _one_ environment file extraction
 > Also note the `record["instance"]["sha1"]` nested key value holding the GitHub commit SHA1 required to [reproduce the PyRosettaCluster simulation](#clone-original-repository)!

 > [!NOTE]
-> **Extraction method #2:** If running `dump_env_file.py`, the `pyrosetta` package (with version `>=2025.47`) and the [PyPI pyrosetta-distributed](https://pypi.org/project/pyrosetta-distributed/) package (for the `pyrosetta.distributed` framework dependencies) must be installed in any existing virtual environment, and that virtual environment's python interpreter used to run the script.
+> **Extraction method #2:** If running `dump_env_file.py`, the `pyrosetta` package (with version `>=2025.47`) and the [PyPI pyrosetta-distributed](https://pypi.org/project/pyrosetta-distributed/) package (for the `pyrosetta.distributed` framework dependencies) must be installed in any existing virtual environment, and that virtual environment's python interpreter used to run the script. If extracting from a `.pkl_pose`, `.pkl_pose.bz2`, `.b64_pose`, `.b64_pose.bz2`, `.init` or `.init.bz2` file, the PyRosetta build signature _must be identical_ to that used to save the original decoy file or initialization file, otherwise an exception or segmentation fault may occur.
 >
 > Also note the printed GitHub commit SHA1 required to [reproduce the PyRosettaCluster simulation](#clone-original-repository)!

@@ -168,7 +168,7 @@ Run `python recreate_env.py` to recreate the virtual environment.
 > This script runs a subprocess with one of the following commands:<br>
 > - `conda env create ...`: when using the `conda` environment manager<br>
 > - `mamba env create ...`: when using the `mamba` environment manager<br>
-> - `uv pip sync ...`: when using the `uv` environment manager<br>
+> - `uv sync ...`: when using the `uv` environment manager<br>
 > - `pixi install ...`: when using the `pixi` environment manager<br>
 > Installing certain packages may not be secure, so please only run with an input environment file you trust!<br>
 > Learn more about [PyPI security](https://pypi.org/security) and [conda security](https://www.anaconda.com/docs/reference/security).
@@ -240,3 +240,4 @@ if __name__ == "__main__":



+
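
The copy/paste route described in the README note on extraction method #1 can also be scripted. The following is a minimal sketch, not part of this commit: it assumes a full record has already been loaded into a Python `dict`, and the helper name `write_env_file` and the `ENV_FILE_NAMES` mapping are hypothetical names introduced here for illustration. The file names mirror the README table as it stands after this change.

```python
from pathlib import Path

# File names taken from the README table (values after this commit).
ENV_FILE_NAMES = {
    "pixi": "pixi.lock",
    "uv": "uv.lock",
    "conda": "environment.yml",
    "mamba": "environment.yml",
}


def write_env_file(record: dict, manager: str, out_dir: str) -> Path:
    """Write the environment string of a full record into `out_dir`.

    Hypothetical helper: `record` is assumed to be an already-loaded
    PyRosettaCluster full record (a nested dict).
    """
    if manager not in ENV_FILE_NAMES:
        raise ValueError(f"Unsupported environment manager: {manager!r}")
    # The environment file string lives under this nested key.
    env_text = record["instance"]["environment"]
    target_dir = Path(out_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / ENV_FILE_NAMES[manager]
    target.write_text(env_text)
    return target
```

The same record also carries `record["instance"]["sha1"]`, which the README points to for reproducing the simulation; the sketch leaves that value untouched.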

pyrosettacluster/dump_env_file.py

Lines changed: 28 additions & 9 deletions
@@ -40,6 +40,7 @@ def main(
     env_dir: Optional[str],
     pyrosetta_init_flags: Optional[str],
     pandas_secure: bool,
+    pyarrow_secure: bool,
 ) -> None:
     """
     Dump the PyRosettaCluster environment file(s) based on metadata from a PyRosettaCluster result.
@@ -49,16 +50,20 @@ def main(
         and input_file.endswith((".pkl_pose", ".pkl_pose.bz2", ".b64_pose", ".b64_pose.bz2"))
     ):
         if pyrosetta_init_flags:
-            pyrosetta.distributed.init(pyrosetta_init_flags)
+            pyrosetta.init(options="", extra_options=pyrosetta_init_flags, silent=True)
         else:
-            pyrosetta.distributed.init()
+            pyrosetta.init(options="", extra_options="-run:constant_seed 1 -out:level 200", silent=True)

     if (
         isinstance(scorefile, str)
         and not scorefile.endswith(".json")
     ):
         if pandas_secure:
             pyrosetta.secure_unpickle.add_secure_package("pandas")
+            print(f"[INFO] Added 'pandas' as a secure package to the PyRosetta unpickle-allowed list.")
+            if pyarrow_secure:
+                pyrosetta.secure_unpickle.add_secure_package("pyarrow")
+                print(f"[INFO] Added 'pyarrow' as a secure package to the PyRosetta unpickle-allowed list.")
         else:
             raise RuntimeError(
                 "Please also pass the `--pandas_secure` flag to retrieve data from a `pandas.DataFrame`. "
@@ -188,7 +193,10 @@ def ensure_env_dir(path: Optional[str]) -> str:
             "Path to a PyRosettaCluster output decoy file (a '.pdb', '.pdb.bz2', '.pkl_pose', "
             "'.pkl_pose.bz2', '.b64_pose', or '.b64_pose.bz2' file) or an output PyRosetta "
             "initialization file ('.init' or '.init.bz2' file). If provided, the `--scorefile` "
-            "and `--decoy_name` flags are ignored."
+            "and `--decoy_name` flags are ignored. If using a '.pkl_pose', '.pkl_pose.bz2', "
+            "'.b64_pose', '.b64_pose.bz2', '.init' or '.init.bz2' file, please ensure that "
+            "the current PyRosetta version is identical to that used to save the decoy file "
+            "or initialization file."
         ),
     )

@@ -262,15 +270,26 @@ def ensure_env_dir(path: Optional[str]) -> str:
             "comes from a trusted source."
         ),
     )
+    parser.add_argument(
+        "--pyarrow_secure",
+        action="store_true",
+        default=False,
+        help=(
+            "Allow loading a pickled `pandas.DataFrame` scorefile with PyArrow-backed datatypes, "
+            "which may be required if the scorefile was saved with `pandas` version `>=3.0.0`. "
+            "This flag only has an effect if the `--pandas_secure` flag is also passed."
+        ),
+    )

     args = parser.parse_args()
     args.env_dir = ensure_env_dir(args.env_dir)

     main(
-        input_file=args.input_file,
-        scorefile=args.scorefile,
-        decoy_name=args.decoy_name,
-        env_dir=args.env_dir,
-        pyrosetta_init_flags=args.pyrosetta_init_flags,
-        pandas_secure=args.pandas_secure,
+        args.input_file,
+        args.scorefile,
+        args.decoy_name,
+        args.env_dir,
+        args.pyrosetta_init_flags,
+        args.pandas_secure,
+        args.pyarrow_secure,
     )
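
The help text for `--pyarrow_secure` says the flag only takes effect alongside `--pandas_secure`. A minimal, PyRosetta-free sketch of that dependency follows. It is not the script's actual code: `build_parser` and `packages_to_allow` are hypothetical names, the error message is a placeholder, and the function only models the branch taken for a pickled (non-JSON) scorefile.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    """Parser with only the two security flags discussed in this diff."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--pandas_secure", action="store_true", default=False)
    parser.add_argument("--pyarrow_secure", action="store_true", default=False)
    return parser


def packages_to_allow(pandas_secure: bool, pyarrow_secure: bool) -> List[str]:
    """Return the packages that would be added to the unpickle-allowed list.

    Hypothetical stand-in for the pickled-scorefile branch: 'pyarrow' is
    only considered once 'pandas' has been explicitly allowed.
    """
    if not pandas_secure:
        # Placeholder message; the real script raises its own RuntimeError text.
        raise RuntimeError("--pandas_secure is required to load a pickled scorefile")
    packages = ["pandas"]
    if pyarrow_secure:
        packages.append("pyarrow")
    return packages
```

Passing `--pyarrow_secure` alone therefore still ends in the `RuntimeError`, which matches the nesting of the `if pyarrow_secure:` check inside the `if pandas_secure:` branch.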

pyrosettacluster/recreate_env.py

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@
 *Warning*: This script runs a subprocess with one of the following commands:
 - `conda env create ...`: when 'conda' is an executable
 - `mamba env create ...`: when 'mamba' is an executable
-- `uv pip sync ...`: when 'uv' is an executable
+- `uv sync ...`: when 'uv' is an executable
 - `pixi install ...`: when 'pixi' is an executable
 Installing certain packages may not be secure, so please only run with input files you trust.
 Learn more about PyPI security at <https://pypi.org/security> and conda security
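
The docstring ties each command to whether the corresponding tool is an executable. One way to picture that check is sketched below. This is an assumption-laden illustration, not the logic in `recreate_env.py`: the `COMMAND_PREFIXES` mapping and `available_command_prefixes` function are hypothetical, only the command prefixes come from the docstring, and the elided arguments (`...`) are deliberately left out.

```python
import shutil
from typing import Callable, Dict, List, Optional

# Command prefixes as listed in the docstring (after this commit).
# The arguments elided with "..." in the docstring are not reproduced here.
COMMAND_PREFIXES: Dict[str, List[str]] = {
    "conda": ["conda", "env", "create"],
    "mamba": ["mamba", "env", "create"],
    "uv": ["uv", "sync"],
    "pixi": ["pixi", "install"],
}


def available_command_prefixes(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Dict[str, List[str]]:
    """Return the command prefixes whose executable is found on PATH.

    `which` is injectable so the lookup can be exercised without any of
    the tools installed.
    """
    return {
        manager: prefix
        for manager, prefix in COMMAND_PREFIXES.items()
        if which(manager) is not None
    }
```

How the real script chooses between several installed managers is not shown in this diff, so the sketch returns every match rather than picking one.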
