05-create-profiles.md
21 additions & 13 deletions
@@ -73,6 +73,8 @@ cd pycytominer
73
73
python3 -m pip install -e .
74
74
```
75
75
76
+
Note that if your system does not already have `cytominer-database` installed, you can install it at the same time as pycytominer by changing the final command above to `python3 -m pip install -e .[collate]`
77
+
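Whether the `[collate]` extra is needed depends on whether `cytominer-database` is already present. The following is a minimal Python sketch of that check. It assumes the distribution is registered under the name `cytominer-database`; the helper name `pip_install_target` is hypothetical and is not part of pycytominer.

```python
from importlib import metadata


def pip_install_target(dist_name: str = "cytominer-database") -> str:
    """Return the pip target to use from inside the pycytominer checkout."""
    try:
        metadata.version(dist_name)  # raises if the distribution is absent
    except metadata.PackageNotFoundError:
        return ".[collate]"  # install cytominer-database alongside pycytominer
    return "."  # already installed, so a plain editable install is enough


if __name__ == "__main__":
    target = pip_install_target()
    # Quote the target: some shells (e.g. zsh) treat square brackets as globs.
    print(f'python3 -m pip install -e "{target}"')
```

Run it with the same `python3` interpreter you intend to install into, since the check only sees packages visible to that interpreter.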
76
78
The command below first calls `cytominer-database ingest` to create the SQLite backend, and then pycytominer's `aggregate_profiles` to create per-well profiles. Once complete, all files are uploaded to S3 and the local cache is deleted. This step takes several hours, but metadata creation and GitHub setup can be done in the meantime.
77
79
78
80
[collate.py](https://github.com/cytomining/pycytominer/blob/master/pycytominer/cyto_utils/collate.py) ingests and indexes the database. [collate_cmd.py](https://github.com/cytomining/pycytominer/blob/master/pycytominer/cyto_utils/collate_cmd.py) exposes this functionality to the command line.
`collate_cmd.py` does not recreate the SQLite backend if it already exists in the local cache. Add the `--overwrite` flag to recreate it.
101
+
If your SQLite creation succeeded but you ran into issues during aggregation, rerunning with `--aggregate-only` will allow you to rerun just that sub-step.
95
102
```
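The choice between `--overwrite` and `--aggregate-only` can be sketched as a small decision helper. This is an illustration only: the function name `rerun_flags` and its two boolean inputs are hypothetical, and the sketch assumes the two flags are used independently, as the note describes.

```python
from typing import List


def rerun_flags(sqlite_in_cache: bool, recreate_sqlite: bool = False) -> List[str]:
    """Pick the extra collate_cmd flag for a rerun (hypothetical helper)."""
    if not sqlite_in_cache:
        # Nothing cached yet: a normal run creates the SQLite backend.
        return []
    if recreate_sqlite:
        # A cached backend exists but should be rebuilt from scratch.
        return ["--overwrite"]
    # The backend is fine and only the aggregation sub-step needs rerunning.
    return ["--aggregate-only"]


if __name__ == "__main__":
    print(rerun_flags(sqlite_in_cache=True))
```

Append the returned flag, if any, to the `collate_cmd.py` invocation you already use for the plate.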
96
103
97
104
```{note}
98
-
`collate.py` does not recreate the SQLite backend if it already exists in the local cache. Add `--overwrite` flag to recreate.
105
+
`collate_cmd` assumes that you will have image measurements in the following categories: Count and Threshold (both generated by any Identify modules present in the pipeline); Granularity and Texture (both generated if, in those modules, you use the "both" setting when asked for measurements of images, objects, or both); and ImageQuality (generated by the MeasureImageQuality module). If any or all of these are missing from your data, or you wish to add other image measurements, you can pass the `--image-feature-categories` flag to `collate_cmd`, e.g. `--image-feature-categories="Granularity,Texture,Count,Threshold"`. We currently believe these features provide value, but you can also skip adding them to profiles by passing `collate_cmd` the flag `--dont-add-image-features`.
99
106
```
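One way to decide what to pass is to look at the column headers of an `Image.csv` file and keep only the categories that actually occur. The sketch below assumes that image measurement columns are named `<Category>_<rest>` (for example `Count_Cells`), and that the categories of interest are the ones named in the note above. The helper `image_feature_flag` and the constant `DEFAULT_CATEGORIES` are hypothetical names, not part of pycytominer.

```python
from typing import Iterable, Sequence

# Assumption: the categories named in the note above.
DEFAULT_CATEGORIES = ("Count", "Threshold", "Granularity", "Texture", "ImageQuality")


def image_feature_flag(
    image_csv_columns: Iterable[str],
    wanted: Sequence[str] = DEFAULT_CATEGORIES,
) -> str:
    """Build the collate_cmd flag matching the categories present in the data."""
    # The category is the text before the first underscore in each column name.
    present = {column.split("_", 1)[0] for column in image_csv_columns}
    keep = [category for category in wanted if category in present]
    if not keep:
        # None of the wanted categories exist, so skip image features entirely.
        return "--dont-add-image-features"
    return '--image-feature-categories="' + ",".join(keep) + '"'


if __name__ == "__main__":
    columns = ["ImageNumber", "Count_Cells", "Threshold_FinalThreshold_Nuclei"]
    print(image_feature_flag(columns))
```

If every wanted category is present, the resulting flag may be redundant with the defaults of `collate_cmd`, which should be harmless.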
100
107
101
108
```{note}
102
-
or pipelines that use FlagImage to skip the measurements modules if the image failed QC, the failed images will have Image.csv files with fewer columns that the rest (because columns corresponding to aggregated measurements will be absent). The ingest command will show a warning related to sqlite: `expected X columns but found Y - filling the rest with NULL`. This is expected behavior.
109
+
In pipelines that use FlagImage to skip the measurement modules if the image failed QC, the failed images will have Image.csv files with fewer columns than the rest (because columns corresponding to aggregated measurements will be absent). The ingest command will show a warning related to SQLite: `expected X columns but found Y - filling the rest with NULL`. This is expected behavior.
103
110
```
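To confirm that the warning comes from QC-failed images rather than from a real problem, you can compare header widths across the `Image.csv` files. This is a minimal sketch that assumes each file has a single header row; the helper name `short_image_csvs` is hypothetical.

```python
import csv
from typing import Dict, Iterable, List


def short_image_csvs(paths: Iterable[str]) -> List[str]:
    """Return the Image.csv paths whose header has fewer columns than the widest one."""
    widths: Dict[str, int] = {}
    for path in paths:
        with open(path, newline="") as handle:
            header = next(csv.reader(handle), [])  # empty file -> zero columns
        widths[path] = len(header)
    if not widths:
        return []
    full_width = max(widths.values())
    # Files narrower than the widest header are the ones padded with NULL on ingest.
    return [path for path, width in widths.items() if width < full_width]
```

The files it returns should correspond to the images flagged by your QC step. If they do not, the missing columns may have another cause worth checking.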
104
111
105
112
```{note}
@@ -180,8 +187,6 @@ Once and only once - fork the [profiling recipe](https://github.com/cytomining/p
180
187
181
188
Once per new PROJECT, not new batch - make a copy of the [template repository](https://github.com/cytomining/profiling-template) into your preferred organization with a project name that is similar OR identical to its project tag on S3 and elsewhere.
182
189
183
-
Once per new PROJECT, not new batch - make a copy of the [template repository](https://github.com/cytomining/profiling-template) into your preferred organization with a project name that is similar OR identical to its project tag on S3 and elsewhere.
184
-
185
190
## Make Profiles
186
191
187
192
### Optional - set up compute environment
@@ -285,7 +290,7 @@ This needs to happen once per project, not per batch.
285
290
Skip this step if not using DVC.
286
291
```
287
292
# Navigate
288
-
cd ~/work/projects/${PROJECT_NAME}/workspace/software/${DATA}/profiling-recipe
293
+
cd ~/work/projects/${PROJECT_NAME}/workspace/software/${DATA}
### If a first batch in this project, create the necessary directories
303
+
```{note}
304
+
If you have multiple AWS profiles on your machine and do not want to use the default one for DVC, you can specify which profile to use by running `dvc remote modify S3storage profile PROFILE_NAME` at any point between adding the remote and performing the final DVC push.
300
305
```
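If you script the DVC setup, the profile command from the note can be assembled as an argument list. This is a sketch only: the helper name `dvc_profile_command` is hypothetical, and the remote name `S3storage` is the one used in the note.

```python
import shlex
from typing import List


def dvc_profile_command(profile_name: str, remote: str = "S3storage") -> List[str]:
    """Build the argument list for pointing a DVC remote at a named AWS profile."""
    return ["dvc", "remote", "modify", remote, "profile", profile_name]


if __name__ == "__main__":
    # PROFILE_NAME is a placeholder; substitute a profile from your AWS config.
    print(shlex.join(dvc_profile_command("PROFILE_NAME")))
```

The list can be passed to `subprocess.run(..., check=True)` from inside the repository, at any point after the remote has been added and before the final DVC push.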
301
306
302
307
### If a first batch in this project, create the necessary directories
If not using DVC but using a data repository, push all new files to GitHub as follows
368
+
369
+
370
+
```{note}
371
+
If you have multiple AWS profiles on your machine and do not want to use the default one for DVC, you can specify which profile to use by running `dvc remote modify S3storage profile PROFILE_NAME` at any point between adding the remote and performing the final DVC push.
364
372
```
365
373
366
374
If not using DVC but using a data repository, push all new files to GitHub as follows