Commit 8779141

Merge branch 'master' into pycytominer/issues/160
2 parents 9a82616 + f485f11 commit 8779141

2 files changed: +27 -20 lines changed

05-create-profiles.md

Lines changed: 21 additions & 13 deletions
@@ -73,6 +73,8 @@ cd pycytominer
 python3 -m pip install -e .
 ```

+Note that if your system does not already have `cytominer-database` installed, you can install it at the same time as pycytominer by changing the final command above to `python3 -m pip install -e .[collate]`
+
 The command below first calls `cytominer-database ingest` to create the SQLite backend, and then pycytominer's `aggregate_profiles` to create per-well profiles. Once complete, all files are uploaded to S3 and the local cache are deleted. This step takes several hours, but metadata creation and GitHub setup can be done in this time.

 [collate.py](https://github.com/cytomining/pycytominer/blob/master/pycytominer/cyto_utils/collate.py) ingests and indexes the database. [collate_command.py](https://github.com/cytomining/pycytominer/blob/master/pycytominer/cyto_utils/collate_cmd.py) exposes this functionality to the command line.
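The install note added in this hunk passes `.[collate]` to pip without quotes. Because `[...]` is a shell glob, the argument can change before pip ever sees it:

- bash substitutes any file name that the pattern matches.
- zsh, with its default `nomatch` behavior, aborts with `no matches found` when nothing matches.

Writing the command as `python3 -m pip install -e ".[collate]"` avoids both problems. The throwaway demo below reproduces the substitution case. The stray file named `.c` is contrived purely to trigger a match.

```shell
#!/bin/sh
# Demonstrates how an unquoted ".[collate]" can be rewritten by shell globbing.
# The scratch directory and the stray ".c" file are contrived for illustration.
set -e

demo_dir=$(mktemp -d)
touch "$demo_dir/.c"   # a literal dot followed by one character from [collate]

# Unquoted: ".[collate]" is a glob pattern and matches the file ".c".
unquoted=$(cd "$demo_dir" && printf '%s' .[collate])
# Quoted: the argument reaches the command unchanged.
quoted=$(cd "$demo_dir" && printf '%s' ".[collate]")

echo "unquoted -> $unquoted"
echo "quoted   -> $quoted"

rm -rf "$demo_dir"
```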
@@ -89,17 +91,22 @@ parallel \
 --results ../../log/${BATCH_ID}/collate \
 --files \
 --keep-order \
-python3 pycytominer/cyto_utils/collate_cmd.py ${BATCH_ID} pycytominer/cyto_utils/ingest_config.ini {1} \
---temp ~/ebs_tmp \
---remote=s3://${BUCKET}/projects/${PROJECT_NAME}/workspace :::: ${PLATES}
+python3 pycytominer/cyto_utils/collate_cmd.py ${BATCH_ID} pycytominer/cyto_utils/database_config/ingest_config.ini {1} \
+--tmp-dir ~/ebs_tmp \
+--aws-remote=s3://${BUCKET}/projects/${PROJECT_NAME}/workspace :::: ${PLATES}
+```
+
+```{note}
+`collate_cmd.py` does not recreate the SQLite backend if it already exists in the local cache. Add `--overwrite` flag to recreate.
+If your SQLite creation succeeded but you ran into issues during aggregation, rerunning with `--aggregate-only` will allow you to rerun just that sub-step.
 ```

 ```{note}
-`collate.py` does not recreate the SQLite backend if it already exists in the local cache. Add `--overwrite` flag to recreate.
+`collate_cmd` assumes that you will have image measurements in the following categories - Count, Threshold (both generated by any Identify objects present in the module); Granularity, Threshold (both generated if in these modules you use the "both" setting when asked for measurements of images, objects, or both); and ImageQuality (generated by the MeasureImageQuality measurement). If any or all of these are missing in your data, or you wish to add other image measurements, you may pass in an `image-feature-categories` flag to `collate_cmd`: e.g. `--image-feature-categories="Granularity,Texture,Count,Threshold"` . We currently believe these features provide value, but you can also skip adding them to profiles by passing to `collate_cmd` the flag `--dont-add-image-features`.
 ```

 ```{note}
-or pipelines that use FlagImage to skip the measurements modules if the image failed QC, the failed images will have Image.csv files with fewer columns that the rest (because columns corresponding to aggregated measurements will be absent). The ingest command will show a warning related to sqlite: `expected X columns but found Y - filling the rest with NULL`. This is expected behavior.
+In pipelines that use FlagImage to skip the measurements modules if the image failed QC, the failed images will have Image.csv files with fewer columns that the rest (because columns corresponding to aggregated measurements will be absent). The ingest command will show a warning related to sqlite: `expected X columns but found Y - filling the rest with NULL`. This is expected behavior.
 ```

 ```{note}
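The notes added in this hunk describe two ways to rerun a single plate:

- `--overwrite` rebuilds an existing SQLite backend.
- `--aggregate-only` repeats only the aggregation step.

This is a minimal sketch that assumes the same positional arguments (batch, config, plate) and flags as the updated `parallel` call. A hypothetical `collate_one_plate` helper rebuilds that command for one plate and appends either flag. By default it only prints the command. The helper, its `DRY_RUN` switch and all example values are illustrative assumptions, not part of pycytominer.

```shell
#!/bin/sh
# Hypothetical dry-run helper: rebuild the collate command for ONE plate and
# append extra flags such as --aggregate-only or --overwrite.
# BATCH_ID, BUCKET, PROJECT_NAME and the plate name are placeholder values.
BATCH_ID=example_batch
BUCKET=example-bucket
PROJECT_NAME=example_project

collate_one_plate() {
  plate=$1
  shift
  # Positional arguments (batch, config, plate) follow the updated parallel call;
  # "$HOME/ebs_tmp" is the same directory as ~/ebs_tmp.
  set -- python3 pycytominer/cyto_utils/collate_cmd.py "$BATCH_ID" \
    pycytominer/cyto_utils/database_config/ingest_config.ini "$plate" \
    --tmp-dir "$HOME/ebs_tmp" \
    --aws-remote="s3://${BUCKET}/projects/${PROJECT_NAME}/workspace" \
    "$@"
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "$@"   # default: only print the command
  else
    "$@"        # DRY_RUN=0: actually run it
  fi
}

# SQLite backend already built, aggregation failed: rerun only aggregation.
collate_one_plate example_plate --aggregate-only
# Rebuild the SQLite backend even though it exists in the local cache.
collate_one_plate example_plate --overwrite
```

Printing by default lets you check the exact arguments for a plate before spending hours on a rerun. Setting `DRY_RUN=0` executes the command instead.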
@@ -180,8 +187,6 @@ Once and only once - fork the [profiling recipe](https://github.com/cytomining/p

 Once per new PROJECT, not new batch - make a copy of the [template repository](https://github.com/cytomining/profiling-template) into your preferred organization with a project name that is similar OR identical to its project tag on S3 and elsewhere.

-Once per new PROJECT, not new batch - make a copy of the [template repository](https://github.com/cytomining/profiling-template) into your preferred organization with a project name that is similar OR identical to its project tag on S3 and elsewhere.
-
 ## Make Profiles

 ### Optional - set up compute environment
@@ -285,7 +290,7 @@ This needs to happen once per project, not per batch.
 Skip this step if not using DVC.
 ```
 # Navigate
-cd ~/work/projects/${PROJECT_NAME}/workspace/software/${DATA}/profiling-recipe
+cd ~/work/projects/${PROJECT_NAME}/workspace/software/${DATA}
 # Initialize DVC
 dvc init
 # Set up remote storage
# Set up remote storage
@@ -295,8 +300,8 @@ git add .dvc/.gitignore .dvc/config
295300
git commit -m "Setup DVC"
296301
```
297302

298-
299-
### If a first batch in this project, create the necessary directories
303+
```{note}
304+
If you have multiple AWS profiles on your machine and do not want to use the default one for DVC, you can specify which profile to use by running `dvc remote modify S3storage profile PROFILE_NAME` at any point between adding the remote and performing the final DVC push.
300305
```
301306

302307
### If a first batch in this project, create the necessary directories
@@ -346,21 +351,24 @@ rsync -arzv --include="*/" --include="*.gz" --exclude "*" ../../backend/${BATCH_
 Especially for large number of plates, this will take some time. Output will be logged to the console as different steps proceed.

 ```
-python profiling-recipe/profiles/profiling_pipeline.py --config config_files/{$CONFIG_FILE}.yml
+python profiling-recipe/profiles/profiling_pipeline.py --config config_files/${CONFIG_FILE}.yml
 ```

 ### Push resulting files back up to GitHub
 If using a data repository, push the newly created profiles to DVC and the .dvc files and other files to GitHub as follows
 ```
 dvc add profiles/${BATCH} --recursive
 dvc push
-git add profiles/${BATCH}/*.dvc profiles/*.gitignore
+git add profiles/${BATCH}/*/*.dvc profiles/${BATCH}/*/*.gitignore
 git commit -m 'add profiles'
 git add *
 git commit -m 'add files made in profiling'
 git push
 ```
-If not using DVC but using a data repository, push all new files to GitHub as follows
+
+
+```{note}
+If you have multiple AWS profiles on your machine and do not want to use the default one for DVC, you can specify which profile to use by running `dvc remote modify S3storage profile PROFILE_NAME` at any point between adding the remote and performing the final DVC push.
 ```

 If not using DVC but using a data repository, push all new files to GitHub as follows
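Two of the changes in this hunk are shell fixes:

- `{$CONFIG_FILE}` is not a parameter expansion. The old command therefore pointed at a file whose name literally includes braces around the config name; `${CONFIG_FILE}` is the intended form.
- The `git add` pattern moved one directory deeper because `dvc add --recursive` writes a `.dvc` file next to each file it tracks.

The sketch below checks both fixes. It assumes, hypothetically, that each plate's profiles live in their own subdirectory under `profiles/${BATCH}/`. All names are placeholders.

```shell
#!/bin/sh
# Sketch of the two shell fixes in this hunk. Names (example_config,
# example_batch, plate_A) and the per-plate directory layout are hypothetical.
set -e

# 1. Brace expansion needs a comma or a range, so in bash the braces in
#    "{$CONFIG_FILE}" stay literal (POSIX sh has no brace expansion at all).
CONFIG_FILE=example_config
old_path=$(printf '%s' config_files/{$CONFIG_FILE}.yml)
new_path=$(printf '%s' config_files/${CONFIG_FILE}.yml)
echo "old: $old_path"
echo "new: $new_path"

# 2. Assumed layout: one subdirectory per plate, with the .dvc file that
#    `dvc add --recursive` writes sitting next to each data file.
BATCH=example_batch
workdir=$(mktemp -d)
mkdir -p "$workdir/profiles/$BATCH/plate_A"
touch "$workdir/profiles/$BATCH/plate_A/plate_A.csv.gz.dvc"

# Count how many existing files a pattern matches; an unmatched pattern is
# passed through literally, and the -e test filters it out.
count_matches() {
  n=0
  for f in "$@"; do
    if [ -e "$f" ]; then n=$((n + 1)); fi
  done
  echo "$n"
}

old_count=$(count_matches "$workdir/profiles/$BATCH"/*.dvc)
new_count=$(count_matches "$workdir/profiles/$BATCH"/*/*.dvc)
echo "old pattern matches: $old_count"
echo "new pattern matches: $new_count"

rm -rf "$workdir"
```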

06-appendix.md

Lines changed: 6 additions & 7 deletions
@@ -17,7 +17,7 @@ this handbook
 image data
 - The `illum` folder is identical to the `images` folder in terms
 of structure
-- `illum` is an output of the first stage of cell profiler
+- `illum` is an output of the first stage of CellProfiler
 pipeline that stores a function to adjust the plates in
 `images`
 - `workspace` also has subdirectories
@@ -87,12 +87,11 @@ this handbook
 ├── 2016_04_01_a549_48hr_batch1
 │ ├── illum
 │ │ └── SQ00015167
-│ │ ├── SQ00015167_IllumAGP.mat
-│ │ ├── SQ00015167_IllumDNA.mat
-│ │ ├── SQ00015167_IllumER.mat
-│ │ ├── SQ00015167_IllumMito.mat
-│ │ ├── SQ00015167_IllumRNA.mat
-│ │ └── SQ00015167.stderr
+│ │ ├── SQ00015167_IllumAGP.npy
+│ │ ├── SQ00015167_IllumDNA.npy
+│ │ ├── SQ00015167_IllumER.npy
+│ │ ├── SQ00015167_IllumMito.npy
+│ │ └── SQ00015167_IllumRNA.npy
 │ └── images
 │ └── SQ00015167__2016-04-21T03_34_00-Measurement1
 │ ├── Assaylayout
