The AKOYA pipeline automates low-level data processing for spatial proteomics. It takes a single multichannel image as input and performs image preprocessing, segmentation and intensity extraction to determine the presence or absence of marker genes in each segmented cell. It is built mostly on the spatialproteomics_cellgeni package, a version of the original package edited to work with large images (original package: spatialproteomics). As output, the pipeline can produce (i) the original spatialproteomics object in zarr xarray format; (ii) a set of CSV files with analysis tables (such as cell positions, average intensity per cell and binary marker presence); (iii) an AnnData file in which the X matrix holds the average intensity of each marker gene per cell. Examples of output files and how to open them are in the notebook open_output_files.ipynb.
Create the conda environment from the YAML file and activate it:
conda env create -f environment.yml
conda activate sp_env
Install the spatialproteomics package from the cellgeni fork:
python -m pip install "git+https://github.com/cellgeni/spatialproteomics.git"
All pipeline parameters, together with the input/output paths, should be specified in one configuration file (see conf_AKOYA.yaml for an example). The configuration parameters are described below:
image_path (str) - Path to the multiplex image file (e.g. .tif / .qptiff) to be processed
list_of_channels (list[str]) - Channel names in the same order as the channel axis of the loaded image
channel_for_segmentation (str) - Which channel to use for StarDist segmentation. Typically the DAPI channel name
list_of_markers (list[str]) - Marker channels used to compute % positive pixels and generate binary labels / cell-type-like labels via thresholding.
crop_x (list[int, int]) - X-range to crop as [x_start, x_end].
crop_y (list[int, int]) - Y-range to crop as [y_start, y_end].
segmentation_label_expansion (int or falsy) - If truthy, expands segmentation labels by this radius (pixels) using expand_segmentation
min_area (int or falsy) - If truthy, filters out segmented objects with area <= min_area.
max_area (int or falsy) - If truthy, filters out segmented objects with area >= max_area.
save_intermediate_plots (bool) - Whether to save intermediate QC plots and ROI snapshots.
number_intermediate_plots (int) - Number of random ROIs (subimages) to sample for intermediate plotting.
size_intermediate_plots (int) - ROI size in pixels (square). Each ROI is size_intermediate_plots × size_intermediate_plots.
list_of_genes_intermediate_plots (list[str]) - Channel names to display in intermediate ROI plots (alongside DAPI).
save_individual_marker_presence_plots (bool) - Whether to plot marker-vs-celltype (binary label) maps.
fraction_of_positive_pixels (float) - Threshold applied to the _percentage_positive layer for each marker. The same threshold is applied to all channels.
output_dir (str) - Path to directory where results and plots are written.
list_output_formats (list[str]) - Which outputs to save. Supported values in this script: ["zarr", "h5ad", "csv"]
save_intermediate_zarr (bool) - If True, saves intermediate sp_object snapshots to output_dir/sp_object.zarr after key steps.
normalise_intensity (bool) - If True, performs z-score normalization per channel across cells and stores:
pixelsize (float or null, optional) - Microns per full-resolution pixel. If provided, it is used only when creating the AnnData h5ad object.
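Because several of the parameters above must agree with each other, a quick consistency check before submitting a long job can save time. The following is a minimal sketch, not part of the pipeline: `check_config` is a hypothetical helper, the dictionary holds only a subset of the documented keys, and every value (paths, channel names, numbers) is a placeholder. It also assumes that both crop ranges are given and start at a non-negative pixel index.

```python
# Hypothetical configuration mirroring some of the documented keys.
# All values are placeholders, not recommended settings.
config = {
    "image_path": "/path/to/image.qptiff",
    "list_of_channels": ["DAPI", "CD3", "CD8"],
    "channel_for_segmentation": "DAPI",
    "list_of_markers": ["CD3", "CD8"],
    "crop_x": [0, 2000],
    "crop_y": [0, 2000],
    "min_area": 20,
    "max_area": 2000,
    "output_dir": "/path/to/output",
    "list_output_formats": ["zarr", "h5ad", "csv"],
    "pixelsize": None,
}

# Output formats listed as supported in the parameter description.
SUPPORTED_FORMATS = {"zarr", "h5ad", "csv"}


def check_config(cfg):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    channels = cfg.get("list_of_channels", [])

    # The segmentation channel and all markers must be named in the channel list.
    if cfg.get("channel_for_segmentation") not in channels:
        problems.append("channel_for_segmentation is not in list_of_channels")
    missing = [m for m in cfg.get("list_of_markers", []) if m not in channels]
    if missing:
        problems.append(f"markers not in list_of_channels: {missing}")

    # Crop ranges are [start, end] pairs (assumed here: 0 <= start < end).
    for key in ("crop_x", "crop_y"):
        rng = cfg.get(key)
        if not (isinstance(rng, list) and len(rng) == 2 and 0 <= rng[0] < rng[1]):
            problems.append(f"{key} must be [start, end] with 0 <= start < end")

    # Only the documented output formats are accepted.
    unknown = set(cfg.get("list_output_formats", [])) - SUPPORTED_FORMATS
    if unknown:
        problems.append(f"unsupported output formats: {sorted(unknown)}")

    # Area filters are skipped when falsy; if both are set they must leave a gap.
    min_area, max_area = cfg.get("min_area"), cfg.get("max_area")
    if min_area and max_area and min_area >= max_area:
        problems.append("min_area must be smaller than max_area")

    return problems


for problem in check_config(config):
    print("config problem:", problem)
```

In practice the dictionary would be read from the YAML configuration file rather than written by hand; the checks themselves stay the same.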
Depending on image size, the pipeline requires a significant amount of memory; for a full-tissue crop (an image of ~20k x 20k pixels with 60 channels), 200 GB of memory or more is recommended. An example submission script can be found in submit_AKOYA_pipeline.sh. A job can then be submitted simply as:
bsub < submit_AKOYA_pipeline.sh
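As a rough illustration of the memory recommendation above, the size of the raw pixel array alone can be estimated from the image dimensions. This is a back-of-the-envelope sketch: `raw_array_gb` is a hypothetical helper, the data types listed are assumptions, and the pipeline's actual peak usage (intermediate copies, segmentation masks, derived layers) is not modelled here.

```python
def raw_array_gb(height, width, n_channels, bytes_per_value):
    """Size in GB (10**9 bytes) of a dense height x width x channels array."""
    return height * width * n_channels * bytes_per_value / 1e9


# Assumed storage types; which one applies depends on the image and processing step.
for name, nbytes in [("uint16", 2), ("float32", 4), ("float64", 8)]:
    print(f"{name}: {raw_array_gb(20_000, 20_000, 60, nbytes):.0f} GB")
# uint16: 48 GB
# float32: 96 GB
# float64: 192 GB
```

Even a single floating-point copy of the full image can therefore take a large share of the recommended memory, which is why smaller crops via crop_x and crop_y need far less.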
To run individual pipeline steps separately (such as image preprocessing, segmentation or intensity extraction), use the notebook AKOYA_analysis_steps.ipynb as an example. Note that the notebook uses only some of the functionality available in spatialproteomics; to find out more about other options for image preprocessing, segmentation, plotting and cell typing, see the spatialproteomics documentation.