
Using compartments to input CSVs with custom names #362

@GaiaGerbaka


I'm using CellProfiler to create Nuclei, Cells and Cytoplasm profiles, but I created each of these three CSVs using different CellProfiler methods. For example, I have CSVs named Nuclei_Cellpose3, Cells_Cellpose3 and Cytoplasm_Cellpose3. I was able to run the CytoTable convert() function on Nuclei.csv, Cells.csv and Cytoplasm.csv, and it produced the expected Parquet output with this code snippet:

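from cytotable import convert

# source_path is the folder containing Nuclei.csv, Cells.csv and Cytoplasm.csv
convert(
    source_path=source_path,
    source_datatype="csv",
    dest_path="cytotable",
    dest_datatype="parquet",
    concat=True,
    preset="cellprofiler_csv",
    no_sign_request=True,
)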
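Because the custom compartment names have to line up with the CSV file names, one quick check before calling convert() with `compartments` set is to confirm that each name matches exactly one file under `source_path`, and to see which files match none of them. The helper below is a hypothetical sketch (`match_compartment_files` is not a CytoTable function), and it assumes a compartment corresponds to any CSV whose file name contains the compartment string, compared case-insensitively.

```python
import tempfile
from pathlib import Path


def match_compartment_files(source_path, compartments):
    """Hypothetical pre-flight check (not a CytoTable function).

    Assumes a compartment corresponds to every CSV whose file name contains
    the compartment string, compared case-insensitively. Returns a dict of
    compartment -> matching file names, plus the CSVs no compartment matched.
    """
    csv_names = sorted(p.name for p in Path(source_path).rglob("*.csv"))
    matches = {
        compartment: [name for name in csv_names if compartment.lower() in name.lower()]
        for compartment in compartments
    }
    matched = {name for names in matches.values() for name in names}
    unmatched = [name for name in csv_names if name not in matched]
    return matches, unmatched


if __name__ == "__main__":
    # Demo with made-up files; Image.csv is an invented image-level table.
    with tempfile.TemporaryDirectory() as tmp:
        for name in ["Image.csv", "Nuclei_Cellpose3.csv", "Cells_Cellpose3.csv", "Cytoplasm_Cellpose3.csv"]:
            (Path(tmp) / name).write_text("ImageNumber,ObjectNumber\n1,1\n")
        matches, unmatched = match_compartment_files(
            tmp, ["Nuclei_Cellpose3", "Cells_Cellpose3", "Cytoplasm_Cellpose3"]
        )
        for compartment, names in matches.items():
            print(compartment, names)  # each compartment should list exactly one file
        print("unmatched:", unmatched)
```

In the demo, `Image.csv` is made up; it appears in the unmatched list because none of the three compartment names occur in its file name.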

To do the same for Nuclei_Cellpose3.csv, Cells_Cellpose3.csv and Cytoplasm_Cellpose3.csv, I wrote the following snippet:

convert(
    source_path=source_path,
    source_datatype="csv",
    dest_path="cytotable_cellpose",
    dest_datatype="parquet",
    compartments=["Nuclei_Cellpose3", "Cells_Cellpose3", "Cytoplasm_Cellpose3"],
    join=True,
    joins="ImageNumber,ObjectNumber",
    page_keys={
        'join': 'ImageNumber', 
        'Cells_Cellpose3': 'ObjectNumber', 
        'Nuclei_Cellpose3': 'ObjectNumber', 
        'Cytoplasm_Cellpose3': 'ObjectNumber'
    },
    preset=None
)

but I get this error:

CytoTableException: No matching key found in page_keys for source_group_name: all_files.csv. Please include a pagination key based on a column name from the table.
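The error message suggests that each source group (here one named `all_files.csv`) is looked up against the keys of `page_keys`. The sketch below mimics that lookup under an assumed matching rule: an entry applies when its key, lowercased, occurs inside the lowercased source group name, and the first such entry wins. `find_page_key` and `PageKeyError` are hypothetical names for illustration, not CytoTable's code.

```python
class PageKeyError(Exception):
    """Hypothetical stand-in for the CytoTableException seen above."""


def find_page_key(page_keys, source_group_name):
    """Return the pagination column for one source group.

    Assumed matching rule (an illustration, not CytoTable's code): the first
    page_keys entry whose lowercased key occurs inside the lowercased source
    group name wins.
    """
    page_key = next(
        (
            column
            for key, column in page_keys.items()
            if key.lower() in source_group_name.lower()
        ),
        None,
    )
    if page_key is None:
        raise PageKeyError(
            f"No matching key found in page_keys for source_group_name: {source_group_name}"
        )
    return page_key


# page_keys as passed in the second snippet
page_keys = {
    "join": "ImageNumber",
    "Cells_Cellpose3": "ObjectNumber",
    "Nuclei_Cellpose3": "ObjectNumber",
    "Cytoplasm_Cellpose3": "ObjectNumber",
}

print(find_page_key(page_keys, "Nuclei_Cellpose3.csv"))  # ObjectNumber
try:
    find_page_key(page_keys, "all_files.csv")
except PageKeyError as err:
    print(err)
```

Under this assumed rule, none of the keys from the second snippet occur in `all_files.csv`, which would be consistent with the lookup failing for that group, while `Nuclei_Cellpose3.csv` resolves to `ObjectNumber`.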

Then I tried adding a key for all_files.csv to the code above, but that threw a SQL error, so I also changed the join value in page_keys as follows:

convert(
    source_path=source_path,
    source_datatype="csv",
    dest_path="cytotable_cellpose",
    dest_datatype="parquet",
    compartments=["Nuclei_Cellpose3", "Cells_Cellpose3", "Cytoplasm_Cellpose3"],
    join=True,
    joins="ImageNumber,ObjectNumber",
    page_keys={
        'image': 'ImageNumber',
        'Cells_Cellpose3': 'ObjectNumber',
        'Nuclei_Cellpose3': 'ObjectNumber',
        'Cytoplasm_Cellpose3': 'ObjectNumber',
        'join': 'Cytoplasm_Number_Object_Number',
        'all_files.csv': 'ImageNumber'
    },
    chunk_size=10000,
    sort_output=True,
    preset=None
)

This time the code runs without errors, but it never finishes and does not create any output files (I let it run for 1 hour, while the default run took about 3 minutes).
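If CytoTable's `joins` parameter expects a SQL query string (as its docstring appears to describe) rather than a comma-separated list of columns like `"ImageNumber,ObjectNumber"`, the join across the three compartments looks roughly like the query below. This is a sketch using the standard-library `sqlite3` module rather than DuckDB, on tiny made-up tables; the `Parent_Cells` and `Parent_Nuclei` column names are assumptions about how the cytoplasm table links to its parent objects.

```python
import sqlite3

# Hypothetical miniature compartment tables; real CellProfiler exports have
# many more columns, and the parent-column names here are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE nuclei (ImageNumber INTEGER, ObjectNumber INTEGER, Nuclei_Area REAL);
    CREATE TABLE cells (ImageNumber INTEGER, ObjectNumber INTEGER, Cells_Area REAL);
    CREATE TABLE cytoplasm (
        ImageNumber INTEGER, ObjectNumber INTEGER,
        Parent_Cells INTEGER, Parent_Nuclei INTEGER
    );
    INSERT INTO nuclei VALUES (1, 1, 50.0), (1, 2, 60.0), (2, 1, 55.0);
    INSERT INTO cells VALUES (1, 1, 200.0), (1, 2, 210.0), (2, 1, 190.0);
    INSERT INTO cytoplasm VALUES (1, 1, 1, 1), (1, 2, 2, 2), (2, 1, 1, 1);
    """
)

# ObjectNumber restarts at 1 in every image, so each join condition pairs it
# with ImageNumber; the cytoplasm table links cells and nuclei via its parent columns.
query = """
SELECT cyto.ImageNumber, cyto.ObjectNumber, cells.Cells_Area, nuclei.Nuclei_Area
FROM cytoplasm AS cyto
JOIN cells
  ON cells.ImageNumber = cyto.ImageNumber AND cells.ObjectNumber = cyto.Parent_Cells
JOIN nuclei
  ON nuclei.ImageNumber = cyto.ImageNumber AND nuclei.ObjectNumber = cyto.Parent_Nuclei
ORDER BY cyto.ImageNumber, cyto.ObjectNumber
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
conn.close()
```

In CytoTable the join presumably runs in DuckDB over its own intermediate tables, so the table references and column names (which may carry compartment prefixes) would differ; treat this only as an illustration of the join conditions.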


Labels: question (Further information is requested)
