FeatureRequest: Automated identification and removal of non-biologically variant features #452

@hwarden162

Description

Feature type

  • Add new functionality

  • Change existing functionality

General description of the proposed functionality

Some features measured during morphological profiling depend on the positioning or rotation of the microscope rather than on the biology of the sample. Simple examples are centroids and orientation measurements. Other examples include measurements on bounding boxes; the image below shows how the bounding box area of a cell changes under rotation of the microscope.

[Image: boundingbox]
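To make the bounding box example concrete, here is a minimal sketch that treats a cell as a simple rectangle (an assumption made purely for illustration) and computes the area of its axis-aligned bounding box at several rotation angles. The function name `bounding_box_area` is hypothetical and not part of any library.

```python
import math


def bounding_box_area(width, height, angle_deg):
    """Area of the axis-aligned bounding box of a width x height
    rectangle rotated by angle_deg degrees about its centre."""
    theta = math.radians(angle_deg)
    # Extent of the rotated rectangle along the image x and y axes.
    bb_width = abs(width * math.cos(theta)) + abs(height * math.sin(theta))
    bb_height = abs(width * math.sin(theta)) + abs(height * math.cos(theta))
    return bb_width * bb_height


# The object itself always has area 4.0; only its bounding box changes.
for angle in (0, 30, 45, 90):
    print(angle, round(bounding_box_area(4.0, 1.0, angle), 2))
```

For this 4 x 1 rectangle the bounding box area is 4 at 0 and 90 degrees but 12.5 at 45 degrees, even though the object is unchanged. That variation comes entirely from orientation, not from anything about the cell.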

Taking CellProfiler as an example, there are multiple measurements of this kind. When used for machine learning or statistical analysis, they introduce technical noise and can contribute to batch effects and data leakage.

Feature example

I have a trial solution that requires the user to specify which software generated their measurements. It then iterates over the feature names and matches them against manually identified patterns of non-biologically variant features. My solution extends feature_select like this:

from pycytominer import feature_select

non_variant = feature_select(normalized_df, operation="drop_non_bio_variant", drop_non_bio_variant_data_source="cellprofiler")
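The pattern matching step described above could look roughly like the following. This is a sketch only, not the trial implementation and not part of pycytominer: the function name `find_non_bio_variant_features`, the `NON_BIO_VARIANT_PATTERNS` table, and the specific regular expressions are all assumptions chosen for illustration, and a real list of patterns would need to be curated per data source.

```python
import re

# Hypothetical pattern table keyed by data source.
# These patterns are illustrative, not a vetted list.
NON_BIO_VARIANT_PATTERNS = {
    "cellprofiler": [
        r"_Location_",               # object positions in the image
        r"_AreaShape_Center_",       # centroid coordinates
        r"_AreaShape_Orientation$",  # angle relative to the image axes
        r"_AreaShape_BoundingBox",   # bounding box area and extents
    ],
}


def find_non_bio_variant_features(feature_names, data_source="cellprofiler"):
    """Return the feature names that match any pattern for data_source."""
    if data_source not in NON_BIO_VARIANT_PATTERNS:
        raise ValueError(f"No patterns defined for data source: {data_source}")
    compiled = [re.compile(p) for p in NON_BIO_VARIANT_PATTERNS[data_source]]
    return [
        name for name in feature_names
        if any(rx.search(name) for rx in compiled)
    ]


features = [
    "Cells_AreaShape_Area",
    "Cells_AreaShape_Orientation",
    "Cells_AreaShape_BoundingBoxArea",
    "Nuclei_Location_Center_X",
    "Cells_Intensity_MeanIntensity_DNA",
]
to_drop = find_non_bio_variant_features(features)
kept = [name for name in features if name not in to_drop]
print(kept)
```

With a pandas DataFrame, the matched columns could then be removed with `normalized_df.drop(columns=to_drop)`. Keeping the patterns in a table keyed by data source would let other profiling tools be supported by adding an entry rather than changing the matching logic.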

Alternative Solutions

No response

Additional information

No response
