rewrite after field test#3

Open
bodokaiser wants to merge 1 commit into main from rewrite

@bodokaiser bodokaiser commented Sep 22, 2025

Over the past few weeks we did intensive data evaluation and gained many insights, which this PR incorporates into pydatatom.

The most important learning:

We want the evaluation procedure to be much more flexible. The Pipeline concept has proven too rigid. Instead, we want small, composable helper functions that are easy to replace. The same is true for the built-in plotting utils: it would be convenient to have a set of ready-to-use plotting utils that still leave us free to change the plot, e.g., plot_xyz(fig, ax). We also want to move more of the evaluation process into the pandas domain. Finally, we need to speed up the histogram calculation by using multiprocessing/threading.
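A minimal sketch of how per-image histogram counts could be accumulated across a dataset with a thread pool. The names `image_counts` and `dataset_counts` are hypothetical and not part of pydatatom, and the images are assumed to be plain NumPy arrays. Whether threads give a real speedup depends on how much of the work releases the GIL, so a process pool may turn out to be the better fit.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def image_counts(image, bin_edges):
    """Histogram counts of all pixel values in one image (hypothetical helper)."""
    counts, _ = np.histogram(image, bins=bin_edges)
    return counts


def dataset_counts(images, bin_edges, max_workers=4):
    """Sum the per-image histogram counts over an iterable of images."""
    total = np.zeros(len(bin_edges) - 1, dtype=np.int64)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Each image is binned independently; the partial counts are added up.
        for counts in pool.map(lambda img: image_counts(img, bin_edges), images):
            total += counts
    return total


# Made-up example data: three small "images" with shifted pixel values.
images = [np.arange(16).reshape(4, 4) + k for k in range(3)]
bin_edges = np.linspace(0, 20, 5)
total_counts = dataset_counts(images, bin_edges)
```

Because `image_counts` is a plain function, it can be swapped for another per-image reduction (for example a mean) without touching the accumulation loop.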

Todos

  • remove Pipeline
  • allow thresholds and atom positions to be easily overwritten or modifiable
  • add ClickableImage to interactively select points on an image
  • add bbox_grid to fit a (frequency) grid to a bounding box
  • add helper methods to calculate mean and (histogram) counts across a Dataset
  • simplify pydatatom.analysis.threshold.gaussian
  • speedup pydatatom.analysis.threshold.gaussian with multithreading
  • add helper to visualize detected atoms
  • add helper to show detected atom pairs
  • add helper to show cropped patches
  • add helper to visualize histograms
  • allow histogram parameter boundaries to be overwritten (possibly also the percentile values)
  • add tests for bbox_grid for every combination of three points (there appears to be a bug when selecting the bottom left, top left and top right boundaries)
  • more elegant way to convert a Dataset to a pandas.DataFrame, e.g., pd.DataFrame(list(dataset)) (right now this is really slow)
  • remove CUDA dependency for pytorch (takes forever to install)
  • recipe for correct per-atom site statistics
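For the bbox_grid tests, one possible reading is that any three corners of the bounding box should yield the same grid. The sketch below is not the pydatatom implementation: `complete_corners` and `bbox_grid_sketch` are hypothetical names, the box is assumed to be a parallelogram, and the grid is assumed to be evenly spaced between the corners.

```python
from itertools import combinations

# Corner names in cyclic order around the box (labels only; the direction of
# the y axis in image coordinates does not matter for the arithmetic).
CORNERS = ("bottom_left", "top_left", "top_right", "bottom_right")


def complete_corners(known):
    """Return all four corners of a parallelogram given exactly three of them."""
    missing = [name for name in CORNERS if name not in known]
    if len(missing) != 1 or len(known) != 3:
        raise ValueError("expected exactly three distinct named corners")
    i = CORNERS.index(missing[0])
    prev_pt = known[CORNERS[(i - 1) % 4]]
    next_pt = known[CORNERS[(i + 1) % 4]]
    opposite = known[CORNERS[(i + 2) % 4]]
    corners = dict(known)
    # The diagonals of a parallelogram share a midpoint, so
    # missing + opposite == prev + next.
    corners[missing[0]] = (
        prev_pt[0] + next_pt[0] - opposite[0],
        prev_pt[1] + next_pt[1] - opposite[1],
    )
    return corners


def bbox_grid_sketch(known, n_cols, n_rows):
    """Evenly spaced n_cols x n_rows grid spanning the box, row by row."""
    corners = complete_corners(known)
    bl, tl, br = corners["bottom_left"], corners["top_left"], corners["bottom_right"]
    points = []
    for row in range(n_rows):
        t = row / (n_rows - 1) if n_rows > 1 else 0.0
        for col in range(n_cols):
            s = col / (n_cols - 1) if n_cols > 1 else 0.0
            x = bl[0] + s * (br[0] - bl[0]) + t * (tl[0] - bl[0])
            y = bl[1] + s * (br[1] - bl[1]) + t * (tl[1] - bl[1])
            points.append((x, y))
    return points


# Made-up rectangle; every choice of three corners should give the same grid.
rect = {"bottom_left": (0, 0), "top_left": (0, 10),
        "top_right": (20, 10), "bottom_right": (20, 0)}
grids = [bbox_grid_sketch({k: rect[k] for k in names}, n_cols=3, n_rows=2)
         for names in combinations(CORNERS, 3)]
```

A test along these lines would assert that all entries of `grids` are equal, which covers the bottom left, top left and top right case mentioned above.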

Simon's feedback on pandas column names:

filename <class 'str'>
filenumber <class 'int'>
ID <class 'str'> # Unique ID for image file (usually from date, run letters ("AA" etc.), number)
r0 <class 'numpy.float64'> # (region zero total count)
background <class 'numpy.float64'> # background region total count
region_sizes <class 'list'> # sizes (number of pixels) for regions r0, r1, r2 (these are separate from the array regions, not always used)
background_size <class 'numpy.float64'> # size of background region (number of pixels)
array_sizes <class 'numpy.ndarray'> # sizes (numbers of pixels) of each array field
array_sum <class 'numpy.ndarray'> # gross ("brutto") value of each array field (sum of all pixel values)
array_net <class 'numpy.ndarray'> # net value of each array field (= gross value - background calculated from background region)
array_counts <class 'numpy.ndarray'> # number of atoms in each array field
array_single <class 'numpy.ndarray'> # single atom in array field? (1 or 0)
array_multi <class 'numpy.ndarray'> # more than one atom in array field? (1 or 0)
a_0 <class 'numpy.float64'> # gross value of array field 0 (relic from the time when atom number was small...)
a_net_0 <class 'numpy.float64'> # net value of array field 0 (relic from the time when atom number was small...)
[...]
a_net_78 <class 'numpy.float64'>
a_79 <class 'numpy.float64'>
a_net_79 <class 'numpy.float64'>

analysis_time <class 'numpy.float64'> # Time stamp of analysis
variables_index_start <class 'numpy.float64'> # column index of the first variable column in the pandas array
variables_index_end <class 'numpy.float64'> # column index of the first column AFTER the variable columns in the pandas array
im_num <class 'numpy.float64'> # image number (from filename)
run <class 'numpy.float64'> # run number (from Artiq)
TC_systime_sec <class 'numpy.float64'> ...
f_blue_big_sideband <class 'numpy.float64'>

[... lots more variables coming from artiq...]

b_do_rydberg_section <class 'numpy.float64'>
cam__nImagesToCapture <class 'numpy.float64'>
cam__nCapturedImages <class 'numpy.float64'>
cam__roi_offset_x <class 'numpy.float64'>
cam__roi_offset_y <class 'numpy.float64'>
cam__take_first_image <class 'bool'>
cam_cam_infos <class 'pandas.core.series.Series'>
cam_exposureRange <class 'pandas.core.series.Series'>
cam_gainRange <class 'pandas.core.series.Series'>
cam_gain <class 'numpy.float64'>
cam_id <class 'NoneType'>
cam_available_im_modes <class 'pandas.core.series.Series'>
cam_exposure <class 'numpy.float64'>
cam_imaging_mode <class 'str'>
cam_triggerMode <class 'numpy.float64'>
cam__gotImage <class 'bool'>
crop <class 'pandas.core.series.Series'> # crop coordinates of the camera file
frame_idx <class 'numpy.float64'> # frame index (there is one pandas row per frame)
total <class 'numpy.float64'> # Total (sum of all pixels from roi)
total_net <class 'numpy.float64'> # net value (total minus calculated background)
array_total <class 'numpy.float64'> # Sum of all "array_sum" values
array_total_net <class 'numpy.float64'> # sum of all "array_net" values
array_total_counts <class 'numpy.float64'> # sum of all "array_counts" values
array_total_singles <class 'numpy.float64'> # sum of all "array_single" values
array_total_multis <class 'numpy.float64'> # sum of all "array_multi" values
r0_net <class 'numpy.float64'> # net value of region zero count
total_diff <class 'numpy.int64'> # obsolete
total_bin <class 'numpy.int64'>
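To illustrate how some of these columns relate to each other in the pandas domain, here is a small sketch with made-up numbers. It assumes that the background is scaled per pixel (background / background_size times the field size) before it is subtracted; the descriptions above do not spell out the exact formula, so treat that step as an assumption.

```python
import numpy as np
import pandas as pd

# Two hypothetical frames with three array fields each (made-up numbers).
records = [
    {"frame_idx": 0, "background": 200.0, "background_size": 100.0,
     "array_sizes": np.array([4, 4, 4]),
     "array_sum": np.array([108.0, 8.0, 208.0])},
    {"frame_idx": 1, "background": 100.0, "background_size": 100.0,
     "array_sizes": np.array([4, 4, 4]),
     "array_sum": np.array([104.0, 4.0, 4.0])},
]
frames = pd.DataFrame(records)  # one row per frame


def net_values(row):
    """Gross field values minus the background expected for each field.

    ASSUMPTION: the background is scaled by pixel count.
    """
    per_pixel = row["background"] / row["background_size"]
    return row["array_sum"] - per_pixel * row["array_sizes"]


frames["array_net"] = [net_values(row) for row in frames.to_dict("records")]
# The per-frame totals are plain sums over the array fields.
frames["array_total"] = frames["array_sum"].map(lambda a: float(np.sum(a)))
frames["array_total_net"] = frames["array_net"].map(lambda a: float(np.sum(a)))
```

Keeping the per-field values in array columns avoids the numbered a_0 ... a_79 columns, at the cost of object-dtype columns that pandas cannot vectorize over.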
