-
Notifications
You must be signed in to change notification settings - Fork 20
DLRawTriggerReader #151
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
DLRawTriggerReader #151
Conversation
|
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
72afd7a to
52599e2
Compare
|
Hi @jbuces ! What is the status of this PR draft? Anything missing to convert to "ready for review", so @nietootein and me can have a look? |
|
Hi @TjarkMiener! The times for processing 50000 events with the tdscan trigger on the complete waveform (way higher for patches waveform) is of around 30 minutes. So I added to the code an option to read the tdscan mask from a precomputed h5 file. The problem of computing it on the fly would be for the training part, to visualise how TDSCAN works by creating batches with the dl1 it is more than fine. I was thinking in removing the tdscan from the PR, what do you thinnk @nietootein, @TjarkMiener? |
dl1_data_handler/image_mapper.py
Outdated
| f"HexagonalPatchMapper is only available for hexagonal pixel cameras. Pixel type of the selected camera is '{geometry.pix_type}'." | ||
| ) | ||
|
|
||
| path = files("dl1_data_handler.ressources").joinpath("newcam_sipm_patches.h5") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file needs a versioning and a proper naming structure. I'm proposing
triggerpatches_AdvCam_v1.h5
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you rename "UNKNOWN-7987PX" earlier to AdvCam, you could retrieve this file via f"{triggerpatches_{geometry.name}_v1}". Also the version should be a configuration. At this stage of the project we might want to experiment with different patches, so we need to be able to switch between different version without reinstalling the code.
dl1_data_handler/image_mapper.py
Outdated
|
|
||
| path = files("dl1_data_handler.ressources").joinpath("newcam_sipm_patches.h5") | ||
|
|
||
| if geometry.name == "UNKNOWN-7987PX": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the line from above should only be executed once you the AdvCam selected, i.e. move it here.
dl1_data_handler/image_mapper.py
Outdated
| self.cam_neighbor_array = np.full((num_pixels, 7), -1, dtype=int) | ||
| for i, neighbors in enumerate(neighbor_lists): | ||
| self.cam_neighbor_array[i, :len(neighbors)] = neighbors | ||
| print("Computed neighbor array for: ", geometry.name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be a debug logger. Check out the logging system please. Also it would be good to have a debug log before the calculation started, i.e. "Computing neighbor array for {geometry.name} ...". This might help for a better error trace log.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need also a test for the new hexagonal mapper. Please rebase to the main, there are tests for the different imagemapper.
dl1_data_handler/reader.py
Outdated
| quality_cuts = Bool( | ||
| default_value=True, | ||
| help=( | ||
| "Require quality cuts by ``TableQualityQuery``. False for nsb trigger." | ||
| ), | ||
| ).tag(config=True) | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TableQualityQuery should be empty if you do not want to pass any quality cuts. I know it is a quite convenient shortcut for you, but I would prefer to have properly handled by the configuration system. We should write example configs for the different use cases.
| # Register the destructor to close all open files properly | ||
| atexit.register(self.__destructor) | ||
| # Initialize the Table data quality query | ||
| self.quality_query = TableQualityQuery(parent=self) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comment above
dl1_data_handler/reader.py
Outdated
| return true_image | ||
|
|
||
|
|
||
| def apply_low_trigger(x, mapper, trigger_type, l1_settings, tdscan_settings): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
give x a proper naming, r0_event or just event?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also @burmist-git & Tanguy, please review this function as it is the python implementation of the TDSCAN. Many thanks in advance!
dl1_data_handler/reader.py
Outdated
| true_image = np.abs(np.array(sim_event["true_image"], dtype=int).reshape(-1, 1)) | ||
|
|
||
| return true_image |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could simply return np.abs(np.array(sim_event["true_image"], dtype=int).reshape(-1, 1)) without the variable. Why np.abs is needed here?
dl1_data_handler/reader.py
Outdated
| @abstractmethod | ||
| def _get_raw_example(self, batch) -> Table: | ||
| pass | ||
|
|
||
| @abstractmethod | ||
| def _add_tdscan_table(self,batch) -> Table: | ||
| pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are they abstractmethod if they are only used in the trigger subclass? I think they should be removed here and elsewhere (except in the trigger subclass).
TjarkMiener
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks so much @jbuces for all the work! Overall it looks already good. I sent some comments. Also can you do a proper rebase to the main please?
dl1_data_handler/reader.py
Outdated
| data_level = "r1" | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could use the enum for the datalevels from ctapipe. See here
dl1_data_handler/reader.py
Outdated
| ) | ||
|
|
||
| # Read the readout length from the first file | ||
| data_group = getattr(self.files[self.first_file].root, self.data_level) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why here we have self.data_level and above data_level = "r1"? I think each reader subclass should hold a datalevel attribute from the enum (see comment above). Then you can your desired string self.data_level.name.lower().
dl1_data_handler/reader.py
Outdated
| return bin_flowers | ||
|
|
||
|
|
||
| class DLRawTriggerReader(DLWaveformReader): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd remove Raw from the class name, since we do apply low-level fast calibration on the fly, like baseline subtraction.
dl1_data_handler/reader.py
Outdated
| ["patch", "sector", "camera"], | ||
| default_value = "camera", | ||
| help=( | ||
| "Set the number of pixels in the output vector, patch and sector only available for the Advanced SiPM camera. " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for the Advanced SiPM camera. -> for the Advanced SiPM camera (AdvCam).
dl1_data_handler/reader.py
Outdated
| add_dummy_channel = Bool( | ||
| default_value = False, | ||
| allow_none = False, | ||
| help=("Boolean variable to add or not an extra dummy dimension for the channel") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a little bit more information for this help. E.g. why this is needed/when the user needs to do this action oon swiitching the boolean on.
dl1_data_handler/reader.py
Outdated
| subtract_mean = Int( | ||
| default_value = None, | ||
| allow_none = True, | ||
| help=("Value to subtract to the waveform") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add more information here please. From where you calculate the mean? Do you calculate for each pixels or over the whole camera? Are you using pedestal events for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, seeing it now, that is one is a fix integer that you "know" from elsewhere. For later we can consider to have this value pixel-wise and extracted from pedestal events before the obs starts (à la cat-A caibration)
| input_trigger_files = List( | ||
| default_value = None, | ||
| allow_none = True, | ||
| help=( | ||
| "h5 files with the obs_id, event_id, tel_id, trigger_mask, low_trigger columns" | ||
| ), | ||
| ).tag(config=True) | ||
|
|
||
| trigger_cuts = Bool( | ||
| default_value = False, | ||
| allow_none = False, | ||
| help=("Boolean variable to filter the table with events with positive triggers") | ||
| ).tag(config=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How are those files and cuts produced?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggested to rename this file in a comment above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ressources -> resources in the folder structure
notebooks/test_r0_reader.ipynb
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for putting together this detailed notebook. Can you please tell me the file size of e.g. /lustre/ific.uv.es/ml/ucm147/datasets/TriggerlessData/gammas/corsika_run110.r0.h5? We can consider uploading it to the test data and run it in the CI as unittests.
dl1_data_handler/reader.py
Outdated
| return true_image | ||
|
|
||
|
|
||
| def apply_low_trigger(x, mapper, trigger_type, l1_settings, tdscan_settings): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also @burmist-git & Tanguy, please review this function as it is the python implementation of the TDSCAN. Many thanks in advance!
|
I think the implementation is correct. One small improvement: move start and end outside the loop for efficiency. This is the version I use in my trigger chain: def execute(self, wf):
num_pixels, num_frames = wf.shape
# Compute rolling sum using cumsum
start = np.arange(num_frames) - self.eps_t
end = np.arange(num_frames) + self.eps_t + 1
start = np.clip(start, 0, num_frames)
end = np.clip(end, 0, num_frames)
# Output waveform
wf_out = np.zeros((num_pixels, num_frames), dtype=np.float32)
# Precompute cumulative sum over time for each pixel
cumsum_wf = np.pad(np.cumsum(wf, axis=1), ((0, 0), (1, 0))) # pad to handle start_t = 0 cleanly
# iterate over each pixel
for i in range(num_pixels):
neighbors = self.eps_xy_neighbors[i]# ex : 2,1,3,0,6,4,5 -> central pixel + its neighbors here central is 0
if len(neighbors) == 0:
continue
# Get total values in the window for each time frame using broadcasting
total = cumsum_wf[neighbors][:, end] - cumsum_wf[neighbors][:, start] # shape: (num_neighbors, num_frames)
total_sum = np.sum(total, axis=0) # sum across neighbors # non weighted
wf_out[i] = (total_sum >= self.min_points).astype(np.float32)
return wf_out |
|
@jbuces can you please rebase to the current main. We pushed some changes to the CI. Thanks in advance! |
135b18c to
051fb84
Compare
| max_memory_gb = Float( | ||
| default_value=10, | ||
| allow_none=True, | ||
| help=( | ||
| "Maximum memory in GB that RebinMapper is allowed to allocate. " | ||
| "Set to None to disable memory checks. Default is 10 GB. " | ||
| "Note: RebinMapper uses approximately (image_shape * 10)^2 * image_shape^2 * 4 bytes." | ||
| ), | ||
| ).tag(config=True) | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this is deleted? Probably a rebase mistake? Several of the changes here should be included.
This PR allows the possibility to process r0 waveforms, and multiple forms of treating trigger patches of these raw waveforms.
Fixes #148 .
DLRawTriggerReader.pdf