Link waveforms of same sort group in UnitWaveformFeaturesGroup #1501

Draft
samuelbray32 wants to merge 2 commits into master from merge_waveforms

samuelbray32 (Collaborator) commented Dec 22, 2025

Description

Resolves #1449

  • Adds the part table UnitWaveformFeaturesGroup.LinkedSorts
  • Implements UnitWaveformFeaturesGroup.fetch_data
    • After fetching times and waveforms for each merge_id
      • iterates through any entries for the group in `UnitWaveformFeaturesGroup.LinkedSorts`
      • concatenates data from the merge_ids in that entry
    • Called within ClusterlessDecodingV1, so no additional changes needed there
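
The linking step described above can be sketched as follows. This is a minimal, dependency-free illustration and not the PR's implementation: `merge_linked` and the sample IDs are hypothetical, and plain lists stand in for the NumPy arrays that the PR concatenates.

```python
def merge_linked(data, linked_ids):
    """Concatenate the sorts named in `linked_ids` and order spikes by time.

    `data` maps merge_id -> (spike_times, waveform_features), with one
    feature entry per spike. Both names are hypothetical.
    """
    pairs = []
    for merge_id in linked_ids:
        times, features = data[merge_id]
        pairs.extend(zip(times, features))
    # Sort on the spike time only, so features are never compared.
    pairs.sort(key=lambda pair: pair[0])
    merged_times = [t for t, _ in pairs]
    merged_features = [f for _, f in pairs]
    return merged_times, merged_features


data = {
    "run": ([0.1, 0.5], ["r0", "r1"]),
    "sleep": ([0.3, 0.9], ["s0", "s1"]),
}
times, features = merge_linked(data, ["run", "sleep"])
print(times)     # [0.1, 0.3, 0.5, 0.9]
print(features)  # ['r0', 's0', 'r1', 's1']
```

Sorting the (time, feature) pairs together keeps each feature aligned with its spike, which is the same purpose the shared sort index serves in the array-based code.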

Checklist:

  • NA If this PR should be accompanied by a release, I have updated the CITATION.cff
  • NA If this PR edits table definitions, I have included an alter snippet for release notes.
  • NA If this PR makes changes to position, I ran the relevant tests locally.
  • If this PR makes user-facing changes, I have added/edited docs/notebooks to reflect the changes
  • I have updated the CHANGELOG.md with PR number and description.

Copilot AI (Contributor) left a comment

Pull request overview

This PR adds functionality to link waveform features from the same sort group across different epochs/intervals, addressing issue #1449. The implementation allows users to combine spike sorting data from separate processing runs (e.g., run and sleep epochs) by concatenating spike times and waveform features for corresponding sort groups.

Changes:

  • Adds LinkedSorts part table to track which SpikeSortingMerge IDs should be concatenated
  • Implements fetch_data method to retrieve and merge linked sorts
  • Extends create_group to accept optional linked merge IDs during group creation
  • Updates fetch_spike_data to use the new fetch_data method

raise ValueError(
    f"Linked SpikeSortingMerge ID {merge_id} not found in "
    + "UnitFeatures table"
    + f" for group {key['waveform_features_group_name']}"

Copilot AI Feb 16, 2026

The error message construction on lines 141-143 uses key['waveform_features_group_name'] but this key might not exist in the key parameter, as the method accepts key: dict = dict() and it's not guaranteed that the key contains this field after restriction. Consider using group_key['waveform_features_group_name'] instead, which is fetched on line 107.

Suggested change
-     + f" for group {key['waveform_features_group_name']}"
+     + f" for group {group_key['waveform_features_group_name']}"
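
To illustrate the concern, here is a small sketch with hypothetical dictionaries. If the restriction `key` lacks the field, building the message raises a `KeyError` before the intended `ValueError` is ever constructed, whereas a fully populated `group_key` avoids that. The helper `build_message` and the group name are assumptions made for the example.

```python
def build_message(merge_id, source):
    # Mirrors the message format from the PR; `source` is whichever
    # dict supplies the group name.
    return (
        f"Linked SpikeSortingMerge ID {merge_id} not found in "
        + "UnitFeatures table"
        + f" for group {source['waveform_features_group_name']}"
    )


key = {}  # caller passed no restriction (hypothetical)
group_key = {"waveform_features_group_name": "example_group"}  # hypothetical

try:
    build_message("abc", key)
    outcome = "ok"
except KeyError:
    # The lookup fails while formatting, masking the real problem.
    outcome = "KeyError"

message = build_message("abc", group_key)
```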

Copilot uses AI. Check for mistakes.
Comment on lines +120 to +127
df = pd.DataFrame(
    {
        "merge_id": merge_ids,
        "spike_times": spike_times,
        "waveform_features": spike_waveform_features,
    }
)
df.set_index("merge_id", inplace=True)

Copilot AI Feb 16, 2026

Using a pandas DataFrame to organize the spike times and waveform features, only to convert them back to lists, adds unnecessary overhead. Consider using a dictionary instead, which would be more efficient: data_dict = {k['spikesorting_merge_id']: (st, wf) for k, st, wf in zip(waveform_keys, spike_times, spike_waveform_features)}. This would avoid the DataFrame construction and make the code more straightforward.
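
A runnable sketch of the suggested dictionary lookup, using made-up keys and plain lists in place of arrays. Only `spikesorting_merge_id` and the three input variable names come from the comment above; the sample values are assumptions.

```python
waveform_keys = [
    {"spikesorting_merge_id": "id-a"},
    {"spikesorting_merge_id": "id-b"},
]
spike_times = [[0.1, 0.2], [0.3]]
spike_waveform_features = [[[1.0], [2.0]], [[3.0]]]

# Map each merge ID to its (spike_times, waveform_features) pair.
data_dict = {
    k["spikesorting_merge_id"]: (st, wf)
    for k, st, wf in zip(waveform_keys, spike_times, spike_waveform_features)
}

times_b, features_b = data_dict["id-b"]
has_c = "id-c" in data_dict  # membership test replaces `in df.index`
```

Note that a dict comprehension keeps only the last value for a repeated key, so duplicate merge IDs would need an explicit check if they can occur.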

    {**key, **group_key},
    skip_duplicates=True,
)
if linked_ids is not None:

Copilot AI Feb 16, 2026

The code doesn't handle the case where `linked_ids` is an empty list. If the user passes `linked_ids=[]`, the code still calls `insert_linked_ids`, which opens a transaction and then iterates over nothing. This is harmless but wasteful. Consider checking `if linked_ids` instead of `if linked_ids is not None` to skip processing for both `None` and empty lists.

Suggested change
- if linked_ids is not None:
+ if linked_ids:
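
The difference between the two checks can be seen in a short sketch. The helper names are hypothetical and exist only to compare the conditions side by side.

```python
def should_insert_strict(linked_ids):
    # Condition currently used in the PR.
    return linked_ids is not None


def should_insert_truthy(linked_ids):
    # Condition proposed in the suggestion.
    return bool(linked_ids)


cases = [None, [], [["id-a", "id-b"]]]
strict = [should_insert_strict(c) for c in cases]
truthy = [should_insert_truthy(c) for c in cases]
print(strict)  # [False, True, True]
print(truthy)  # [False, False, True]
```

The truthiness check assumes `linked_ids` is a list, as the type hint states. A NumPy array with more than one element would raise on `bool(...)`, so the stricter form is safer if arrays are ever passed.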

Comment on lines +189 to +195
def insert_linked_ids(
    self, key: dict, linked_merge_ids_list: list[list[str]]
):
    """Insert linked SpikeSortingMerge IDs for multiple waveform features groups"""
    with self.LinkedSorts.connection.transaction:
        for linked_merge_ids in linked_merge_ids_list:
            self.insert1_linked_ids(key, linked_merge_ids)

Copilot AI Feb 16, 2026

The code doesn't validate that linked_merge_ids contains at least 2 elements. Linking a single merge_id to itself doesn't make sense and could lead to confusion. Consider adding validation to ensure each list in linked_ids contains at least 2 merge_ids, and raise a ValueError with a clear message if not.
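
One possible shape for that validation is sketched below. `validate_link_size` is a hypothetical helper, and counting distinct IDs (so that `["a", "a"]` is also rejected) is an assumption that goes slightly beyond the comment's wording.

```python
def validate_link_size(linked_merge_ids):
    """Raise if a link names fewer than two distinct merge IDs."""
    unique_ids = set(linked_merge_ids)
    if len(unique_ids) < 2:
        raise ValueError(
            "Each linked group needs at least 2 distinct merge IDs, "
            f"got {len(unique_ids)}: {sorted(unique_ids)}"
        )


def is_valid_link(linked_merge_ids):
    # Convenience wrapper for the example: report instead of raising.
    try:
        validate_link_size(linked_merge_ids)
    except ValueError:
        return False
    return True
```

Such a check could run at the start of `insert1_linked_ids`, before any row is written, so that an invalid list aborts the surrounding transaction.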

Comment on lines +189 to +195
def insert_linked_ids(
    self, key: dict, linked_merge_ids_list: list[list[str]]
):
    """Insert linked SpikeSortingMerge IDs for multiple waveform features groups"""
    with self.LinkedSorts.connection.transaction:
        for linked_merge_ids in linked_merge_ids_list:
            self.insert1_linked_ids(key, linked_merge_ids)

Copilot AI Feb 16, 2026

The code doesn't prevent the same merge_id from appearing in multiple different linked_id groups. If a merge_id appears in two different linked_id lists, it will be included twice in the final output (once for each link), leading to duplicated data. Consider adding validation to ensure that each merge_id appears in at most one linked_id group, and raise a ValueError if duplicates are detected across different groups.
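
A sketch of the cross-group check, under the assumption that all links for a group arrive in a single call. `find_shared_ids` and `validate_disjoint` are hypothetical helpers, and rows already stored in `LinkedSorts` are not consulted here.

```python
from collections import Counter


def find_shared_ids(linked_merge_ids_list):
    """Return merge IDs that occur in more than one linked group."""
    counts = Counter(
        merge_id
        for linked_merge_ids in linked_merge_ids_list
        # set() so a repeat inside one group is not counted twice
        for merge_id in set(linked_merge_ids)
    )
    return sorted(m for m, n in counts.items() if n > 1)


def validate_disjoint(linked_merge_ids_list):
    shared = find_shared_ids(linked_merge_ids_list)
    if shared:
        raise ValueError(f"merge IDs appear in more than one link: {shared}")


print(find_shared_ids([["a", "b"], ["b", "c"]]))  # ['b']
print(find_shared_ids([["a", "b"], ["c", "d"]]))  # []
```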

Comment on lines +135 to +138
if merge_id in df.index:
    times_list.append(df.at[merge_id, "spike_times"])
    features_list.append(df.at[merge_id, "waveform_features"])
    managed_ids.add(merge_id)

Copilot AI Feb 16, 2026

When checking whether a merge_id exists in the DataFrame, the code uses if merge_id in df.index, which returns True even if the merge_id occurs more than once in the index. With a duplicated label, df.at[merge_id, ...] no longer returns a single value (pandas falls back to label-based selection and returns every matching row), so the lists passed to np.concatenate would not hold what the code expects. Consider using a dictionary keyed by merge_id, with an explicit check for duplicates, instead of a DataFrame with potentially duplicated indices.

Comment on lines +128 to +159
merged_spike_times = []
merged_waveform_features = []
managed_ids = set()
for linked_ids in linked_id_list:
    times_list = []
    features_list = []
    for merge_id in linked_ids:
        if merge_id in df.index:
            times_list.append(df.at[merge_id, "spike_times"])
            features_list.append(df.at[merge_id, "waveform_features"])
            managed_ids.add(merge_id)
        else:
            raise ValueError(
                f"Linked SpikeSortingMerge ID {merge_id} not found in "
                + "UnitFeatures table"
                + f" for group {key['waveform_features_group_name']}"
            )
    if times_list:
        merged_times_i = np.concatenate(times_list)
        merged_features_i = np.concatenate(features_list)
        ind_sort = np.argsort(merged_times_i)
        merged_spike_times.append(merged_times_i[ind_sort])
        merged_waveform_features.append(merged_features_i[ind_sort])

# add any remaining unlinked units
for merge_id in df.index:
    if merge_id not in managed_ids:
        merged_spike_times.append(df.at[merge_id, "spike_times"])
        merged_waveform_features.append(
            df.at[merge_id, "waveform_features"]
        )
return merged_spike_times, merged_waveform_features

Copilot AI Feb 16, 2026

The method doesn't preserve the original order of units when concatenating linked and unlinked sorts. The order will be: first all linked sorts (in the order they appear in LinkedSorts), then all remaining unlinked sorts. This could be surprising to users and may break downstream code that expects a specific ordering. Consider documenting this behavior or preserving the original order of units.
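
If preserving order is preferred over documenting the current behavior, one option is to emit each linked group at the position of its first member in the original order. The sketch below only computes that ordering of merge IDs; `ordered_units` and the sample IDs are hypothetical, and the concatenation itself is left out.

```python
def ordered_units(original_order, linked_id_list):
    """Return groups of merge IDs in order of first appearance.

    Each linked group is emitted once, at the position of whichever of
    its members comes first in `original_order`; unlinked IDs become
    single-element groups.
    """
    group_of = {}
    for linked_ids in linked_id_list:
        group = tuple(linked_ids)
        for merge_id in linked_ids:
            group_of[merge_id] = group

    emitted = set()
    result = []
    for merge_id in original_order:
        group = group_of.get(merge_id, (merge_id,))
        if group not in emitted:
            emitted.add(group)
            result.append(list(group))
    return result


print(ordered_units(["a", "b", "c", "d"], [["b", "d"]]))
# [['a'], ['b', 'd'], ['c']]
```

With the ordering in the PR as described above, the same input would place the linked pair first and the unlinked units after it.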

    keys: list[dict],
    linked_ids: list[list[str]] = None,
):
    """Create a group of waveform features for a given session"""

Copilot AI Feb 16, 2026

The create_group method now accepts a linked_ids parameter but doesn't document it. The docstring should be updated to document this new parameter, explaining that it's an optional list of lists where each inner list contains SpikeSortingMerge IDs that should be linked (concatenated) together.

Suggested change
- """Create a group of waveform features for a given session"""
+ """
+ Create a group of waveform features for a given session.
+ Parameters
+ ----------
+ nwb_file_name : str
+     Name of the NWB file corresponding to the session.
+ group_name : str
+     Name to assign to this waveform features group.
+ keys : list of dict
+     List of primary keys for `UnitWaveformFeatures` entries to include
+     in this group.
+ linked_ids : list of list of str, optional
+     Optional list of lists, where each inner list contains
+     SpikeSortingMerge IDs that should be linked (concatenated)
+     together into a single group. If None, no SpikeSortingMerge
+     entries are linked.
+ """

nwb_file_name: str,
group_name: str,
keys: list[dict],
linked_ids: list[list[str]] = None,

Copilot AI Feb 16, 2026

The parameter name linked_ids in create_group is somewhat ambiguous - it could be confused with the link_id field in the LinkedSorts table. Consider renaming it to linked_merge_ids to be more explicit and consistent with the naming in insert1_linked_ids and the field name in the LinkedSorts table.

    if linked_ids is not None:
        self.insert_linked_ids(group_key, linked_ids)

def fetch_data(self, key: dict = dict()):

Copilot AI Feb 16, 2026

The fetch_data method is defined as an instance method (taking self as the first parameter) but is being called as a class method in line 696. This will cause an error since the method needs to be called on an instance, not the class itself. Either change this to a class method by adding the @classmethod decorator and changing self to cls, or call it on an instance.

Suggested change
- def fetch_data(self, key: dict = dict()):
+ @classmethod
+ def fetch_data(cls, key: dict = dict()):
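
The call at line 696 is not shown in this excerpt, so the following is only a generic illustration of the two calling conventions, using a hypothetical `Group` class. It shows that calling an instance method on the class does not always fail immediately: a positional argument is silently bound to `self`.

```python
class Group:
    def fetch_instance(self, key=None):
        return ("instance", key)

    @classmethod
    def fetch_class(cls, key=None):
        return (cls.__name__, key)


key = {"waveform_features_group_name": "example_group"}  # hypothetical

# Called on the class, the dict is bound to `self`, so `key` stays None.
misbound = Group.fetch_instance(key)

# With no arguments at all, Python raises TypeError (missing `self`).
try:
    Group.fetch_instance()
    error = None
except TypeError:
    error = "TypeError"

on_instance = Group().fetch_instance(key)  # alternative fix: instantiate
on_class = Group.fetch_class(key)          # fix proposed in the suggestion
```

Either fix works in plain Python; which one fits better depends on whether the method needs a restricted instance of the table, which this excerpt does not show.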

Development

Successfully merging this pull request may close these issues.

update UnitWaveformFeaturesGroup to be compatible with ClusterlessDecodingV1