
Commit 248b78b

Merge branch 'dev' into enh/subject_species
2 parents 9b94659 + 8e85ed6 commit 248b78b

10 files changed, 378 additions, 42 deletions

docs/best_practices/extensions.rst

Lines changed: 25 additions & 18 deletions
@@ -1,40 +1,47 @@
 Extensions
 ==========

-Extend only when necessary. Extensions are an essential mechanism to integrate data with NWB that is otherwise not
-supported. However, we here need to consider that there are certain costs associated with extensions, e.g., cost of
-creating, supporting, documenting, and maintaining new extensions and effort for users to use and learn extensions.
-As such, users should attempt to use core neurodata_types or existing extentions before creating extensions.
-``DynamicTables``, which are used throughout the NWB schema e.g., to store information about time intervals and
-electrodes, provide the ability to dynamically add columns without the need for extensions, and can help avoid the
-need for custom extensions in many cases.
+Extend the core NWB schema only when necessary. Extensions are an essential mechanism to integrate
+data with NWB that is otherwise not supported. However, we need to consider that there are certain costs associated
+with extensions, *e.g.*, the cost of creating, supporting, documenting, and maintaining new extensions, and the effort
+for users to use and learn already-created extensions. As such, users should attempt to use core ``neurodata_types`` or
+pre-existing extensions before creating new ones. :nwb-schema:ref:`sec-DynamicTables`, which are used throughout the
+NWB schema to store information about time intervals, electrodes, or spiking output, provide the ability to
+dynamically add columns without the need for extensions, and can help avoid the need for custom extensions in many
+cases.
+
+If an extension is required, tutorials for the process may be found through the
+:nwb-overview:`NWB Overview for Extensions <extensions_tutorial/extensions_tutorial_home.html>`.
+
+Extensions are also encouraged to contain check functions for their own best practices.
+See the :ref:`adding_custom_checks` section of the Developer Guide for how to do this.

-TODO, add links to the tutorials for extensions


 Use Existing Neurodata Types
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-When possible, use existing types when creating extensions either by creating new neurodata_types that inherit from
-existing ones, or by creating neurodata_types that contain existing ones. Building on existing types facilitates the
+
+When possible, use existing types when creating extensions, either by creating new ``neurodata_types`` that inherit
+from existing ones, or by creating ``neurodata_types`` that contain existing ones. Building on existing types facilitates the
 reuse of existing functionality and interpretation of the data. If a community extension already exists that has a
 similar scope, it is preferable to use that extension rather than creating a new one.


 Provide Documentation
 ~~~~~~~~~~~~~~~~~~~~~

-When creating extensions be sure to provide meaningful documentation as part of the extension specification, of all
-fields (groups, datasets, attributes, links etc.) to describe what they store and how they are used.
+When creating extensions, be sure to provide thorough, meaningful documentation as part of the extension specification.
+Explain all fields (groups, datasets, attributes, links, etc.) and describe what they store and how they
+should be used.


 Write the Specification to the NWBFile
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-When using pynwb, you can store the specification (core and extension) within the NWBFile by using
-. Caching the specification is preferable, particularly if you are using a
-custom extension, because this ensures that anybody who receives the data also receives the necessary data to
-interpret it.
+You can store the specification (core and extension) within the NWBFile through caching.
+Caching the specification is preferable, particularly if you are using a custom extension, because this ensures that
+anybody who receives the data also receives the necessary data to interpret it.

 .. note::
-    In PyNWB, the extension is cached automatically. This can be specified explicitly with ``io.write(filepath,
-    cache_spec=True)``
+    In :pynwb-docs:`PyNWB <>`, the extension is cached automatically. This can be specified explicitly with
+    ``io.write(filepath, cache_spec=True)``.

docs/best_practices/tables.rst

Lines changed: 73 additions & 21 deletions
@@ -1,7 +1,8 @@
 Tables
 ======

-The DynamicTable data type that NWB uses allows you to define custom columns, which offer a high degree of flexibility.
+The :nwb-schema:ref:`dynamictable` data type stores tabular data. It also allows you to define custom columns, which
+offer a high degree of flexibility.



@@ -10,38 +11,89 @@
 Tables With Only a Single Row
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-It is not common to save a table with only a single row entry. Consider other ``neurodata_types``, such as a one-dimensional :nwb-schema:ref:`sec-TimeSeries` or any of its subtypes.
+It is not common to save a table with only a single row entry. Consider other ``neurodata_types``, such as a one-dimensional :nwb-schema:ref:`sec-TimeSeries`.

 Check function: :py:meth:`~nwbinspector.checks.tables.check_single_row`



 .. _best_practice_dynamic_table_region_data_validity:

-Unassigned
-~~~~~~~~~~
+Table Region Data
+~~~~~~~~~~~~~~~~~

-Store data with long columns rather than long rows. When constructing dynamic tables, keep in mind that the data is stored by column, so it will be
-inefficient to store data in a table with many columns.
-bools
+Store data with long columns rather than long rows. When constructing dynamic tables, keep in mind that the data is
+stored by column, so it will be less efficient to store data in a table that has many more columns than rows.

-Check function :ref:`check_column_binary_capability <check_column_binary_capability>`
+Check function: :py:meth:`~nwbinspector.checks.tables.check_dynamic_table_region_data_validity`



+.. _best_practice_column_binary_capability:

-Use boolean values where appropriate. Although boolean values (True/False) are not used in the core schema, they are a supported data type, and we
-encourage the use of DynamicTable columns with boolean values. For instance, boolean values would be appropriate for a correct custom column to the trials table.
-times
+Boolean Columns
+~~~~~~~~~~~~~~~

-Times are always stored in seconds in NWB. This rule applies to times in TimeSeries, TimeIntervals and across NWB:N in general. E.g., in TimeInterval
-objects such as the trials and epochs table, start_time and stop_time should both be in seconds with respect to the timestamps_reference_time (which by
-default is set to the session_start_time).
-Additional time columns in TimeInterval tables (e.g., trials) should have _time as name suffix. E.g., if you add more times in the trials table, for
-instance a subject response time, name it with _time at the end (e.g. response_time) and store the time values in seconds from the timestamps_reference_time,
-just like start_time and stop_time.
+Use boolean values where appropriate. Although boolean values (``True``/``False``) are not used in the core schema,
+they are a supported data type, and we encourage the use of :nwb-schema:ref:`dynamictable` columns with boolean
+values. It is also encouraged that boolean columns be named ``is_condition``, where ``condition`` is
+whatever the positive state of the variable is; *e.g.*, you might create a column called ``is_correct`` that has
+boolean values.

-Set the timestamps_reference_time if you need to use a different reference time. Rather than relative times, it can in practice be useful to use a common
-global reference time across files (e.g., Posix time). To do so, NWB:N allows users to set the timestamps_reference_time which serves as reference for all
-timestamps in a file. By default, timestamp_reference_time is usually set to the session_start_time to use relative times.
-electrodes: ‘location’
+The reason for this practice is two-fold:
+
+(i) It allows for easier user comprehension of the information by intuitively restricting the range of possible values
+for the column; a user would otherwise have to extract all the values and calculate the unique set to see that there
+are only two values.
+
+(ii) For large amounts of data, it also saves storage space within the HDF5 file by using the minimal
+number of bytes per item. This can be especially important if the repeated values are long strings or float casts of
+``1`` and ``0``.
+
+An example of a violation of this practice would be a column of strings with the following values throughout:
+
+.. code-block:: python
+
+    hit_or_miss_col = ["Hit", "Miss", "Miss", "Hit", ...]
+
+This should instead become:
+
+.. code-block:: python
+
+    is_hit = [True, False, False, True, ...]
+
+
+Check function: :py:meth:`~nwbinspector.checks.tables.check_column_binary_capability`
+
+.. note::
+
+    If the two unique values in your column are ``float`` types that differ from ``1`` and ``0``, the reported values
+    are to be considered additional contextual information for the column, and this practice does not apply.
+
+.. note::
+
+    HDF5 does not natively store boolean values. ``h5py`` handles this by automatically transforming boolean values
+    into an enumerated type, where ``False`` maps to 0 ("FALSE") and ``True`` maps to 1 ("TRUE"). On read, these
+    values are converted back to the ``np.bool_`` type. ``pynwb`` does the same, so if you are reading and writing
+    with ``pynwb`` you may not need to worry about this. However, this will be important to know if you write using
+    PyNWB and read with some other language.
+
+
+
+Timing Columns
+~~~~~~~~~~~~~~
+
+Times are always stored in seconds in NWB. In :nwb-schema:ref:`sec-TimeIntervals` tables such as the
+:nwb-schema:ref:`trials <sec-groups-intervals-trials>` and
+:nwb-schema:ref:`epochs <epochs>` tables, ``start_time`` and ``stop_time`` should both be in seconds with respect to
+the ``timestamps_reference_time`` of the :nwb-schema:ref:`sec-NWBFile` (which by default is the
+``session_start_time``; see :ref:`best_practice_global_time_reference` for more details).
+
+Additional time columns in :nwb-schema:ref:`sec-TimeIntervals` tables, such as the
+:nwb-schema:ref:`trials <sec-groups-intervals-trials>` table, should have ``_time`` as a suffix to the name.
+*E.g.*, if you add more times to :nwb-schema:ref:`trials <sec-groups-intervals-trials>`, such as a subject
+response time, name it ``response_time`` and store the time values in seconds from the ``timestamps_reference_time``
+of the :nwb-schema:ref:`sec-NWBFile`, just like ``start_time`` and ``stop_time``.
+This convention is used by downstream processing tools. For instance, NWBWidgets uses these times to create
+peri-stimulus time histograms relating spiking activity to trial events. See
+:ref:`best_practice_global_time_reference` for more details.
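As a quick illustration of the timing convention above, here is a minimal sketch (hypothetical timestamps, standard library only) of converting an absolute wall-clock event time into the seconds-relative value that a ``response_time`` column would store:

```python
from datetime import datetime, timezone

# Hypothetical reference time: the timestamps_reference_time, which by
# default is the session_start_time of the NWBFile.
session_start_time = datetime(2022, 3, 1, 12, 0, 0, tzinfo=timezone.utc)

# Hypothetical absolute time at which the subject responded during a trial.
absolute_response = datetime(2022, 3, 1, 12, 0, 42, 500000, tzinfo=timezone.utc)

# NWB stores times in seconds relative to the reference time, so the value
# written to a "response_time" column would be the elapsed seconds:
response_time = (absolute_response - session_start_time).total_seconds()
print(response_time)  # 42.5
```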

docs/conf_extlinks.py

Lines changed: 1 addition & 1 deletion
@@ -28,7 +28,7 @@
 # "nwb-widgets-docs": ("https://github.com/NeurodataWithoutBorders/nwb-jupyter-widgets", ""),
 # "nwb-widgets-src": ("https://github.com/NeurodataWithoutBorders/nwb-jupyter-widgets", ""),
 # "caiman-docs": ("https://caiman.readthedocs.io/en/master/", ""),
-"nwb-overview": ("https://nwb-overview.readthedocs.io/en/latest/", ""),
+"nwb-overview": ("https://nwb-overview.readthedocs.io/en/latest/%s", "%s"),
 # "nwb-overview-src": ("https://github.com/NeurodataWithoutBorders/nwb-overview", ""),
 # "nwb-main": ("https://www.nwb.org/", ""),
 "conda-install": (
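For context on this one-line change: Sphinx's ``extlinks`` extension substitutes the role's target into the ``%s`` placeholder of the base URL (and of the caption), which is why the placeholders were added. A small sketch of that substitution, using the target from the extensions doc:

```python
# Sketch of how Sphinx extlinks expands a role such as
# :nwb-overview:`NWB Overview for Extensions <extensions_tutorial/extensions_tutorial_home.html>`
# given the (base_url, caption) pair configured in conf_extlinks.py.
base_url = "https://nwb-overview.readthedocs.io/en/latest/%s"
target = "extensions_tutorial/extensions_tutorial_home.html"

url = base_url % target
print(url)
# https://nwb-overview.readthedocs.io/en/latest/extensions_tutorial/extensions_tutorial_home.html
```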

docs/developer_guide.rst

Lines changed: 24 additions & 0 deletions
@@ -11,3 +11,27 @@
 :nwbinspector-contributing:`Contributing <>` page.

 Otherwise feel free to raise a bug report, documentation mistake, or general feature request for our maintainers to address!
+
+
+
+.. _adding_custom_checks:
+
+Adding Custom Checks to the Registry
+------------------------------------
+
+If you are writing an extension, or have any personal Best Practices specific to your lab, you can incorporate these
+into your own usage of the NWBInspector. To add a custom check to your default registry, all you have to do is wrap
+your check function with the :py:class:`~nwbinspector.register_checks.register_check` decorator, like so:
+
+.. code-block:: python
+
+    from nwbinspector.register_checks import available_checks, register_check, Importance
+
+    @register_check(importance=Importance.SOME_IMPORTANCE_LEVEL, neurodata_type=some_neurodata_type)
+    def check_personal_practice(...):
+        ...
+
+Then, for this check to be automatically included when you run the inspector through the CLI, simply specify
+the modules flag ``-m`` or ``--modules`` along with the name of the module that contains your custom check. If using
+the library instead, you need only import the ``available_checks`` global variable from your own submodules, or
+otherwise import your check functions after importing ``nwbinspector`` in your ``__init__.py``.
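The registry pattern described above can be sketched independently of nwbinspector with a plain decorator that appends functions to a module-level list. This is an illustrative sketch of the *pattern* only; the names and the string-valued metadata below are hypothetical, not nwbinspector's actual internals:

```python
# Minimal, self-contained sketch of a decorator-based check registry.
available_checks = []

def register_check(importance, neurodata_type):
    """Parameterized decorator: tag the check with metadata and register it."""
    def decorator(check_function):
        check_function.importance = importance
        check_function.neurodata_type = neurodata_type
        available_checks.append(check_function)
        return check_function
    return decorator

# Hypothetical custom check; in real usage the importance would be an
# Importance enum member and neurodata_type a pynwb class.
@register_check(importance="BEST_PRACTICE_SUGGESTION", neurodata_type="DynamicTable")
def check_personal_practice(table):
    ...

print([check.__name__ for check in available_checks])  # ['check_personal_practice']
```

Because registration happens at import time, simply importing the module that defines the decorated functions is enough to populate the registry, which matches the described behavior of the ``--modules`` flag.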

nwbinspector/checks/ecephys.py

Lines changed: 25 additions & 0 deletions
@@ -50,3 +50,28 @@
 def check_electrical_series_reference_electrodes_table(electrical_series: ElectricalSeries):
     if electrical_series.electrodes.table.name != "electrodes":
         return InspectorMessage(message="electrodes does not reference an electrodes table.")
+
+
+@register_check(importance=Importance.BEST_PRACTICE_VIOLATION, neurodata_type=Units)
+def check_spike_times_not_in_unobserved_interval(units_table: Units, nunits: int = 4):
+    """Check if a Units table has spike times that occur outside of observed intervals."""
+    if not units_table.obs_intervals:
+        return
+    for unit_spike_times, unit_obs_intervals in zip(
+        units_table["spike_times"][:nunits], units_table["obs_intervals"][:nunits]
+    ):
+        spike_times_array = np.array(unit_spike_times)
+        if not all(
+            sum(
+                [
+                    np.logical_and(start <= spike_times_array, spike_times_array <= stop)
+                    for start, stop in unit_obs_intervals
+                ]
+            )
+        ):
+            return InspectorMessage(
+                message=(
+                    "This Units table contains spike times that occur during periods of time not labeled as being "
+                    "observed intervals."
+                )
+            )
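The interval test inside the new check can be seen in isolation with plain NumPy: a spike time counts as observed if it falls inside at least one ``(start, stop)`` pair, and summing the per-interval boolean masks gives, for each spike, how many intervals contain it (the values below are toy data, not from any real file):

```python
import numpy as np

# Toy data: two observed intervals and four spike times; the third spike
# (at t = 15.0) falls in the gap between the intervals.
obs_intervals = [(0.0, 10.0), (20.0, 30.0)]
spike_times = np.array([1.5, 9.9, 15.0, 25.0])

# Sum the per-interval membership masks; a zero entry means the spike was
# never inside an observed interval, which is what the check flags.
in_any_interval = sum(
    np.logical_and(start <= spike_times, spike_times <= stop)
    for start, stop in obs_intervals
)
print(in_any_interval.astype(bool))  # [ True  True False  True]
```

The check itself applies `all(...)` to this summed mask, so a single unobserved spike time is enough to trigger the inspector message.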

nwbinspector/checks/tables.py

Lines changed: 19 additions & 1 deletion
@@ -8,11 +8,12 @@
 from pynwb.file import TimeIntervals, Units

 from ..register_checks import register_check, InspectorMessage, Importance
-from ..utils import format_byte_size, is_ascending_series
+from ..utils import format_byte_size, is_ascending_series, is_dict_in_string, is_string_json_loadable


 @register_check(importance=Importance.CRITICAL, neurodata_type=DynamicTableRegion)
 def check_dynamic_table_region_data_validity(dynamic_table_region: DynamicTableRegion, nelems=200):
+    """Check if a DynamicTableRegion is valid."""
     if np.any(np.asarray(dynamic_table_region.data[:nelems]) > len(dynamic_table_region.table)):
         return InspectorMessage(
             message=(
@@ -158,6 +159,23 @@
         )


+@register_check(importance=Importance.BEST_PRACTICE_VIOLATION, neurodata_type=DynamicTable)
+def check_table_values_for_dict(table: DynamicTable, nelems: int = 200):
+    """Check if any values in a row or column of a table contain a string casting of a Python dictionary."""
+    for column in table.columns:
+        if not hasattr(column, "data") or isinstance(column, VectorIndex) or not isinstance(column.data[0], str):
+            continue
+        for string in column.data[:nelems]:
+            if is_dict_in_string(string=string):
+                message = (
+                    f"The column '{column.name}' contains a string value that contains a dictionary! Please "
+                    "unpack dictionaries as additional rows or columns of the table."
+                )
+                if is_string_json_loadable(string=string):
+                    message += " This string is also JSON loadable, so call `json.loads(...)` on the string to unpack."
+                yield InspectorMessage(message=message)
+
+
 # @register_check(importance="Best Practice Violation", neurodata_type=pynwb.core.DynamicTable)
 # def check_column_data_is_not_none(nwbfile):
 #     """Check column values in DynamicTable to ensure they are not None."""

nwbinspector/utils.py

Lines changed: 26 additions & 0 deletions
@@ -1,4 +1,6 @@
 """Commonly reused logic for evaluating conditions; must not have external dependencies."""
+import re
+import json
 import numpy as np
 from typing import TypeVar, Optional, List
 from pathlib import Path
@@ -7,6 +9,8 @@
 FilePathType = TypeVar("FilePathType", str, Path)
 OptionalListOfStrings = Optional[List[str]]

+dict_regex = r"({.+:.+})"
+

 def format_byte_size(byte_size: int, units: str = "SI"):
     """
@@ -46,3 +50,25 @@

 def is_ascending_series(series: np.ndarray, nelems=None):
     return np.all(np.diff(series[:nelems]) > 0)
+
+
+def is_dict_in_string(string: str):
+    """
+    Determine if the string value contains an encoded Python dictionary.
+
+    Can also be the direct result of string-casting a dictionary, *e.g.*, ``str(dict(a=1))``.
+    """
+    return any(re.findall(pattern=dict_regex, string=string))
+
+
+def is_string_json_loadable(string: str):
+    """
+    Determine if the serialized dictionary is a JSON object.
+
+    Rather than constructing a complicated regex pattern, a simple try/except of ``json.loads`` should suffice.
+    """
+    try:
+        json.loads(string)
+        return True
+    except json.JSONDecodeError:
+        return False
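The two helpers added above can be exercised directly. Note the asymmetry they are built around: ``str(dict(...))`` yields single-quoted keys, which the regex catches but ``json.loads`` rejects, while a double-quoted serialization satisfies both. This standalone version mirrors the diff's definitions:

```python
import re
import json

dict_regex = r"({.+:.+})"

def is_dict_in_string(string: str) -> bool:
    """Detect a string-cast dictionary, e.g. str(dict(a=1))."""
    return any(re.findall(pattern=dict_regex, string=string))

def is_string_json_loadable(string: str) -> bool:
    """A try/except around json.loads instead of a complex regex."""
    try:
        json.loads(string)
        return True
    except json.JSONDecodeError:
        return False

print(is_dict_in_string(str(dict(a=1))))        # True
print(is_string_json_loadable(str(dict(a=1))))  # False (single quotes)
print(is_string_json_loadable('{"a": 1}'))      # True
```

This is why `check_table_values_for_dict` only appends the "call `json.loads(...)`" hint when the second helper also returns ``True``.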
