Description
What would you like changed or added to the documentation and why?
Meta-context: looking at the most recent (I think?) version of the docs here: https://pynwb.readthedocs.io/en/dev/index.html according to #1478
I am here with love to raise a conversation that I'm sure has been had internally but externally feels overdue: there is a lot of information in the docs, yes. There has clearly been a lot of work done on them. But at the moment they are not approachable for most people.
I've read I think most of pynwb and HDMF at this point, wrote my first conversion guide in 2019, and have spent the last year or so specifically studying software accessibility in a number of domains (including data formats! which I am a fan of!) and I believe the docs are the single greatest hindrance to adoption. I, again, with love, as someone who shares the goal of realizing the benefits of standards, with the intention of making this tool more accessible, will try and articulate why from an outside perspective.
UX as in the Experience of the User
If I am a neuroscientist interested in converting to NWB, this is how I am greeted by the docs:
All but two of the entries in the TOC are not relevant to me. Hopefully the installation is just pip install pynwb, so that leaves just the tutorials.
The pages are sorted alphabetically! As far as I can tell as a naive user, the only page that seems relevant to getting my bearings is NWB File Basics. OK!
NWB File Basics
If I spend a while reading through that tutorial, I come away with the notion that I need to make an NWB file and add things to it, a sort of smattering of different data types (which are interesting and sound like what I needed!), and some hints at reading the file. Great! To a programmer, this is useful: I know how inheritance works, so I know what it means for things to be a subclass and how that gives them shared functionality. I know that I can click through to the API docs and read them. But as a neuroscientist who is not a programmer, I am still not really sure how this all works! There are timeseries, yes I know that one, but then I need to add epochs to a timeseries? What does that look like? I am not sure what tutorial should come next for me; none of the other general tutorials are obvious places to go next.
Well the next entry in the TOC is Domain-Specific tutorials, maybe I can use those.
Extracellular Ephys
tbh you lost me!
I know that I have electrophysiological data, I know I used electrodes to record it, I don't know how to read that diagram! I don't know yet what a device is or how it's related to an electrode group, aside from the fact that I need to add them. I'm not clear about the reasoning behind why I'm doing any of this yet either, for example this took me a while to parse even as someone who knows the library a bit:
When I go through to the add_electrode method to try and understand it (which is helpfully linked!), I get linked to the ElectrodeGroup object, which doesn't have a docstring, so I have no idea what it actually means, or why add_electrode needs it!
Again, as a programmer I think it might be relatively easy to read the inheritance hierarchy, click through to the source, even make a live object and inspect it directly, which is why I'm completely sympathetic to thinking I'm being pedantic here or overly critical. I know you have talked people through the structure of the library a thousand times, so you know what I'm missing and that I'm being overly naive. I'm not trying to say the library is bad, just trying to describe that it's hard to learn as is.
Same thing with Device, from the API docs, I have no notion of what other objects it might be used with (links to other objects like from before would be nice!), and since it's in its own module in a very flat namespace, I am not sure where I might go to learn more!
Going through the rest of that tutorial, I'm not really sure how my units relate to my electrodes, my raw timeseries, and etc.
Where Am I Now?
From here, I can browse through the other tutorials, but what I'm missing at this point is a basic lay of the land. How can these various things interact with each other? What even are the basic objects here? I learned in the general tutorial that there were only three things: timeseries, processing modules, and metadata. But what are these electrode groups? What is a DynamicTable? If I forget a value from my electrode, what do I do? If I want to add my data, how would I go about it aside from following exactly what is in the tutorials? If my data is slightly different than what is in the tutorials, how would I go about fixing that?
Role of Code Structure
I'll make this very brief (as some people in this group have had me come in and make PRs drastically restructuring their libraries before lol) in order to limit this issue to the structure of the documentation and how we can scaffold this process better, but I think in the long run what is really needed is a refactoring of the library. Most of the code is entirely flat in the base pynwb namespace, and there's one io submodule that duplicates a lot of the file names from the top-level namespace, so as a result the documentation doesn't structure itself and has to be structured manually. Similar things should be grouped together, and honestly a very simple pyreverse diagram demonstrates that that structure already exists (and looks pretty dang reasonable!), it just isn't reflected in the code structure (when read by an outsider):
Ideas for Docs Structure Refactoring
The first goal here should be to make a clear pathway for someone interested in converting their data to NWB to do so! which I think we can agree on and work towards. They shouldn't need to come to a workshop (as much as I love them), they ideally shouldn't need an additional library, and they shouldn't have to resort to using their grant funding to pay someone to convert it for them. Aspirations yno?
What that should look like, as in literally visually look like, to a new user is to have much much more of the TOC and homepage devoted to them.
Tutorials Gallery
Starting from the way the docs are implemented: I think one very simple fix is to fix the way sphinx_gallery is being used. From the index, the tutorials are linked as tutorials/index. I'm not really sure what the sphinx_gallery really adds here, but what it subtracts is explicit control over the presentation of the tutorials.
There is explicit order to the tutorial groups:
Line 69 in e05b553:
    'subsection_order': ExplicitOrder(['../gallery/general', '../gallery/domain', '../gallery/advanced_io']),
but then within a group they are sorted alphabetically:
Line 73 in e05b553:
    'within_subsection_order': ExampleTitleSortKey
This makes browsing them very challenging! There is no scaffolding, I have to discover it for myself. There really isn't a good way to learn about the structure of the library from the docs page (I know there is more elsewhere!) -- I'm not talking about learning about it from a developer POV for contributing, as the software structure is described in the developer docs below, I mean just knowing at a basic level what exists as a casual neuroscientist wanting to freshen up their data.
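To make the ordering point concrete, here's a rough sketch of what a hand-written reading order could look like as a sort key. The file names are made up for illustration (I haven't matched them to the real gallery files), and `ExplicitFileOrder` is a hypothetical helper, not something sphinx_gallery ships. It mirrors the shape of sort keys like ExampleTitleSortKey (built with the source directory, then called with a file name), but whether it drops straight into 'within_subsection_order' depends on the sphinx_gallery version, so treat that part as an assumption.

```python
import os

# Hypothetical reading order; the real tutorial file names may differ.
READING_ORDER = ["file_basics.py", "reading_files.py", "extending_nwb.py"]


class ExplicitFileOrder:
    """Sort key that ranks example files by a hand-written reading order."""

    def __init__(self, src_dir):
        # Kept only to mirror the (src_dir) constructor shape of built-in keys.
        self.src_dir = src_dir
        self._rank = {name: i for i, name in enumerate(READING_ORDER)}

    def __call__(self, filename):
        name = os.path.basename(filename)
        # Listed files come first, in the listed order; anything not listed
        # falls to the end and is sorted alphabetically among itself.
        return (self._rank.get(name, len(self._rank)), name)


files = ["reading_files.py", "zzz_misc.py", "extending_nwb.py", "file_basics.py"]
ordered = sorted(files, key=ExplicitFileOrder("gallery/general"))
```

The design choice here is just "listed first, everything else alphabetical after", so a new tutorial that nobody remembered to add to the list still shows up instead of breaking the build.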
I feel like part of this might be the relative brittleness of the format of sphinx_gallery -- that looks great for short examples, I know sklearn and scipy use it to great effect, but it looks like a real pain in the ass to write docs as RST within comments! It also is very much programmer-centric in the amount of literal code that is included in the documentation. MyST has made it dramatically easier to use Sphinx and I can't recommend it enough. In either case a few more explicit steps up the ladder would I think be a good change.
Introductions to the API
Tutorials are great! They are not the most straightforward way to teach the structure of a library, and the rest of the API documentation needs to be able to speak for itself if a new user is expected to go from tutorials -> API on their own. As is, the API documentation feels like it's in dire need of dogfooding (something I know well personally and always mess up) - it is written, understandably, from the perspective of someone who understands the library but does not use the API documentation in their own work. There is a lot of missing context about the role of any particular object, and given the lack of hierarchical structure in the documentation and code, there is little way for someone to infer it without reading the source code.
It looks like all the API-level documentation is just generated with sphinx-apidoc at the time of building the docs. Doing that means that for the API docs to be useful the code needs to be structured in a way that supports readable documentation. As is, however, most of the modules don't have top-level docstrings explaining what they are, and many of the objects and functions lack or have only barebones docstrings.
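A quick way to see how big the gap is would be to just count it. This is a minimal sketch using only the standard library's `ast` module; `missing_docstrings` is a name I made up, the source string is a toy and not pynwb code, and it only looks at definitions sitting directly in a module, class, or function body.

```python
import ast

# Toy source for illustration only; not taken from pynwb.
TOY_SOURCE = '''
class Shank:
    def add_channel(self, index):
        """Register one channel."""
        return index

class Probe:
    """A recording probe."""
    def label(self):
        return "probe"
'''


def _collect(node, prefix, missing):
    """Record class/function definitions under `node` that lack a docstring."""
    for child in ast.iter_child_nodes(node):
        if isinstance(child, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
            qualname = f"{prefix}.{child.name}"
            if ast.get_docstring(child) is None:
                missing.append(qualname)
            _collect(child, qualname, missing)


def missing_docstrings(source, module_name):
    """Return dotted names of the module and its definitions with no docstring."""
    tree = ast.parse(source)
    missing = []
    if ast.get_docstring(tree) is None:
        missing.append(module_name)
    _collect(tree, module_name, missing)
    return missing


report = missing_docstrings(TOY_SOURCE, "toy")
```

Pointing something like this at each file in the package (reading the file's text and passing it in) would give a to-do list of where the generated API pages currently have nothing to say.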
What's missing is narrative API documentation -- even a few short sentences introducing what each module is and how the modules relate to one another would go a long way in helping someone understand the library. Using sphinx-apidoc is fine, but until the library has browsable structure it should be used to generate doc stubs that live in the /docs folder that you can then give explicit structure to by handwriting some of the autodoc directives.
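As a sketch of what one of those handwritten stubs might contain, here's a tiny generator that writes a title, a short narrative intro, links to related modules, and then an autodoc directive. `module_stub` is a hypothetical helper, and the intro sentence is placeholder wording of mine, not the project's; the `automodule` directive with `:members:` and `:show-inheritance:` and the `:py:mod:` role are standard Sphinx.

```python
def module_stub(module, title, intro, related=()):
    """Build the RST text for one hand-curated API page."""
    lines = [title, "=" * len(title), "", intro, ""]
    if related:
        # Cross-link modules that are meant to be used together.
        links = ", ".join(f":py:mod:`{name}`" for name in related)
        lines += [f"See also: {links}", ""]
    lines += [
        f".. automodule:: {module}",
        "   :members:",
        "   :show-inheritance:",
        "",
    ]
    return "\n".join(lines)


stub = module_stub(
    "pynwb.ecephys",
    "Extracellular electrophysiology",
    # Placeholder narrative; the maintainers would write the real one.
    "Electrode groups tie the rows of the electrodes table back to a device.",
    related=["pynwb.device"],
)
```

The point is less the script and more that the output lives in the docs folder as a regular file, so the narrative sentence and the ordering of pages can be edited by hand instead of being regenerated on every build.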
As is, it's relatively clear that the developers don't use these docs because when I click on any of the headings in the API documentation tab I am actually led to some sub sub sublink in the API Documentation > PyNWB > Submodules > <literal module name> page -- and hopefully y'all wouldn't do that to yourselves on purpose! I don't mean to be harsh just to say this doesn't seem intentional!
Using all the existing material!
I know y'all do about a billion workshops, have a ton of users, and probably have a ton of teaching material. That is not reflected in the docs! The main NWB page links to a separate documentation page (https://nwb-overview.readthedocs.io/en/latest/index.html#) which is not linked anywhere from the pynwb docs! The nwb-overview docs themselves also largely seem to be overview docs with links out to pynwb and nwb_conversion_tools and don't reveal any additional structure to the format or library.
If the documentation was structured in a more limber way, maybe by using myst, maybe by using a wiki, maybe by figuring out some other way to incorporate all the materials that I know exist, then that would probably improve the documentation tenfold without writing anything new! I'm talking about all this stuff as a start! https://neurodatawithoutborders.github.io/nwb_hackathons/
I have also seen a few dozen lab-specific conversion repos that would be great to link to from an "examples" subpage!
I will stop there for now, and am more than happy to PR, but I hope hope hope this is received in the spirit I am writing it, as someone who is interested in the same things, that wants to see NWB thrive, that has coached several labs through conversion, and likes and respects what y'all do here. I think that all the cool next-level stuff I see happening like linked analyses and widgets and all that simply won't have the same impact if most people (without the funds to hire a staff programmer et al. to do it for them) simply cannot fathom converting their data in the first place. I have just been sort of confused by the docs for a long time and feel like it was worth saying something, and am again very very very happy to do some of the work of restructuring and rewriting the docs with some guidance for what the team would accept.
Do you have any interest in helping write or edit the documentation?
Yes.
Code of Conduct
- I agree to follow this project's Code of Conduct
- Have you checked the Contributing document?
- Have you ensured this change was not already requested?
edit 1: some grammar
edit 2: when I try and be funny and friendly online I speak in hyperbole like as a joke but then realize that it comes across as serious and so I made more explicit annotations of uh tone lol