Presentation

~~Notify v5's first alpha is out.~~ ~~Notify v5's beta is out.~~ ~~Notify v5 is out.~~ None of these yet; this is a draft.

If you don't particularly care why v5 exists or why I think it's cool, just know that it's easier for you to use, better for your users, and vastly more maintainable for me. However, it is a fairly severe break, so upgrading may not be simple. Details are in the repository and its extensive documentation.

If you're interested or merely curious, though, read on!

The background for this is that filesystem notification is a very diverse area. There exist half a dozen different kernel modules that fit the description, several ways to achieve it without special access, and several more ways to tap into special filesystems or filesystem-like structures. Each of these was designed for a (sometimes wildly) different purpose. Interestingly, very few were designed for the purpose of “watching a file tree and doing something when a file in it changes.”

FSEvents, for example, is a tool for archival and indexing. It was designed and built for two macOS features: Time Machine and Spotlight. Many of its features and behaviours are wholly unsuited to general file watching (and yet it is what most filesystem notification libraries and tools use on macOS). Rather, it is meant to be queried or streamed at long-ish intervals and used as an indication that something somewhere has changed, and that the consumer should rescan, reindex, or back up the whole thing again.

Fanotify, largely hailed as the “successor” to inotify, is only incidentally useful for filesystem notification tasks: its main purpose and design is to intercept access calls to files and let a userspace daemon allow or deny those accesses at its leisure. The Linux Audit system is also incidentally useful, and was designed for, well, auditing.

Kqueue and kevent and such are general kernel object watching mechanisms. To watch a tree, one opens a handle (which is a kernel object) for every single file and directory the tree contains and places a kevent watch mask on it.

Even systems designed specifically for file tree watching are amazingly different in how they do it, how they behave in various interesting cases, and how they report back.

All of this makes it hard to abstract these systems into something remotely coherent. (Do we need something coherent? Of course we do. We want to do things when files change, not care about all this trivia.)

Fortunately, filesystems are mostly similar, from the end-user’s perspective. They have files and folders. Files can be read and written, and sometimes executed. Files and folders have names, and some amount of metadata. Files and folders are created, modified, deleted, accessed.

So the foundation of Notify v5 is recognising this truth and redesigning the event system from that standpoint. Notify events have a kind, which is a hierarchical classification of both what the event generally is and what exactly it’s about. Three examples:

- `Modify(Data(Size))` tells us the data of an object was modified, and we know that because its size changed.
- `Create(Folder)` tells us a folder was created.
- `Remove(Any)` tells us an object was removed, but we don’t know the specifics.

That classification allows a consumer to quickly filter for what they’re looking for, as coarsely or as precisely as needed, while allowing producers to describe events as precisely as they can… but no more precisely than that.
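To make the classification concrete, here is a minimal Rust sketch of such a hierarchy. The three kinds from the examples above appear as nested enum variants; the other variant names (`Access`, `Content`, `Name`, `File`, and so on) and the filter functions are assumptions for illustration, not Notify's actual type definitions. A wildcard in the match filters coarsely; spelling out the full path filters precisely.

```rust
/// How a data modification was detected (hypothetical subset).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DataChange {
    Any,
    Size,
    Content,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ModifyKind {
    Any,
    Data(DataChange),
    Name,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CreateKind {
    Any,
    File,
    Folder,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RemoveKind {
    Any,
    File,
    Folder,
}

/// Top level: what generally happened. Nested levels: what exactly it was about.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EventKind {
    Any,
    Access,
    Create(CreateKind),
    Modify(ModifyKind),
    Remove(RemoveKind),
}

/// Coarse filter: any removal, however precisely the producer described it.
pub fn is_remove(kind: &EventKind) -> bool {
    matches!(kind, EventKind::Remove(_))
}

/// Medium filter: any data modification, however it was detected.
pub fn is_data_change(kind: &EventKind) -> bool {
    matches!(kind, EventKind::Modify(ModifyKind::Data(_)))
}

/// Precise filter: data modifications known through a size change.
pub fn is_size_change(kind: &EventKind) -> bool {
    matches!(kind, EventKind::Modify(ModifyKind::Data(DataChange::Size)))
}
```

A producer that only knows "something was removed" emits `Remove(Any)`; it still matches the coarse filter, and a precise filter simply does not claim it.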

Notify events also carry the path the event concerns, and an arbitrary metadata bin to store related rich information where available (such as a reference to the process that made the change, how the event was collected, or additional known details about the event that don’t fit in the classification).
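A hypothetical event record carrying that information might look like the following Rust sketch. The field names, the metadata keys used in the usage example (`collector`, `pid`), and the string-keyed map are illustrative assumptions; a real design could well use typed attributes instead of strings.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

/// Hypothetical event record: a kind, the path it concerns, and a metadata bin.
#[derive(Debug, Clone)]
pub struct Event {
    /// Kind kept as a plain label here; a real design would use a typed hierarchy.
    pub kind: String,
    pub path: PathBuf,
    /// Optional extra information, present only when a producer knows it.
    pub metadata: HashMap<String, String>,
}

impl Event {
    pub fn new(kind: &str, path: impl Into<PathBuf>) -> Self {
        Event {
            kind: kind.to_string(),
            path: path.into(),
            metadata: HashMap::new(),
        }
    }

    /// Builder-style helper to attach one piece of metadata.
    pub fn with(mut self, key: &str, value: &str) -> Self {
        self.metadata.insert(key.to_string(), value.to_string());
        self
    }
}
```

For example, `Event::new("Create(Folder)", "/tmp/project/src").with("collector", "inotify")` records which backend saw the change; consumers that don't care can ignore the map entirely.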

That’s the event problem solved. The second problem Notify v4 and many other such wrappers have is fallback. When the platform doesn’t have a native API to gather the relevant events, we must fall back to polling. That is simple enough to do. However, a related issue is runtime fallback: what if we know that the platform has a native API, but upon querying it we observe it’s not available, or at capacity, or some other thing makes it useless for us?

This is a frequent issue with inotify, because the number of watches is limited, and that limit is fairly low by default (to keep kernel memory manageable). Right now, consumers look for that error themselves and fall back to polling on their own. Often, they fall back for the entire set of paths they want to watch.
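On Linux, the limit in question is exposed as `/proc/sys/fs/inotify/max_user_watches` (the `fs.inotify.max_user_watches` sysctl), and once it is reached, adding a new watch fails with `ENOSPC`. A small Rust helper to read the current value might look like this; the function name is our own, and it returns `None` wherever the file does not exist, such as on non-Linux systems.

```rust
use std::fs;

/// Read the per-user inotify watch limit (Linux only).
/// Returns `None` if the file is absent (e.g. non-Linux systems) or unparsable.
pub fn max_user_watches() -> Option<u64> {
    let raw = fs::read_to_string("/proc/sys/fs/inotify/max_user_watches").ok()?;
    raw.trim().parse::<u64>().ok()
}
```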

A cleverer approach, and the one Notify v5 takes, is to manage the selection of event sources (“backends”) internally and not bother the user unless it really is impossible to watch a path. Notify itself watches for that error and falls back.
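The per-path fallback can be sketched as trying each backend in order of preference and only surfacing an error when none of them accepts the path. Everything below (`Backend`, `WatchError`, the fake `Limited` and `Poll` backends, `watch_all`) is hypothetical and models capacity as a simple counter; it is a Rust illustration of the idea, not Notify's internal code.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// Hypothetical reasons a backend may refuse a watch.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum WatchError {
    /// The backend works but has no room left (as with inotify's ENOSPC).
    AtCapacity,
    /// The backend cannot be used on this system at all.
    Unavailable,
}

/// Hypothetical backend interface.
pub trait Backend {
    fn name(&self) -> &'static str;
    fn watch(&mut self, path: &Path) -> Result<(), WatchError>;
}

/// Fake native backend with a fixed number of watch slots.
pub struct Limited {
    pub used: usize,
    pub capacity: usize,
}

impl Backend for Limited {
    fn name(&self) -> &'static str {
        "native"
    }
    fn watch(&mut self, _path: &Path) -> Result<(), WatchError> {
        if self.used >= self.capacity {
            return Err(WatchError::AtCapacity);
        }
        self.used += 1;
        Ok(())
    }
}

/// Polling fallback: in this model it never runs out of room.
pub struct Poll;

impl Backend for Poll {
    fn name(&self) -> &'static str {
        "poll"
    }
    fn watch(&mut self, _path: &Path) -> Result<(), WatchError> {
        Ok(())
    }
}

/// Give each path to the first backend (in preference order) that accepts it.
/// Only if every backend refuses is the offending path reported to the caller.
pub fn watch_all(
    backends: &mut [Box<dyn Backend>],
    paths: &[PathBuf],
) -> Result<HashMap<PathBuf, &'static str>, PathBuf> {
    let mut assigned = HashMap::new();
    for path in paths {
        let chosen = backends
            .iter_mut()
            .find_map(|b| b.watch(path).ok().map(|()| b.name()));
        match chosen {
            Some(name) => {
                assigned.insert(path.clone(), name);
            }
            None => return Err(path.clone()),
        }
    }
    Ok(assigned)
}
```

With a native backend limited to two watches followed by polling, the first two paths land on the native backend and the third falls through to polling, without the caller ever seeing the capacity error.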

And that opens up interesting avenues. For one, there’s no need to fall back for the entire set of paths we want to watch: if inotify has enough capacity, we can use it for a set of paths and use polling for the remainder. For another, we can use more than two backends at once. macOS has two kernel APIs; Linux has a staggering four. They all have different capabilities and restrictions, but if they’re available, nothing stops us from using them all at once. Finally, a backend being at capacity now (inotify, for example) does not mean it always will be: we can check again later and switch some of the watch set back to the more efficient backend as capacity becomes available.
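Moving paths back to the efficient backend once capacity frees up could be modelled as a periodic rebalancing step over a split watch set. `WatchSet`, its fields, and `rebalance` are hypothetical names in this Rust sketch, and capacity is a plain number rather than a real kernel limit.

```rust
use std::collections::BTreeSet;

/// Hypothetical split of the watch set between a capacity-limited native
/// backend and an unlimited polling fallback.
pub struct WatchSet {
    pub native: BTreeSet<String>,
    pub polled: BTreeSet<String>,
    /// Current native capacity; may be raised later (e.g. after a sysctl change).
    pub native_capacity: usize,
}

impl WatchSet {
    pub fn new(native_capacity: usize) -> Self {
        WatchSet {
            native: BTreeSet::new(),
            polled: BTreeSet::new(),
            native_capacity,
        }
    }

    /// Add a path, preferring the native backend while it has room.
    pub fn add(&mut self, path: &str) {
        if self.native.contains(path) || self.polled.contains(path) {
            return;
        }
        if self.native.len() < self.native_capacity {
            self.native.insert(path.to_string());
        } else {
            self.polled.insert(path.to_string());
        }
    }

    /// Stop watching a path, freeing its native slot if it had one.
    pub fn remove(&mut self, path: &str) {
        self.native.remove(path);
        self.polled.remove(path);
    }

    /// Run periodically: move as many polled paths as now fit back onto the
    /// native backend. Returns how many paths were promoted.
    pub fn rebalance(&mut self) -> usize {
        let room = self.native_capacity.saturating_sub(self.native.len());
        let promoted: Vec<String> = self.polled.iter().take(room).cloned().collect();
        for path in &promoted {
            self.polled.remove(path);
            self.native.insert(path.clone());
        }
        promoted.len()
    }
}
```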

(Notify currently does some of the first and some of the second, and does not yet actively do the third, but all of it is there to explore further in the future.)

The next two problems are solved together in my design. One: because different backends have different capabilities, we need some way of bridging the gap for the missing ones, in order to provide a coherent experience. One point five: because we might have several backends in play, whatever solution is used needs to apply only to those backends that need it! Two: event debouncing, where similar events close together are held back to avoid hammering effects.

For this, I introduced processors. They declare which capabilities they require and which, if any, they supply. They have access to some of the internal state. They can ask to add or remove watches. And they can let through, modify, create, or discard events along the stream they're hooked onto.
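A minimal Rust sketch of such a processor interface, with debouncing as one processor, follows. The `Processor` trait, the `Capability` labels, the `tick` hook, and the explicit `now` parameter (used instead of real timers so the logic stays deterministic) are all assumptions for illustration. This `Debounce` is trailing-edge: it holds each event back and releases it only once no identical (kind, path) event has arrived for the whole window.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical capability labels that backends and processors can declare.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Capability {
    Recursive,
    Debounced,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Event {
    pub kind: &'static str,
    pub path: String,
}

/// Hypothetical processor interface. `process` returns what to forward:
/// the event unchanged (let through), an altered copy (modify), extra
/// events (create), or nothing (discard, or hold back for later).
pub trait Processor {
    fn requires(&self) -> Vec<Capability> {
        Vec::new()
    }
    fn supplies(&self) -> Vec<Capability> {
        Vec::new()
    }
    fn process(&mut self, event: Event, now: Instant) -> Vec<Event>;
    /// Called periodically so held-back events can be released.
    fn tick(&mut self, _now: Instant) -> Vec<Event> {
        Vec::new()
    }
}

/// Trailing-edge debounce keyed on (kind, path).
pub struct Debounce {
    window: Duration,
    pending: HashMap<(&'static str, String), (Event, Instant)>,
}

impl Debounce {
    pub fn new(window: Duration) -> Self {
        Debounce { window, pending: HashMap::new() }
    }
}

impl Processor for Debounce {
    fn supplies(&self) -> Vec<Capability> {
        vec![Capability::Debounced]
    }

    fn process(&mut self, event: Event, now: Instant) -> Vec<Event> {
        // Hold the event back; a newer identical event replaces it and restarts the wait.
        let key = (event.kind, event.path.clone());
        self.pending.insert(key, (event, now));
        Vec::new()
    }

    fn tick(&mut self, now: Instant) -> Vec<Event> {
        // Release every held event that has been quiet for at least `window`.
        let window = self.window;
        let ready: Vec<(&'static str, String)> = self
            .pending
            .iter()
            .filter(|(_, (_, last))| now.duration_since(*last) >= window)
            .map(|(key, _)| key.clone())
            .collect();
        let mut released: Vec<Event> = ready
            .into_iter()
            .filter_map(|key| self.pending.remove(&key).map(|(event, _)| event))
            .collect();
        // HashMap order is arbitrary; sort so the output is deterministic.
        released.sort_by(|a, b| a.path.cmp(&b.path));
        released
    }
}
```

Two writes to `a.txt` 20 ms apart, with a 50 ms window, come out as a single event once 50 ms have passed since the second write.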

A consumer can add in their own processors, or enable provided but optional ones, or bring in a third party’s. Notify manages the lot, weaving the streams correctly, splitting and recombining where needed, and maintaining the watch list.
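The rule that a bridging processor should apply only to the backends that need it can be sketched as a planning step that builds a processor chain per backend: a processor is skipped when the stream already has every capability it supplies, and added when its requirements are met by what the stream has so far. The types, capability names, and `plan` function in this Rust sketch are hypothetical. A processor that supplies nothing (such as a user's own filter) is included whenever its requirements are met, and one whose requirements are never met is silently skipped, which a real implementation would more likely report.

```rust
use std::collections::HashSet;

/// Hypothetical capability labels.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Capability {
    Recursive,
    Debounced,
    RenamePairs,
}

/// What a backend's raw event stream already provides.
pub struct BackendInfo {
    pub provides: HashSet<Capability>,
}

/// What a processor needs upstream and what it adds to the stream.
pub struct ProcessorInfo {
    pub name: &'static str,
    pub requires: HashSet<Capability>,
    pub supplies: HashSet<Capability>,
}

/// Build the processor chain for one backend's stream, in the given order.
pub fn plan(backend: &BackendInfo, processors: &[ProcessorInfo]) -> Vec<&'static str> {
    let mut have = backend.provides.clone();
    let mut chain = Vec::new();
    for p in processors {
        // Needed if it adds something missing, or if it supplies nothing (a pure filter).
        let needed = p.supplies.is_empty() || !p.supplies.is_subset(&have);
        // Usable only if the stream so far meets its requirements.
        let usable = p.requires.is_subset(&have);
        if needed && usable {
            have.extend(p.supplies.iter().copied());
            chain.push(p.name);
        }
    }
    chain
}
```

For a native backend that already reports recursive and rename information, only debouncing and the user's filter get added; a polling backend gets the whole chain.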

That’s how it works. At the end of the day, it’s definitely a more complicated set-up, but it delivers:

- A more cohesive and less surprising filesystem notification interface.
- Lots of potential. There are many super-exciting things to explore, and the architecture encourages them rather than stifling customisation or expansion.
- Maintainability. It is more modular, and the pieces are less complex. A key driver for the design was how it compartmentalises domain knowledge and contains the effect any piece, special or banal, has on the rest.

I hope that was an interesting overview, and that you enjoy using Notify v5!
