Design overview

Sibo Wang edited this page Apr 28, 2025 · 3 revisions

Introduction

The control software for the Spotlight setup is challenging to design. Among the sources of this difficulty are:

  1. Too many hardware devices to interface with: the Euresys frame grabber, the JAI camera, the PCO camera, the Zaber translation stages, the CCS light control unit (illumination and optogenetics), the Thorlabs light control unit (blue excitation), and the Arduino. Each of these comes with its own particularities, and sometimes incompatibilities with the others.
  2. Timing has to be precise: the behavior camera, the muscle camera, and the lights must be synchronized with as little latency as possible.
  3. High rate of data coming in: We are recording ~1000x2000px behavior images at 350 FPS, and we want as little data loss as possible. This requires some non-trivial optimizations (and this is why the code is mostly written in C++).

On this Wiki page, I will explain the overall design decisions, particularly the use of parallel threads, shared memory, and hardware interfaces. This will hopefully make it less confusing to read the detailed API documentation.

In the Parallelism through multithreading section, I will motivate the use of multiple threads and enumerate the 10 types of threads used in the program. In the Inter-process and inter-thread communication section, I will explain how threads share access to the same information in a synchronized and consistent way. Then, in the Hardware interface section, I will explain how the program controls and accesses data from peripheral devices (i.e. cameras, Arduino, motion controller). In the Detailed description of shared data section, I will describe the detailed specifications of the C++ classes and structs that are shared among threads. With these in mind, in the Detailed description of worker threads section, I will similarly describe the threads that implement the logic of the program. Finally, in the Control programs and Utility tools sections, I will describe the programs that the user should use to calibrate the experimental system, collect data, and post-process the data.

Parallelism through multithreading

To control the Spotlight setup, we run a number of threads in parallel. There are two main advantages to this:

First, parallelism serves as a means to encapsulate independent logic. As discussed, we have a lot of hardware and software components to handle. Implementing each in a separate thread offers a natural way of segregating them and avoids the entanglement of their states. For example, we have a thread dedicated to motion control. In this thread, we have to estimate the position of the fly in the most recently acquired behavior image, compute where the motion stages should move to in order to keep the fly centered in the field of view, and send such commands to the motion stage controller. If this logic were implemented in the same thread as image acquisition, we would have to carefully synchronize image capturing with position estimation and stage movements, entangling motion control logic with image processing logic and making the code overly complicated and hard to maintain.

Instead, by implementing the tracking logic in a separate thread, we can simply run an infinite loop where the thread fetches the latest image—it doesn't matter how it is acquired; that's the job of another thread—and takes it from there. When the desired target positions are computed, our thread can then dump the command to yet another thread—one that actually communicates with the motion controller—without having to worry about the low-level API calls to control the hardware. Of course, our thread still needs to receive the latest image and send the target stage locations, but these can be put in data holders that are much better defined (i.e. just the data, no program state, etc.; see Inter-process and inter-thread communication).

Second, running parallelizable tasks in separate threads massively enhances performance. In our previous example, aside from the two exchanges of the latest behavior frame (which need to be synchronized; see the section below), the calculation of where the fly is and the acquisition of the image can happen in parallel. By implementing them in separate threads, we allow the operating system to schedule them on different CPU cores, waiting for one another only when shared data have to be accessed in a consistent manner. This massively boosts the performance of the code. Additionally, for tasks such as compressing acquired frames and saving them to the hard drive, we can simply start a pool of multiple threads, so that the program does not lag even if data comes in faster than a single image-saving function can handle.

In the Spotlight control programs, we have the following threads:

  1. Main thread
  2. GUI
  3. Behavior image acquiring thread
  4. Behavior image saving threads (multiple)
  5. Muscle image acquiring thread
  6. Muscle image saving threads (multiple)
  7. Motion control IO thread
  8. Tracking control thread
  9. Motion stage position logger thread
  10. Arduino communication thread

Additionally, we have a dedicated process (slightly different from a thread) that acts as a server for the PCO camera used for muscle imaging. This will be described in Hardware interface.

Each of these threads will be explained in Detailed description of worker threads. However, before we proceed, we have to first describe how threads exchange data, what data are exchanged, and how the hardware components are controlled. These will be covered in the next three sections.

Inter-process and inter-thread communication

Shared memory

When we talk about memory in a program, we're usually referring to something abstract—variables, data structures, arrays. But under the hood, all of this data must live somewhere physical. The physical media of memory include RAM, hard drives, or external devices mapped as "files"—think a printer that "pretends" to be a regular file, and whatever you write to that file gets printed. Things in your program—variables, constants, etc.—reside at particular memory addresses. If you are familiar with programming in C/C++, a pointer is basically an integer address of the thing that it points to.

However, each program (or process) doesn't see the actual physical memory inside your computer. Instead, it sees its own private "illusion" of memory, called virtual memory. Addresses in virtual memory are mapped to physical memory addresses by the operating system in a process called memory paging. This happens behind the scenes while your program runs. There are several reasons for this layer of indirection: for one, it's easier to manage your allocated memory if you don't have to take into account which addresses are occupied by other programs. The OS moves memory around as needed without the program needing to know. Isolation and safety are another factor: virtual memory prevents a program from accidentally (or maliciously) accessing or corrupting the memory of another program.

More critically for our program, memory paging enables shared memory. The OS can choose to map the same physical memory into the virtual address spaces of multiple programs. This is the foundation for efficient communication between processes.

Recall that our program consists of 10 types of worker threads and an additional process (the PCO camera server). It is now necessary to point out the differences between a thread and a process. A process is like a completely separate program running on your computer—it has its own memory and resources. A thread, on the other hand, is like a worker inside that program, doing a specific task, and all threads in the same process share the same memory (this page offers a more detailed explanation). The reason why we need a PCO server implemented as a separate process is a technical one and will be discussed later. Shared memory is handled differently between threads and between processes:

To share memory between threads, no special mechanism is needed. Threads naturally share the same address space. This means that if one thread creates a variable, another thread can directly read or modify it. In practice, we often use shared pointers (std::shared_ptr) in C++ to manage shared data safely. A shared pointer keeps track of how many threads (or parts of the program) are using a piece of data and automatically de-allocates it when no one needs it anymore.

Sharing memory between processes is a bit more complicated, since each process has its own private (i.e. virtual) memory space. To share memory across processes, we need to explicitly allocate a region of physical memory that both processes can map into their own virtual address spaces. In Linux, this is typically done using three system calls:

  • shm_open(...) (shared memory open): This opens a section of memory that other processes can open too. The section of memory behaves like a file—it can be opened in read-only, write-only, or read-and-write mode, and it can be found as a "file" under /dev/shm on the filesystem.
  • ftruncate(...) (file truncate): Once the shared "file" has been opened, ftruncate defines the size of the memory buffer. In other words, shm_open returns a file descriptor that defines the start of the memory address; ftruncate defines how large the "file" is.
  • mmap(...) (memory map): Once the file descriptor is created and its size set, mmap maps the memory object into the process's own virtual memory space so it can be read and written like regular memory. In other words, mmap creates a pointer to the shared resources that you can treat as any other pointer. You can dereference it to read from or write to it.

In Hardware interface, I will explain how we use these functions to stream images from the PCO camera server to the main recording program.

It's important to note that the aforementioned methods only allow multiple threads and processes to access shared resources. To ensure that the shared data are correct and consistent, we still need additional mechanisms to synchronize the accesses. The next sub-section will discuss data consistency and synchronization.

Data consistency

While our threads mostly work independently of each other, sometimes they have to share data with one another. The exchange of information between parallel threads and processes is called inter-thread communication and, more generally, inter-process communication (IPC). The main problem in IPC is to ensure the synchrony and consistency of shared data. More specifically, we want to avoid the following two situations:

  • Race conditions: These happen when two threads access and modify shared data at the same time without proper timing control. In real-time systems, this can cause confusion or loss of information.

    • Example: Imagine a shared grocery list on a whiteboard at home. One person is updating the list by adding "eggs" and "milk." Meanwhile, another person (the consumer) is heading to the store and copies the list to their phone. Without deliberate timing control (say, the shopper reads the list while it's mid-update), they might only see "eggs" and miss "milk," or even get corrupted data like "mi…" before heading out.
  • Deadlocks: This happens when multiple threads are each waiting for the other to release a resource, so no one can continue.

    • Example: Imagine two cars arrive at the same intersection and their planned paths intersect. Each car tries to proceed without yielding, so the cars just block one another and we have a traffic jam. A naive solution is to implement a priority rule (e.g. the car on the right always has priority). However, this also doesn't work: if we have four cars coming from all directions, we are still deadlocked. In real life, drivers will eventually figure the situation out because humans can negotiate, make eye contact, or break the rules to get things moving. However, in computing systems, threads follow strict rules, and they will wait forever unless we use more sophisticated mechanisms implemented at the operating system level.

To avoid these issues, synchronization primitives are used. These are basic building blocks that help coordinate when and how threads can access shared resources, ensuring that only one thread acts on critical data at a time, or that threads wait for the right moment to proceed. In our program, we use two of the most common synchronization primitives: the mutex (short for mutual exclusion) and the condition variable.

Mutexes are like locks on a door. When a thread wants to access shared data, it first "locks" the mutex, blocking other threads from entering the critical section until it's done and "unlocks" it. This ensures that only one thread can modify or read the shared data at a time, avoiding race conditions.

Condition variables are used to make threads wait for a certain condition to become true before continuing. Think of it like waiting for a bell to ring—one thread might be waiting for a buffer to have data, and another thread signals (rings the bell) when it has finished producing that data. This is much more efficient than the alternative of busy waiting, where the receiving thread constantly checks whether the condition is met in an infinite loop. Therefore, condition variables help coordinate timing between threads and prevent both unnecessary work and deadlocks.

Hardware interface

Motion control

Before controlling the motion stages programmatically, we need to install the Zaber Launcher application (see its download page here or our wiki page here). This is a convenient GUI program that allows us to set up serial connection with the hardware and modify parameters such as the maximum allowed speed. We can also do such things as manually controlling stage positions and homing the stages in Zaber Launcher.

Once the hardware is set up using Zaber Launcher, we interface with the Zaber X-MCC controller, which controls two linear translation stages, using the Zaber Motion Library. We implement a simple wrapper around the Zaber Motion Library API in recorder/src/peripherals/motionControl.cpp and motionControl.hpp. This wrapper implements methods such as moveAbsolute (move to absolute position), moveRelative (move relative to the current position), home (go to home position, which also resets its position sensing), getPosition (read out the current position), checkIfIdle (see if motion stages are stationary), and waitUntilIdle (blocks until motion stages reach their target positions).

Arduino

The Arduino circuit is a key component of the Spotlight setup. It generates trigger signals (square waves) that control the cameras (behavior camera and muscle camera) and the lights (IR illumination, blue excitation, and optogenetics). The computer on which the control software is run communicates with the Arduino to modulate the trigger signals (e.g. adjust the frequency/frame rate).

Scientific cameras often support hardware triggering, which allows precise synchronization between image acquisition and external events. Instead of relying on software timing, which can be inconsistent due to operating system delays, hardware triggers use physical electrical signals—usually 3.3V or 5V transistor-transistor logic (TTL) pulses—to tell the camera exactly when to capture a frame. Additionally, some cameras also allow us to control the exposure time with the width of the pulse so that whenever there is a rising edge, the exposure starts, and whenever there is a falling edge, the exposure stops (or the opposite, depending on the configuration). Similarly, lights can be triggered in strobing mode, where they are only on when the TTL signal is high (or low, depending on the configuration). This allows light to be turned on briefly during the camera exposure. This reduces photobleaching and heat buildup.

In our setup, the Arduino sends square wave pulses to the cameras. Each rising edge acts as a "take a picture now and switch light on" signal, and the muscle camera is triggered once every $k$ times the behavior camera is triggered. This ensures that both the behavior and muscle cameras are perfectly synchronized, even at high frame rates. Altogether, the Arduino controls six output lines:

  1. Infrared illumination LED, through the CCS PD3-3024-3-EI controller
  2. Optogenetics channel 1 (an additional LED), through the CCS PD3-3024-3-EI controller
  3. Optogenetics channel 2 (an additional LED), through the CCS PD3-3024-3-EI controller
  4. The JAI (behavior) camera, through the Euresys Coaxlink Quad G3 frame grabber
  5. The PCO (muscle) camera
  6. The Thorlabs (blue excitation) LED, through the Thorlabs LEDD1B controller

Note

The lights for optogenetics are not strobed. However, by supplying a constant high/low voltage to the trigger line of the light when it is supposed to be on/off, we can control these lights using the strobing functions.

A C++ program, TriggerController, is loaded on an Arduino Nano ESP32 microcontroller. This program runs in an infinite loop. In each iteration, the program reads out the current time from a high-precision hardware clock and determines whether any action should be taken on any of the lines. It also checks if the computer has demanded a change in the triggering pattern (e.g. change of exposure time or frame rate). To specify the way in which the control program on the workstation sends commands to the program running on the Arduino, we need a protocol (i.e. a syntax of possible commands and the corresponding operations). We define the following as the allowed messages (in plain ASCII text) that the workstation may send to the Arduino:

  1. >SET_BEHAVIOR_FPS <fps>\n, where <fps> is an integer.
  2. >SET_SYNC_RATIO <syncRatio>\n, where <syncRatio> is an integer and the muscle camera is triggered once every syncRatio times the behavior camera is triggered. In combination with the behavior frame rate, this variable thereby defines the frame rate of the muscle camera.
  3. >SET_BEHAVIOR_EXPOSURE_TIME <exposureTimeUs>\n, where exposureTimeUs is an integer and is the desired exposure time of the behavior camera in microseconds.
  4. >SET_MUSCLE_EXPOSURE_TIME <exposureTimeUs>\n, where exposureTimeUs is an integer and is the desired exposure time of the muscle camera in microseconds.
  5. >START_RECORDING <protocolString>\n, where <protocolString> is a string defining the schedule by which the optogenetics lights should be turned on and off and the time at which recording should stop. This will be described two paragraphs below.
  6. >STOP_RECORDING\n. This stops the recording (i.e. all output lines are at a constant low).

Note that there are two design decisions here. First, all messages start with the character >. Any message received by the Arduino that does not start with > will be read back to the computer. This, combined with logging, can be useful for debugging. Second, all messages are terminated by \n (the new line character). The Arduino program only starts to parse the message when \n is encountered.

In some experiments, one might want to precisely specify a schedule by which optogenetic lights should be turned on and off, or when the recording should stop. We call this schedule an experiment protocol and specify it in the <protocolString> (see point 5 above). The syntax of this string is defined in The experiment protocol string.

In the C++ program on the computer side, we define an ArduinoCommunication interface (implemented in recorder/src/peripherals/arduinoCommunication.cpp and arduinoCommunication.hpp). This interface, implemented as a class, provides methods such as setBehaviorRecordingFPS, setSyncRatio, setBehaviorExposureTime, setMuscleExposureTime, startRecording, and stopRecording. Upon calling these methods, the appropriate commands will be sent to the Arduino microcontroller. This way, the caller does not have to deal with formatting the commands above or low-level serial communication via USB. Under the hood, the ArduinoCommunication object manages the Arduino communication thread, which sends and receives the messages defined above in the background.

Euresys frame grabber and JAI camera

The JAI SP-5000M-CXP4 camera, which we use for behavior recording, uses the CoaXPress interface. This interface allows much faster data transfer than the typical USB interface. However, unlike USB cameras, a frame grabber is required. We use the Euresys Coaxlink Quad G3 frame grabber, which is installed inside the computer as a Peripheral Component Interconnect (PCI) card (similar to a graphics card). The camera streams data into the frame grabber, and the frame grabber streams data into the computer memory through the PCI interface. We interact with the camera through the Euresys frame grabber.

The configuration of the camera is documented in eGrabber and JAI camera configuration, example mode 1. Once we configure the hardware through the eGrabber Studio GUI program, we will interact with it through a BehaviorCamera class, implemented in recorder/src/peripherals/behaviorCamera.cpp and behaviorCamera.hpp. When an object of the BehaviorCamera class is created, a sequence of configuration steps is performed under the hood. Then, interaction with the camera can be done through methods such as start (which starts acquisition), stop (which stops acquisition), and waitForOneFrame (which blocks until a new frame is available and returns it). The low-level interaction with the Euresys API is encapsulated and agnostic to the caller.

PCO camera

Unlike the JAI camera, the "pco.panda 4.2" camera that we use for muscle imaging communicates with the computer via USB. Therefore, no frame grabber is needed.

We interface with the PCO camera through the "pco.cpp" API, though APIs at other levels, such as "pco.sdk", are also provided (see Software dependencies and configurations, PCO camera software). Due to the design of the pco.cpp API, certain C++ source code, particularly the files under pco.cpp/pco.camera, has to be included when we compile our Spotlight control program. This proves problematic due to namespace issues (common words such as BYTE and Camera are redefined in the pco.camera code). As a result, despite my best attempts, any program that includes the PCO code is incompatible with both Qt (the GUI engine of our recording program) and the Euresys software. Therefore, I implemented a separate program, pco-camera-server, that fetches frames from the camera and streams them to shared memory in a perpetual loop. It does absolutely nothing else and is therefore fully decoupled from the main program, so it can be compiled separately (i.e. without the Qt and Euresys libraries). The main program can simply read the frames supplied by pco-camera-server through shared memory. For more technical information on how shared memory works, see the Inter-process and inter-thread communication section.

In the main program, I implemented a MuscleCamera class (sources: recorder/src/peripherals/muscleCamera.cpp and muscleCamera.hpp). Similar to BehaviorCamera, this class offers a key method, waitForOneFrame, which blocks until a new frame is available and then returns it. Under the hood is a synchronized read operation from shared memory. This way, the caller can treat MuscleCamera just like BehaviorCamera and does not need to concern themselves with the additional layer of indirection that is the PCO camera server. See PCO camera server for how the server program works.

However, there are two minor differences between MuscleCamera and BehaviorCamera. First, because the PCO camera uses a rolling shutter (this is why it has very low noise and why we use it for muscle imaging), controlling the exposure time is not as straightforward as for the JAI camera, which uses a global shutter. Therefore, whereas the JAI camera opens its shutter whenever the trigger signal is high (so that we can control the exposure time by modulating the duration of each pulse sent from the Arduino), the PCO camera needs to have its exposure time separately set via the USB interface. For this, the MuscleCamera class has an additional setExposureTime method. Second, we do not have to explicitly control the start and stop of acquisition, and therefore neither a start nor a stop method is provided. Instead, upon creation of the MuscleCamera object, a new pco-camera-server process is spawned. Upon the destruction of the object, the pco-camera-server process is terminated.

Important

Although the exposure time is directly communicated to the PCO camera, we still need to inform the Arduino of the muscle exposure time because the Arduino still controls the triggering of the blue excitation light, which has to be synchronized with the camera.

Next, read Data in shared memory and Pseudocode of CPP threads.
