Skip to content

Refactor trajectory frame caching, align Dataset-owned state access, fix dataset lifecycle, enforce state vs dataset API boundaries, and introduce System architecture#139

Draft
Copilot wants to merge 10 commits intomasterfrom
copilot/remove-md-frame-cache-and-refactor
Draft

Refactor trajectory frame caching, align Dataset-owned state access, fix dataset lifecycle, enforce state vs dataset API boundaries, and introduce System architecture#139
Copilot wants to merge 10 commits intomasterfrom
copilot/remove-md-frame-cache-and-refactor

Conversation

Copy link
Contributor

Copilot AI commented Mar 12, 2026

This change removes viamd's use of md_frame_cache, updates the surrounding trajectory/state plumbing to match the new Dataset-owned application state layout, fixes the initialization, deinitialization, and loading lifecycle for datasets, enforces explicit dataset ownership across all supporting procedures, and introduces the foundational System struct from the new multi-dataset architecture.

  • Trajectory cache refactor

    • Removed viamd's remaining md_frame_cache usage.
    • Kept trajectory wrapper state local to the wrapped trajectory instead of a global registry in loader.cpp.
    • Preserved trajectory utility helpers such as raw-trajectory access, recenter targeting, and cache-related compatibility entry points used by existing call sites.
  • Loader cleanup

    • Kept loader responsibilities focused on opening and wrapping trajectories for supported file formats.
    • Removed the old hidden loader-owned cache implementation details from the active code path.
  • Dataset-owned state integration

    • Updated stale pre-refactor access patterns across main.cpp, viamd.cpp, and dataset-related components to use current_dataset(state) for dataset-specific data.
    • Ensured dataset-specific representations, trajectory-derived analysis data, selection state, animation state, and related views follow the active dataset.
    • Moved file/path handling to the current refactor shape using workspace_path, Dataset::sys_path, and Dataset::traj_path.
  • Dataset loading/unloading lifecycle

    • Redesigned load_dataset_from_file so that loading a new topology into an already-populated dataset creates a new Dataset (tab) and switches current_idx to it. Loading into an empty slot reuses it. Trajectory-only loading requires topology to already be present in the current dataset.
    • New dataset tab identifiers are set to the loaded filename.
    • Fixed create_new_dataset which was erroneously initialising per-dataset bitfields (selection_mask, highlight_mask, visibility_mask) on current_dataset() instead of the newly created slot.
    • Fixed free_molecule_data to re-initialise per-dataset bitfields after md_arena_allocator_reset via a new init_dataset_bitfields helper, preventing use-after-reset memory issues.
    • Global selection masks (query.mask, grow.mask) are now initialised with the persistent allocator on first creation only.
  • State vs dataset API boundary enforcement

    • Audited all procedures in viamd.h/cpp and main.cpp to ensure they operate at the correct level.
    • Pure dataset-level functions now take Dataset& directly instead of ApplicationState*: recompute_atom_visibility_mask, flag_all_representations_as_dirty, remove_all_selections, remove_selection, remove_representation, remove_all_representations, update_representation_info.
    • Mixed functions that also need state infrastructure (allocators, script IR) now take both ApplicationState* and an explicit Dataset&: create_representation, clone_representation, create_selection, create_default_representations, update_representation, update_all_representations, reset_view.
    • Functions that are legitimately state-level remain ApplicationState*-only: init_molecule_data, init_trajectory_data, interrupt_async_tasks, load_dataset_from_file, load_workspace, save_workspace.
    • All call sites updated to pass the dataset explicitly.
  • System architecture introduction

    • Introduced struct System as a pure data container aligned with the "Systems must not store visualization state" principle from the multi-dataset architecture.
    • System holds all molecular data: topology (md_system_t), trajectory (md_trajectory_i*), GPU representation (md_gl_mol_t), file paths, FrameCache, recenter target, bounding box, precomputed trajectory data, and interpolated properties.
    • Dataset now contains a System system member. Molecular data is accessed via dataset.system.X; visualization context (camera, animation, selection, representation, visuals, operations, dirty flags) stays at the dataset level.
    • Added current_system(state) helper alongside current_dataset(state) for ergonomic System& access.
    • ~330 call sites in viamd.cpp, main.cpp, dataset.cpp, ramachandran.cpp, and shapespace.cpp updated to use the new access pattern.
    • Dataset is documented as a transitional combined System+View container; the long-term goal is to separate these into independent System[] and View[] arrays in ApplicationState (the Workspace), with Views referencing Systems by ID.
  • Compile error fixes

    • load_workspace used the stale variable name data throughout its body; all references corrected to app_state.
    • Removed a conflicting static forward declaration of reset_view from main.cpp (the function is non-static) and added default arguments to its viamd.h declaration.
    • Fixed stale data->view.param.* references in the #if EXPERIMENTAL_GFX_API block and a #if 0 debug draw block in main.cpp to use current_dataset(*data).view.param.*.
  • Application state alignment

    • Kept global state on ApplicationState where it still belongs, such as workspace path, window visibility flags, and shared shader handles.
  • Source cleanup

    • Removed dead md_frame_cache-based code paths and corrected outdated assumptions that still referenced the old top-level ApplicationState layout.
    • Removed unused dirty_system_state field from Dataset.

Example of the updated ownership/access pattern:

// Dataset holds system data in a named sub-struct
Dataset& dataset = current_dataset(*data);
System&  system  = dataset.system;  // or: current_system(*data)

// Molecular data accessed via system
if (!param.sys_loader->init_from_file(&system.sys, path, arg, dataset.alloc)) { ... }

// Dataset-specific visualization operations take Dataset& directly
remove_all_representations(dataset);
create_default_representations(state, dataset);
reset_view(state, dataset, &dataset.representation.visibility_mask, true, false);

💬 We'd love your input! Share your thoughts on Copilot coding agent in our 2 minute survey.

Copilot AI and others added 2 commits March 12, 2026 16:13
Co-authored-by: scanberg <706523+scanberg@users.noreply.github.com>
Co-authored-by: scanberg <706523+scanberg@users.noreply.github.com>
Copilot AI changed the title [WIP] Remove md_frame_cache and implement md_lru_cache8_t Refactor trajectory frame caching to a state-owned 8-slot LRU cache Mar 12, 2026
…ion state, Moved things into new struct Dataset which represents one active dataset. The goal here is to transition into supporting many active datasets open at once.
@scanberg
Copy link
Owner

@copilot I did not like how you put the frame cache inside of loader. I removed that. The loader should conceptually just focus on providing appropriate loaders for file formats. I also made substantian changes to the ApplicationState structure where I moved many portions of the state into a new struct called Dataset which is supposed to represent Dataset specific states. The idea is to transition into supporting multiple open datasets, which are exposed as tabs within the main view of the application where the user can swap between these tabs. The default behaviour when loading a new system (either by dropping a file on the application or by going through the menu) is to open this dataset in a new tab. Notice that some state data is not dataset specific and lives above it. I want you to make sure this runs. For now its sufficient that anything that lives outside of the scope of the dataset is only applied to the first loaded dataset.

Representations are going to be dataset specific for example, and when the tab changes to study another dataset, the representations defined for that dataset should be shown in the representation window.

Same for ramachandran and other things for now.

Godspeed Ai agent, you're going to need it!

Co-authored-by: scanberg <706523+scanberg@users.noreply.github.com>
Copy link
Contributor Author

Copilot AI commented Mar 13, 2026

@copilot I did not like how you put the frame cache inside of loader. I removed that. The loader should conceptually just focus on providing appropriate loaders for file formats. I also made s...

Addressed in 391bcd9. I updated the stale ApplicationState accesses to use current_dataset(state) across the loader follow-up, viamd.cpp, main.cpp, and the dataset/ramachandran/shapespace components so dataset-specific representations and analysis state follow the active dataset. I also restored the loader utility declarations needed by the current call sites and moved workspace/system/trajectory path handling over to workspace_path, sys_path, and traj_path.

Copilot AI changed the title Refactor trajectory frame caching to a state-owned 8-slot LRU cache Refactor trajectory frame caching and align state access with Dataset-owned application state Mar 13, 2026
Copilot AI requested a review from scanberg March 13, 2026 09:20
@scanberg
Copy link
Owner

@copilot
I made more changes to this major refactor.
What is missing now is the initialization and deinitialization of system/molecule and trajectories to accompany them and to handle workspaces.

Workspaces can be put aside for the moment, but it should default to only storing / loading the first dataset.

This issue with this refactor is that alot of the supporting procedures where designed when there was only a single dataset supported. Therefore most of these procedures (which deals with loading and unloading of data, systems, trajectories) need to be redesigned.

The logical workflow is this:
When a new (topology / system) is loaded, this results in a new dataset to which a trajectory can be applied.
If only a trajectory is loaded, we should try to match it against the datasets and fail if we cannot match it. There needs to be topology before trajectory.

Make a full pass over main.cpp, viamd.h and viamd.cpp and try to make sure that the loading and unloading of datasets work.

Co-authored-by: scanberg <706523+scanberg@users.noreply.github.com>
Copy link
Contributor Author

Copilot AI commented Mar 13, 2026

@copilot
I made more changes to this major refactor.
What is missing now is the initialization and deinitialization of system/molecule and trajectories to accompany them and to han...

Addressed in 9180ef1. Here's a summary of what was fixed:

Compile errors fixed

  • load_workspace used the old variable name data throughout its body, but the function parameter is app_state. All references corrected.
  • reset_view was forward-declared static in main.cpp but defined as non-static in viamd.cpp. Removed the conflicting forward declaration from main.cpp and added default arguments to the viamd.h declaration.

Dataset loading/unloading redesigned

  • load_dataset_from_file: when a topology (sys_loader) is loaded into an already-populated dataset slot, a new Dataset is now created and made current (i.e. a new tab opens). Loading into an empty slot reuses it as before. Trajectory-only loading still requires topology to be present in the current dataset.
  • The new dataset tab's identifier is set to the loaded filename so tabs are distinguishable.

Bitfield memory safety fixed

  • create_new_dataset was erroneously calling current_dataset(state) to init bitfields on the new dataset — it now uses the newly created dataset directly.
  • free_molecule_data called md_arena_allocator_reset(dataset.alloc) which invalidated per-dataset bitfields, but never re-initialised them. They are now re-initialised after the reset via the new helper init_dataset_bitfields(dataset).
  • Global selection masks (query.mask, grow.mask) are now initialised with the persistent allocator on first creation only, not re-initialised on every new dataset.

Stale data->view.* references in the #if EXPERIMENTAL_GFX_API and #if 0 debug blocks in main.cpp updated to current_dataset(*data).view.*.

Copilot AI changed the title Refactor trajectory frame caching and align state access with Dataset-owned application state Refactor trajectory frame caching, align Dataset-owned state access, and fix dataset loading/unloading lifecycle Mar 13, 2026
@scanberg
Copy link
Owner

@copilot I have not looked over this yet, but can you go over the procedures in viamd.h/cpp and main.cpp and ensure that they operate on the correct level state vs dataset. Many of them were targeting state but after this refactor should target a specific dataset instead. For example to clear all representations, this should target a dataset and not the complete system state.

…natures

Co-authored-by: scanberg <706523+scanberg@users.noreply.github.com>
Copy link
Contributor Author

Copilot AI commented Mar 13, 2026

@copilot I have not looked over this yet, but can you go over the procedures in viamd.h/cpp and main.cpp and ensure that they operate on the correct level state vs dataset. Many of them were targeting...

Done in 6a88a97. Every procedure was audited and updated:

Pure dataset-level (now take Dataset& directly — no more implicit current_dataset() lookup):

  • recompute_atom_visibility_mask(Dataset&), flag_all_representations_as_dirty(Dataset&)
  • remove_all_selections(Dataset&), remove_selection(Dataset&, int)
  • remove_representation(Dataset&, int), remove_all_representations(Dataset&)
  • update_representation_info(Dataset&)

Mixed (now take both ApplicationState* and explicit Dataset&, using the allocator/script from state but operating on the passed dataset):

  • create_representation(ApplicationState*, Dataset&, ...), clone_representation(ApplicationState*, Dataset&, ...)
  • create_selection(ApplicationState*, Dataset&, ...), create_default_representations(ApplicationState*, Dataset&)
  • update_representation(ApplicationState*, Dataset&, Representation*), update_all_representations(ApplicationState*, Dataset&)
  • reset_view(ApplicationState*, Dataset&, const md_bitfield_t*, ...)

Legitimately state-level (unchanged): init_molecule_data, init_trajectory_data, interrupt_async_tasks, load_dataset_from_file, load_workspace, save_workspace.

All call sites in viamd.cpp, main.cpp, and components/dataset/dataset.cpp were updated to pass the dataset explicitly. Full build passes.

Copilot AI changed the title Refactor trajectory frame caching, align Dataset-owned state access, and fix dataset loading/unloading lifecycle Refactor trajectory frame caching, align Dataset-owned state access, fix dataset lifecycle, and enforce state vs dataset API boundaries Mar 13, 2026
@scanberg
Copy link
Owner

@copilot I'm posting some new design directions for VIAMD, try to to incorporate as fully as possible:

VIAMD Multi-Dataset Architecture (Agent Summary)

This document describes the core architecture principles for supporting multiple datasets and views in VIAMD. It is intended for implementation agents and should be treated as the canonical conceptual model.

The goal is to support multiple datasets, multiple views, and view-scoped analysis without duplicating large simulation data.


Core Principles

  1. Systems are datasets.
    They contain molecular topology and trajectory data.

  2. Views define context.
    A view determines:

    • what systems are visible
    • what frame is displayed
    • what analyses operate on
  3. Analyses operate on views.
    Any analysis automatically evaluates on all systems visible in the view.

  4. Data and visualization must be decoupled.
    Systems exist independently of views and can appear in multiple views simultaneously.

  5. Systems must never be duplicated in memory.
    Views reference systems by ID or handle.


Core Objects

Workspace

Root container for the application session.

Responsibilities:

  • stores loaded systems
  • stores views
  • stores analyses
  • manages session state

Conceptual structure:

Workspace
• Systems
• Views
• Analyses
• Plots


System

Represents a single molecular dataset.

Contains:

System
• Topology
• Trajectory

Systems are pure data containers and must not store visualization state.


Topology

Static molecular structure.

Examples of contents:

  • atoms
  • bonds
  • residues
  • chains

Trajectory

Time-dependent coordinates.

Contains:

  • coordinate frames
  • timestep metadata
  • simulation box information

Trajectories may be very large and must be handled efficiently.


View

A visual and analytical context.

Defines:

  • which systems are visible
  • camera configuration
  • current frame (timeline)
  • system representations
  • analyses operating in this view

Views reference systems but do not own them.

Conceptual structure:

View
• Camera
• Timeline
• VisibleSystems
• Representations
• Analyses

Views should be cheap to create and duplicate.


Timeline

Controls the current frame displayed in a view.

Typical fields:

  • current frame
  • playback state
  • playback speed

Timeline may be:

  • view-local
    or
  • global to the workspace

Representation

Defines how a subset of a system is rendered.

Structure:

Representation
• System reference
• Selection
• Style
• Coloring

Examples of styles:

  • cartoon
  • licorice
  • surface
  • VDW

Representations belong to views because rendering may differ between views.


Selection

Defines a subset of atoms within a system.

Selections are defined by expressions such as:

protein
name CA
resid 10-50

Selections are always evaluated relative to a specific system.


Analysis

Represents a computed property.

Examples:

  • distance
  • RMSD
  • radius of gyration
  • angle
  • distributions

Structure:

Analysis
• Expression
• Scope (View)
• Results

Analyses run on all systems visible in the scoped view.


Results

Stores computed outputs from analyses.

Results are typically stored per system.

Example:

Results
• System A → timeseries
• System B → timeseries

Possible result types:

  • scalar
  • timeseries
  • histogram
  • distribution

Plot

Visualization of analysis results.

Plots must not compute data themselves.

Instead they reference analyses and visualize their results.

Structure:

Plot
• Analysis reference
• Plot parameters


Ownership Rules

Workspace owns:

  • Systems
  • Views
  • Analyses
  • Plots

Views reference:

  • Systems
  • Representations

Representations reference:

  • Systems
  • Selections

Analyses reference:

  • Views
  • Systems indirectly through view scope

Large data structures (trajectories, topology) must never be duplicated.


Analysis Scope Rule

The analysis scope is defined by the view.

When an analysis runs:

  1. Get systems visible in the view
  2. Evaluate the expression on each system
  3. Store results per system

Hidden systems must not be included in calculations.


Naming Guidelines

Use domain-level names, not UI names.

Correct examples:

Workspace
System
View
Representation
Analysis
Selection

Avoid:

Window
Panel
Widget

The UI layer should map onto these domain concepts separately.


Canonical Model

Workspace
→ Systems (datasets)
→ Views (visual + analysis context)
→ Representations (rendering)
→ Analyses (computations)
→ Plots (visualization)

The key rule:

Views define analysis scope.

This keeps visualization and computation aligned with the user’s mental model.

… from view context

Co-authored-by: scanberg <706523+scanberg@users.noreply.github.com>
Copy link
Contributor Author

Copilot AI commented Mar 14, 2026

@copilot I'm posting some new design directions for VIAMD, try to to incorporate as fully as possible:

VIAMD Multi-Dataset Architecture (Agent Summary)

...

Incorporated in bf8fc34. Here's what was implemented from the architecture document:

System struct introduced — pure data container aligned with "Systems must not store visualization state":

  • sys (md_system_t – topology), traj (trajectory), traj_loader, gl_mol, gfx_structure, sys_path, traj_path, frame_cache, recenter_target, aabb_min/max, trajectory_data, interpolated_properties

Dataset is now a combined System+View container — molecular data lives in dataset.system, visualization context (camera, animation, selection, representation, visuals, operations, dirty flags) stays at the dataset level. The comments document the intended future separation into independent System[] and View[] arrays in ApplicationState (the Workspace).

current_system(state) helper added alongside current_dataset(state) for direct System& access.

~330 call sites in viamd.cpp, main.cpp, dataset.cpp, ramachandran.cpp, and shapespace.cpp updated to use dataset.system.X for all data fields.

The full Workspace/View/Analysis/Plot decoupling as described in the document is a longer-term effort — this commit establishes the foundational System type and separation that those future steps build on.

Copilot AI changed the title Refactor trajectory frame caching, align Dataset-owned state access, fix dataset lifecycle, and enforce state vs dataset API boundaries Refactor trajectory frame caching, align Dataset-owned state access, fix dataset lifecycle, enforce state vs dataset API boundaries, and introduce System architecture Mar 14, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants