Commit 3608608 (Revision #1, parent d465560)

File tree

2 files changed: +56 −7 lines changed


paper/paper.bib

Lines changed: 50 additions & 0 deletions
```diff
@@ -86,3 +86,53 @@ @article{Rolfo:2023
   doi = {10.21105/joss.05813},
   author = {S. Rolfo and C. Flageul and P. Bartholomew and F. Spiga and S. Laizet}
 }
+
+@article{gerris,
+  title = {An accurate adaptive solver for surface-tension-driven interfacial flows},
+  journal = {Journal of Computational Physics},
+  volume = {228},
+  number = {16},
+  pages = {5838-5866},
+  year = {2009},
+  doi = {10.1016/j.jcp.2009.04.042},
+  author = {Popinet, S.}
+}
+
+@article{basilisk,
+  title = {A quadtree-adaptive multigrid solver for the Serre–Green–Naghdi equations},
+  journal = {Journal of Computational Physics},
+  volume = {302},
+  pages = {336-358},
+  year = {2015},
+  doi = {10.1016/j.jcp.2015.09.009},
+  author = {Popinet, S.}
+}
+
+@article{boilingfoam,
+  title = {Computational study of bubble, thin-film dynamics and heat transfer during flow boiling in non-circular microchannels},
+  journal = {Applied Thermal Engineering},
+  volume = {238},
+  pages = {122039},
+  year = {2024},
+  doi = {10.1016/j.applthermaleng.2023.122039},
+  author = {F. Municchi and C.N. Markides and O.K. Matar and M. Magnini}
+}
+
+@article{flow36,
+  title = {FLOW36: A spectral solver for phase-field based multiphase turbulence simulations on heterogeneous computing architectures},
+  journal = {Computer Physics Communications},
+  pages = {109640},
+  year = {2025},
+  doi = {10.1016/j.cpc.2025.109640},
+  author = {Roccon, A. and Soligo, G. and Soldati, A.}
+}
+
+@article{thinc,
+  title = {Toward efficient and accurate interface capturing on arbitrary hybrid unstructured grids: The THINC method with quadratic surface representation and Gaussian quadrature},
+  journal = {Journal of Computational Physics},
+  volume = {349},
+  pages = {415-440},
+  year = {2017},
+  doi = {10.1016/j.jcp.2017.08.028},
+  author = {Xie, B. and Xiao, F.}
+}
```

paper/paper.md

Lines changed: 6 additions & 7 deletions
```diff
@@ -36,20 +36,20 @@ CaNS-Fizzy -- Fizzy for short -- is a GPU-accelerated numerical solver for massi
 Fizzy is suited for large-scale direct numerical simulations of canonical incompressible two-phase flows, from simple laminar cases to bubble/droplet-laden turbulent suspensions. These flows may be computationally expensive due to the stringent resolution requirements imposed by the direct solution of immersed interfaces dispersed throughout the domain. This demands efficient use of the capabilities of modern computing systems; Fizzy has been developed to include key desirable features that enable this objective: a one-fluid formulation of the two-phase flow governing equations, the use of a fast direct solver for the pressure Poisson equation, and an efficient distributed GPU porting with an interface capturing strategy that is suitable for GPU acceleration.
 In addition to the momentum transfer and interface capturing, the code has the capability to solve heat transfer in both fluid phases, and thermal convection based on the Oberbeck-Boussinesq approximation.
 Finally, the code has been extensively validated with several benchmark cases that demonstrate the different features of the solver, which are incorporated in the continuous integration workflows of the repository.
+The GPU capabilities differentiate Fizzy from other commonly used open-source state-of-the-art incompressible two-phase flow solvers such as [boilingFoam](https://github.com/fmuni/boilingFoam-PUBLIC) [@boilingfoam] and [Basilisk](http://basilisk.fr/) [@gerris; @basilisk], which are better suited to smaller-scale direct numerical simulations in complex geometries. The recently published [FLOW36](https://github.com/MultiphaseFlowLab/FLOW36) [@flow36] is another GPU-ready code with features similar to Fizzy's, differing in the interface capturing scheme and in the use of a pseudo-spectral instead of a finite difference approach.
```
```diff
 
 # Mathematical model
 
-A one-fluid formulation of the two-phase flow is employed, solving a single set of governing equations for both phases in the whole domain. The incompressible Navier-Stokes equation, the heat transport equation and the Accurate Conservative Diffuse Interface (ACDI) transport equation are evolved in time to compute the velocity and pressure, temperature and phase indicator fields respectively. The latter identifies the regions of the domain occupied by either phase: it is continuous and smooth over the whole domain, and the interface between phases is diffuse. The thermophysical and transport properties (density, viscosity, thermal conductivity, specific heat capacity) are linearly mapped over the phase indicator field, and thus also continuous and smooth. The surface tension at the interface is included as a Continuous Surface Force (CSF) [@Brackbill:1992] in the Navier-Stokes equation.
-See [@Costa:2018] and [@Jain:2022] for more details.
+A one-fluid formulation of the two-phase flow is employed, solving a single set of governing equations for both phases in the whole domain. The incompressible Navier-Stokes equation, the heat transport equation and the phase indicator transport equation are evolved in time to compute the velocity and pressure, temperature and phase indicator fields respectively. The latter identifies the regions of the domain occupied by either phase: it is continuous and smooth over the whole domain, and the interface between phases is diffuse. The thermophysical and transport properties (density, viscosity, thermal conductivity, specific heat capacity) are linearly mapped over the phase indicator field, and thus also continuous and smooth. The surface tension at the interface is included as a Continuous Surface Force (CSF) [@Brackbill:1992] in the Navier-Stokes equation.
```
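The linear property mapping described in the added paragraph can be sketched in a few lines. This is an illustrative snippet, not Fizzy's API; the function name and the property values are hypothetical:

```python
# Sketch of the one-fluid linear property mapping over the phase
# indicator field phi (phi = 1 in phase 1, phi = 0 in phase 2).
# Values are illustrative water/air-like densities, not from Fizzy.

def map_property(phi, prop1, prop2):
    """Linearly blend a thermophysical property across the diffuse interface."""
    return phi * prop1 + (1.0 - phi) * prop2

rho1, rho2 = 1000.0, 1.0  # kg/m^3, purely illustrative

# Because phi varies smoothly between 0 and 1 across the diffuse
# interface, the mapped density is continuous and smooth as well.
for phi in (0.0, 0.25, 0.5, 1.0):
    print(phi, map_property(phi, rho1, rho2))
```

The same blend applies to viscosity, thermal conductivity, and specific heat capacity, which is why all mapped fields inherit the smoothness of the phase indicator.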
```diff
 
 # Methods and Implementation strategy
 
 The governing equations are spatially discretized with a second-order finite difference scheme on a 3D Cartesian grid; a staggered grid arrangement is used for the velocity field, while all other quantities are stored at the cell centers; time integration is based on a low-storage three-step Runge-Kutta scheme. The incompressible Navier-Stokes equation is solved with a pressure correction scheme to enforce mass conservation, which yields a variable coefficient Poisson equation for the pressure correction: a splitting technique adapted from [@Dong:2012] and [@Dodd:2014] transforms this equation into a constant coefficient Poisson equation [@Frantzis:2019], enabling the use of the fast direct FFT solver of the CaNS code. Fizzy also allows for solving the conventional variable-coefficient problem using a geometric multigrid method through the [Hypre](https://github.com/hypre-space/hypre) library.
-The diffuse interface representation of the phase interface allows for continuous and smooth mapping of the physical fields across the interface, and it requires no explicit interface reconstruction thanks to the interface regularization flux, keeping the computational load constant regardless of the local interface topology and thus making the algorithm particularly suited for parallelization on GPU architecture. The momentum equation includes the flux associated with the diffuse interface regularization, which allows for a mass--momentum consistent discretization and enables stability at high density contrasts between phases.
+The interface capturing scheme for the phase indicator transport equation can be chosen between the Accurate Conservative Diffuse Interface (ACDI) scheme [@Jain:2022] and a tailored flavour of the THINC algebraic Volume-of-Fluid (VoF) method [@thinc]. Both methods share a diffuse interface representation of the phase interface that allows for a continuous and smooth mapping of the physical fields across the interface. The ACDI method requires no explicit interface reconstruction thanks to an interface regularization flux, while for the VoF method the interface geometry in each cell is simplified to allow analytic calculation of the interface-cell intersection and of the advection fluxes at cell faces; in both methods the computational load is thus kept constant regardless of the local interface topology, making the algorithm particularly suited for parallelization on GPU architectures. For the ACDI method, the momentum equation includes the flux associated with the diffuse interface regularization, which allows for a mass--momentum consistent discretization and enables stability at high density contrasts between phases. The heat equation similarly includes the enthalpy flux associated with the interface regularization.
```
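The "fast direct FFT solver" idea behind the constant-coefficient Poisson equation can be illustrated with a minimal 1D periodic example: in Fourier space every mode decouples, so the solve is a single transform, a pointwise division, and an inverse transform. This is a spectral sketch only; Fizzy's actual solver applies the same strategy to a second-order finite-difference Laplacian (via modified wavenumbers) on pencil-decomposed 3D fields:

```python
import numpy as np

# Solve d2p/dx2 = f on a periodic 1D domain directly in Fourier space:
# p_hat(k) = f_hat(k) / (-k^2) for k != 0, with the k = 0 mode fixed
# (the mean of p is arbitrary for a periodic Poisson problem).
n = 64
L = 2.0 * np.pi
x = L * np.arange(n) / n

f = -np.sin(x)                                 # manufactured RHS, exact solution p = sin(x)
f_hat = np.fft.fft(f)
k = 2.0 * np.pi * np.fft.fftfreq(n, d=L / n)   # angular wavenumbers

p_hat = np.zeros_like(f_hat)
p_hat[1:] = f_hat[1:] / (-k[1:] ** 2)          # direct, non-iterative solve per mode
p = np.fft.ifft(p_hat).real

print(np.max(np.abs(p - np.sin(x))))           # near machine precision
```

The cost is dominated by the FFTs, which is what makes cuFFT (GPU) and FFTW (CPU) the natural backends mentioned below.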
```diff
 
-The code is written in modern Fortran, and is parallelized using MPI and OpenACC directives for GPU kernel offloading and host/device data transfer. As in CaNS, Fizzy leverages [cuDecomp](https://github.com/NVIDIA/cuDecomp) [@Romero:2022] for distributed memory calculations in pencil domain decompositions, and [cuFFT](https://docs.nvidia.com/cuda/cufft/) for computing Fourier transforms. On CPUs, the code uses [2DECOMP&FFT](https://github.com/2decomp-fft/2decomp-fft) [@Rolfo:2023] and [FFTW](https://www.fftw.org/) to perform the same operations.
+The code is written in modern Fortran, and is parallelized using MPI and OpenACC directives for GPU kernel offloading and host/device data transfer. As in CaNS, Fizzy leverages [cuDecomp](https://github.com/NVIDIA/cuDecomp) [@Romero:2022] for distributed memory calculations in pencil domain decompositions, and [cuFFT](https://docs.nvidia.com/cuda/cufft/) for computing Fourier transforms. These libraries are designed for NVIDIA GPUs; in the future, Fizzy will support other GPU hardware, following updates to CaNS. On CPUs, the code uses [2DECOMP&FFT](https://github.com/2decomp-fft/2decomp-fft) [@Rolfo:2023] and [FFTW](https://www.fftw.org/) to perform the same operations.
 
-Users can design and run a simulation by specifying the physical and computational parameters in a simple Fortran namelist input file. The code uses a modular, procedural design which makes extensions with different numerical methods or physical phenomena easy to develop. In the short term, we aim to allow for different interface tracking algorithms (e.g., based on the volume-of-fluid method), along with alternative schemes for spatial and temporal discretization.
+Users can design and run a simulation by specifying the physical and computational parameters in a simple Fortran namelist input file. The code uses a modular, procedural design that makes extensions with different numerical methods or physical phenomena easy to develop. In the short term, we aim to allow for alternative schemes for spatial and temporal discretization, and to introduce additional interface capturing schemes, e.g., geometric VoF.
```
```diff
 
 Finally, the code was designed so that important new computational features in the parent solver CaNS (e.g. porting efforts to other architectures) are easily propagated to Fizzy.
 
```
```diff
@@ -61,8 +61,7 @@ Finally, the code was designed so that important new computational features in t
 
 # Computational performance
 
-Fizzy is tailored for large-scale simulations that exploit the computational capacity of modern GPU clusters with full GPU occupancy. The most relevant metric of the parallel efficiency for such scenario is a weak scaling test that determines the penalty in increased wall-clock time occurring when the problem size is increased alongside the computational resources. The liquid-liquid emulsion in homogeneous isotropic turbulence case of \autoref{fig:examples} (Left) has been used for this test: the size of the computational domain has been extended in one direction linearly with the number of GPU nodes employed. The test has been carried out on the GPU partition of the supercomputer Leonardo from Cineca, Italy; each computing node is equipped with four NVIDIA A100 SXM6 64GB GPUs, and is able to fit at full memory a $1024^3$ ($\sim$ 1 billion grid cells) computational box. \autoref{fig:performance} shows the performance penalty as the problem domain size (i.e. the number of spatial degrees of freedom) is increased from occupying 4 nodes (16 GPUs) to 64 nodes (256 GPUs): the 16 times larger computation takes only about 1.7 times longer than the original 4-node computation.
-The key contributor to the parallel performance is the interface capturing approach used in the ACDI method, which prevents thread divergence in GPU kernels, as the computational load is independent of the local interface morphology due to the lack of explicit interface reconstruction. Indeed, very little sensitivity of the wall-clock time per iteration to the amount of interface area is observed, even for unsteady evolution of the interface with drastic topology changes during break-up events.
+Fizzy is tailored for large-scale simulations that exploit the computational capacity of modern GPU clusters with full GPU occupancy. The most relevant metric of parallel efficiency for such a scenario is a weak scaling test, which determines the penalty in increased wall-clock time when the problem size is increased alongside the computational resources. The liquid-liquid emulsion in homogeneous isotropic turbulence case of \autoref{fig:examples} (Left) has been used for this test: the size of the computational domain has been extended in one direction linearly with the number of GPU nodes employed. The test has been carried out on the GPU partition of the supercomputer Leonardo at Cineca, Italy; each computing node is equipped with four NVIDIA A100 SXM6 64GB GPUs, and can fit at full memory a $1024^3$ ($\sim$ 1 billion grid cells) computational box. \autoref{fig:performance} shows the performance penalty for the ACDI method as the problem domain size (i.e. the number of spatial degrees of freedom) is increased from occupying 4 nodes (16 GPUs) to 64 nodes (256 GPUs): the 16 times larger computation takes only about 1.7 times longer than the original 4-node computation. A similar performance is obtained using the algebraic VoF method. The key contributor to the parallel performance is the interface capturing approach, which prevents thread divergence in GPU kernels, as the computational load is independent of the local interface morphology. Indeed, very little sensitivity of the wall-clock time per iteration to the amount of interface area is observed, even for unsteady evolution of the interface with drastic topology changes during break-up events.
```
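The reported weak-scaling figures reduce to simple arithmetic, sketched below using only the numbers quoted in the paragraph above (ideal weak scaling would keep the wall-clock time constant, i.e. efficiency 1.0):

```python
# Weak-scaling arithmetic from the reported Leonardo test:
# 4 nodes -> 64 nodes (16x the problem size and the resources),
# wall-clock time grows by a factor of ~1.7.

nodes_base, nodes_large = 4, 64
time_ratio = 1.7                           # reported normalized wall-clock time

scale_factor = nodes_large // nodes_base   # 16x larger problem
weak_scaling_efficiency = 1.0 / time_ratio # 1.0 would be ideal

print(scale_factor)
print(round(weak_scaling_efficiency, 2))   # roughly 0.59
```

That is, the solver retains roughly 59% parallel efficiency across a 16-fold increase in problem size and GPU count.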
```diff
 
 ![Weak scaling performance on GPU nodes at full memory. The vertical axis shows the wall-clock time normalized by the four-node case.\label{fig:performance}](weak_scaling.png){ width=60% }
 
```