Add a tool for plotting distribution functions in GK #183
Maxwell-Rosen wants to merge 31 commits into main
Conversation
…data. This gets complex in gyrokinetics because of the velocity jacobian. Also, we are currently not capable of using a mapc2p or non-uniform mesh map for plotting distribution function. I will add non-uniform mesh grids in a future commit
… time. Finally, we can plot both non-uniform velocity and position space grids in command line pgkyl
One has to be a bit careful with this type of operation. There is a very significant aliasing error incurred when you interpolate Jf and J onto our interpolation points and then divide them. In Vlasov, we just make the choice to always write out f, performing a special alias-free division operation at the quadrature points we know the Jacobian at; then, when we restart, we read in f and do the reverse (a multiplication at the quadrature points we know the Jacobian at). See: https://github.com/ammarhakim/gkeyll/blob/kitp-hamil-vlasov/vlasov/apps/vm_species.c#L540

Which is to say: you can do this and it will sort of give you what you want, but if you start chasing fluctuations in the distribution function as a result of this post-processing technique (or if someone asks why there is ringing around sharp gradients when they plot f), you are very likely chasing an error in this plotting routine, not an error in the code.
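To see the scale of this error, here is a minimal single-cell sketch (hypothetical numpy code, not postgkyl internals). J and f are chosen so that f itself is exactly representable in a 1D p1 Legendre basis on [-1, 1], yet interpolating the projections of Jf and J to the cell endpoints and dividing pointwise still misses the true nodal values badly:

```python
import numpy as np

# 3-point Gauss-Legendre quadrature on [-1, 1] (exact for polynomials up to degree 5)
xq, wq = np.polynomial.legendre.leggauss(3)

def project_p1(vals):
    """L2-project quadrature samples onto the basis {1, x}; norms are 2 and 2/3."""
    return np.array([np.sum(wq * vals) / 2.0,
                     np.sum(wq * xq * vals) / (2.0 / 3.0)])

J = 1.0 + 0.5 * xq        # Jacobian samples (linear, exactly representable)
f = 1.0 + xq              # true f (also exactly representable in p1)
Jf_h = project_p1(J * f)  # what gets written to file: the projection of Jf
J_h = project_p1(1.0 + 0.5 * xq)

# Naive route: interpolate both to the cell endpoints and divide pointwise
nodes = np.array([-1.0, 1.0])
interp = lambda c: c[0] + c[1] * nodes
f_plot = interp(Jf_h) / interp(J_h)

print(f_plot)
```

The true endpoint values are [0, 2]; the naive division returns roughly [-0.67, 1.78], an O(1) error coming purely from dividing after interpolation.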
manauref
left a comment
Nice work. Some requests:
- Please state what parts of this are generated by LLMs, and how you tested them.
- You need to add an option like -m or -b for multiblock, see gk-nodes.
- Please make sure it works from a python script, and put an example in the PR.
- Please check that later operations on f, like integrating over a dimension or operating on it with ev, work.
- Please make it work with multiple frames, so that one can feed it to anim or collect.
I really appreciate your positive and prompt feedback on what felt like a frustration-fueled coding session. I genuinely do think this will improve our pgkyl environment and experience. @JunoRavin
…an example script
import matplotlib.pyplot as plt
import postgkyl as pg
distf = pg.commands.build_gk_distf(
name="gk_mirror_boltz_elc_poa_2x2v_p1",
species="ion",
frame=0)
pg.data.select(distf, z0=0, z1=0.0, overwrite=True)
pg.output.plot(distf)
plt.show()
Here is an example Python script which uses this tool.
…s giving an error when collecting in time, then the grid gets messed up
…w we do data loading. This command is basically a loading command with some computation underneath, but ultimately it is just loading some data.
Writing out f in GK is wasteful. The way to deal with this is to teach postgkyl how to do weak division. Now that we have all the kernels in a shared gkeyll library, it'll be easy to tell Python how to use them (I have examples). It'll be a bit slow because it'll be done on a single CPU, but for postprocessing and just doing one or a few frames it's ok.
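For the curious, the idea behind weak division can be sketched in pure numpy for a single 1D p1 cell (a toy stand-in, not the actual gkeyll kernels): instead of dividing interpolated values, solve the small per-cell linear system (phi_i, J f_h) = (phi_i, (Jf)_h) for the modal coefficients of f_h.

```python
import numpy as np

# 3-point Gauss-Legendre quadrature on [-1, 1] (exact for degree <= 5)
xq, wq = np.polynomial.legendre.leggauss(3)

def project_p1(g):
    """L2-project a callable g onto {1, x}; basis norms are 2 and 2/3."""
    vals = g(xq)
    return np.array([np.sum(wq * vals) / 2.0,
                     np.sum(wq * xq * vals) / (2.0 / 3.0)])

def eval_p1(c, x):
    return c[0] + c[1] * x

J = lambda x: 1.0 + 0.5 * x   # Jacobian: linear, exactly representable
f = lambda x: 1.0 + x         # true f: also p1
Jf_h = project_p1(lambda x: J(x) * f(x))   # what the code writes out
J_h = project_p1(J)

# Naive route: interpolate both to the endpoints, divide pointwise
endpoints = np.array([-1.0, 1.0])
f_naive = eval_p1(Jf_h, endpoints) / eval_p1(J_h, endpoints)

# Weak division: solve (phi_i, J f_h) = (phi_i, (Jf)_h) for phi in {1, x}
basis = [lambda x: np.ones_like(x), lambda x: x]
M = np.zeros((2, 2)); rhs = np.zeros(2)
for i, phi in enumerate(basis):
    rhs[i] = np.sum(wq * phi(xq) * eval_p1(Jf_h, xq))
    for j, psi in enumerate(basis):
        M[i, j] = np.sum(wq * phi(xq) * J(xq) * psi(xq))
f_weak = np.linalg.solve(M, rhs)

print(f_naive)   # large pointwise errors vs true endpoint values [0, 2]
print(f_weak)    # modal coefficients [1, 1], i.e. f_h = 1 + x, exact here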
Performing this operation with weak division and multiplication would be fantastic! Could you send me some examples of how to do this? I don't think it would be difficult to do here if I have the correct examples and infrastructure. If not, perhaps we revisit the weak division when it is fully implemented.
…he code base. Reduce the multiblock infrastructure to not stitch together distribution functions, but just to specify 1 block to plot at a time
It doesn't have to double the disk storage, because you can always reuse arrays in I/O (or in the Vlasov case, we do actually need an array for storing f without the Jacobian factor, because we do the updates in general geometry differently than GK, constructing f from the volume Jf and J and then evaluating that quantity at the surface instead of utilizing ratios of Jacobians at the surface).

Edit: I just want to add that the biggest reason I chimed in is not because I think this is wrong or was discouraging you. I was just sharing an experience that doing the division after interpolation can have some rather large errors, and if you find yourself chasing down where those errors come from or annoyed at how your distribution functions look, there's a potentially very straightforward reason.
You can also further improve the accuracy of the evaluation of f by performing the division at Gauss-Legendre quadrature points, because the way we have implemented general geometry utilizes the fact that we know the Jacobian at specific points, and it is a static-in-time function. So it's a further accuracy improvement on the distribution function to just divide out the variation of the Jacobian at the special points you know the Jacobian at (and then either convert back to modal -> interpolate, or interpolate from the subsequent nodal basis, or even do a nonuniform plotting from the quadrature points). Just another way to avoid aliasing in this case, because of the special construction of the geometry factors we use.
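A sketch of this quadrature-point variant (hypothetical numpy code): assume a single 1D p1 cell on [-1, 1] with Jacobian J = 1 + 0.5x known exactly, true f = 1 + x, so the file stores the L2 projection of Jf = (1 + 0.5x)(1 + x), whose modal coefficients in {1, x} are [7/6, 3/2].

```python
import numpy as np

Jf_h = np.array([7.0 / 6.0, 1.5])   # stored modal coefficients of P(Jf)

g = 1.0 / np.sqrt(3.0)
xg = np.array([-g, g])              # 2-point Gauss-Legendre nodes

# Divide where the Jacobian is known exactly: at the Gauss nodes
f_gauss = (Jf_h[0] + Jf_h[1] * xg) / (1.0 + 0.5 * xg)

# Convert the two nodal values back to modal coefficients of {1, x}
b0 = 0.5 * (f_gauss[0] + f_gauss[1])
b1 = (f_gauss[1] - f_gauss[0]) / (2.0 * g)

# In this toy case the recovery is exact (b0 = b1 = 1, i.e. f = 1 + x):
# the part of Jf lost by the p1 projection is proportional to P2(x), which
# vanishes at the 2-point Gauss nodes. In general it's an accuracy
# improvement rather than exact.
print(b0, b1)
```

Compare with endpoint division on the same data, which gives O(1) errors.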
I think one thing we should be mindful of here, as we use LLMs more, is when the LLM generates something that is, at least to my eye, more sophisticated from an error-checking perspective than other postgkyl tools. My experience with LLMs in Python is that they can very much over-engineer solutions and ultimately clutter an output script/tool with lots of try-except statements.

In this case, some of the error checking makes sense (is the Jacobian array compatible with the distribution function array?). Some of the other error checking I'm less sure about (it's not clear to me what matched_frames is doing). Plus, a sizable chunk of the sophistication of this script appears to be the ability to batch the division and read in multiple distribution functions at once. But is that really something we want to support this way? In post-processing, I can see an argument that distribution functions are sufficiently large as-is that the benefits of batching may not be transparent, and it may be better for tools like this to have another wrapper around the function call that lets you loop over a number of frames and perform the operation. If you eliminated the batching procedure, would handling multi-block be easier, because you wouldn't be trying to combine a batched read and manipulation with determining how the multi-block geometry stitches together?

I'm not anti-LLM myself; I think these tools can be useful. But in situations like this, where you have a well-defined output in a language like Python with dozens of ways to accomplish the same task, it's not clear to me that the LLM solution is necessarily the one we should settle on, especially when by default the way LLMs write Python is informed by tons of in-distribution code with thousands of lines of error-catching in any given bit of functionality. Error-catching is good, but if we look at each error check and ask whether that edge case should even be possible, it may inform the iteration with the LLM toward what we want the final output code to look like.
This is an over-complicated piece of code for something that is relatively easy to do. It has no maintainability, and I doubt anyone will be able to modify it or fix issues as/when they arise.
I want this simplified along the lines of other custom commands we have. In general, I am not in favor of custom commands, but it is fine for special tasks. Perhaps this is one of these cases. But the code, in that case should follow a regular pattern and not be completely different from every other custom tool.
In each custom tool we need to do similar things, and so there must be common patterns. Mana already has a custom command for computing energy etc. So please follow that pattern. This seems like an overly complicated and engineered piece of code that is unmaintainable.
To avoid such issues from creeping into critical pieces of software like Gkeyll and postgkyl I will likely not allow this PR to be merged unless this code is simplified significantly. I suggest @Maxwell-Rosen you work with @manauref, compare your custom tool with his and get the design patterns unified first.
… error checking, enhance frame parsing, and improve code organization for clarity.
…line frame resolution logic, and enhance grid application for clarity and efficiency.
…ation helper functions and removing unnecessary comment separators for better readability.
…rs and add frame loading log for improved clarity
…ency in file naming
Great comments. I've re-organized the code, removing the error detection and condensing the logic. I see the value in both ways of structuring the code. On one hand, now there is almost no error detection for using this tool incorrectly, so errors and failures will require digging into the code and debugging manually. On the other hand, lots of error detection makes the code difficult to read. I'll compromise by just making sure the files exist when we are loading them.

I think one big way this code could be simplified is if we could load gkeyll_data with both a mapc2p and a mapc2p-vel mapping. Currently, the reader itself is not designed for this. Otherwise, I think I refactored this to be more maintainable and readable. This function does do the interpolation under the hood because we divide the components of
…n the loading of the data that didn't need to be there. Clarify why the jacobian needed to be reshaped. Also add a reshaping to the distribution function because it has 1 component after being interpolated
…_dimension function
…n't have a default
I think @ammarhakim and @manauref would like the changes I made to the script. I basically re-wrote it all. I removed all the error handling, and the script is 3x smaller. I removed a few functions that were < 3 lines of code. I added some clarifying comments to improve readability. I would think the current version is very maintainable and extendable. I do think the script would be better designed from a user standpoint with more elegant error detection, but I do also appreciate the compact and dense form that this has.
manauref
left a comment
Good work. Three more overarching comments:
A) In an earlier comment you showed how to call this from a script. Have you actually tested it? Once you ensure it works, please put that concise example in the main description of the PR.
B) From your examples it seems like this command works with other commands. Have you/can you test that it works with anim, collect, ev (specifically with things like avg, sqrt, pow, int), integrate, write?
C) I don't think this command has to be a gk_ command. It could be made to work with Vlasov as well, if the J_x, J_v, B (or call it J_phase) are made optional.
src/postgkyl/data/gdata.py
Outdated
    if "grid_type" not in self.ctx:
      if all(len(np.asarray(coord).shape) == 1 for coord in self._grid):
        self.ctx["grid_type"] = "uniform"
      else:
        self.ctx["grid_type"] = "mapped"
why do you have to change gdata.py?
This change was a bit outside the scope of the PR, but I did experience a crash when querying the info of the grid. The issue was that if the grid type is not specified, the info command will error and not display the info. Perhaps we should have this default to "undefined" when self._grid isn't specified. I wasn't really sure what to do here and this was a patch generated by copilot. It does fix the crash, and I can see why. When a grid object is set, we must also set the grid type.
I don't think this should be necessary. You could post a minimum working example that reproduces the crash without this change so we can understand it (rather than blindly let copilot change the code).
I think you need to run the unit tests, and yourself run a number of commands on moment, vlasov and gk data to make sure it's ok.
I can see that the automatic checks below are not passing, so it could be due to this.
Okay. I just removed it. If it causes bugs in the future, we can update it.
    from postgkyl.commands.mhd import mhd
    from postgkyl.commands.parrotate import parrotate
    from postgkyl.commands.gk_energy_balance import gk_energy_balance
    from postgkyl.commands.gk_distf import load_gk_distf
why do we need to add this here? seems like it could just stay private to the gk_distf.py file
This is so that we can call pg.commands.load_gk_distf from a Python script.
    def _convert_cell_centered_to_nodal(cell_centers: np.ndarray) -> np.ndarray:
      """Given an array defined at cell centers, return the corresponding nodal values
      by interpolating half a cell width at the boundaries."""
      nodes = np.zeros(cell_centers.size + 1, dtype=cell_centers.dtype)
      nodes[1:-1] = 0.5 * (cell_centers[:-1] + cell_centers[1:])
      nodes[0] = cell_centers[0] + (cell_centers[0] - nodes[1])  # Cell center minus half a cell width
      nodes[-1] = cell_centers[-1] + (cell_centers[-1] - nodes[-2])  # Cell center plus half a cell width
there are probably instances of this elsewhere in postgkyl already. Check, and if equivalent, use code like that. If this is better, please state why.
I did look for this chunk of code elsewhere in pgkyl and couldn't find it. I can look again.
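For reference, a standalone re-implementation of that helper with a quick sanity check on a uniform grid (illustrative only; the real function lives in gk_distf.py):

```python
import numpy as np

def convert_cell_centered_to_nodal(cc):
    """Midpoints between neighbors interiorly; linear extrapolation at the ends."""
    nodes = np.zeros(cc.size + 1, dtype=cc.dtype)
    nodes[1:-1] = 0.5 * (cc[:-1] + cc[1:])
    nodes[0] = cc[0] + (cc[0] - nodes[1])
    nodes[-1] = cc[-1] + (cc[-1] - nodes[-2])
    return nodes

cc = np.array([0.5, 1.5, 2.5])   # uniform cells of width 1
nodal = convert_cell_centered_to_nodal(cc)
print(nodal)   # [0. 1. 2. 3.]
```

On a uniform grid this recovers the cell edges exactly; on a non-uniform grid the boundary nodes are linear extrapolations.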
src/postgkyl/commands/gk_distf.py
Outdated
    jf_file = f"{prefix}-{species}_{frame_infix}{frame}.gkyl"
    mapc2p_vel_file = f"{prefix}-{species}_mapc2p_vel.gkyl"
    jacobvel_file = f"{prefix}-{species}_jacobvel.gkyl"
    mc2nu_file = f"{prefix}-mc2nu_pos_deflated.gkyl"
    jacobtot_inv_file = f"{prefix}-jacobtot_inv.gkyl"
One thing that I'm unsure about is whether the names of these files should be hardcoded here. Two reasons why perhaps they should not be:
- The user may have differently named files, for a variety of reasons. So they may wish to have an option to specify the file name.
- Gkeyll may change the file names, and this would break.
Some possible solutions could be:
- Give the option to specify each of these files, and
- We store the name of these files in the metadata in Gkeyll.
Part of what made the pgkyl command tedious is the repetition of the simulation name. That's why I have broken up the naming conventions in this way. This is the way GK does the file naming, so I think it's safe to hard-code these file names.
If gkeyll changes the file names, then this function should be updated
I can add a backup argument to accept arbitrary file names, but we should default to the standard that the gyrokinetics modules use
src/postgkyl/commands/gk_distf.py
Outdated
    def load_gk_distf(
        name: str, species: str, frame: int,
        tag: str = "f", source: bool = False, use_c2p_vel: bool = False,
        use_mc2nu: bool = False, block_idx: int | None = None,
shouldn't there be a c2p option too? e.g. if I load the distribution for a cylindrical LAPD sim, and I want to plot f(z=0,vpar=0,mu=0) on the R-Z plane.
I can add this, but it's not always clear which component maps to what. We can use the --mapc2p-deflated.gkyl file for this, but we shouldn't use the full 3D mapc2p. This is specifically for unmapping the non-uniform coordinates.
Yeah, --c2p using the c2p_deflated file (and we need to make sure that in 3d the c2p_deflated.gkyl file matches the full c2p file, if it doesn't already).
In general we need these things to have the option to take in files specified by the user, in case they want to use nonstandard files, as mentioned in another comment.
Another reason for this: in the case of c2p, a user may want to use a higher-resolution c2p so we can make smoother plots (that's how @dingyunl has been making smooth plots of ASDEX on the R-Z plane, and how I made a smooth plot of a 3x WHAM sim on the R-Z plane).
src/postgkyl/commands/gk_distf.py
Outdated
        help="Simulation name prefix (e.g. gk_lorentzian_mirror).")
    @click.option("--species", "-s", required=True, type=click.STRING,
        help="Species name (e.g. ion or elc).")
    @click.option("--source", is_flag=True,
i don't like this. It adds an argument for a very specific file from one of many other modules in Gkeyll. What about a Maxwellian from a BGK collision op? what about the heating term from our heating module? etc
In general this command should work with any phase-space quantity, with option to divide by J_x, J_v and/or B, or not.
We could specify something like --suffix source instead of --source, which would set the suffix of the file that is loaded. That would be more general.
No extra flag is needed. I think it should be possible to do
pgkyl gk-distf <sim_name>-<species>_<frame>.gkyl
or
pgkyl gk-distf <sim_name>-<species>_source_<frame>.gkyl
without passing the optional arguments -n or -s. Likewise, --c2p, --c2p-vel, --mc2nu (and possibly --jacobx, --jacobv and --jacobgyro (the latter for B)) should be optional and possibly take the file name corresponding to these quantities. If the file is given, it uses that. If the flag is given without a file, it uses the hardcoded names. If the flag is not given, it doesn't do that transformation/scaling.
I have tested chaining this data to lots of commands, like ev, collect, anim. This command just puts data on the stack, and it can be manipulated with all our usual tools. The only one that won't work is interp, since this data is already interpolated.

I think we should keep this as a gk_ command to avoid feature creep. Different modules have different conventions, and it's okay to have a bespoke piece of code that's very useful for one module and not the rest. If Vlasov needs a similar function, it might be more effective for them to make a vm-distf command.
…ou can do --suffix source instead of --source, which is more general
…pdate usage example
So in Gyrokinetics (and in general in Gkeyll) we do not save the distribution function. Instead, we save some horrible quantity. This is because when we evolve the distribution function, $JBf$ is a convenient quantity to work with. However, we like to make plots of the distribution function itself. This PR aims to make it simple to plot a distribution function with one command which chains to the rest of the pgkyl workflow. Here is an example of the old way:

pgkyl --c2p-vel gk_lorentzian_mirror-elc_mapc2p_vel.gkyl gk_lorentzian_mirror-elc_0.gkyl -t jf gk_lorentzian_mirror-elc_jacobvel.gkyl -t jac ev -t df 'jf jac /' activ -t df interp -b gkhyb -p1 sel --z0 2.5 pl

This is unsightly for many reasons: the tagging, the activating, the interpolation, the repetition of gk_lorentzian_mirror. Dividing a phase-space array by a velocity-space array and a configuration-space array is horribly complex (and this doesn't even divide by $J_x B$). Here is the same command with the gk-distf tool:

pgkyl gk-distf -n gk_lorentzian_mirror -s elc -f 0 --c2p-vel sel --z0 2.5 pl

Furthermore, there is an issue with this. We are selecting --z0 2.5, which is 2.5 in COMPUTATIONAL coordinates, not physical Z. So non-uniform position-space grids are fundamentally incompatible with non-uniform velocity grids in pgkyl. We can go further and get the true end of the simulation domain by just adding a flag:

pgkyl gk-distf -n gk_lorentzian_mirror -s elc -f 0 --c2p-vel --mc2nu sel --z0 2.5 pl

Here are some examples of figures I've generated using a test file:

pgkyl gk-distf -n gk_lorentzian_mirror -s ion -f 40 --c2p-vel --mc2nu sel --z2 16 ev 'f abs' pl &
pgkyl gk-distf -n gk_lorentzian_mirror -s ion -f 40 --c2p-vel --mc2nu sel --z2 16 ev 'f abs' pl --logz &

Notice that the mirror throat is actually at 0.98, as it should be.

pgkyl gk-distf -n gk_lorentzian_mirror -s ion -f 40 --c2p-vel --mc2nu sel --z0 0.0 ev 'f abs' pl &

Here is a 2x2v case:

pgkyl gk-distf -n rt_gk_wham_2x2v_p1 -s ion -f 0 --c2p-vel --mc2nu sel --z0 0.0 --z1 0.0 pl

To use this command in script mode, see the example Python script earlier in the conversation.

Output: (figure)