WIP: Grudge array context #28
Draft
nchristensen wants to merge 313 commits into inducer:main from nchristensen:grudge-array-context
Commits (313)
3126548
Use transformations with 1D and 2D diff kernels
nchristensen f83a2c7
fix tests
nchristensen 5f6588d
Merge remote-tracking branch 'upstream/main'
nchristensen 7024356
Push current to update multiple dispatch branch
nchristensen c1d9e23
Move tags to separate file and rename
nchristensen cff6e74
add initial ParameterValue tag
nchristensen dd8e365
uncomment ParameterValue tags
nchristensen 2991339
fix variable name
nchristensen 0e46617
Use diff kernel from generator function for now
nchristensen e2dbb4b
Make all diff kernels use the same generator function
nchristensen 829267b
optimize face_mass kernel, fix strides on prefetches
nchristensen 5219df7
remove ptx file
nchristensen bae19e9
re-implement accidentally blown away autotuner changes
nchristensen 9eddf34
re-add elwise_linear autotuner support
nchristensen e656a74
move face_mass kernel generator to separate function
nchristensen d40e385
add face_mass support in autotuner + update exhaustive search
nchristensen 0e3010d
move option settings to grudge_array_context
nchristensen 98d08d0
testing order 4
nchristensen c280dee
Merge remote-tracking branch 'upstream/main'
nchristensen 1754f00
Add basic on-the-fly autotuning support
nchristensen cabca66
Merge up to c7e79e5
nchristensen 89bda2a
fixes to make work with newer loopy
nchristensen d381b43
push wave-op with temporary delation
nchristensen 3ee450c
Add generic test function for mxm kernels
nchristensen 5a4414b
More autotuning support
nchristensen 0044fa5
Add autotuner support for nodes
nchristensen 3d04db9
set memory layout in flatten
46ea990
push current in preparation for merge
nchristensen 4a9c19f
Merge changes from main branch
nchristensen e51bbcf
Auto tuner improvements
nchristensen bc8b49d
Enable resample by mat autotuning
nchristensen fb73133
distinguish between time steps
nchristensen c6c870d
allow elwise_linear kernel to accept nonsquare operations
nchristensen 68a9368
fix output labels
nchristensen ee33013
Add support for testing runs restarted at arbitrary point
nchristensen 3d27e23
turn on face_mass optimization in random search
nchristensen e138314
remove old comments
nchristensen 6ab57e0
use spaces instead of tabs
nchristensen 1262b68
workaround for Lassen weirdness
nchristensen 4139ddc
change starting parameters
nchristensen 4a3f142
Merge branch 'grudge-array-context-update' of github.com:nchristensen…
nchristensen 1433f11
Allow semi-constant total number of points
nchristensen 6832a1e
merge changes
nchristensen dde17bf
merge changes
nchristensen 9b836e8
Delete HEAD marker
nchristensen b9911ce
fix bandwidth calculation
fc83ff4
variable for search method
nchristensen 307b325
Merge branch 'grudge-array-context-update' of github.com:nchristensen…
nchristensen f681840
preparing for merge
nchristensen b67da86
pull down upstream
nchristensen 0b28da7
add comment to run_tests.py
nchristensen eed156d
remove binary file
nchristensen c7cf575
push hjson files
nchristensen 1074080
merge upstream
nchristensen ae0509d
wrap tags in list
nchristensen e1075ef
Merge commit 'b197e6198689fcac2c0dfd503286e28806e9502f' into grudge-a…
nchristensen 59cb35a
Use default_entrypoint.name
nchristensen 958d5d4
use default_entrypoint
nchristensen 821b14d
Merge commit '9e88251de1c1b8754447c9c8cfdf416f1f6112bc' into grudge-a…
nchristensen 64314c9
add stack implementation
nchristensen 12ed9cc
Add KernelData tags to rest of einsum calls for wave-op-mpi example
nchristensen a77ec74
Fix area_elements merge conflict
nchristensen 72ae5ca
Put opencl event in dictionary
nchristensen 7946267
Stub for parameter list creation
nchristensen 6af48d4
Create autotune parameter list
nchristensen 55a1688
add spock changes to run_tests.py
nchristensen aaee886
fix merge conflict
nchristensen a90a51a
start parallel autotuning with charm++
nchristensen fd737d4
Merge branch 'grudge-array-context-update' of github.com:nchristensen…
nchristensen f1b9632
fix indentation
nchristensen e008ff0
Don't specify output parameter in dt_non_geometric_factors
nchristensen 35de234
Use array instead of map
nchristensen 3241e5b
Functional pool autotuner
nchristensen 9940c58
Working parallel autotuning
nchristensen 592c5ee
push parallel_autotuning_v2.py
nchristensen dbf1b72
Overwrite Pool __init__ function
nchristensen 7d8ce71
cleanup parallel autotuning code
nchristensen be8b914
Merge remote-tracking branch 'upstream/main' into grudge-array-contex…
nchristensen 70aee06
Update parallel autotuning code
nchristensen c894ae0
More parallel autotuning code
nchristensen 7d607c6
Separate parameter space creation and transformation application
nchristensen 127c6cd
More wave-op autotuning support
nchristensen 0aacb25
Merge upstream changes
nchristensen 61ab347
Transformation generator improvements, move KernelDataTag
nchristensen ca0ef57
Fix file writing, refactor array contexts a bit
nchristensen caa3a2c
Change wave op defaults
nchristensen ffe5a19
Merge upstream changes to wave-op-mpi
nchristensen 666e0af
Array context refactoring
nchristensen d681328
Comment possibly unneeded thaw implementation
nchristensen 5f93db9
Shrink search space a bit
nchristensen ff25c72
merge upstream changes
nchristensen 378813f
Merge branch 'grudge-array-context-update' of github.com:nchristensen…
nchristensen 67a9118
Fix tag imports
nchristensen 82b7fcb
Get events directly from pyopencl array object
nchristensen 6dc7aef
memoize einsum
nchristensen c7b7189
Move appication of einsum tags to separate method
nchristensen e418174
Fix tag imports
nchristensen 2ddd97f
fix import
nchristensen cb0ff6e
Fix imports
nchristensen ae754e6
Fix merge conflict
nchristensen 0454155
Handle EinsumArgsTags
nchristensen 256414b
Implement thaw
nchristensen fe5608b
Merge branch 'grudge-array-context-update' of github.com:nchristensen…
nchristensen 0b6c6c2
Update property name
nchristensen 120f108
update actx_special calling code
nchristensen 0cec662
Hack to fix unflatten for Fortran ordering
nchristensen 4416224
Transform ctof kernels
nchristensen edb799e
resample by picking transformations
nchristensen c42c1b8
Remove test_results.txt
nchristensen 4cc83d8
remove ci-support.sh
nchristensen d18888e
Remove empty file
nchristensen ee13843
Re-add blank line
nchristensen 17ffcf6
Remove wave-min.py
nchristensen 4301392
Remove unneded file
nchristensen e12aa6b
remove nvprof file
nchristensen 4e8b2bc
remove unneeded nvprof and hjson files
nchristensen 33ab088
remove more unneeded files
nchristensen daf4872
update requirements.txt
nchristensen 38f0420
Cleanup compiler.py
nchristensen 577650d
Remove blank lines
nchristensen b0072cd
flake8 fixes
nchristensen 3cea7e2
Add note
nchristensen f0b4164
flake8 fixes
nchristensen f954f07
Merge remote-tracking branch 'upstream/main'
nchristensen d618e8f
grudge_array_context.py changes
nchristensen 4c4e75e
Remove ptx files
nchristensen 6001534
Merge branch 'upstream/main' into grudge-array-context
nchristensen f961465
remove transformations directory
nchristensen d1b9f3c
Single return tin dagrt-fusion.py
nchristensen 63410ba
Allow pickling kernels
nchristensen 14aac4a
merge changes
nchristensen 0ff50a8
Update parallel autotuning code
nchristensen b0ac435
Update parallel autotuning code
nchristensen b631d3d
Parallel autotuning updates
nchristensen 6011daf
fix merge conflicts
nchristensen 6dcbd3d
Merge branch 'grudge-array-context' of github.com:nchristensen/grudge…
nchristensen 0832a81
fix setup.py
nchristensen 8b1e596
Inherit from MPIPyOpenCLArrayContext
nchristensen 3558b25
Use strings instead of raw numbers for axis sizes
nchristensen 40bc9ab
Actually remove quad_tag_to_group_factory
inducer 0c44361
Remove get_distributed_boundary_swap_connection, DGDiscretizationWith…
inducer 76725d6
Fix sectioning, remove DColl.order
inducer f42f141
Rip _get_dist_boundary_connections_single_vol out of DColl.__init__
inducer ef39455
Drop removed stuff from pylintrc
inducer 85d9812
Support multiple volumes in a discretization collection
inducer dc600ec
Parallel autotuning changes
nchristensen 3aa8f6e
Merge branch 'main' into multi-volume
majosm b3de25e
use partition-ID-enabled meshmode
majosm ea570ee
temporarily change meshmode branch
majosm 7cc98f6
fix docs
majosm 3aa2dcb
parallel autotuning fixes
nchristensen 577b9ec
parallel autotuning fixes
nchristensen ccd01a1
Comment print statements, support for new resample kernel
nchristensen 9a825a9
Transformations for small matrices
nchristensen 053e484
Fix transformations for small einsum kernels
nchristensen f37e97d
Workaround for CUDA timing bug
nchristensen 08740d4
Merge branch 'grudge-array-context' of github.com:nchristensen/grudge…
nchristensen 4358331
Change unique program id method
nchristensen 4d31441
Guard against zero length args in resampling kernel
nchristensen 43bb708
KernelSavingAutotuningArrayContext
nchristensen 18b11ff
resample by picking transformation fixes
nchristensen 6d88ed5
Fix merge conflict
nchristensen e8ffd47
better bandwidth count
nchristensen a44b2ff
Shared memory fix for autotuner
nchristensen b7a3fb8
limit use of local memory
nchristensen ff6c73e
Merge remote-tracking branch 'origin/main' into multi-volume
majosm 7ea9d73
Merge remote-tracking branch 'origin/multi-volume' into multi-volume
majosm ee41a3f
exit if transformation file not found
nchristensen e66d68f
Restructure array contexts
nchristensen 14f2762
Merge branch 'grudge-array-context' of github.com:nchristensen/grudge…
nchristensen 2a04c59
point requirements.txt to loopy branch
nchristensen 49f8589
merge upstream
nchristensen d0b93cd
Merge branch 'main' into grudge-array-context
nchristensen 31b59bf
Move bs4 import
nchristensen 88bcd26
Fix file creation
nchristensen c72b81a
Use absolute path to file creation
nchristensen fdc4f57
fix file creation
nchristensen 5534936
Set to fortran layout before pickling
nchristensen 6027392
Memory layout fixes
nchristensen e8fd9b2
Fix merge conflicts
nchristensen 4737b1a
Merge branch 'grudge-array-context' of github.com:nchristensen/grudge…
nchristensen 5ee00b4
Merge remote-tracking branch 'origin/main' into multi-volume
majosm e705304
Merge remote-tracking branch 'origin/multi-volume' into multi-volume
majosm 7d2c974
Merge remote-tracking branch 'origin/main' into multi-volume
majosm c1b68c7
Merge remote-tracking branch 'origin/multi-volume' into multi-volume
majosm ea9d929
Merge remote-tracking branch 'upstream/main' into grudge-array-context
nchristensen 2264117
Update autotuning script
nchristensen 53e0b6b
Parallel autotuning fixes, support for element-contiguous data layout
nchristensen 9762dbc
Test different local memory data layouts
nchristensen 993ca1f
fix stride calculation
nchristensen 98d4119
Allow testing different local memory data layouts
nchristensen 656fc4b
Add alternative pool implementations
nchristensen f8efd52
Merge remote-tracking branch 'origin/main' into multi-volume
majosm d072923
Merge remote-tracking branch 'origin/multi-volume' into multi-volume
majosm b1372c3
Create queue outside of task
nchristensen 69ded2e
Update charm4py script
nchristensen 318e0c8
update charm4py script
nchristensen 5620c0b
Merge remote-tracking branch 'origin/main' into multi-volume
majosm fe54464
Merge remote-tracking branch 'origin/multi-volume' into multi-volume
majosm 09bac09
promote to part ID instead of using part ID helper
majosm c46a6b5
clarify part vs. partition terminology
majosm 06ff8f6
account for explicit part_id attribute in InterPartAdjacencyGroup
majosm cac1b24
Merge remote-tracking branch 'origin/main' into multi-volume
majosm d7cd9db
Merge remote-tracking branch 'origin/multi-volume' into multi-volume
majosm fc4c3cb
Merge remote-tracking branch 'origin/main' into multi-volume
majosm 4743538
Merge remote-tracking branch 'origin/multi-volume' into multi-volume
majosm 49d627c
use dataclass instead of tuple for PartID
majosm b2219d8
move PartID conversion setup stuff into _normalize_mesh_part_ids
majosm a05c626
reset requirements.txt
majosm fb8a34b
accept general integral types as MPI ranks
majosm 6397cdf
redesign inter-volume trace pair functions
majosm 91b60e7
reapply zero addition to local_bdry_data
majosm 65649df
handle all-Number cases
majosm 429ce7b
cosmetic change
majosm 917c212
fix bug
majosm 337b994
tweak as_dofdesc normalization
majosm 3c4e912
fix bugs
majosm ac98e49
don't try to create trace pair if there's no adjacency
majosm 04f7330
forget about heterogeneous inter-volume trace pairs for now
majosm 2d369e4
add FIXME
majosm 2b13e5a
fix bug
majosm fd644cf
ignore flake8-bugbear error
majosm 27b43f5
add temporary workaround for make_discretization_collection + EagerDG…
majosm e936a20
add volume_dd to make_visualizer
majosm 06fcacc
implement multi-volume support in op
majosm f1f4cfc
fix doc in characteristic_lengthscales
majosm 2a5fe53
fix memoization in a couple of spots
majosm e004677
get contextual volume tags for BoundayDomainTags as well
majosm bcfde03
tag some axes
majosm dd2ca36
Account for fp_bytes
nchristensen ffab94c
Fix merge conflict
nchristensen 3894883
fix ArrayOrContainerT import
nchristensen 08133d1
fix merge conflict
nchristensen e2980fe
Merge upstream
nchristensen e074d29
Fix merge conflicts
nchristensen 9d035d7
Move queue creation back outside of task again
nchristensen 71680b6
fix variable name
nchristensen 1cb73b7
single indirection autotuning
nchristensen 0248b32
Merge branch 'grudge-array-context' of github.com:nchristensen/grudge…
nchristensen 1bb842c
add axis tags back
nchristensen c0e20dc
Fix some dimensions
nchristensen 8f39651
Follow same pattern as upstream
nchristensen 7b88934
Guard against DOF arrays with zero elements
nchristensen 0a49f4d
guard against 0 elements
nchristensen 9b7bf17
Guard against 0 element in 2to2 einsum
nchristensen e31106d
Guard against 0 elements
nchristensen d132fe3
Guard against 0 elements
nchristensen e007b5f
single indirection transformation generator
nchristensen d7a3829
Merge branch 'grudge-array-context' of github.com:nchristensen/grudge…
nchristensen 1504227
Allow randomized index entries
nchristensen
New file added in this diff (+129 lines):

@@ -0,0 +1,129 @@
from meshmode.array_context import PyOpenCLArrayContext, make_loopy_program
from meshmode.dof_array import DOFTag
from grudge.execution import VecDOFTag, FaceDOFTag
import loopy as lp
import pyopencl
import pyopencl.array as cla
import loopy_dg_kernels as dgk
import numpy as np


ctof_knl = lp.make_copy_kernel("f,f", old_dim_tags="c,c")
ftoc_knl = lp.make_copy_kernel("c,c", old_dim_tags="f,f")


# Really this is more of an Nvidia array context probably
# Maybe not if loading from file?
class GrudgeArrayContext(PyOpenCLArrayContext):

    def __init__(self, queue, allocator=None):
        super().__init__(queue, allocator=allocator)

    def empty(self, shape, dtype):
        return cla.empty(self.queue, shape=shape, dtype=dtype,
                         allocator=self.allocator, order='F')

    def zeros(self, shape, dtype):
        return cla.zeros(self.queue, shape=shape, dtype=dtype,
                         allocator=self.allocator, order='F')

    # Probably can delete this
    '''
    def call_loopy(self, program, **kwargs):

        #print("Program: " + program.name)
        if program.name == "opt_diff":
            #diff_mat = kwargs["diff_mat"]
            #result = kwargs["result"]
            #vec = kwargs["vec"]
            print(kwargs)

            # Create input array
            #cq = vec.queue
            #dtp = vec.dtype

            # This should not be done here.
            #_,(inArg,) = ctof_knl(cq, input=vec)
            #inArg = vec.copy()

            # Treat as c array, can do this to use c-format diff function
            # np.array(A, format="F").flatten() == np.array(A.T, format="C").flatten()
            #inArg.shape = (inArg.shape[1], inArg.shape[0])
            #inArg.strides = cla._make_strides(vec.dtype.itemsize, inArg.shape, "c")
            #outShape = inArg.shape

            # Really should be passed in rather than re-allocated each time
            # ... it comes in via kwargs["result"]
            #argDict = { "result1": cla.Array(cq, vec.shape, dtp, order="f"),
            #            "result2": cla.Array(cq, vec.shape, dtp, order="f"),
            #            "result3": cla.Array(cq, vec.shape, dtp, order="f"),
            #            "vec": vec,
            #            "mat1": diff_mat[0],
            #            "mat2": diff_mat[1],
            #            "mat3": diff_mat[2] }

            #super().call_loopy(program, **argDict)

            #result = [argDict["result1"], argDict["result2"], argDict["result3"]]
            #print(result)
            #result = argDict["result1"] #kwargs["result"]
            #print("HERE")
            #print(result)
            #exit()
            # Treat as fortran style array again
            #for i, entry in enumerate(["result1", "result2", "result3"]):
            #    argDict[entry].shape = (argDict[entry].shape[1], argDict[entry].shape[0])
            #    argDict[entry].strides = cla._make_strides(argDict[entry].dtype.itemsize, argDict[entry].shape, "f")
            #    # This should be unnecessary
            #    # It is needed for the moment because of the "ctof" here.
            #    #ftoc_knl(cq, input=argDict[entry], output=result[i])
            result = super().call_loopy(program, **kwargs)
        #else:
            result = super().call_loopy(program,**kwargs)

        return result
    '''

    #@memoize_method
    def _get_scalar_func_loopy_program(self, name, nargs, naxes):
        prog = super()._get_scalar_func_loopy_program(name, nargs, naxes)
        for arg in prog.args:
            if type(arg) == lp.ArrayArg:
                arg.tags = DOFTag()
        return prog

    # Side note: the meaning of thawed and frozen seem counterintuitive to me.
    def thaw(self, array):
        thawed = super().thaw(array)
        if type(getattr(array, "tags", None)) == DOFTag:
            cq = thawed.queue
            _, (out,) = ctof_knl(cq, input=thawed)
            thawed = out
            # May or may not be needed
            #thawed.tags = "dof_array"
        return thawed

    #@memoize_method
    def transform_loopy_program(self, program):

        #print(program.name)
        for arg in program.args:
            if type(arg.tags) == DOFTag:
                program = lp.tag_array_axes(program, arg.name, "f,f")
            elif type(arg.tags) == VecDOFTag:
                program = lp.tag_array_axes(program, arg.name, "sep,f,f")
            elif type(arg.tags) == FaceDOFTag:
                program = lp.tag_array_axes(program, arg.name, "N1,N0,N2")

        if program.name == "opt_diff":
            # TODO: Dynamically determine device id, don't hardcode path to transform.hjson.
            # Also get pn from program
            filename = "/home/njchris2/Workspace/nick/loopy_dg_kernels/transform.hjson"
            deviceID = "NVIDIA Titan V"
            pn = 4

            transformations = dgk.loadTransformationsFromFile(filename, deviceID, pn)
            program = dgk.applyTransformationList(program, transformations)
        else:
            program = super().transform_loopy_program(program)

        return program
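For context, here is a minimal usage sketch of the array context defined above. It is not part of the diff: the OpenCL setup, the array shape, and the import path are assumptions made for illustration only.

```python
# Minimal usage sketch (assumed setup; not part of the diff above).
# It shows how the GrudgeArrayContext from this branch might be created and
# used to allocate a Fortran-ordered array, matching the "f,f" axis tags
# applied in transform_loopy_program.
import numpy as np
import pyopencl as cl

# Hypothetical import path for the class defined in the new file:
# from grudge.grudge_array_context import GrudgeArrayContext

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)

actx = GrudgeArrayContext(queue)

# zeros()/empty() on this context allocate in column-major ("F") order.
u = actx.zeros((1000, 35), dtype=np.float64)
print(u.strides)  # strides reflect Fortran ordering
```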