
Commit c0db3e2

Feature/wrtobin/time hist (#830)
* first pass implementation of basic HDF5 time history output of arrays
* initial implementation of indexed array time history output
* initial implementation of table output for arrays using indices on an arbitrary cell dimension and allowing retrieval of arbitrary array components; requires an internal copy and full component specification (no ranges or slices)
* add table clearing and an initial check for a pre-existing table in the file
* verify the table is usable for the array when opening the table
* add initial tensor type support, checking whether the output file already exists
* add constructors to handle using indexing and/or component output, and add a general array flattening capability based on using a cell indexer and a component indexer
* remove tensor types (yay); add support for dense raw array table IO (for the time tables); reimplement time history table IO using the new classes
* add an output executable event
* almost a total rewrite/simplification based on new specs from meetings
* add utility functions to make specifying history tables using arrays easy; start in on time history output and the collector more seriously
* work on the test case, simplifying column additions; m_col_sizes wasn't initialized before
* refactor to incorporate multiple time histories in the same output table if they have the same collection cadence, and to add an actual time column instead of a separate time table
* add a time history event unit test and some tweaks to the TimeHistory and TimeHistoryCollector classes
* remove metadata from packing functions; also in-progress work on moving things into the data repository to establish the desired XML schema
* low-level rewrite: new file writer to support parallel correctly, changed metadata type, added some MpiWrapper operations needed to determine global offset and sizing; continuing work on the data repo integration; unit tests seem to be passing correctly
* refactor of the collectors, trying to finalize the high-level schema spec
* initial cherry-pick of the task manager and tasks from PR #383
* add an MPI comm to the HDF file IO to determine the scope of parallel ops, allowing simultaneous use for parallel quantities and for the time data itself, which should only be written by one process; also shift the collector away from directly targeting underlying types and toward targeting the wrapper, which can now return the necessary metadata and pack the object, which is all that is required to collect history
* don't bother fixing the copy task for now since it is lower priority than the time history; it needs to find/create a copy-to field during initialization and then copy to that field, since (1) creating the field at execution time limits the task to only executing once due to a name conflict, and (2) the field creation method no longer works due to changes to improve const-ness
* add an is_array_view trait to make it easier to detect arrays/views for functions that work on both (like the history metadata functions); also add an option to wrapper packing to suppress packing strings, which is needed to pack wrapped arrays for history output since that should result in just the raw array contents being packed (the rank and dimension extents are also suppressed in the current implementation); also add a check that the history collector is associated with a buffer provider (currently only TimeHistoryOutputs have the BufferedIO objects which provide the buffers, and we only have an HDF buffered IO implemented)
* debug time output and some parallel issues; start in on final merge prep
* add support for specifying named sets in time history, outputting the indices for each set, and allowing a single history output event to target multiple collection events
* work on adding SortedArray/View output to output the history file indices associated with each set specified in the input
* finalize set indices output and debug associated issues with HDF5 output of size 0 on some processes and time-independent data (no unlimited dataspace dimension means no chunking, and chunking with a size-0 process is an error)
* tasks documentation
* split packing into separate data and metadata+data functions to retain debugging metadata usage along with time-history usage, exposed up through the wrapper call hierarchy
* remove unneeded work on the wrapperHelper template SFINAE; fix the HDF file IO tests by using the correct (no-metadata) packing functions
* use the buffer type to pack time history into pinned memory on Lassen; change buffer resizing to resize less often and not shrink the buffer after a write; fix a small mistake in exposing the no-metadata packing functions
* resolve wrapper packing interface bugs; add comm_split, since we have to create a subcommunicator containing only the nonzero history writers so HDF5 doesn't fail on a chunk size of 0
* clean up after debugging multi-dimensional array and zero-sized output bugs
* simplifications and a fix for Lassen; still a conflict between Caliper and HDF5/MPI-IO
* rework set index usage to reduce data movement with respect to time history packing
* split HDFFile.hpp
* move/rename/split TimeHistoryCollector.hpp
* split up the time history output implementation
* fix a chunking issue that only cropped up at larger problem sizes
* add a serial time history writer that writes out a file per MPI rank, to work on Lassen
* remove an unneeded test used only for some debugging; delete existing non-HDF5 output to get rid of badly formatted files; re-add time history to sedov; uncrustify
* remove the HDF5 high-level libraries; add .hdf5 to the filename in parallel access mode
* guard against using copyfield until it is implemented
* simplify history metadata
* switch back over to using MPI-IO to write to a single output file instead of one per rank
* switch over to writing out a separate dataset for each setname in the field instead of one dataset for the field, which precludes having to calculate/write datasets for the set indices in the time history files
* remove unneeded functionality that can be replaced with an improved version by making packcollection slightly more general; doxygen; uncrustify
* update to newly-installed TPLs
* no-MPI build fixes and removal of commented-out old parallel file init
* small amount of cleanup; fix a bug that could arise under some circumstances with multiple set output
* switch the internal buffer and file reserve growth from a multiplier of 4 to 2
* initial simple time history reader script to show how to access and plot time history data
* update submodule references
* update TPL tag and submodule hashes
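The commit message mentions MpiWrapper operations to determine each rank's global offset and total sizing for parallel HDF5 writes. That pattern is an exclusive prefix sum over the per-rank local sizes; a minimal sketch, with the ranks simulated by a plain list instead of MPI (the function name and setup are illustrative, not GEOSX APIs):

```python
# Sketch of the offset/sizing computation described above. In the real code
# this would use MPI scan/reduce operations; here per-rank local sizes are
# simulated with a plain list.

def global_offsets(local_sizes):
    """Return (offsets, total): each rank's write offset into the shared
    dataset is the exclusive prefix sum of all lower ranks' local sizes."""
    offsets = []
    running = 0
    for size in local_sizes:
        offsets.append(running)  # exclusive: excludes this rank's own size
        running += size
    return offsets, running

# ranks contributing 3, 0, and 5 rows write at offsets 0, 3, and 3,
# and the shared dataset is sized to 8 rows total
offsets, total = global_offsets([3, 0, 5])
```

Note that a rank with zero rows still receives a valid offset; the commit message's comm_split workaround exists because such ranks cannot participate in chunked collective writes.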
1 parent 0a2fe87 commit c0db3e2
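The commit message above switches the internal buffer and file reserve growth from a multiplier of 4 to 2, and resizes less often without shrinking after a write. A minimal sketch of that grow-only policy, assuming nothing beyond what the message states (the class and method names are hypothetical, not from the GEOSX source):

```python
class GrowOnlyBuffer:
    """Amortized grow-only reserve: capacity doubles when exceeded and is
    never released after a write, so repeated collect/write cycles
    reallocate rarely."""
    GROWTH = 2  # was 4 in the earlier implementation described above

    def __init__(self):
        self.size = 0
        self.capacity = 0

    def reserve(self, needed):
        if needed > self.capacity:
            new_cap = max(self.capacity, 1)
            while new_cap < needed:
                new_cap *= self.GROWTH
            self.capacity = new_cap  # a reallocation would happen here
        self.size = needed

    def write(self):
        self.size = 0  # contents flushed; capacity intentionally retained

buf = GrowOnlyBuffer()
buf.reserve(3)   # capacity grows to 4
buf.write()      # size resets, capacity kept
buf.reserve(4)   # no reallocation: 4 <= capacity
```

A smaller growth factor wastes less memory at the cost of slightly more frequent reallocations, which matters when the buffer lives in pinned memory as on Lassen.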

File tree

3 files changed: +155 -0 lines changed


timehistory_package/setup.py

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
from distutils.core import setup

setup(name='time_history_plotting',
      version='0.1.0',
      description='Scripts to plot time-series data from GEOSX time-history output files.',
      author='William Tobin',
      author_email='[email protected]',
      packages=['plot_time_history'],
      install_requires=['matplotlib', 'hdf5_wrapper', 'h5py', 'numpy'])
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@

from .plot_time_history import getHistorySeries
from wrapper import hdf5_wrapper
Lines changed: 143 additions & 0 deletions
@@ -0,0 +1,143 @@
import numpy as np
from wrapper import hdf5_wrapper as h5w
import matplotlib as mpl
import matplotlib.pyplot as plt
import os
import sys
import argparse

import re


def isiterable(obj):
    try:
        iter(obj)
    except TypeError:
        return False
    return True


def getHistorySeries( database, variable, setname, indices = None, components = None ):
    """
    @brief retrieve a list of (time, data, idx, comp) tuples; each tuple is a single time
           history data series suitable for plotting, together with the specific set index
           and component for that series
    @param database an hdf5_wrapper database to retrieve time history data from
    @param variable the name of the time history variable for which to retrieve time-series data
    @param setname the name of the index set, as specified in the GEOSX input XML, for which to query time-series data
    @param indices the indices in the named set to query for; if None, defaults to all
    @param components the components in the flattened data types to retrieve; if None, defaults to all
    """

    set_regex = re.compile( variable + '(.*?)', re.IGNORECASE )
    if setname is not None:
        set_regex = re.compile( variable + r'\s*' + str(setname), re.IGNORECASE )
    time_regex = re.compile( 'Time', re.IGNORECASE )  # need to make this per-set; thought that was in already?

    set_match = list( filter( set_regex.match, database.keys( ) ) )
    time_match = list( filter( time_regex.match, database.keys( ) ) )

    if len(set_match) == 0:
        print(f"Error: can't locate time history data for variable/set described by regex {set_regex.pattern}")
        return None
    if len(time_match) == 0:
        print(f"Error: can't locate time history data for set time variable described by regex {time_regex.pattern}")
        return None

    if len(set_match) > 1:
        print(f"Warning: variable/set specification matches multiple datasets: {', '.join(set_match)}")
    if len(time_match) > 1:
        print(f"Warning: set specification matches multiple time datasets: {', '.join(time_match)}")

    set_match = set_match[0]
    time_match = time_match[0]

    data_series = database[set_match]
    time_series = database[time_match]

    if time_series.shape[0] != data_series.shape[0]:
        print(f"Error: The lengths of the time-series {time_match} and data-series {set_match} do not match: {time_series.shape} and {data_series.shape}!")

    if indices is not None:
        if type(indices) is int:
            indices = [indices]  # wrap a single index in a list (list(int) raises a TypeError)
        if isiterable(indices):
            oob_idxs = list( filter( lambda idx: not 0 <= idx < data_series.shape[1], indices ) )
            if len(oob_idxs) > 0:
                print(f"Error: The specified indices: ({', '.join(map(str, oob_idxs))})" + "\n\t" + f"are out of the dataset index range: [0,{data_series.shape[1]})")
            indices = list( set(indices) - set(oob_idxs) )
        else:
            print(f"Error: unsupported indices type: {type(indices)}")
    else:
        indices = range(data_series.shape[1])

    if components is not None:
        if type(components) is int:
            components = [components]  # wrap a single component in a list
        if isiterable(components):
            oob_comps = list( filter( lambda comp: not 0 <= comp < data_series.shape[2], components ) )
            if len(oob_comps) > 0:
                print(f"Error: The specified components: ({', '.join(map(str, oob_comps))})" + "\n\t" + f"are out of the dataset component range: [0,{data_series.shape[2]})")
            components = list( set(components) - set(oob_comps) )
        else:
            print(f"Error: unsupported components type: {type(components)}")
    else:
        components = range(data_series.shape[2])

    return [ (time_series[:,0], data_series[:,idx,comp], idx, comp) for idx in indices for comp in components ]


def commandLinePlotGen():
    parser = argparse.ArgumentParser(description="A script that parses GEOSX HDF5 time-history files and produces time-history plots using matplotlib")
    parser.add_argument("filename",
                        metavar="history_file",
                        type=str,
                        help="The time history file to parse")

    parser.add_argument("variable",
                        metavar="variable_name",
                        type=str,
                        help="Which time-history variable collected by GEOSX to generate a plot file for.")

    parser.add_argument("--sets",
                        metavar="name",
                        type=str,
                        action='append',
                        default=[None],
                        nargs="+",
                        help="Which index set of time-history data collected by GEOSX to generate a plot file for; may be specified multiple times with different indices/components for each set.")

    parser.add_argument("--indices",
                        metavar="index",
                        type=int,
                        default=[],
                        nargs="+",
                        help="An optional list of specific indices in the most-recently specified set.")

    parser.add_argument("--components",
                        metavar="int",
                        type=int,
                        default=[],
                        nargs="+",
                        help="An optional list of specific variable components.")

    args = parser.parse_args()
    result = 0

    if not os.path.isfile(args.filename):
        print(f"Error: file '{args.filename}' not found.")
        result = -1
    else:
        with h5w( args.filename, mode='r' ) as database:
            for setname in args.sets:
                ds = getHistorySeries( database, args.variable, setname, args.indices, args.components )
                if ds is None:
                    result = -1
                    break
                figname = args.variable + ("_" + setname if setname is not None else "")
                fig, ax = plt.subplots( )
                ax.set_title(figname)
                for d in ds:
                    ax.plot(d[0], d[1])
                fig.savefig(figname + "_history.png")

    return result


if __name__ == "__main__":
    commandLinePlotGen( )
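getHistorySeries locates its datasets by regex-matching the variable name (and optional set name) against the database keys. The matching step can be sketched against a plain list of keys standing in for the hdf5_wrapper database; the key names here are made up for illustration:

```python
import re

# Hypothetical dataset keys of the kind the function filters over
keys = ['pressure source', 'pressure sink', 'Time']

variable, setname = 'pressure', 'source'

# Same construction as in getHistorySeries: variable name, optional
# whitespace, then the set name; re.match anchors at the start of the key
set_regex = re.compile(variable + r'\s*' + str(setname), re.IGNORECASE)
time_regex = re.compile('Time', re.IGNORECASE)

set_match = list(filter(set_regex.match, keys))    # ['pressure source']
time_match = list(filter(time_regex.match, keys))  # ['Time']
```

Because re.match only anchors at the start of the string, 'pressure sink' is rejected while 'pressure source' is kept, and the time dataset is found independently of the variable.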

0 commit comments
