Dates are in YYYY-MM-DD format.
- Improved the help messages of various subtools, including `run`, `debug build`, and `debug reduce`.
- Added a default value for `--artifacts-dir` in `debug` subtools.
- Fixed a bug in `surgeon insert` where data types of graph output tensors would not be preserved.
- Fixed broken links in various READMEs.
- Added an `OnnxFromBytes` loader that can deserialize ONNX models.
- Added an `obey_precision_constraints` argument to `CreateConfig` and a corresponding `--obey-precision-constraints` CLI argument.
- Deprecated the `strict_types` option in `CreateConfig` and the corresponding `--strict-types` CLI argument.
- Added various examples, a CLI User Guide, and a directory for how-to guides.
- Added an experimental `template trt-config` tool to generate template scripts that create TensorRT builder configurations.
- Added `--hide-fail-output` to make `debug` subtools suppress output from failed iterations.
- Added experimental support for DLA.
- Added a `data to-input` tool that can combine inputs/outputs created by `--save-inputs`/`--save-outputs`. The resulting file is compatible with `--load-inputs`.
- Updated `debug` subtools to show captured output on failed iterations.
- The logger will now emit all `CRITICAL` messages to `stderr` instead of `stdout`.
- Renamed `CompareFunc.basic_compare_func` to `CompareFunc.simple`. The old name is preserved as an alias.
- The `--good` and `--bad` arguments in `diff-tactics` can now also accept single files instead of directories.
- Fixed a bug where `debug reduce` would crash when ONNX models included `Constant` nodes whose outputs needed to be marked as model outputs.
- Added support for `K`, `M`, and `G` suffixes to CLI arguments that expect a number of bytes (e.g. `--workspace`). These correspond to `KiB`, `MiB`, and `GiB` respectively. For example, `--workspace=16M` is equivalent to `--workspace=16777216`.
- Added a `copy_outputs_to_host` parameter in `TrtRunner.infer()`, which, when set to `False`, will cause the runner to return `DeviceView`s instead of NumPy arrays for inference outputs. This allows you to avoid device-to-host and host-to-device copies if you want outputs to remain on the device.
- Added a `view()` method to `DeviceArray`s to create read-only `DeviceView`s over their data.
- Added a `PluginRefRunner`, which provides CPU reference implementations for TensorRT plugins, and a corresponding `--pluginref` runner option in `polygraphy run`.
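As an illustration of the byte-suffix convention above, here is a minimal parser sketch; `parse_bytes` is a hypothetical helper, not Polygraphy's actual implementation:

```python
def parse_bytes(value: str) -> int:
    """Parse a byte count, honoring K/M/G suffixes (KiB/MiB/GiB)."""
    suffixes = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}
    last = value[-1].upper()
    if last in suffixes:
        return int(value[:-1]) * suffixes[last]
    return int(value)

print(parse_bytes("16M"))  # 16777216, matching the --workspace=16M example
```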
- Marked the old shape syntax (`<name>,dim0xdim1x...xdimN,<dtype>`) as deprecated, since it leads to ambiguity when parsing shapes that include named dynamic dimensions. For example, compare:
  `--input-shapes input0,xxyxz`
  and:
  `--input-shapes input0:[x,y,z]`
  For now, the old syntax continues to work for shapes without named dimensions, but it will be removed in a future version of Polygraphy.
  The newer syntax, which was originally introduced in Polygraphy 0.25.0, uses the list syntax already present in other parts of Polygraphy. For example, `--val-range [0,1]` in `run` and `--attrs axes=[0,1]` in `surgeon insert` use the same syntax.
- Made several performance improvements in the Polygraphy CUDA wrapper.
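The ambiguity motivating the shape-syntax deprecation above can be demonstrated directly: with the old syntax, a dimension named `x` cannot be told apart from the `x` separator, while the list syntax parses cleanly.

```python
# Old 'x'-separated syntax: the named dimension "x" collides with the separator.
old = "xxyxz"          # intended dimensions: x, y, z
print(old.split("x"))  # ['', '', 'y', 'z'] -- the named dimension is lost

# New list syntax: unambiguous.
new = "[x,y,z]"
print(new.strip("[]").split(","))  # ['x', 'y', 'z']
```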
- Added a loud warning when the deprecated `--int-min`/`--int-max` or `--float-min`/`--float-max` options are used. These are superseded by `--val-range`, which allows you to specify data ranges on a per-input basis.
- Removed various deprecated aliases: `ModifyOnnx`, `SessionFromOnnxBytes`, `ModifyNetwork`, `ModifyGraph`.
- Removed the `to-json` tool, which was used to convert Pickled data generated by Polygraphy 0.26.1 and older to JSON. Polygraphy 0.27.0 and later only support reading and writing data in JSON format.
- Removed the deprecated legacy submodule `polygraphy.util.misc`, which was just an alias for `polygraphy.util`.
- Improved the quality of several examples and added information on how to load serialized TensorRT engines as well as how to use custom input data.
- Added an `inspect capability` subtool that will partition an ONNX graph into subgraphs that are supported or unsupported by TensorRT.
- Added Windows support to the CUDA wrapper in `polygraphy/cuda/cuda.py`.
- `SaveOnnx` will now create parent directories if they do not already exist.
- Fixed a bug where `ExtractSubgraph` would modify the original graph instead of creating a new graph.
- Fixed various typos and added more details to some tool READMEs.
- Added `polygraphy.config` as a top-level import so that it no longer needs to be imported separately (i.e. `from polygraphy import config`).
- Fixed a bug where `surgeon sanitize` would not re-run shape inference after overriding model input shapes, causing constant folding to be sub-optimal.
- CLI tools will no longer print long stack traces on user error.
- Fixed a bug where `surgeon` subtools would not work with ONNX models without an `.onnx` extension.
- Fixed a bug where `surgeon insert` would not correctly run shape inference if the inserted node replaced the graph outputs.
- Fixed a bug where `POLYGRAPHY_AUTOINSTALL_DEPS` would not work correctly for nested modules, e.g. `mod.lazy_import("onnx.shape_inference")`.
- Added an `--ignore-fail-code` option to `debug` subtools to ignore certain types of failures.
- Added a highly experimental `OnnxLikeFromNetwork` loader that can generate a file using the ONNX format based on a TensorRT network. The resulting model is not valid ONNX, but is useful for visualization.
- Added an `onnx-like-trt-network` type in `convert` to generate ONNX-like models from TensorRT networks.
- Added support for custom installation commands during dependency autoinstall. This can be configured using `config.INSTALL_CMD` or by setting the `POLYGRAPHY_INSTALL_CMD` environment variable.
- Added support for loading external data in `InferShapes`.
- Added a `--no-per-pass-shape-inference` argument to `surgeon sanitize` to disable shape inference between constant-folding passes.
- Added an `--external-data-size-threshold` CLI option for saving external data for ONNX models.
- Added a `--no-save-all-tensors-to-one-file` CLI option to avoid saving ONNX external data to a single file.
- Improved logic for auto-permuting tensors in `basic_compare_func`. The new logic can handle an arbitrary number of dimensions. For example, if two tensors with shapes `(1, 3, 45, 45, 45)` and `(1, 45, 45, 45, 3)` are being compared, `basic_compare_func` will now guess that the latter should be transposed using a permutation of `(0, 4, 1, 2, 3)` to match the former.
- Improved the display of `Profile` in logging messages.
- Updated NumPy array encoding to use `base64`. In some cases, this can reduce file sizes by a factor of 4.
- Updated the `debug precision` default direction to `forward`, as this typically leads to better results.
- Added a `--no-strict-types` flag to `debug precision` in case strict types need to be disabled for any reason.
- `FoldConstants` will no longer run shape inference if shape folding is disabled.
- `InferShapes` will now automatically write large models to disk to work around the 2 GiB protobuf size limitation. The threshold can be configured using the `save_to_disk_threshold_bytes` parameter.
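To illustrate why the `base64` encoding mentioned above shrinks files: base64 costs roughly 5.33 characters per float32 value, versus considerably more for full-precision decimal text. A minimal sketch, not Polygraphy's actual serialization code:

```python
import base64
import json
import struct

# 1024 float32 values serialized two ways: as decimal text, and as base64
# of the raw little-endian bytes (4 bytes per value).
values = [i / 7 for i in range(1024)]
raw = struct.pack(f"<{len(values)}f", *values)

as_text = json.dumps(values)
as_b64 = json.dumps(base64.b64encode(raw).decode())

print(len(as_b64) < len(as_text))  # True: base64 is substantially smaller
```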
- Fixed a bug in `inspect model` where engine output bindings would all be printed on one line.
- Fixed a bug where using `set_profile` in the `TrtRunner` would sometimes cause input shape checks in `infer` to fail even when shapes were valid.
- Fixed a bug in `inspect model` where engine output bindings would display the wrong shapes for profiles after the first.
- Fixed a bug where `debug precision` would incorrectly mark constant layer outputs and non-execution tensors to run in higher precision.
- Fixed a bug where `debug precision` would crash if engine building failed. It now continues to the next iteration, counting the previous one as a failure.
- Fixed a bug where `InferShapes` would require `--external-data-dir` to be set even if the external data were in the same directory as the model.
- Fixed a bug where `--data-loader-script` would not provide data in the `run` tool if int8 calibration was enabled in TensorRT.
- Added a `--log-file` option to CLI tools to store logging output in a file.
- Added an `--iteration-info` argument to `debug` subtools so that `--check` commands can get information about the current iteration.
- Added an experimental `debug repeat` subtool, which is more generic than the existing `debug` subtools.
- Swapping NumPy arrays to disk is now disabled by default. It can be re-enabled by setting the `POLYGRAPHY_ARRAY_SWAP_THRESHOLD_MB` environment variable.
- Added support for per-output `check_error_stat`, which allows different metrics to be checked for different outputs.
- Moved JSON utilities into a separate `polygraphy.json` submodule. For backwards compatibility, they remain accessible via `polygraphy.util` as well.
- The `max` value for `check_error_stat` in `basic_compare_func` now only checks the maximum absolute/relative tolerances. The previous behavior of checking the values element-wise is preserved in the `elemwise` option, which is now the default.
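The difference between the `max` and `elemwise` statistics can be sketched as follows (illustrative only, not Polygraphy's implementation): the per-element check scales the tolerance by each element's magnitude, so `elemwise` can pass where `max` fails.

```python
def check_max(out0, out1, atol, rtol):
    """Check the maximum absolute and relative errors against the tolerances."""
    absdiff = [abs(a - b) for a, b in zip(out0, out1)]
    reldiff = [d / abs(b) for d, b in zip(absdiff, out1)]
    return max(absdiff) <= atol and max(reldiff) <= rtol

def check_elemwise(out0, out1, atol, rtol):
    """Per-element check, like numpy.isclose: |a - b| <= atol + rtol * |b|."""
    return all(abs(a - b) <= atol + rtol * abs(b) for a, b in zip(out0, out1))

out0, out1 = [100.5, 0.0015], [100.0, 0.001]
print(check_elemwise(out0, out1, atol=0.01, rtol=0.01))  # True
print(check_max(out0, out1, atol=0.01, rtol=0.01))       # False
```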
- Fixed a bug where the data loader would not cast value ranges provided for `bool` input types, which could lead to generating out-of-bound values.
- Fixed a bug where NumPy arrays smaller than 8 MiB would be serialized to disk unnecessarily.
- Added a `check_error_stat` option in `basic_compare_func` and a corresponding `--check-error-stat` CLI option to control which statistic (e.g. mean, median, max) of the error is used to determine whether outputs matched.
- A histogram of output/error values will now be displayed at INFO severity on comparison failures. Otherwise, it is displayed at VERBOSE severity.
- Fixed a bug where the histogram display would wrap to subsequent lines.
- Added more information about absolute/relative difference in `basic_compare_func`. For example, it will now print a histogram of the distribution of the outputs and errors.
- Added mean absolute/relative error to `OutputCompareResult`, which is returned by `Comparator.compare_accuracy`. This makes it easier to programmatically access this information.
- Several improvements to the quality of error messages and warnings.
- Fixed a bug where `basic_compare_func` and `DataLoader` would issue warnings when default tolerances/value ranges were used.
- Fixed a bug where command-line tools would fail if a `--timing-cache` argument was provided but the file did not exist.
- `basic_compare_func` will now issue warnings if `atol`/`rtol` contain invalid keys.
- `DataLoader` will now issue warnings if `val_range` contains invalid keys.
- Added a `tactic_sources` parameter in `CreateConfig` to control TensorRT's tactic sources.
- Added a `--tactic-sources` argument to CLI tools.
- Added a `DeviceView` class in the `cuda` submodule to represent views of GPU memory. `DeviceArray` is now a subclass of `DeviceView`.
- Added support for accepting `DeviceView`s or device pointers in the `Calibrator`. This means that you can now run calibration using data already on the GPU.
- Added support for `DeviceView`s in `TrtRunner.infer()`. Note that `DeviceView`s cannot be used for input shape-tensors, which must be allocated on the host.
- Added support for using `trt.IInt8Calibrator` as the `BaseClass` of `Calibrator`.
- Exposed some lower-level functions like `malloc`, `free`, and `memcpy` in the Polygraphy CUDA wrapper.
- Added a `set_profile()` method to `TrtRunner` to control the active optimization profile.
- Added a `-q`/`--quiet` option to CLI tools. This can be used to suppress logging output without eliminating all output like `--silent` does.
- Added a `to_trt()` method to `Profile` to convert it to a TensorRT `IOptimizationProfile`.
- Added a `--force-fallback-shape-inference` option to `debug reduce`.
- Added a `--fail-regex` option to `debug reduce` to distinguish among different types of failures based on command output.
- Changed `TRT_LOGGER` to `get_trt_logger()` to make it work properly with lazy imports.
- Further improved lazy imports such that no modules are required in order to import Polygraphy modules. Using functionality from Polygraphy modules still requires dependencies.
- Various submodules have been restructured. The old import structure is preserved for backwards compatibility.
- Added `Profile.fill_defaults()`, which makes it possible to automatically fill a TensorRT optimization profile with sane default values.
- It is now possible to provide TensorRT optimization profile shapes for a subset of the network inputs. In such cases, the rest of the profile will be populated automatically with `Profile.fill_defaults()`.
- `surgeon extract` will no longer run shape inference unless it is required, e.g. if `auto` is specified for one of the shape/data type arguments.
- ONNX shape inference will now be skipped when `--force-fallback-shape-inference` is enabled in `surgeon extract`/`sanitize`.
- `debug reduce` will now freeze intermediate shapes in the model if `--model-input-shapes` is provided.
- `IterationResult`s now store `LazyNumpyArray` rather than `np.ndarray`. The public interface for `IterationResult` will automatically pack or unpack `np.ndarray`s into/from `LazyNumpyArray`, so the change is completely transparent. This can significantly reduce memory usage for tools like `debug reduce` and `run`.
- Attempting to load a non-existent file will now cause a friendly error message to be displayed rather than crashing.
- `surgeon sanitize` will no longer override shapes other than those specified in `--override-input-shapes`.
- Removed the optional `symbol` parameter from `lazy_import`.
- For security reasons, all serialization/deserialization code in Polygraphy has been updated to use JSON instead of `pickle`. Use the included `to-json` tool to convert data serialized with `pickle` to JSON format.
- Split `TacticReplayer` into separate `TacticRecorder` and `TacticReplayer` classes. This provides more fine-grained control over whether to record or replay tactics.
- Deprecated `--tactic-replay` in favor of `--save-tactics` and `--load-tactics`.
- Changed the `check_finite` parameter in `Comparator.validate()` to `check_inf`, since it checks whether values are non-finite rather than the opposite.
- Polygraphy will now validate command-line arguments so that code-injection is not possible.
- `debug diff-tactics` will now work correctly when replay files are in nested directories.
- Added a `--force-fallback-shape-inference` option to `surgeon sanitize` in case ONNX shape inference doesn't work well enough to allow for folding.
- Added a `--calibration-base-class` option to allow changing the base class for the TensorRT int8 calibrator.
- `FoldConstants` will no longer fail if a constant folding pass fails. Set `error_ok=False` to disable this behavior.
- Added support for saving/loading ONNX models with externally stored weights.
- Added support for automatically installing dependencies as they are needed. This behavior can be enabled by setting the `POLYGRAPHY_AUTOINSTALL_DEPS` environment variable to `1`. When auto-install is enabled, Polygraphy can also automatically upgrade existing packages if a newer version is requested.
- Added an `error_ok` option in `InferShapes`, which can be set to `False` to make the loader raise an error when shape inference fails.
- `val_range` in the `DataLoader` now falls back to the default range if no range is specified for an input.
- `atol` and `rtol` in `CompareFunc.basic_compare_func` now fall back to the default tolerance values if no tolerance is specified for an output.
- Folding shapes is now optional in `FoldConstants`. `surgeon sanitize` now includes a `--no-fold-shapes` option to disable shape folding.
- Fixed a bug in `surgeon insert` where input tensors would be disconnected from all their consumers. Previously, in a model with branches, if one entire branch was replaced by `surgeon insert`, the other branch would be invalidated. This is no longer the case.
- `run` will now attempt to avoid introducing a dependency on the `onnx` Python package when using an ONNX model if `--trt` is the only specified runner.
- When `--force-fallback-shape-inference` is set in `surgeon extract`, it will now correctly ignore shapes already inferred in the model.
- ONNX loaders will no longer make a copy of the model unnecessarily. If a copy is desired, the `copy` parameter can be set to `True` for loaders that may modify the model.
- `InferShapes`/`infer_shapes` will now work with ONNX models larger than 2 GiB if a path to the model is provided instead of an `onnx.ModelProto`.
- Fixed a bug where `FoldConstants` would not count nodes within subgraphs.
- Removed `OnnxTfRunner` and associated CLI options.
- Added a `--partitioning` flag to `surgeon sanitize` to control how ONNX-GraphSurgeon partitions the graph during constant folding.
- Added a `--cleanup` flag to `surgeon sanitize` to remove dead layers in ONNX models.
- The `ExtractSubgraph` loader will now fall back to using shapes/dtypes defined in the model when none are specified.
- `surgeon sanitize` no longer runs inference when the `--override-input-shapes` option is set. Instead, intermediate shapes are cleared.
- `surgeon extract` will no longer override shapes or data types already set in the model when running fallback shape inference.
- Added support for list attributes in `surgeon insert`.
- Added a `val_range` parameter to the data loader, which is more generic than `int_range`/`float_range`, which are now deprecated.
- Added support for per-input data ranges to the `val_range` parameter.
- Added a `--val-range` CLI option to set input ranges on the command-line.
- Added `:` as a valid separator for various options and `[dim0,...,dimN]` as valid syntax for shapes. For example, you can now optionally use:
  `--inputs input0:[3,4]:int64 input1:[4,64,64]:float32`
  instead of:
  `--inputs input0,3x4,int64 input1,4x64x64,float32`
  The new and old styles cannot be mixed.
- Added support for specifying per-output top-k values to CLI tools.
- Added a `--trt-config-script` argument, which allows CLI tools to accept scripts that define functions that create TensorRT builder configurations.
- Added a `--data-loader-script` argument, which allows CLI tools to accept scripts that define data loaders.
- Added a new example for the `convert` CLI tool, which shows how to use a custom data loader for int8 calibration on the command-line.
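A sketch of the kind of script `--data-loader-script` might accept; the function name `load_data` and the feed-dict contract are assumptions for illustration, and plain Python lists stand in for NumPy arrays to keep the sketch dependency-free:

```python
# Hypothetical data-loader script: yields one feed dict (input name -> array)
# per iteration. A real script would yield NumPy arrays.
def load_data():
    for i in range(2):  # two iterations of input data
        yield {"input0": [float(i)] * 4}

feeds = list(load_data())
print(len(feeds), sorted(feeds[0]))  # 2 ['input0']
```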
- Fixed a bug where `debug reduce` would remove branches even if they were required to reproduce failures.
- Added support for string input types in `OnnxrtRunner`.
- Added a `reduce` subtool to `debug`, which can reduce failing ONNX models to the smallest possible failing subgraph.
- ONNX loaders will no longer modify the original model provided, but will instead make a shallow copy.
- Added an example to `dev/` showing how to write new command-line tools.
- Verbose TensorRT network logging will no longer fail to show attributes for layers on older versions of TensorRT.
- `convert` can now automatically determine the output model type based on the file extension.
- Added immediately evaluated functional variants for all loaders exported by Polygraphy. The variants use the same name as the loaders, except in `snake_case` instead of `PascalCase`. See the example for details.
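The loader vs. functional-variant pattern can be sketched with made-up names (`LoadValue`/`load_value` are illustrative, not real Polygraphy loaders):

```python
class LoadValue:
    """Lazy PascalCase loader: nothing happens until the instance is called."""
    def __init__(self, value):
        self.value = value

    def __call__(self):
        return self.value

def load_value(value):
    """Immediately evaluated snake_case variant of the loader above."""
    return LoadValue(value)()

lazy = LoadValue(42)             # deferred until called
print(lazy() == load_value(42))  # True: same result, different evaluation time
```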
- Polygraphy no longer has `numpy` as an install requirement. Note, however, that most, but not all, APIs and CLI tools in Polygraphy still require `numpy`.
- Removed `func.invoke()` since immediately evaluated functions now supersede it.
- Fixed a bug where some `debug` subtools would write engines to the wrong path.
- Added a `FoldConstants` loader for ONNX models.
- Added an `ExtractSubgraph` loader for ONNX models.
- Moved the `--fp-to-fp16` option to `convert`.
- Added `ConvertToFp16` as a separate loader for ONNX models.
- Added an `InferShapes` loader for ONNX models.
- `surgeon sanitize` will now run shape inference by default.
- `Modify<X>` loaders have been renamed to `Modify<X>Outputs` to better reflect their purpose.
- `surgeon sanitize` can now run multiple passes of constant folding to deal with nodes that may not be folded after the first pass (for example, `Shape` nodes in cases where ONNX shape inference does not complete).
- Added an experimental `debug` subtool, which includes `build` and `diff-tactics` (formerly part of `flaky`) and `precision` (formerly a separate tool).
- `flaky diff-tactics` will now only show layers that have potentially bad tactics. To view an entire tactic replay, use `inspect tactics`.
- `flaky repeat` will now only log command `stderr` output with `ERROR` severity if the command failed. Otherwise, `stderr` output is logged with `WARNING` severity.
- `TacticReplayer` can now accept a `TacticReplayData` instance directly.
- `TacticReplayData` can now be constructed manually instead of relying on TensorRT types.
- The `flaky` and `precision` tools have been removed and replaced by the `debug` subtool, which includes the functionality of both.
- Added a `POLYGRAPHY_INTERNAL_CORRECTNESS_CHECKS` environment variable to enable internal correctness checks at runtime. By default, these checks are disabled. A failure in such a check typically indicates a bug in Polygraphy.
- Added context managers for CUDA helper classes. This helps ensure they are correctly freed.
- Added a `sparse_weights` parameter to `CreateConfig`, which enables TensorRT optimizations related to sparse weights.
- Added a `--sparse-weights` option to various CLI tools.
- Added checks for cases where paths provided to `BytesFromPath` did not exist.
- Added `__enter__`/`__exit__` to `Calibrator` so that device buffers can be reliably freed after calibration using a context manager.
- Added an `fp_to_fp16` parameter to `ModifyOutputs`, which will use `onnxmltools` to convert float tensors in the model to 16-bit floats.
- Added a `--fp-to-fp16` CLI argument to various tools.
- Added support for `float`, `int`, and `str` attributes in `surgeon insert`.
- Added an `InvokeFromScript` loader, which can import and invoke a function from a Python script.
- Added support for loading TensorRT networks from Python scripts to various CLI tools. CLI tools can now accept a Python script in place of a model file. The script should define a `load_network` function that takes no arguments and returns a TensorRT builder, network, and optionally a parser. See the example for details.
- Added an experimental `template` tool that can generate template files.
- Added a `trt-network` subtool that can generate a template script for defining TensorRT networks using the network API.
- Added a `SaveBytes` loader to facilitate writing bytes to a file between loaders.
- Added an experimental `flaky` tool that can help debug flaky failures.
- Added a `repeat` subtool, which will run a command repeatedly and sort artifacts into `good` and `bad` directories.
- Added a `diff-tactics` subtool, which compares known-good and known-bad tactic replay files to determine which tactics may be the source of error.
- `EngineFromNetwork` and `CreateConfig` no longer use the global timing cache by default.
- Changed the `--timing-cache` default in CLI tools to `None`.
- Changed the `timing_cache` parameter to `load_timing_cache` and `save_timing_cache` in `CreateConfig` and `EngineFromNetwork` respectively.
- Runners will now raise errors in `infer` if the provided input data types or shapes do not match the expected types and shapes. This behavior can be disabled by setting `check_inputs=False`.
- Changed the `--toposort` default to off in `surgeon` tools, as ONNX models are typically topologically sorted.
- The logger will now log messages with `WARNING` or greater severity to `sys.stderr` instead of `sys.stdout`.
- Removed `CNTKRunner` and the `--cntk` CLI option.
- Removed the experimental `--network-api` flag in CLI tools. This is superseded by the `template trt-network` subtool.
- Fixed memory leaks in `EngineFromNetwork`, `EngineFromBytes`, and `TrtRunner`.
- Added support for timing caches in `EngineFromNetwork` and `CreateConfig`. The former can generate caches, while the latter can load them, resulting in much faster engine builds. By default, Polygraphy will use a global timing cache in the temporary directory.
- Added a `--timing-cache` option to various CLI tools.
- Added an `EngineBytesFromNetwork` TensorRT loader to provide serialized engines directly.
- Added a `BytesFromEngine` TensorRT loader to provide a means of in-memory engine serialization.
- Added an experimental `convert` subtool, which can convert models to various other formats.
- Added an `algorithm_selector` parameter to `CreateConfig` to allow the user to override TensorRT's tactic choices.
- Added a `TacticReplayer` algorithm selector to allow for recording and replaying tactics in the TensorRT builder. This makes it possible to make the TensorRT builder behave deterministically.
- Added an experimental `--tactic-replay` option to various CLI tools to make it possible to record to and replay from tactic replay files.
- Added an experimental `inspect` subtool, `tactics`, which can display tactic replay files in a human-readable format.
- The `surgeon sanitize` subtool can now also modify model outputs.
- `surgeon insert` will now preserve graph input and output names.
- Fixed a bug where the CUDA wrapper could not allocate buffers larger than 3 GiB.
- `TrtRunner` can now optionally accept a `context` directly instead of an `engine`.
- `basic_compare_func` will now show mismatched indices in addition to mismatched values.
- Added an experimental `surgeon` subtool, `insert`, which can insert new nodes into an ONNX model.
- Added an experimental `surgeon` subtool, `sanitize`, which can remove unused nodes and fold constants in an ONNX model.
- Added `--load-inputs` and `--save-inputs` to provide a mechanism to supply custom input data on the command line.
- Added `func.invoke()`, a function that calls a provided callable. This can be useful to make it more obvious that a loader is being immediately evaluated. For example: `EngineFromNetwork(...)()` vs. `func.invoke(EngineFromNetwork(...))`.
- Added per-output tolerance support in `basic_compare_func`.
- Added per-output tolerance support to the `--atol` and `--rtol` command-line options.
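Per-output tolerances can be sketched as a dict lookup with a default fallback; `resolve_tolerance` and the `""` default key are hypothetical illustrations, not Polygraphy's API:

```python
def resolve_tolerance(tol, output_name, default=1e-5):
    """A single number applies to every output; a dict maps output names to
    tolerances, using "" as the key for "all other outputs" in this sketch."""
    if isinstance(tol, dict):
        return tol.get(output_name, tol.get("", default))
    return tol

atol = {"logits": 1e-3, "": 1e-6}
print(resolve_tolerance(atol, "logits"))  # 0.001
print(resolve_tolerance(atol, "other"))   # 1e-06
```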
- Renamed `inspect results` to `inspect data`, since it can now also be used to inspect input data, not just results.
- `Comparator.compare_accuracy` now supports comparing a single runner against itself.
- Removed the experimental surgeon subtools `prepare` and `operate`, as they were difficult to maintain and not very useful.
- Fixed a memory leak due to `IBuilderConfig` not being properly freed in the `EngineFromNetwork` loader.
- Fixed memory leaks on exceptions in TensorRT loaders.
- Fixed a bug in `inspect model` where `dim_param`s in ONNX models would show up as `-1`.
- Shape values in `TensorMetadata` can now be strings to indicate dynamic dimensions.
- `TRT_LOGGER` is now exported under `polygraphy.backend.trt`.
- Fixed a bug in `surgeon extract` where ONNX models using `dim_param` would be rejected.
- Added missing copyright headers.
- Added an `--input-shapes` alias for the `--inputs` option in `run` to better reflect its purpose.
- `inspect model` will no longer show `dtype`/`shape` as `None` if the information is not present in the model. Instead, these are now omitted.
- Fixed a bug where boolean outputs would cause a crash in `basic_compare_func`.
- Fixed a bug where `TrtRunner` would use the wrong shapes for empty tensor outputs.
- Fixed a bug where the `Calibrator` would not re-check the cache when `reset()` was called.
- Added a `-v`/`--version` flag to `polygraphy`.
- Cleaned up unnecessary logging output and fixed formatting.
- Added new modes to `inspect model` to control whether to show weights in the model.
- Added a `-s`/`--show-values` option to `inspect results` to display output values.
- Added an experimental `--top-k` flag to `run`, which will apply Top-K before comparing outputs.
- Added `exclude_outputs` to `ModifyOutputs` and `ModifyNetworkOutputs`.
- Added experimental `--onnx-exclude-outputs` and `--trt-exclude-outputs` options to selectively unmark outputs.
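The idea behind the Top-K comparison above can be sketched as follows (illustrative, not Polygraphy's implementation): outputs that differ numerically may still agree on the indices of their largest values.

```python
def top_k_indices(values, k):
    """Indices of the k largest values, largest first."""
    return sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]

# Two outputs that differ numerically still agree on their top-1 prediction:
print(top_k_indices([0.1, 0.7, 0.2], 1))    # [1]
print(top_k_indices([0.05, 0.9, 0.05], 1))  # [1]
```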
- Fixed a bug in `inspect model` for ONNX models containing nodes with Tensor attributes.
- Fixed a bug where `DeviceArray.copy_from` would segfault in rare cases.
- General cleanup and addition of missing docstrings.
- Fixed a bug where `DataLoader` would use a shape provided by the user even for static shapes in the model.
- Fixed a bug where `DataLoader` would incorrectly report certain tensors as shape tensors.
- Fixed a bug where the `DataLoaderCache` would stop checking the cache after the first miss.
- Added an `extend` decorator, which makes it easier to extend existing loaders.
- Added more API examples.
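A sketch of what an extend-style decorator for loaders can look like (illustrative, not Polygraphy's actual `extend`): the decorated function receives the wrapped loader's result and may post-process it.

```python
def extend(loader):
    """Decorator factory: run `loader`, then pass its result to the
    decorated function, whose return value (if any) replaces the result."""
    def decorator(post_process):
        def extended(*args, **kwargs):
            result = loader(*args, **kwargs)
            modified = post_process(result)
            return result if modified is None else modified
        return extended
    return decorator

def load_list():  # stand-in for an existing loader
    return [1, 2, 3]

@extend(load_list)
def load_list_doubled(lst):
    return [x * 2 for x in lst]

print(load_list_doubled())  # [2, 4, 6]
```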
- `Comparator.compare_accuracy` will now display an accuracy summary after processing all iterations.
- Added a `CreateNetwork` loader to create new TensorRT networks.
- Added an experimental `--network-api` option that works with `--gen` to allow manually defining a TensorRT network.
- `Calibrator` can now accept a file-like object for `cache` instead of just a file path.
- Fixed various errors in API documentation.
- `EngineFromBytes` will now call `trt.init_libnvinfer_plugins` before attempting to deserialize the engine.
- Added HTML docs for the Python API.
- Fixed a bug where the data loader would not support cases where `int_min == int_max` when bounding input data.
- Fixed a bug where `OnnxrtRunner` would report incorrect metadata for ONNX models using `dim_param` for dynamic dimensions.
- `CreateConfig` now accepts a `strict_types` argument.
- Added a new `polygraphy` binary, which includes several tools.
- Added an experimental new tool: `precision`, which can be used to figure out which layers to run in higher precision in TensorRT to achieve the desired accuracy.
- Added a `bisect` subtool that does a binary search.
- Added a `linear` subtool that does a linear search.
- Added a `worst-first` subtool that marks the layers that introduce the most error first.
- Added a new tool: `inspect`, to inspect supported files.
- Added `model`, which displays information about models.
- Added `results`, which displays information about saved `RunResults`.
- Added back `subprocess_polling_interval` to `Comparator.run()`, as this is still required in certain rare cases.
- Optimization passes are now optional in `OnnxFromTfGraph` and can be disabled by setting `optimize=False` in the constructor.
- Runners now include an `is_active` property, which indicates whether the runner is currently activated.
- Added an experimental new tool: `surgeon`, which can be used to modify ONNX models more easily than using ONNX-GraphSurgeon.
- Added `prepare` and `operate`, which can be used to modify an ONNX model using a JSON configuration.
- Added `extract`, which can extract ONNX subgraphs with a single command.
- Added `--onnx-outputs` and `--trt-outputs` to set outputs in the corresponding loaders.
- Added a passthrough loader, `LoadPlugins`, that can wrap any other loader and load plugins.
- `EngineFromNetwork` will no longer free the builder, network, and parser if they are provided directly (as opposed to via a loader).
- `TrtRunner` will no longer free the engine if it is provided directly (as opposed to via a loader).
- All file-saving arguments now take file paths instead of directories. This makes it easier to know exactly where each file is being written.
- `compare_func` in `Comparator.compare_accuracy` now accepts a function that returns anything convertible to a boolean, rather than requiring a boolean.
- `basic_compare_func` will now return information about required tolerances after `Comparator.compare_accuracy`.
- `Calibrator` can now be configured to inherit from a different TensorRT calibrator base class.
- ONNX GraphSurgeon is no longer required to mark outputs in ONNX models.
- `TrtLegacyRunner` no longer depends on `pycuda`.
- `TrtRunner` will now only reset context shapes if the shapes changed. This should improve performance.
- `DataLoader` now takes `int_range` and `float_range` parameters, so min/max can be provided more concisely.
- All `Loader`s and `Runner`s were renamed to better reflect their purpose and to improve readability.
- Renamed `warm_up_runs` to `warm_up`.
- `Calibrator`'s `data_loader` parameter now accepts any generator or iterable instead of requiring a special type.
- `Comparator.run`'s `data_loader` parameter now accepts any generator or iterable instead of requiring a special type.
- The included `DataLoader` can now be used as an iterable, and its iteration length can be controlled via the `iterations` parameter.
- Renamed `--input-shape` to `--inputs`.
- Renamed `--min-shape`/`--opt-shape`/`--max-shape` to `--trt-min-shapes`/`--trt-opt-shapes`/`--trt-max-shapes`.
- `DataLoader` now accepts an `input_metadata` parameter which can be used to override shapes and data types.
- Split off `layerwise` and `outputs` functionality into separate `Modify` loaders.
- Split off artifact-saving functionality into separate `Save` loaders.
- Renamed `--read` options to `--load`, and `--write` to `--save`.
- Renamed `--read-outputs`/`--write-outputs` to `--load-results`/`--save-results`.
- `Calibrator` no longer requires `input_metadata` to be set if the data loader does not need it.
- `TfRunner` now uses a `CreateConfig` loader to supply configuration.
- `TfRunner` and `OnnxrtRunner` now take a `BuildSession`, so that custom sessions can be used.
- Removed iteration arguments from `Comparator.run()` and `Calibrator`. Instead, these now iterate the provided data loader until it runs out of data.
- Removed the `--load-engine` option from `polygraphy`. Engines can now be provided as models directly, e.g. `polygraphy run example.engine --trt`.
- `polygraphy_exec` and `polygraphy_gen` were removed. They are superseded by the `run` subtool of `polygraphy`.
- The `--layerwise` and `layerwise` options have been removed. Layerwise behavior is now possible with `outputs=constants.MARK_ALL` or `--<framework>-outputs mark all`.
- Fixed bugs in `Comparator.validate` that would cause it not to correctly display non-finite values.
- `Calibrator` will now warn if a cache exists but is empty.
- `DataLoader` will now use a fixed seed value unless otherwise specified. This ensures consistent run-to-run behavior.
- The default `find_output_func` will no longer compare outputs whose names don't match if there is another output that does match.
- Fixed a bug where custom names provided to runners would still be suffixed with a timestamp.
- Fixed a bug where regular TensorRT calibrators could not be used with `CreateConfig`.
- The missing subtool warning will no longer be displayed if that subtool is not being used.
- `basic_compare_func` now accepts a `find_output_func` parameter, allowing users to control which outputs are compared between results.
- The `--load-outputs` argument can now accept multiple different files. Outputs from each of these will be read in order.
- Added an implicit batch ONNX network loader for the legacy TensorRT runner. This will not work with recent versions of the parser.
- Added a `RunResults` class which replaces the `OrderedDict` that `Comparator.run` previously returned (structure is unchanged).
- `layerwise` mode will no longer mark constants as outputs.
- The default `compare_func` in `Comparator.compare_accuracy` will now always iterate over the output names in the first `IterationResult` and attempt to find them in the second. The order of the `IterationResult`s provided to this function can be modified either by setting `comparisons` in `Comparator.compare_accuracy`, or by changing the order of runners in `Comparator.run`.
- Improves `polygraphy_gen` output formatting.
- Renamed `RunResult` to `IterationResult` to better reflect its purpose.
- Default runner names now include timestamps to disambiguate when saving and loading multiple runners.
- `graphsurgeon` is no longer a dependency of Polygraphy.
- Logger settings in `polygraphy_exec`/`polygraphy_gen` are now set prior to any logging output.
- Comparator will no longer attempt to decompress all `bytes` objects sent over the queue when using subprocesses.
- Added `OnnxExtWeightsNetworkLoader` to support loading ONNX models with externally stored weights into TensorRT.
- Added a `TensorMetadata` class to replace dictionaries that were used across Polygraphy.
- Added `CaffeNetworkLoader` for the `TrtLegacyRunner`.
- `polygraphy_exec` and `polygraphy_gen` will no longer use subprocesses by default. To revert to the old behavior, the `--use-subprocess` flag must now be explicitly provided.
- `SerializedEngineLoader` now accepts a `buffer_loader`, so that a function that loads a serialized engine may be provided instead of the serialized engine itself.
- Default opset for `OnnxFromTfGraph` has been updated to `11`.
- `polygraphy_exec` and `polygraphy_gen` now correctly handle cases where no model file is provided.
- Added a `PolygraphyException` class to serve as a base class for exceptions raised by Polygraphy.
- `ConfigLoader` now accepts a list of `Profile`s to support multiple optimization profiles.
- Changed the format of CLI shapes arguments to enable specifying multiple profiles.
- Moves the `outputs` argument from `TfRunner` to the TensorFlow loaders.
- Polygraphy now includes a thin `ctypes` wrapper around the CUDA runtime library, accessible in `util/cuda.py`.
- `TrtRunner` no longer depends on `pycuda`, and instead uses the included CUDA wrapper.
- Loader parameters may now be loaders themselves, or the result of invoking a loader.
- Improves the quality of Comparator messages when there are mismatches.
- `basic_compare_func` will now preserve output ordering in the results.
- Makes `EngineFromNetwork` compatible with TensorRT 7.0.
- Restructures the ONNX Runner, and adds layerwise functionality (using ONNX-GraphSurgeon).
- Added `--timestamp` and `--line-info` options to `polygraphy_exec` to enable logging of timestamps and line numbers respectively.
- Added a `--no-letter` option to disable severity letter prefixes in log messages.
- Added `register_callback` to the Logger, which registers a callback that will be called whenever the severity changes.
- Added `Logger.verbosity()`, which returns a context manager that can be used to temporarily change logging severity.
- Added new variants to `--model-type` in `polygraphy_exec`: `keras`, `ckpt`; renamed `tf` to `frozen`.
- Added `ConfigLoader`, which can be passed to `EngineFromNetwork` to customize the build configuration prior to building.
- The logger no longer displays timestamps and line numbers. These can be enabled by setting the `timestamp`/`line_info` properties respectively to `True`.
- The Logger now relies on the `colored` module to provide colored output.
- `polygraphy_exec` now runs runners in the order in which they were specified.
- Greatly shortens import paths by removing `_runner` suffixes and shortening framework names (e.g. `tensorflow_runner` -> `tf`).
- The `runners` submodule has been renamed to `backend`.
- `TrtRunner` has been renamed to `TrtLegacyRunner`.
- `TrtRunnerV2` has been renamed to `TrtRunner`.
- `polygraphy_gen` is now at parity with `polygraphy_exec`.
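A temporary-verbosity context manager of the kind described for `Logger.verbosity()`, together with severity-change callbacks as described for `register_callback`, can be sketched in plain Python. `SimpleLogger` is a hypothetical stand-in, not Polygraphy's logger:

```python
import contextlib

class SimpleLogger:
    """Hypothetical logger illustrating severity-change callbacks and a
    verbosity() context manager, as described in the changelog."""
    def __init__(self, severity=20):
        self._severity = severity
        self._callbacks = []

    @property
    def severity(self):
        return self._severity

    @severity.setter
    def severity(self, value):
        self._severity = value
        for cb in self._callbacks:  # notify on every severity change
            cb(value)

    def register_callback(self, cb):
        self._callbacks.append(cb)

    @contextlib.contextmanager
    def verbosity(self, severity):
        old = self.severity
        self.severity = severity
        try:
            yield
        finally:
            self.severity = old  # restore on exit, even after exceptions

logger = SimpleLogger()
seen = []
logger.register_callback(seen.append)
with logger.verbosity(10):
    pass
print(seen, logger.severity)  # [10, 20] 20
```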
- Removed `--tftrt` as a separate runner in `polygraphy_exec` - instead, it is now an option for the `--tf` runner.
- Removed `--tftrt-gpu-memory-fraction` and renamed `--tf-gpu-memory-fraction` to `--gpu-memory-fraction` in `polygraphy_exec`.
- Removed `--tfonnx`, and instead adds this functionality to `--onnxrt` when using a TensorFlow model in `polygraphy_exec`.
- Removed the `Experimental` argument section in `polygraphy_exec`. All functionality has now been integrated into non-experimental arguments.
- Removed the `preprocess_network` argument from `EngineFromNetwork`. This functionality can be achieved by wrapping the network loaders instead.
- `Comparator.run` will now forcefully terminate the subprocess if it does not exit on its own.
- Added TF32 support to the legacy `TrtLegacyRunner`.
- Various improvements to automatic shape matching for cases where shapes between runners do not match exactly.
- Changed `BaseRunner` so that runners can now implement `activate()`/`deactivate()` instead of `__enter__()`/`__exit__()`.
- `polygraphy_exec` now defaults to running just a single iteration of inference.
- The `--accuracy` flag has been removed from `polygraphy_exec`, as this is now the default behavior.
- TensorRT runners now use the same builder to build the network and engine, instead of using a separate builder for each.
- Fixes a bug in `try_match_shape`.
- Added a `tf32` parameter as well as a `--tf32` flag for TensorRT.
- Added support for `dim_param` in ONNX.
- `fp16_mode` and `int8_mode` parameters have been renamed to `fp16` and `int8` respectively.
- `polygraphy_exec` will now use the runtime shapes specified rather than always using `OPT` shapes from the TensorRT profile.
- Improves shape matching logic in `DataLoaderCache`.
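The `activate()`/`deactivate()` split lets a base class own the context-manager protocol while subclasses implement only resource setup and teardown. A minimal sketch of that pattern, using hypothetical class names rather than Polygraphy's actual `BaseRunner`:

```python
class RunnerBase:
    """Base class owning __enter__/__exit__; subclasses override
    activate()/deactivate() instead of the dunder methods."""
    def activate(self):
        pass

    def deactivate(self):
        pass

    def __enter__(self):
        self.activate()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.deactivate()

class DummyRunner(RunnerBase):
    def __init__(self):
        self.active = False

    def activate(self):
        self.active = True   # e.g. allocate buffers, create contexts

    def deactivate(self):
        self.active = False  # e.g. free resources

with DummyRunner() as runner:
    print(runner.active)  # True
print(runner.active)      # False
```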
- Added `start_index` and `end_index` parameters to `Comparator.run` to make it easy to skip over inputs from the data loader.
- Added `CompareFunc` to provide built-in comparison functions.
- Added `PostprocessFunc` to provide built-in post-processing functions.
- `Comparator.compare_accuracy` now returns an `AccuracyResult` object, which contains much more information about the results of the comparisons.
- Added a `percentage()` function to `AccuracyResult` to provide an easy way to figure out the percentage of passed iterations.
- Replaces `RunInfo` with `IterationResult`. The latter only stores information about a single iteration for a single runner.
- `compare_func` in `Comparator.compare_accuracy` is now a `Callable(IterationResult, IterationResult) -> Dict[str, bool]`.
- `warm_up_runs` now defaults to `0`, and `end_index` to `1`.
- Ordering of outputs in a single iteration is now preserved in `CompareFunc.basic_compare_func`.
- `use_subprocess` now defaults to `False` in `Comparator.run()` (it still defaults to `True` in `polygraphy_exec`).
- `Calibrator` now takes `start_index` and `end_index` arguments instead of `max_items`.
- Removed the `Comparator.compare` function, since `Comparator.compare_accuracy` includes all of its functionality.
- `iterations` in `Comparator.run` has been removed and replaced by `start_index` and `end_index`.
- Removed the `subprocess_polling_interval` argument, as `Comparator` can now properly detect when the subprocess terminates.
- `Comparator.run()` will no longer hang if there is a segfault in the subprocess.
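Skipping inputs with `start_index`/`end_index` amounts to slicing an iterable data loader. A stdlib sketch of that behavior (the helper name and signature are illustrative, not the real API):

```python
import itertools

def iterate_data_loader(data_loader, start_index=0, end_index=None):
    """Yield only the items in [start_index, end_index) from any iterable,
    mirroring how start_index/end_index can skip data-loader inputs."""
    yield from itertools.islice(data_loader, start_index, end_index)

data_loader = ({"x": [float(i)]} for i in range(10))
selected = list(iterate_data_loader(data_loader, start_index=2, end_index=5))
print([feed["x"][0] for feed in selected])  # [2.0, 3.0, 4.0]
```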
- Added `--int-min`, `--int-max`, `--float-min`, and `--float-max` arguments to `polygraphy_exec`.
- Added an `--explicit-precision` option to `polygraphy_exec` to enable QAT models in TRT.
- Added empty tensor support. Empty tensors are tensors whose shapes contain one or more 0s.
- When `--load-outputs` or `--save-outputs` is specified to `polygraphy_exec`, `seed` will default to `1` to ensure consistent inputs across runs.
- Added a `--calibration-cache` option to `polygraphy_exec` to enable supplying a calibration cache.
- Added a `--no-color` option to disable color logging.
- Added `GraphOptimizerLoader` for freezing TensorFlow graphs and a `--freeze-graph` option to `polygraphy_exec`.
- Added `--load-outputs` and `--save-outputs` to `polygraphy_exec` for comparing across executions.
- Added `KerasLoader` for loading models stored in `hdf5` format.
- Added a constant folding pass to `GraphOptimizerLoader` for TensorFlow graphs.
- Updates `Calibrator` so that it will now use the opt dimension of a profile for networks with dynamic shapes.
- Updates the legacy TensorRT runner to use `Loaders` for easier UFF debugging.
- `Calibrator` will no longer allocate buffers if a calibration cache was provided.
- Added generation of ONNX code to `polygraphy_gen`.
- Added default implementations of some `BaseRunner` methods.
- Added `last_inference_time()` to `BaseRunner` so that `infer()` now only needs to return outputs.
- Added `Calibrator` for int8 calibration, along with additional parameters to `EngineFromNetwork`.
- Better warnings for user-defined implementations of various APIs.
- `DataLoaderCache` will now warn loudly when a set of inputs needs to be regenerated.
- Cleans up the `Comparator` `run()` function.
- Moves most `save_*` options into loaders rather than runners.
- Changed `BaseDataLoader.next()` to take an index as an argument. This way, inputs can be reliably repeated across runners.
- Moves all `layerwise` parameters into loaders rather than runners.
- Loaders are now interchangeable with Python `Callable`s.
- `DataLoader`s are now interchangeable with Python `Callable`s.
- `DataLoader` no longer generates all-`True` values for boolean types.
- Various bug fixes in `polygraphy_gen`.
- `DataLoaderCache` is now sent over the queue when runners are run in subprocesses. This resolves an issue where the cache was not being updated correctly.
- `Comparator` now updates runners correctly when using a subprocess.
- Added a `--no-fold-constant` option to prevent `OnnxFromTfGraph` from doing constant folding in the TensorFlow graph.
- Added an experimental `polygraphy_gen` script that enables generation of template Python scripts for running Polygraphy.
- Bug fix for cases where TensorFlow nodes with no outputs are recognized as graph outputs by `GraphSurgeon`.
- Added a `name` parameter to `CheckpointLoader` in case the checkpoint does not include a `checkpoint` file.
- `TFTRTLoader` now accepts any kind of TensorFlow graph loader.
- Bug fix in `TrtRunner` `Buffers` so that no-op reshapes (no reallocation) are handled correctly.
- Added `check_inf`, `check_nan`, and `fail_fast` options to `Comparator.validate()`.
- Cleans up the `Buffers` implementation for `TrtRunner` - eliminates an unnecessary copy that was happening on the host input.
- Improved logic for matching output names in `util.find_in_dict()`.
- `TrtRunner` will no longer call the `context`'s shape-setting functions on non-dynamic inputs.
- Bug fix for volume computation for scalars.
- Updates `DataLoader` to handle scalars correctly, and adds several tests.
- Added various utility functions as static members of `TrtRunner`, e.g. a `create_network` function to simplify TensorRT's network flags.
- `EngineFromNetwork` will now mark network outputs when `layerwise=True`.
- Added support for `bool` outputs in `Comparator`.
- Replaces `OnnxEngineLoader` with `OnnxNetworkLoader` and `EngineFromNetwork`. This allows for more flexibility in building engines from TensorRT networks.
- Added an `allow_growth` option to `TfRunner` to work around `CUDNN_STATUS_INTERNAL_ERROR`. When `allow_growth` is enabled, the error disappears.
- `DataLoaderCache` will now attempt to permute inputs in cases where shapes do not match exactly (e.g. NCHW vs NHWC inputs).
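The shape-permutation matching described for `DataLoaderCache` can be sketched with a pure-Python helper that checks whether one shape is a permutation of another. This is illustrative only; the real cache operates on buffers and metadata, not just shapes:

```python
import itertools

def find_permutation(src_shape, dst_shape):
    """Return a tuple of axis indices that permutes src_shape into dst_shape,
    or None if no permutation matches. Example: NCHW (1, 3, 224, 224) vs
    NHWC (1, 224, 224, 3)."""
    if sorted(src_shape) != sorted(dst_shape):
        return None  # not even the same multiset of dimensions
    for perm in itertools.permutations(range(len(src_shape))):
        if tuple(src_shape[i] for i in perm) == tuple(dst_shape):
            return perm
    return None

print(find_permutation((1, 3, 224, 224), (1, 224, 224, 3)))  # (0, 2, 3, 1)
```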
- Fixes a bug in `polygraphy_exec` which caused it to ignore user-defined profiles.
- Added support for many more ONNX data types.
- Added support for `int8` and explicit precision mode in `TrtRunner`.
- Added a `preprocess_network` parameter to `OnnxEngineLoader` so that the network can be modified before it is used for building.
- `TrtRunner` will now attempt to generate sane default shapes in cases with dynamic shapes where no profiles are provided.
- `DataLoader` no longer overrides static shapes in the model, but issues a warning if an override is requested.
- `DataLoader` now accepts shape tensor inputs in its `default_shapes` parameter.
- Added timestamps to logging output.
- `Comparator` can now catch segfaults in runners properly.
- Added options for `DataLoader` to be able to specify input bounds.
- Added smarter matching for input metadata in the `DataLoaderCache`.
- The default `subprocess_polling_interval` is now 30 seconds.
- `Comparator` now attempts to partially match output names when no exact matches are found.
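Partial output-name matching of the sort described here can be approximated by falling back to substring containment when no exact match exists. This is a hypothetical helper for illustration, not Polygraphy's matching logic:

```python
def find_output_name(wanted, available):
    """Return the best match for `wanted` among `available` names:
    an exact match if one exists, otherwise the first name related
    by substring containment, otherwise None."""
    if wanted in available:
        return wanted
    for name in available:
        if wanted in name or name in wanted:
            return name
    return None

available = ["resnet/output:0", "resnet/probs:0"]
print(find_output_name("output:0", available))  # resnet/output:0
print(find_output_name("logits", available))    # None
```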
- Added a `subprocess_timeout` parameter to `Comparator.run` to prevent hangs when a subprocess does not terminate.
- Added a `subprocess_polling_interval` parameter to allow the process to be polled so that failing processes can be terminated before the full `subprocess_timeout`.
- If the ONNX checker fails due to the IR version of the model being too new, Polygraphy now ignores the error and continues.
- `OnnxEngineLoader` now accepts an `onnx_loader` for better flexibility in loading models.
- `polygraphy_exec` now supports running TF models in TRT via the tf2onnx converter.
- The legacy `TrtLegacyRunner` now only supports UFF models.
- Added `BaseModelLoader`, which can be used to load models. This allows for reuse of existing runners with different import paths. For example, `OnnxrtRunner` can be used with `OnnxFromTfGraph` in order to run a TensorFlow frozen graph via ONNX Runtime.
- Implements `ModelLoader`s for `TfRunner`, including a frozen model loader, checkpoint loader, and TF-TRT loader.
- `OnnxFromTfGraph` now accepts a TensorFlow `ModelLoader` to support a wider variety of input formats.
- Updates the legacy `TrtLegacyRunner` to use the `get_input_metadata` API, so it is usable for UFF models.
- Comparator will now look at the union of all outputs from all runners when checking for common outputs.
- `TrtRunner` will no longer mark layers within the loop body as network outputs in `layerwise` mode.
- `DataLoaderCache` now falls back to reusing inputs based on order if names do not match exactly.
- `DataLoader` now accepts a `default_shapes` parameter to override dynamic shapes.
- Added a `get_input_metadata` API to `BaseRunner`. Overhauls runners so they no longer need to handle dynamic input shapes individually.
- Added a `DataLoader` class which can be used to feed data to the Comparator.
- Added a `DataLoaderCache` so that the data loader does not have to load inputs multiple times for each runner.
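A cache like the `DataLoaderCache` described above can be sketched as a wrapper that generates each indexed input once and replays it for every runner. All names here are hypothetical; the real cache also handles metadata matching:

```python
class DataLoaderCache:
    """Illustrative cache: wraps an indexed data-loader function and
    memoizes each iteration so that every runner sees identical inputs."""
    def __init__(self, generate_fn):
        self.generate_fn = generate_fn  # index -> feed_dict
        self.cache = {}
        self.generated = 0  # track how often we actually generate

    def load(self, index):
        if index not in self.cache:
            self.cache[index] = self.generate_fn(index)
            self.generated += 1
        return self.cache[index]

cache = DataLoaderCache(lambda index: {"x": [float(index)]})
for _runner in ["trt", "onnxrt"]:          # two runners...
    for index in range(3):                 # ...iterate the same 3 inputs
        feed = cache.load(index)
print(cache.generated)  # 3, not 6: inputs were generated only once
```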
- `Comparator.compare_accuracy` now fails if no outputs were compared.
- Removed support for implicit batch ONNX models in `TrtLegacyRunner`. You should use `TrtRunner` for ONNX models instead.
- Removed `python2` support.
- Bug fixes for TensorFlow graphs.
- Bug fixes for `polygraphy_exec` when using the legacy `TrtLegacyRunner`.
- Bug fixes for `TrtRunner` for cases with multiple outputs.
- Added support for compression during communication between the runner subprocesses and the main `Comparator` process. This is because `Pipe`s and `Queue`s can only send objects smaller than 2GB.
- Added timeouts to reduce the possibility of hangs in runners.
- Added a `--fail-fast` option to `polygraphy_exec` and a corresponding `fail_fast` option to `Comparator.compare()`. Useful for determining the first layer at which two models diverge.
- Added `TrtRunner`, which can be used to run TRT networks with dynamic shapes. Currently only supports ONNX.
- Runners no longer need to specify inputs up front - they can now be specified after `__enter__` is called. This greatly simplifies much of the logic in several runners.
- `RunInfo` no longer contains data about the inputs used.
- `TFOnnxrtRunner` now accepts an opset option when converting graphs to ONNX.
- All runner files are now suffixed with `_runner` to disambiguate them from system packages.
- Fixes an issue that prevented `EXTRA_VERBOSE` logging output from TRT from being displayed.
- Added a `--uff-order` option in case the automatically determined order is wrong.
- Added an experimental `--build-only` option to `polygraphy_exec`.
- Comparator will now attempt to permute outputs with mismatched shapes when `check_shapes` is disabled.
- Lowers the default GPU memory fraction, as TensorFlow has OOM issues when it is set too high.
- Added `TFOnnxrtRunner` and a `--tfonnx` option to `polygraphy_exec`.
- Added `OnnxrtRunner` and moves `TFOnnxrtRunner` into `onnx_runner.py`.
- Added a `--save-onnx` option for `OnnxrtRunner`.
- Changed the `--onnx` `polygraphy_exec` option to `onnxtf` to disambiguate it from `--onnxrt`.
- Added `CNTKRunner` and a `--cntk` option to `polygraphy_exec`.
- Changed the default shape value to 1. This is the value that is set when no input dimension is specified.
- Added support for loading TF checkpoints.
- Added support for overriding automatically determined outputs in the TF and TF-TRT runners. Added a `--tf-outputs` argument to `polygraphy_exec`.
- Fixes input shape mismatches between ONNX-RT and TF.
- Added a `--plugins` option to `polygraphy_exec` for loading TRT plugins.
- Added a function in the Comparator to perform output validation, and a corresponding flag in `polygraphy_exec`.
- Runners now use `OrderedDict` for outputs, meaning that the ordering of the outputs will match the order of the layers in the network in most cases.
- Improved TensorFlow output tensor deduction by excluding certain ops that cannot behave like outputs in TensorFlow.
- Version information is now logged at INFO logging severity.
- Removed the `prepare_inputs`/`prepare_outputs` functions. Instead, runners now do timing on their own in the `infer` function.
- Changed runner inputs to use dictionaries that map input names to their NumPy buffers.
- `polygraphy_exec` will no longer fail if the extension for the model file is unrecognized.
- Added an `fp16_mode` option to `TfRunner` for TF-TRT.
- Added an option to limit TensorFlow GPU memory usage.
- Added an option to specify the minimum segment size for TF-TRT.
- Added an option to write out engine(s) from the TF-TRT graph.
- `polygraphy_exec` now exits when unknown arguments are encountered.
- Improves timestamps to be human-readable instead of using seconds from epoch.
- Added support for dynamic ops in TF-TRT.
- Added an option to write out TensorBoard visualizations.
- Added an option for enabling XLA in the TensorFlow runner.
- Added nicer error messages on failed TF-TRT imports.
- If a TensorFlow graph specifies a dynamic shape, Polygraphy now automatically populates it with concrete values.
- Added argument groups and moves some unstable arguments to the Experimental section.
- Polygraphy will now refuse to write artifacts to disk if a file already exists, wherever it can detect such cases.
- `polygraphy_exec` now emits warnings when unknown command-line parameters are used.
- Added the capability to write out TensorFlow timelines.
- Changed `--save*` options to accept directory names instead; the resulting files are timestamped and named based on the runner name.
- Changed command-line parameters to use dashes instead of underscores.
- Modifies `TrtLegacyRunner` to pass along input order to UFF, instead of permuting the order to CHW.
- Comparator now prints runner output in the same order in which the runners were specified.
- Added per-inference-input command-line arguments for running multiple comparisons.
- Seed is now displayed correctly during `Comparator.run()`.
- User-friendly Comparator output - it now suggests command-line flags to get what you were looking for.
- Added layerwise comparison support for `TrtLegacyRunner` and `TfRunner`.
- Renamed to TRT Polygraphy.
- Overhauled README.md
- Modified project structure - created `runners`, `comparator`, and `logger` submodules.
- `polygraphy_exec` now uses the batch size specified by the model if none is specified by the user.
- Added framework dependencies to `setup.py`.
- `TrtLegacyRunner` now displays ONNX parsing errors and exits early on parsing failures.
- Initial integration