- Packaging / infrastructure improvements:
  - npm package `kaitai-struct-compiler` now returns the compiler object itself instead of a constructor function (called `KaitaiStructCompiler`). Make sure to adapt your code: replace `(new KaitaiStructCompiler()).compile(...)` with `KaitaiStructCompiler.compile(...)` (#222)
- General compilation improvements:
  - Prevent referring to non-existent enum members as `my_enum::unknown_member` (8dcd1be)
  - Prevent duplicate member names in enum definitions (1cbaff9), since they're incompatible with the concept of an enum in all target languages
  - Ensure that IDs of `params` are unique and don't collide with `seq` fields or `instances` within a type (#923)
  - Allow whitespace in type invocation: even `type: ' nested :: type ( 1 + 2 , data ) '` now works (#792)
  - Add style warnings reporting non-standard names for size fields (should use `len_` + subject) and repeat count fields (should use `num_` + subject), see the style guide
    - they are only recommendations and don't prevent compilation
    - only available in the command-line `kaitai-struct-compiler` on the JVM platform (not in the Web IDE or in the JavaScript build at npm)
  - Add the ability to report multiple problems at once instead of stopping after the first error, currently used for "type validation" errors and style warnings (only on JVM compiler builds, not JS builds)
  - Improve readability of problems listed in the compiler output
  - Force UTF-8 as the output encoding of generated files (don't rely on system defaults)
  - `--ksc-json-output`: add `warnings` at the same level as `errors`; don't use octal escapes (e.g. `"\274"` ⟶ `"\u00bc"`) in string values (invalid in JSON)
  - Use SnakeYAML (the YAML parser used by JVM compiler builds) 1.25 ⟶ 1.28, which no longer contains the DoS vulnerability allowing a "billion laughs" attack (50f80d7)
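
JSON has no octal escapes, which is why the `--ksc-json-output` change matters. A string such as `"\274"` is rejected by conforming parsers, while `"\u00bc"` encodes U+00BC (byte value 0xBC). The quick check below uses Python's standard `json` module purely for illustration. It is not part of the compiler.

```python
import json

# Octal 274 and hex BC are the same value (188), i.e. U+00BC
assert 0o274 == 0xBC

# A JSON encoder escapes non-ASCII characters as \uXXXX by default
encoded = json.dumps("\u00bc")
print(encoded)  # prints "\u00bc"

# The octal form is not a valid JSON escape and fails to parse
try:
    json.loads('"\\274"')
    octal_ok = True
except json.JSONDecodeError:
    octal_ok = False
print(octal_ok)  # False
```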
- Runtime API changes:
  - C++: `kstream::to_string` now works for all integer types up to 64 bits (not just `int` as before), with better performance and portability (cpp_stl#50)
  - Go: `ReadBitsInt{Be,Le}` now accept the number of bits as `uint8` ⟶ `int` (go@a5c5c1e)
  - Java: `readBytesTerm`, `processXor` now accept a single byte value as `int` ⟶ `byte`
  - JavaScript: update UMD envelopes to support Web Workers and modules (in the runtime library, generated parsers and JS compiler builds)
  - JavaScript: `readBitsInt{Be,Le}` now throw `Error` ⟶ `RangeError` when trying to read more than 32 bits
  - Lua: add zzlib as a submodule to support `process: zlib`
  - Python: validation errors now extend `BaseException` ⟶ `Exception` for easier catching (python#53)
  - Python: add `API_VERSION` tuple used by generated modules to check their compatibility with the runtime library (python#49)
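
An `API_VERSION` tuple lets a generated module compare versions as sequences of integers instead of parsing version strings. Below is a minimal Python sketch of such a check. The runtime module is replaced by a `SimpleNamespace` stand-in, and the names `REQUIRED_API` and `check_runtime` are hypothetical. The exact check emitted by the compiler may differ.

```python
from types import SimpleNamespace

# Hypothetical stand-in for the imported runtime library module
runtime = SimpleNamespace(API_VERSION=(0, 10), __version__="0.10")

# Hypothetical minimum API version required by a generated parser
REQUIRED_API = (0, 9)

def check_runtime(mod, required=REQUIRED_API):
    # A runtime without API_VERSION is treated as older than any requirement
    found = getattr(mod, "API_VERSION", (0, 0))
    # Tuples compare element by element as integers, so (0, 10) > (0, 9)
    if found < required:
        raise Exception(
            "Incompatible runtime: API %s or later is required, but found %s"
            % (".".join(map(str, required)), ".".join(map(str, found)))
        )
    return found

print(check_runtime(runtime))  # (0, 10)
```

Comparing the version strings directly would be misleading here. Lexicographically, `"0.10" < "0.9"` is true, whereas the tuple `(0, 10)` correctly sorts after `(0, 9)`.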
- Notable improvements:
  - Make methods `read_bits_int_{be,le}` for reading bit integers reliable (fix all bugs) and faster (#949)
  - No longer preallocate arrays to the capacity of `repeat-expr` entries, which could cause excessive memory allocations on invalid files (f5fe28e)
  - Fix `valid` (and `contents`) on unnamed `seq` fields (for `contents`, this was a 0.9 regression: #825)
  - Construct: add support for enums
  - Go: implement `encoding: UTF-16{BE,LE}`
  - Go, Lua: implement `valid/expr` (#435)
  - Java: fix broken parse `instances` on Java 7 and 8 when using the prebuilt `io.kaitai:kaitai-struct-runtime:0.9` from Maven Central (java#34)
  - Java: fix `terminator` values in the range `0x80` to `0xff` (java#35)
  - Lua: map 1-bit `type: b1` to boolean to match Kaitai Struct design (see docs)
  - Lua: fix undecided calculated endianness incorrectly treated as big-endian
  - Lua: implement `process: zlib` (see the Installation section of the Lua runtime for how to enable `zlib` support)
  - Nim: fix `encoding: ASCII` on Windows (#960)
  - Perl: fix array literals, implement all byte array operations, `substring` and `str.to_i(2)` methods
  - PHP: support PHP 8 (php#8)
  - Python: generated parsers no longer import `pkg_resources`, which caused performance and usability issues (#804)
    - the runtime library API version check now compares tuples instead
  - Python: `read_bytes` checks if a large read request (8 MiB or more) can be satisfied, even before any bytes are read (python#61)
  - Ruby: validation error messages now display byte arrays as hex dumps, similar to Java (ruby#4)
  - (Java - already in 0.9), Lua, PHP: fix translation of unsigned 64-bit integer literals, i.e. from `2**63 = 0x8000_0000_0000_0000` to `2**64 - 1 = 0xffff_ffff_ffff_ffff` (fd7f308, Lua: #837)
    - these languages don't have actual 64-bit unsigned integers, but they do have 64-bit signed integers, so such values will be negative, but all 64 bits of precision are preserved
  - Fix translation of integer `-2**63 = -0x8000_0000_0000_0000` (e33828a)
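
This sketch illustrates what "the result will be negative, but all 64 bits of precision will be preserved" means. A literal from `2**63` to `2**64 - 1`, stored in a signed 64-bit integer, keeps its bit pattern and only its interpretation changes. Python integers are unbounded, so the code performs the two's-complement wrap-around explicitly. The helper names are hypothetical.

```python
import struct

def as_signed64(value: int) -> int:
    """Reinterpret an unsigned 64-bit value as a two's-complement signed one."""
    if not 0 <= value < 1 << 64:
        raise ValueError("value does not fit in 64 bits")
    return value - (1 << 64) if value >= 1 << 63 else value

def as_unsigned64(value: int) -> int:
    """Recover the original unsigned bit pattern from the signed value."""
    return value & 0xFFFF_FFFF_FFFF_FFFF

for literal in (2**63, 0xFFFF_FFFF_FFFF_FFFF):
    signed = as_signed64(literal)
    # Same result as packing the bits as u8 ('<Q') and unpacking as s8 ('<q')
    assert signed == struct.unpack("<q", struct.pack("<Q", literal))[0]
    # No bits are lost: masking gives back the original literal
    assert as_unsigned64(signed) == literal
    print(hex(literal), "->", signed)
```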
- Generated code style improvements:
  - Go: change header comment to match Go conventions for generated sources (#847)
  - Lua: fix broken indentation after a `repeat: until` field
  - Python: simpler `return` statements in instance getters
- Infrastructure updates:
  - Bintray was sunset on 2021-05-02: move stable compiler artifacts to GitHub Releases in the kaitai_struct_compiler repo
  - Web IDE: improve error reporting (no more useless stack traces)
  - https://formats.kaitai.io/: add pointers to runtime installation (#571)
  - https://ci.kaitai.io/: group columns by language for better usability (#823)
- New targets support:
  - Python with Construct library
  - HTML - intended for documentation, preliminary support
  - Nim - entry-level support (51% tests pass score)
- New KSY language features:
  - `doc-ref` supports a list of references (#269)
  - `meta/tags` allows specification of multiple tags to allow better navigation in the format gallery (#572)
  - Allow accessing nested types using `::` syntax: `foo::bar` (#275)
  - Implement parsed data validations using the `valid` key (#435)
  - Implement compile-time `sizeof` and `bitsizeof` operators (#84)
    - Type-based: `sizeof<u4>`, `bitsizeof<b13>`, `sizeof<user_type>`
    - Value-based: `file_header._sizeof`, `flags._bitsizeof` (`file_header`, `flags` are fields defined in the current type)
  - Implement little-endian bit-sized integers (docs)
    - Support choosing endianness using the `le`/`be` suffix: `type: b12le`, `type: b1be`
    - Add the `meta/bit-endian` key for selecting the default bit endianness (`le`/`be`)
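
Here is a minimal Python sketch of how the two bit orders differ. It assumes bits are read from an in-memory byte buffer, starting at bit offset `bit_pos`. The function names are hypothetical. This is not the runtime's `read_bits_int_be`/`read_bits_int_le`, which read incrementally from a stream.

```python
def _check_range(data: bytes, bit_pos: int, n: int) -> None:
    if bit_pos < 0 or n < 0 or bit_pos + n > len(data) * 8:
        raise ValueError("requested bits are outside the buffer")

def read_bits_be(data: bytes, bit_pos: int, n: int) -> int:
    # Big-endian bit order: consume bits starting from the most significant
    # bit of each byte; the first bit read ends up most significant.
    _check_range(data, bit_pos, n)
    total = int.from_bytes(data, "big")
    shift = len(data) * 8 - bit_pos - n
    return (total >> shift) & ((1 << n) - 1)

def read_bits_le(data: bytes, bit_pos: int, n: int) -> int:
    # Little-endian bit order: consume bits starting from the least significant
    # bit of each byte; the first bit read ends up least significant.
    _check_range(data, bit_pos, n)
    total = int.from_bytes(data, "little")
    return (total >> bit_pos) & ((1 << n) - 1)

data = bytes([0xAB, 0xCD])
print(hex(read_bits_be(data, 0, 12)))  # 0xabc
print(hex(read_bits_le(data, 0, 12)))  # 0xdab
```

Reading the whole buffer as one integer keeps the sketch short. A streaming reader would instead keep a small bit buffer and pull in one byte at a time. Consult the bit-endianness documentation for the authoritative definition of `b12be`/`b12le`.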
- Expression language:
- General compilation improvements:
  - Support Maven-like directory trees by no longer adding the subdir `src` for Go and Java outputs (see #287). While this will most likely break existing builds, it puts those languages in line with all others, and adding subdirs is easier for the user than removing ones added by Kaitai automatically.
  - Better error messages (#488)
  - Support for .ksy files with UTF-8 BOM (#499)
  - Error messages are routed to stderr rather than stdout (#509)
  - `--debug` mode split into `--no-auto-read` and `--read-pos` (#332)
  - C++: add C++11 mode
    - Add `--cpp-standard` CLI option: pass `--cpp-standard 11` to enable C++11 mode (`98` is the default)
    - C++11 target:
      - uses `#pragma once` (instead of `#ifndef FOO_H_` header guards)
      - uses `std::unique_ptr<foo>` for owning pointers, raw pointers `foo*` for non-owning ones
      - supports array literals
  - `--no-auto-read` implemented for C++
  - C++: official Windows and Visual C++ support
  - Fix case conversions to be locale-independent (#708)
- Runtime API changes:
  - Add exceptions `Validation{Not{Equal,AnyOf},{Less,Greater}Than,Expr}Error` inheriting from the common ancestor `ValidationFailedError`, thrown on failed validations defined with the `valid` or `contents` key (#435)
  - Add method `read_bits_int_le` for parsing little-endian bit-sized integers (docs)
  - Deprecated classes and methods:
    - `ensure_fixed_contents` ⟶ explicit `if` that asserts `readBytes(n)` to be equal to the expected `n`-byte array (throwing `ValidationNotEqualError` if it fails)
    - `UnexpectedDataError` ⟶ `ValidationNotEqualError`
    - `read_bits_int` ⟶ `read_bits_int_be`
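
This Python sketch, written against `io.BytesIO`, shows the replacement pattern for the deprecated `ensure_fixed_contents`. It reads `n` bytes, compares them explicitly and raises a validation error on mismatch. The `ValidationNotEqualError` class and `read_magic` helper below are local stand-ins defined only for this example. They are not the runtime library's classes, whose constructor signatures may differ.

```python
import io

class ValidationNotEqualError(Exception):
    # Local stand-in for the runtime's exception of the same name
    def __init__(self, expected: bytes, actual: bytes):
        super().__init__(f"expected {expected.hex()}, got {actual.hex()}")
        self.expected = expected
        self.actual = actual

def read_magic(stream: io.BytesIO, expected: bytes) -> bytes:
    # Explicit check in place of ensure_fixed_contents(expected)
    actual = stream.read(len(expected))
    if actual != expected:
        raise ValidationNotEqualError(expected, actual)
    return actual

png_signature = b"\x89PNG\r\n\x1a\n"
print(read_magic(io.BytesIO(png_signature + b"rest"), png_signature))
```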
- Major bugfixes:
  - `params/type`: add support for:
    - specific user types
    - `enum` types (#413)
    - byte arrays (`bytes`)
    - arrays (`u2[]`, `struct[]`, etc.)
  - `enum`: a value not defined in the enum list never crashes a parser (#523 for Python, #300 for Java)
  - Fix coercing different string/bytearray/enum/boolean types (e.g. parsed from stream and created from literal value) in the conditional operator (`? :`) or in array literals
  - Substring `not` could not be used in expressions (#556)
  - Bit-sized integers were not accounted for properly in `repeat: eos` (#548)
  - Fix switching with an else case (`_: foo`) only (#595)
  - C++: fix all known memory leaks
  - C++: fix absolute imports (#794)
  - Java: more consistent closure of underlying IO streams on forced `close()` (#497)
  - Java: fix reading user types in type-switching in `--no-auto-read` mode (#204)
  - Python: work around generation of circular dependencies
  - PHP: fix invalid `namespace` declarations when no `--php-namespace` is specified (#637)
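
With this fix, a value missing from the enum list is kept as a raw integer instead of raising an error. The sketch below shows that fallback in Python using the standard `enum` module. `Color` and `resolve_enum` are hypothetical names, and generated parsers may implement the lookup differently.

```python
from enum import Enum

class Color(Enum):  # hypothetical enum for illustration
    red = 1
    green = 2

def resolve_enum(enum_cls, value: int):
    # Return the enum member if the value is defined, otherwise keep the raw integer
    try:
        return enum_cls(value)
    except ValueError:
        return value

print(resolve_enum(Color, 2))
print(resolve_enum(Color, 42))  # 42
```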
- Tooling around the compiler updates:
  - Kaitai Struct compiler available as Maven plugin and as Gradle plugin
- Infrastructure updates:
  - Unstable binary builds are available for all platforms after every CI build at Bintray (#63)
  - KSY language reference replaced with documentation generated from JSON schema
  - https://formats.kaitai.io/ is rebuilt automatically with CI/CD
  - Brand new modular CI/CD system for the compiler: CI-agnostic underneath, running on multiple OSes in parallel (Linux, Windows, macOS), with status shown at https://ci.kaitai.io/
  - Generate test assertion specs from language-agnostic KST specs
- New target languages:
  - Lua (96% tests pass score)
  - Initial support for Go (15% tests pass score)
- New ksy features:
  - Switchable default endianness: `meta/endian` can now contain a switch-like structure (with `switch-on` and `cases`), akin to switchable types (docs).
  - Parametric user-defined types: one can use `type: my_type(arg1, arg2, arg3)` to pass arguments into a user type (docs).
  - Custom processing types: one can use `process: my_process_name(arg1, arg2, arg3)` to invoke a custom processing routine implemented in an imperative language (docs).
  - In repetitions, the index of the current repetition can be accessed using `_index` in expressions (docs).
  - Verbose enums: one can now specify documentation and other useful information relevant to enums using the verbose enum declaration format (docs).
  - The `meta/xref` key can be used for adding cross-references to format specifications (like relevant RFC entries, Wikidata entries, ISO / IEEE / JIS / DIN / GOST standard numbers, PRONOM identifiers, etc).
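
Formats like TIFF are a typical use case for switchable default endianness: a header value selects the byte order for everything that follows. The sketch below expresses the idea in Python with the standard `struct` module. The `II`/`MM` markers and the 42/offset layout mirror the TIFF header. The function `parse_header` is hypothetical and is not generated code.

```python
import struct

def parse_header(data: bytes):
    # The first two bytes select the default endianness for the rest,
    # similar in spirit to meta/endian with switch-on and cases.
    marker = data[:2]
    if marker == b"II":
        fmt = "<"  # little-endian
    elif marker == b"MM":
        fmt = ">"  # big-endian
    else:
        raise ValueError(f"unknown byte order marker {marker!r}")
    # u2 magic followed by u4 offset, both in the selected byte order
    magic, offset = struct.unpack(fmt + "HI", data[2:8])
    return magic, offset

print(parse_header(b"II\x2a\x00\x08\x00\x00\x00"))  # (42, 8)
```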
- General compilation improvements:
  - Imports/includes for all languages are now managed properly: no duplicate or unnecessary imports should be added
  - Python: basic docstring support
  - Stricter ksy precompile checks (less likely to accept a ksy that will result in non-compilable code), better error messages
  - CLI options:
    - Python target now allows specifying the package with `--python-package`
    - Java target now allows custom KaitaiStream implementations, and thus allows specifying the default implementation for `fromFile(...)` using `--java-from-file-class`.
- Expression language:
  - New methods:
    - floats: `to_i`
    - arrays: `min`, `max`
  - Added byte array comparison
- Packaging / infrastructure improvements:
  - ksc is now available as an npm package, which is now a build dependency of the Web IDE
- Runtime API changes:
  - C++: now requires `KS_STR_ENCODING_ICONV` or `KS_STR_ENCODING_NONE` to be defined to specify how to handle string encodings
  - Java: `KaitaiStream` is now an interface, and there are two distinct classes which implement it:
    - `ByteBufferKaitaiStream` provides a KaitaiStream backed by `ByteBuffer` (and thus uses memory-mapped files)
    - `RandomAccessFileKaitaiStream` provides a KaitaiStream backed by `RandomAccessFile` (and thus uses normal OS read calls, as was done in older Kaitai Struct circa v0.5)
  - JavaScript: Error classes are now nested in `KaitaiStream` and were renamed as follows: `KaitaiUnexpectedDataError` -> `KaitaiStream.UnexpectedDataError`
- Major bugfixes:
  - C++: adjusted to be compatible with OS X and Windows MSVC builds
  - Fixed broken generation of byte array literals with the high (8th) bit set in some targets
  - Fixed parsing of float literals and of larger integer keys in YAML
  - Fixed inconsistency of debug mode vs non-debug mode behavior for `repeat-*`
  - Fixed a bug with chains of relative imports: all relative imports now always work relative to the file being processed, not to the compiler's current directory
  - Many problems with switching: invalid common type inference, invalid generated code; added failsafe `if`-based implementations for languages which do not support switching over all possible types.
  - Fixed most memory leaks in C++ (only exception-related leaks are left now)
- New ksy features:
  - Type importing system: `meta/imports` can be used to import other types as first-class citizens in the current compilation unit; "opaque types" are now disabled by default (see below)
  - Byte-terminated notation (`terminator`, `include` and `consume`) can now be used not only for strings, but also for any byte types and user types
  - `pad-right` to declare excess right padding to be removed (usually with 0s)
  - User types can now use `parent: expression` to enforce a specific parent for an object, or `parent: false` to disable parenting at all (and, subsequently, remove it from the parent type inference process)
  - Type inference: value instances are now allowed to use `_parent`
  - `doc-ref` to add references to external documentation for types / attributes
- Improved compilation process:
  - Compilation is now clearly separated into 3 phases: YAML parsing, precompilation, compilation. Phases 1 and 2 are language-agnostic, and precompilation now performs all possible sanity checks up front, making sure that language-specific compilation doesn't have to deal with invalid data.
  - Improved reporting of compilation results: all error messages reported by the compiler now have a file / code location and proper user-readable text. Added more than 50 tests for erroneous input files. Exceptions thrown directly are considered a compiler bug from now on.
  - Generated code now checks for runtime library version compatibility and fails to compile / run with an incompatible runtime
- Command-line compiler options:
  - `--opaque-types=true` to enable opaque types (disabled by default, i.e. using an unknown type is treated as an error)
  - `--verbose` now allows fine-tuned verbose logging for the compiler's various subsystems; using `--verbose=all` exposes a lot of internal logic.
  - `--ksc-json-output` to dump compilation results in machine-readable JSON format (simplifies ksc integration into other tools, like visualizers)
- Console visualizer: faster loading, automatic handling of imports (no more need to specify all .ksy files manually on invocation)
- Expression language:
  - Two string types: single quotes (verbatim), double quotes (interpolating escape characters)
  - New type casting operator: `.as<foo>`
  - New methods:
    - arrays: `size`
    - booleans: `to_i`
    - byte arrays: `to_s(encoding)`
    - enums: `to_i`
    - strings: `reverse`
- Runtime API changes:
  - All bytearray-to-string functions are named `bytes_to_str` in all languages
  - Added `read_bytes_term` (akin to what `read_str_term` previously did for strings)
  - Removed `read_str_*` methods; they are now to be replaced with a combination of `read_bytes_*` + `bytes_to_str`
  - Added `bytes_strip_right` and `bytes_terminate`
  - Perl module now uses the `IO::KaitaiStruct` package name (instead of `Kaitai`)
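
The byte-array helpers listed above can be sketched in Python as follows. The parameter names and semantics are inferred from the function names and from the `terminator`/`include`/`consume` keys. Treat the signatures as assumptions, not as the runtime libraries' exact API.

```python
import io

def bytes_strip_right(data: bytes, pad_byte: int) -> bytes:
    # Drop trailing padding bytes (e.g. 0x00 used as filler)
    return data.rstrip(bytes([pad_byte]))

def bytes_terminate(data: bytes, term: int, include_term: bool) -> bytes:
    # Cut the array at the first terminator byte, optionally keeping it
    idx = data.find(bytes([term]))
    if idx < 0:
        return data
    return data[: idx + 1] if include_term else data[:idx]

def read_bytes_term(stream: io.BytesIO, term: int,
                    include_term: bool, consume_term: bool) -> bytes:
    # Read byte by byte until the terminator is found.
    # include_term: keep the terminator in the result
    # consume_term: leave the stream positioned after the terminator
    out = bytearray()
    while True:
        c = stream.read(1)
        if not c:
            raise EOFError("end of stream reached before the terminator")
        if c[0] == term:
            if include_term:
                out += c
            if not consume_term:
                stream.seek(stream.tell() - 1)
            return bytes(out)
        out += c

s = io.BytesIO(b"name\x00rest")
print(read_bytes_term(s, 0, include_term=False, consume_term=True), s.tell())  # b'name' 5
```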
- Major bugfixes:
  - Recursive top-level types
  - Unaligned bits reading with enums on top of bit-level integers
  - `repeat-until` handling with substreams
- Unaligned bit parsing support
  - Use `type: b12` to parse 12 bits as an integer from a stream (obviously, one can use `b1`, `b2`, `b3`, etc.)
  - `b1` is parsed as a boolean value
  - If several `bXX` are chained in a sequence, they can be used to parse bit masks/fields
  - Using regular types (i.e. `u1`, `s4`, `str`, etc.) resumes normal parsing, aligning to the next byte
- More meta information, documentation and non-standard keys usage:
  - `doc` for docstrings is allowed on type level
  - `meta` can now include:
    - `title` (to give a proper full title for the type)
    - `license` (to specify the work's licensing)
    - `ks-version` (to specify the minimal version of the Kaitai Struct compiler that must be used to process a .ksy, e.g. `0.6`)
    - `ks-debug` (to enforce generation of classes as if `--debug` mode was specified on the command line)
  - `meta` is non-global now: it can be used on multiple levels and is inherited from the closest one
  - Non-(yet)-standard keys can now be used everywhere using the `-key` syntax: for example, the Web IDE uses the `-webide-representation` key, which is ignored by the compiler but useful for clearer debugging
- Enums are proper first-class citizens now: `enum: XXX` specifications are not just strings, but proper references to declared enums; thus they're checked for validity, can reference upper-level nested enums from lower levels, etc. This fixes the majority of existing enum namespacing problems in JavaScript, Python, PHP and Perl
- `id` in `seq` elements is now optional: this can be useful for quick exploratory mapping (one can always assign identifiers later), or for unused ("reserved for later use") attributes; such attributes are assigned numbered IDs automatically
- Allow value instances to use `if` and `enum`
- Proper support for "opaque" external types: one can use an undeclared data type; it's expected to be declared in some other .ksy file and will be properly imported/included in the current file
- Expression language:
  - Support for integer literals with underscores for readability: one can now use literals like `123_456_789` or `0b0101_0011`
  - `to_s` method for integer types to convert them to strings
- Language-specific improvements:
  - C++: clearly separated "null" (no result, for example, due to a failed `if` condition) and "not yet calculated" results; introduced the `_is_null_XXX()` method to check for a true null result in the generated API
  - JavaScript: generated enums can be queried for both ID => name and name => ID
  - PHP: dropped type generation for now due to nullable types; types might return one day, strictly for PHP 7.1+
  - GraphViz: major compatibility fixes, diagram readability improvements, support for switch types
- Runtime API changes:
  - `ensure_fixed_contents` no longer requires both the expected byte array and its length; only the array is required
  - Java: no methods use checked exceptions (i.e. `IOException`) anymore
- Bugfixes:
  - Type derivation of parent types when using switched `type`, array types, and type combining on switching / ternary operators
  - Multiple translator fixes: type derivation, parenthesis generation
  - Assorted code generation bugfixes in C++, Python, Ruby
- Refactorings and optimizations:
  - Type derivation engine
  - Parse instances apply conditionals, debug and IO management in a more optimal order
  - Improved error messages
- Target languages support:
  - C++/STL - fully supported, all tests pass
  - Python - made compiled code and runtime compatible with both Python 2 and 3, enforced by CI
  - PHP7 - new target language, 98% supported
  - Perl - new target language, 85% supported
  - Graphviz - allows generation of visual diagrams of data formats, to be laid out with GraphViz (`.dot` format)
- New KSY language features:
  - Switch-like conditional structure to determine `type` based on the value of an expression (instead of tons of `if`s)
  - Attribute field `doc` to annotate fields: generates docstrings appropriate to the language (e.g. JavaDoc, JSDoc, YARD/RDoc, etc.)
  - `repeat-until` allows repetition of a field until a condition is met
  - Boolean type support
- Expression language:
  - `_io.eof` returns a boolean value: whether the end of the stream was reached or not
  - `_io.pos` returns the current position in the stream
  - `_io.size` returns the size of the stream
- .ksy parsing improvements:
  - New unified type derivation engine allows compile-time type error checks and full support for target languages which require absolute type designations (like C++, Python, Perl or PHP)
  - Same YAML parsing code is now used for both JVM and JS platforms
  - Stricter checks on all parsing stages: lots of invalid combinations are now prohibited (instead of choosing one of the variants)
  - Better error messages: now in most cases the compiler clearly indicates the source of the problem
- Build and release process:
  - Compiler: added building as pom module
  - Java runtime: added building as pom module
  - Python runtime: added building as pip module
  - Windows CI: all commits are now also built on Windows, with an .msi package available for download
- Debug mode:
  - Support implemented for Java and JavaScript (to allow creation of visualizer tools in these languages, see Java GUI for Kaitai Struct and Web IDE for Kaitai Struct)
  - Added generation of a `SEQ_FIELDS` helper const array that allows clear separation of sequence attributes vs instances without guesswork
  - An exception in debug mode now tries to save as much parsed data as possible (to aid in diagnosing the error)
- Incompatible changes:
  - Identifiers are now strictly checked to conform to the `lower_underscore_case` pattern (which is converted to the language-specific style on output)
  - Java:
    - `_parse` method renamed to `_read`
    - `process*` methods are now static
  - JavaScript: `position` in runtime is renamed to `pos` (to conform to the general KS API spec)
  - Compiler API: all compilers now accept a unified `RuntimeConfig` for configuration instead of individual options
- Bugfixes:
  - Java:
    - having `if` on a sequence attribute now makes it automatically boxed (to allow it to be `null`)
    - work around some `int` vs `long` incompatibilities
    - proper boxing of floating-point types
  - Integer modulo (`%`) operation now behaves exactly the same in all languages, always returning a non-negative result (as opposed to the remainder operation `%` in languages like C++ or Java)
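
The modulo and remainder operations differ only for negative operands. The Python sketch below contrasts a C++/Java-style truncated remainder with a modulo that stays non-negative for a positive divisor. Python's own `%` already behaves like the latter, so it works as a cross-check. The function names are hypothetical.

```python
def truncated_remainder(a: int, b: int) -> int:
    # Remainder as computed by % in C++ or Java: the sign follows the dividend
    q = abs(a) // abs(b)
    if (a < 0) != (b < 0):
        q = -q
    return a - b * q

def ks_mod(a: int, b: int) -> int:
    # Modulo that is never negative for a positive divisor
    r = truncated_remainder(a, b)
    return r + b if r < 0 else r

print(truncated_remainder(-7, 3), ks_mod(-7, 3))  # -1 2
```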
- Languages support:
  - New target language, fully supported: C# (modules should be usable all across the .NET platform, e.g. from C++/CLI, VB.NET, F#, etc.)
  - Preliminary support for C++ (with STL containers / IO implementation); note that not all features are implemented.
- Data types:
  - Floating point data types support (available as `f4` and `f8` for single and double precision IEEE 754 floats)
  - Separate data type for byte arrays (including support for literal byte arrays)
- Expressions language:
  - Added new testing framework for expression translators
  - Added `.first` and `.last` for arrays (getting the first and last element of an array)
  - Added `.to_i` for strings (string -> int conversion)
  - Support for accessing the `_io` object (IO stream) to access the current stream's size (`_io.size`)
- Processing: extended "xor" processing to support XORing with multi-byte keys
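
XORing with a multi-byte key means each data byte is combined with the key byte at the same position, cycling through the key. A minimal Python sketch of that idea follows. The function name is illustrative only.

```python
def process_xor_many(data: bytes, key: bytes) -> bytes:
    # XOR each data byte with the key byte at the same position,
    # wrapping around to the start of the key as needed
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

encrypted = process_xor_many(b"hello", b"\x01\x02")
# XOR is its own inverse, so applying the same key again restores the input
assert process_xor_many(encrypted, b"\x01\x02") == b"hello"
```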
- Runtime libraries:
  - Lots of cleanup: all libraries now try to follow the same strict standard (method naming, parameters, order of methods, etc).
  - JavaScript: implemented full streaming API (both signed & unsigned integers, ensuring fixed contents fields, approximated 64-bit integers, etc).
  - New process: "ror/rol" (for simple circular bit shifts)
  - Ruby: runtime classes reside in a proper namespace: `Kaitai::Struct::Struct` and `Kaitai::Struct::Stream`, instead of just `KaitaiStruct` and `KaitaiStream`
- Scala.js build: fully implemented; the compiler can now be called on a web page as a JavaScript library
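
The "ror/rol" process performs a circular bit shift. This Python sketch assumes each byte is rotated independently within its 8 bits. The function names are illustrative only.

```python
def process_rotate_left(data: bytes, amount: int) -> bytes:
    # Rotate every byte left by `amount` bits; bits shifted out on the left
    # re-enter on the right
    amount %= 8
    return bytes(((b << amount) | (b >> (8 - amount))) & 0xFF for b in data)

def process_rotate_right(data: bytes, amount: int) -> bytes:
    # A right rotation by n is the same as a left rotation by 8 - n
    return process_rotate_left(data, 8 - amount % 8)

print(process_rotate_left(b"\x81", 1))  # b'\x03'
```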
- Implemented `process:` for pre-processing the input buffer of user types
- Translator: allow coercing different int types into each other
- General code cleanup
- Initial public release