Address issues parsing XSD for ORE#12

Open
mcraveiro wants to merge 18 commits into craflin:master from mcraveiro:fix-abstract-element-refs

@mcraveiro mcraveiro commented Jan 25, 2026

Summary

This PR fixes several issues that prevented xsdcpp from generating valid C++ code for complex XSD schemas like the Open Source Risk Engine (ORE) XSD files.

See #11 for details.

Bug Fixes

  • Handle abstract element references - Abstract elements without type attributes serve as placeholders for substitution groups. Previously, following a reference to such an element would fail with "Could not find 'element', 'complexType' or 'simpleType'". Now these references are properly recorded for later resolution.

  • Handle elements without type definitions - Elements like <xs:element name="Type" maxOccurs="unbounded"/> that have neither a type attribute nor an inline type definition default to xs:anyType in XSD. Previously this caused an error; such elements are now mapped to a string type.

  • Escape C++ reserved keywords - Extended the escaping list in toCppIdentifier() to include the keywords bool, true, false, int, double, float, char, void, long, short, signed, unsigned, const, volatile, static, extern, register, auto, inline, virtual, explicit, friend, typedef, typename, template, namespace, using, public, private, protected, new, delete, this, return, if, else, switch, case, default, while, do, for, break, continue, goto, try, catch, throw, sizeof, alignof, typeid, noexcept, nullptr, constexpr, decltype, static_assert, thread_local, mutable, operator, and asm, plus the common macros NULL, TRUE, and FALSE and the identifier linux (which GCC predefines as a macro in its GNU dialects). Matching identifiers get an underscore suffix.

  • Handle duplicate enum values - When XSD enum values normalize to the same C++ identifier (e.g., 30E/360.ICMA and 30E/360 ICMA both become _30E_360_ICMA), append a counter suffix to make them unique.

  • Support SimpleRefKind inheritance - Types inheriting from SimpleRefKind (typedef'd primitive types) now correctly use the xsd::base<> wrapper, similar to BaseKind and EnumKind.

  • Fix duplicate struct members - Deduplicate elements when processing <xs:choice> branches to prevent duplicate struct members when the same element appears in multiple branches.

  • Fix duplicate ElementInfo definitions - Track generated ElementInfo by C++ type name (not XSD type name) to avoid duplicates when multiple XSD types map to the same C++ type (e.g., nonNegativeInteger, positiveInteger, unsignedLong all map to uint64_t).

  • Fix vector proxy issues - Added xsd::vector<bool> specialization using char storage to avoid std::vector<bool> proxy issues where .back() returns a proxy object instead of a reference.

  • Improved error messages - Include the element name in error messages when type resolution fails.
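
The duplicate enum value fix above can be pictured with a small sketch. This is not the code from the PR: normalizeEnumValue() and uniqueEnumIdentifiers() are hypothetical names, the normalization rule is a guess that happens to reproduce the _30E_360_ICMA example, and the "_2", "_3" suffix format is an assumption.

```cpp
#include <cassert>
#include <cctype>
#include <set>
#include <string>
#include <vector>

// Hypothetical normalizer: every non-alphanumeric character becomes '_',
// and a leading '_' is added when the result would start with a digit.
std::string normalizeEnumValue(const std::string& value)
{
    std::string id;
    for (unsigned char c : value)
        id += std::isalnum(c) ? static_cast<char>(c) : '_';
    if (id.empty() || std::isdigit(static_cast<unsigned char>(id[0])))
        id.insert(id.begin(), '_');
    return id;
}

// Returns one identifier per input value. When an identifier is already
// taken, a counter suffix (assumed format: "_2", "_3", ...) is tried until
// the name is free.
std::vector<std::string> uniqueEnumIdentifiers(const std::vector<std::string>& values)
{
    std::set<std::string> used;
    std::vector<std::string> result;
    for (const std::string& value : values)
    {
        const std::string base = normalizeEnumValue(value);
        std::string candidate = base;
        for (int n = 2; !used.insert(candidate).second; ++n)
            candidate = base + "_" + std::to_string(n);
        result.push_back(candidate);
    }
    return result;
}
```

Calling uniqueEnumIdentifiers({"30E/360.ICMA", "30E/360 ICMA"}) keeps the first identifier as is and suffixes the second, so both enumerators can coexist in one C++ enum.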

New Features

  • Separate output directories - New -H/--header-output and -C/--cpp-output options to place header files and implementation files in separate directories.

  • Wrap namespace - New -w/--wrap-namespace option to wrap all generated code in an additional C++ namespace. Supports nested namespaces using :: syntax (e.g., -w myproject::xml).

  • Skip inner namespace - New -N/--no-inner-namespace option to place types directly in the wrap namespace without the schema-derived inner namespace.

  • GitHub Actions CI - Added .github/workflows/ci.yml for automated build and test on push/PR.

  • ORE XSD test suite - Added test/ore/ directory with 23 XSD files from the Open Source Risk Engine project to validate code generation against a real-world complex schema.

  • README documentation - Added "Command Line Options" section documenting all options with examples.

Test plan

  • All existing tests pass (XmlParser_test, Reader_test, Generator_test, Features_test, XsdLib_test, Ecic_test, Ecoa_test)
  • New Ore_test passes - validates code generation for complex ORE XSD schemas
  • CI workflow runs successfully
  • Generated code compiles with new namespace options

🤖 Generated with Claude Code

mcraveiro and others added 15 commits January 25, 2026 10:14
Helps debugging by showing which element is missing a type definition.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Abstract elements serve as placeholders for substitution groups and have
no type definition. When processing a ref to an abstract element, skip
the recursive processXsElement call and just record the reference for
later resolution by resolveElementRefs().

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Two fixes:

1. Refine abstract element handling: only skip processing for abstract
   elements that have no type attribute. Abstract elements with a type
   attribute still need processing for substitution groups to work.

2. Handle elements with no type attribute and no inline type definition.
   In XSD, these default to xs:anyType. Map this to a string element type.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Builds and tests the project on Ubuntu with CMake.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add the ORE (Open Source Risk Engine) XSD files as a test case.
These XSDs exercise complex patterns including abstract elements
without types and elements without type definitions.

Note: The generated code does not yet compile due to incomplete
type ordering issues in the generator. This will be addressed
in a follow-up commit.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The ORE XSD files expose a type ordering issue in the code generator.
Generated structs reference types that are defined later in the file,
causing incomplete type errors. This requires implementing proper
topological sorting of type definitions in Generator.cpp.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
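
For readers unfamiliar with the ordering problem, here is a minimal sketch of a depth-first topological sort over type dependencies. It is not taken from Generator.cpp: DependencyMap, visitType() and orderTypes() are hypothetical names, and the map is assumed to list, for each struct, the types it embeds by value.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical input: type name -> types it holds by value.
using DependencyMap = std::map<std::string, std::vector<std::string>>;

// Depth-first visit: a type is emitted only after everything it depends on.
static void visitType(const std::string& name, const DependencyMap& deps,
                      std::set<std::string>& done, std::set<std::string>& inProgress,
                      std::vector<std::string>& order)
{
    if (done.count(name) || inProgress.count(name))
        return; // already emitted, or part of a cycle (which would need a forward declaration)
    inProgress.insert(name);
    DependencyMap::const_iterator it = deps.find(name);
    if (it != deps.end())
        for (const std::string& dep : it->second)
            visitType(dep, deps, done, inProgress, order);
    inProgress.erase(name);
    done.insert(name);
    order.push_back(name);
}

// Returns the type names in an order where definitions precede their users.
std::vector<std::string> orderTypes(const DependencyMap& deps)
{
    std::set<std::string> done, inProgress;
    std::vector<std::string> order;
    for (const auto& entry : deps)
        visitType(entry.first, deps, done, inProgress, order);
    return order;
}
```

With Portfolio depending on Trade and Trade on Envelope, orderTypes() yields Envelope, Trade, Portfolio, which is the order in which the struct definitions would have to appear for the header to compile.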
XSD type names and enum values can conflict with C++ reserved keywords.
Extended the keyword list in toCppIdentifier() to include:
- All C++ reserved keywords (bool, true, false, class, struct, etc.)
- C++ operator keywords (new, delete, sizeof, etc.)
- Common macros (NULL, TRUE, FALSE)

These identifiers get an underscore suffix to avoid conflicts.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
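
The underscore-suffix rule can be sketched as follows. escapeReserved() is a hypothetical helper, not the actual toCppIdentifier() implementation, and the set below is only a small illustrative subset of the list in the PR description.

```cpp
#include <cassert>
#include <set>
#include <string>

// Appends '_' when the name collides with a reserved word or a common macro.
// The set is an illustrative subset, not the generator's full list.
std::string escapeReserved(const std::string& name)
{
    static const std::set<std::string> reserved = {
        "bool", "class", "delete", "false", "new", "struct", "true",
        "NULL", "TRUE", "FALSE"
    };
    return reserved.count(name) ? name + "_" : name;
}
```

Under these assumptions an XSD enum value named class would become class_, while Portfolio is returned unchanged.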
- Handle duplicate enum values by appending counter suffix
- Support inheritance from SimpleRefKind types using xsd::base<> wrapper
- Enable Ore_test now that all issues are resolved

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Deduplicate elements when same name appears in multiple choice branches
- Track generated ElementInfo by C++ name to avoid duplicates for mapped types
- Add xsd::vector<bool> specialization using char storage to avoid
  std::vector<bool> proxy issues with .back()

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
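
The proxy problem is easy to reproduce outside the generator. The sketch below is not the xsd::vector<bool> specialization from this PR; BoolVector and setTrue() are hypothetical stand-ins that only show why char storage yields a real reference where std::vector<bool> does not.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// std::vector<bool> is a packed specialization: its reference type is a proxy class.
static_assert(!std::is_same<std::vector<bool>::reference, bool&>::value,
              "vector<bool>::reference is a proxy, not bool&");
// A char-backed vector hands out ordinary references.
static_assert(std::is_same<std::vector<char>::reference, char&>::value,
              "vector<char>::reference is a plain char&");

// Hypothetical stand-in for a bool container backed by char storage.
class BoolVector
{
public:
    void push_back(bool value) { _data.push_back(value ? 1 : 0); }
    char& back() { return _data.back(); } // a real lvalue reference
    bool at(std::size_t i) const { return _data.at(i) != 0; }
    std::size_t size() const { return _data.size(); }
private:
    std::vector<char> _data;
};

// Generic code that needs an lvalue reference to the element it fills in.
// Passing std::vector<bool>::back() here does not compile, because the
// returned proxy is a temporary and cannot bind to T&.
template <typename T> void setTrue(T& slot) { slot = 1; }
```

Code written in the style of setTrue(v.back()) works with BoolVector, which is presumably the kind of call site the char-based specialization is meant to keep compiling.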
TradeType is generated as oreTradeType enum, not a string.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
New options:
- -H/--header-output: Separate directory for header files (.hpp)
- -C/--cpp-output: Separate directory for implementation files (.cpp)
- -w/--wrap-namespace: Wrap generated code in additional namespace(s)

The wrap namespace option supports C++11-compatible nested namespace
syntax (e.g., -w outer::inner generates separate namespace declarations).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
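
As an illustration of the C++11-compatible expansion, the sketch below splits a -w argument on :: and emits one namespace declaration per component. splitNamespace(), openNamespaces() and closeNamespaces() are hypothetical helpers, the exact text the generator writes is an assumption, and the argument is assumed to be non-empty.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Splits "outer::inner" into {"outer", "inner"}.
std::vector<std::string> splitNamespace(const std::string& spec)
{
    std::vector<std::string> parts;
    std::size_t start = 0;
    for (;;)
    {
        std::size_t end = spec.find("::", start);
        if (end == std::string::npos)
        {
            parts.push_back(spec.substr(start));
            return parts;
        }
        parts.push_back(spec.substr(start, end - start));
        start = end + 2;
    }
}

// Emits one "namespace x {" line per component, because the compact
// "namespace a::b {" form requires C++17.
std::string openNamespaces(const std::string& spec)
{
    std::string out;
    for (const std::string& part : splitNamespace(spec))
        out += "namespace " + part + " {\n";
    return out;
}

// Emits the matching closing braces, one per component.
std::string closeNamespaces(const std::string& spec)
{
    std::string out;
    for (std::size_t i = splitNamespace(spec).size(); i > 0; --i)
        out += "}\n";
    return out;
}
```

For -w myproject::xml this produces two nested namespace blocks around the generated code, which any C++11 compiler accepts.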
When -N/--no-inner-namespace is specified, types are placed directly
in the wrap namespace without an additional inner namespace derived
from the schema name.

Example: ores::ore::domain::Portfolio instead of
         ores::ore::domain::input::Portfolio

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add a "Command Line Options" section with a table of all options
and examples showing common usage patterns.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add symmetric save_file() and save_data() functions to complement the
existing load_file() and load_data() functionality. The generated API:

  void save_file(const std::string& file, const Type& t);
  std::string save_data(const Type& t);

Implementation includes:
- New XmlWriter class for building XML output with proper indentation
- get_string() overloads for all primitive types
- escape_xml() for XML character escaping
- XSDCPP_MAYBE_UNUSED macro for cross-platform compatibility
- _serialize_*() functions generated for all types
- Handles vectors, optionals, inheritance, and substitution groups
- Round-trip tests to verify save/load consistency

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
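
A serializer of this kind has to escape markup characters before writing text and attribute values. The following is a minimal sketch of that step; escapeXml() is a hypothetical function and may differ from the escape_xml() added in this commit, for example in which characters it escapes.

```cpp
#include <cassert>
#include <string>

// Replaces the five characters that have predefined XML entities.
std::string escapeXml(const std::string& text)
{
    std::string out;
    out.reserve(text.size());
    for (char c : text)
    {
        switch (c)
        {
        case '&': out += "&amp;"; break;
        case '<': out += "&lt;"; break;
        case '>': out += "&gt;"; break;
        case '"': out += "&quot;"; break;
        case '\'': out += "&apos;"; break;
        default: out += c; break;
        }
    }
    return out;
}
```

Escaping on save and unescaping on load is what makes a round-trip test meaningful for string fields that contain characters such as < or &.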
Command line changes:
- Add -P/--include-prefix option for #include directive prefix in
  generated .cpp files when headers are in a different directory

Documentation:
- Add -P/--include-prefix to command line options table
- Document save_file() and save_data() alongside load functions
- Fix parameter name consistency (List -> list)

Tests:
- Add WrapNamespace_test for -w option
- Add RenameNamespace_test for -w and -n options combined
- Update Generator_test for new generateCpp() signature

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
mcraveiro added a commit to OreStudio/OreStudio that referenced this pull request Jan 26, 2026

mcraveiro commented Jan 27, 2026

Issues with ORE schema

I started adding roundtrip tests and picked up on the following issues:

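  The XSD code generator (xsdcpp) has three main issues:
  ┌─────────────────────────────────┬──────────────────────────────────────────┬─────────────────────────────────┬────────────────────┐
  │ Issue                           │ XSD Pattern                              │ Expected                        │ Generated          │
  ├─────────────────────────────────┼──────────────────────────────────────────┼─────────────────────────────────┼────────────────────┤
  │ Group references not expanded   │ <xs:group ref="oreTradeData"/>           │ 80+ trade data types            │ Nothing            │
  ├─────────────────────────────────┼──────────────────────────────────────────┼─────────────────────────────────┼────────────────────┤
  │ Substitution groups not handled │ <xs:element ref="nettingSetGroup"/>      │ NettingSetId, NettingSetDetails │ Nothing            │
  ├─────────────────────────────────┼──────────────────────────────────────────┼─────────────────────────────────┼────────────────────┤
  │ Unbounded in choice broken      │ maxOccurs="unbounded" inside <xs:choice> │ xsd::vector<lgm>                │ xsd::optional<lgm> │
  └─────────────────────────────────┴──────────────────────────────────────────┴─────────────────────────────────┴────────────────────┘

  Specific Gaps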

  1. envelope.NettingSetId - Missing due to substitution group not resolved
  2. trade.{SwapData, FxForwardData, ...} - 80+ trade types missing due to group ref not expanded
  3. InterestRateModels.LGM - Should be vector, generated as optional

  Impact on Tests
  ┌────────────────┬────────┬────────────────────────────────────────────────┐
  │  Domain Type   │ Status │                 Blocking Issue                 │
  ├────────────────┼────────┼────────────────────────────────────────────────┤
  │ Portfolio      │ Fails  │ Missing NettingSetId, missing trade data types │
  ├────────────────┼────────┼────────────────────────────────────────────────┤
  │ Simulation     │ Fails  │ LGM should be vector not optional              │
  ├────────────────┼────────┼────────────────────────────────────────────────┤
  │ Conventions    │ Works  │ -                                              │
  ├────────────────┼────────┼────────────────────────────────────────────────┤
  │ CurrencyConfig │ Works  │ -                                              │
  ├────────────────┼────────┼────────────────────────────────────────────────┤
  │ TodaysMarket   │ Works  │ -                                              │
  ├────────────────┼────────┼────────────────────────────────────────────────┤
  │ PricingEngines │ Works  │ -                                              │
  ├────────────────┼────────┼────────────────────────────────────────────────┤
  │ CurveConfig    │ Works  │ -                                              │
  └────────────────┴────────┴────────────────────────────────────────────────┘
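
To make the third issue concrete, here is a rough sketch of how cardinality could decide between a plain member, xsd::optional and xsd::vector. It is not the generator's logic: Occurs, repeats() and memberType() are hypothetical, maxOccurs < 0 is used to mean "unbounded", and the function only returns the C++ type as text.

```cpp
#include <cassert>
#include <string>

// Hypothetical cardinality record; maxOccurs < 0 stands for "unbounded".
struct Occurs
{
    int minOccurs;
    int maxOccurs;
};

// True when the particle may appear more than once.
bool repeats(Occurs o) { return o.maxOccurs < 0 || o.maxOccurs > 1; }

// Picks the member type for an element. Repetition on the element or on the
// enclosing xs:choice wins; otherwise choice alternatives and minOccurs="0"
// elements become optional.
std::string memberType(const std::string& type, Occurs element, bool inChoice,
                       Occurs choice = Occurs{1, 1})
{
    if (repeats(element) || (inChoice && repeats(choice)))
        return "xsd::vector<" + type + ">";
    if (inChoice || element.minOccurs == 0)
        return "xsd::optional<" + type + ">";
    return type;
}
```

The point of the sketch is the order of the checks: if the "inside a choice, so optional" rule is applied before the repetition check, an unbounded LGM element ends up as xsd::optional<lgm>, which matches the symptom in the table.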

mcraveiro and others added 2 commits January 27, 2026 09:25
… choice

- Add xs:group definition storage and xs:group ref expansion
- Process elements with substitutionGroup attribute properly
- Fix maxOccurs="unbounded" inside xs:choice to generate vectors
- Add forward declarations for serialize functions to handle recursive types
- Change processedElements2 to dynamic vector to support types with many children

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Abstract elements without a type attribute (used as substitution group
heads) were silently dropped when referenced via xs:element ref. Two
issues caused this: the element name was never set for abstract refs,
and the caller skipped elements with empty typeName even when a refName
was pending resolution.

Also add substitution group registration for elements with inline
complex types, and improve resolveElementRefs() to remove unresolvable
refs instead of leaving typeless elements.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

craflin commented Feb 14, 2026

Many thanks for your massive contribution! I'll review and probably merge it soon.

@craflin craflin self-assigned this Feb 14, 2026
@mcraveiro

Thanks for looking into this. As far as I can see, the entire ORE schema is now parseable. My only concern at present is that we are generating massive header and implementation files, and I can't think of a good way around this. I was thinking of perhaps adding a Clang-based post-processing tool as a separate project, unless you can think of a better approach.

Cheers

… minOccurs

Three bug fixes:

1. Generator.cpp: xs:any processContents="lax" on complex types (e.g. AdditionalFields)
   was incorrectly getting SkipMode instead of SkipProcessingMode because the
   SkipProcessContentsFlag check only applied inside the StringKind branch. Moved
   the check before the kind-specific logic and set addText=nullptr for complex types
   in SkipProcessingMode.

2. XmlParser.cpp: Added CDATA section support. skipText was infinite-looping on
   '<![CDATA[' sequences because it called skipSpace which returned without advancing.
   Added CDATA skip logic with line-number tracking. Also updated stripComments to
   extract CDATA content so string fields receive the script text.
   Added null guard on addText call for xs:any complex types.

3. Reader.cpp: Two fixes:
   - When multiple anonymous attribute simpleTypes share the same generated name
     (e.g. all stFreeStyle* types produce "type_t"), merge their enum values rather
     than overwriting. This fixes type_t having only "currency" instead of all 9
     stFreeStyle enum values.
   - When a <xs:group ref="..."> expands a <xs:choice>, mark each choice element as
     minOccurs=0 (they are mutually exclusive alternatives, not individually required).
     Previously, group refs without explicit minOccurs="0" caused all choice elements
     to be marked minOccurs=1, breaking parsing of any non-first alternative.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
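
The CDATA fix in item 2 comes down to making sure the text scanner always advances past a <![CDATA[ ... ]]> section. The sketch below shows one way to do that; readText() is a hypothetical function, not the skipText or stripComments code in XmlParser.cpp, and it omits the line-number tracking mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Copies character data starting at pos, unwrapping any CDATA sections
// verbatim. Stops at the first '<' that does not start a CDATA section
// (that is, at the next tag) and leaves pos pointing at it.
std::string readText(const std::string& input, std::size_t& pos)
{
    static const std::string open = "<![CDATA[";
    static const std::string close = "]]>";
    std::string text;
    while (pos < input.size())
    {
        if (input.compare(pos, open.size(), open) == 0)
        {
            const std::size_t begin = pos + open.size();
            std::size_t end = input.find(close, begin);
            if (end == std::string::npos)
                end = input.size(); // unterminated section: take the rest
            text.append(input, begin, end - begin);
            pos = (end == input.size()) ? end : end + close.size();
        }
        else if (input[pos] == '<')
            break;
        else
            text += input[pos++];
    }
    return text;
}
```

Every branch of the loop either moves pos forward or leaves the loop, so a CDATA section can no longer cause the kind of infinite loop described in the commit message.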