TagDataTranslation Library - Claude Code Context

Project Overview

TagDataTranslation is a C# library implementing GS1's Tag Data Translation (TDT) specification for encoding and decoding EPC (Electronic Product Code) identifiers used in RFID tags.

Supported Standards

  • TDT 2.2 - Full support for all standard EPC schemes
  • TDS 2.3 - Support for '+' and '++' schemes with hostname encoding
  • Standards in markdown are in the parent Mimasu repo: docs/standards/md/gs1/tdt/ and docs/standards/md/gs1/tds/

Tech Stack

  • .NET 8.0 / 9.0 / 10.0 (multi-targeting)
  • xUnit for testing
  • JSON-based scheme definitions (in src/TagDataTranslation/Schemes2/ folder)

Key Components

TDTEngine.cs

The main translation engine. Key methods:

  • Translate(input, inputFormat, parameterList) - Main translation method
  • ProcessInput() - Parses input and extracts fields
  • ProcessOutput() - Formats output to requested level

Scheme Files (src/TagDataTranslation/Schemes2/*.json)

JSON definitions for each EPC scheme containing:

  • Level definitions (BINARY, BARE_IDENTIFIER, GS1_DIGITAL_LINK, etc.)
  • Field patterns and extraction rules
  • Encoding/decoding rules

src/TagDataTranslation/Encoding/ folder

Helper classes for specific encoding methods:

  • HostnameEncoder.cs - TDS 2.3 hostname encoding with optimizations

TDS 2.3 Implementation Details

'+' Schemes (Plus schemes)

Single-plus schemes like SGTIN+, SSCC+, etc. that:

  • Use variable-length serial encoding
  • Support GS1 Digital Link URIs; output URIs use the id.gs1.org hostname
  • Do NOT encode custom hostnames in binary, so an input hostname is not preserved

'++' Schemes (Plus-plus schemes)

Note: The '++' scheme JSON files are custom implementations created by Claude, not official GS1 scheme definitions. They are based on the TDS 2.3 specification but the JSON schema files themselves are not from GS1.

Double-plus schemes like SGTIN++, SSCC++, etc. that:

  • Include all features of '+' schemes
  • Additionally encode a custom hostname in binary
  • Support branded GS1 Digital Link URIs (e.g., https://coca-cola.com/01/...)

Hostname Encoding (Section 14.5.16)

Two methods supported:

  1. Code 40 (indicator bit 0): For uppercase-only hostnames

    • 16 bits per 3 characters
    • Character set: 0-9, A-Z, -, .
  2. 7-bit ASCII with optimizations (indicator bit 1): For mixed-case hostnames

    • Uses optimization tables for common TLDs and subdomains
    • .com, .org, .net etc. encoded as single 7-bit sequence
    • id., www., qr. encoded as single 7-bit sequence
    • Country TLDs encoded as 14-bit sequences

Important: The hostname length field indicates number of 7-bit sequences, NOT number of output characters.
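To make the distinction between sequences and characters concrete, here is a minimal sketch that counts 7-bit sequences for a hostname. It is not the logic in HostnameEncoder.cs: the class name HostnameSequenceSketch is hypothetical, the tables contain only the entries named above, the country-TLD list is supplied by the caller, and the matching rules (prefix only at the start, TLD only at the end) are assumptions to check against TDS 2.3 Section 14.5.16.

```csharp
using System;

// Sketch only: tables are limited to the entries named in the text above.
public static class HostnameSequenceSketch
{
    // Subdomain prefixes and TLD suffixes that collapse to one 7-bit sequence.
    private static readonly string[] SinglePrefixes = { "id.", "www.", "qr." };
    private static readonly string[] SingleSuffixes = { ".com", ".org", ".net" };

    // Counts 7-bit sequences. countryTlds (for example ".de") is caller-supplied;
    // each matching country TLD costs two sequences (14 bits).
    public static int CountSequences(string hostname, params string[] countryTlds)
    {
        int sequences = 0;
        string rest = hostname;

        // Assumption: an optimized subdomain is only recognized at the start.
        foreach (string prefix in SinglePrefixes)
        {
            if (rest.StartsWith(prefix, StringComparison.Ordinal))
            {
                sequences += 1;
                rest = rest.Substring(prefix.Length);
                break;
            }
        }

        // Assumption: an optimized TLD is only recognized at the end.
        bool suffixFound = false;
        foreach (string suffix in SingleSuffixes)
        {
            if (rest.EndsWith(suffix, StringComparison.Ordinal))
            {
                sequences += 1;
                rest = rest.Substring(0, rest.Length - suffix.Length);
                suffixFound = true;
                break;
            }
        }

        if (!suffixFound)
        {
            int dot = rest.LastIndexOf('.');
            if (dot >= 0 && Array.IndexOf(countryTlds, rest.Substring(dot)) >= 0)
            {
                sequences += 2;
                rest = rest.Substring(0, dot);
            }
        }

        // Every remaining character costs one 7-bit sequence.
        return sequences + rest.Length;
    }
}
```

Under these assumptions, CountSequences("id.example.com") yields 9 (1 for id., 7 for example, 1 for .com) although the hostname has 14 characters, so the length field would hold 9, not 14.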

Variable-Length Alphanumeric Encoding (Section 14.5.6)

Format: 3-bit encoding indicator + 5-bit length + variable data

| Indicator | Method | Bits per char |
|---|---|---|
| 0 | Numeric | ~3.32 bits/digit |
| 1 | Upper hex | 4 bits |
| 2 | Lower hex | 4 bits |
| 3 | Base64 URI-safe | 6 bits |
| 4 | 7-bit ASCII | 7 bits |
| 5 | URN Code 40 | ~5.33 bits |
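The per-character costs in the table can be turned into a rough size estimate. The sketch below is illustrative only: VariableLengthCostSketch is a hypothetical name, and the rounding rules (ceiling for numeric, whole 16-bit triples for Code 40) are assumptions derived from the figures above, not taken from the specification text.

```csharp
using System;

public static class VariableLengthCostSketch
{
    // Header: 3-bit encoding indicator + 5-bit length.
    public const int HeaderBits = 3 + 5;

    // Approximate data bits for `length` characters under an encoding
    // indicator (0-5), following the table above.
    public static int DataBits(int indicator, int length) => indicator switch
    {
        0 => (int)Math.Ceiling(length * Math.Log2(10)), // numeric, ~3.32 bits/digit
        1 or 2 => length * 4,                           // upper / lower hex
        3 => length * 6,                                // Base64 URI-safe
        4 => length * 7,                                // 7-bit ASCII
        5 => 16 * ((length + 2) / 3),                   // Code 40: 16 bits per 3 chars
        _ => throw new ArgumentOutOfRangeException(nameof(indicator)),
    };

    // Total bits including the 8-bit header.
    public static int TotalBits(int indicator, int length) =>
        HeaderBits + DataBits(indicator, length);
}
```

For a six-character serial this gives 20 data bits as numeric, 36 as Base64 URI-safe, and 42 as 7-bit ASCII, which shows why the choice of indicator matters for tag memory.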

'++' Scheme BINARY Pattern

All '++' schemes require a trailing ([01]*) capture group in their BINARY pattern to capture the variable-length serial and hostname data that follow the fixed fields.

Example for SGTIN++:

"pattern": "^11111101([01])([01]{3})([01]{56})([01]*)"

'++' Scheme Structure

Most '++' schemes follow this structure:

  • Header (8 bits) - unique per scheme
  • DataToggle (1 bit) - +AIDC data indicator
  • Filter (3 bits)
  • Fixed fields (scheme-specific, BCD encoded)
  • Serial (variable-length alphanumeric)
  • Hostname (1-bit encoding indicator + 6-bit length + data)
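As an illustration of how the fixed part of this layout lines up with the SGTIN++ BINARY pattern, the following sketch splits a bit string into its fixed fields and the variable-length remainder. It is not engine code: SgtinPlusPlusBinarySketch is a hypothetical name, and reading the 56-bit group as a 14-digit BCD GTIN is an assumption based on the "BCD encoded" note above.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class SgtinPlusPlusBinarySketch
{
    // Same pattern as the SGTIN++ example: header, data toggle, filter,
    // 56 fixed bits, then everything else.
    private static readonly Regex Pattern = new Regex(
        "^11111101([01])([01]{3})([01]{56})([01]*)", RegexOptions.Compiled);

    public static (bool DataToggle, int Filter, string Gtin, string Remainder) Split(string bits)
    {
        Match m = Pattern.Match(bits);
        if (!m.Success)
            throw new FormatException("Not an SGTIN++ bit string.");

        bool dataToggle = m.Groups[1].Value == "1";
        int filter = Convert.ToInt32(m.Groups[2].Value, 2);

        // Assumption: 56 bits = 14 BCD digits, 4 bits each.
        string gtinBits = m.Groups[3].Value;
        var gtin = new StringBuilder(14);
        for (int i = 0; i < gtinBits.Length; i += 4)
        {
            int digit = Convert.ToInt32(gtinBits.Substring(i, 4), 2);
            if (digit > 9)
                throw new FormatException("Invalid BCD digit.");
            gtin.Append((char)('0' + digit));
        }

        // Remainder holds the variable-length serial followed by the hostname.
        return (dataToggle, filter, gtin.ToString(), m.Groups[4].Value);
    }
}
```

The remainder is returned undecoded because its internal boundaries depend on the serial's encoding indicator and length, which is exactly why the pattern needs the trailing ([01]*) group.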

Known Issues

TDS 2.3 Standard Errata

See docs/TDS-2.3-Errata.md for documented errors in the TDS 2.3 specification, including:

  • SGTIN++/DSGTIN++ hostname errors in E.3
  • SSCC++/ITIP++ header errors in E.3

Note: Errors in '++' scheme JSON files are not "errata" since the JSON files are custom implementations, not from GS1. Only errors in the official TDS 2.3 specification document should be documented in the errata file.

Testing Requirements

Tests are in a private submodule. After cloning, initialize:

git submodule update --init --recursive

All tests must pass; a failing test is never acceptable. Before committing any change, run the full suite:

dotnet test test/TagDataTranslation.Tests/TagDataTranslation.Tests.csproj

Test Coverage

Use the coverage script to generate an HTML coverage report:

./scripts/coverage.sh

This runs all tests with coverlet, generates a Cobertura XML file, and produces an HTML report at coveragereport/index.html (opens automatically on macOS).

Coverage-driven test improvement process:

  1. Run ./scripts/coverage.sh to generate the report
  2. Read the Cobertura XML (test/TagDataTranslation.Tests/TestResults/*/coverage.cobertura.xml) to identify uncovered lines/branches per class
  3. Prioritize gaps by impact: internal classes with InternalsVisibleTo access can be tested directly; otherwise test through TDTEngine.Translate()
  4. Write tests targeting the uncovered lines, run the suite, re-run coverage to verify improvement

Key coverage notes:

  • InternalsVisibleTo("TagDataTranslation.Tests") is set in AssemblyInfo.cs — internal classes like EncodedAICodec, VariableLengthFieldCodec, PlusPlusFieldConverter can be tested directly
  • Generated JSON serializer code (TdtJsonContext, TableJsonContext) will always have partial coverage — this is expected
  • Table lookup classes (TableB, TableK, TableE) are exercised indirectly through scheme translations; their query methods may show low coverage if only used by specific scheme paths
  • coveragereport/ and TestResults/ are gitignored

Build Commands

# Build all targets
dotnet build src/TagDataTranslation/TagDataTranslation.csproj

# Run tests
dotnet test test/TagDataTranslation.Tests/TagDataTranslation.Tests.csproj

# Run specific test categories
dotnet test --filter "FullyQualifiedName~TDS23"
dotnet test --filter "FullyQualifiedName~TDT22"

# Run benchmarks
dotnet run -c Release --project test/TagDataTranslation.Benchmarks

# Build npm WASM package
cd npm && npm run build

# Run npm smoke test
cd npm && node test/smoke.js

npm Package (@mimasu/tdt)

Architecture

The npm package wraps the .NET library via WebAssembly. The build pipeline:

  1. dotnet publish compiles the WASM project (sdk/wasm/) targeting browser-wasm
  2. npm/scripts/build.js copies the _framework/ output to npm/dist/wasm/
  3. npm/dist/index.js loads the .NET WASM runtime and exposes createEngine()

Key Files

| Path | Description |
|---|---|
| sdk/wasm/TagDataTranslation.Wasm.csproj | WASM project (net10.0, browser-wasm) |
| sdk/wasm/JsInterop.cs | JSExport methods callable from JavaScript |
| sdk/wasm/Program.cs | Minimal entry point required by runtime |
| sdk/wasm/main.js | WASM module entry point |
| npm/package.json | npm package metadata |
| npm/dist/index.js | CJS wrapper with createEngine() |
| npm/dist/index.mjs | ESM re-export |
| npm/dist/index.d.ts | TypeScript type definitions |
| npm/scripts/build.js | Build script (WASM compile + license copy) |
| npm/test/smoke.js | Smoke test for encode/decode/tryTranslate |
| examples/NodeApp/ | Example Node.js app using the published package |

.NET 10 WASM Gotchas

  • SDK: Use Microsoft.NET.Sdk (not Microsoft.NET.Sdk.BlazorWebAssembly) for library-style WASM
  • AllowUnsafeBlocks: Required — the JSExport source generator emits unsafe code in .NET 10
  • JsonSerializerIsReflectionEnabledByDefault: Must be true — trimmed WASM disables reflection-based JSON by default, but TDTEngine uses System.Text.Json with reflection to load scheme files
  • TrimmerRootAssembly: Must include TagDataTranslation — without this, the IL trimmer strips model constructors, causing DeserializeNoConstructor errors at runtime
  • Entry point: .NET 10 requires a Program.cs with Main (even for library-style WASM)
  • Output path: .NET 10 outputs to AppBundle/_framework/ (not Blazor's wwwroot/_framework/)
  • getAssemblyExports: Returns a Promise in .NET 10 — must await it
  • Initialization order: Call dotnet.create(), then getAssemblyExports(), then runMain()

Publishing to npm

# Build WASM + copy license
cd npm && npm run build

# Set version
npm version 3.x.x --no-git-tag-version

# Publish (opens browser for auth challenge)
npm publish --tag beta --access public   # prerelease
npm publish --access public              # stable release

The build script auto-copies LICENSING.md from the repo root to npm/LICENSE.md (gitignored) so the license ships with every publish.

Package Details

  • Scope: @mimasu (public)
  • License: BSL-1.1
  • Size: ~2.6 MB compressed, ~15.7 MB unpacked (includes .NET WASM runtime)
  • Engine requirement: Node.js >= 18.0.0

File Locations

| Path | Description |
|---|---|
| src/TagDataTranslation/ | Main library |
| src/TagDataTranslation/Schemes2/ | JSON scheme definitions |
| src/TagDataTranslation/Encoding/ | Encoding helper classes |
| src/TagDataTranslation/Tables/ | Lookup tables (Table F, K, E, B) |
| test/TagDataTranslation.Tests/ | Unit tests |
| docs/ | Errata and plans |
| docs/TDS-2.3-Errata.md | Known errors in TDS 2.3 specification |
| docs/Scheme-Conversion-Errata.md | Errors found in XML to JSON conversion |

Adding New Schemes

  1. Create JSON scheme file in src/TagDataTranslation/Schemes2/
  2. Define all levels (BINARY, BARE_IDENTIFIER, etc.)
  3. Add BINARY pattern with appropriate capture groups
  4. For '++' schemes, add variableLengthField and hostnameField definitions
  5. Add tests in appropriate test file

Performance

Caching Architecture

The Translate() hot path uses several caches to avoid repeated work:

  • Regex cache: ConcurrentDictionary<string, Regex> in TDTEngine — compiled regex patterns shared across all engine instances
  • Character set regex cache: ConcurrentDictionary<string, Regex?> in RuleExecutor — caches ValidateCharacterset patterns (null = invalid pattern)
  • Grammar token cache: ConcurrentDictionary<string, GrammarToken[]> in TDTEngine — parsed grammar strings cached as token arrays
  • Pre-sorted fields/rules: Option.Field sorted by Seq at load time; Level.ExtractRules/FormatRules pre-split and sorted at load time
  • BinaryConverter lookup tables: Static arrays for hex↔binary conversion (no per-character Convert calls)
  • Static grammar regex: Single compiled Regex instance for grammar parsing

All caches are static and thread-safe. They grow monotonically (no eviction), which is acceptable because the set of patterns and grammars is bounded by the scheme definitions.
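The regex cache described above can be sketched in a few lines. This is an illustration of the pattern, not the code in TDTEngine: the class name RegexCacheSketch is hypothetical, and the use of RegexOptions.Compiled is an assumption based on the phrase "compiled regex patterns".

```csharp
using System.Collections.Concurrent;
using System.Text.RegularExpressions;

public static class RegexCacheSketch
{
    // Static, so the cache is shared across all callers; never evicted.
    private static readonly ConcurrentDictionary<string, Regex> Cache = new();

    // Returns the cached Regex for a pattern, constructing it on first use.
    public static Regex GetCachedRegex(string pattern) =>
        Cache.GetOrAdd(pattern, static p => new Regex(p, RegexOptions.Compiled));
}
```

GetOrAdd may run the factory more than once if two threads race on the same new pattern, but only one instance is stored and returned to all callers afterwards, which is harmless for immutable Regex objects.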

Benchmarks

Run with dotnet run -c Release --project test/TagDataTranslation.Benchmarks. Results on Apple M1 Pro, .NET 8.0:

| Benchmark | Mean | Allocated |
|---|---|---|
| SGTIN-96 encode | 7.82 us | 9.9 KB |
| SGTIN-96 decode | 7.65 us | 9.2 KB |
| SGTIN++ encode | 24.31 us | 75.3 KB |
| SGTIN++ decode | 5.02 us | 7.8 KB |
| HexToBinary (96-bit) | 99 ns | 480 B |
| BinaryToHex (96-bit) | 54 ns | 192 B |
| Failure (random hex) | 17.82 us | 504 B |
| Failure (random binary) | 18.27 us | 784 B |

Performance Guidelines

  • Do NOT create new Regex(pattern) in the hot path — use GetCachedRegex(pattern) in TDTEngine or the RuleExecutor charset cache
  • Do NOT use .OrderBy().ToList() on fields/rules — they are pre-sorted at load time
  • Pre-size StringBuilder when output length is predictable (e.g., hex.Length * 4 for HexToBinary)
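The last guideline, combined with the lookup-table approach mentioned under Caching Architecture, might look like the sketch below. It is a stand-in written for illustration, not the library's BinaryConverter; the class name HexToBinarySketch and the error handling are assumptions.

```csharp
using System;
using System.Text;

public static class HexToBinarySketch
{
    // Static lookup table: one 4-bit string per hex digit value.
    private static readonly string[] Nibbles =
    {
        "0000", "0001", "0010", "0011", "0100", "0101", "0110", "0111",
        "1000", "1001", "1010", "1011", "1100", "1101", "1110", "1111",
    };

    public static string HexToBinary(string hex)
    {
        // Output length is known up front, so the builder never reallocates.
        var sb = new StringBuilder(hex.Length * 4);
        foreach (char c in hex)
        {
            int value = c switch
            {
                >= '0' and <= '9' => c - '0',
                >= 'A' and <= 'F' => c - 'A' + 10,
                >= 'a' and <= 'f' => c - 'a' + 10,
                _ => throw new FormatException($"Invalid hex character '{c}'."),
            };
            sb.Append(Nibbles[value]);
        }
        return sb.ToString();
    }
}
```

The table lookup avoids a per-character Convert call, and pre-sizing to hex.Length * 4 means a 96-bit EPC (24 hex characters) is built in a single buffer.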

Debugging Tips

  • Use TryTranslateDetails() for detailed translation information
  • Binary patterns must match exactly - check bit counts
  • For '++' schemes, hostname length is in sequences, not characters
  • GS1 standards in markdown are in the parent Mimasu repo: docs/standards/md/gs1/

Important Implementation Notes

'+' Scheme JSON Files

The '+' scheme JSON files (SGTIN+.json, SSCC+.json, etc.) are from the GS1 standard and should NOT be modified. They support GS1 Digital Link URIs with ANY hostname, not just id.gs1.org.

'++' Scheme JSON Files

The '++' scheme JSON files are custom implementations and CAN be modified as needed to match the TDS 2.3 specification.

Scheme Selection Ambiguity

When GS1_DIGITAL_LINK input is provided, both '+' and '++' schemes may match the URL pattern:

  • '+' schemes match URLs with any hostname (e.g., https://id.gs1.org/01/...)
  • '++' schemes also match URLs with any hostname and capture it for encoding

The engine may select the '++' scheme due to more specific pattern matching. For '+' scheme tests:

  • Test GS1_DIGITAL_LINK as OUTPUT only (translate from BINARY/BARE_IDENTIFIER to GS1_DIGITAL_LINK)
  • Do NOT test GS1_DIGITAL_LINK as INPUT (ambiguous which scheme will be selected)

Use ExecuteTestsWithOutputOnly() helper for '+' scheme tests with GS1_DIGITAL_LINK.

Field Name Consistency

Field names MUST match across all levels of a scheme:

  • BAD: itipBinary in BINARY level, itip in BARE_IDENTIFIER level (no conversion rule)
  • GOOD: itip in both BINARY and BARE_IDENTIFIER levels

'++' Scheme Specific Notes

DSGTIN++: Requires multiple options for different date types (like DSGTIN+):

  • Option 0: prodDate (date type indicator 0000)
  • Option 4: expDate (date type indicator 0100)
  • etc.

GRAI++: GS1_DIGITAL_LINK should capture the full 14-digit grai field (not 13 digits plus a hardcoded 0):

  • Pattern: \\/8003\\/([0-9]{14})...
  • Grammar: '/8003/' grai urlEscapedSerial

GDTI++: BARE_IDENTIFIER should use ;serial= separator:

  • Pattern: ^gdti=([0-9]{13});serial=...
  • Grammar: 'gdti=' gdti ';serial=' serial ';hostname=' hostname

ITIP++: Use combined itip field (18 digits = gtin + piece + total):

  • BARE_IDENTIFIER: itip=095211411234540102;serial=rif981;hostname=...
  • GS1_DIGITAL_LINK: /8006/095211411234540102/21/rif981

License

BSL 1.1 - See LICENSING.md