TagDataTranslation Library - Claude Code Context

Project Overview

TagDataTranslation is a C# library implementing GS1's Tag Data Translation (TDT) specification for encoding and decoding EPC (Electronic Product Code) identifiers used in RFID tags.

Supported Standards

TDT 2.2 - Full support for all standard EPC schemes
TDS 2.3 - Support for '+' and '++' schemes with hostname encoding
Markdown versions of the standards are in the parent Mimasu repo: docs/standards/md/gs1/tdt/ and docs/standards/md/gs1/tds/

Tech Stack

.NET 8.0 / 9.0 / 10.0 (multi-targeting)
xUnit for testing
JSON-based scheme definitions (in src/TagDataTranslation/Schemes2/ folder)

Key Components

TDTEngine.cs

The main translation engine. Key methods:

Translate(input, inputFormat, parameterList) - Main translation method
ProcessInput() - Parses input and extracts fields
ProcessOutput() - Formats output to requested level

Scheme Files (src/TagDataTranslation/Schemes2/*.json)

JSON definitions for each EPC scheme containing:

Level definitions (BINARY, BARE_IDENTIFIER, GS1_DIGITAL_LINK, etc.)
Field patterns and extraction rules
Encoding/decoding rules

src/TagDataTranslation/Encoding/ folder

Helper classes for specific encoding methods:

HostnameEncoder.cs - TDS 2.3 hostname encoding with optimizations

TDS 2.3 Implementation Details

'+' Schemes (Plus schemes)

Single-plus schemes like SGTIN+, SSCC+, etc. that:

Use variable-length serial encoding
Support GS1 Digital Link URIs with id.gs1.org hostname
Do NOT encode custom hostnames

'++' Schemes (Plus-plus schemes)

Note: The '++' scheme JSON files are custom implementations created by Claude, not official GS1 scheme definitions. They are based on the TDS 2.3 specification but the JSON schema files themselves are not from GS1.

Double-plus schemes like SGTIN++, SSCC++, etc. that:

Include all features of '+' schemes
Additionally encode a custom hostname in binary
Support branded GS1 Digital Link URIs (e.g., https://coca-cola.com/01/...)

Hostname Encoding (Section 14.5.16)

Two methods supported:

Code 40 (indicator bit 0): For uppercase-only hostnames
- 16 bits per 3 characters
- Character set: 0-9, A-Z, -, .
7-bit ASCII with optimizations (indicator bit 1): For mixed-case hostnames
- Uses optimization tables for common TLDs and subdomains
- .com, .org, .net, etc. each encoded as a single 7-bit sequence
- id., www., qr. each encoded as a single 7-bit sequence
- Country TLDs encoded as 14-bit sequences
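The sketch below is minimal TypeScript covering two of the ideas above. It shows how the indicator bit could be chosen, and why three Code 40 characters fit in 16 bits (40^3 = 64,000, which is at most 65,536). The character order in HYPOTHETICAL_CODE40 is an assumption, not the normative TDS 2.3 table. The function names are hypothetical and are not the HostnameEncoder API.

```typescript
// ASSUMPTION: this ordering of the documented Code 40 character set
// (0-9, A-Z, '-', '.') is illustrative, not the normative TDS 2.3 table.
const HYPOTHETICAL_CODE40 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ-.";

// Indicator bit: 0 = Code 40 (uppercase-only), 1 = 7-bit ASCII with optimizations.
function chooseHostnameIndicator(hostname: string): 0 | 1 {
  for (const ch of hostname) {
    if (!HYPOTHETICAL_CODE40.includes(ch)) return 1; // e.g. a lowercase letter
  }
  return 0;
}

// Packs exactly three characters into one 16-bit group as a base-40 number.
// 40^3 = 64000 <= 65536, so every triplet fits. How a final group of one or
// two characters is handled is defined by TDS 2.3 and is not modeled here.
function packCode40Triplet(triplet: string): string {
  if (triplet.length !== 3) throw new Error("expected exactly 3 characters");
  let value = 0;
  for (const ch of triplet) {
    const idx = HYPOTHETICAL_CODE40.indexOf(ch);
    if (idx < 0) throw new Error(`not a Code 40 character: ${ch}`);
    value = value * 40 + idx;
  }
  return value.toString(2).padStart(16, "0");
}
```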

Important: The hostname length field gives the number of 7-bit sequences, NOT the number of output characters.
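The TypeScript sketch below is a worked example of that difference. It assumes prefix optimizations apply only at the start of the hostname and TLD optimizations only at the end. It includes only the table entries named above, so country-code TLDs are not modeled. countSevenBitSequences is a hypothetical helper, not part of the library.

```typescript
// Only the optimization entries named in these notes; the full TDS 2.3 tables
// (including 14-bit country-code TLDs) are not reproduced.
const KNOWN_PREFIXES = ["id.", "www.", "qr."];
const KNOWN_TLDS = [".com", ".org", ".net"];

// Value of the hostname length field: number of 7-bit sequences.
// ASSUMPTION: prefixes are optimized only at the start, TLDs only at the end.
function countSevenBitSequences(hostname: string): number {
  let rest = hostname;
  let sequences = 0;
  const prefix = KNOWN_PREFIXES.find((p) => rest.startsWith(p));
  if (prefix !== undefined) {
    rest = rest.slice(prefix.length);
    sequences += 1; // the whole prefix is one sequence
  }
  const tld = KNOWN_TLDS.find((t) => rest.endsWith(t));
  if (tld !== undefined) {
    rest = rest.slice(0, rest.length - tld.length);
    sequences += 1; // the whole TLD is one sequence
  }
  return sequences + rest.length; // one sequence per remaining character
}
```

For www.example.com this returns 9 (www. + 7 characters + .com), although the hostname is 15 characters long.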

Variable-Length Alphanumeric Encoding (Section 14.5.6)

Format: 3-bit encoding indicator + 5-bit length + variable data

Indicator	Method	Bits per char
0	Numeric	~3.32 bits/digit
1	Upper hex	4 bits
2	Lower hex	4 bits
3	Base64 URI-safe	6 bits
4	7-bit ASCII	7 bits
5	URN Code 40	~5.33 bits
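The TypeScript sketch below is a rough size estimator for the table above. Its rounding rules are assumptions, not values taken from TDS 2.3 Section 14.5.6:

- the 5-bit length counts characters;
- numeric data occupies ceil(n * log2(10)) bits;
- URN Code 40 uses whole 16-bit groups per 3 characters.

estimatePayloadBits and estimateFieldBits are hypothetical names.

```typescript
// Encoding indicators from the table above.
type Indicator = 0 | 1 | 2 | 3 | 4 | 5;

// ASSUMED rounding rules; check TDS 2.3 Section 14.5.6 before relying on them.
function estimatePayloadBits(indicator: Indicator, length: number): number {
  switch (indicator) {
    case 0:
      return Math.ceil(length * Math.log2(10)); // numeric, ~3.32 bits/digit
    case 1: // upper hex
    case 2: // lower hex
      return length * 4;
    case 3:
      return length * 6; // Base64 URI-safe
    case 4:
      return length * 7; // 7-bit ASCII
    case 5:
      return Math.ceil(length / 3) * 16; // URN Code 40, 16 bits per 3 chars
    default:
      throw new Error(`unknown encoding indicator: ${indicator}`);
  }
}

// 3-bit encoding indicator + 5-bit length + data.
function estimateFieldBits(indicator: Indicator, length: number): number {
  if (!Number.isInteger(length) || length < 0 || length > 31) {
    throw new Error("a 5-bit length field holds 0..31");
  }
  return 3 + 5 + estimatePayloadBits(indicator, length);
}
```

Under these assumptions, a 10-digit numeric serial comes to 42 bits in total.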

'++' Scheme BINARY Pattern

All '++' schemes require trailing ([01]*) in their BINARY pattern to capture variable-length serial and hostname data after the fixed fields.

Example for SGTIN++:

"pattern": "^11111101([01])([01]{3})([01]{56})([01]*)"

'++' Scheme Structure

Most '++' schemes follow this structure:

Header (8 bits) - unique per scheme
DataToggle (1 bit) - +AIDC data indicator
Filter (3 bits)
Fixed fields (scheme-specific, BCD encoded)
Serial (variable-length alphanumeric)
Hostname (1-bit encoding + 6-bit length + data)
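The TypeScript sketch below makes this layout concrete. It applies the SGTIN++ BINARY pattern and decodes the 56-bit fixed field as 14 BCD digits, assuming that field is the GTIN. Helper and property names are illustrative. The trailing capture group is left undecoded, because the serial and hostname need the variable-length and hostname decoders.

```typescript
// The SGTIN++ BINARY pattern from the JSON example, as a regex literal.
const SGTIN_PP_BINARY = /^11111101([01])([01]{3})([01]{56})([01]*)/;

// Decodes BCD (4 bits per decimal digit); 56 bits -> 14 digits.
function decodeBcd(bits: string): string {
  if (bits.length % 4 !== 0 || !/^[01]*$/.test(bits)) {
    throw new Error("BCD input must be a multiple of 4 bits");
  }
  let digits = "";
  for (let i = 0; i < bits.length; i += 4) {
    const nibble = parseInt(bits.slice(i, i + 4), 2);
    if (nibble > 9) throw new Error(`invalid BCD nibble at bit ${i}`);
    digits += nibble.toString();
  }
  return digits;
}

// Capture groups follow the structure above: toggle, filter, fixed field, rest.
function splitSgtinPlusPlus(binary: string) {
  const m = SGTIN_PP_BINARY.exec(binary);
  if (m === null) return null; // wrong header or too few bits for the fixed fields
  return {
    dataToggle: m[1], // +AIDC data indicator
    filter: parseInt(m[2], 2),
    gtin: decodeBcd(m[3]), // assumed: 14 BCD digits of the GTIN
    trailing: m[4], // variable-length serial, then hostname
  };
}
```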

Known Issues

TDS 2.3 Standard Errata

See docs/TDS-2.3-Errata.md for documented errors in the TDS 2.3 specification, including:

SGTIN++/DSGTIN++ hostname errors in E.3
SSCC++/ITIP++ header errors in E.3

Note: Errors in '++' scheme JSON files are not "errata" since the JSON files are custom implementations, not from GS1. Only errors in the official TDS 2.3 specification document should be documented in the errata file.

Testing Requirements

Tests are in a private submodule. After cloning, initialize it:

git submodule update --init --recursive

All tests must pass; failing tests are never acceptable. Before committing any change, confirm this by running:

dotnet test test/TagDataTranslation.Tests/TagDataTranslation.Tests.csproj

Test Coverage

Use the coverage script to generate an HTML coverage report:

./scripts/coverage.sh

This runs all tests with coverlet, generates a Cobertura XML file, and produces an HTML report at coveragereport/index.html (opens automatically on macOS).

Coverage-driven test improvement process:

Run ./scripts/coverage.sh to generate the report
Read the Cobertura XML (test/TagDataTranslation.Tests/TestResults/*/coverage.cobertura.xml) to identify uncovered lines/branches per class
Prioritize gaps by impact: internal classes with InternalsVisibleTo access can be tested directly; otherwise test through TDTEngine.Translate()
Write tests targeting the uncovered lines, run the suite, re-run coverage to verify improvement
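For step 2, the TypeScript sketch below lists uncovered lines (hits="0") per class from Cobertura XML text. It is regex-based and assumes the usual coverlet layout, in which number precedes hits in each line element. Methods repeat their lines inside each class, so the results are de-duplicated. uncoveredLinesByClass is a hypothetical helper. Beyond quick triage, a real XML parser is the safer choice.

```typescript
// Maps class name -> sorted uncovered line numbers (hits="0").
function uncoveredLinesByClass(xml: string): Map<string, number[]> {
  const result = new Map<string, number[]>();
  // \bname avoids matching filename="..."; <class\b avoids matching <classes>.
  const classRe = /<class\b[^>]*\bname="([^"]*)"[^>]*>([\s\S]*?)<\/class>/g;
  // ASSUMPTION: number="..." appears before hits="..." in each <line>.
  const lineRe = /<line\b[^>]*\bnumber="(\d+)"[^>]*\bhits="(\d+)"/g;
  for (const cls of xml.matchAll(classRe)) {
    const uncovered = new Set<number>(); // methods repeat class lines, so dedupe
    for (const line of cls[2].matchAll(lineRe)) {
      if (line[2] === "0") uncovered.add(Number(line[1]));
    }
    if (uncovered.size > 0) {
      const lines = [...uncovered].sort((a, b) => a - b);
      result.set(cls[1], (result.get(cls[1]) ?? []).concat(lines));
    }
  }
  return result;
}
```

Pass it the text of the coverage.cobertura.xml file found under TestResults/.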

Key coverage notes:

InternalsVisibleTo("TagDataTranslation.Tests") is set in AssemblyInfo.cs — internal classes like EncodedAICodec, VariableLengthFieldCodec, PlusPlusFieldConverter can be tested directly
Generated JSON serializer code (TdtJsonContext, TableJsonContext) will always have partial coverage — this is expected
Table lookup classes (TableB, TableK, TableE) are exercised indirectly through scheme translations; their query methods may show low coverage if only used by specific scheme paths
coveragereport/ and TestResults/ are gitignored

Build Commands

# Build all targets
dotnet build src/TagDataTranslation/TagDataTranslation.csproj

# Run tests
dotnet test test/TagDataTranslation.Tests/TagDataTranslation.Tests.csproj

# Run specific test categories
dotnet test --filter "FullyQualifiedName~TDS23"
dotnet test --filter "FullyQualifiedName~TDT22"

# Run benchmarks
dotnet run -c Release --project test/TagDataTranslation.Benchmarks

# Build npm WASM package
cd npm && npm run build

# Run npm smoke test
cd npm && node test/smoke.js

npm Package (@mimasu/tdt)

Architecture

The npm package wraps the .NET library via WebAssembly. The build pipeline:

dotnet publish compiles the WASM project (sdk/wasm/) targeting browser-wasm
npm/scripts/build.js copies the _framework/ output to npm/dist/wasm/
npm/dist/index.js loads the .NET WASM runtime and exposes createEngine()

Key Files

Path	Description
`sdk/wasm/TagDataTranslation.Wasm.csproj`	WASM project (net10.0, browser-wasm)
`sdk/wasm/JsInterop.cs`	JSExport methods callable from JavaScript
`sdk/wasm/Program.cs`	Minimal entry point required by runtime
`sdk/wasm/main.js`	WASM module entry point
`npm/package.json`	npm package metadata
`npm/dist/index.js`	CJS wrapper with `createEngine()`
`npm/dist/index.mjs`	ESM re-export
`npm/dist/index.d.ts`	TypeScript type definitions
`npm/scripts/build.js`	Build script (WASM compile + license copy)
`npm/test/smoke.js`	Smoke test for encode/decode/tryTranslate
`examples/NodeApp/`	Example Node.js app using the published package

.NET 10 WASM Gotchas

SDK: Use Microsoft.NET.Sdk (not Microsoft.NET.Sdk.BlazorWebAssembly) for library-style WASM
AllowUnsafeBlocks: Required — the JSExport source generator emits unsafe code in .NET 10
JsonSerializerIsReflectionEnabledByDefault: Must be true — trimmed WASM disables reflection-based JSON by default, but TDTEngine uses System.Text.Json with reflection to load scheme files
TrimmerRootAssembly: Must include TagDataTranslation — without this, the IL trimmer strips model constructors, causing DeserializeNoConstructor errors at runtime
Entry point: .NET 10 requires a Program.cs with Main (even for library-style WASM)
Output path: .NET 10 outputs to AppBundle/_framework/ (not Blazor's wwwroot/_framework/)
getAssemblyExports: Returns a Promise in .NET 10 — must await it
Initialization order: Call dotnet.create(), then getAssemblyExports(), then runMain()
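The TypeScript sketch below follows that initialization order. The interfaces are a hand-written minimal subset of the dotnet.js runtime API (create(), getConfig(), getAssemblyExports(), runMain()), not the shipped type definitions. startRuntime is a hypothetical name. Because the host builder is passed in as a parameter, the order can be tested with a fake runtime.

```typescript
// Minimal structural subset of the dotnet.js runtime API used here (assumed shape).
interface DotnetRuntimeApi {
  getConfig(): { mainAssemblyName?: string };
  getAssemblyExports(assemblyName: string): Promise<unknown>;
  runMain(): Promise<number>;
}
interface DotnetHostBuilder {
  create(): Promise<DotnetRuntimeApi>;
}

// create() -> await getAssemblyExports() -> runMain(), as listed above.
async function startRuntime(dotnet: DotnetHostBuilder): Promise<unknown> {
  const runtime = await dotnet.create();
  const assembly = runtime.getConfig().mainAssemblyName;
  if (!assembly) throw new Error("runtime config has no mainAssemblyName");
  const exports = await runtime.getAssemblyExports(assembly); // a Promise: must be awaited
  await runtime.runMain();
  return exports; // JSExport-ed members live on this object
}
```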

Publishing to npm

# Build WASM + copy license
cd npm && npm run build

# Set version
npm version 3.x.x --no-git-tag-version

# Publish (opens browser for auth challenge)
npm publish --tag beta --access public   # prerelease
npm publish --access public              # stable release

The build script auto-copies LICENSING.md from the repo root to npm/LICENSE.md (gitignored) so the license ships with every publish.

Package Details

Scope: @mimasu (public)
License: BSL-1.1
Size: ~2.6 MB compressed, ~15.7 MB unpacked (includes .NET WASM runtime)
Engine requirement: Node.js >= 18.0.0

File Locations

Path	Description
`src/TagDataTranslation/`	Main library
`src/TagDataTranslation/Schemes2/`	JSON scheme definitions
`src/TagDataTranslation/Encoding/`	Encoding helper classes
`src/TagDataTranslation/Tables/`	Lookup tables (Table F, K, E, B)
`test/TagDataTranslation.Tests/`	Unit tests
`docs/`	Errata and plans
`docs/TDS-2.3-Errata.md`	Known errors in TDS 2.3 specification
`docs/Scheme-Conversion-Errata.md`	Errors found in XML to JSON conversion

Adding New Schemes

Create JSON scheme file in src/TagDataTranslation/Schemes2/
Define all levels (BINARY, BARE_IDENTIFIER, etc.)
Add BINARY pattern with appropriate capture groups
For '++' schemes, add variableLengthField and hostnameField definitions
Add tests in appropriate test file

Performance

Caching Architecture

The Translate() hot path uses several caches to avoid repeated work:

Regex cache: ConcurrentDictionary<string, Regex> in TDTEngine — compiled regex patterns shared across all engine instances
Character set regex cache: ConcurrentDictionary<string, Regex?> in RuleExecutor — caches ValidateCharacterset patterns (null = invalid pattern)
Grammar token cache: ConcurrentDictionary<string, GrammarToken[]> in TDTEngine — parsed grammar strings cached as token arrays
Pre-sorted fields/rules: Option.Field sorted by Seq at load time; Level.ExtractRules/FormatRules pre-split and sorted at load time
BinaryConverter lookup tables: Static arrays for hex↔binary conversion (no per-character Convert calls)
Static grammar regex: Single compiled Regex instance for grammar parsing

All caches are static and thread-safe. They grow monotonically (no eviction) which is fine because the set of patterns/grammars is bounded by the scheme definitions.
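The TypeScript sketch below is an illustrative analogue of the two regex caches. The library's caches are C# ConcurrentDictionary instances; a plain Map suffices in single-threaded JavaScript. getCachedRegex and getCachedCharset are hypothetical names here. Sharing one RegExp instance is safe only without the g or y flags, because those flags make lastIndex stateful.

```typescript
// Unbounded caches keyed by pattern text; the key set is bounded by the schemes.
const regexCache = new Map<string, RegExp>();
const charsetCache = new Map<string, RegExp | null>();

function getCachedRegex(pattern: string): RegExp {
  let re = regexCache.get(pattern);
  if (re === undefined) {
    re = new RegExp(pattern); // no g/y flag, so the shared instance has no lastIndex state
    regexCache.set(pattern, re);
  }
  return re;
}

// Invalid patterns are cached as null so they are not re-parsed on every call.
function getCachedCharset(pattern: string): RegExp | null {
  const hit = charsetCache.get(pattern);
  if (hit !== undefined) return hit;
  let re: RegExp | null = null;
  try {
    re = new RegExp(pattern);
  } catch {
    // SyntaxError: keep re as null
  }
  charsetCache.set(pattern, re);
  return re;
}
```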

Benchmarks

Run with dotnet run -c Release --project test/TagDataTranslation.Benchmarks. Results on Apple M1 Pro, .NET 8.0:

Benchmark	Mean	Allocated
SGTIN-96 encode	7.82 us	9.9 KB
SGTIN-96 decode	7.65 us	9.2 KB
SGTIN++ encode	24.31 us	75.3 KB
SGTIN++ decode	5.02 us	7.8 KB
HexToBinary (96-bit)	99 ns	480 B
BinaryToHex (96-bit)	54 ns	192 B
Failure (random hex)	17.82 us	504 B
Failure (random binary)	18.27 us	784 B

Performance Guidelines

Do NOT create new Regex(pattern) in the hot path — use GetCachedRegex(pattern) in TDTEngine or the RuleExecutor charset cache
Do NOT use .OrderBy().ToList() on fields/rules — they are pre-sorted at load time
Pre-size StringBuilder when output length is predictable (e.g., hex.Length * 4 for HexToBinary)
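The TypeScript sketch below does lookup-table hex/binary conversion in the spirit of the BinaryConverter notes. JavaScript strings are immutable, so a pre-sized array joined once stands in for a pre-sized StringBuilder. The function names mirror the benchmark labels but are hypothetical, not the library's API.

```typescript
// Lookup tables built once; no per-character number parsing in the loops.
const NIBBLE_BITS: readonly string[] = Array.from({ length: 16 }, (_, i) =>
  i.toString(2).padStart(4, "0"),
);
const HEX_DIGITS = "0123456789ABCDEF";
const HEX_VALUE = new Int8Array(128).fill(-1); // char code -> nibble, -1 = invalid
for (let i = 0; i < 16; i++) {
  HEX_VALUE[HEX_DIGITS.charCodeAt(i)] = i;
  HEX_VALUE["0123456789abcdef".charCodeAt(i)] = i;
}

function hexToBinary(hex: string): string {
  const parts = new Array<string>(hex.length); // pre-sized: 4 bits per hex digit
  for (let i = 0; i < hex.length; i++) {
    const code = hex.charCodeAt(i);
    const v = code < 128 ? HEX_VALUE[code] : -1;
    if (v < 0) throw new Error(`invalid hex digit at index ${i}`);
    parts[i] = NIBBLE_BITS[v];
  }
  return parts.join("");
}

function binaryToHex(bits: string): string {
  if (bits.length % 4 !== 0) throw new Error("bit length must be a multiple of 4");
  const out = new Array<string>(bits.length / 4); // pre-sized: 1 hex digit per 4 bits
  for (let i = 0; i < bits.length; i += 4) {
    let v = 0;
    for (let j = i; j < i + 4; j++) {
      const c = bits.charCodeAt(j); // 48 = '0', 49 = '1'
      if (c !== 48 && c !== 49) throw new Error(`invalid bit at index ${j}`);
      v = (v << 1) | (c - 48);
    }
    out[i / 4] = HEX_DIGITS[v];
  }
  return out.join("");
}
```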

Debugging Tips

Use TryTranslateDetails() for detailed translation information
Binary patterns must match exactly - check bit counts
For '++' schemes, hostname length is in sequences, not characters
GS1 standards in markdown are in the parent Mimasu repo: docs/standards/md/gs1/

Important Implementation Notes

'+' Scheme JSON Files

The '+' scheme JSON files (SGTIN+.json, SSCC+.json, etc.) are from the GS1 standard and should NOT be modified. They support GS1 Digital Link URIs with ANY hostname, not just id.gs1.org.

'++' Scheme JSON Files

The '++' scheme JSON files are custom implementations and CAN be modified as needed to match the TDS 2.3 specification.

Scheme Selection Ambiguity

When GS1_DIGITAL_LINK input is provided, both '+' and '++' schemes may match the URL pattern:

'+' schemes match URLs with any hostname (e.g., https://id.gs1.org/01/...)
'++' schemes also match URLs with any hostname and capture it for encoding

The engine may select the '++' scheme due to more specific pattern matching. For '+' scheme tests:

Test GS1_DIGITAL_LINK as OUTPUT only (translate from BINARY/BARE_IDENTIFIER to GS1_DIGITAL_LINK)
Do NOT test GS1_DIGITAL_LINK as INPUT (ambiguous which scheme will be selected)

Use ExecuteTestsWithOutputOnly() helper for '+' scheme tests with GS1_DIGITAL_LINK.

Field Name Consistency

Field names MUST match across all levels of a scheme:

BAD: itipBinary in BINARY level, itip in BARE_IDENTIFIER level (no conversion rule)
GOOD: itip in both BINARY and BARE_IDENTIFIER levels
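The TypeScript sketch below is a small lint along these lines. It works over a hypothetical, simplified view of two levels' field names; the real scheme JSON is richer. A name that appears on one side only, such as itipBinary versus itip, is a candidate for the mismatch described above. Names that only one level needs will also appear, so treat the output as candidates for review.

```typescript
// Field names present in one level but not the other.
function fieldNameDiff(levelA: readonly string[], levelB: readonly string[]) {
  const inA = new Set(levelA);
  const inB = new Set(levelB);
  return {
    onlyInA: levelA.filter((f) => !inB.has(f)), // candidates for a naming mismatch
    onlyInB: levelB.filter((f) => !inA.has(f)),
  };
}

// Hypothetical BINARY vs BARE_IDENTIFIER field lists for the BAD case above.
const diff = fieldNameDiff(["filter", "itipBinary", "serial", "hostname"], ["itip", "serial", "hostname"]);
// diff.onlyInA: ["filter", "itipBinary"], diff.onlyInB: ["itip"]
```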

'++' Scheme Specific Notes

DSGTIN++: Requires multiple options for different date types (like DSGTIN+):

Option 0: prodDate (date type indicator 0000)
Option 4: expDate (date type indicator 0100)
etc.

GRAI++: GS1_DIGITAL_LINK should capture 14-digit grai field (not 13 digits + hardcoded 0):

Pattern: \\/8003\\/([0-9]{14})...
Grammar: '/8003/' grai urlEscapedSerial

GDTI++: BARE_IDENTIFIER should use ;serial= separator:

Pattern: ^gdti=([0-9]{13});serial=...
Grammar: 'gdti=' gdti ';serial=' serial ';hostname=' hostname

ITIP++: Use combined itip field (18 digits = gtin + piece + total):

BARE_IDENTIFIER: itip=095211411234540102;serial=rif981;hostname=...
GS1_DIGITAL_LINK: /8006/095211411234540102/21/rif981
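The TypeScript sketch below checks the captures above with regexes. It uses only the documented leading part of each pattern and does not reconstruct the elided remainder. In a JSON scheme file the backslash is escaped ("\\/"), while a JavaScript regex literal writes \/ directly. ITIP_PP_BARE and splitItip are hypothetical: they are built from the example value, not taken from the scheme file.

```typescript
// Leading parts of the documented patterns only.
const GRAI_PP_DL = /\/8003\/([0-9]{14})/;
const GDTI_PP_BARE = /^gdti=([0-9]{13});serial=/;
// Hypothetical: derived from the ITIP++ BARE_IDENTIFIER example above.
const ITIP_PP_BARE = /^itip=([0-9]{18});serial=/;

// Splits the combined 18-digit ITIP into GTIN (14), piece (2), total (2).
function splitItip(itip: string) {
  if (!/^[0-9]{18}$/.test(itip)) throw new Error("ITIP must be 18 digits");
  return { gtin: itip.slice(0, 14), piece: itip.slice(14, 16), total: itip.slice(16, 18) };
}
```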

License

BSL 1.1 - See LICENSING.md
