Milestones

  • Advanced Features - Full Tokenizer API

    This milestone completes the tokenizer API with advanced features including training, serialization, and pipeline component access.

    **Key Features:**
    - Training Support (Train, TrainFromIterator)
    - Serialization (Save, ToJSON)
    - Pipeline Component Access (get/set normalizer, pre-tokenizer, post-processor)

    **Target:** Complete feature parity with HuggingFace tokenizers for advanced use cases including custom tokenizer training and fine-grained component control.

    No due date
    0/3 issues closed
  • Extended Functionality - Medium Priority Features

    This milestone extends the tokenizer with dynamic token management and enhanced encoding information capabilities.

    **Key Features:**
    - Dynamic Token Management (AddTokens, AddSpecialTokens)
    - Enhanced Encoding Information (WordIDs, SequenceIDs, mapping methods)

    **Target:** Advanced functionality for use cases requiring runtime vocabulary modification and detailed encoding analysis.

    No due date
    0/2 issues closed
  • Core Functionality - High Priority Features

    This milestone focuses on essential batch processing and token/vocabulary access functionality that extends the core tokenization capabilities.

    **Key Features:**
    - Batch Processing (EncodeBatch, DecodeBatch)
    - Token/Vocabulary Access (TokenToID, IDToToken, GetVocab)

    **Target:** Essential functionality for production use cases requiring batch operations and vocabulary introspection.

    No due date
    0/11 issues closed
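
To make the Core Functionality method names concrete, here is a minimal sketch of how they might fit together. It is an illustration only: `ToyTokenizer`, `NewToyTokenizer`, the whitespace-splitting `Encode`, the `[UNK]` fallback, and every signature shown are assumptions, not the project's actual API. Go is also an assumption, inferred from the PascalCase method names in the milestone descriptions.

```go
package main

import (
	"fmt"
	"strings"
)

// ToyTokenizer is a hypothetical stand-in used only to illustrate the
// method names listed in the milestones; it is not the project's type.
type ToyTokenizer struct {
	vocab  map[string]uint32 // token -> ID
	tokens []string          // ID -> token
}

// NewToyTokenizer assigns IDs in slice order (hypothetical constructor).
func NewToyTokenizer(tokens []string) *ToyTokenizer {
	vocab := make(map[string]uint32, len(tokens))
	for i, tok := range tokens {
		vocab[tok] = uint32(i)
	}
	return &ToyTokenizer{vocab: vocab, tokens: tokens}
}

// TokenToID reports the ID for a token, or false if it is unknown.
func (t *ToyTokenizer) TokenToID(token string) (uint32, bool) {
	id, ok := t.vocab[token]
	return id, ok
}

// IDToToken reports the token for an ID, or false if out of range.
func (t *ToyTokenizer) IDToToken(id uint32) (string, bool) {
	if int(id) >= len(t.tokens) {
		return "", false
	}
	return t.tokens[id], true
}

// GetVocab returns a copy so callers cannot mutate internal state.
func (t *ToyTokenizer) GetVocab() map[string]uint32 {
	out := make(map[string]uint32, len(t.vocab))
	for k, v := range t.vocab {
		out[k] = v
	}
	return out
}

// Encode splits on whitespace (an assumption; real tokenizers use a
// model such as BPE) and maps unknown words to the "[UNK]" ID.
func (t *ToyTokenizer) Encode(text string) []uint32 {
	unk := t.vocab["[UNK]"] // falls back to ID 0 if "[UNK]" is absent
	ids := []uint32{}
	for _, w := range strings.Fields(text) {
		if id, ok := t.vocab[w]; ok {
			ids = append(ids, id)
		} else {
			ids = append(ids, unk)
		}
	}
	return ids
}

// EncodeBatch encodes each input independently, preserving order.
func (t *ToyTokenizer) EncodeBatch(texts []string) [][]uint32 {
	out := make([][]uint32, len(texts))
	for i, s := range texts {
		out[i] = t.Encode(s)
	}
	return out
}

// DecodeBatch joins the tokens of each ID sequence with single spaces,
// silently skipping IDs that are out of range.
func (t *ToyTokenizer) DecodeBatch(batch [][]uint32) []string {
	out := make([]string, len(batch))
	for i, ids := range batch {
		words := make([]string, 0, len(ids))
		for _, id := range ids {
			if tok, ok := t.IDToToken(id); ok {
				words = append(words, tok)
			}
		}
		out[i] = strings.Join(words, " ")
	}
	return out
}

func main() {
	tok := NewToyTokenizer([]string{"[UNK]", "hello", "world"})
	batch := tok.EncodeBatch([]string{"hello world", "hello there"})
	fmt.Println(batch)
	fmt.Println(tok.DecodeBatch(batch))
}
```

The batch methods in this sketch are thin loops over the single-item path. An actual implementation backed by HuggingFace tokenizers would more likely hand the whole batch to the underlying library in one call.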