
Conversation

@monoxgas
Contributor

@monoxgas monoxgas commented Jun 24, 2025

  • Refactored and cleaned up message slicing behaviors.
  • Refactored tokenizers into class instances, following the generator pattern more closely.
  • Refactored tool transforms to consolidate their JSON mechanics.
  • Added documentation for slicing and tokenization.
  • Added new test suites for slicing and tokenization.

Generated Summary

  • Refactored the tokenization API by replacing the old “rigging/tokenize” module with a new “rigging/tokenizer” module that provides updated types (TokenSlice, TokenizedChat, Tokenizer) and functions (get_tokenizer, register_tokenizer), along with support for lazy loading and caching.
  • Revamped message slicing functionality by introducing new methods on the Message class:
    • New slice management methods such as append_slice, replace_with_slice, mark_slice (with overloads for string, range, regex, or model-type targets), find_slices, get_slice, iter_slices, and remove_slices.
    • Improved content update handling so that slice positions are recalculated automatically, and deprecated methods (e.g. .strip()) now warn with suggestions to use .remove_slices().
  • Updated error handling by renaming InvalidModelSpecifiedError to InvalidGeneratorError and introducing a new InvalidTokenizerError for tokenizer-related issues.
  • Modified generator code to use a global “g_generators” dictionary instead of “g_providers” and updated generator identifier parsing, including better support for additional parameters.
  • Enhanced documentation and examples by adding new docs for topics such as message slicing, tokenization, transforms, and expanded tool invocation examples.
  • Updated tests with new suites for message slicing and tokenizer functionality to validate correct slice creation, updating, removal, and chat-level slice aggregation.
  • Minor adjustments to pyproject.toml and CI workflow files to align dependency declarations and enforce always-run typechecks.
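
The automatic recalculation of slice positions mentioned above can be illustrated with a minimal sketch. This is not rigging's implementation: `ToySlice`, `ToyMessage`, `mark`, and `replace_range` are hypothetical names invented for this example, and the policy of dropping slices that overlap an edit is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToySlice:
    # Hypothetical stand-in for a message slice: a labeled [start, stop) range.
    start: int
    stop: int
    label: str


@dataclass
class ToyMessage:
    # Hypothetical stand-in for a message that tracks slices over its content.
    content: str
    slices: list[ToySlice] = field(default_factory=list)

    def mark(self, target: str, label: str) -> ToySlice | None:
        """Mark the first occurrence of `target`, or return None if absent."""
        start = self.content.find(target)
        if start == -1:
            return None
        new = ToySlice(start, start + len(target), label)
        self.slices.append(new)
        return new

    def replace_range(self, start: int, stop: int, text: str) -> None:
        """Replace content[start:stop] and recalculate slice positions."""
        delta = len(text) - (stop - start)
        kept: list[ToySlice] = []
        for s in self.slices:
            if s.stop <= start:
                # Entirely before the edit: position is unchanged.
                kept.append(s)
            elif s.start >= stop:
                # Entirely after the edit: shift by the length difference.
                kept.append(ToySlice(s.start + delta, s.stop + delta, s.label))
            # Slices overlapping the edit are dropped (assumed policy).
        self.content = self.content[:start] + text + self.content[stop:]
        self.slices = kept


msg = ToyMessage("Hello world, goodbye moon")
msg.mark("Hello", "greeting")
msg.mark("moon", "object")
msg.replace_range(6, 11, "everyone")  # "world" becomes "everyone"
```

After the edit, the "object" slice still covers the text "moon" even though everything after the replaced range moved by three characters.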

This summary was generated with ❤️ by rigging

Generated Summary

  • Updated the message slicing API in the Message class:
    • Introduced new helper methods such as append_slice, replace_with_slice, mark_slice (with overloads), find_slices, get_slice, and iter_slices to better manage, query, and update message slices.
    • Deprecated the older .strip() method and improved the content‐update mechanism so that slice positions are recalculated automatically.
  • Refactored error handling:
    • Renamed InvalidModelSpecifiedError to InvalidGeneratorError throughout the generator modules.
    • Introduced a new InvalidTokenizerError for tokenizer-related issues.
  • Refactored generator API:
    • Switched from using g_providers to g_generators for generator registration and lookup.
    • Updated get_generator to support identifier strings in a more flexible format (with provider, model, and additional arguments) and to raise InvalidGeneratorError on invalid input.
  • Reworked tokenization functionality:
    • Removed the old “rigging.tokenize” module files and consolidated functionality under the new “rigging.tokenizer” package.
    • Updated get_tokenizer to accept an identifier string or a Tokenizer instance, with lazy loading and proper keyword arguments parsing (including base64‐encoded values).
    • Added a new TransformersTokenizer implementation under “rigging/tokenizer/transformers_.py” and registration support via register_tokenizer.
  • Adjusted Chat and ChatList methods:
    • Modified the to_tokens methods to accept a tokenizer as either a string or Tokenizer instance and an optional transform (string or Transform instance), and to use asyncio.gather for batch processing.
  • Updated documentation and tests:
    • New API documentation has been added/updated for message slicing, tokenization, transforms, and tools.
    • New test suites validate the extended message slicing operations and the new tokenizer functionality.
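
The tokenizer lookup described in this summary (an identifier string or an existing instance, with lazy loading and caching) might look roughly like the sketch below. The names `get_tokenizer`, `register_tokenizer`, `Tokenizer`, and `InvalidTokenizerError` are borrowed from the summary, but the signatures, the `!` separator between provider and model, and the `WhitespaceTokenizer` class are assumptions made for illustration, not the library's actual API.

```python
from __future__ import annotations

from typing import Callable, Protocol


class Tokenizer(Protocol):
    # Assumed minimal interface; the real Tokenizer type is likely richer.
    def encode(self, text: str) -> list[int]: ...


class InvalidTokenizerError(Exception):
    """Raised when an identifier does not map to a registered tokenizer."""


class WhitespaceTokenizer:
    # Hypothetical toy tokenizer: one "token" per word (its length).
    def __init__(self, model: str) -> None:
        self.model = model

    def encode(self, text: str) -> list[int]:
        return [len(word) for word in text.split()]


_factories: dict[str, Callable[[str], Tokenizer]] = {}
_cache: dict[str, Tokenizer] = {}


def register_tokenizer(provider: str, factory: Callable[[str], Tokenizer]) -> None:
    """Register a factory; nothing is constructed until first use."""
    _factories[provider] = factory


def get_tokenizer(identifier: str | Tokenizer) -> Tokenizer:
    """Resolve an identifier (assumed form 'provider!model') or pass an instance through."""
    if not isinstance(identifier, str):
        return identifier
    if identifier in _cache:
        return _cache[identifier]
    provider, _, model = identifier.partition("!")
    factory = _factories.get(provider)
    if factory is None:
        raise InvalidTokenizerError(f"Unknown tokenizer provider: {provider!r}")
    tokenizer = factory(model)  # lazy: built on the first lookup only
    _cache[identifier] = tokenizer
    return tokenizer


register_tokenizer("whitespace", WhitespaceTokenizer)
first = get_tokenizer("whitespace!demo")
second = get_tokenizer("whitespace!demo")
```

Here `first` and `second` are the same cached object, and passing `first` back into `get_tokenizer` returns it unchanged.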

This summary was generated with ❤️ by rigging

Generated Summary

  • Changed error classes in generator and tokenizer modules by renaming InvalidModelSpecifiedError to InvalidGeneratorError and adding InvalidTokenizerError to improve clarity.
  • Refactored the tokenization API by renaming the “tokenize” module to “tokenizer”, moving abstract definitions from the old tokenize package into a new tokenizer package, and updating references in Chat and other modules.
  • Extended message slicing functionality:
    • Added properties (e.g. content getter/setter and str method) and clone method for MessageSlice.
    • Introduced several new slice methods (append_slice, replace_with_slice, mark_slice with overloads, find_slices, get_slice, iter_slices, remove_slices) with improved filtering, ordering, and error handling.
  • Updated Chat’s to_tokens and transform methods to accept a Tokenizer instance or identifier and an optional transform, enforcing new type checks.
  • Updated documentation with new topics on message slicing, tokenization, and transforms, and adjusted API docs (e.g. generator.mdx, error.mdx, model.mdx, transform.mdx, topics/tools.mdx).
  • Enhanced tests by adding extensive test suites for message slicing and tokenization to validate proper functionality, edge cases, and integration with model parsing.
  • Updated dependency management in pyproject.toml (e.g. moving elasticsearch dependency and removing eval-type-backport) to reflect current requirements.
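
As a rough illustration of the to_tokens change described in these summaries (a tokenizer given as a string or an instance, with chats processed concurrently via asyncio.gather), consider the following sketch. `UpperTokenizer`, `KNOWN_TOKENIZERS`, `resolve_tokenizer`, and `to_tokens_batch` are hypothetical names, and plain strings stand in for chats; none of this reflects the real method signatures.

```python
import asyncio


class UpperTokenizer:
    # Hypothetical async tokenizer: uppercases and splits on whitespace.
    async def tokenize(self, text: str) -> list[str]:
        await asyncio.sleep(0)  # yield control, as real I/O-bound work would
        return text.upper().split()


# Assumed identifier-to-class mapping for this sketch only.
KNOWN_TOKENIZERS = {"upper": UpperTokenizer}


def resolve_tokenizer(tokenizer):
    """Accept either an identifier string or a ready-made instance."""
    if isinstance(tokenizer, str):
        return KNOWN_TOKENIZERS[tokenizer]()
    return tokenizer


async def to_tokens_batch(chats: list[str], tokenizer) -> list[list[str]]:
    """Tokenize every chat concurrently; results keep the input order."""
    resolved = resolve_tokenizer(tokenizer)
    results = await asyncio.gather(*(resolved.tokenize(c) for c in chats))
    return list(results)


by_name = asyncio.run(to_tokens_batch(["a b", "c"], "upper"))
by_instance = asyncio.run(to_tokens_batch(["a b", "c"], UpperTokenizer()))
```

asyncio.gather returns results in the order the awaitables were passed, so the output lines up with the input chats regardless of which task finishes first.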

This summary was generated with ❤️ by rigging

…s instances. Added documentation for new features. Added new test suites.
@monoxgas monoxgas requested a review from Copilot June 24, 2025 23:13
@monoxgas monoxgas requested a review from a team as a code owner June 24, 2025 23:13
@dreadnode-renovate-bot dreadnode-renovate-bot bot added the area/docs (Changes to documentation and guides), area/python (Changes to Python package configuration and dependencies), area/tests (Changes to test files and testing infrastructure), area/examples (Changes to example code and demonstrations), and type/docs (Documentation updates and improvements) labels Jun 24, 2025

Copilot AI left a comment

Pull Request Overview

This PR refactors how messages are sliced and tokenized, converts tokenizers to class-based instances, unifies JSON/XML tool transforms, and adds documentation and new test suites for slicing and tokenization.

  • Overhauled message slicing API with append_slice, replace_with_slice, and removal of deprecated add_slice/strip methods.
  • Refactored tokenizers into class instances with a get_tokenizer registry and updated chat-level .to_tokens to accept identifier or instance plus optional transforms.
  • Consolidated JSON/XML tool transforms under make_tools_to_json_transform/make_tools_to_xml_transform with a new get_transform helper; updated default tags and parameters.

Reviewed Changes

Copilot reviewed 38 out of 39 changed files in this pull request and generated 2 comments.

File                             Description
rigging/tokenizer/base.py        Corrected exception handling and typing in get_tokenizer
rigging/transform/json_tools.py  Adjusted default tool_response_tag and removed redundant logic
rigging/transform/__init__.py    Added get_transform helper but missing "xml" mode case
rigging/transform/xml_tools.py   Swapped add_slice usage to replace_with_slice/append_slice
rigging/message.py               Fully reimplemented slicing API (MessageSlice, .slices setter)
Comments suppressed due to low confidence (2)

rigging/transform/__init__.py:29

  • The get_transform switch handles json, json-in-xml, and json-with-tag but omits an "xml" case—consider adding case "xml": return make_tools_to_xml_transform for completeness.
        case _:

rigging/transform/json_tools.py:161

  • The default tool_response_tag was changed from "tool_response" to "tool-response". Ensure this aligns with downstream tag lookups and documentation to avoid mismatches.
            tool_response_tag = tool_response_tag or "tool-response"
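
The mismatch risk raised here can be shown with a small sketch. `wrap_response` and `find_response` are hypothetical helpers written for this example; only the two tag spellings and the `or` default come from the comment above.

```python
from __future__ import annotations

import re


def wrap_response(result: str, tool_response_tag: str | None = None) -> str:
    """Wrap a tool result using the new default tag when none is given."""
    tag = tool_response_tag or "tool-response"
    return f"<{tag}>{result}</{tag}>"


def find_response(text: str, tag: str) -> str | None:
    """Return the body of the first <tag>...</tag> block, or None."""
    escaped = re.escape(tag)
    found = re.search(rf"<{escaped}>(.*?)</{escaped}>", text, re.DOTALL)
    return found.group(1) if found else None


wrapped = wrap_response("42")
new_lookup = find_response(wrapped, "tool-response")
old_lookup = find_response(wrapped, "tool_response")
```

A consumer that still searches for the old underscore spelling gets `None` back, which is the kind of silent mismatch the comment warns about.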

@dreadnode-renovate-bot dreadnode-renovate-bot bot added the area/github (Changes made to GitHub Actions) label Jun 25, 2025
@monoxgas monoxgas merged commit 5506ed6 into main Jun 25, 2025
7 checks passed
@monoxgas monoxgas deleted the feat/slicing-and-tokenizing-updates branch June 25, 2025 08:15

Labels

  • area/docs: Changes to documentation and guides
  • area/examples: Changes to example code and demonstrations
  • area/github: Changes made to GitHub Actions
  • area/python: Changes to Python package configuration and dependencies
  • area/tests: Changes to test files and testing infrastructure
  • type/docs: Documentation updates and improvements

Projects

None yet

2 participants