This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
This repository is a Rust implementation of twitter-text that parses tweet text using a Pest PEG grammar. It includes multiple language bindings (Ruby, Python, Java, C++) and maintains conformance with the original twitter-text specification.
The project uses Bazel as the primary build system, with Cargo available for standalone Rust development.
New crates are add by modifying //3rdparty:crates_vendor, running that, and then making sure the Cargo.toml files match.
# Build all Rust code
bazel build //rust/...
# Run all tests
bazel test //rust/...
# Build specific language bindings
bazel build //rust/ruby-bindings:twittertext
bazel build //rust/java-bindings:twitter_text_java_ffm
bazel build //rust/cpp-bindings/...
# Run specific test suites
bazel test //rust/ruby-bindings/spec:all
bazel test //rust/java-bindings/tests:all
bazel test //rust/cpp-bindings/...# Build Rust workspace
cargo build
# Run Rust tests
cargo test
# Test a specific package
cargo test -p twitter-text
cargo test -p twitter-text-parser-
PEG Grammar Parser (
rust/parser/)- Uses Pest parser generator with
twitter_text.pestgrammar - Parses URLs, @mentions, #hashtags, $cashtags, and emojis
- Order of entity matching is significant (URLs before hashtags to resolve ambiguities)
- Uses Pest parser generator with
-
Main Library (
rust/twitter-text/src/)lib.rs: Public API andTwitterTextParseResultsstructextractor.rs: Core extraction logic, validates URLs and character weightsvalidator.rs: Tweet validation (length limits, invalid characters)autolinker.rs: Converts entities to HTML linkshit_highlighter.rs: Highlights search hits in tweet textentity.rs: Entity data structures
-
Configuration (
rust/config/)- Loads config from
config/v*.json - Defines character weights and URL length limits
- Used for tweet length calculation (280 weighted characters)
- Loads config from
-
Conformance Tests (
rust/conformance/)- Tests against YAML test suites in
conformance/directory - Cross-platform test definitions shared with other twitter-text implementations
- Do not modify conformance YAML files directly; they are canonical
- Tests against YAML test suites in
- Ruby (
rust/ruby-bindings/): Uses Magnus FFI, requires Ruby 3.3+ - Java (
rust/java-bindings/): Uses JDK 23+ Foreign Function & Memory API (Project Panama) - Python (
rust/python-bindings/): Uses PyO3 - C++ (
rust/cpp-bindings/): Uses cxx.rs for safe Rust-C++ interop - Swift (
rust/swift-bindings/): Uses C FFI with auto-generated Clang modules
# Bazel test with filter
bazel test //rust/twitter-text:tests --test_filter=test_name
# Cargo test with filter
cargo test test_name
# Run specific conformance test file
bazel test //rust/conformance:autolink_testThe conformance suite (conformance/*.yml) defines canonical test cases. All implementations must pass these tests. Tests cover:
- Autolink (URL/mention/hashtag linking)
- Extract (entity extraction)
- Validation (tweet validity)
- Hit highlighting
- URLs are weighted based on
Configuration.transformed_url_length - Character weights are defined in
Configuration.ranges(most characters: 2, ASCII/Latin-1: 1) - Tweet length limit: 280 weighted characters
- Input text is normalized to Unicode NFC form
- Use
ValidatingExtractor::new_with_nfc_input()if input is already NFC-normalized for efficiency
The PEG grammar processes entities in this order:
- URLs (including t.co short URLs)
- Hashtags
- Mentions
- Cashtags
This order prevents ambiguities (e.g., goo.gl/4udoLK could match as hashtag but should be URL).
- Rust Changes: Edit code in
rust/directories, runcargo testfor quick feedback - Grammar Changes: Modify
rust/parser/src/twitter_text.pest, rebuild parser - Conformance: Ensure changes pass
bazel test //rust/conformance:all - Bindings: Test language bindings after core changes to ensure FFI compatibility
- Rust: 1.91.1 (edition 2021)
- Bazel: 8.4.2+
- Ruby: 3.3+ (for Ruby bindings, requires libyaml)
- Java: JDK 23+ (for Java FFM bindings)
- Python: 3.12 (for Python bindings)
- LLVM: 17.0.6 (hermetic toolchain via Bazel)
The Swift bindings (rust/swift-bindings/) provide a Swift-friendly API over the shared C FFI layer.
# Build Swift library (works on both Linux and macOS)
bazel build //rust/swift-bindings:TwitterText
# Produces: libTwitterText.a, TwitterText.swiftmodule, TwitterText.swiftdocOn macOS (recommended):
# Tests work via Bazel on macOS
bazel test //rust/swift-bindings:TwitterTextTestsOn Linux:
# Bazel tests don't work on Linux due to rules_swift linker compatibility
# Use Swift Package Manager instead:
cd rust/swift-bindings
swift test- Linux: Uses hermetic Swift 6.0.3 toolchain downloaded by Bazel (see
MODULE.bazel) - macOS: Uses system Swift from Xcode (automatically detected by rules_swift)
- C Interop: C headers from
//rust/ffi-bindingsare auto-wrapped viaswift_interop_hint - Module Name: Swift code imports
CTwitterTextto access C FFI functions
Swift binary tests cannot run via Bazel on Linux because rules_swift passes Darwin-specific linker flags (-add_ast_path) that ld.lld doesn't understand. This is a rules_swift limitation when targeting Linux. The workaround is to use Swift Package Manager for testing on Linux, or test on macOS where the native toolchain works correctly.
- Ruby bindings require system libyaml (
brew install libyamlon macOS) - Java bindings use jextract to generate FFM code from C headers
- Sanitizer tests available:
bazel test //rust/cpp-sanitizer-bindings/... - MODULE.bazel contains all external dependencies and toolchain configurations
- Swift bindings share the same C FFI layer (
//rust/ffi-bindings) with Java and C++ bindings