Commit 70f8e8f

Update README.md
1 parent 35cf616 commit 70f8e8f


README.md

Lines changed: 90 additions & 15 deletions
@@ -1,26 +1,101 @@
1-
twitter-text in Rust
2-
============
1+
# twitter-text in Rust
32

4-
This repo is a Rust implementation of twitter-text. All aspects of tweet text are parsed by a [Pest](https://github.com/pest-parser/pest) [PEG](https://en.wikipedia.org/wiki/Parsing_expression_grammar) grammar, with the exception of URL length and character weighting. See the [parser](rust/parser/src) directory for the grammar. Procedural validation for URL lengths and character weights is performed by the [Extractor](rust/twitter-text/src/extractor.rs) code.
3+
A Rust implementation of [twitter-text](https://github.com/twitter/twitter-text) that parses tweet text using a [Pest](https://github.com/pest-parser/pest) [PEG](https://en.wikipedia.org/wiki/Parsing_expression_grammar) grammar. Includes bindings for Ruby, Python, Java, C++, Swift, and WebAssembly.
54

6-
To run the tests, [install Rust](https://www.rust-lang.org/tools/install), and then try this in the terminal:
5+
## Features
6+
7+
- **Entity extraction**: URLs, @mentions, #hashtags, $cashtags, and emoji
8+
- **Tweet validation**: 280 weighted character limit with configurable weights
9+
- **Autolinking**: Convert entities to HTML links
10+
- **Hit highlighting**: Highlight search terms in tweet text
11+
- **Unicode 17.0**: Full emoji support including ZWJ sequences and skin tone modifiers
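
The weighted character limit can be illustrated with a small, self-contained sketch. It does not use this crate's API: `Config`, `WeightRange`, `assumed_v3_config`, `weighted_length`, and `is_valid` are hypothetical names, and the ranges and weights are assumed from the upstream twitter-text v3 configuration, so they may differ from what `rust/config/` ships. URL length substitution, emoji sequences, and Unicode normalization are left out.

```rust
/// A code point range (inclusive) with its own weight.
struct WeightRange {
    start: u32,
    end: u32,
    weight: u32,
}

/// Hypothetical configuration shape; not the type used by this repo.
struct Config {
    scale: u32,
    default_weight: u32,
    max_weighted_length: u32,
    ranges: Vec<WeightRange>,
}

/// Values assumed from the upstream twitter-text v3 configuration.
fn assumed_v3_config() -> Config {
    Config {
        scale: 100,
        default_weight: 200,
        max_weighted_length: 280,
        ranges: vec![
            WeightRange { start: 0, end: 4351, weight: 100 },
            WeightRange { start: 8192, end: 8205, weight: 100 },
            WeightRange { start: 8208, end: 8223, weight: 100 },
            WeightRange { start: 8242, end: 8247, weight: 100 },
        ],
    }
}

/// Sum the per-code-point weights, then divide by the scale.
/// Simplification: URLs and emoji sequences get no special handling here.
fn weighted_length(text: &str, cfg: &Config) -> u32 {
    let raw: u32 = text
        .chars()
        .map(|c| {
            let cp = c as u32;
            cfg.ranges
                .iter()
                .find(|r| r.start <= cp && cp <= r.end)
                .map_or(cfg.default_weight, |r| r.weight)
        })
        .sum();
    raw / cfg.scale
}

/// A text is valid in this sketch if it is non-empty and within the limit.
fn is_valid(text: &str, cfg: &Config) -> bool {
    !text.is_empty() && weighted_length(text, cfg) <= cfg.max_weighted_length
}
```

Under these assumed weights, Latin text counts one unit per character and CJK text counts two, so `weighted_length("hello", &cfg)` is 5 and `weighted_length("日本語", &cfg)` is 6.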
12+
13+
## Quick Start
14+
15+
### Using Cargo
16+
17+
```bash
18+
cargo build
19+
cargo test
720
```
8-
> cargo build
9-
> cargo test
21+
22+
### Using Bazel
23+
24+
```bash
25+
# Build everything
26+
bazel build //rust/...
27+
28+
# Run all tests
29+
bazel test //rust/...
1030
```
1131

12-
### Ruby Bindings
32+
## Language Bindings
1333

14-
The Ruby bindings require **Ruby 3.4.1 or higher**. If you're on macOS with the system Ruby (2.6.x), you'll need to install a newer version:
34+
| Language | Directory | Requirements | Technology |
35+
|----------|-----------|--------------|------------|
36+
| Ruby | `rust/ruby-bindings/` | Ruby 3.3+ | [Magnus](https://github.com/matsadler/magnus) FFI |
37+
| Python | `rust/python-bindings/` | Python 3.12 | [PyO3](https://github.com/PyO3/pyo3) |
38+
| Java | `rust/java-bindings/` | JDK 23+ | Foreign Function & Memory API |
39+
| C++ | `rust/cpp-bindings/` | C++17 | [cxx.rs](https://github.com/dtolnay/cxx) |
40+
| Swift | `rust/swift-bindings/` | Swift 6.0+ | C FFI |
41+
| WebAssembly | `rust/wasm-bindings/` | - | [wasm-bindgen](https://github.com/rustwasm/wasm-bindgen) |
42+
43+
### Building Bindings
1544

16-
**Option 1: Homebrew (simplest)**
1745
```bash
18-
brew install ruby
46+
# Ruby
47+
bazel build //rust/ruby-bindings:twittertext
48+
49+
# Python
50+
bazel build //rust/python-bindings:twitter_text
51+
52+
# Java
53+
bazel build //rust/java-bindings:twitter_text_java_ffm
54+
55+
# C++
56+
bazel build //rust/cpp-bindings/...
57+
58+
# Swift
59+
bazel build //rust/swift-bindings:TwitterText
60+
61+
# WebAssembly
62+
bazel build //rust/wasm-bindings:twitter_text_wasm
1963
```
2064

21-
**Option 2: Ruby version manager**
22-
* [rbenv](https://github.com/rbenv/rbenv)
23-
* [rvm](https://rvm.io/)
24-
* [asdf](https://asdf-vm.com/)
65+
## Architecture
66+
67+
### Core Components
68+
69+
- **PEG Grammar Parser** (`rust/parser/`): Pest grammar for parsing tweet entities
70+
- **Main Library** (`rust/twitter-text/`): Extraction, validation, autolinking, and highlighting
71+
- **Configuration** (`rust/config/`): Character weights and URL length settings
72+
- **Conformance Tests** (`rust/conformance/`): Tests against canonical twitter-text test suites
73+
74+
### Entity Parsing Order
75+
76+
The grammar processes entities in this order to resolve ambiguities:
77+
1. URLs (including t.co short URLs)
78+
2. Hashtags
79+
3. Mentions
80+
4. Cashtags
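
One way to picture how a fixed rule order resolves ambiguities is the sketch below. It is a hand-written, standard-library-only stand-in, not the Pest grammar in `rust/parser/`: `Kind`, `match_url`, `match_sigil`, and `extract` are hypothetical names, and the matchers are far looser than the real rules. At each position it tries the rules in the order listed above and keeps the first that matches.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind {
    Url,
    Hashtag,
    Mention,
    Cashtag,
}

/// Hypothetical URL matcher: an http(s) scheme followed by non-whitespace.
/// Returns the byte length of the match at the start of `s`.
fn match_url(s: &str) -> Option<usize> {
    let prefix_len = if s.starts_with("https://") {
        8
    } else if s.starts_with("http://") {
        7
    } else {
        return None;
    };
    let rest = &s[prefix_len..];
    let body = rest.find(char::is_whitespace).unwrap_or(rest.len());
    if body == 0 { None } else { Some(prefix_len + body) }
}

/// Hypothetical matcher for `#tag`, `@name`, `$SYM`: a sigil followed by
/// one or more alphanumeric or underscore characters.
fn match_sigil(s: &str, sigil: char) -> Option<usize> {
    let rest = s.strip_prefix(sigil)?;
    let body: usize = rest
        .chars()
        .take_while(|c| c.is_alphanumeric() || *c == '_')
        .map(char::len_utf8)
        .sum();
    if body == 0 { None } else { Some(sigil.len_utf8() + body) }
}

/// Try the rules in priority order at each position; the first match wins
/// and its text is consumed, so later rules never see it.
fn extract(text: &str) -> Vec<(Kind, &str)> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < text.len() {
        let rest = &text[i..];
        let hit = match_url(rest)
            .map(|n| (Kind::Url, n))
            .or_else(|| match_sigil(rest, '#').map(|n| (Kind::Hashtag, n)))
            .or_else(|| match_sigil(rest, '@').map(|n| (Kind::Mention, n)))
            .or_else(|| match_sigil(rest, '$').map(|n| (Kind::Cashtag, n)));
        match hit {
            Some((kind, n)) => {
                out.push((kind, &rest[..n]));
                i += n;
            }
            // No rule matched: skip one character (not one byte).
            None => i += rest.chars().next().map_or(1, char::len_utf8),
        }
    }
    out
}
```

With this ordering, the input `see https://example.com/#top #rust` yields one URL and one hashtag: `#top` is consumed as part of the URL because the URL rule runs first, and only `#rust` is reported as a hashtag.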
81+
82+
## Dependencies
83+
84+
- **Rust**: 1.91.1+
85+
- **Bazel**: 8.4.2+ (for full build)
86+
- **Ruby**: 3.3+ (requires libyaml: `brew install libyaml` on macOS)
87+
- **Python**: 3.12
88+
- **Java**: JDK 23+
89+
- **LLVM**: 17.0.6 (hermetic toolchain via Bazel)
90+
91+
## Conformance
92+
93+
This implementation passes the canonical twitter-text conformance tests in `conformance/*.yml`. These tests cover:
94+
- Autolink (URL/mention/hashtag linking)
95+
- Extract (entity extraction)
96+
- Validation (tweet validity)
97+
- Hit highlighting
98+
99+
## License
25100

26-
The Ruby bindings use the [magnus](https://github.com/matsadler/magnus) crate which requires Ruby 3.2.3+ APIs.
101+
Apache 2.0
