Commit e4d9e1a

Add crate main documentation
1 parent afc0712 commit e4d9e1a

2 files changed

+95
-15
lines changed

src/json_schema/mod.rs

Lines changed: 15 additions & 15 deletions
@@ -7,19 +7,19 @@
77
//! for regex generation.
88
//!
99
//! ## Supported features
10-
//!
10+
//!
1111
//! Note that only some JSON Schema features are supported for regex generation.
12-
//!
12+
//!
1313
//! ### Supported constraints
14-
//!
14+
//!
1515
//! #### Common
1616
//! - `type`
1717
//! - Specifies the data type (string, number, integer, boolean, array, object, null).
1818
//! - `enum`
1919
//! - Lists the allowed values.
2020
//! - `const`
2121
//! - Specifies a single allowed value.
22-
//!
22+
//!
2323
//! #### Object
2424
//! - `properties`
2525
//! - Defines the expected properties of an object and their schemas.
@@ -52,7 +52,7 @@
5252
//! - `format`
5353
//! - Specifies a predefined format; the supported formats are listed in [`FormatType`].
5454
//!
55-
//! #### Number
55+
//! #### Number
5656
//! - `minDigitsInteger`
5757
//! - Specifies minimum number of digits in the integer part of a numeric value.
5858
//! - `maxDigitsInteger`
@@ -65,36 +65,36 @@
6565
//! - Defines minimum number of digits in the exponent part of a scientific notation number.
6666
//! - `maxDigitsExponent`
6767
//! - Defines maximum number of digits in the exponent part of a scientific notation number.
68-
//!
68+
//!
6969
//! #### Integer
7070
//! - `minDigits`
7171
//! - Defines the minimum number of digits.
7272
//! - `maxDigits`
7373
//! - Defines the maximum number of digits.
74-
//!
74+
//!
7575
//! #### Logical
7676
//! - `allOf`
7777
//! - Combines multiple schemas; all must be valid.
7878
//! - `anyOf`
7979
//! - Combines multiple schemas; at least one must be valid.
8080
//! - `oneOf`
8181
//! - Combines multiple schemas; exactly one must be valid.
82-
//!
82+
//!
8383
//! ### Recursion
84-
//!
84+
//!
8585
//! Currently, the maximum recursion depth is cautiously set to 3.
86-
//!
86+
//!
8787
//! Note that recursion is generally a poor fit for regular expressions because of their inherent limitations
8888
//! and inefficiencies, especially when applied to complex patterns or large inputs.
89-
//!
90-
//! Even simple self-referential JSON schemas can often produce an enormous regex, since its size grows
89+
//!
90+
//! Even simple self-referential JSON schemas can often produce an enormous regex, since its size grows
9191
//! exponentially in the recursive case, which is likely to cause performance issues by consuming large
9292
//! amounts of time, memory, and other resources.
93-
//!
93+
//!
9494
//! ### References
95-
//!
95+
//!
9696
//! Only local references are currently supported.
97-
//!
97+
//!
9898
//! ### Unconstrained objects
9999
//!
100100
//! An empty object means unconstrained, allowing any JSON type.
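
The recursion note in this file says that regex size grows exponentially once a schema refers to itself. The sketch below is a toy model of that claim, not the crate's expansion logic: `expand`, the `null` leaf, and the `branching` parameter are hypothetical names chosen for illustration, and the only figure taken from the documentation is the depth limit of 3.

```rust
/// Hypothetical model, not the crate's code: a pattern that embeds
/// `branching` copies of itself, expanded down to `depth` levels.
fn expand(depth: usize, branching: usize) -> String {
    if depth == 0 {
        // Recursion budget exhausted: fall back to a leaf pattern.
        return String::from("null");
    }
    let child = expand(depth - 1, branching);
    let children = vec![child; branching].join(",");
    // Wrap the children in escaped braces, as an object pattern would.
    format!("\\{{{}\\}}", children)
}

fn main() {
    // The documented limit is depth 3; compare how fast each branching factor grows.
    for depth in 0..=3 {
        println!(
            "depth {}: {} bytes with 2 self-references, {} bytes with 3",
            depth,
            expand(depth, 2).len(),
            expand(depth, 3).len()
        );
    }
}
```

With two self-references per level the pattern roughly doubles at every step, which is why a low depth cap keeps the generated regex manageable.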

src/lib.rs

Lines changed: 80 additions & 0 deletions
@@ -1,3 +1,83 @@
1+
//! # Outlines_core
2+
//!
3+
//! The `outlines_core` crate provides a convenient way to:
4+
//!
5+
//! - build regular expressions from JSON schemas
6+
//!
7+
//! - construct an [`index::Index`] object by combining a [`vocabulary::Vocabulary`] and a regular
8+
//! expression to efficiently map tokens from the given `Vocabulary` to state transitions in a
9+
//! finite-state automaton
10+
//!
11+
//! ## `json_schema`
12+
//!
13+
//! The [`json_schema`] module provides functions that generate a regular expression from a given JSON schema, depending on how the schema is represented:
14+
//! - [`json_schema::regex_from_str`]
15+
//! - [`json_schema::regex_from_value`]
16+
//!
17+
//! The whitespace pattern can be customized; otherwise, the default [`json_schema::WHITESPACE`] pattern is used.
18+
//!
19+
//! Note that not all JSON Schema features are supported for regex generation; see [Supported Features](json_schema#supported-features).
20+
//!
21+
//! ## `Index`
22+
//!
23+
//! Once an [`index::Index`] is built, it can be used to evaluate or validate token sequences.
24+
//!
25+
//! ### Complexity and construction cost
26+
//!
27+
//! `Index` can accommodate large vocabularies and complex regular expressions. However, its size, as well as
28+
//! the time and computational resources needed to build it, **may** grow significantly with the complexity of the input.
29+
//!
30+
//! ## Python bindings
31+
//!
32+
//! Additionally, the crate provides interfaces to integrate its functionality with Python.
33+
//!
34+
//! ## Support
35+
//!
36+
//! `outlines_core` is primarily used in the structured text generation project [`outlines`](https://github.com/dottxt-ai/outlines).
37+
//!
38+
//! ## Example
39+
//!
40+
//! Basic example of how it all fits together.
41+
//!
42+
//! ```rust
43+
//! # use outlines_core::Error;
44+
//! use outlines_core::prelude::*;
45+
//!
46+
//! # fn main() -> Result<(), Error> {
47+
//! // Define a JSON schema
48+
//! let schema = r#"{
49+
//!     "type": "object",
50+
//!     "properties": {
51+
//!         "name": { "type": "string" },
52+
//!         "age": { "type": "integer" }
53+
//!     },
54+
//!     "required": ["name", "age"]
55+
//! }"#;
56+
//!
57+
//! // Generate a regular expression from it
58+
//! let regex = json_schema::regex_from_str(&schema, None)?;
59+
//! println!("Generated regex: {}", regex);
60+
//!
61+
//! // Create a `Vocabulary` from a pretrained large language model (it can also be built manually)
62+
//! let vocabulary = Vocabulary::from_pretrained("openai-community/gpt2", None);
63+
//!
64+
//! // Create a new `Index` from the regex and the given `Vocabulary`
65+
//! let index = Index::new(regex, &vocabulary)?;
66+
//!
67+
//! let initial_state = index.initial_state();
68+
//! println!("Is initial state {} a final state? {}", initial_state, index.is_final_state(&initial_state));
69+
//!
70+
//! let allowed_tokens = index.allowed_tokens(&initial_state).expect("Some allowed tokens");
71+
//! println!("Allowed tokens at initial state are {:?}", allowed_tokens);
72+
//!
73+
//! let token_id = allowed_tokens.first().expect("First token");
74+
//! println!("Next state for the token_id {} is {:?}", token_id, index.next_state(&initial_state, token_id));
75+
//! println!("Final states are {:?}", index.final_states());
76+
//! println!("Index has exactly {} transitions", index.transitions().len());
77+
//! # Ok(())
78+
//! # }
79+
//! ```
80+
181
pub mod error;
282
pub mod index;
383
pub mod json_schema;
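
The crate documentation added in this commit describes `Index` as mapping vocabulary tokens to state transitions in a finite-state automaton. As a rough illustration of that idea, and not the crate's actual `Index` implementation, the following std-only sketch hard-codes a two-state automaton for `[0-9]+` and precomputes which tokens are allowed in each state. The names `step`, `walk`, `build_index`, and the four-token vocabulary are assumptions made for this example.

```rust
use std::collections::BTreeMap;

/// Toy automaton for `[0-9]+` (hypothetical stand-in for a compiled regex):
/// state 0 is the start state, state 1 is the only accepting state.
fn step(state: u32, byte: u8) -> Option<u32> {
    match (state, byte) {
        (0 | 1, b'0'..=b'9') => Some(1),
        _ => None,
    }
}

/// Feed every byte of `token` through the automaton; `None` means rejected.
fn walk(state: u32, token: &str) -> Option<u32> {
    token.bytes().try_fold(state, step)
}

/// Precompute `state -> (token_id -> next_state)` for every state and token.
fn build_index(states: &[u32], vocabulary: &[(&str, u32)]) -> BTreeMap<u32, BTreeMap<u32, u32>> {
    let mut index = BTreeMap::new();
    for &state in states {
        let mut row = BTreeMap::new();
        for &(token, token_id) in vocabulary {
            // Skip empty tokens: they would look like a no-op transition.
            if token.is_empty() {
                continue;
            }
            if let Some(next) = walk(state, token) {
                row.insert(token_id, next);
            }
        }
        index.insert(state, row);
    }
    index
}

fn main() {
    // Hypothetical vocabulary of (token text, token id) pairs.
    let vocabulary: [(&str, u32); 4] = [("1", 0), ("42", 1), ("a", 2), ("7b", 3)];
    let index = build_index(&[0, 1], &vocabulary);
    for (state, row) in &index {
        println!("state {state}: allowed token ids {:?}", row.keys().collect::<Vec<_>>());
    }
}
```

Building the table once up front keeps lookups during generation cheap, at the cost of the construction time and memory that the documentation warns about for large vocabularies and complex patterns.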
