Commit 4a7d2dc

Minor optimisation
1 parent 3b7cd4f commit 4a7d2dc

19 files changed, +255 -121 lines changed

README.md

Lines changed: 51 additions & 33 deletions
@@ -1,44 +1,48 @@
-# Celma
+# Celma
 
 [![stable](http://badges.github.io/stability-badges/dist/stable.svg)](http://github.com/badges/stability-badges)
 
 [Celma ("k")noun "channel" (KEL) in Quenya](https://www.elfdict.com/w/kelma)
 
-Celma is a generalised parser combinator implementation. Generalised means not an implementation restricted to a stream of characters.
+Celma is a generalised parser combinator implementation. Generalised means not an implementation restricted to a stream
+of characters.
 
 ## Overview
 
-Generalization is the capability to design a parser based on pipelined parsers and separate parsers regarding their semantic level.
+Generalization is the capability to design a parser based on pipelined parsers and separate parsers regarding their
+semantic level.
 
 # Celma parser meta language
 
 ## Grammar
+
 In order to have a seamless parser definition two dedicated `proc_macro` are designed:
 
 ```rust
-parsec_rules = "pub"? "let" ident ('{' rust_type '}')? (':' '{' rust_type '}')? "=" parser)+
+parsec_rules = "pub" ? "let" ident ('{' rust_type '}') ? (':' '{' rust_type '}') ? "=" parser) +
 parser = binding? atom occurrence? additional? transform?
 ```
 
 ```rust
 binding = ident '='
 occurrence = ("*" | "+" | "?")
-additional = "|"? parser
+additional = "|" ? parser
 transform = "->" '{' rust_code '}'
 atom = alter? '(' parser ')' | CHAR | STRING | ident
-alter = ("^"|"!"|"#"|"/")
-ident = [a..zA..Z][a..zA..Z0..9_]* - {"let"}
+alter = ("^" | "!" | "#" | "/")
+ident = [a..zA..Z][a..zA..Z0..9_] * - {"let"}
 ```
 
 The `alter` is an annotation where:
+
 - `^` allows the capability to recognize negation,
-- `!` allows the capability to backtrack on failure and
+- `!` allows the capability to backtrack on failure and
 - `#` allows the capability to capture all chars.
 - `/` allows the capability to lookahead without consuming scanned elements.
 
-The `#` alteration is important because it prevents massive list construction in memory.
+The `#` alteration is important because it prevents massive list construction in memory.
 
-## Using the meta-language
+## Using the meta-language
 
 Therefore, a parser can be defined using this meta-language.
 
@@ -50,7 +54,8 @@ let parser = parsec!(
 
 ## A Full Example: JSON
 
-A [JSon parser](https://github.com/d-plaindoux/celma/blob/master/macro/benches/json.rs#L61) can be designed thanks to the Celma parser meta language.
+A [JSon parser](https://github.com/d-plaindoux/celma/blob/master/macro/benches/json.rs#L61) can be designed thanks to
+the Celma parser meta language.
 
 ### JSon abstract data type
 
@@ -66,7 +71,7 @@ pub enum JSON {
 }
 ```
 
-### Transformation functions
+### Transformation functions
 
 ```rust
 fn mk_vec<E>(a: Option<(E, Vec<E>)>) -> Vec<E> {
@@ -91,7 +96,7 @@ fn mk_f64(a: Vec<char>) -> f64 {
 
 ### The JSon parser
 
-The JSon parser is define by six rules dedicated to `number`, `string`, `null`, `boolean`, `array`
+The JSon parser is define by six rules dedicated to `number`, `string`, `null`, `boolean`, `array`
 and `object`.
 
 #### JSON Rules
@@ -123,9 +128,11 @@ parsec_rules!(
 
 ## The expression parser thanks to pipelined parsers.
 
-The previous parser mixes char analysis and high-level term construction. This can be done in a different manner since Celma is a generalized parser combinator implementation.
+The previous parser mixes char analysis and high-level term construction. This can be done in a different manner since
+Celma is a generalized parser combinator implementation.
 
-For instance a first parser dedicated to lexeme recognition can be designed. Then on top of this lexer an expression parser can be easily designed.
+For instance a first parser dedicated to lexeme recognition can be designed. Then on top of this lexer an expression
+parser can be easily designed.
 
 ### Tokenizer
 
@@ -142,7 +149,7 @@ parsec_rules!(
 
 ### Lexemes
 
-The Lexeme parser recognizes simple token keywords.
+The Lexeme parser recognizes simple token keywords.
 
 ```rust
 parsec_rules!(
@@ -155,8 +162,10 @@ parsec_rules!(
 
 ### Expression parser
 
-The expression parser builds expression consuming tokens. For this purpose the stream type can be specified for each parser. If it's not the case the default one is `char`.
-In the following example the declaration `expr{Token}:{Expr}` denotes a parser consuming a `Token` stream and producing an `Expr`.
+The expression parser builds expression consuming tokens. For this purpose the stream type can be specified for each
+parser. If it's not the case the default one is `char`.
+In the following example the declaration `expr{Token}:{Expr}` denotes a parser consuming a `Token` stream and producing
+an `Expr`.
 
 ```rust
 parsec_rules!(
@@ -173,50 +182,59 @@ parsec_rules!(
 
 ```rust
 let tokenizer = token();
-let stream = ParserStream::new(&tokenizer, CharStream::new("1 + 2"));
+let stream = ParserStream::new( & tokenizer, CharStream::new("1 + 2"));
 let response = expr().and_left(eos()).parse(stream);
 
 match response {
-Success(v, _, _) => assert_eq!(v.eval(), 3),
-_ => assert_eq!(true, false),
+Success(v, _, _) => assert_eq!(v.eval(), 3),
+_ => assert_eq!(true, false),
 }
 ```
 
 # Celma language internal design
 
 Celma is an embedded language in Rust for building simple parsers.
 The language is processed when Rust is compiled. To this end, we
-identify two steps. The first is to analyse the language using a
-syntax analyser in a direct style. Then, this parser is invoked
-during the compilation phase, using a procedural macro dedicated
+identify two steps. The first is to analyse the language using a
+syntax analyser in a direct style. Then, this parser is invoked
+during the compilation phase, using a procedural macro dedicated
 to Rust to manage the language in Rust.
 
 ## V0
 
-In the V0 the transpilation is a direct style Parsec generation without any
-optimisations cf. [celma parser in direct style](https://github.com/d-plaindoux/celma/blob/master/lang/v0/parser/src/parser.rs).
+In V0, transpilation is a direct style generation of Parsec without any
+optimisations. To this end, the `AST` is translated directly into a parser
+parser using the `core` library.
+cf. [celma parser in direct style](https://github.com/d-plaindoux/celma/blob/master/lang/v0/parser/src/parser.rs).
 
 ## V1
 
-This version target an aggressive and an efficient parser compilation. For this
-purpose the compilation follows a traditional control and data flow mainly inspired
-by the papers like [A Typed, Algebraic Approach to Parsing](https://www.cl.cam.ac.uk/~jdy22/papers/a-typed-algebraic-approach-to-parsing.pdf)
-and [Fusing Lexing and Parsing](https://www.cl.cam.ac.uk/~jdy22/papers/fusing-lexing-and-parsing.pdf).
+This version targets an aggressive and an efficient parser compilation. For this
+purpose the compilation follows a traditional control and data flow inspired by
+the following papers:
+- [A Typed, Algebraic Approach to Parsing](https://www.cl.cam.ac.uk/~jdy22/papers/a-typed-algebraic-approach-to-parsing.pdf) nd
+- [Fusing Lexing and Parsing](https://www.cl.cam.ac.uk/~jdy22/papers/fusing-lexing-and-parsing.pdf).
+
+### Celma AST generation
 
 First, we express [Celma in Celma](https://github.com/d-plaindoux/celma/blob/master/lang/v1/parser/src/parser.rs).
 This gives us an AST denoting parsers expressed using the Celma language i.e. Celma(v1) thanks to Celma(v0).
 
 ### Normalisation
 
-Work in progress
+The first step is to produce the **Deterministic Greibach Normal Form**
+of a given grammar. For this purpose we have a first AST for the grammar
+abstract denotation.
+
+NOTE: Work in progress
 
 ### Fusion
 
-Work in progress
+NOTE: Work in progress
 
 ### Staging
 
-Work in progress
+NOTE: Work in progress
 
 # License
 
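
The README hunks above describe pipelined parsers: a tokenizer turns a `char` stream into tokens, and an expression parser then consumes those tokens. The sketch below illustrates that two-stage idea in plain Rust, without Celma. The `Token` variants and the `tokenize` and `eval` functions are hypothetical names chosen for this illustration, and the grammar is reduced to integer addition; it is not the code generated by `parsec_rules!`.

```rust
// Hypothetical, Celma-free illustration of a two-stage (pipelined) parser.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Int(i64),
    Plus,
}

// Stage 1: consume characters and produce tokens.
// Returns None on any character that is not a digit, '+', or whitespace.
// Overflow of very long integer literals is not handled in this sketch.
fn tokenize(input: &str) -> Option<Vec<Token>> {
    let mut tokens = Vec::new();
    let mut chars = input.chars().peekable();
    while let Some(&c) = chars.peek() {
        if c.is_whitespace() {
            chars.next();
        } else if c == '+' {
            chars.next();
            tokens.push(Token::Plus);
        } else if c.is_ascii_digit() {
            let mut n: i64 = 0;
            while let Some(d) = chars.peek().and_then(|c| c.to_digit(10)) {
                n = n * 10 + i64::from(d);
                chars.next();
            }
            tokens.push(Token::Int(n));
        } else {
            return None;
        }
    }
    Some(tokens)
}

// Stage 2: consume tokens (not characters) and evaluate `Int ('+' Int)*`.
fn eval(tokens: &[Token]) -> Option<i64> {
    let mut iter = tokens.iter();
    let mut acc = match iter.next()? {
        Token::Int(n) => *n,
        _ => return None,
    };
    loop {
        match iter.next() {
            None => return Some(acc),
            Some(Token::Plus) => match iter.next()? {
                Token::Int(n) => acc += *n,
                _ => return None,
            },
            Some(_) => return None,
        }
    }
}
```

With these definitions, `tokenize("1 + 2").and_then(|t| eval(&t))` evaluates to `Some(3)`, which mirrors the `assert_eq!(v.eval(), 3)` check in the README example. The second stage never looks at characters, which is the separation by semantic level that the README attributes to generalised parsers.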

genlex/tests/lib.rs

Lines changed: 0 additions & 1 deletion
@@ -13,4 +13,3 @@
 See the License for the specific language governing permissions and
 limitations under the License.
 */
-

lang/v0/ast/src/syntax.rs

Lines changed: 0 additions & 2 deletions
@@ -32,8 +32,6 @@ pub enum ASTParsec {
     PLookahead(Box<ASTParsec>),
 }
 
-
-
 impl ASTParsec {
     pub fn wrap(self) -> Box<Self> {
         Box::new(self)

lang/v0/core/benches/array_bench.rs

Lines changed: 2 additions & 2 deletions
@@ -17,7 +17,7 @@
 #[macro_use]
 extern crate bencher;
 
-use bencher::{black_box, Bencher};
+use bencher::{Bencher, black_box};
 
 use celma_v0_core::parser::and::AndOperation;
 use celma_v0_core::parser::core::any;
@@ -103,7 +103,7 @@ fn basic_a_and_b(bencher: &mut Bencher) {
 }
 
 fn basic_delimited_string(bencher: &mut Bencher) {
-    let data = b"\"hello\"".to_vec().repeat(SIZE);
+    let data = b"\"hello world!\"".to_vec().repeat(SIZE);
 
     let parser = u8('"')
         .and(not_u8('"').opt_rep())

lang/v0/core/benches/iter_bench.rs

Lines changed: 1 addition & 1 deletion
@@ -17,7 +17,7 @@
 #[macro_use]
 extern crate bencher;
 
-use bencher::{black_box, Bencher};
+use bencher::{Bencher, black_box};
 
 use celma_v0_core::parser::and::AndOperation;
 use celma_v0_core::parser::char::a_char;

lang/v0/core/benches/str_bench.rs

Lines changed: 2 additions & 2 deletions
@@ -17,7 +17,7 @@
 #[macro_use]
 extern crate bencher;
 
-use bencher::{black_box, Bencher};
+use bencher::{Bencher, black_box};
 
 use celma_v0_core::parser::and::AndOperation;
 use celma_v0_core::parser::char::a_char;
@@ -94,7 +94,7 @@ fn basic_a_and_b(bencher: &mut Bencher) {
 }
 
 fn basic_delimited_string(bencher: &mut Bencher) {
-    let string = "\"hello\"".repeat(SIZE);
+    let string = "\"hello world!\"".repeat(SIZE);
     let data = string.as_str();
 
     let parser = delimited_string().opt_rep().and(eos());

lang/v0/core/examples/expression.rs

Lines changed: 3 additions & 2 deletions
@@ -61,6 +61,7 @@ where
     S: Stream<Item = char>,
 {
     alpha()
+        .or(a_char('🙃'))
         .rep()
         .map(|v| Token::Ident(v.into_iter().collect::<String>()))
 }
@@ -126,8 +127,8 @@ fn main() {
         _ => println!("KO"),
     }
 
-    match ident().and(eos()).left().parse(CharStream::new("Toto")) {
-        Success(Token::Ident(ref s), _, _) if *s == String::from("Toto") => {
+    match ident().and(eos()).left().parse(CharStream::new("Toto🙃")) {
+        Success(Token::Ident(ref s), _, _) if *s == String::from("Toto🙃") => {
             println!("Ident = {}", s)
         }
         _ => println!("KO"),
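
The hunk above extends the identifier parser with an explicit `a_char('🙃')` alternative. A plausible reason, assuming `alpha()` accepts roughly what `char::is_alphabetic` accepts (an assumption, not confirmed by this diff), is that an emoji is not an alphabetic character and would otherwise end the identifier. The sketch below shows the same acceptance rule in plain Rust; the `ident` function here is a hypothetical stand-in and not the Celma combinator.

```rust
// Hypothetical stand-in for `alpha().or(a_char('🙃')).rep()`:
// accept one or more characters that are alphabetic or the emoji,
// and return the recognised prefix together with the remaining input.
fn ident(input: &str) -> Option<(&str, &str)> {
    let end = input
        .char_indices()
        .find(|&(_, c)| !(c.is_alphabetic() || c == '🙃'))
        .map_or(input.len(), |(i, _)| i);
    if end == 0 {
        // `rep()` requires at least one match, so an empty prefix is a failure.
        None
    } else {
        // `end` comes from `char_indices`, so it is always a char boundary.
        Some(input.split_at(end))
    }
}
```

Here `ident("Toto🙃")` consumes the whole input, whereas without the `c == '🙃'` alternative the match would stop after `Toto`, since `'🙃'.is_alphabetic()` is `false`.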

lang/v0/core/src/parser/literal.rs

Lines changed: 10 additions & 8 deletions
@@ -73,14 +73,16 @@ pub fn escaped<'a, S>() -> impl Parse<char, S> + Combine<char> + 'a
 where
     S: Stream<Item = char> + 'a,
 {
-    string(r#"\'"#)
-        .map(|_| '\'')
-        .or(string(r#"\""#).map(|_| '\"'))
-        .or(string(r#"\\"#).map(|_| '\\'))
-        .or(string(r#"\n"#).map(|_| '\n'))
-        .or(string(r#"\r"#).map(|_| '\r'))
-        .or(string(r#"\t"#).map(|_| '\t'))
-        .or(string(r#"\0"#).map(|_| '\0'))
+    a_char('\\').and_right(
+        a_char('\'')
+            .map(|_| '\'')
+            .or(a_char('"').map(|_| '\"'))
+            .or(a_char('\\').map(|_| '\\'))
+            .or(a_char('n').map(|_| '\n'))
+            .or(a_char('r').map(|_| '\r'))
+            .or(a_char('t').map(|_| '\t'))
+            .or(a_char('0').map(|_| '\0')),
+    )
     // etc. TODO
 }
 
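
The `escaped` change above factors out the common backslash prefix: instead of trying up to seven two-character strings, each of which re-reads the backslash, the new version reads the backslash once and then dispatches on a single character. The sketch below shows both shapes in plain Rust so the equivalence can be checked. The function names `escaped_unfactored` and `escaped_factored` are hypothetical, and the code does not use the Celma combinators.

```rust
// Before: every alternative starts with the backslash, so a failing
// alternative re-examines the same first character each time.
fn escaped_unfactored(input: &str) -> Option<(char, &str)> {
    const TABLE: [(&str, char); 7] = [
        ("\\'", '\''),
        ("\\\"", '"'),
        ("\\\\", '\\'),
        ("\\n", '\n'),
        ("\\r", '\r'),
        ("\\t", '\t'),
        ("\\0", '\0'),
    ];
    TABLE
        .iter()
        .find_map(|&(pat, c)| input.strip_prefix(pat).map(|rest| (c, rest)))
}

// After: consume the backslash once, then decide on the next character.
fn escaped_factored(input: &str) -> Option<(char, &str)> {
    let mut chars = input.strip_prefix('\\')?.chars();
    let c = match chars.next()? {
        '\'' => '\'',
        '"' => '"',
        '\\' => '\\',
        'n' => '\n',
        'r' => '\r',
        't' => '\t',
        '0' => '\0',
        _ => return None,
    };
    // `as_str` yields the input that remains after the escape sequence.
    Some((c, chars.as_str()))
}
```

Both functions accept the same inputs and return the same results. The factored form fails immediately when the input does not start with a backslash, which is presumably the "minor optimisation" named in the commit message.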

lang/v0/macro/benches/json.rs

Lines changed: 1 addition & 1 deletion
@@ -17,7 +17,7 @@
 #[macro_use]
 extern crate bencher;
 
-use bencher::{black_box, Bencher};
+use bencher::{Bencher, black_box};
 
 use celma_v0_core::parser::and::AndOperation;
 use celma_v0_core::parser::char::{digit, space};

lang/v0/parser/src/parser.rs

Lines changed: 2 additions & 2 deletions
@@ -264,8 +264,8 @@ where
     skip().and_right(parsec()).and_left(skip()).and_left(eos())
 }
 
-pub fn celma_parsec_rules<'a, S>(
-) -> impl Parse<Vec<ASTParsecRule>, S> + Combine<Vec<ASTParsecRule>> + 'a
+pub fn celma_parsec_rules<'a, S>()
+-> impl Parse<Vec<ASTParsecRule>, S> + Combine<Vec<ASTParsecRule>> + 'a
 where
     S: Stream<Item = char> + 'a,
 {
