Commit 58f5324

add ixmlGrammar(), add option FAIL_ON_ERROR
1 parent 5de45d3 commit 58f5324

10 files changed, +148 -43 lines changed


README.md

Lines changed: 37 additions & 12 deletions
@@ -12,49 +12,68 @@ Markup Blitz is an implementation of [Invisible XML][IXML] (ixml). Please see th

 # Building Markup Blitz

-Use JDK 11 or higher. For building Markup Blitz, use these commands:
+Use JDK 11 or higher. For building Markup Blitz, clone this GitHub repository and go to the resulting directory:

 ```sh
 git clone https://github.com/GuntherRademacher/markup-blitz.git
 cd markup-blitz
+```
+
+Then run this command, on Unix/Linux,
+
+```sh
+./gradlew clean jar
+```
+
+or this one, on Windows
+
+```sh
 gradlew clean jar
 ```

-This creates `build\libs\markup-blitz.jar` which serves the Markup Blitz API. It is also usable as an executable jar for standalone execution.
+This creates `build/libs/markup-blitz-1.1.jar` which serves the Markup Blitz API. It is also usable as an executable jar for standalone execution.

 # Running tests

-For running the tests, use this command:
+For running the tests, use this command on Unix/Linux,
+
+```sh
+./gradlew test
+```
+
+or this one on Windows:

 ```sh
 gradlew test
 ```

-Markup Blitz comes with a few tests, but it also passes all of the 3091 tests in the Invisible XML community project [ixml][GHIXML]. For running those as well, make sure that the [ixml][GHIXML] project is available next to the Markup Blitz project and use the above command.
+Markup Blitz comes with a few tests, but it also passes all of the 3091 tests in the Invisible XML community project [ixml][GHIXML]. For running those as well, make sure that the [ixml][GHIXML] project is available next to the [markup-blitz][markup-blitz] project and use the above command.

 # Markup Blitz in Eclipse

 The project can be imported into Eclipse as a Gradle project.

 # Markup Blitz on Maven Central

-Markup Blitz is available on Maven Central with groupId `de.bottlecaps` and artifactId `markup-blitz`.
+Markup Blitz is available on [Maven Central][maven-central] with groupId `de.bottlecaps` and artifactId `markup-blitz`.

 # Running Markup Blitz from command line

 Markup Blitz can be run from command line to process some input according to an Invisible XML grammar:

 ```txt
-Usage: java -jar markup-blitz.jar [<OPTION>...] <GRAMMAR> <INPUT>
+Usage: java -jar markup-blitz-1.1.jar [<OPTION>...] [<GRAMMAR>] <INPUT>

 Compile an Invisible XML grammar, and parse input with the resulting parser.

 <GRAMMAR> the grammar (literal, file name or URL), in ixml notation.
+When omitted, the ixml grammar will be used.
 <INPUT> the input (literal, file name or URL).

 <OPTION>:
 --indent generate resulting xml with indentation.
 --trace print parser trace.
+--fail-on-error throw an exception instead of returning an error document.
 --timing print timing information.
 --verbose print intermediate results.

@@ -100,15 +119,19 @@ public String parse(String input, Option... options)
 **Returns:** `String`: the resulting XML

 ### de.bottlecaps.markup.Blitz.Option
-Either of the `generate` and `parse` methods accepts `Option` arguments for creating extra diagnostic output. Generation time options are passed to the `Parser` object implicitly, and they are used at parsing time, when `parse`is called without any options.
+Either of the `generate` and `parse` methods accepts `Option` arguments for creating extra diagnostic output. Generation time options are passed to the `Parser` object implicitly, and they are used at parsing time, when `parse` is called without any options.
+
 ```java
+/** Parser and generator options. */
 public enum Option {
-    /** Print information on intermediate results. */ VERBOSE,
-    /** Print timing information. */ TIMING,
-    /** Generate XML with indentation. */ INDENT,
-    /** Print parser trace. */ TRACE;
+    /** Parser option: Generate XML with indentation. */ INDENT,
+    /** Parser option: Print parser trace. */ TRACE,
+    /** Parser option: Fail on parsing error. */ FAIL_ON_ERROR,
+    /** Generator option: Print timing information. */ TIMING,
+    /** Generator option: Print information on intermediate results. */ VERBOSE;
 }
 ```
+
 # Performance

 As with [REx Parser Generator][REx], the goal of Markup Blitz is to provide good performance. In general, however, REx parsers can be expected to perform much better. This is primarily because REx allows separating the specification into tokenization and parsing steps. This is in contrast to Invisible XML, which uses a uniform grammar to resolve from the start symbol down to codepoint level. Separate tokenization enables the use of algorithms optimized for this purpose, the establishment of token termination rules, and the easy accommodation of whitespace rules. Without it, all of this has to be accomplished by the parser alone, which often leads to costly handling of local ambiguities.
@@ -138,5 +161,7 @@ The work in this project was supported by the [BaseX][BaseX] organization.
 [parser]: https://en.wikipedia.org/wiki/Parsing#Parser
 [parse-tree]: https://en.wikipedia.org/wiki/Parse_tree
 [parser-generator]: https://en.wikipedia.org/wiki/Compiler-compiler
-[fnInvisibleXml]: https://github.com/qt4cg/qtspecs/issues/238
+[fnInvisibleXml]: https://qt4cg.org/pr/791/xpath-functions-40/Overview.html#func-invisible-xml
 [BXFiddle]: https://bxfiddle.cloud.basexgmbh.de/
+[markup-blitz]: https://github.com/GuntherRademacher/markup-blitz
+[maven-central]: https://github.com/GuntherRademacher/markup-blitz

src/main/java/de/bottlecaps/markup/Blitz.java

Lines changed: 37 additions & 11 deletions
@@ -25,18 +25,22 @@
  * @author Gunther Rademacher
  */
 public class Blitz {
-    /** Generation time and parse time options. */
+    /** The ixml grammar resource. */
+    public final static String IXML_GRAMMAR_RESOURCE = "de/bottlecaps/markup/blitz/ixml.ixml";
+
+    /** Parser and generator options. */
     public enum Option {
-        /** Generate XML with indentation. */ INDENT,
-        /** Print parser trace. */ TRACE,
-        /** Print timing information. */ TIMING,
-        /** Print information on intermediate results. */ VERBOSE;
+        /** Parser option: Generate XML with indentation. */ INDENT,
+        /** Parser option: Print parser trace. */ TRACE,
+        /** Parser option: Fail on parsing error. */ FAIL_ON_ERROR,
+        /** Generator option: Print timing information. */ TIMING,
+        /** Generator option: Print information on intermediate results. */ VERBOSE;
     }

     /**
      * Generate a parser from an Invisible XML grammar in ixml notation.
      *
-     * @param grammar the Invisible XML grammar in ixml notation
+     * @param grammar the Invisible XML grammar in ixml notation.
      * @param blitzOptions options for use at generation time and parsing time
      * @return the generated parser
      * @throws BlitzException if any error is detected while generating the parser
@@ -117,10 +121,12 @@ else if (args[i].startsWith("-"))
                 break;
         }

-        if (i != args.length - 2)
+        if (i != args.length - 2 && i != args.length - 1)
             usage(1);
-        String grammar = args[i];
-        String input = args[i + 1];
+        String grammar = i == args.length - 1
+            ? ixmlGrammar()
+            : args[i];
+        String input = args[args.length - 1];

         String grammarString = grammar.startsWith("!")
             ? grammar.substring(1)
@@ -137,16 +143,23 @@ else if (args[i].startsWith("-"))
     }

     private static void usage(int exitCode) {
-        System.err.println("Usage: java -jar markup-blitz.jar [<OPTION>...] <GRAMMAR> <INPUT>");
+        String resource = Blitz.class.getResource("/" + Blitz.class.getName().replace('.', '/') + ".class").toString();
+        final String origin = resource.startsWith("jar:")
+            ? "-jar " + resource.replaceFirst("^.*/([^/]+.jar)!.*$", "$1")
+            : Blitz.class.getName();
+
+        System.err.println("Usage: java " + origin + " [<OPTION>...] [<GRAMMAR>] <INPUT>");
         System.err.println();
         System.err.println(" Compile an Invisible XML grammar, and parse input with the resulting parser.");
         System.err.println();
         System.err.println(" <GRAMMAR> the grammar (literal, file name or URL), in ixml notation.");
+        System.err.println(" When omitted, the ixml grammar will be used.");
         System.err.println(" <INPUT> the input (literal, file name or URL).");
         System.err.println();
         System.err.println(" <OPTION>:");
         System.err.println(" --indent generate resulting xml with indentation.");
         System.err.println(" --trace print parser trace.");
+        System.err.println(" --fail-on-error throw an exception instead of returning an error document.");
         System.err.println(" --timing print timing information.");
         System.err.println(" --verbose print intermediate results.");
         System.err.println();
@@ -197,8 +210,21 @@ public static URL url(final String input) {
             return uri.toURL();
         }
         catch (Exception e) {
-            throw new BlitzException("failed to process URL: " + input, e);
+            throw new BlitzException("Failed to process URL: " + input, e);
         }
     }

+    /**
+     * Return the ixml grammar as a string.
+     *
+     * @return the ixml grammar
+     */
+    public static String ixmlGrammar() {
+        try {
+            return urlContent(Blitz.class.getClassLoader().getResource(IXML_GRAMMAR_RESOURCE));
+        }
+        catch (IOException e) {
+            throw new BlitzException("Failed to access ixml grammar resource " + IXML_GRAMMAR_RESOURCE, e);
+        }
+    }
 }
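
The two command-line changes in `Blitz.java` can be exercised in isolation: `usage()` now derives the launch form from where `Blitz.class` was loaded, and `main()` accepts either `<GRAMMAR> <INPUT>` or just `<INPUT>`, substituting the ixml grammar in the latter case. The sketch below reproduces both rules. Option handling is simplified to "any argument starting with `-`", and the default grammar is a caller-supplied placeholder rather than `Blitz.ixmlGrammar()`; `CliSketch`, `origin` and `select` are hypothetical names.

```java
import java.util.Arrays;
import java.util.Optional;
import java.util.function.Supplier;

/** Hypothetical stand-alone model of the CLI changes; not Markup Blitz's code. */
public class CliSketch {

    /** Mirrors usage(): "-jar <file>.jar" when loaded from a jar, else the class name. */
    static String origin(String classResourceUrl, String className) {
        // Same pattern as the commit; its '.' before "jar" is unescaped and matches any character.
        return classResourceUrl.startsWith("jar:")
            ? "-jar " + classResourceUrl.replaceFirst("^.*/([^/]+.jar)!.*$", "$1")
            : className;
    }

    /** Mirrors main(): returns {grammar, input}, or empty where main() would call usage(1). */
    static Optional<String[]> select(String[] args, Supplier<String> defaultGrammar) {
        int i = 0;
        while (i < args.length && args[i].startsWith("-")) // skip options (simplified)
            ++i;
        if (i != args.length - 2 && i != args.length - 1)
            return Optional.empty();
        String grammar = i == args.length - 1 ? defaultGrammar.get() : args[i];
        String input = args[args.length - 1];
        return Optional.of(new String[] {grammar, input});
    }

    public static void main(String[] args) {
        System.out.println(origin(
            "jar:file:/tmp/build/libs/markup-blitz-1.1.jar!/de/bottlecaps/markup/Blitz.class",
            "de.bottlecaps.markup.Blitz"));                                    // -jar markup-blitz-1.1.jar
        Supplier<String> ixml = () -> "<ixml grammar placeholder>";
        System.out.println(Arrays.toString(
            select(new String[] {"--indent", "!S: 'a'.", "!a"}, ixml).get())); // [!S: 'a'., !a]
        System.out.println(Arrays.toString(
            select(new String[] {"--indent", "!S: 'a'."}, ixml).get()));      // [<ixml grammar placeholder>, !S: 'a'.]
        System.out.println(select(new String[] {"--indent"}, ixml).isPresent()); // false
    }
}
```

With a single remaining argument, that argument is always the input, so `java -jar markup-blitz-1.1.jar --indent "!S: 'a'."` parses an ixml grammar with the ixml grammar itself.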

src/main/java/de/bottlecaps/markup/blitz/Parser.java

Lines changed: 11 additions & 7 deletions
@@ -560,27 +560,31 @@ public String parse(String input, Option... options) {
             }
         }
         catch (BlitzIxmlException e) {
+            if (currentOptions.contains(Option.FAIL_ON_ERROR))
+                throw e;
             Nonterminal ixml = new Nonterminal("ixml", new Symbol[] {new Insertion(e.getMessage().codePoints().toArray())});
             ixml.addChildren(attribute("xmlns:ixml", IXML_NAMESPACE));
             ixml.addChildren(attribute("ixml:state", "failed"));
             ixml.addChildren(attribute("ixml:error-code", e.getError().name()));
             b.stack[0] = new Nonterminal("root", new Symbol[] {ixml});
         }
         catch (BlitzException e) {
+            if (currentOptions.contains(Option.FAIL_ON_ERROR))
+                throw e;
             Nonterminal ixml = new Nonterminal("ixml", new Symbol[] {new Insertion(e.getMessage().codePoints().toArray())});
             ixml.addChildren(attribute("xmlns:ixml", IXML_NAMESPACE));
             ixml.addChildren(attribute("ixml:state", "failed"));
             b.stack[0] = new Nonterminal("root", new Symbol[] {ixml});
         }
+        finally {
+            if (currentOptions.contains(Option.TIMING)) {
+                long t1 = System.currentTimeMillis();
+                System.err.println(" ixml parsing time: " + (t1 - t0) + " msec");
+            }
+        }

         b.serialize(s);
-        String result = w.toString();
-
-        if (currentOptions.contains(Option.TIMING)) {
-            long t1 = System.currentTimeMillis();
-            System.err.println(" ixml parsing time: " + (t1 - t0) + " msec");
-        }
-        return result;
+        return w.toString();
     }

     private Nonterminal attribute(String name, String value) {
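
With this change, `Parser.parse` either rethrows the caught `BlitzIxmlException` or `BlitzException` (when `FAIL_ON_ERROR` is in effect) or wraps its message in an `ixml:state="failed"` error document, and the timing report moves into a `finally` block so it is printed on every path, including the rethrow. The sketch below reproduces that control flow in isolation. `SketchException`, `doParse` and the returned strings are hypothetical simplifications, not the library's types or exact output.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical model of the FAIL_ON_ERROR control flow; not Markup Blitz's code. */
public class FailOnErrorSketch {
    enum Option { FAIL_ON_ERROR, TIMING }

    /** Stand-in for the library's parse exceptions. */
    static class SketchException extends RuntimeException {
        SketchException(String message) { super(message); }
    }

    /** Hypothetical parse step that accepts only the input "a". */
    static String doParse(String input) {
        if (!input.equals("a"))
            throw new SketchException("Failed to parse input");
        return "<S>a</S>";
    }

    static String parse(String input, Set<Option> options) {
        long t0 = System.currentTimeMillis();
        try {
            return doParse(input);
        }
        catch (SketchException e) {
            if (options.contains(Option.FAIL_ON_ERROR))
                throw e; // hand the failure to the caller
            // otherwise report it as an error document (no XML escaping in this sketch)
            return "<ixml xmlns:ixml=\"http://invisiblexml.org/NS\" ixml:state=\"failed\">"
                + e.getMessage() + "</ixml>";
        }
        finally {
            // runs on success, on the error-document path, and on the rethrow
            if (options.contains(Option.TIMING))
                System.err.println(" parsing time: " + (System.currentTimeMillis() - t0) + " msec");
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("b", EnumSet.noneOf(Option.class)));
        try {
            parse("b", EnumSet.of(Option.FAIL_ON_ERROR, Option.TIMING));
        }
        catch (SketchException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

One side effect visible in the diff: the reported "ixml parsing time" no longer includes `b.serialize(s)`, which now runs after the `finally` block instead of before the timing output.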

src/test/resources/ixml.ixml renamed to src/main/resources/de/bottlecaps/markup/blitz/ixml.ixml

File renamed without changes.
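
Moving `ixml.ixml` from `src/test/resources` to `src/main/resources/de/bottlecaps/markup/blitz` puts it on the runtime classpath, so Gradle packages it into the jar and `Blitz.ixmlGrammar()` can load it through `ClassLoader.getResource(IXML_GRAMMAR_RESOURCE)` outside of tests. The sketch below shows that loading pattern under the assumption of UTF-8 content; `ResourceSketch` and `readResource` are hypothetical, not Markup Blitz's `urlContent`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for reading a classpath text resource; not Markup Blitz's code. */
public class ResourceSketch {
    static String readResource(String name) throws IOException {
        // ClassLoader paths have no leading slash, e.g. "de/bottlecaps/markup/blitz/ixml.ixml"
        URL url = ResourceSketch.class.getClassLoader().getResource(name);
        if (url == null) // getResource signals a missing resource with null, not an exception
            throw new IOException("resource not found: " + name);
        try (InputStream in = url.openStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8); // readAllBytes: Java 9+
        }
    }

    public static void main(String[] args) {
        try {
            System.out.println(readResource("de/bottlecaps/markup/blitz/ixml.ixml").length());
        }
        catch (IOException e) {
            System.out.println(e.getMessage()); // expected unless markup-blitz is on the classpath
        }
    }
}
```

By contrast, `Class.getResource` resolves names relative to the class's package unless they start with `/`, which is why `usage()` prefixes its `.class` path with a slash.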

src/test/java/de/bottlecaps/markup/BlitzTest.java

Lines changed: 51 additions & 2 deletions
@@ -5,6 +5,7 @@

 import java.util.Set;

+import org.junit.jupiter.api.Assertions;
 import org.junit.jupiter.api.Test;

 import de.bottlecaps.markup.Blitz.Option;
@@ -109,8 +110,8 @@ public void testCss() {

     @Test
     public void testIxml() {
-        Parser parser = generate(resourceContent("ixml.ixml"), Option.INDENT); // , Option.TIMING);
-        String xml = parser.parse(resourceContent("ixml.ixml"));
+        Parser parser = generate(resourceContent(Blitz.IXML_GRAMMAR_RESOURCE), Option.INDENT); // , Option.TIMING);
+        String xml = parser.parse(resourceContent(Blitz.IXML_GRAMMAR_RESOURCE));
         assertEquals(resourceContent("ixml.xml"), xml);
     }

@@ -654,6 +655,54 @@ public void testUnicodeVersion() {
             result);
     }

+    @Test
+    public void testIxmlGrammar() {
+        Parser parser = generate(Blitz.ixmlGrammar());
+        String result = parser.parse(
+            "S: 'a'.",
+            Option.INDENT);
+        assertEquals(
+            "<ixml>\n"
+            + " <rule name=\"S\">\n"
+            + " <alt>\n"
+            + " <literal string=\"a\"/>\n"
+            + " </alt>\n"
+            + " </rule>\n"
+            + "</ixml>",
+            result);
+    }
+
+    @Test
+    public void testErrorDocument() {
+        Parser parser = Blitz.generate("S: 'a'.");
+        String result = parser.parse("b");
+        assertEquals(
+            "<ixml xmlns:ixml=\"http://invisiblexml.org/NS\" ixml:state=\"failed\">Failed to parse input:\n"
+            + "lexical analysis failed\n"
+            + "while expecting 'a'\n"
+            + "at line 1, column 1:\n"
+            + "...b...</ixml>",
+            result);
+    }
+
+    @Test
+    public void testFailOnError() {
+        Parser parser = Blitz.generate("S: 'a'.", Option.FAIL_ON_ERROR);
+        try {
+            String result = parser.parse("b");
+            Assertions.fail("Parse did not fail, returned: \n" + result);
+        }
+        catch (BlitzParseException e) {
+            assertEquals(
+                "Failed to parse input:\n"
+                + "lexical analysis failed\n"
+                + "while expecting 'a'\n"
+                + "at line 1, column 1:\n"
+                + "...b...",
+                e.getMessage());
+        }
+    }
+
 // @Test
 // public void test() {
 // Parser parser = generate(

src/test/java/de/bottlecaps/markup/blitz/grammar/IxmlTest.java

Lines changed: 4 additions & 4 deletions
@@ -12,11 +12,11 @@
 import org.junit.jupiter.api.Disabled;
 import org.junit.jupiter.api.Test;

+import de.bottlecaps.markup.Blitz;
 import de.bottlecaps.markup.TestBase;

 public class IxmlTest extends TestBase {
     private static final String invisiblexmlOrgUrl = "https://invisiblexml.org/1.0/ixml.ixml";
-    private static final String ixmlResource = "ixml.ixml";

     private static final String githubJsonIxmlUrl = "https://raw.githubusercontent.com/GuntherRademacher/rex-parser-benchmark/main/src/main/resources/de/bottlecaps/rex/benchmark/json/parsers/xquery/json.ixml";
     private static final String jsonIxmlResource = "json.ixml";
@@ -26,7 +26,7 @@ public class IxmlTest extends TestBase {

     @BeforeAll
     public static void beforeAll() throws URISyntaxException, IOException {
-        ixmlIxmlResourceContent = resourceContent(ixmlResource);
+        ixmlIxmlResourceContent = resourceContent(Blitz.IXML_GRAMMAR_RESOURCE);
         jsonIxmlResourceContent = resourceContent(jsonIxmlResource);
     }

@@ -36,7 +36,7 @@ public void testIxmlResource() throws Exception {
         String expectedResult = ixmlIxmlResourceContent
             .replaceAll("^\\{[^\n]*\n", "")
             .replaceAll("\\. \\{[^\n]*\n", ".\n");
-        assertEquals(expectedResult, grammar.toString(), "roundtrip failed for " + ixmlResource);
+        assertEquals(expectedResult, grammar.toString(), "roundtrip failed for " + Blitz.IXML_GRAMMAR_RESOURCE);
     }

     @Test
@@ -60,7 +60,7 @@ public void testInvisiblexmlOrgUrlContent() throws Exception {
         String expectedResult = ixmlIxmlResourceContent
             .replaceAll("^\\{[^\n]*\n", "")
             .replaceAll("\\. \\{[^\n]*\n", ".\n");
-        testUrlContent(invisiblexmlOrgUrl, ixmlResource, expectedResult);
+        testUrlContent(invisiblexmlOrgUrl, Blitz.IXML_GRAMMAR_RESOURCE, expectedResult);
     }

     private void testUrlContent(String url, String resource, String expectedResult) throws MalformedURLException {

src/test/java/de/bottlecaps/markup/blitz/ixml/IxmlCommunityTest.java

Lines changed: 2 additions & 1 deletion
@@ -43,6 +43,7 @@
 import org.junit.jupiter.params.provider.MethodSource;
 import org.junit.jupiter.params.support.AnnotationConsumer;

+import de.bottlecaps.markup.Blitz;
 import de.bottlecaps.markup.BlitzException;
 import de.bottlecaps.markup.TestBase;
 import de.bottlecaps.markup.blitz.Parser;
@@ -359,7 +360,7 @@ private void test(TestCase testCase) {
             assertNull(input, "unexpected input for grammar test " + testCase.getName());
             assertEquals(1, testCase.getOutputs().size(), "expected a single reference output for grammar test");
             if (ixmlParser == null) {
-                String ixmlIxmlResourceContent = resourceContent("ixml.ixml");
+                String ixmlIxmlResourceContent = resourceContent(Blitz.IXML_GRAMMAR_RESOURCE);
                 ixmlParser = generate(ixmlIxmlResourceContent);
             }
             String xmlRepresentation = ixmlParser.parse(testCase.getGrammar());

src/test/java/de/bottlecaps/markup/blitz/transform/TestBNF.java

Lines changed: 2 additions & 2 deletions
@@ -14,6 +14,7 @@
 import org.junit.jupiter.api.BeforeAll;
 import org.junit.jupiter.api.Test;

+import de.bottlecaps.markup.Blitz;
 import de.bottlecaps.markup.TestBase;
 import de.bottlecaps.markup.blitz.Parser;
 import de.bottlecaps.markup.blitz.codepoints.RangeSet;
@@ -30,15 +31,14 @@
 import de.bottlecaps.markup.blitz.grammar.Term;

 public class TestBNF extends TestBase {
-    private static final String ixmlResource = "ixml.ixml";
     private static final String jsonIxmlResource = "json.ixml";

     private static String ixmlIxmlResourceContent;
     private static String jsonIxmlResourceContent;

     @BeforeAll
     public static void beforeAll() throws URISyntaxException, IOException {
-        ixmlIxmlResourceContent = resourceContent(ixmlResource);
+        ixmlIxmlResourceContent = resourceContent(Blitz.IXML_GRAMMAR_RESOURCE);
         jsonIxmlResourceContent = resourceContent(jsonIxmlResource);
     }
