
Commit d449d74

readme and benchmarks
1 parent d7f456b commit d449d74


2 files changed: +46 / -54 lines


README.md

Lines changed: 36 additions & 43 deletions
````diff
@@ -60,7 +60,7 @@ push!(parent::Node, child::Node)
 parent[2] = child
 ```
 
-- Bring convenience functions into your namespace with `using XML.NodeConstructors`:
+- `using XML.NodeConstructors` will give you access to convenience functions (`document`, `cdata`, `element`, etc.) for creating `Node`s.
 
 ```julia
 using XML.NodeConstructors
````

````diff
@@ -70,27 +70,37 @@ cdata("hello > < ' \" I have odd characters")
 # Node CDATA <![CDATA[hello > < ' " I have odd characters]]>
 ```
 
-### `XML.RowNode`
-- A data structure that can used as a *Tables.jl* source. It is only lazy in how it accesses its children.
+### `XML.LazyNode`
 
+A lazy data structure that just keeps track of the position in the raw data (`Vector{UInt8}`) to read from.
 
-### `XML.RawData`
-- A super lazy data structure that holds the reference `Vector{UInt8}` data along with position/length to read from.
+- Iteration in depth first search (DFS) order. This is the natural order in which you would visit XML nodes by reading an XML document from top to bottom.
 
+```julia
+doc = LazyNode(filename)
 
-## Reading
+foreach(println, doc)
+# LazyNode DECLARATION <?xml version="1.0"?>
+# LazyNode ELEMENT <catalog>
+# LazyNode ELEMENT <book id="bk101">
+# LazyNode ELEMENT <author>
+# LazyNode TEXT "Gambardella, Matthew"
+# LazyNode ELEMENT <title>
+#
+```
 
-```julia
-XML.RawData(filename)
 
-RowNode(filename)
+## Reading
 
+```julia
+# Reading from file:
 Node(filename)
+LazyNode(filename)
 
-# Parsing:
-parse(XML.RawData, str)
-parse(RowNode, str)
+# Parsing from string:
 parse(Node, str)
+parse(LazyNode, str)
+
 ```
 
 ## Writing
````
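
The new `XML.LazyNode` text describes a node that only records where to read in the raw `Vector{UInt8}` and iterates in depth-first order. The toy below illustrates that idea. It is hypothetical: `ToyTag`, `ToyScanner`, and `tagtext` are made-up names, not XML.jl's API or implementation. It also makes simplifying assumptions. It recognizes only `<...>` tags, ignores text nodes, comments, and CDATA, and assumes no `>` appears inside attribute values.

```julia
# Hypothetical toy, NOT XML.jl's implementation: a "lazy" tag that stores only a
# reference to the raw bytes plus the position/length of the tag to read from.
struct ToyTag
    data::Vector{UInt8}  # shared reference to the whole document (never copied)
    pos::Int             # byte index of the opening '<'
    len::Int             # number of bytes up to and including the closing '>'
end

# Text is materialized only on demand.
tagtext(t::ToyTag) = String(t.data[t.pos:t.pos + t.len - 1])

# Iterator over opening/declaration tags in the order they appear in the bytes
# (top to bottom), which for XML is depth-first pre-order.
struct ToyScanner
    data::Vector{UInt8}
end

Base.IteratorSize(::Type{ToyScanner}) = Base.SizeUnknown()
Base.eltype(::Type{ToyScanner}) = ToyTag

function Base.iterate(s::ToyScanner, i::Int = 1)
    data = s.data
    while i <= length(data)
        start = findnext(==(UInt8('<')), data, i)
        start === nothing && return nothing
        stop = findnext(==(UInt8('>')), data, start)
        stop === nothing && return nothing
        # Closing tags such as </author> end a node rather than start one: skip them.
        if start < length(data) && data[start + 1] == UInt8('/')
            i = stop + 1
            continue
        end
        return ToyTag(data, start, stop - start + 1), stop + 1
    end
    return nothing
end

# Made-up input shaped like the README's example output.
xml = "<?xml version=\"1.0\"?><catalog><book id=\"bk101\"><author>Gambardella, Matthew</author><title>Example</title></book></catalog>"
tags = collect(ToyScanner(Vector{UInt8}(xml)))
foreach(t -> println(tagtext(t)), tags)
```

Every `ToyTag` shares the same `data` vector, so scanning allocates no strings until `tagtext` is called. Skipping `</...>` tags avoids the pitfall that the removed README text warned about for `XML.RawData`, whose iterator also yielded closing tags that are not nodes.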

````diff
@@ -103,39 +113,22 @@ XML.write(io::IO, node) # write to stream
 XML.write(node) # String
 ```
 
-## Iteration
 
-```julia
-doc = XML.RowNode(filename)
-
-foreach(println, doc)
-# RowNode DECLARATION <?xml version="1.0">
-# RowNode ELEMENT <catalog> (12 children)
-# RowNode ELEMENT <book id="bk101"> (6 children)
-# RowNode ELEMENT <author> (1 child)
-# RowNode TEXT "Gambardella, Matthew"
-# RowNode ELEMENT <title> (1 child)
-#
 
-# Use as Tables.jl source:
-using DataFrames
 
-DataFrame(doc)
-```
+## Performance
 
-Note that you can also iterate through `XML.RawData`. However, *BEWARE* that this iterator
-has some non-node elements (e.g. just the closing tag of an element).
+- Comparing benchmarks (fairly) between packages is hard.
+- The fairest comparison is between "XML.jl - Node load" and "XMLDict.jl - read", in which XMLDict is about 1.4x slower.
+- See the `benchmarks/suite.jl` file.
 
-```julia
-data = XML.RawData(filename)
-
-foreach(println, data)
-# 1: RAW_DECLARATION (pos=1, len=20): <?xml version="1.0"?>
-# 1: RAW_ELEMENT_OPEN (pos=23, len=8): <catalog>
-# 2: RAW_ELEMENT_OPEN (pos=36, len=16): <book id="bk101">
-# 3: RAW_ELEMENT_OPEN (pos=60, len=7): <author>
-# 4: RAW_TEXT (pos=68, len=19): Gambardella, Matthew
-# 3: RAW_ELEMENT_CLOSE (pos=88, len=8): </author> <------ !!! NOT A NODE !!!
-# 3: RAW_ELEMENT_OPEN (pos=104, len=6): <title>
-#
-```
+| Benchmark | Code | Median time | Median GC |
+|-----------|------|-------------|-----------|
+| XML.jl - Raw Data load | `XML.Raw($file)` | 10.083 μs | 0.00% |
+| XML.jl - LazyNode load | `LazyNode($file)` | 10.250 μs | 0.00% |
+| XML.jl - collect LazyNode | `collect(LazyNode($file))` | 102.149 ms | 24.51% |
+| XML.jl - Node load | `Node($file)` | 1.085 s | 16.16% |
+| EzXML.jl - read | `EzXML.readxml($file)` | 192.345 ms | N/A |
+| XMLDict.jl - read | `XMLDict.xml_dict(read($file, String))` | 1.525 s | 23.17% |
+| XML.jl - LazyNode iteration | `for x in XML.LazyNode($file); end` | 67.547 ms | 16.55% |
+| EzXML.StreamReader | `r = open(EzXML.StreamReader, $file); for x in r; end; close(r)` | 142.340 ms | N/A |
````
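
The "1.4x slower" figure in the Performance bullets is a ratio of median times from the table. The sketch below reproduces that arithmetic using only the table's numbers. The `slowdown` helper and the `medians_ms` dictionary are hypothetical names introduced for illustration.

```julia
# Median times copied from the table above, converted to milliseconds.
medians_ms = Dict(
    "XML.jl - Node load"          => 1085.0,   # 1.085 s
    "XMLDict.jl - read"           => 1525.0,   # 1.525 s
    "XML.jl - LazyNode iteration" => 67.547,
    "EzXML.StreamReader"          => 142.340,
)

# How many times slower benchmark `a` is than benchmark `b` (ratio of medians).
slowdown(a, b) = medians_ms[a] / medians_ms[b]

r_load = slowdown("XMLDict.jl - read", "XML.jl - Node load")
r_iter = slowdown("EzXML.StreamReader", "XML.jl - LazyNode iteration")
println(round(r_load; digits = 1))  # 1.4
println(round(r_iter; digits = 1))  # 2.1
```

Applied to the iteration rows, the same ratio puts `EzXML.StreamReader` at roughly 2.1x the median time of `LazyNode` iteration. As the README itself cautions, such cross-package ratios are only as fair as the work each benchmark actually performs.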

benchmarks/suite.jl

Lines changed: 10 additions & 11 deletions
````diff
@@ -4,19 +4,18 @@ using XMLDict: XMLDict
 using BenchmarkTools
 
 
+# nasa.xml was downloaded from:
 # http://aiweb.cs.washington.edu/research/projects/xmltk/xmldata/www/repository.html#nasa
 file = joinpath(@__DIR__, "nasa.xml")
 
 #-----------------------------------------------------------------------------# Read
-@info "XML.Raw" @benchmark XML.Raw($file)
-@info "XML.LazyNode" @benchmark XML.LazyNode($file)
-# @info "XML.Node" @benchmark Node($file)
-# @info "XML.RowNode" @benchmark XML.RowNode($file)
-# @info "EzXML.readxml" @benchmark EzXML.readxml($file)
-# @info "XMLDict.xml_dict" @benchmark XMLDict.xml_dict(read($file, String))
+@info "XML.Raw" @benchmark XML.Raw($file) # median: 10.083 μs (0.00% GC)
+@info "XML.LazyNode" @benchmark XML.LazyNode($file) # median: 10.250 μs (0.00% GC)
+@info "collect(XML.LazyNode)" @benchmark collect(XML.LazyNode($file)) # median: 102.149 ms (24.51% GC)
+@info "XML.Node" @benchmark Node($file) # median: 1.085 s (16.16% GC)
+@info "EzXML.readxml" @benchmark EzXML.readxml($file) # median: 192.345 ms
+@info "XMLDict.xml_dict" @benchmark XMLDict.xml_dict(read($file, String)) # median: 1.525 s (23.17% GC)
 
-# #-----------------------------------------------------------------------------# Iteration
-# @info "XML.RawData iteration" @benchmark (for x in XML.RawData($file); end)
-# @info "XML.RowNode iteration" @benchmark (for x in XML.RowNode($file); end)
-
-# @info "EzXML.StreamReader" @benchmark (reader = open(EzXML.StreamReader, $file); for x in reader; end; close(reader))
+#-----------------------------------------------------------------------------# Iteration
+@info "XML.LazyNode iteration" @benchmark (for x in XML.LazyNode($file); end) # median: 67.547 ms (16.55% GC)
+@info "EzXML.StreamReader" @benchmark (reader = open(EzXML.StreamReader, $file); for x in reader; end; close(reader)) # median: 142.340 ms
````
