Commit 9ab8a90

committed
First commit
1 parent 84e532b commit 9ab8a90

11 files changed

+1265
-3
lines changed

.gitignore

Lines changed: 5 additions & 1 deletion
@@ -1,2 +1,6 @@
11
.buildpath
2-
.project
2+
.project
3+
/vendor/
4+
/composer.lock
5+
/composer.phar
6+
/tests/report/

README.md

Lines changed: 236 additions & 2 deletions
@@ -1,2 +1,236 @@
1-
# XMLReaderReg
2-
Extension of PHP's XMLReader to include simplified interface
1+
### XMLReaderReg
2+
An extension of PHP's XMLReader to include a simplified interface.
3+
4+
# Quick Start
5+
6+
Rather than having to use boilerplate code to fetch particular elements, XMLReaderReg allows you to register an interest in certain elements along with a callback. This effectively changes it from a pull parser to a push parser.
7+
8+
```php
9+
require_once __DIR__ . '/../XMLReaderReg.php';
10+
11+
$inputFile = __DIR__ ."/../tests/data/simpleTest1.xml";
12+
$reader = new XMLReaderReg();
13+
$reader->open($inputFile);
14+
15+
$reader->process([
16+
// Find all person elements
17+
'(.*/person(?:\[\d*\])?)' => function (SimpleXMLElement $data, $path): void {
18+
echo "1) Value for ".$path." is ".PHP_EOL.
19+
$data->asXML().PHP_EOL;
20+
},
21+
// Find the corresponding element in the hierarchy
22+
'/root/person2/firstname' => function (string $data): void {
23+
echo "3) Value for /root/person2/firstname is ". $data.PHP_EOL;
24+
}
25+
]);
26+
27+
$reader->close();
28+
```
29+
30+
The main addition is the `process()` method. Rather than looping through the document structure yourself, you pass `process()` an array of regex patterns and their associated callbacks. When the path of an XML element matches a pattern you are interested in, the callback is passed the data from that element.
31+
32+
# How the document hierarchy is encoded
33+
As the document is loaded, the code builds a simple document hierarchy based on the nesting of the elements. So...
34+
35+
```xml
36+
<person>
37+
<firstname>John</firstname>
38+
<lastname>Doe</lastname>
39+
</person>
40+
```
41+
will produce the following tree...
42+
43+
```
44+
/person
45+
/person/firstname
46+
/person/lastname
47+
```
48+
49+
To allow for multiple elements of the same name, array notation (see Configuring array notation) modifies this slightly to keep track of the number of elements...
50+
51+
```xml
52+
<root>
53+
<firstname>John</firstname>
54+
<firstname>Fred</firstname>
55+
</root>
56+
```
57+
will produce the following tree...
58+
59+
```
60+
/root
61+
/root/firstname
62+
/root/firstname[1]
63+
```
64+
Note that the first instance doesn't get a suffix (as the reader doesn't yet know whether there are any more elements of this name), so the suffixes start at 1 with the second instance.
65+
66+
The array elements are remembered at any particular level of nesting, so
67+
68+
```xml
69+
<root>
70+
<firstname>John</firstname>
71+
<lastname>Doe</lastname>
72+
<firstname>Fred</firstname>
73+
</root>
74+
```
75+
will produce the following tree...
76+
77+
```
78+
/root
79+
/root/firstname
80+
/root/lastname
81+
/root/firstname[1]
82+
```
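The path encoding described above can be sketched with PHP's built-in `XMLReader`. This is only an illustration of the idea, not XMLReaderReg's actual implementation: `buildPaths()` is a hypothetical helper, and it assumes that the counters for repeated names are kept per nesting level and reset whenever a new parent element is opened.

```php
<?php
// Hypothetical helper (not part of XMLReaderReg): lists the path of every
// element, adding [1], [2], ... to repeated sibling names.
function buildPaths(string $xml): array
{
    $reader = new XMLReader();
    $reader->XML($xml);

    $paths = [];
    $stack = [];   // path segments of the currently open elements
    $seen = [[]];  // $seen[$depth][$name] = siblings of that name seen so far

    while ($reader->read()) {
        if ($reader->nodeType !== XMLReader::ELEMENT) {
            continue;
        }
        $depth = $reader->depth;
        $name = $reader->name;
        $count = $seen[$depth][$name] ?? 0;
        $seen[$depth][$name] = $count + 1;

        // The first occurrence has no suffix; later ones get [1], [2], ...
        $stack[$depth] = $count === 0 ? $name : $name . '[' . $count . ']';
        $stack = array_slice($stack, 0, $depth + 1);

        // A newly opened element starts with fresh counters for its children.
        $seen[$depth + 1] = [];

        $paths[] = '/' . implode('/', $stack);
    }
    $reader->close();

    return $paths;
}

$paths = buildPaths(
    '<root><firstname>John</firstname><lastname>Doe</lastname><firstname>Fred</firstname></root>'
);
print_r($paths);
```

Run against the last sample document, this should list the same four paths as the tree above, with `/root/firstname[1]` for the second `<firstname>`.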
83+
84+
# Regex matching
85+
The matching process is as simple as working out where the data you want lies in the document. You can be as explicit or as vague as you wish using regex's ability to match the content of the above hierarchy.
86+
87+
From the quick start sample code...
88+
89+
```
90+
/root/person2/firstname
91+
```
92+
directly matches an element in the hierarchy, whereas
93+
94+
```
95+
.*/person(?:\[\d*\])?
96+
```
97+
will find any `<person>` element and allow an optional suffix for use when multiple elements are present.
98+
99+
Capture groups are another useful regex feature. Notice that this last regex is actually written as `(.*/person(?:\[\d*\])?)` in the code. The contents of the capture groups are passed to the callback.
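To see what such a pattern captures, here is a small sketch that uses `preg_match()` directly on a few sample paths. The `~` delimiter and the `^`/`$` anchors are assumptions made for this illustration; the README does not state how XMLReaderReg wraps the pattern internally.

```php
<?php
// Assumption: '~' delimiters and ^...$ anchors are added only for this
// sketch; they are not taken from XMLReaderReg itself.
$pattern = '~^(.*/person(?:\[\d*\])?)$~';

$samplePaths = [
    '/root',
    '/root/person',
    '/root/person/firstname',
    '/root/person[1]',
];

$captured = [];
foreach ($samplePaths as $samplePath) {
    if (preg_match($pattern, $samplePath, $matches) === 1) {
        // $matches[0] is the whole match, $matches[1] the first capture group.
        $captured[$samplePath] = $matches;
    }
}
print_r($captured);
```

Only the two `person` paths match. Because the capture group spans the whole pattern, the full match and the first capture group hold the same string in each case.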
100+
101+
# The callback function
102+
The basic callback function definition is
103+
104+
```php
105+
function (mixed $data[, mixed $path]): void {}
106+
```
107+
108+
**data**
109+
110+
The data content of the matching element. This can be type hinted to a `string`, `SimpleXMLElement` or `DOMElement`.
111+
112+
In this callback,
113+
114+
```php
115+
function ( $data ) {}
116+
```
117+
118+
as there is no typehint for the callback value, it will be passed the results of [readInnerXml()](https://www.php.net/manual/en/xmlreader.readinnerxml.php) which is a string containing just the contents of the XML element.
119+
120+
There are a couple of alternatives which are more specific...
121+
122+
```php
123+
// same as above, just with a type hint
124+
function ( string $data ) {}
125+
126+
// The element is passed as a SimpleXMLElement
127+
function ( \SimpleXMLElement $data ) {}
128+
129+
// The element is passed as a DOMElement
130+
function ( \DOMElement $data ) {}
131+
```
132+
The last two allow you to fetch the content in a more accessible format if you need to do any further processing.
133+
134+
For `DOMElement`, the equivalent of `$doc->importNode($reader->expand(), true)` is passed, where `$doc` is the owning `DOMDocument`.
135+
136+
For `SimpleXMLElement`, the equivalent of `simplexml_import_dom($doc->importNode($reader->expand(), true))` is passed.
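The three forms of `$data` can be reproduced with PHP's built-in `XMLReader` alone. The sketch below is an approximation under stated assumptions rather than XMLReaderReg's own code: `$doc` stands in for the owning document, and the sample XML is made up for the illustration.

```php
<?php
$reader = new XMLReader();
$reader->XML('<root><person><firstname>John</firstname></person></root>');

// Assumption: $doc plays the role of the document that owns imported nodes.
$doc = new DOMDocument();

$result = [];
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'person') {
        // DOMElement form: a deep copy of the current node, owned by $doc.
        $dom = $doc->importNode($reader->expand(), true);

        // SimpleXMLElement form: the same node wrapped for SimpleXML access.
        $simple = simplexml_import_dom($dom);

        $result = [
            'string'    => $reader->readInnerXml(),  // untyped / string form
            'ownedByDoc' => $dom->ownerDocument->isSameNode($doc),
            'firstname' => (string) $simple->firstname,
        ];
        break;
    }
}
$reader->close();

print_r($result);
```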
137+
138+
**path**
139+
140+
The capture group(s) from the regex.
141+
142+
If you don't use capture groups, you can omit the `$path` parameter. If you do use capture groups, the callback is passed the `$matches` array populated by [preg_match()](https://www.php.net/manual/en/function.preg-match.php), which is used internally to check the path against the regex patterns.
143+
144+
## Options
145+
146+
# DOM Document owner
147+
148+
```php
149+
public function setDocument ( DOMDocument $doc ): void;
150+
```
151+
When using DOMDocument, the owner of a created node can be important. If you want to control this, then create your own instance of DOMDocument and pass that to this call. Any subsequently generated nodes passed to callbacks will be owned by this document.
152+
153+
If this is not called, all nodes will be owned by an internally created document.
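Why the owner matters can be shown with plain DOM calls. The sketch below does not use XMLReaderReg at all; the element names are made up, and it only demonstrates the standard DOM rule that a node must belong to the document it is appended to.

```php
<?php
$target = new DOMDocument();
$people = $target->appendChild($target->createElement('people'));

// A node created by a different document, similar to one handed to a
// callback when no owner document has been set.
$other = new DOMDocument();
$foreign = $other->createElement('person', 'John');

$wrongDocument = false;
try {
    $people->appendChild($foreign);  // rejected: owned by $other
} catch (DOMException $e) {
    $wrongDocument = ($e->getCode() === DOM_WRONG_DOCUMENT_ERR);
}

// Importing first creates a copy owned by $target, which can be appended.
$people->appendChild($target->importNode($foreign, true));
$xml = $target->saveXML($people);

echo $xml . PHP_EOL;
```

If the nodes are already owned by the document you are building, this import step is not needed, which is the case `setDocument()` is intended to cover.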
154+
155+
# Namespace usage - Matching
156+
157+
```php
158+
public function setUseNamespaces ( bool $useNamespaces ): void;
159+
```
160+
Flag to indicate if the path is built with namespaces or not. By default, this flag is set to `true` and will use namespaces where defined in the document.
161+
162+
With
163+
164+
```xml
165+
<a:root xmlns:a="http://someurl.com">
166+
<a:person>
167+
...
168+
```
169+
When set to `true`, it will generate a path hierarchy of
170+
171+
```
172+
/a:root
173+
/a:root/a:person
174+
```
175+
When set to `false`, it will generate a path hierarchy of
176+
177+
```
178+
/root
179+
/root/person
180+
```
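The difference between the two hierarchies corresponds to the qualified and local element names that PHP's built-in `XMLReader` exposes. The sketch below only illustrates that difference; whether XMLReaderReg uses these exact properties internally is an assumption.

```php
<?php
$reader = new XMLReader();
$reader->XML('<a:root xmlns:a="http://someurl.com"><a:person/></a:root>');

$withNamespaces = [];
$withoutNamespaces = [];
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT) {
        $withNamespaces[] = $reader->name;          // qualified name, e.g. a:root
        $withoutNamespaces[] = $reader->localName;  // local name, e.g. root
    }
}
$reader->close();

print_r($withNamespaces);
print_r($withoutNamespaces);
```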
181+
# Namespace usage - Output
182+
183+
```php
184+
public function setOutputNamespaces ( bool $outputNamespace ) : void;
185+
```
186+
If you don't need or want the namespaces in the output, calling this with `false` will remove all namespaces from the output. This includes the namespace definitions and any namespace prefixes on the nodes.
187+
188+
The extra processing this requires incurs some overhead.
189+
190+
# Configuring array notation
191+
By default, array notation is turned off, so duplicated elements are presented as
192+
193+
```
194+
/root
195+
/root/firstname
196+
/root/firstname
197+
```
198+
This removes the need to match the optional array index in the regex, as in (for example) `(.*/person(?:\[\d*\])?)`, so you can just use `(.*/person)` to retrieve every `<person>` element.
199+
200+
In some cases you may need to know which instance of an element is being processed. Turning array notation on allows you to extract a specific instance, or simply to know from the path which instance is being processed.
201+
202+
```php
203+
public function setArrayNotation ( bool $arrayNotation ): void;
204+
```
205+
Calling this with `true` will turn on the generation of array indices when matching is done. So from the above example the paths will look like the following...
206+
207+
```
208+
/root
209+
/root/firstname
210+
/root/firstname[1]
211+
```
212+
# Stop the processing
213+
214+
```php
215+
public function flagToStop () : void;
216+
```
217+
During a callback, you may decide that you do not need to process any more of the content. This method flags the `process()` method to stop at the next iteration.
218+
219+
This can be done something like...
220+
221+
```php
222+
function (DOMElement $data, $path)
223+
use ($reader): void {
224+
// process $data
225+
$reader->flagToStop();
226+
}
227+
```
228+
## Examples
229+
examples/XMLReaderBasic.php has a brief set of examples showing how to use XMLReaderReg.
230+
231+
## Tests
232+
tests/XMLReaderRegTest.php is a PHPUnit test set for XMLReaderReg.
233+
234+
Please note that `testFetchLargeFullRead` reads a 25MB XML file so will take some time to complete.
235+
## License
236+
Please see the LICENSE file.
