1 | | -# XMLReaderReg |
2 | | -Extension of PHP's XMLReader to include simplified interface |
| 1 | +### XMLReaderReg |
| 2 | +An extension of PHP's XMLReader to include a simplified interface. |
| 3 | + |
| 4 | +# Quick Start
| 5 | + |
| 6 | +Rather than having to use boilerplate code to fetch particular elements, XMLReaderReg allows you to register an interest in certain elements along with a callback. This effectively changes it from a pull parser to a push parser.
| 7 | + |
| 8 | +```php |
| 9 | +require_once __DIR__ . '/../XMLReaderReg.php'; |
| 10 | + |
| 11 | +$inputFile = __DIR__ ."/../tests/data/simpleTest1.xml"; |
| 12 | +$reader = new XMLReaderReg(); |
| 13 | +$reader->open($inputFile); |
| 14 | + |
| 15 | +$reader->process([ |
| 16 | + // Find all person elements |
| 17 | + '(.*/person(?:\[\d*\])?)' => function (SimpleXMLElement $data, $path): void { |
| 18 | + echo "1) Value for ".$path." is ".PHP_EOL. |
| 19 | + $data->asXML().PHP_EOL; |
| 20 | + }, |
| 21 | + // Find the corresponding element in the hierarchy |
| 22 | + '/root/person2/firstname' => function (string $data): void { |
| 23 | + echo "3) Value for /root/person2/firstname is ". $data.PHP_EOL; |
| 24 | + } |
| 25 | + ]); |
| 26 | + |
| 27 | +$reader->close(); |
| 28 | +``` |
| 29 | + |
| 30 | +The main addition is the `process()` method. Rather than looping through the document structure yourself, you pass `process()` an array that maps regex patterns to callbacks. When the path of an XML element matches a pattern, the associated callback is passed the data from that element.
| 31 | + |
| 32 | +# How the document hierarchy is encoded
| 33 | +As the document is loaded, the code builds a simple document hierarchy based on the nesting of the elements. So... |
| 34 | + |
| 35 | +```xml |
| 36 | +<person> |
| 37 | + <firstname>John</firstname> |
| 38 | + <lastname>Doe</lastname> |
| 39 | +</person> |
| 40 | +``` |
| 41 | +will produce the following tree... |
| 42 | + |
| 43 | +``` |
| 44 | +/person |
| 45 | +/person/firstname |
| 46 | +/person/lastname |
| 47 | +``` |
| 48 | + |
| 49 | +To allow for multiple elements, this is slightly modified to keep track of the number of elements... |
| 50 | + |
| 51 | +```xml |
| 52 | +<root> |
| 53 | + <firstname>John</firstname> |
| 54 | + <firstname>Fred</firstname> |
| 55 | +</root> |
| 56 | +``` |
| 57 | +will produce the following tree... |
| 58 | + |
| 59 | +``` |
| 60 | +/root |
| 61 | +/root/firstname |
| 62 | +/root/firstname[1] |
| 63 | +``` |
| 64 | +Note that the first instance doesn't get a suffix (when it is read, the parser doesn't yet know whether there are any more elements of this name), so the suffixes start at `[1]` with the second instance.
| 65 | + |
| 66 | +The array elements are remembered at any particular level of nesting, so |
| 67 | + |
| 68 | +```xml |
| 69 | +<root> |
| 70 | + <firstname>John</firstname> |
| 71 | + <lastname>Doe</lastname> |
| 72 | + <firstname>Fred</firstname> |
| 73 | +</root> |
| 74 | +``` |
| 75 | +will produce the following tree... |
| 76 | + |
| 77 | +``` |
| 78 | +/root |
| 79 | +/root/firstname |
| 80 | +/root/lastname |
| 81 | +/root/firstname[1] |
| 82 | +``` |
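The numbering rules above can be reproduced with plain `XMLReader`. The following is only a minimal sketch of the scheme as described, not the code XMLReaderReg itself uses; `listElementPaths()` is a hypothetical helper introduced purely for illustration.

```php
<?php
// Hypothetical helper (not part of XMLReaderReg): lists element paths
// using the suffix scheme described above.
function listElementPaths(string $xml): array
{
    $reader = new XMLReader();
    $reader->XML($xml);

    $segments = [];  // path segments of the currently open elements
    $seen = [[]];    // per nesting level: element name => times seen so far
    $paths = [];

    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT) {
            $level = count($segments);
            $name = $reader->name;
            $count = $seen[$level][$name] ?? 0;
            $seen[$level][$name] = $count + 1;

            // The first instance has no suffix, later ones get [1], [2], ...
            $segments[] = $count === 0 ? $name : $name . '[' . $count . ']';
            $paths[] = '/' . implode('/', $segments);

            if ($reader->isEmptyElement) {
                // <tag/> produces no END_ELEMENT node, so close it here
                array_pop($segments);
            } else {
                // Children of this element start with fresh counters
                $seen[$level + 1] = [];
            }
        } elseif ($reader->nodeType === XMLReader::END_ELEMENT) {
            array_pop($segments);
        }
    }
    $reader->close();

    return $paths;
}

$paths = listElementPaths(
    '<root><firstname>John</firstname><lastname>Doe</lastname><firstname>Fred</firstname></root>'
);
print_r($paths);
```

Because the counters are kept per nesting level, the intervening `<lastname>` does not reset the count for `<firstname>`.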
| 83 | + |
| 84 | +# Regex matching
| 85 | +The matching process is as simple as working out where the data you want lies in the document. You can be as explicit or as vague as you wish using regex's ability to match the content of the above hierarchy. |
| 86 | + |
| 87 | +From the quick start sample code... |
| 88 | + |
| 89 | +``` |
| 90 | +/root/person2/firstname |
| 91 | +``` |
| 92 | +directly matches an element in the hierarchy, whereas |
| 93 | + |
| 94 | +``` |
| 95 | +.*/person(?:\[\d*\])? |
| 96 | +``` |
| 97 | +will find any `<person>` element and allow an optional suffix for use when multiple elements are present. |
| 98 | + |
| 99 | +Capture groups are also useful. Notice that this last regex is actually written as `(.*/person(?:\[\d*\])?)` in the code; the capture groups will be passed to the callback.
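To see how such a pattern behaves against the path hierarchy, it can be tried with `preg_match()` directly. This is a sketch under one assumption: the pattern is wrapped in `#` delimiters and anchored with `^...$`. How XMLReaderReg wraps patterns internally is not stated here, so treat the anchoring as illustrative only.

```php
<?php
// Sample paths in the format described in the hierarchy section
$paths = ['/root', '/root/person', '/root/person[1]', '/root/person2/firstname'];
$pattern = '(.*/person(?:\[\d*\])?)';

$matched = [];
foreach ($paths as $path) {
    // Assumption: anchored match with '#' as delimiter, because the
    // patterns themselves contain '/'
    if (preg_match('#^' . $pattern . '$#', $path, $matches) === 1) {
        // $matches[0] is the whole match, $matches[1] the first capture group
        $matched[$path] = $matches;
    }
}

print_r($matched);
```

With this anchoring only the two `person` paths match. `/root/person2/firstname` is rejected because `person` must be followed by an optional index and then the end of the path.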
| 100 | + |
| 101 | +# The callback function
| 102 | +The basic callback function definition is |
| 103 | + |
| 104 | +```php |
| 105 | +function (mixed $data[, mixed $path]): void {} |
| 106 | +``` |
| 107 | + |
| 108 | +**data** |
| 109 | + |
| 110 | +The data content of the matching element. This can be type hinted to a `string`, `SimpleXMLElement` or `DOMElement`. |
| 111 | + |
| 112 | +In this callback, |
| 113 | + |
| 114 | +```php |
| 115 | +function ( $data ) {} |
| 116 | +``` |
| 117 | + |
| 118 | +as there is no typehint for the callback value, it will be passed the results of [readInnerXml()](https://www.php.net/manual/en/xmlreader.readinnerxml.php) which is a string containing just the contents of the XML element. |
| 119 | + |
| 120 | +There are a couple of alternatives which are more specific... |
| 121 | + |
| 122 | +```php |
| 123 | +// same as above, just with a type hint |
| 124 | +function ( string $data ) {} |
| 125 | + |
| 126 | +// The element is passed as a SimpleXMLElement
| 127 | +function ( \SimpleXMLElement $data ) {} |
| 128 | + |
| 129 | +// The element is passed as a DOMElement |
| 130 | +function ( \DOMElement $data ) {} |
| 131 | +``` |
| 132 | +The last two allow you to fetch the content in a more accessible format if you need to do any further processing.
| 133 | + |
| 134 | +For `DOMElement` the equivalent of `$doc->importNode($reader->expand(), true)` is passed, where `$doc` is the owning `DOMDocument`.
| 135 | +
| 136 | +For `SimpleXMLElement` the equivalent of `simplexml_import_dom($doc->importNode($reader->expand(), true))` is passed.
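The three forms of `$data` can be compared side by side using only the standard `XMLReader`, `DOMDocument` and SimpleXML APIs. This is a sketch of the equivalents named above, not the library's own code; the variable names are chosen for illustration.

```php
<?php
$reader = new XMLReader();
$reader->XML('<root><person><firstname>John</firstname></person></root>');

// The document that will own the imported DOM copy
$doc = new DOMDocument();

while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'person') {
        // string: the inner XML of the current element
        $asString = $reader->readInnerXml();

        // DOMElement: a deep copy of the current element, owned by $doc
        $asDom = $doc->importNode($reader->expand(), true);

        // SimpleXMLElement: the same element wrapped for SimpleXML access
        $asSimpleXml = simplexml_import_dom($asDom);
        break;
    }
}
$reader->close();

echo $asString . PHP_EOL;
echo $asDom->nodeName . PHP_EOL;
echo (string) $asSimpleXml->firstname . PHP_EOL;
```

The imported copy stays usable after the reader is closed because it belongs to `$doc` rather than to the reader.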
| 137 | + |
| 138 | +**path** |
| 139 | + |
| 140 | +The capture group(s) from the regex. |
| 141 | + |
| 142 | +If you don't use capture groups, you can omit the `$path` parameter. If you do use capture groups, the callback is passed the `$matches` array populated by [preg_match()](https://www.php.net/manual/en/function.preg-match.php), which is used internally to check the path against the regex patterns.
| 143 | + |
| 144 | +## Options
| 145 | +
| 146 | +# DOM Document owner
| 147 | + |
| 148 | +```php |
| 149 | +public function setDocument ( DOMDocument $doc ): void; |
| 150 | +``` |
| 151 | +When using DOMDocument, the owner of a created node can be important. If you want to control this, then create your own instance of DOMDocument and pass that to this call. Any subsequently generated nodes passed to callbacks will be owned by this document. |
| 152 | + |
| 153 | +If this is not called, all nodes will be owned by an internally created document. |
| 154 | + |
| 155 | +# Namespace usage - Matching
| 156 | + |
| 157 | +```php |
| 158 | +public function setUseNamespaces ( bool $useNamespaces ): void; |
| 159 | +``` |
| 160 | +Flag to indicate if the path is built with namespaces or not. By default, this flag is set to `true` and will use namespaces where defined in the document. |
| 161 | + |
| 162 | +With |
| 163 | + |
| 164 | +```xml |
| 165 | +<a:root xmlns:a="http://someurl.com"> |
| 166 | + <a:person> |
| 167 | + ... |
| 168 | +``` |
| 169 | +set to `true`, it will generate a path hierarchy of |
| 170 | + |
| 171 | +``` |
| 172 | +/a:root |
| 173 | +/a:root/a:person |
| 174 | +``` |
| 175 | +set to `false`, it will generate a path hierarchy of |
| 176 | + |
| 177 | +``` |
| 178 | +/root |
| 179 | +/root/person |
| 180 | +``` |
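The difference between the two settings corresponds to the qualified and local element names that `XMLReader` exposes. The sketch below shows those two properties on the sample document; that XMLReaderReg builds its paths from exactly these properties is an assumption.

```php
<?php
$reader = new XMLReader();
$reader->XML('<a:root xmlns:a="http://someurl.com"><a:person/></a:root>');

$qualified = [];  // names including the namespace prefix
$local = [];      // names with the prefix removed

while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT) {
        $qualified[] = $reader->name;
        $local[] = $reader->localName;
    }
}
$reader->close();

print_r($qualified);
print_r($local);
```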
| 181 | +# Namespace usage - Output
| 182 | + |
| 183 | +```php |
| 184 | +public function setOutputNamespaces ( bool $outputNamespace ) : void; |
| 185 | +``` |
| 186 | +If you don't need or want the namespaces in the output, calling this with `false` will remove all namespaces from the output. This includes the namespace definitions and any namespace prefixes on the nodes.
| 187 | +
| 188 | +The extra processing this requires incurs an overhead.
| 189 | + |
| 190 | +# Configuring array notation
| 191 | +By default, array notation is turned off, so duplicated elements are presented as
| 192 | + |
| 193 | +``` |
| 194 | +/root |
| 195 | +/root/firstname |
| 196 | +/root/firstname |
| 197 | +``` |
| 198 | +This removes the need to include a regex to match the optional array index, as in (for example) `(.*/person(?:\[\d*\])?)`, so you can just use `(.*/person)` to retrieve every `<person>` element.
| 199 | +
| 200 | +In some cases you may need to know which instance of an element is being processed. Turning array notation on allows you to extract a specific instance, or simply to know from the path which instance is being processed.
| 201 | + |
| 202 | +```php |
| 203 | +public function setArrayNotation ( bool $arrayNotation ): void; |
| 204 | +``` |
| 205 | +Calling this with `true` will turn on the generation of array indices when matching is done. So from the above example the paths will look like the following...
| 206 | + |
| 207 | +``` |
| 208 | +/root |
| 209 | +/root/firstname |
| 210 | +/root/firstname[1] |
| 211 | +``` |
| 212 | +# Stop the processing
| 213 | + |
| 214 | +```php |
| 215 | +public function flagToStop () : void; |
| 216 | +``` |
| 217 | +During a callback, you may decide that you do not need to process any more of the content. Calling this method flags the `process()` method to stop at the next iteration.
| 218 | +
| 219 | +This can be done with something like...
| 220 | + |
| 221 | +```PHP |
| 222 | +function (DOMElement $data, $path) |
| 223 | + use ($reader): void { |
| 224 | + // process $data |
| 225 | + $reader->flagToStop(); |
| 226 | +} |
| 227 | +``` |
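The flag-and-check pattern behind this can be illustrated without the library. `StoppableLoop` below is a hypothetical stand-in, not XMLReaderReg's implementation: it shows a callback setting a flag that the loop checks before each iteration.

```php
<?php
// Hypothetical stand-in for the reader, used only to show the pattern
final class StoppableLoop
{
    private bool $stop = false;

    public function flagToStop(): void
    {
        $this->stop = true;
    }

    // Calls $callback for each item until the stop flag is set;
    // returns the number of items handled
    public function process(array $items, callable $callback): int
    {
        $this->stop = false;
        $handled = 0;
        foreach ($items as $item) {
            if ($this->stop) {
                break;  // flag was set during the previous callback
            }
            $callback($item);
            $handled++;
        }
        return $handled;
    }
}

$loop = new StoppableLoop();
$seen = [];
$handled = $loop->process(['a', 'b', 'c'], function (string $item) use ($loop, &$seen): void {
    $seen[] = $item;
    if ($item === 'b') {
        $loop->flagToStop();
    }
});

print_r($seen);
```

The callback that sets the flag still runs to completion; only the following items are skipped.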
| 228 | +## Examples
| 229 | +`examples/XMLReaderBasic.php` has a brief set of examples showing how to use XMLReaderReg.
| 230 | + |
| 231 | +## Tests
| 232 | +tests/XMLReaderRegTest.php is a PHPUnit test set for XMLReaderReg. |
| 233 | + |
| 234 | +Please note that `testFetchLargeFullRead` reads a 25MB XML file so will take some time to complete. |
| 235 | +## License
| 236 | +Please see the LICENSE file. |