##Overview
XML is a markup language used to encode data into a document. It is both human-readable and machine-readable. The benefit of using this markup language is that there are no predefined tags: the author of a given XML document may create any tags, and arrange them in whatever structure the data logically requires.
###Sample Document
<?xml version='1.0'?>
<!-- Sample Dataset-->
<dataset>
<observation>
<dependent-variable>James Blonde</dependent-variable>
<independent-variable>
<label>SSN</label>
<value>0034773019</value>
</independent-variable>
<independent-variable>
<label>Salary</label>
<value>88500</value>
</independent-variable>
</observation>
<observation>
<dependent-variable>Boston Powers</dependent-variable>
<independent-variable>
<label>SSN</label>
<value>007000007</value>
</independent-variable>
<independent-variable>
<label>Salary</label>
<value>88500</value>
</independent-variable>
</observation>
...
</dataset>
###XML Declaration
An XML document may begin with an optional declaration. If one is used, nothing may precede it, not even whitespace or comments.
Generally, an XML declaration is as follows:
<?xml version='1.0'?>
where the version attribute indicates the XML version being used. A second, optional attribute may be defined in the same declaration: the encoding attribute indicates the character encoding used in the XML document:
<?xml version='1.0' encoding='UTF-8'?>
The XML standard requires all XML software to understand both UTF-8 and UTF-16. When the encoding attribute is not defined, the XML document defaults to UTF-8, unless a byte order mark identifies it as UTF-16.
Note: an XML declaration is case sensitive, and cannot begin as <?XML ..?>.
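The rules above can be checked with a short sketch. It assumes Python's standard `xml.etree.ElementTree` parser; the helper name `parses` is hypothetical and only reports whether the parser accepts a document.

```python
import xml.etree.ElementTree as ET


def parses(document: bytes) -> bool:
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


# The declaration is the very first thing in the document.
declaration_first = parses(b"<?xml version='1.0' encoding='UTF-8'?><dataset/>")

# Whitespace precedes the declaration.
leading_space = parses(b" <?xml version='1.0'?><dataset/>")

# A comment precedes the declaration.
leading_comment = parses(b"<!-- note --><?xml version='1.0'?><dataset/>")

# The declaration is written in upper case.
upper_case = parses(b"<?XML version='1.0'?><dataset/>")

print(declaration_first, leading_space, leading_comment, upper_case)
```

Only the first document should be accepted; the other three break the placement or case rules described above.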
###XML Document:
An XML document is syntactically similar to HTML. The difference is one of purpose: HTML was designed to display data (presentation), whereas XML was designed to describe data, with a focus on what the data means.
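As an illustration of XML describing data rather than presentation, the following minimal sketch reads one observation from the sample dataset into a plain dictionary. It assumes Python's standard `xml.etree.ElementTree`; the function name `read_observations` and the `"name"` key are hypothetical choices made for this example.

```python
import xml.etree.ElementTree as ET

# One observation copied from the sample document above.
SAMPLE = """<?xml version='1.0'?>
<dataset>
  <observation>
    <dependent-variable>James Blonde</dependent-variable>
    <independent-variable>
      <label>SSN</label>
      <value>0034773019</value>
    </independent-variable>
    <independent-variable>
      <label>Salary</label>
      <value>88500</value>
    </independent-variable>
  </observation>
</dataset>"""


def read_observations(xml_text):
    """Turn each <observation> into a dict keyed by its labels."""
    root = ET.fromstring(xml_text)
    rows = []
    for obs in root.findall("observation"):
        # "name" is an illustrative key for the dependent variable.
        row = {"name": obs.findtext("dependent-variable")}
        for var in obs.findall("independent-variable"):
            row[var.findtext("label")] = var.findtext("value")
        rows.append(row)
    return rows


rows = read_observations(SAMPLE)
print(rows)
```

Note that every value comes back as text; XML itself carries no type information, so converting `88500` to a number is left to the consumer.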
XML syntax requirements:
- An XML document must have exactly one root element (see `<dataset>` above)
- The root element encapsulates all other elements
- An XML element is case sensitive
- Every XML element with an opening tag must have a corresponding closing tag
- A closing tag must contain a slash (i.e. `</xxx>`)
- XML elements may be nested
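The syntax requirements above can be exercised with a small sketch. It assumes Python's standard `xml.etree.ElementTree`; the helper `well_formedness_error` and the test documents are hypothetical and exist only for illustration.

```python
import xml.etree.ElementTree as ET


def well_formedness_error(document: str):
    """Return None if the document parses, else the parser's error message."""
    try:
        ET.fromstring(document)
        return None
    except ET.ParseError as exc:
        return str(exc)


cases = {
    "single root, nested elements":
        "<dataset><observation><label>SSN</label></observation></dataset>",
    "two root elements": "<dataset/><dataset/>",
    "case mismatch": "<Label>SSN</label>",
    "missing closing tag": "<dataset><observation></dataset>",
}

results = {name: well_formedness_error(doc) for name, doc in cases.items()}
for name, error in results.items():
    print(f"{name}: {'ok' if error is None else error}")
```

Only the first case should parse; each of the others violates one of the listed requirements, and the parser's message indicates where it gave up.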
####DTD Validation
A document type definition (DTD) defines the following properties:
- what elements are allowed in the XML document
- what attributes each element is allowed to have
- the ordering and nesting of these elements
DTDs are declared in the DOCTYPE declaration, which is placed after the XML declaration.
The following is an example of an inline definition:
<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE documentelement [definition]>
The following is an example of an external definition:
<?xml version="1.0"?>
<!DOCTYPE documentelement SYSTEM "https://localhost/schema.dtd">
In either case, the declarations below take the place of definition (inside the square brackets) or make up the contents of schema.dtd:
<!ELEMENT dataset (observation+)>
<!ELEMENT observation (dependent-variable,independent-variable+)>
<!ELEMENT dependent-variable (#PCDATA)>
<!ELEMENT independent-variable (label,value)>
<!ELEMENT label (#PCDATA)>
<!ELEMENT value (#PCDATA)>
The above DTD defines the following structure:
- a `dataset` contains at least one `observation`
- an `observation` contains one `dependent-variable`, followed by at least one `independent-variable`
- a `dependent-variable` contains `PCDATA` text
- an `independent-variable` contains a `label`, followed by a `value`
- both `label` and `value` contain `PCDATA` text (parsed character data, which the parser scans for entity references and markup)
Note: `#PCDATA` is the keyword for text in an element declaration. `CDATA` is not valid there: it is an attribute type used in `<!ATTLIST>` declarations, while text that the parser should not interpret is wrapped in a `<![CDATA[ ... ]]>` section inside the document itself. Another alternative is `ANY`, which means an element may contain any content.
Note: if `observation+` were replaced with `observation*`, a dataset could contain zero or more observations.
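Python's standard library parser does not validate documents against a DTD, so the following is only a hand-written sketch of the same structural rules, not DTD validation. The function name `check_structure` and the test documents are hypothetical; the checks mirror the element ordering and counts listed above and ignore text content.

```python
import xml.etree.ElementTree as ET


def check_structure(xml_text):
    """Return a list of violations of the structure described above."""
    root = ET.fromstring(xml_text)
    if root.tag != "dataset":
        return [f"root element is <{root.tag}>, expected <dataset>"]
    errors = []
    children = list(root)
    # dataset (observation+): one or more children, all of them observations
    if not children or any(child.tag != "observation" for child in children):
        errors.append("dataset must contain one or more observation elements")
    for index, obs in enumerate(root.findall("observation")):
        tags = [child.tag for child in obs]
        # observation (dependent-variable, independent-variable+)
        if not (
            len(tags) >= 2
            and tags[0] == "dependent-variable"
            and all(tag == "independent-variable" for tag in tags[1:])
        ):
            errors.append(
                f"observation {index}: expected one dependent-variable "
                "followed by one or more independent-variable elements"
            )
        # independent-variable (label, value): exactly these two, in order
        for var in obs.findall("independent-variable"):
            if [child.tag for child in var] != ["label", "value"]:
                errors.append(
                    f"observation {index}: independent-variable must "
                    "contain label then value"
                )
    return errors


valid = (
    "<dataset><observation>"
    "<dependent-variable>James Blonde</dependent-variable>"
    "<independent-variable><label>Salary</label><value>88500</value>"
    "</independent-variable>"
    "</observation></dataset>"
)
empty = "<dataset/>"
missing_value = (
    "<dataset><observation>"
    "<dependent-variable>James Blonde</dependent-variable>"
    "<independent-variable><label>Salary</label></independent-variable>"
    "</observation></dataset>"
)

print(check_structure(valid))
print(check_structure(empty))
print(check_structure(missing_value))
```

Allowing an empty dataset, as `observation*` would, amounts to dropping the "one or more" condition in the first check. A validating parser would apply the DTD itself instead of hand-coded rules like these.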