csv-schema-1.1.html
Lines changed: 27 additions & 15 deletions
Original file line number
Diff line number
Diff line change
@@ -177,7 +177,8 @@
177
177
This document represents the specification of the CSV Schema Language 1.1
178
178
as defined by <a href="http://www.nationalarchives.gov.uk">The National Archives</a>.
179
179
It is unclear yet whether this document will be submitted to a formal standards body
180
-
such as the <a href="http://w3.org">W3C</a>.
180
+
such as the <a href="http://w3.org">W3C</a>.
181
+
This version supersedes the original <a href="http://digital-preservation.github.io/csv-schema/csv-schema-1.1.html">CSV Schema Language 1.0</a> published on 28 August 2014.
181
182
</section>
182
183
<section id='abstract'>
183
184
<acronym title="Comma Separated Value">CSV</acronym> (Comma Separated Value) data comes in many shapes and sizes. Apart from [[RFC4180]] which is a fairly recent development (and often ignored),
@@ -303,9 +304,9 @@ <h1>Basics</h1>
303
304
<section id="new-in-1.1" class="informative">
304
305
<h1>New in CSV Schema Language 1.1 - A brief introduction to the new features of CSV Schema Language 1.1</h1>
305
306
<p>
306
-
The last 18 months with CSV Schema being in regular use at The National Archives has highlighted a few additional
307
-
<a title="Column Validation Expression">Column Validation Expressions</a> that would provide further useful validation, simplify schema writing, or make schemas more readable.
308
-
In addition the concept of a <a>String Provider</a> has been extended to allow concatenation
307
+
The last 18 months with <a href="http://digital-preservation.github.io/csv-schema/csv-schema-1.1.html">CSV Schema Language 1.0</a> being in regular use at The National Archives
308
+
has highlighted a few additional <a title="Column Validation Expression">Column Validation Expressions</a> that would provide further useful validation,
309
+
simplify schema writing, or make schemas more readable. In addition the concept of a <a>String Provider</a> has been extended to allow concatenation
309
310
to produce a final string input to expressions from some set of other <a title="String Provider">String Providers</a>, and also a function to allow removal of a Windows file
310
311
extension to make certain comparisons more straightforward and robust.
then <code>Mr</code> would be regarded as invalid (strictly speaking that would also require the use of an <a>Explicit Context Expression</a> to refer to the other column,
830
831
but that is a subexpression of the Non Conditional Expression class).
831
832
</p>
832
-
<p><strong>NOTE</strong> To increase control over expression applicability and to avoiding creating a <a href="https://en.wikipedia.org/wiki/Left_recursion">left-recursive</a> grammar (which could lead to problems for various parser implementations),
833
+
<p><strong>NOTE</strong> To increase control over expression applicability and to avoid creating a <a href="https://en.wikipedia.org/wiki/Left_recursion">left-recursive</a> grammar (which could lead to problems for various parser implementations),
833
834
<a title="Column Validation Expression">Column Validation Expressions</a> have been further split into <a title="Combinatorial Expression">Combinatorial Expressions</a> and <a title="Non Combinatorial Expression">Non Combinatorial Expressions</a>.</p>
834
835
<table class="ebnf-table">
835
836
<tr>
@@ -1194,9 +1195,12 @@ <h4>URI Expressions</h4>
1194
1195
<h4>XSD Date Time Expressions</h4>
1195
1196
<p>
1196
1197
An <dfn>XSD Date Time Expression</dfn> checks that the data in the column is expressed as a valid XML Schema dateTime data type (see [[!XMLSCHEMA-2]] and [[!ISO8601]]).
1197
-
You can also provide an OPTIONAL <em>from</em> and <em>to</em> date-times
1198
+
You can also provide an OPTIONAL <em>from</em> and <em>to</em> date-times
1198
1199
(inclusive) to ensure that the value in the column falls within an expected date-time range.
1199
1200
</p>
1201
+
<p>
1202
+
As the XSD Date Time Expression uses the <a>XSD Date Time Literal</a> the final, timezone, part of the [[!ISO8601]] definition is OPTIONAL.
Consider a file that is expected to contain data relating to one particular batch of a process,
1427
1431
and should also be the data relating to one particular day. Over time we will receive many such files,
1428
-
so we don't want to amend the schema each day to say what the valid date for the file is, or what the appropriate batch_code is.
1432
+
so we do not want to amend the schema each day to say what the valid date for the file is, or what the appropriate batch_code is.
1429
1433
Instead we give a generic rule for the field content: it's an <a>xDate Expression</a> in the case of batch_date;
1430
1434
or that the batch_code will comprise three to five uppercase letters, followed by an uppercase B, followed by three digits;
1431
1435
and for each row in the file every date must be identical and every batch_code must be identical.
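The batch_code rule described in this hunk (three to five uppercase letters, an uppercase B, three digits, and the same value on every row) can be sketched outside the schema language. The following Python is only an illustration of that logic under stated assumptions: the function name `check_batch_column` and the message strings are hypothetical, and it does not represent how any CSV Schema validator is implemented.

```python
import re

# Pattern taken from the prose: three to five uppercase letters,
# followed by an uppercase B, followed by three digits.
BATCH_CODE = re.compile(r"[A-Z]{3,5}B[0-9]{3}")


def check_batch_column(values):
    """Return (row_number, message) pairs for a batch_code column.

    Hypothetical helper: checks each value against the pattern and
    checks that every value equals the value in the first row.
    """
    problems = []
    first = values[0] if values else None
    for row_number, value in enumerate(values, start=1):
        # The whole field must match, not just a substring of it.
        if BATCH_CODE.fullmatch(value) is None:
            problems.append((row_number, "does not match batch_code pattern"))
        # Every row must carry the same batch_code as row 1.
        if value != first:
            problems.append((row_number, "differs from the value in row 1"))
    return problems
```

Because the comparison is against the first row rather than a fixed value, the same check can be reused for every daily file without editing it, which is the point the paragraph makes about not amending the schema each day.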
@@ -1560,7 +1564,7 @@ <h3>Input parameters used in Single Expressions and External Single Expressions<
1560
1564
<p>
1561
1565
Many <a title="Single Expression">Single Expressions</a> and <a title="External Single Expression">External Single Expressions</a> take a <a>String Provider</a>
1562
1566
as an input. A <dfn>String Provider</dfn> takes the form of either a <a>Column Reference</a>, a <a>String Literal</a>, <a>Concatenation Expression</a>
1563
-
or a <a>No Extension Expression</a>.
1567
+
or a <a>No Extension Argument Provider</a>.
1564
1568
</p>
1565
1569
<p>
1566
1570
A <dfn>Column Reference</dfn> comprises a <code>dollar sign ($)</code>, i.e. the [[UTF-8]] character code <code>0x24</code>,
@@ -1689,6 +1693,13 @@ <h4>Switch Case Expression</h4>
1689
1693
</section>
1690
1694
<section>
1691
1695
<h2>Column Expression examples</h2>
1696
+
<p>
1697
+
Example usage for a range of <a title="Column Expression">Column Expressions</a> is given below.
1698
+
A greater range of example and other schemas actually used by The National Archives
1699
+
<a href="https://github.com/digital-preservation/csv-schema/tree/master/example-schemas">can be found on GitHub</a>.
1700
+
Most of these are extensively commented in order to explain usage. There is also a set of example files to be downloaded which allow
1701
+
<a title="File Exists Expression">File Exists Expressions</a> and <a title="Checksum Expression">Checksum Expressions</a> and path substitutions to be more easily understood.
/*If "image_split" field is string: yes (precisely)
1717
1728
then timestamp for image split, compliant with XSD DateTime data type
1718
1729
and in range 4 December 2013 - 4 March 2014 (from the midnight starting 4 December,
1719
-
to last second of 4 March), else it must be blank (ie "image_split" is no).
1730
+
to last second of 4 March), and in the UTC (Greenwich Mean Time) timezone,
1731
+
else it must be blank (ie "image_split" is no).
1720
1732
</pre>
1721
1733
</section>
1722
1734
</section>
@@ -1966,11 +1978,11 @@ <h2>Identifiers</h2>
1966
1978
<h1>Errors and Warnings</h1>
1967
1979
<p>
1968
1980
An implementation MUST first check that the provided CSV Schema(s) are syntactically correct. If not, a <a>Schema Error</a> is produced,
1969
-
and no further validation SHOULD of the CSV Schema(s) or provided CSV files(s) should be undertaken. If the schema check is successful then an implementation
1981
+
and no further validation of the CSV Schema(s) or provided CSV file(s) SHOULD be undertaken. If the schema check is successful then an implementation
1970
1982
MAY continue with further CSV Schema(s) and CSV file validation.</p>
1971
1983
<p>If an implementation performs validation of a CSV file against a CSV Schema, a report SHOULD be produced for each <a>Column Validation Expression</a>
1972
-
that fails validation; This is generally considered a <a>Validation Error</a>, unless the <a>Warning Directive</a> has been used reduce the severity of an error within a specific
1973
-
<a title="Column Rules">Column Rule</a> to a <a>Validation Warning</a>.
1984
+
that fails validation; this is generally considered a <a>Validation Error</a>,
1985
+
unless the <a>Warning Directive</a> has been used to reduce the severity of an error within a specific <a title="Column Rules">Column Rule</a> to a <a>Validation Warning</a>.