Commit d9d96f1

author
DavidUnderdown
committed
fixed syntax error in schema, had - instead of , in regex
1 parent e3af182 commit d9d96f1

2 files changed

+28
-16
lines changed

csv-schema-1.1.html

Lines changed: 27 additions & 15 deletions
@@ -177,7 +177,8 @@
177177
This document represents the specification of the CSV Schema Language 1.1
178178
as defined by <a href="http://www.nationalarchives.gov.uk">The National Archives</a>.
179179
It is unclear yet whether this document will be submitted to a formal standards body
180-
such as the <a href="http://w3.org">W3C</a>.
180+
such as the <a href="http://w3.org">W3C</a>.
181+
This version supersedes the original <a href="http://digital-preservation.github.io/csv-schema/csv-schema-1.1.html">CSV Schema Language 1.0</a> published on 28 August 2014.
181182
</section>
182183
<section id='abstract'>
183184
<acronym title="Comma Separated Value">CSV</acronym> (Comma Separated Value) data comes in many shapes and sizes. Apart from [[RFC4180]] which is a fairly recent development (and often ignored),
@@ -303,9 +304,9 @@ <h1>Basics</h1>
303304
<section id="new-in-1.1" class="informative">
304305
<h1>New in CSV Schema Language 1.1 - A brief introduction to the new features of CSV Schema Language 1.1</h1>
305306
<p>
306-
The last 18 months with CSV Schema being in regular use at The National Archives has highlighted a few additional
307-
<a title="Column Validation Expression">Column Validation Expressions</a> that would provide further useful validation, simplify schema writing, or make schemas more readable.
308-
In addition the concept of a <a>String Provider</a> has been extended to allow concatenation
307+
The last 18 months with <a href="http://digital-preservation.github.io/csv-schema/csv-schema-1.1.html">CSV Schema Language 1.0</a> being in regular use at The National Archives
308+
has highlighted a few additional <a title="Column Validation Expression">Column Validation Expressions</a> that would provide further useful validation,
309+
simplify schema writing, or make schemas more readable. In addition the concept of a <a>String Provider</a> has been extended to allow concatenation
309310
to produce a final string input to expressions from some set of other <a title="String Provider">String Providers</a>, and also a function to allow removal of a Windows file
310311
extension to make certain comparisons more straightforward and robust.
311312
</p>
@@ -723,7 +724,7 @@ <h5>Column Directives</h5>
723724
<td class="ebnf-num">[23]</td>
724725
<td class="ebnf-left"><a title="ebnf-column-directives"><dfn>ColumnDirectives</dfn></a></td>
725726
<td class="ebnf-bind">::=</td>
726-
<td class="ebnf-right"><a>OptionalDirective</a>? <a>MatchIsFalseDirective</a>? <a>IgnoreCaseDirective</a>? <a>WarningDirective</a>?&#160;&#160;&#160;&#160;/* <a>xgc:unordered</a> */</td>
727+
<td class="ebnf-right"><a>OptionalDirective</a>? <a>MatchIsFalseDirective</a>? <a>IgnoreCaseDirective</a>? <a>WarningDirective</a></td>
727728
<td class="ebnf-note">/* <a>xgc:unordered</a> */</td>
728729
</tr>
729730
</table>
@@ -829,7 +830,7 @@ <h1>Column Validation Expressions</h1>
829830
then <code>Mr</code> would be regarded as invalid (strictly speaking that would also require the use of an <a>Explicit Context Expression</a> to refer to the other column,
830831
but that is a subexpression of the Non Conditional Expression class).
831832
</p>
832-
<p><strong>NOTE</strong> To increase control over expression applicability and to avoiding creating a <a href="https://en.wikipedia.org/wiki/Left_recursion">left-recursive</a> grammar (which could lead to problems for various parser implementations),
833+
<p><strong>NOTE</strong> To increase control over expression applicability and to avoid creating a <a href="https://en.wikipedia.org/wiki/Left_recursion">left-recursive</a> grammar (which could lead to problems for various parser implementations),
833834
<a title="Column Validation Expression">Column Validation Expressions</a> have been further split into <a title="Combinatorial Expression">Combinatorial Expressions</a> and <a title="Non Combinatorial Expression">Non Combinatorial Expressions</a>.</p>
834835
<table class="ebnf-table">
835836
<tr>
@@ -1194,9 +1195,12 @@ <h4>URI Expressions</h4>
11941195
<h4>XSD Date Time Expressions</h4>
11951196
<p>
11961197
An <dfn>XSD Date Time Expression</dfn> checks that the data in the column is expressed as a valid XML Schema dateTime data type (see [[!XMLSCHEMA-2]] and [[!ISO8601]]).
1197-
You can also provide an OPTIONAL <em>from</em> and <em>to</em> date-times
1198+
You can also provide an OPTIONAL <em>from</em> and <em>to</em> date-times
11981199
(inclusive) to ensure that the value in the column falls within an expected date-time range.
11991200
</p>
1201+
<p>
1202+
As the XSD Date Time Expression uses the <a>XSD Date Time Literal</a> the final, timezone, part of the [[!ISO8601]] definition is OPTIONAL.
1203+
</p>
12001204
<table class="ebnf-table">
12011205
<tr>
12021206
<td class="ebnf-num">[52]</td>
@@ -1425,7 +1429,7 @@ <h5>Identical Expressions example</h5>
14251429
<p>
14261430
Consider a file that is expected to contain data relating to one particular batch of a process,
14271431
and should also be the data relating to one particular day. Over time we will receive many such files,
1428-
so we don't want to amend the schema each day to say what the valid date for the file is, or what the appropriate batch_code is.
1432+
so we do not want to amend the schema each day to say what the valid date for the file is, or what the appropriate batch_code is.
14291433
Instead we give a generic rule for the field content: it's an <a>xDate Expression</a> in the case of batch_date;
14301434
or that the batch_code will comprise three to five uppercase letters, followed by an uppercase B, followed by three digits;
14311435
and for each row in the file every date must be identical and every batch_code must be identical.
@@ -1560,7 +1564,7 @@ <h3>Input parameters used in Single Expressions and External Single Expressions<
15601564
<p>
15611565
Many <a title="Single Expression">Single Expressions</a> and <a title="External Single Expression">External Single Expressions</a> take a <a>String Provider</a>
15621566
as an input. A <dfn>String Provider</dfn> takes the form of either a <a>Column Reference</a>, a <a>String Literal</a>, <a>Concatenation Expression</a>
1563-
or a <a>No Extension Expression</a>.
1567+
or a <a>No Extension Argument Provider</a>.
15641568
</p>
15651569
<p>
15661570
A <dfn>Column Reference</dfn> comprises a <code>dollar sign ($)</code>, i.e. the [[UTF-8]] character code <code>0x24</code>,
@@ -1689,6 +1693,13 @@ <h4>Switch Case Expression</h4>
16891693
</section>
16901694
<section>
16911695
<h2>Column Expression examples</h2>
1696+
<p>
1697+
Example usage for a range of <a title="Column Expression">Column Expressions</a> is given below.
1698+
A greater range of example and other schemas actually used by The National Archives
1699+
<a href="https://github.com/digital-preservation/csv-schema/tree/master/example-schemas">can be found on GitHub</a>.
1700+
Most of these are extensively commented in order to explain usage. There is also a set of example files to be downloaded which allow
1701+
<a title="File Exists Expression">File Exists Expressions</a> and <a title="Checksum Expression">Checksum Expressions</a> and path substitutions to be more easily understood.
1702+
</p>
16921703
<pre class="example" data-lt="Column Expression Syntax">
16931704
piece: is("1") and (in($file_path) and in($resource_uri)) /*The column "piece" must have the specific value 1
16941705
the value must also be part of the value of the columns "file_path" and "resource_uri"
@@ -1697,11 +1708,11 @@ <h2>Column Expression examples</h2>
16971708
the combination of piece and item must be unique within the file.
16981709
file_uuid: uuid4 unique /*must be a version 4 uuid, and the value must be unique within the file (uuids must be
16991710
lower case). Here an implicit And Expression is used*/
1700-
file_path: fileExists uri starts("file:///") /*fileExists checks that there is actually a file of the given name at the
1711+
file_path: fileExists uri starts(concat("file:///",$piece,"/",$item)) /*fileExists checks that there is actually a file of the given name at the
17011712
specified location on the file system which is assumed to be the value held in "file_path".
17021713
We know the location should be in the form of a URI so a URI expression is used,
17031714
and in particular this should be a file url, so we further specify that the data
1704-
in the column must start "file:///" */
1715+
in the column must start "file:///", then (a folder named for) the piece id, then a /, then the item id */
17051716
file_checksum: checksum(file($file_path),"SHA-256") /* Compare the value given in this field to the checksum calculated for the file
17061717
found at the location given in the "file_path" field.
17071718
Use the specified checksum algorithm (SHA-256)
@@ -1716,7 +1727,8 @@ <h2>Column Expression examples</h2>
17161727
/*If "image_split" field is string: yes (precisely)
17171728
then timestamp for image split, compliant with XSD DateTime data type
17181729
and in range 4 December 2013 - 4 March 2014 (from the midnight starting 4 December,
1719-
to last second of 4 March), else it must be blank (ie "image_split" is no).
1730+
to last second of 4 March), and in the UTC (Greenwich Mean Time) timezone,
1731+
else it must be blank (ie "image_split" is no).
17201732
</pre>
17211733
</section>
17221734
</section>
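
The revised file_path rule in the example above, starts(concat("file:///",$piece,"/",$item)), can be read as: join the string providers into one prefix, then require the column value to begin with it. A minimal Python sketch of that reading follows. The helper name and the row values are hypothetical and chosen only for illustration; this is not the validator's implementation.

```python
def starts_with_concat(value, *parts):
    """Return True if value begins with the concatenation of parts.

    Mirrors the reading of starts(concat(...)) given in the schema comment:
    the prefix is built first, then compared against the start of the value.
    """
    prefix = "".join(parts)
    return value.startswith(prefix)


# Hypothetical row: column names follow the example schema, values are made up.
row = {"piece": "1", "item": "7", "file_path": "file:///1/7/0001.jp2"}

ok = starts_with_concat(row["file_path"], "file:///", row["piece"], "/", row["item"])
print("file_path consistent with piece and item:", ok)
```

Under this reading, a file_path that sits under a different piece or item folder fails the rule, even though it would have passed the earlier starts("file:///") check.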
@@ -1966,11 +1978,11 @@ <h2>Identifiers</h2>
19661978
<h1>Errors and Warnings</h1>
19671979
<p>
19681980
An implementation MUST first check that the provided CSV Schema(s) are syntactically correct. If not, a <a>Schema Error</a> is produced,
1969-
and no further validation SHOULD of the CSV Schema(s) or provided CSV files(s) should be undertaken. If the schema check is successful then an implementation
1981+
and no further validation of the CSV Schema(s) or provided CSV files(s) SHOULD be undertaken. If the schema check is successful then an implementation
19701982
MAY continue with further CSV Schema(s) and CSV file validation.</p>
19711983
<p>If an implementation performs validation of a CSV file against a CSV Schema, a report SHOULD be produced for each <a>Column Validation Expression</a>
1972-
that fails validation; This is generally considered a <a>Validation Error</a>, unless the <a>Warning Directive</a> has been used reduce the severity of an error within a specific
1973-
<a title="Column Rules">Column Rule</a> to a <a>Validation Warning</a>.
1984+
that fails validation; This is generally considered a <a>Validation Error</a>,
1985+
unless the <a>Warning Directive</a> has been used to reduce the severity of an error within a specific <a title="Column Rules">Column Rule</a> to a <a>Validation Warning</a>.
19741986
</p>
19751987
<section>
19761988
<h2>Schema Errors</h2>

example-schemas/generic_digitised_surrogate_tech_acq_metadata_v1.csvs

Lines changed: 1 addition & 1 deletion
@@ -16,7 +16,7 @@ batch_code: length(1,16) regex("^[0-9a-zA-Z]{1,16}$")
1616
//by a logical AND unless another boolean is provided). 2nd part restricts to alphanumeric characters as
1717
//specified in digitisation standards p 31. Would usually comprise project identifier (eg department and series),
1818
//plus running count of batch number within that.
19-
department: regex("[A-Z]{1-4}") and (in($file_path) and in($resource_uri))
19+
department: regex("[A-Z]{1,4}") and (in($file_path) and in($resource_uri))
2020
//Parentheses control evaluation order of booleans as might be expected
2121
//The regex statement says that this field must consist of between 1 and 4 upper case alphabetic characters.
2222
//The grouped "in" statements say that the value found in this field must also be found as part of the fields
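
To illustrate why the one-character change from {1-4} to {1,4} matters, here is a small sketch using Python's re module as a stand-in. This is an assumption made for illustration: the CSV Schema validator uses its own regex engine, which may reject {1-4} outright rather than behave as Python does, and whole-value matching is also assumed here based on the schema comment that the field "must consist of" one to four upper case letters.

```python
import re

BROKEN = r"[A-Z]{1-4}"  # hyphen: not a valid repetition range
FIXED = r"[A-Z]{1,4}"   # comma: between 1 and 4 repetitions


def matches(pattern, value):
    """Return True if the whole value matches the pattern (assumed semantics)."""
    return re.fullmatch(pattern, value) is not None


# Hypothetical department values, plus one literal string to expose what the
# broken pattern means in Python's engine.
for value in ("WO", "ADM", "ADMIR", "W{1-4}"):
    print(f"{value!r}: broken={matches(BROKEN, value)} fixed={matches(FIXED, value)}")
```

In Python's engine the malformed quantifier is treated as literal text, so the broken pattern accepts only a single upper case letter followed by the characters {1-4}, and rejects every genuine department code. The fixed pattern accepts one to four upper case letters and rejects longer values.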
