
Detection, validation and report of documents in the extraction #29

@solfeggietto

The Documaster Noark Extraction Validator must detect, validate and report on the total number of documents included in the extraction, as well as those referred to in the extraction's metadata.
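To make the "included vs. referred to" requirement concrete, here is a minimal sketch in Python. It is only an illustration, not the validator's actual design: the function name `reconcile` is hypothetical, and it assumes the document references have already been read out of the metadata as relative paths (how that is parsed is not covered here). It also assumes `documents_root` contains only document files, so metadata files are not counted.

```python
from pathlib import Path
from typing import Dict, Iterable, Set


def reconcile(documents_root: Path, referenced: Iterable[str]) -> Dict[str, Set[str]]:
    """Compare files found on disk with paths referenced in the metadata.

    `referenced` is assumed to hold paths relative to `documents_root`.
    """
    # Every regular file under the root, as a relative POSIX-style path.
    on_disk = {
        p.relative_to(documents_root).as_posix()
        for p in documents_root.rglob("*")
        if p.is_file()
    }
    # Normalise Windows-style separators so both sets are comparable.
    in_metadata = {r.replace("\\", "/") for r in referenced}

    return {
        "matched": on_disk & in_metadata,       # referenced and present
        "missing": in_metadata - on_disk,       # referenced but not in the extraction
        "unreferenced": on_disk - in_metadata,  # in the extraction but never referenced
    }
```

The sizes of the three sets would give the totals for the report, and the "missing" and "unreferenced" sets could be listed in full as findings.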

The first versions of the Noark Extraction Validator will include validation of a specific set of document formats (at least the PDF/A formats, but possibly others as well?). However, the tool must be structured so that it is easy to add support for further file types (and possibly to call several external validation tools from the tool itself).
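One way to keep the set of supported file types easy to extend is a small registry of per-format validators. The sketch below is in Python and is only meant to show the shape of the idea; all names (`FormatValidator`, `ValidatorRegistry`, `PdfHeaderValidator`) are hypothetical. The PDF class is a placeholder that only checks the file header; it does not perform PDF/A validation, which in practice would probably be delegated to an external tool.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, Optional


@dataclass
class ValidationResult:
    path: Path
    validated: bool         # False when no validator is registered for the format
    valid: Optional[bool]   # None when the file was not validated
    message: str = ""


class FormatValidator(ABC):
    """Hypothetical interface that each format-specific validator implements."""

    @abstractmethod
    def validate(self, path: Path) -> ValidationResult: ...


class PdfHeaderValidator(FormatValidator):
    """Placeholder: checks the '%PDF-' magic bytes only, not PDF/A conformance."""

    def validate(self, path: Path) -> ValidationResult:
        with path.open("rb") as fh:
            ok = fh.read(5) == b"%PDF-"
        return ValidationResult(path, True, ok, "" if ok else "missing %PDF- header")


class ValidatorRegistry:
    """Maps file extensions to validators; new formats are added via register()."""

    def __init__(self) -> None:
        self._by_extension: Dict[str, FormatValidator] = {}

    def register(self, extension: str, validator: FormatValidator) -> None:
        self._by_extension[extension.lower()] = validator

    def validate(self, path: Path) -> ValidationResult:
        validator = self._by_extension.get(path.suffix.lower())
        if validator is None:
            # Unsupported formats are still reported, just marked as not validated.
            return ValidationResult(path, False, None, "no validator registered")
        return validator.validate(path)
```

Supporting a new format would then mean writing one more `FormatValidator` subclass (possibly wrapping an external validation tool) and registering it, without touching the rest of the tool. Looking up by file extension is a simplification; proper format identification would be more reliable.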

The Noark 4 Extraction tool must list all files/documents located in the extraction and categorize them in the report according to whether each is an archival format or a production format, and whether the validator has run a validation test on the individual file. The categorization must have counters at the top level of types, as well as subcounters where that makes sense and where it increases the overall usability of the tool as documentation and validation at an overview level. We must have an open dialogue on how this is best shown in the report.
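As input to that discussion, here is a rough Python sketch of how top-level counters and subcounters could be collected. The names and the extension sets are assumptions made for illustration; the real lists would come from the attached format draft and the regulations. Classifying by extension alone is also a simplification (it cannot, for example, tell PDF/A from ordinary PDF).

```python
from collections import Counter
from pathlib import Path
from typing import Iterable, Set, Tuple

# Illustrative placeholders only, not an authoritative list of formats.
ARCHIVAL_EXTENSIONS = {".pdf", ".tif", ".tiff", ".xml"}
PRODUCTION_EXTENSIONS = {".doc", ".docx", ".xls", ".xlsx", ".rtf"}


def categorize(extension: str) -> str:
    """Return the top-level category for a lowercased file extension."""
    if extension in ARCHIVAL_EXTENSIONS:
        return "archival"
    if extension in PRODUCTION_EXTENSIONS:
        return "production"
    return "unknown"


def summarize(paths: Iterable[Path], validated_extensions: Set[str]) -> Tuple[Counter, Counter]:
    """Count files per category, and per (category, extension, validation status)."""
    top: Counter = Counter()
    sub: Counter = Counter()
    for path in paths:
        ext = path.suffix.lower()
        category = categorize(ext)
        status = "validated" if ext in validated_extensions else "not validated"
        top[category] += 1
        sub[(category, ext, status)] += 1
    return top, sub
```

The `top` counter would feed the overview at the head of the report, while `sub` gives the breakdown per format and validation status underneath it.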

I suggest connecting the Documaster Decom approach (preservation of a general extraction) with this Noark Extraction Validator approach, since Documaster Decom must detect file formats in the process of making the archival version of production documents.

Attached is an Excel draft of the current file types (this will be expanded into a reasonably complete list, including the most common production formats).

Attached is the proposed new regulation of document formats (§5-16 to §5-19, in addition to specific requirements for Noark systems from §5-20, as well as several related sections therein).

Current regulations on document formats from the National Archives of Norway:
https://lovdata.no/forskrift/1999-12-01-1566/§8-17
https://lovdata.no/forskrift/1999-12-01-1566/§8-18
https://lovdata.no/forskrift/1999-12-01-1566/§8-19
https://lovdata.no/forskrift/1999-12-01-1566/§8-20

2017-03-17_DRAFT_Dokument-format.xlsx
Høringsnotat - forslag til endringer i riksarkivarens forskrift.pdf
