|
| 1 | +<pre class='metadata'> |
| 2 | +Title: TREE Discovery and Context Information |
| 3 | +Shortname: TREEDiscovery |
| 4 | +Level: 1 |
| 5 | +Status: w3c/CG-DRAFT |
| 6 | +Markup Shorthands: markdown yes |
| 7 | +Group: TREE hypermedia community group |
| 8 | +URL: https://w3id.org/tree/specification/discovery |
| 9 | +Repository: https://github.com/treecg/specification |
| 10 | +Mailing List: public-treecg@w3.org |
| 11 | +Mailing List Archives: https://lists.w3.org/Archives/Public/public-treecg/ |
| 12 | +Editor: Pieter Colpaert, https://pietercolpaert.be |
| 13 | +Abstract: |
| 14 | + This specification defines how a client selects a specific dataset and search tree, as well as extracts relevant context information. |
| 15 | +</pre> |
| 16 | + |
| 17 | +# Definitions # {#overview} |
| 18 | + |
| 19 | +A `tree:Collection` is a subclass of `dcat:Dataset` ([[!vocab-dcat-3]]). |
| 20 | +The specialization is that this particular dataset is a collection of _members_. |
| 21 | + |
| 22 | +A `tree:SearchTree` is a subclass of `dcat:Distribution`. |
| 23 | +The specialization is that it uses the main TREE specification to publish a search tree. |
| 24 | + |
| 25 | +A node from which all other nodes can be found is a `tree:RootNode`. |
| 26 | + |
| 27 | +Note: The `tree:SearchTree` and the `tree:RootNode` MAY be identified by the same IRI when no disambiguation is needed. |
| 28 | + |
| 29 | +A TREE client MUST be provided with a URL to start from, which we call the _entrypoint_. |
| 30 | + |
| 31 | +# Initializing a client with a URL # {#starting-from} |
| 32 | + |
| 33 | +The goal of the client is to understand what `tree:Collection` it is using, and to find a `tree:RootNode` to start the traversal phase from. |
| 34 | +This discovery specification extends the initialization step in the TREE specification, for the cases in which multiple options are possible. |
| 35 | + |
| 36 | +The client MUST dereference the URL, which will result in a set of quads. The client MUST then perform the init step from the main specification. |
| 37 | +If that step does not return a result, the client MUST check whether the URL before redirects (`E`) is used in one of the following discovery patterns, described in the subsections below: |
| 38 | + 1. `E` is a `tree:Collection`: then the client needs to [select the right search tree](#tree-search-trees) |
| 39 | + 2. `E` is a `dcat:Dataset`: then the client needs to [select the right distribution or dataservice from a catalog](#dcat-dataset) |
| 40 | + 3. `E` is an `ldes:EventStream`: then the client MAY take into account [LDES specific properties](#ldes) |
| 41 | + 4. `E` is a `dcat:Distribution`: then the client needs to [process it accordingly](#dcat-distribution) |
| 42 | + 5. `E` is a `dcat:DataService`: then the client needs to [process it accordingly](#dcat-dataservice) |
| 43 | + 6. `E` is a catalog or is not explicitly mentioned: then the client needs to select a dataset based on [shape information](#tree-collection-shapes) and [DCAT Catalog information](#dcat-catalog) |
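
The case distinction above can be sketched as a lookup on the `rdf:type` of `E`. This is a minimal illustration, not a normative algorithm: it assumes the dereferenced quads are already available as plain `(subject, predicate, object)` tuples of IRI strings (the graph component is ignored), that the first matching type in list order wins, and the function name `discovery_pattern` is hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
TREE = "https://w3id.org/tree#"
LDES = "https://w3id.org/ldes#"
DCAT = "http://www.w3.org/ns/dcat#"

# Ordered as in the list above. Assumption: the first matching type wins.
PATTERNS = [
    (TREE + "Collection", "tree-search-trees"),
    (DCAT + "Dataset", "dcat-dataset"),
    (LDES + "EventStream", "ldes"),
    (DCAT + "Distribution", "dcat-distribution"),
    (DCAT + "DataService", "dcat-dataservice"),
]


def discovery_pattern(entrypoint, triples):
    """Return the anchor of the subsection that applies to `entrypoint`."""
    types = {o for (s, p, o) in triples if s == entrypoint and p == RDF_TYPE}
    for rdf_class, section in PATTERNS:
        if rdf_class in types:
            return section
    # Case 6: E is a catalog or is not explicitly typed, so a dataset has to
    # be selected first (shape-based selection applies here as well).
    return "dcat-catalog"
```

A client would call this only after the init step from the main specification returned no result.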
| 44 | + |
| 45 | +## Selecting a collection via shapes ## {#tree-collection-shapes} |
| 46 | + |
| 47 | +When a client finds multiple collections, it can choose to prune them based on the `tree:shape` property. |
| 48 | +The `tree:shape` property refers to a first `sh:NodeShape`. |
| 49 | +A collection MAY be pruned if its shape has no overlap with the properties the client needs. |
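
Since the precise algorithm is still an open issue, the following is only one possible reading of "overlap". It assumes each shape has already been reduced to the set of property IRIs found in its `sh:path` values, and that a collection without a shape is kept because there is nothing to prune on. The function name and the input mapping are hypothetical.

```python
def prune_collections(collection_shape_paths, needed_properties):
    """Keep collections whose shape mentions at least one needed property.

    `collection_shape_paths` maps a collection IRI to the set of property
    IRIs taken from its tree:shape, or to None when no shape is published.
    """
    needed = set(needed_properties)
    kept = []
    for collection, paths in collection_shape_paths.items():
        # Assumption: without a shape there is no basis for pruning.
        if paths is None or needed & set(paths):
            kept.append(collection)
    return kept
```

A stricter client could instead require that every needed property appears in the shape, by replacing the intersection test with a subset test.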
| 50 | + |
| 51 | +Issue: Will we document the precise algorithm to use? Should we extend shapes with cardinality approximations as well? |
| 52 | + |
| 53 | +## Selecting a collection via a catalog ## {#dcat-catalog} |
| 54 | + |
| 55 | +A DCAT Catalog is an overview of datasets, data services and distributions. |
| 56 | +Because TREE clients first need to select a dataset and then a search tree to use, the discovery process aligns with how DCAT-AP works. |
| 57 | +DCAT discovery builds on the previous section, in which a collection or dataset can be selected based on the `tree:shape` property. |
| 58 | + |
| 59 | +For now, we will assume the DCAT information is available in subject pages. |
| 60 | + |
| 61 | +Issue: Do we need more text on how to handle different types of DCAT interfaces? |
| 62 | + |
| 63 | +Dataset descriptions can be used to filter the datasets in a catalog down to those that are useful for the client. |
| 64 | +Relevant properties may include the spatial extent, the temporal extent, or whether the dataset is part of another `dcat:Dataset`. |
| 65 | + |
| 66 | +Issue: How precise do we need to be in this specification? |
| 67 | + |
| 68 | +When the `dcat:Dataset` is a `tree:Collection`, the DCAT catalog will contain a `dct:type` property with `https://w3id.org/tree#Collection` or `https://w3id.org/ldes#EventStream` as the object. |
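
The `dct:type` hint can be used to narrow a catalog down to the datasets a TREE client can work with. The sketch below assumes the catalog has been parsed into `(subject, predicate, object)` tuples of IRI strings and that `dct:` expands to `http://purl.org/dc/terms/`. The function name is hypothetical.

```python
DCT_TYPE = "http://purl.org/dc/terms/type"
TREE_TYPES = {
    "https://w3id.org/tree#Collection",
    "https://w3id.org/ldes#EventStream",
}


def tree_datasets(triples):
    """Return the subjects that dct:type marks as a TREE collection or an LDES."""
    return sorted({s for (s, p, o) in triples if p == DCT_TYPE and o in TREE_TYPES})
```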
| 69 | + |
| 70 | +## Choosing from multiple SearchTrees with TREE ## {#tree-search-trees} |
| 71 | + |
| 72 | +Issue: This is yet to be done |
| 73 | + |
| 74 | +## Selecting a search tree via a DCAT dataset ## {#dcat-dataset} |
| 75 | + |
| 76 | +There are two ways in which you can find a search tree from a dataset: via the distributions and via the data services. Both need to be tested. |
| 77 | +Selecting a distribution or data service when multiple are available needs to be done based on [the search tree description](#tree-search-trees). |
| 78 | +If nothing is available, all need to be tested by processing them as exemplified in the next subsections. |
| 79 | + |
| 80 | +### Selecting a search tree via DCAT Distribution ### {#dcat-distribution} |
| 81 | + |
| 82 | +If `E dcat:distribution ?D . ?D dcat:downloadURL ?N .` matches, then `?N` is a root node of `E`. |
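
The pattern above can be evaluated without a SPARQL engine as a two-step join. This is a sketch under the assumption that the quads are available as `(subject, predicate, object)` tuples of IRI strings. The function name is hypothetical.

```python
DCAT = "http://www.w3.org/ns/dcat#"


def root_nodes_via_distribution(entrypoint, triples):
    """Return every ?N with E dcat:distribution ?D . ?D dcat:downloadURL ?N ."""
    distributions = {
        o for (s, p, o) in triples
        if s == entrypoint and p == DCAT + "distribution"
    }
    return sorted({
        o for (s, p, o) in triples
        if s in distributions and p == DCAT + "downloadURL"
    })
```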
| 83 | + |
| 84 | +Issue: This is yet to be done |
| 85 | + |
| 86 | +### Selecting a search tree from a DCAT data service ### {#dcat-dataservice} |
| 87 | + |
| 88 | + * If `?DS dcat:servesDataset E ; dcat:endpointURL ?U` or `E dcat:endpointURL ?U` matches, then the client MUST repeat the algorithm with `?U` as the entrypoint. |
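
Collecting the candidate endpoints for both patterns can be sketched as follows, again assuming `(subject, predicate, object)` tuples of IRI strings. The function name is hypothetical, and restarting the discovery with each returned URL is left to the caller.

```python
DCAT = "http://www.w3.org/ns/dcat#"


def endpoints_via_data_service(entrypoint, triples):
    """Return every ?U reachable from E through one of the two patterns."""
    # Data services that declare they serve E.
    services = {
        s for (s, p, o) in triples
        if p == DCAT + "servesDataset" and o == entrypoint
    }
    # E itself may also carry dcat:endpointURL directly.
    subjects = services | {entrypoint}
    return sorted({
        o for (s, p, o) in triples
        if s in subjects and p == DCAT + "endpointURL"
    })
```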
| 89 | + |
| 90 | +Issue: This is yet to be done |
| 91 | + |
| 92 | +## Linked Data Event Streams ## {#ldes} |
| 93 | + |
| 94 | +If the client is not built for query answering but only for setting up a replication and synchronization system, a special type indicates that the search tree is made for this purpose: the `ldes:EventSource`. |
| 95 | +Clients that want to prioritize taking a _full_ copy MAY give full priority to this server hint. |
| 96 | + |
| 97 | +<div class="example"> |
| 98 | +```turtle |
| 99 | +E a ldes:EventSource ; |
| 100 | + tree:rootNode|dcat:downloadURL </node1> . |
| 101 | +``` |
| 102 | +</div> |
| 103 | + |
| 104 | +# Extracting context information # {#context} |
| 105 | + |
| 106 | +Issue: This is yet to be done |
| 107 | + |
| 108 | +Context information enables a client to understand who the creator of a certain dataset is, when it was last changed, what other datasets it was derived from, etc. |
| 109 | + |
| 110 | +## DCAT and dcterms ## {#context-dcat} |
| 111 | + |
| 112 | +Issue: This is yet to be done |
| 113 | + |
| 114 | +## Provenance ## {#context-prov} |
| 115 | + |
| 116 | +Issue: This is yet to be done |
| 117 | + |
| 118 | +## Linked Data Event Streams ## {#context-ldes} |
| 119 | + |
| 120 | +Issue: This is yet to be done |
| 121 | + |
| 122 | +LDES (https://w3id.org/ldes/specification) defines how to evolve search trees consistently. It defines every member as immutable, and a collection as append-only. |
| 123 | +Therefore, a client can make sure it processes each member only once. |
| 124 | +It adds extra terms, such as the concept of an EventStream, retention policies and a timestampPath. |
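
Because members are immutable and the collection is append-only, a client can get away with remembering only the identifiers it has already handled. The sketch below assumes members are identified by IRI strings and that the set of seen identifiers fits in memory. The function name is hypothetical and not part of LDES.

```python
def unprocessed_members(member_ids, seen):
    """Return the members not handled before and record them in `seen`."""
    fresh = []
    for member_id in member_ids:
        if member_id not in seen:
            # Record immediately so duplicates within one page are skipped too.
            seen.add(member_id)
            fresh.append(member_id)
    return fresh
```

A client would keep one `seen` set per event stream and pass it in for every fetched page.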