Advanced resource entity implementation
As discussed in #184:
entity is used to designate the origin of a resource. Different resources can be derived from one entity, e.g. a meeting has multiple nested documents. These documents can be resolvable by their own URL, but the original source of the resource is still the same entity, the same as its parent, since this is our actual source. Resources that can be resolved (have their own identifier) should use that URL instead, see the comment below.
However, some resources can be enriched. A document can be downloaded and processed by an enricher. This changes the original source of the document, and that change should be reflected in entity. This should be implemented.
Some extra explanation: entity should be renamed to canonical_iri and canonical_id. entity_type can be dropped. If possible, the resource should be identified by a URL, including scheme and query parameters, like this. It should represent the supplier's resource as they specify it; it should include a scheme (https:// by default) but no additional parameters. If the supplier does not specify it but we can assume the resource exists, we can construct the more specific URL ourselves. This makes them IRIs, which are most often URLs, and it implies that we cannot assume they always resolve.
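The "https:// by default" rule could look roughly like the sketch below. The function name with_default_scheme is hypothetical and not part of the existing code base; the example value is the supplier's "self" value quoted further down in this issue.

```python
def with_default_scheme(iri: str, default_scheme: str = "https") -> str:
    """Return the IRI unchanged if it already carries a scheme.

    Otherwise prefix the default scheme. This is only a sketch: it treats
    any value containing "://" as already having a scheme.
    """
    if "://" in iri:
        return iri
    # Also handles scheme-relative values such as "//host/path".
    return f"{default_scheme}://{iri.lstrip('/')}"


# Supplier value without a scheme, as quoted in this issue.
canonical_iri = with_default_scheme("api.notubiz.nl/document/780972")
print(canonical_iri)
```

Values that already specify http:// or https:// are passed through untouched, so the supplier's own choice of scheme is preserved.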
The canonical creates the bridge between the mapping IRI and the supplier's resource. In SOAP it is not possible to use URLs to identify a specific resource. In that case we have no more information than the identifier itself, so we use canonical_id; it would be something like '8984124'. The used_file would be the URL to our cached version of the SOAP response. In a later iteration we can use URL fragments to designate the identifier within the context of the cached version. We use separate canonical_id and canonical_iri fields since we need to serialize them as different attributes.
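To make the two fields concrete, here is a minimal sketch of how they might be carried and serialized as separate attributes. The class name SourceReference, its methods and the cache URL are assumptions for illustration only; the fragment form is just one possible reading of the "later iteration" idea.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import quote


@dataclass(frozen=True)
class SourceReference:
    """Hypothetical container for used_file, canonical_iri and canonical_id."""

    used_file: str
    canonical_iri: Optional[str] = None
    canonical_id: Optional[str] = None

    def serialize(self) -> dict:
        """Emit canonical_iri and canonical_id as different attributes."""
        data = {"used_file": self.used_file}
        if self.canonical_iri is not None:
            data["canonical_iri"] = self.canonical_iri
        if self.canonical_id is not None:
            data["canonical_id"] = self.canonical_id
        return data

    def fragment_reference(self) -> Optional[str]:
        """Possible later iteration: point at the id inside the cached file."""
        if self.canonical_id is None:
            return None
        return f"{self.used_file}#{quote(self.canonical_id, safe='')}"


# SOAP case: only an identifier is known, scoped to the cached response.
soap_item = SourceReference(
    used_file="https://cache.example.org/soap/response.xml",  # hypothetical URL
    canonical_id="8984124",
)
print(soap_item.serialize())
print(soap_item.fragment_reference())
```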
Some considerations:
- When a subresource has its own URL, canonical_iri is used to specify it. There is no direct relation between canonical_iri and used_file: the canonical refers to the specific resource, while the used_file should be the cached version of the resource's parent.
- When a subresource doesn't have its own URL, canonical_id is used to designate the subresource within the resource. There is a direct relation between canonical_id and used_file, since the id will always be in the scope of the cached file.
- A downloadable document has a schema:contentUrl to the resolver, so used_file shouldn't refer to the same cache URL. Instead it should refer to the file where the URL to the document was originally specified. Also, schema:isBasedOn set by the enricher refers to the document's original download URL. Canonical should refer to the same URL, except when the following applies:
- Some suppliers distinguish between a document resource URL and a document download URL. If this is the case, canonical_iri should be the resource URL and schema:isBasedOn should be the download URL.
- Note that for canonical_iri the document resource URL is specified here as "self": "api.notubiz.nl/document/780972", with the ?format=json&version=1.10.8 query string appended. We want to give the user as much information as possible about how to find the actual resource we used, so it should include at least the version query parameter, and it would be wise to include format as well. Sensitive query parameters like authentication should be left out.
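The document rules above can be sketched as follows. The function names, the list of sensitive parameter names and the download URL are assumptions made for illustration; the supplier's real authentication parameter may be named differently.

```python
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical names of query parameters that must never be published.
SENSITIVE_PARAMS = frozenset({"token", "api_key", "key", "password"})


def strip_sensitive_params(url: str, sensitive=SENSITIVE_PARAMS) -> str:
    """Keep informative parameters (format, version), drop sensitive ones."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name.lower() not in sensitive
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


def document_references(download_url: str, resource_url: Optional[str] = None) -> dict:
    """Pick canonical_iri and schema:isBasedOn for a downloadable document.

    If the supplier exposes a separate resource URL, that becomes the
    canonical; otherwise the canonical equals the download URL.
    """
    return {
        "canonical_iri": strip_sensitive_params(resource_url or download_url),
        "schema:isBasedOn": download_url,
    }


refs = document_references(
    download_url="https://api.notubiz.nl/document/780972/download",  # hypothetical
    resource_url="https://api.notubiz.nl/document/780972?format=json&version=1.10.8&token=secret",
)
print(refs["canonical_iri"])
```

Filtering by a deny-list keeps version and format visible by default; an allow-list would be the stricter alternative if suppliers add unknown parameters.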