Commit 55a5395

Update implementation plan for oep-datasets #1971
1 parent 63cee9a commit 55a5395

File tree

1 file changed

+71
-24
lines changed

docs/oeplatform-code/features/oep-datasets/datasets.md

Lines changed: 71 additions & 24 deletions
@@ -35,52 +35,99 @@ On the OEP, datasets are accessible via the UI and programmatically via the REST
3535

3636
## Use Cases
3737

38-
OEP Datasets serve multiple use cases:
38+
??? info "OEP Datasets serve multiple use cases"
3939

40-
**Draft Data Storage**
40+
**Draft Data Storage**
4141

42-
Upload data in any structure, using any PostgreSQL-supported types. No strict schema required during early exploration.
42+
Upload data in any structure, using any PostgreSQL-supported types. No strict schema required during early exploration.
4343

44-
**Catalog Publishing**
44+
**Catalog Publishing**
4545

46-
Move draft datasets to published state within a topic category.
46+
Move draft datasets to published state within a topic category.
4747

48-
Requires:
48+
Requires:
4949

50-
- Open data license for each resource
51-
- Consistent structure
52-
- Metadata completeness
50+
- Open data license for each resource
51+
- Consistent structure
52+
- Metadata completeness
5353

54-
**Scenario Data**
54+
**Scenario Data**
5555

56-
Link datasets to **Scenario Bundles** for model comparison and scenario analysis.
56+
Link datasets to **Scenario Bundles** for model comparison and scenario analysis.
5757

58-
This allows:
58+
This allows:
5959

60-
- Qualitative comparisons of scenario descriptions
61-
- Quantitative comparisons using graphs and metrics across bundles
60+
- Qualitative comparisons of scenario descriptions
61+
- Quantitative comparisons using graphs and metrics across bundles
62+
63+
**User Stories overview**
64+
65+
- Users want to create datasets, add data, delete data and tables from a dataset, or edit it.
66+
- Users can use the website UI and the REST API for all dataset-related tasks.
67+
- Users want to find data and publish it: data is grouped into topics, and all datasets are first uploaded to the model_draft as an initial editing space. Later, datasets can be published and are then considered complete; multiple versions might follow.
68+
- Users may want to add datasets to multiple topics.
69+
- Users want to keep using well-known functionality such as the legacy API and expect that it will not change suddenly.
70+
(If we need to add functionality, we want to make it optional. Once it has been adopted, we can start making bigger changes.)
6271

6372
---
6473

6574
## Implementation
6675

67-
OEP Datasets are described using the **OEMetadata** specification. The metadata is stored as JSON on both the OEP and **MOSS**, our RDF-capable metadata store. MOSS handles:
76+
??? info "Changes in OEP"
77+
78+
First, within the OEP we have:
79+
80+
- Two databases: (1) the Primary DB and (2) the Django DB. The Primary DB is the datastore for the actual data; the Django DB acts as a data registry that manages uploaded datasets and provides additional functionality.
81+
82+
- The Django application, in which mainly the `api` and `dataedit` apps are affected.
83+
84+
??? info "What other services are affected"
85+
86+
Additionally we need to handle:
87+
88+
- The Databus is the PID system for the OEP. Once the data reaches a specific quality, it is registered on the Databus either manually or automatically (once published). The Databus itself only stores a metadata entry (based on DCAT-AP) in its internal graph store. It offers a REST API. It is hosted outside the OEP network but is connected internally to enable server-to-server communication.
89+
- MOSS is another service that we want to use to provide extended semantic search functionality based on the OEMetadata entries for each dataset/table. It will also serve as the primary metadata store for the OEP. It is also connected to the OEP for server-to-server communication.
90+
91+
??? info "OEMetadata & MOSS"
92+
93+
OEP Datasets are described using the **OEMetadata** specification. The metadata is stored as JSON on both the OEP and **MOSS**, our RDF-capable metadata store. MOSS handles:
6894

69-
- Generating RDF from JSON-LD
70-
- Metadata search functionality
95+
- Primary store for metadata documents
96+
- Generating RDF from JSON-LD
97+
- Metadata search functionality
7198

72-
Keeping metadata in both systems improves integration but requires sync logic. This is solved by atomic updates on the OEP side when metadata is created or updated.
99+
Keeping metadata in both systems improves integration but requires sync logic. This is solved by atomic updates on the OEP side: when metadata is created or updated, either both systems are updated or the changes are discarded and the user is informed.
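
The all-or-nothing behaviour described here can be sketched as follows. This is only an illustration under stated assumptions: `InMemoryStore`, `SyncError`, and `update_metadata` are hypothetical names and do not refer to existing OEP or MOSS code, and the real stores would be a database and a remote service rather than dictionaries.

```python
class SyncError(Exception):
    """Raised when metadata could not be written to both stores."""


class InMemoryStore:
    """Hypothetical stand-in for the OEP metadata storage or a MOSS client."""

    def __init__(self, fail=False):
        self.docs = {}
        self.fail = fail  # simulate an unavailable service

    def put(self, key, doc):
        if self.fail:
            raise ConnectionError("store unavailable")
        self.docs[key] = doc


def update_metadata(oep, moss, key, doc):
    """Write doc to both stores, or restore the OEP state and raise SyncError."""
    had_previous = key in oep.docs
    previous = oep.docs.get(key)
    try:
        oep.put(key, doc)
        moss.put(key, doc)
    except Exception as exc:
        # Undo the OEP write so both stores keep the same content.
        if had_previous:
            oep.docs[key] = previous
        else:
            oep.docs.pop(key, None)
        # The caller can turn this error into a message for the user.
        raise SyncError(f"metadata for {key!r} was not updated") from exc
```

The sketch writes to the OEP side first and rolls that write back if the second store fails, so a failed update leaves the previous metadata in place.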
73100

74-
We organize datasets hierarchically within the data catalog:
101+
As described above, we define a dataset similarly to the DCAT-AP definition and build on top of the Frictionless Data Package standard. A dataset can consist of either a single table resource or multiple table resources. We organize datasets hierarchically. All datasets are grouped into topics, which make up the catalog categories. Topics are not part of the hierarchy because a dataset can be in multiple topics.
75102

76-
1. `Topics/` – catalog categories
77-
2. `Topics/<topic>/Datasets/`
78-
3. `Topics/<topic>/Datasets/<dataset>/`
79-
4. `Topics/<topic>/Datasets/<dataset>/Resources/`
80-
5. `Topics/<topic>/Datasets/<dataset>/Resources/<resource>/`
103+
With this baseline definition, we get the desired hierarchy, which looks like this:
104+
105+
1. `Datasets/`
106+
2. `Datasets/<dataset>/`
107+
3. `Datasets/<dataset>/Resources/`
108+
4. `Datasets/<dataset>/Resources/<resource>/`
81109

82110
This structure ensures deep linking and intuitive access across the platform.
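
As a rough illustration of this hierarchy, the sketch below models a dataset whose path depends only on its name and resources, while topics are kept as a separate set. The class and function names are hypothetical and are not taken from the OEP code base.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical model: one dataset with one or more table resources."""

    name: str
    resources: list = field(default_factory=list)  # table resource names
    topics: set = field(default_factory=set)       # not part of the path

    def path(self):
        return f"Datasets/{self.name}/"

    def resource_path(self, resource):
        if resource not in self.resources:
            raise KeyError(resource)
        return f"{self.path()}Resources/{resource}/"


def datasets_in_topic(datasets, topic):
    """List dataset names for a topic; a dataset may appear in several topics."""
    return [d.name for d in datasets if topic in d.topics]
```

Because topics are only a grouping attribute, adding a dataset to a second topic does not change any of its paths.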
83111

112+
**What will change in the current URL/API system?**
113+
(While keeping functionality available)
114+
115+
To keep the current functionality in place, the previous per-table approach is maintained and current URLs are redirected:
116+
117+
Topics will no longer be part of the dataset URL, but there will be topic-specific list URLs like:
118+
119+
1. `database/topics` = list all topics
120+
2. `database/topics/<topic>` = list all datasets/tables per topic
121+
122+
Currently we have something like `topics/<topic>/tables/<table>`, which will become:
123+
124+
1. `datasets` = Not strictly necessary, but for API requests this could be an easy way to get all available datasets
125+
2. `datasets/<table>/` = Table detail page
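
A minimal sketch of such a redirect mapping, assuming the legacy pattern is exactly `topics/<topic>/tables/<table>` as quoted above. The function name and the regular expression are illustrative only; in the Django application this logic would more likely live in the URL configuration.

```python
import re

# Assumed legacy pattern: topics/<topic>/tables/<table>, optional trailing slash.
LEGACY_TABLE_URL = re.compile(
    r"^topics/(?P<topic>[^/]+)/tables/(?P<table>[^/]+)/?$"
)


def redirect_target(path):
    """Return the new dataset URL for a legacy table URL, else None."""
    match = LEGACY_TABLE_URL.match(path.lstrip("/"))
    if match is None:
        return None
    # The topic segment is dropped because topics are no longer part of the URL.
    return f"datasets/{match.group('table')}/"
```

For example, `redirect_target("topics/climate/tables/my_table")` yields `datasets/my_table/`, while a path that is already in the new form yields `None` and needs no redirect.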
126+
127+
How this will affect the REST-API:
128+
129+
Since we have had a production implementation up and running for years and users are used to the existing structure as well as all REST-API endpoints using
130+
84131
---
85132

86133
## UI Preview
