Overview
I've discussed with @roll how additional work that @akariv and I are doing on the Table Schema storage drivers (see https://github.com/frictionlessdata/tableschema-elasticsearch-py, https://github.com/frictionlessdata/tableschema-sql-py and https://github.com/frictionlessdata/tableschema-py) could help bring this functionality forward in Frictionless, as well as in those older (but working and battle-tested) libraries.
In Frictionless the Storage API is "not finished":
The Storage concept is responsible for reading and writing data package from dataset source like CKAN, SQL, or others. Currently, the Storage API is not yet finished so you can try reading the codebase and implement your own storage but you need to be ready for some changes to the API that might come.
On reflection, I'd like to better understand what the perceived shortcomings of the existing storage APIs are, as implemented in the above libraries and a number of others.
The current Storage API has an interface like:
# pip install tableschema_sql
from tableschema import Storage
storage = Storage.connect('sql', **options)
storage.create('bucket', descriptor)
storage.write('bucket', rows)
storage.read('bucket')
This seems like a pretty reasonable interface. Based on my usage of these libraries, these are the things I'd like a "better" or more "robust" solution for:
- Indexes for storage backends that support them (there is a working implementation in Table Schema SQL)
- Table update/upsert routines:
  - row identity for tests when updating (probably PK based but maybe not only)
  - update payloads (subset of a row)
  - table migration (an update might add a field, or add an index)
- More flexible field mapping: all Table Schema fields need to be mapped, and consumers need to be able to trivially override the default mappings (e.g. I know I have an array that is safe for a Postgres array field, so map it to that and not JSONB; I have a field that I want to map in Elasticsearch to a keyword field and not a text field; and so on).
- Comprehensive mapping of field constraints
- Relational storage:
  - Foreign key strategy (e.g. on SQL, normalize as an FK constraint, or flatten as an array field)
  - Array field strategy (e.g. option to normalize into a foreign key to a related table)
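To make the field mapping point concrete, here is a rough sketch of what overridable mappings could look like. It is an illustration only: `DEFAULT_SQL_MAPPING`, `map_fields` and the `overrides` argument are hypothetical names that do not exist in tableschema or Frictionless, and the backend type names are placeholders rather than the mapping any driver actually uses.

```python
# Hypothetical sketch: none of these names exist in tableschema or Frictionless.
# Default mapping from Table Schema field types to (illustrative) SQL column types.
DEFAULT_SQL_MAPPING = {
    "string": "TEXT",
    "integer": "INTEGER",
    "number": "FLOAT",
    "boolean": "BOOLEAN",
    "date": "DATE",
    "array": "JSONB",
    "object": "JSONB",
}


def map_fields(descriptor, overrides=None, defaults=DEFAULT_SQL_MAPPING):
    """Return {field name: backend column type} for a Table Schema descriptor.

    `overrides` maps field names to backend types and wins over the defaults.
    A field whose type has no mapping raises, so nothing is silently skipped.
    """
    overrides = overrides or {}
    mapping = {}
    for field in descriptor["fields"]:
        name = field["name"]
        # Table Schema treats a field without an explicit type as a string.
        field_type = field.get("type", "string")
        if name in overrides:
            mapping[name] = overrides[name]
        elif field_type in defaults:
            mapping[name] = defaults[field_type]
        else:
            raise ValueError(f"No mapping for field {name!r} of type {field_type!r}")
    return mapping


descriptor = {
    "fields": [
        {"name": "id", "type": "integer"},
        {"name": "tags", "type": "array"},
    ]
}

# The consumer knows "tags" is safe for a Postgres array column, so overrides JSONB.
column_types = map_fields(descriptor, overrides={"tags": "TEXT[]"})
```

The same shape would apply to Elasticsearch, with an override choosing a keyword field over a text field for a given field name.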
Not everything on this list is critical; these are just things I've pondered recently. I don't think the existing storage API rules out any of these use cases, and it seems to me that it is as good a starting point as any to iterate from.
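As one example of iterating from that starting point, here is a minimal sketch of a PK-based upsert that accepts a partial row. It uses Python's built-in `sqlite3` directly rather than any storage driver, and assumes SQLite 3.24 or newer for `ON CONFLICT ... DO UPDATE`. The `upsert` function and its signature are hypothetical, not part of any existing library.

```python
import sqlite3


def quote(name):
    """Quote an SQL identifier for SQLite."""
    return '"' + name.replace('"', '""') + '"'


def upsert(conn, table, primary_key, payload):
    """Insert `payload`, or update only the supplied columns if the key exists.

    Hypothetical helper: `primary_key` is a field name or list of names
    (as in a Table Schema `primaryKey`), and `payload` may be a subset of a
    row as long as it contains the key field(s).
    """
    keys = [primary_key] if isinstance(primary_key, str) else list(primary_key)
    missing = [key for key in keys if key not in payload]
    if missing:
        raise ValueError(f"payload lacks primary key field(s): {missing}")

    columns = list(payload)
    non_key_columns = [column for column in columns if column not in keys]
    if non_key_columns:
        action = "UPDATE SET " + ", ".join(
            f"{quote(column)} = excluded.{quote(column)}" for column in non_key_columns
        )
    else:
        action = "NOTHING"

    sql = "INSERT INTO {table} ({columns}) VALUES ({placeholders}) ON CONFLICT ({keys}) DO {action}".format(
        table=quote(table),
        columns=", ".join(quote(column) for column in columns),
        placeholders=", ".join("?" for _ in columns),
        keys=", ".join(quote(key) for key in keys),
        action=action,
    )
    conn.execute(sql, [payload[column] for column in columns])


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bucket (id INTEGER PRIMARY KEY, name TEXT, score REAL)")

upsert(conn, "bucket", "id", {"id": 1, "name": "a", "score": 1.0})  # inserts a new row
upsert(conn, "bucket", "id", {"id": 1, "score": 2.5})  # updates score only, name is kept

rows = conn.execute("SELECT id, name, score FROM bucket").fetchall()
```

Row identity here is strictly the primary key; identity based on other fields, and migrations triggered by an update that adds a field, are left out of the sketch.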
Please preserve this line to notify @roll (lead of this repository)