Storage API design

# Overview

I've discussed with @roll how the additional work that @akariv and I are doing on Table Schema drivers (ref. https://github.com/frictionlessdata/tableschema-elasticsearch-py https://github.com/frictionlessdata/tableschema-sql-py https://github.com/frictionlessdata/tableschema-py) could help bring this functionality forward in [Frictionless](https://github.com/frictionlessdata/frictionless-py), as well as in those older (but working and battle-tested) libraries.

In [Frictionless](https://github.com/frictionlessdata/frictionless-py) the Storage API is "not finished":

> The Storage concept is responsible for reading and writing data package from dataset source like CKAN, SQL, or others. Currently, the Storage API is not yet finished so you can try reading the codebase and implement your own storage but you need to be ready for some changes to the API that might come.

On reflection, I'd like to better understand what the perceived shortcomings of the *existing* storage APIs are, as implemented in the libraries above and a range of others.

The [current Storage API](https://github.com/frictionlessdata/tableschema-py/blob/main/tableschema/storage.py#L16) has an interface like:

```
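# pip install tableschema_sql
from tableschema import Storage

# `options`, `descriptor` and `rows` are placeholders: backend options
# (e.g. a SQLAlchemy engine), a Table Schema descriptor, and data rows.
storage = Storage.connect('sql', **options)
storage.create('bucket', descriptor)  # create a bucket (table) for the schema
storage.write('bucket', rows)         # write rows into it
storage.read('bucket')                # read the rows back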
```

This seems like a pretty reasonable interface. Based on my usage of these libraries, I've found the following things I'd like a "better" or more "robust" solution for:

- Indexes for storage backends that support them (there is a working implementation in [Table Schema SQL](https://github.com/frictionlessdata/tableschema-sql-py))
- Table update/upsert routines:
  - row identity for tests when updating (probably PK based but maybe not only)
  - update payloads (subset of a row)
  - table migration (an update might add a field, or add an index)
- More flexible field mapping: **all** Table Schema fields need to be mapped, and consumers need to be able to trivially override the default mappings (e.g. I know I have an array that is safe for a Postgres array field, so map to that and not JSONB; I have a field that I want to map in Elasticsearch to a keyword field and not a text field; and so on).
- Comprehensive mapping of field constraints
- Relational storage:
  - Foreign key strategy (e.g. on SQL, normalize as an FK constraint, or flatten as an array field)
  - Array field strategy (e.g. an option to normalize into a foreign key to a related table)
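
To make the field mapping point concrete, here is a minimal sketch of what overridable defaults could look like. Everything in it is hypothetical: `DEFAULT_MAPPING`, `map_fields` and the `overrides` argument are names invented for illustration and are not part of tableschema-py or any of its plugins, and the Postgres column types are just one plausible set of defaults.

```python
# Hypothetical sketch: none of these names exist in tableschema-py or its plugins.
from typing import Dict, Optional

# Illustrative defaults: Table Schema type -> Postgres column type (not exhaustive).
DEFAULT_MAPPING: Dict[str, str] = {
    "string": "TEXT",
    "integer": "BIGINT",
    "number": "NUMERIC",
    "boolean": "BOOLEAN",
    "date": "DATE",
    "datetime": "TIMESTAMP",
    "array": "JSONB",
    "object": "JSONB",
}


def map_fields(descriptor: dict, overrides: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return {field name: column type}; fail loudly if a field has no mapping."""
    overrides = overrides or {}
    columns: Dict[str, str] = {}
    for field in descriptor["fields"]:
        name = field["name"]
        # Table Schema treats a field without an explicit type as a string.
        field_type = field.get("type", "string")
        if name in overrides:
            # A per-field override from the consumer wins over the default.
            columns[name] = overrides[name]
        elif field_type in DEFAULT_MAPPING:
            columns[name] = DEFAULT_MAPPING[field_type]
        else:
            # "All fields need to be mapped": never silently skip a field.
            raise ValueError(f"No mapping for field {name!r} of type {field_type!r}")
    return columns


descriptor = {
    "fields": [
        {"name": "id", "type": "integer"},
        {"name": "tags", "type": "array"},
    ]
}
# The consumer knows `tags` is safe to store as a native Postgres array.
columns = map_fields(descriptor, overrides={"tags": "TEXT[]"})
```

Keying the overrides by field name rather than by type keeps the default table untouched, so a consumer can change one column without affecting every other array field.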

Not everything on this list is critical; these are just things I've pondered recently. I don't think the existing storage API prevents any of these use cases, and it seems to me that the existing API is as good a starting point as any to iterate from.
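
As a rough illustration that upserts could be layered on top of the existing interface, here is a sketch of a primary-key-based upsert with partial update payloads. `MemoryStorage` is a toy stand-in written for this example, not a real backend; it borrows the method names `create`, `write` and `read` from the snippet above and assumes `describe` and `delete` methods as well. The `upsert` helper is hypothetical, and it rewrites the whole bucket, which a real driver would want to avoid.

```python
# Hypothetical sketch: MemoryStorage and upsert are illustrative names only.
class MemoryStorage:
    """Toy in-memory stand-in for a storage backend."""

    def __init__(self):
        self._descriptors = {}
        self._rows = {}

    def create(self, bucket, descriptor):
        self._descriptors[bucket] = descriptor
        self._rows[bucket] = []

    def delete(self, bucket):
        del self._descriptors[bucket]
        del self._rows[bucket]

    def describe(self, bucket):
        return self._descriptors[bucket]

    def write(self, bucket, rows):
        self._rows[bucket].extend(list(row) for row in rows)

    def read(self, bucket):
        return [list(row) for row in self._rows[bucket]]


def upsert(storage, bucket, payloads):
    """Merge dict payloads into a bucket, matching rows on the primary key."""
    descriptor = storage.describe(bucket)
    names = [field["name"] for field in descriptor["fields"]]
    key = descriptor["primaryKey"]
    key = [key] if isinstance(key, str) else list(key)

    # Index the existing rows by their primary key values (row identity).
    merged = {}
    for row in storage.read(bucket):
        record = dict(zip(names, row))
        merged[tuple(record[k] for k in key)] = record

    for payload in payloads:
        identity = tuple(payload[k] for k in key)
        # New identity: start from an all-None row. Existing: update in place.
        record = merged.setdefault(identity, dict.fromkeys(names))
        # A payload may carry only a subset of the row's fields.
        record.update(payload)

    # Naive strategy: drop and rewrite the bucket with the merged rows.
    storage.delete(bucket)
    storage.create(bucket, descriptor)
    storage.write(bucket, [[record[n] for n in names] for record in merged.values()])


storage = MemoryStorage()
storage.create("people", {
    "fields": [{"name": "id"}, {"name": "name"}, {"name": "city"}],
    "primaryKey": "id",
})
storage.write("people", [[1, "Ada", "London"], [2, "Bob", "Paris"]])
upsert(storage, "people", [{"id": 2, "city": "Berlin"}, {"id": 3, "name": "Cy"}])
rows = storage.read("people")
```

Here the second row is updated from a partial payload and a third row is inserted with its missing field left as `None`. Row identity that is not primary-key based, and table migration on update, would need more than this.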

---

Please preserve this line to notify @roll (lead of this repository)

