Description
We could improve the README to make it more explicit (and so easier to actually get started with), plus include some design info (we may want to split this into 2 issues).
Acceptance
- Getting started instructions could actually be used with no "blanks to fill in"
- Details of implementation e.g. what is written to git so that other devs could jump in
Tasks
- Explicitness over implicitness in our instructions ... 😄 e.g. actual config values -- see below
- Show detail of what is written to git (maybe in Git / Design section)
- Example of datapackage.json
Analysis
Questions
- How do we know a resource is LFS-managed?
- What types of data resources do we handle? There are 3-4 options for resource data location:
  - data in LFS [supported]
  - data remote (not managed - just pointed to) [supported]
  - data inline in datapackage.json [supported?]
  - data local (in git repo) [not supported?]
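One possible way to tell the four cases apart is sketched below. It is only a heuristic, not the library's confirmed behaviour: it assumes an LFS-managed resource is one that carries both `sha256` and `bytes` (as in the example further down), that inline data lives under the Data Package `data` key, and that a remote resource has an http(s) `path`. The function name `classify_resource` is hypothetical.

```python
from urllib.parse import urlparse


def classify_resource(resource: dict) -> str:
    """Guess where a resource's data lives (assumed heuristic, see lead-in)."""
    # Inline data: the Data Package spec puts it under the "data" key.
    if "data" in resource:
        return "inline"
    path = resource.get("path", "")
    # Remote data: the path is a URL that is only pointed to, not managed.
    if urlparse(path).scheme in ("http", "https"):
        return "remote"
    # Assumption: a local path with both sha256 and bytes is LFS-managed.
    if "sha256" in resource and "bytes" in resource:
        return "lfs"
    # Anything else is treated as a plain file in the git repo.
    return "local"
```

If this heuristic is too implicit, an explicit marker on the resource would answer the "how do we know?" question more directly; that is a design choice for this issue to settle.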
Example material to include
# ~rufus can we have explicit "real" options
# what about lfs config to use? should that go here?
config = {
}

# Directly instantiate the MetaStoreBackend class:
metastore = GitHubStorage(
    lfs_server_url="https://giftless.datahub.io/",
    default_branch_name="master",
    # passed directly to the PyGithub client - for details see
    # https://pygithub.readthedocs.io/en/latest/github.html#main-class-github
    github_options={
        "password_or_token": "GITHUB_API_TOKEN"
    },
)
Example datapackage.json - in examples. This example has 2 data resources, one stored in lfs cloud storage, one that is "remote".
{
  "name": "my-data-package",
  "resources": [
    { // resource with data in lfs cloud storage
      // how do we know?
      "path": "data/resource1.csv",
      "sha256": "2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824",
      "lfs_prefix": "datopian/my-data-package",
      "bytes": 10240
    },
    {
      "path": "https://myremotesite.com/mydata.csv"
      // optionally more information
      ...
    }
  ]
}
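For the README it may help to show where the `sha256` and `bytes` values of an LFS resource come from. The sketch below computes both for a local file with the standard library only; the helper name `describe_file` is hypothetical and not part of this library.

```python
import hashlib
import os


def describe_file(path: str, chunk_size: int = 1024 * 1024) -> dict:
    """Return the path, SHA-256 hex digest and size in bytes of a file."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large data files are never loaded fully into memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return {
        "path": path,
        "sha256": digest.hexdigest(),
        "bytes": os.path.getsize(path),
    }
```

The returned dict could be merged into a resource entry before the metadata is passed to the backend.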
import json

# json.load reads from an open file object (json.loads expects a string)
with open("datapackage.json") as f:
    metadata = json.load(f)

package_info = metastore.create(package_id, metadata, message="...", author={name: email})
Now your git repo will look like XXX
.lfsconfig
.gitattributes
README.md ???
datapackage.json
data/resource1.csv
.lfsconfig:
[remote "origin"]
    # as specified in the original config for this backend
    lfsurl = https://giftless.datahub.io/
.gitattributes:
data/resource1.csv filter=lfs diff=lfs merge=lfs -text
data/resource1.csv:
version https://git-lfs.github.com/spec/v1
oid sha256:2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
size 10240
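The files listed above can be derived mechanically from the resource list. The sketch below renders `.lfsconfig`, `.gitattributes` and the pointer files as strings. It illustrates the layout described in this issue and is not the backend's actual implementation: the function names are hypothetical, and it assumes a resource is LFS-managed when it has both `sha256` and `bytes`. The pointer file format itself follows the Git LFS v1 pointer spec.

```python
def render_lfs_pointer(sha256: str, size: int) -> str:
    """Render a Git LFS v1 pointer file for an object with this hash and size."""
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{sha256}\n"
        f"size {size}\n"
    )


def render_git_files(resources: list, lfs_server_url: str) -> dict:
    """Map repo paths to file contents for the LFS-related files."""
    # Assumption: resources with both sha256 and bytes are LFS-managed.
    lfs_resources = [r for r in resources if "sha256" in r and "bytes" in r]
    files = {}
    if not lfs_resources:
        return files
    # Point git-lfs at the configured LFS server for the "origin" remote.
    files[".lfsconfig"] = f'[remote "origin"]\n\tlfsurl = {lfs_server_url}\n'
    # One tracking rule per LFS-managed path.
    files[".gitattributes"] = "".join(
        f"{r['path']} filter=lfs diff=lfs merge=lfs -text\n" for r in lfs_resources
    )
    # The data path itself holds only the small pointer, not the data.
    for r in lfs_resources:
        files[r["path"]] = render_lfs_pointer(r["sha256"], r["bytes"])
    return files
```

Remote resources produce no files here; they appear only inside datapackage.json, which is written separately.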