More detail and explicitness in README #12

@rufuspollock

We could improve the README to make it more explicit (so it is easier to actually get started) and to include some design info (we may want to split this into 2 issues).

Acceptance

  • Getting started instructions can be followed as written, with no "blanks to fill in"
  • Details of the implementation, e.g. what is written to git, so that other devs can jump in

Tasks

  • Explicitness over implicitness in our instructions ... 😄 e.g. actual config values -- see below
  • Show detail of what is written to git (maybe in Git / Design section)
  • Example of datapackage.json

Analysis

Questions

  • How do we know a resource is LFS-managed?
  • What types of data resources do we handle? There are 3-4 options for resource data location:
    1. data in LFS [supported]
    2. data remote (not managed - just pointed to) [supported]
    3. data inline in datapackage.json [supported?]
    4. data local (in git repo) [not supported?]
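One possible answer to the first question is to classify each resource from its descriptor alone. The sketch below is an assumption, not existing library behaviour: it treats a resource with a `data` key as inline, an `http(s)` `path` as remote, a relative `path` that also carries `sha256` and `bytes` as LFS-managed, and any other relative `path` as local. The function name `classify_resource` is hypothetical.

```python
from urllib.parse import urlparse


def classify_resource(resource: dict) -> str:
    """Guess where a resource's data lives, using only its descriptor.

    Assumed heuristic (not confirmed by the library):
      - "data" key present            -> inline
      - "path" is an http(s) URL      -> remote
      - "sha256" and "bytes" present  -> lfs
      - anything else                 -> local
    """
    if "data" in resource:
        return "inline"
    path = resource.get("path", "")
    if urlparse(path).scheme in ("http", "https"):
        return "remote"
    if "sha256" in resource and "bytes" in resource:
        return "lfs"
    return "local"
```

If this heuristic is too implicit, an explicit marker on the resource would be the alternative; that is a design decision this issue still needs to settle.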

Example material to include

# ~rufus can we have explicit "real" options
# what about lfs config to use? should that go here?
config = {
  }

# Directly instantiate the MetaStoreBackend class:
metastore = GitHubStorage(
  lfs_server_url="https://giftless.datahub.io/",
  default_branch_name="master",
  # passed directly to the PyGithub client - for details see https://pygithub.readthedocs.io/en/latest/github.html#main-class-github
  # (PyGithub names this argument login_or_token)
  github_options={
      "login_or_token": "GITHUB_API_TOKEN"
  },
)

Example datapackage.json - in examples. This example has 2 data resources, one stored in lfs cloud storage, one that is "remote".

{
  "name": "my-data-package",
  "resources": [
    {   // resource with data in lfs cloud storage
        // how do we know?
      "path": "data/resource1.csv",
      "sha256": "2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824",
      "lfs_prefix": "datopian/my-data-package",
      "bytes": 10240
    },
    {
      "path": "https://myremotesite.com/mydata.csv"
      // optionally more information
      ...
    },
    
  ]
}
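
import json

# Load the package metadata from the example file above
with open("datapackage.json") as f:
    metadata = json.load(f)

# package_id, name and email are placeholders for real values
package_info = metastore.create(package_id, metadata, message="...", author={name: email})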

After this, your git repo will contain the following files:

.lfsconfig
.gitattributes
README.md          ???
datapackage.json
data/resource1.csv

.lfsconfig:

[remote "origin"]
  # as specified in the original config for this backend
  lfsurl = https://giftless.datahub.io/

.gitattributes:

data/resource1.csv filter=lfs diff=lfs merge=lfs -text
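As a rough illustration of how the `.gitattributes` content could be derived from the `datapackage.json` resources, here is a minimal sketch. It assumes that a resource counts as LFS-managed when it has a non-URL `path` together with `sha256` and `bytes`; that rule and the function name `gitattributes_lines` are assumptions for illustration, not the backend's confirmed logic. Paths containing spaces or glob characters would need escaping, which this sketch does not handle.

```python
def gitattributes_lines(resources: list) -> list:
    """Return one Git LFS tracking rule per resource that looks LFS-managed."""
    rule = "{path} filter=lfs diff=lfs merge=lfs -text"
    lines = []
    for resource in resources:
        path = resource.get("path", "")
        # Assumed rule: relative path plus sha256 and bytes means "in LFS"
        is_lfs = (
            path
            and "://" not in path
            and "sha256" in resource
            and "bytes" in resource
        )
        if is_lfs:
            lines.append(rule.format(path=path))
    return lines
```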

data/resource1.csv:

version https://git-lfs.github.com/spec/v1
oid sha256:2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
size 10240
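The pointer file above follows the Git LFS v1 pointer format: a `version` line, an `oid sha256:` line and a `size` line. A minimal sketch of building it from a resource's `sha256` and `bytes` values is shown below; the function name `lfs_pointer` is hypothetical and this is not necessarily how the backend writes it.

```python
import re


def lfs_pointer(sha256: str, size: int) -> str:
    """Build the text of a Git LFS v1 pointer file."""
    # The oid must be a 64-character lowercase hex SHA-256 digest
    if not re.fullmatch(r"[0-9a-f]{64}", sha256):
        raise ValueError("sha256 must be 64 lowercase hex characters")
    if size < 0:
        raise ValueError("size must be non-negative")
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{sha256}\n"
        f"size {size}\n"
    )
```

Called with the `sha256` and `bytes` values of the first resource in the example `datapackage.json`, this reproduces the three lines shown for `data/resource1.csv`.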
