added documentation for dataset creation

emptymalei · emptymalei · commit 36a539db4ce2 · 2021-08-07T03:18:28.000+02:00
diff --git a/LICENSE b/LICENSE
@@ -1,6 +1,6 @@
 MIT License
 
-Copyright (c) 2020 Lei Ma
+Copyright (c) 2021 Lei Ma
 
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
diff --git a/MANIFEST.in b/MANIFEST.in
@@ -1 +1,2 @@
-include README.md
+include README.md
+graft dataherb/serve/mkdocs_template
diff --git a/docs/tutorials/create/index.md b/docs/tutorials/create/index.md
@@ -0,0 +1,227 @@
+## Create New Dataset
+
+!!! warning "The demo dataset"
+    We will use the data files in [this repository](https://github.com/DataHerb/dataherb-python-demo-dataset) as a demo. Please [download the zip of this repo](https://github.com/emptymalei/dataherb-serve/archive/refs/heads/master.zip).
+
+
+
+### Create
+
+
+After downloading and unzipping [the demo dataset](https://github.com/emptymalei/dataherb-serve/archive/refs/heads/master.zip), `cd` into the folder. In my case, it is
+
+```bash
+cd ~/Downloads/dataherb-python-demo-dataset-main
+```
+
+Run the command
+
+```bash
+dataherb create
+```
+
+and a few questions will pop out.
+
+```bash
+Your current working directory is /Users/itsme/Downloads/dataherb-python-demo-dataset-main
+A dataherb.json file will be created right here.
+Are you sure this is the correct path? [y/N]: y
+```
+
+This makes sure that we are working in the correct folder. In this case, we type `y` to confirm.
+
+#### Which Type of Remote
+
+```bash
+[?] Where is/will be the dataset synced to?: git
+ > git
+   s3
+```
+
+Dataherb supports two different types of sources, S3 and git. In this case, we choose `git` as we would like to sync the dataset to a GitHub repo.
+
+#### Name of the Dataset
+
+```bash
+[?] How would you like to name the dataset?: Dataherb Demo Dataset
+```
+
+This will be the name of your dataset.
+
+#### ID of the Dataset
+
+```bash
+[?] Please specify a unique id for the dataset: git-dataherb-python-demo-dataset
+```
+
+ID of the dataset has to be unique in your whole flora.
+
+#### Description
+
+```bash
+[?] What is the dataset about? This will be the description of the dataset.: This is a demo dataset to test and show dataherb.
+```
+
+Describe the dataset here.
+
+#### URI
+
+```bash
+[?] What is the dataset's URI? This will be the URI of the dataset.: https://github.com/DataHerb/dataherb-python-demo-dataset-created.git
+```
+
+
+
+
+### Result
+
+Two things happened after this.
+
+1. A file `dataherb.json` will be created in the current folder.
+2. The metadata for this dataset has been added to the Flora.
+
+
+
+!!! note "Content of the `dataherb.json` file"
+    The content of the file should be the following.
+
+    ```json
+    {
+        "source": "git",
+        "name": "Dataherb Demo Dataset",
+        "id": "git-dataherb-python-demo-dataset",
+        "description": "This is a demo dataset to test and show dataherb.",
+        "uri": "https://github.com/DataHerb/dataherb-python-demo-dataset-created.git",
+        "metadata_uri": "https://raw.githubusercontent.com/DataHerb/dataherb-python-demo-dataset-created/main/dataherb.json",
+        "datapackage": {
+            "profile": "tabular-data-package",
+            "resources": [
+                {
+                    "path": "dataset/indeed_job_listing.csv",
+                    "profile": "tabular-data-resource",
+                    "name": "indeed_job_listing",
+                    "format": "csv",
+                    "mediatype": "text/csv",
+                    "encoding": "windows-1252",
+                    "schema": {
+                        "fields": [
+                            {
+                                "name": "title",
+                                "type": "string",
+                                "format": "default"
+                            },
+                            {
+                                "name": "location",
+                                "type": "string",
+                                "format": "default"
+                            },
+                            {
+                                "name": "company",
+                                "type": "string",
+                                "format": "default"
+                            },
+                            {
+                                "name": "description",
+                                "type": "string",
+                                "format": "default"
+                            },
+                            {
+                                "name": "salary",
+                                "type": "string",
+                                "format": "default"
+                            },
+                            {
+                                "name": "url",
+                                "type": "string",
+                                "format": "default"
+                            },
+                            {
+                                "name": "published_at",
+                                "type": "string",
+                                "format": "default"
+                            },
+                            {
+                                "name": "id",
+                                "type": "string",
+                                "format": "default"
+                            }
+                        ],
+                        "missingValues": [
+                            ""
+                        ]
+                    }
+                },
+                {
+                    "path": "dataset/stackoverflow_job_listing.csv",
+                    "profile": "tabular-data-resource",
+                    "name": "stackoverflow_job_listing",
+                    "format": "csv",
+                    "mediatype": "text/csv",
+                    "encoding": "utf-8",
+                    "schema": {
+                        "fields": [
+                            {
+                                "name": "link",
+                                "type": "string",
+                                "format": "default"
+                            },
+                            {
+                                "name": "category",
+                                "type": "string",
+                                "format": "default"
+                            },
+                            {
+                                "name": "title",
+                                "type": "string",
+                                "format": "default"
+                            },
+                            {
+                                "name": "description",
+                                "type": "string",
+                                "format": "default"
+                            },
+                            {
+                                "name": "published_at",
+                                "type": "string",
+                                "format": "default"
+                            },
+                            {
+                                "name": "location",
+                                "type": "string",
+                                "format": "default"
+                            },
+                            {
+                                "name": "stackoverflow_id",
+                                "type": "integer",
+                                "format": "default"
+                            },
+                            {
+                                "name": "author",
+                                "type": "string",
+                                "format": "default"
+                            },
+                            {
+                                "name": "location_country",
+                                "type": "string",
+                                "format": "default"
+                            },
+                            {
+                                "name": "location_city",
+                                "type": "string",
+                                "format": "default"
+                            },
+                            {
+                                "name": "updated_at",
+                                "type": "string",
+                                "format": "default"
+                            }
+                        ],
+                        "missingValues": [
+                            ""
+                        ]
+                    }
+                }
+            ]
+        }
+    }
+    ```
diff --git a/docs/tutorials/index.md b/docs/tutorials/index.md
@@ -1,3 +1,28 @@
 ## Tutorials
 
 This is a series of tutorials to help you get started with `dataherb`.
+
+### Herbs and Flora
+
+The name Dataherb is a metaphor. Each data file is like a **Leaf** (or **Resource**) of a **Herb** (or **Dataset**). Many Herbs form a **Flora**.
+
+| Dataherb Term | Meaning |
+|-----|-----|
+| Herb | A Dataset |
+| Resource | A Data File |
+
+!!! note "Leaf and Resource"
+    Leaf is an alias of Resource.
+
+### Managing the Flora
+
+This python package ships a command line tool that can be used to manage the **Flora**.
+
+The core of a flora is basically a json file that lists the metadata of herbs. The default flora can be configured using `dataherb configure`.
+
+Using this configuration, we can simply store the json representing the flora and recreate the whole flora on other devices.
+
+!!! note "Sync the Flora to GitHub"
+    For example, we can version control the flora and push it to GitHub and pull it down on other devices. In this way, we can sync and restore the flora.
+
+
diff --git a/mkdocs.yml b/mkdocs.yml
@@ -76,6 +76,7 @@ nav:
   - "Tutorials":
     - "Tutorials": tutorials/index.md
     - "Configuration": tutorials/configuration/index.md
+    - "Create Dataset": tutorials/create/index.md
   - References:
     - "Introduction": references/index.md
     - "dataherb.command": references/command.md

Original file line number	Diff line number	Diff line change
`@@ -1 +1,2 @@`
`1`		`-include README.md`
	`1`	`+include README.md`
	`2`	`+graft dataherb/serve/mkdocs_template`