Commit 73f0f28

committed
update readme
1 parent ba92d0a commit 73f0f28

1 file changed

+85
-2
lines changed

README.md

Lines changed: 85 additions & 2 deletions
@@ -1,9 +1,92 @@
1-
# dataherb-python
1+
<h1 align="center">
2+
<br>
3+
<a href="https://dataherb.github.io"><img src="https://raw.githubusercontent.com/DataHerb/dataherb.github.io/master/assets/favicon/ms-icon-310x310.png" alt="DataHerb" width="200"></a>
4+
<br>
5+
The Python Package for DataHerb
6+
<br>
7+
</h1>
8+
9+
<h4 align="center">A <a href="https://dataherb.github.io" target="_blank">DataHerb</a> Core Service to Create and Load Datasets.</h4>
10+
11+
<p align="center">
12+
13+
</p>
14+
215

3-
The python toolkit for DataHerb datasets.
416

517
## Install
618

19+
```
20+
pip install dataherb
21+
```
22+
23+
## Usage
24+
25+
### Load Data into DataFrame
26+
27+
```
28+
# Load the package
29+
from dataherb.flora import Flora
30+
31+
# Initialize Flora service
32+
# The Flora service holds all the dataset metadata
33+
dataherb = Flora()
34+
35+
# Search datasets with keyword(s)
36+
geo_datasets = dataherb.search("geo")
37+
print(geo_datasets)
38+
39+
# Get a specific file from a dataset and load as DataFrame
40+
tz_df = dataherb.herb(
41+
"geonames_timezone"
42+
).leaves.get(
43+
"dataset/geonames_timezone.csv"
44+
).data
45+
print(tz_df)
46+
47+
```
48+
49+
50+
### Create Dataset Using Command Line Tool
51+
52+
We provide a template for dataset creation.
53+
54+
> Before creating a dataset, we recommend reading [the intro](#Understanding-DataHerb).
55+
56+
Use the following command line tool to create the metadata template.
57+
```bash
58+
dataherb create
59+
```
60+
61+
## Understanding DataHerb
62+
63+
64+
### What is DataHerb
65+
66+
DataHerb is an open data initiative that makes open datasets easier to access.
67+
68+
- A **DataHerb** or **Herb** is a dataset. A dataset comes with the data files, and the metadata of the data files.
69+
- A **DataHerb Leaf** or **Leaf** is a data file in the DataHerb.
70+
- A **Flora** is the combination of all the DataHerbs.
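
The three terms above form a simple containment hierarchy: a Flora holds Herbs, and a Herb holds Leaves. The following is a minimal sketch of that relationship only. `Leaf`, `Herb`, `ToyFlora`, and all their fields are hypothetical names chosen for illustration; they are not the classes shipped in the `dataherb` package, whose internals may differ.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Leaf:
    """A single data file inside a dataset (hypothetical model)."""
    path: str


@dataclass
class Herb:
    """A dataset: its data files plus metadata about them (hypothetical model)."""
    name: str
    description: str = ""
    leaves: Dict[str, Leaf] = field(default_factory=dict)

    def add_leaf(self, leaf: Leaf) -> None:
        # Index leaves by their path so a file can be looked up directly.
        self.leaves[leaf.path] = leaf


@dataclass
class ToyFlora:
    """The collection of all datasets (hypothetical model)."""
    herbs: Dict[str, Herb] = field(default_factory=dict)

    def add_herb(self, herb: Herb) -> None:
        self.herbs[herb.name] = herb

    def search(self, keyword: str) -> List[Herb]:
        # Case-insensitive substring match on name and description.
        kw = keyword.lower()
        return [
            h for h in self.herbs.values()
            if kw in h.name.lower() or kw in h.description.lower()
        ]


# Build a toy flora with one dataset and one data file.
herb = Herb(name="geonames_timezone")
herb.add_leaf(Leaf(path="dataset/geonames_timezone.csv"))

flora = ToyFlora()
flora.add_herb(herb)

print([h.name for h in flora.search("geo")])
```

The dataset and file names reuse the ones from the usage example; the substring search is an assumption about how keyword matching could work, not a description of the real Flora service.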
71+
72+
In many data projects, finding the right datasets to enhance your data is one of the most time-consuming parts. DataHerb adds flavor to your data project.
73+
74+
### What is DataHerb Flora
75+
76+
We designed the following workflow to share and index datasets.
77+
78+
![DataHerb Workflow](https://raw.githubusercontent.com/DataHerb/dataherb.github.io/master/assets/images/dataherb-components.png)
79+
80+
This repository is used for listing datasets (Listings in the DataHerb flora repository).
81+
82+
### How to Add Your Dataset
83+
84+
> [A Complete **Tutorial**](https://dataherb.github.io/add/)
85+
86+
Simply create a `yml` file in the `flora` folder to link to your dataset repository. Your dataset repository should have a `.dataherb` folder and a `metadata.yml` file in it.
87+
88+
The indexing part will be done by [GitHub Actions](https://github.com/DataHerb/dataherb-flora/actions).
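
Before linking a dataset repository, it can help to check locally that it has the expected layout. The sketch below assumes that `metadata.yml` lives inside the `.dataherb` folder, which is one reading of the requirement above. `check_dataherb_layout` is a hypothetical helper written for this example; it is not part of the `dataherb` package and does not validate the contents of the metadata file.

```python
import tempfile
from pathlib import Path
from typing import List


def check_dataherb_layout(repo_root: str) -> List[str]:
    """Return a list of layout problems; an empty list means none were found."""
    herb_dir = Path(repo_root) / ".dataherb"
    problems = []
    if not herb_dir.is_dir():
        problems.append("missing .dataherb folder")
    elif not (herb_dir / "metadata.yml").is_file():
        # Assumption: metadata.yml is expected inside .dataherb.
        problems.append("missing .dataherb/metadata.yml")
    return problems


# Demonstrate the three cases in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    empty_result = check_dataherb_layout(tmp)

    (Path(tmp) / ".dataherb").mkdir()
    no_metadata_result = check_dataherb_layout(tmp)

    # Placeholder content only; the real metadata schema is not shown here.
    (Path(tmp) / ".dataherb" / "metadata.yml").write_text("# placeholder\n")
    ok_result = check_dataherb_layout(tmp)

print(empty_result, no_metadata_result, ok_result)
```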
89+
790

891
## Development
992
