---
output: github_document
---

<!-- README.md is generated from README.Rmd. Please edit that file -->

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```


# datagovindia


**datagovindia** is a wrapper around the more than 100,000 APIs of the Government of India's open
data platform, [data.gov.in](https://data.gov.in/ogpl_apis). This short guide
walks you through the package. Its functionality is centered on
three aspects:

* **API discovery** - Finding the right API from all the available APIs
* **API information** - Getting information about a particular API
* **Querying the API** - Getting a tidy data set from the chosen API

## Installation
The package is on CRAN. Install it with:
```r
install.packages("datagovindia")
```


You can install the development version from [GitHub](https://github.com/econabhishek/datagovindia) with:

``` r
# install.packages("devtools")
devtools::install_github("econabhishek/datagovindia")
```
## Prerequisites

* An account on data.gov.in
* An API key from the My Account page (see the [official guide](https://data.gov.in/help/how-use-datasets-apis) for instructions)


## Setup
```{r example}
library(datagovindia)

```


Learn more about the package's functions in the [**vignette**](https://cran.r-project.org/package=datagovindia/vignettes/datagovindia_vignette.html).

## Example workflow

Once you have your API key and have used the package's search functions to choose
an API and find its **index_name** (see the [**vignette**](https://cran.r-project.org/package=datagovindia/vignettes/datagovindia_vignette.html) for details),
you are ready to extract data from it.


The function *get_api_data* is the powerhouse of this package. By working with the
data.frame structure of the underlying data, it can do more than a manually
constructed API query. It lets the user filter, sort, select variables, and decide
how much of the data to extract. The website itself can filter on only one field
with one value at a time, whereas a single call to the wrapper can make multiple
requests and append their results.
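
To make the idea of one call fanning out into several single-value requests concrete, here is a minimal base R sketch. It is an illustration only, not the package's internal code: `expand_filters` and `fake_fetch` are hypothetical helpers, and `fake_fetch` merely stands in for a real request so that the example runs offline.

```r
# Hypothetical helper (not part of datagovindia): split comma-separated
# filter values and list every single-value combination, one row per request.
expand_filters <- function(filter_by) {
  values <- lapply(strsplit(filter_by, ",", fixed = TRUE), trimws)
  expand.grid(values, stringsAsFactors = FALSE, KEEP.OUT.ATTRS = FALSE)
}

# Hypothetical stand-in for a single API request; returns a one-row data.frame.
fake_fetch <- function(city, pollutant_id) {
  data.frame(city = city, pollutant_id = pollutant_id, stringsAsFactors = FALSE)
}

requests <- expand_filters(c(city = "Gurugram,Chandigarh",
                             pollutant_id = "PM10,NO2"))

# One "request" per combination, then append the pieces row-wise.
pieces <- lapply(seq_len(nrow(requests)), function(i) {
  fake_fetch(requests$city[i], requests$pollutant_id[i])
})
combined <- do.call(rbind, pieces)
print(combined)
```

Two fields with two values each yield four single-value combinations, which is the kind of bookkeeping a manual query would otherwise require.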

Before we dive into data extraction, we first need to validate the API key received
from [data.gov.in](https://data.gov.in/ogpl_apis). To get the key, register on the site, log in, and copy the key from your "My Account" page.
More instructions can be found in the [official guide](https://data.gov.in/help/how-use-datasets-apis). Once you have your API key, you
can validate it as follows (this is needed only once per session; the key below is a
sample key from the website, used for demonstration):

```{r}
##Using a sample key
register_api_key("579b464db66ec23bdd000001cdd3946e44ce4aad7209ff7b23ac571b")

```
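
For your own key, one option is to keep it out of scripts and read it from an environment variable. The sketch below assumes a variable named `DATAGOVINDIA_API_KEY`, which is a hypothetical name chosen for this example and not something the package defines. Only *register_api_key* comes from the package.

```r
# DATAGOVINDIA_API_KEY is a hypothetical variable name used for illustration;
# it could be set in ~/.Renviron so the key never appears in a script.
api_key <- Sys.getenv("DATAGOVINDIA_API_KEY", unset = "")

if (!nzchar(api_key)) {
  message("No API key found; set DATAGOVINDIA_API_KEY and restart R.")
} else if (requireNamespace("datagovindia", quietly = TRUE)) {
  # Validate and register the key for the current session.
  datagovindia::register_api_key(api_key)
}
```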

Once your key is registered, you are ready to extract data from a chosen API.
Here is what each argument of *get_api_data* means:

* api_index: index_name of the chosen API (found by using the search functions)
* results_per_req: number of results per request sent to the server; takes an integer value or the string "all" to get all of the available data
* filter_by: a named character vector of field id (not the field name) and value pairs; can take multiple fields as well as multiple comma-separated values
* field_select: a character vector of fields, used to keep only a subset of variables in the final data.frame
* sort_by: one or more fields to sort by


In a nutshell, first find the API you want using the search functions, get the **index_name** of the API from the results, optionally look at the fields present in the API's data, and then use the *get_api_data* function to extract the data.
Suppose we choose the API "Real time Air Quality Index from various location" with index_name *3b01bcb8-0b14-4abf-b6f2-c1bfd384ba69*, and we want data for only two cities, Chandigarh and Gurugram, and two pollutants, PM10 and NO2. We will let all fields (dataset columns) be returned.

First, we look at which fields are available so that we can construct the right query.

```{r,results="hide"}
get_api_fields("3b01bcb8-0b14-4abf-b6f2-c1bfd384ba69")
```

```{r,echo=FALSE}
get_api_fields("3b01bcb8-0b14-4abf-b6f2-c1bfd384ba69") %>%
  knitr::kable()
```

We accordingly select the **city** and **pollutant_id** fields for constructing our query.
Note that the query uses the field id, not the field name.

```{r,results='hide'}

get_api_data(api_index="3b01bcb8-0b14-4abf-b6f2-c1bfd384ba69",
             results_per_req=10,
             filter_by=c(city="Gurugram,Chandigarh",
                         pollutant_id="PM10,NO2"),
             field_select=c(),
             sort_by=c('state','city'))
```

```{r,echo=FALSE,message=FALSE}

get_api_data(api_index="3b01bcb8-0b14-4abf-b6f2-c1bfd384ba69",
             results_per_req=10,
             filter_by=c(city="Gurugram,Chandigarh",
                         pollutant_id="PM10,NO2"),
             field_select=c(),
             sort_by=c('state','city')) %>%
  knitr::kable()
```
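
When the filter values are held in ordinary R vectors, the comma-separated strings that `filter_by` expects can be assembled with `paste()`. This is a small base R sketch; the variable names `cities` and `pollutants` are illustrative only.

```r
# Values to filter on, kept as regular character vectors.
cities <- c("Gurugram", "Chandigarh")
pollutants <- c("PM10", "NO2")

# Collapse each vector into one comma-separated string and name it
# with the field id it applies to.
filter_by <- c(
  city = paste(cities, collapse = ","),
  pollutant_id = paste(pollutants, collapse = ",")
)
print(filter_by)
```

The resulting named vector can then be passed as the `filter_by` argument of *get_api_data*.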

## Python version


This wrapper is also available for Python:

* [Development version](https://github.com/addypy/datagovindia)

* [PyPI](https://pypi.org/project/datagovindia/)

Install it with:
```sh
pip install datagovindia
```

## Authors


* [Abhishek Arora](https://github.com/econabhishek)
* [Aditya Karan Chhabra](https://github.com/addypy)