Commit 2802382

Merge remote-tracking branch 'origin/main'
# Conflicts:
#   docs/source/notebooks/example_brazil.ipynb
2 parents 48f7fc1 + 779784e commit 2802382

6 files changed, +1539 -1254 lines changed

README.md

Lines changed: 66 additions & 10 deletions
@@ -1,7 +1,14 @@
-<a href='https://harmonize-tools.github.io/socio4health/'><img height="100" src="https://raw.githubusercontent.com/harmonize-tools/socio4health/main/docs/source/_static/image.png"/></a>
 
-# socio4health <a href='https://www.harmonize-tools.org/'><img height="139" src="https://harmonize-tools.github.io/harmonize-logo.png"/></a>
+<a href="https://www.harmonize-tools.org/">
+<img height="120" align="right" src="https://harmonize-tools.github.io/harmonize-logo.png" />
+</a>
+
+<a href="https://harmonize-tools.github.io/socio4health/">
+<img height="120" src="https://raw.githubusercontent.com/harmonize-tools/socio4health/main/docs/source/_static/image.png" />
+</a>
 
+# socio4health
+
 <!-- badges: start -->
 
 [![Lifecycle:
@@ -13,22 +20,31 @@ contributors](https://img.shields.io/github/contributors/harmonize-tools/socio4h
 ![commits](https://badgen.net/github/commits/harmonize-tools/socio4health/main)
 <!-- badges: end -->
 
-## Overview
+## Overview
 <p style="font-family: Arial, sans-serif; font-size: 14px;">
-Package socio4health is an extraction, transformation, loading (ETL), and AI-assisted query and visualization (AI QV) tool designed to simplify the intricate process of collecting and merging data 📊 from multiple sources, focusing on sociodemographic and census datasets from Colombia, Brazil, and Peru, into a unified relational database structure.
+Package socio4health is an extraction, transformation and loading (ETL) classification tool designed to simplify the intricate process of collecting and merging data from multiple sources, focusing on sociodemographic and census datasets from Colombia, Brazil, and Peru, into a harmonized dataset.
 </p>
 
 - Seamlessly retrieve data from online data sources through web scraping, as well as from local files.
-- Support for various data formats, including `.csv`, `.xlsx`, `.xls`, `.txt`, `.sav`, and compressed files, ensuring versatility in sourcing information.
-- Consolidating extracted data into a pandas DataFrame.
-- Consolidating transformed data into a cohesive relational database.
-- Conduct precise queries and apply transformations to meet specific criteria.
+- Support for various data formats, including `.csv`, `.xlsx`, `.xls`, `.txt`, `.sav`, fixed-width files and geospatial files, ensuring versatility in sourcing information.
+- Consolidating extracted data into a pandas (or dask) DataFrame.
 
 
 
 ## Dependencies
 
 <table>
+<tr>
+<td align="center">
+<a href="https://www.dask.org/" target="_blank">
+<img src="https://avatars.githubusercontent.com/u/17131925?s=200&v=4" height="50" alt="pandas logo">
+</a>
+</td>
+<td align="left">
+<strong>Dask</strong><br>
+Dask is a flexible parallel computing library for analytics.<br>
+</td>
+</tr>
 <tr>
 <td align="center">
 <a href="https://pandas.pydata.org/" target="_blank">
@@ -37,7 +53,18 @@ contributors](https://img.shields.io/github/contributors/harmonize-tools/socio4h
 </td>
 <td align="left">
 <strong>Pandas</strong><br>
-Pandas is a fast, powerful, flexible, and easy-to-use open source data analysis and manipulation tool.<br>
+Pandas is a well-known open source data analysis and manipulation tool.<br>
+</td>
+</tr>
+<tr>
+<td align="center">
+<a href="https://geopandas.org/" target="_blank">
+<img src="https://avatars.githubusercontent.com/u/8130715?s=48&v=4" height="50" alt="pandas logo">
+</a>
+</td>
+<td align="left">
+<strong>Geopandas</strong><br>
+Python tools for geographic data.<br>
 </td>
 </tr>
 <tr>
@@ -62,19 +89,46 @@ contributors](https://img.shields.io/github/contributors/harmonize-tools/socio4h
 Framework for extracting the data you need from websites.<br>
 </td>
 </tr>
+<tr>
+<td align="center">
+<a href="https://matplotlib.org/" target="_blank">
+<img src="https://avatars.githubusercontent.com/u/215947?s=48&v=4" height="50" alt="scrapy logo">
+</a>
+</td>
+<td align="left">
+<strong>Matplotlib</strong><br>
+Library for creating static, animated, and interactive visualizations in Python.<br>
+</td>
+</tr>
+<tr>
+<td align="center">
+<a href="https://pytorch.org/" target="_blank">
+<img src="https://avatars.githubusercontent.com/u/21003710?s=48&v=4" height="50" alt="scrapy logo">
+</a>
+</td>
+<td align="left">
+<strong>Torch</strong><br>
+Python package for tensor computation and deep neural networks.<br>
+</td>
+</tr>
 </table>
 
 - <a href="https://openpyxl.readthedocs.io/en/stable/">openpyxl</a>
 - <a href="https://py7zr.readthedocs.io/en/latest/">py7zr</a>
 - <a href="https://pypi.org/project/pyreadstat/">pyreadstat</a>
 - <a href="https://tqdm.github.io/">tqdm</a>
 - <a href="https://requests.readthedocs.io/en/latest/">requests</a>
+- <a href="https://pypi.org/project/appdirs/">appdirs</a>
+- <a href="https://pypi.org/project/pyarrow/">pyarrow</a>
+- <a href="https://pypi.org/project/deep-translator/">deep_translator</a>
+- <a href="https://pypi.org/project/transformers/">transformers</a>
+- <a href="https://pypi.org/project/pytest/">pytest</a>
 
 ## Installation
 
 **socio4health** can be installed via pip from [PyPI](https://pypi.org/project/socio4health/).
 
-```python
+``` CMD
 # Install using pip
 pip install socio4health
 ```
@@ -106,6 +160,8 @@ To use the socio4health package, follow these steps:
 harmonizer = Harmonizer()
 ```
 
+For more detailed examples and use cases, please refer to the [socio4health documentation](https://harmonize-tools.github.io/socio4health/).
+
 ## Resources
 
 <details>

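Reviewer note: the updated README states that extracted data is consolidated into a pandas (or dask) DataFrame. The sketch below illustrates that general idea with plain pandas only. It is not the socio4health API; the `consolidate` helper, the `source` column, and the sample column names are hypothetical and exist purely for illustration.

```python
import io

import pandas as pd


def consolidate(sources):
    """Concatenate several CSV sources into one DataFrame.

    `sources` maps a label (hypothetical, e.g. a country name) to a path or
    file-like object. Each row is tagged with the label it came from.
    """
    frames = []
    for label, handle in sources.items():
        frame = pd.read_csv(handle)
        frame["source"] = label  # remember where each row originated
        frames.append(frame)
    # ignore_index=True gives the combined frame a fresh 0..n-1 index
    return pd.concat(frames, ignore_index=True)


# In-memory stand-ins for downloaded files; the columns are made up.
sources = {
    "colombia": io.StringIO("household_id,members\n1,4\n2,3\n"),
    "brazil": io.StringIO("household_id,members\n10,2\n"),
}
combined = consolidate(sources)
print(combined)
```

Tagging rows with their origin before concatenating keeps the provenance of each record available after the merge, which is usually needed when harmonizing datasets from different countries.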
docs/source/_static/image.png

1.05 KB

docs/source/getting_started.rst

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@ Welcome to the Getting Started section of **socio4health**. This guide will walk
 Installation
 ------------
 
-**socio4health** can be installed via pip from `PyPI<https://pypi.org/project/socio4health/>`_. To install the package, run the following command in your terminal:
+**socio4health** can be installed via pip from `PyPI <https://pypi.org/project/socio4health/>`_ . To install the package, run the following command in your terminal:
 
 .. code-block:: shell
 

docs/source/index.rst

Lines changed: 2 additions & 5 deletions
@@ -50,7 +50,7 @@ Welcome to the official documentation site for **socio4health**\! This site serv
 Introduction
 ------------
 
-The Python package **socio4health** is an **extraction**, **transformation**, **loading**, and **AI-assisted query** and **visualization** (ETL-AI QV) tool designed to simplify the process of collecting and merging data from multiple sources into a unified relational database structure, and to visualize or query it using natural language.
+The Python package **socio4health** is an **extraction**, **transformation**,and **loading** tool designed to simplify the process of collecting and merging data from multiple sources into a unified database structure.
 
 Features
 --------
@@ -68,15 +68,12 @@ Features
 
 * **Load:**
 
-* Consolidate transformed data into a cohesive relational database.
+* Consolidate transformed data into a cohesive database.
 
 * **Query:**
 
 * Conduct precise queries and apply transformations to meet specific criteria.
 
-* **AI Query & Visualization:**
-
-* Use natural language input to query data (from values to subsets).
 
 Who should use socio4health?
 ----------------------------
