Commit 025218f

Update README with new features and dependencies
Revised the overview to reflect AI-assisted classification and harmonized datasets. Added support for Dask DataFrames, geospatial files, and fixed-width files. Expanded the dependencies section to include Dask, Geopandas, Matplotlib, Torch, and several Python packages. Updated installation instructions and added a link to the documentation for more examples.
1 parent bb8b2bf commit 025218f
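
The commit description mentions fixed-width files and pandas (or dask) DataFrames. As a rough illustration of what reading a fixed-width file involves, the sketch below uses plain `pandas.read_fwf` rather than socio4health's own API, which this diff does not show. The sample records, column positions, and column names are invented for the example.

```python
import io

import pandas as pd

# Hypothetical fixed-width extract: fields are defined by character
# positions, not by delimiters. This layout is invented for illustration.
RAW = (
    "00111025F\n"
    "00211041M\n"
    "00305033F\n"
)

# (start, end) character offsets for each field; end is exclusive.
COLSPECS = [(0, 3), (3, 5), (5, 8), (8, 9)]
NAMES = ["household_id", "dept_code", "age", "sex"]


def read_fixed_width(text: str) -> pd.DataFrame:
    """Parse fixed-width text into a pandas DataFrame."""
    return pd.read_fwf(
        io.StringIO(text),
        colspecs=COLSPECS,
        names=NAMES,
        header=None,
        # Read identifier-like codes as strings so leading zeros survive.
        dtype={"household_id": str, "dept_code": str},
    )


df = read_fixed_width(RAW)
print(df)
```

If Dask is installed, a frame like this can be wrapped with `dask.dataframe.from_pandas(df, npartitions=2)` for out-of-core or parallel processing; whether socio4health does this internally is not stated in the diff.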

1 file changed: +56 -7 lines changed

README.md

Lines changed: 56 additions & 7 deletions
@@ -22,20 +22,29 @@ contributors](https://img.shields.io/github/contributors/harmonize-tools/socio4h

 ## Overview
 <p style="font-family: Arial, sans-serif; font-size: 14px;">
-Package socio4health is an extraction, transformation, loading (ETL), and AI-assisted query and visualization (AI QV) tool designed to simplify the intricate process of collecting and merging data 📊 from multiple sources, focusing on sociodemographic and census datasets from Colombia, Brazil, and Peru, into a unified relational database structure.
+Package socio4health is an extraction, transformation, loading (ETL), and AI-assisted classification tool designed to simplify the intricate process of collecting and merging data from multiple sources, focusing on sociodemographic and census datasets from Colombia, Brazil, and Peru, into a harmonized dataset.
 </p>

 - Seamlessly retrieve data from online data sources through web scraping, as well as from local files.
-- Support for various data formats, including `.csv`, `.xlsx`, `.xls`, `.txt`, `.sav`, and compressed files, ensuring versatility in sourcing information.
-- Consolidating extracted data into a pandas DataFrame.
-- Consolidating transformed data into a cohesive relational database.
-- Conduct precise queries and apply transformations to meet specific criteria.
+- Support for various data formats, including `.csv`, `.xlsx`, `.xls`, `.txt`, `.sav`, fixed-width files and geospatial files, ensuring versatility in sourcing information.
+- Consolidating extracted data into a pandas (or dask) DataFrame.



 ## Dependencies

 <table>
+<tr>
+<td align="center">
+<a href="https://www.dask.org/" target="_blank">
+<img src="https://avatars.githubusercontent.com/u/17131925?s=200&v=4" height="50" alt="pandas logo">
+</a>
+</td>
+<td align="left">
+<strong>Dask</strong><br>
+Dask is a flexible parallel computing library for analytics.<br>
+</td>
+</tr>
 <tr>
 <td align="center">
 <a href="https://pandas.pydata.org/" target="_blank">
@@ -44,7 +53,18 @@ contributors](https://img.shields.io/github/contributors/harmonize-tools/socio4h
 </td>
 <td align="left">
 <strong>Pandas</strong><br>
-Pandas is a fast, powerful, flexible, and easy-to-use open source data analysis and manipulation tool.<br>
+Pandas is a well-known open source data analysis and manipulation tool.<br>
+</td>
+</tr>
+<tr>
+<td align="center">
+<a href="https://geopandas.org/" target="_blank">
+<img src="https://avatars.githubusercontent.com/u/8130715?s=48&v=4" height="50" alt="pandas logo">
+</a>
+</td>
+<td align="left">
+<strong>Geopandas</strong><br>
+Python tools for geographic data.<br>
 </td>
 </tr>
 <tr>
@@ -69,19 +89,46 @@ contributors](https://img.shields.io/github/contributors/harmonize-tools/socio4h
 Framework for extracting the data you need from websites.<br>
 </td>
 </tr>
+<tr>
+<td align="center">
+<a href="https://matplotlib.org/" target="_blank">
+<img src="https://avatars.githubusercontent.com/u/215947?s=48&v=4" height="50" alt="scrapy logo">
+</a>
+</td>
+<td align="left">
+<strong>Matplotlib</strong><br>
+Library for creating static, animated, and interactive visualizations in Python.<br>
+</td>
+</tr>
+<tr>
+<td align="center">
+<a href="https://pytorch.org/" target="_blank">
+<img src="https://avatars.githubusercontent.com/u/21003710?s=48&v=4" height="50" alt="scrapy logo">
+</a>
+</td>
+<td align="left">
+<strong>Torch</strong><br>
+Python package for tensor computation and deep neural networks.<br>
+</td>
+</tr>
 </table>

 - <a href="https://openpyxl.readthedocs.io/en/stable/">openpyxl</a>
 - <a href="https://py7zr.readthedocs.io/en/latest/">py7zr</a>
 - <a href="https://pypi.org/project/pyreadstat/">pyreadstat</a>
 - <a href="https://tqdm.github.io/">tqdm</a>
 - <a href="https://requests.readthedocs.io/en/latest/">requests</a>
+- <a href="https://pypi.org/project/appdirs/">appdirs</a>
+- <a href="https://pypi.org/project/pyarrow/">pyarrow</a>
+- <a href="https://pypi.org/project/deep-translator/">deep_translator</a>
+- <a href="https://pypi.org/project/transformers/">transformers</a>
+- <a href="https://pypi.org/project/pytest/">pytest</a>

 ## Installation

 **socio4health** can be installed via pip from [PyPI](https://pypi.org/project/socio4health/).

-```python
+``` CMD
 # Install using pip
 pip install socio4health
 ```
@@ -113,6 +160,8 @@ To use the socio4health package, follow these steps:
 harmonizer = Harmonizer()
 ```

+For more detailed examples and use cases, please refer to the [socio4health documentation](https://harmonize-tools.github.io/socio4health/).
+
 ## Resources

 <details>
