Commit 0dcda71

Merge pull request #275 from awslabs/dev
Bumping to 1.4.0
2 parents 57b6133 + 94994da commit 0dcda71

34 files changed: 2247 additions & 2018 deletions

.github/workflows/static-checking.yml

Lines changed: 4 additions & 0 deletions
@@ -36,3 +36,7 @@ jobs:
   run: flake8 setup.py awswrangler testing/test_awswrangler
 - name: Pylint Lint
   run: pylint -j 0 awswrangler
+- name: Black style
+  run: black --check --line-length 120 --target-version py36 awswrangler testing/test_awswrangler
+- name: Imports order check (isort)
+  run: isort -rc --check-only awswrangler testing/test_awswrangler

.isort.cfg

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+[settings]
+multi_line_output=3
+include_trailing_comma=True
+force_grid_wrap=0
+use_parentheses=True
+line_length=120
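
As an illustration of what these settings ask for: in isort, `multi_line_output=3` selects the "Vertical Hanging Indent" style, which combined with `include_trailing_comma=True` and `use_parentheses=True` matches the layout Black produces. The snippet below is only a sketch of the expected shape; the import itself is hypothetical and was picked because it would exceed 120 characters on a single line, which is when wrapping should kick in under `line_length=120`.

```python
# Hypothetical import, not taken from awswrangler: on one line it would be
# longer than 120 characters, so it gets wrapped one name per line, inside
# parentheses, with a trailing comma after the last name.
from typing import (
    Any,
    Callable,
    Dict,
    Generator,
    Iterable,
    Iterator,
    List,
    Mapping,
    MutableMapping,
    Optional,
    Sequence,
    Set,
    Tuple,
    Union,
)

# Small use of the imported names so the module is more than a bare import.
Pair = Tuple[str, Optional[int]]
```

Imports that fit within 120 characters should stay on one line, since `force_grid_wrap=0` does not force wrapping.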

README.md

Lines changed: 18 additions & 18 deletions
@@ -3,7 +3,7 @@
 
 ![AWS Data Wrangler](docs/source/_static/logo2.png?raw=true "AWS Data Wrangler")
 
-[![Release](https://img.shields.io/badge/release-1.3.0-brightgreen.svg)](https://pypi.org/project/awswrangler/)
+[![Release](https://img.shields.io/badge/release-1.4.0-brightgreen.svg)](https://pypi.org/project/awswrangler/)
 [![Python Version](https://img.shields.io/badge/python-3.6%20%7C%203.7%20%7C%203.8-brightgreen.svg)](https://anaconda.org/conda-forge/awswrangler)
 [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
 [![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
@@ -63,23 +63,23 @@ df = wr.db.read_sql_query("SELECT * FROM external_schema.my_table", con=engine)
   - [EMR](https://aws-data-wrangler.readthedocs.io/en/latest/install.html#emr)
   - [From source](https://aws-data-wrangler.readthedocs.io/en/latest/install.html#from-source)
 - [**Tutorials**](https://github.com/awslabs/aws-data-wrangler/tree/master/tutorials)
-  - [01 - Introduction](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/01%20-%20Introduction.ipynb)
-  - [02 - Sessions](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/02%20-%20Sessions.ipynb)
-  - [03 - Amazon S3](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/03%20-%20Amazon%20S3.ipynb)
-  - [04 - Parquet Datasets](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/04%20-%20Parquet%20Datasets.ipynb)
-  - [05 - Glue Catalog](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/05%20-%20Glue%20Catalog.ipynb)
-  - [06 - Amazon Athena](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/06%20-%20Amazon%20Athena.ipynb)
-  - [07 - Databases (Redshift, MySQL and PostgreSQL)](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/07%20-%20Redshift%2C%20MySQL%2C%20PostgreSQL.ipynb)
-  - [08 - Redshift - Copy & Unload.ipynb](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/08%20-%20Redshift%20-%20Copy%20%26%20Unload.ipynb)
-  - [09 - Redshift - Append, Overwrite and Upsert](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/09%20-%20Redshift%20-%20Append%2C%20Overwrite%2C%20Upsert.ipynb)
-  - [10 - Parquet Crawler](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/10%20-%20Parquet%20Crawler.ipynb)
-  - [11 - CSV Datasets](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/11%20-%20CSV%20Datasets.ipynb)
-  - [12 - CSV Crawler](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/12%20-%20CSV%20Crawler.ipynb)
-  - [13 - Merging Datasets on S3](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/13%20-%20Merging%20Datasets%20on%20S3.ipynb)
-  - [14 - Schema Evolution](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/14%20-%20Schema%20Evolution.ipynb)
-  - [15 - EMR](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/15%20-%20EMR.ipynb)
-  - [16 - EMR & Docker](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/16%20-%20EMR%20%26%20Docker.ipynb)
-  - [17 - Partition Projection](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/17%20-%20Partition%20Projection.ipynb)
+  - [001 - Introduction](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/001%20-%20Introduction.ipynb)
+  - [002 - Sessions](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/002%20-%20Sessions.ipynb)
+  - [003 - Amazon S3](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/003%20-%20Amazon%20S3.ipynb)
+  - [004 - Parquet Datasets](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/004%20-%20Parquet%20Datasets.ipynb)
+  - [005 - Glue Catalog](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/005%20-%20Glue%20Catalog.ipynb)
+  - [006 - Amazon Athena](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/006%20-%20Amazon%20Athena.ipynb)
+  - [007 - Databases (Redshift, MySQL and PostgreSQL)](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/007%20-%20Redshift%2C%20MySQL%2C%20PostgreSQL.ipynb)
+  - [008 - Redshift - Copy & Unload.ipynb](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/008%20-%20Redshift%20-%20Copy%20%26%20Unload.ipynb)
+  - [009 - Redshift - Append, Overwrite and Upsert](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/009%20-%20Redshift%20-%20Append%2C%20Overwrite%2C%20Upsert.ipynb)
+  - [010 - Parquet Crawler](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/010%20-%20Parquet%20Crawler.ipynb)
+  - [011 - CSV Datasets](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/011%20-%20CSV%20Datasets.ipynb)
+  - [012 - CSV Crawler](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/012%20-%20CSV%20Crawler.ipynb)
+  - [013 - Merging Datasets on S3](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/013%20-%20Merging%20Datasets%20on%20S3.ipynb)
+  - [014 - Schema Evolution](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/014%20-%20Schema%20Evolution.ipynb)
+  - [015 - EMR](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/015%20-%20EMR.ipynb)
+  - [016 - EMR & Docker](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/016%20-%20EMR%20%26%20Docker.ipynb)
+  - [017 - Partition Projection](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/017%20-%20Partition%20Projection.ipynb)
 - [**API Reference**](https://aws-data-wrangler.readthedocs.io/en/latest/api.html)
   - [Amazon S3](https://aws-data-wrangler.readthedocs.io/en/latest/api.html#amazon-s3)
   - [AWS Glue Catalog](https://aws-data-wrangler.readthedocs.io/en/latest/api.html#aws-glue-catalog)

awswrangler/__metadata__.py

Lines changed: 1 addition & 1 deletion
@@ -7,5 +7,5 @@
 
 __title__ = "awswrangler"
 __description__ = "Pandas on AWS."
-__version__ = "1.3.0"
+__version__ = "1.4.0"
 __license__ = "Apache License 2.0"

awswrangler/_utils.py

Lines changed: 20 additions & 3 deletions
@@ -203,10 +203,10 @@ def get_region_from_session(boto3_session: Optional[boto3.Session] = None, defau
     ) # pragma: no cover
 
 
-def extract_partitions_from_paths(
+def extract_partitions_metadata_from_paths(
     path: str, paths: List[str]
 ) -> Tuple[Optional[Dict[str, str]], Optional[Dict[str, List[str]]]]:
-    """Extract partitions from Amazon S3 paths."""
+    """Extract partitions metadata from Amazon S3 paths."""
     path = path if path.endswith("/") else f"{path}/"
     partitions_types: Dict[str, str] = {}
     partitions_values: Dict[str, List[str]] = {}
@@ -217,7 +217,7 @@ def extract_partitions_from_paths(
             ) # pragma: no cover
         path_wo_filename: str = p.rpartition("/")[0] + "/"
         if path_wo_filename not in partitions_values:
-            path_wo_prefix: str = p.replace(f"{path}/", "")
+            path_wo_prefix: str = path_wo_filename.replace(f"{path}/", "")
             dirs: List[str] = [x for x in path_wo_prefix.split("/") if (x != "") and ("=" in x)]
             if dirs:
                 values_tups: List[Tuple[str, str]] = [tuple(x.split("=")[:2]) for x in dirs] # type: ignore
@@ -238,6 +238,23 @@ def extract_partitions_from_paths(
     return partitions_types, partitions_values
 
 
+def extract_partitions_from_path(path_root: str, path: str) -> Dict[str, Any]:
+    """Extract partitions values and names from Amazon S3 path."""
+    path_root = path_root if path_root.endswith("/") else f"{path_root}/"
+    if path_root not in path:
+        raise exceptions.InvalidArgumentValue(
+            f"Object {path} is not under the root path ({path_root})."
+        ) # pragma: no cover
+    path_wo_filename: str = path.rpartition("/")[0] + "/"
+    path_wo_prefix: str = path_wo_filename.replace(f"{path_root}/", "")
+    dirs: List[str] = [x for x in path_wo_prefix.split("/") if (x != "") and ("=" in x)]
+    if not dirs:
+        return {} # pragma: no cover
+    values_tups: List[Tuple[str, str]] = [tuple(x.split("=")[:2]) for x in dirs] # type: ignore
+    values_dics: Dict[str, str] = dict(values_tups)
+    return values_dics
+
+
 def list_sampling(lst: List[Any], sampling: float) -> List[Any]:
     """Random List sampling."""
     if sampling > 1.0 or sampling <= 0.0: # pragma: no cover
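
For readers who want to see what the new `extract_partitions_from_path` helper is for, here is a minimal standalone sketch of the same idea: pulling Hive-style `name=value` partition segments out of an S3 object path relative to a dataset root. It is not the library's code. The function name `partitions_from_key`, the use of `ValueError` in place of `exceptions.InvalidArgumentValue`, and the example bucket paths are all assumptions made for illustration, and the sketch strips the root with `startswith` and slicing rather than `str.replace`.

```python
from typing import Dict


def partitions_from_key(path_root: str, path: str) -> Dict[str, str]:
    """Return the name=value partition segments found between path_root and the file name.

    Hypothetical helper for illustration only; it is not part of awswrangler.
    """
    # Normalize the root so it always ends with exactly one trailing slash.
    root = path_root if path_root.endswith("/") else f"{path_root}/"
    if not path.startswith(root):
        raise ValueError(f"Object {path} is not under the root path ({root}).")
    # Drop the root prefix, then drop the file name (everything after the last slash).
    relative_dir = path[len(root):].rpartition("/")[0]
    partitions: Dict[str, str] = {}
    for segment in relative_dir.split("/"):
        if "=" in segment:
            # Split on the first "=" only; the value is kept as a string.
            name, _, value = segment.partition("=")
            partitions[name] = value
    return partitions


print(partitions_from_key("s3://bucket/table", "s3://bucket/table/year=2020/month=05/part-0.parquet"))
# {'year': '2020', 'month': '05'}
```

An object stored directly under the root, with no partition directories, yields an empty dictionary, which mirrors the `return {}` branch in the diff above.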
