Commit f909b10

Merge pull request #36 from chdb-io/onDf
- #37
- Add a `Table` class that can be initialized in various ways, with `query` and `to_pandas` functions. For details, see #34.
- Add test code for the `Table` class.
- Add Arrow and ArrowStream output type tests to the basic test.
- Update the README for the 3 kinds of usage.
2 parents cab766d + 376f952 commit f909b10

13 files changed: +755, -17 lines changed
.github/workflows/build_wheels.yml

Lines changed: 5 additions & 5 deletions
@@ -22,7 +22,7 @@ jobs:
 fail-fast: false
 matrix:
 os: [ ubuntu-20.04 ]
-python-version: [ "3.7", "3.8", "3.9", "3.10", "3.11"]
+python-version: [ "3.8", "3.9", "3.10", "3.11"]
 # python-version: [ "3.7" ]
 env:
 RUNNER_OS: ${{ matrix.os }}
@@ -151,7 +151,7 @@ jobs:
 fail-fast: false
 matrix:
 os: [ macos-12 ]
-# python-version: [ "3.7", "3.8", "3.9", "3.10", "3.11"]
+# python-version: [ "3.8", "3.9", "3.10", "3.11"]
 python-version: [ "3.11" ]
 env:
 RUNNER_OS: ${{ matrix.os }}
@@ -273,7 +273,7 @@ jobs:
 fail-fast: false
 matrix:
 os: [ macos-11 ]
-# python-version: [ "3.7", "3.8", "3.9", "3.10"]
+# python-version: [ "3.8", "3.9", "3.10"]
 python-version: [ "3.11" ]
 env:
 RUNNER_OS: ${{ matrix.os }}
@@ -350,7 +350,7 @@ jobs:
 CIBW_DEBUG: 1
 CIBW_BEFORE_BUILD: "pip install -U pip tox pybind11 && bash -x gen_manifest.sh && bash chdb/build.sh"
 CIBW_BUILD_VERBOSITY: 3
-CIBW_BUILD: "cp37-macosx_x86_64 cp38-macosx_x86_64 cp39-macosx_x86_64 cp310-macosx_x86_64"
+CIBW_BUILD: "cp38-macosx_x86_64 cp39-macosx_x86_64 cp310-macosx_x86_64"
 CIBW_TEST_REQUIRES: "pyarrow pandas"
 CIBW_TEST_COMMAND: "cd {project} && make test"
 # with:
@@ -368,7 +368,7 @@ jobs:
 # CIBW_DEBUG: 1
 # CIBW_BEFORE_BUILD: "pip install -U pip tox pybind11 && bash -x gen_manifest.sh && bash chdb/build.sh"
 # CIBW_BUILD_VERBOSITY: 3
-# CIBW_BUILD: "cp37-macosx_x86_64 cp38-macosx_x86_64 cp39-macosx_x86_64 cp310-macosx_x86_64 cp311-macosx_x86_64"
+# CIBW_BUILD: "cp38-macosx_x86_64 cp39-macosx_x86_64 cp310-macosx_x86_64 cp311-macosx_x86_64"
 # CIBW_TEST_COMMAND: python -c "import chdb; res = chdb.query('select 1112222222,555', 'CSV'); print(res.get_memview().tobytes())"
 - name: Keep killall ccache and wait for ccache to finish
 if: always()
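
The workflow change removes the `cp37` selector from `CIBW_BUILD` alongside the `"3.7"` entry in the Python version lists. As a rough illustration of how the two lists relate, the sketch below derives a selector string from version strings. The helper name `cibw_build_selectors` and its `platform_tag` parameter are hypothetical and not part of the chdb repository; it only assumes the `cp<major><minor>-<platform>` naming visible in the workflow.

```python
def cibw_build_selectors(python_versions, platform_tag="macosx_x86_64"):
    """Build a space-separated CIBW_BUILD value from versions like "3.8".

    Hypothetical helper for illustration; the workflow hard-codes the string.
    """
    selectors = []
    for version in python_versions:
        major, minor = version.split(".")
        # "3.10" becomes "cp310", matching the identifiers in the workflow.
        selectors.append(f"cp{major}{minor}-{platform_tag}")
    return " ".join(selectors)


if __name__ == "__main__":
    print(cibw_build_selectors(["3.8", "3.9", "3.10"]))
```

Generating the string from one version list would keep the matrix and the build selectors from drifting apart, though the workflow in this commit keeps them as separate literals.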

README-zh.md

Lines changed: 40 additions & 1 deletion
@@ -41,7 +41,12 @@ pip install chdb
 python3 -m chdb "SELECT 1,'abc'" Pretty
 ```

-Currently, chDB only supports the `query` function, which executes SQL and returns data in the desired format.
+
+There are three ways to use chdb: "raw file query (performance)", "advanced query (recommended)", and "DB-API":
+<details>
+<summary><h4>🗂️ Raw file query</h4> (Parquet, CSV, JSON, Arrow, ORC, and 60+ other formats)</summary>
+
+You can execute SQL and return data in the desired format.

 ```python
 import chdb
@@ -61,6 +66,40 @@ res = chdb.query('select * from file("data.csv", CSV)', 'CSV'); print(str(res.g
 # For more, see https://clickhouse.com/docs/en/interfaces/formats
 chdb.query('select * from file("data.parquet", Parquet)', 'Dataframe')
 ```
+</details>
+
+<details>
+<summary><h4>🗂️ Advanced query</h4> (Pandas DataFrame, Parquet file/bytes, Arrow file/bytes)</summary>
+
+### Query a Pandas DataFrame
+```python
+import chdb.dataframe as cdf
+import pandas as pd
+tbl = cdf.Table(dataframe=pd.DataFrame({'a': [1, 2, 3], 'b': ['a', 'b', 'c']}))
+ret_tbl = tbl.query('select * from __table__')
+print(ret_tbl)
+print(ret_tbl.query('select b, sum(a) from __table__ group by b'))
+```
+</details>
+
+<details>
+<summary><h4>🗂️ Python DB-API 2.0</h4></summary>
+
+```python
+import chdb.dbapi as dbapi
+print("chdb driver version: {0}".format(dbapi.get_client_info()))
+
+conn1 = dbapi.connect()
+cur1 = conn1.cursor()
+cur1.execute('select version()')
+print("description: ", cur1.description)
+print("data: ", cur1.fetchone())
+cur1.close()
+conn1.close()
+```
+</details>
+
+For more examples, see [examples](examples) and [tests](tests).

 ## Demos and Examples

README.md

Lines changed: 49 additions & 3 deletions
@@ -43,7 +43,15 @@ pip install chdb
 python3 -m chdb "SELECT 1,'abc'" Pretty
 ```

-Currently, chDB only supports `query` function, which is used to execute SQL and return desired format data.
+<br>
+
+### Data Input
+The following methods are available to access on-disk and in-memory data formats:
+
+<details>
+<summary><h4>🗂️ Query On File</h4> (Parquet, CSV, JSON, Arrow, ORC and 60+)</summary>
+
+You can execute SQL and return desired format data.

 ```python
 import chdb
@@ -63,6 +71,43 @@ res = chdb.query('select * from file("data.csv", CSV)', 'CSV'); print(str(res.g
 # See more in https://clickhouse.com/docs/en/interfaces/formats
 chdb.query('select * from file("data.parquet", Parquet)', 'Dataframe')
 ```
+</details>
+
+<details>
+<summary><h4>🗂️ Query On Table</h4> (Pandas DataFrame, Parquet file/bytes, Arrow bytes) </summary>
+
+### Query On Pandas DataFrame
+```python
+import chdb.dataframe as cdf
+import pandas as pd
+tbl = cdf.Table(dataframe=pd.DataFrame({'a': [1, 2, 3], 'b': ['a', 'b', 'c']}))
+ret_tbl = tbl.query('select * from __table__')
+print(ret_tbl)
+print(ret_tbl.query('select b, sum(a) from __table__ group by b'))
+```
+</details>
+
+<details>
+<summary><h4>🗂️ Python DB-API 2.0</h4></summary>
+
+```python
+import chdb.dbapi as dbapi
+print("chdb driver version: {0}".format(dbapi.get_client_info()))
+
+conn1 = dbapi.connect()
+cur1 = conn1.cursor()
+cur1.execute('select version()')
+print("description: ", cur1.description)
+print("data: ", cur1.fetchone())
+cur1.close()
+conn1.close()
+```
+</details>
+
+
+For more examples, see [examples](examples) and [tests](tests).
+
+<br>

 ## Demos and Examples

@@ -79,8 +124,9 @@ chdb.query('select * from file("data.parquet", Parquet)', 'Dataframe')
 ## Contributing
 Contributions are what make the open source community such an amazing place to be learn, inspire, and create. Any contributions you make are **greatly appreciated**.
 There are something you can help:
-- [ ] Help me with Windows support, I don't know much about Windows toolchain.
-- [x] The Python Wrapper just have a `query` function. I want to add more functions to make it more convenient to use. like `toPandas`, `toNumpy` and so on.
+- [ ] Help test and report bugs
+- [ ] Help improve documentation
+- [ ] Help improve code quality and performance

 ## License
 AGPL-v3.0 or Commercial License, see [LICENSE](LICENSE.txt) for more information.

chdb/__init__.py

Lines changed: 5 additions & 1 deletion
@@ -10,6 +10,7 @@
     cwd = os.getcwd()
     os.chdir(current_path)
     from . import _chdb # noqa
+
     os.chdir(cwd)
     engine_version = str(_chdb.query("SELECT version();", "CSV").get_memview().tobytes())[3:-4]
 else:
@@ -22,6 +23,7 @@
 except: # pragma: no cover
     __version__ = "unknown"

+
 # return pyarrow table
 def to_arrowTable(res):
     """convert res to arrow table"""
@@ -33,15 +35,17 @@ def to_arrowTable(res):
         print(f'ImportError: {e}')
         print('Please install pyarrow and pandas via "pip install pyarrow pandas"')
         raise ImportError('Failed to import pyarrow or pandas') from None
-
+
     return pa.RecordBatchFileReader(res.get_memview()).read_all()

+
 # return pandas dataframe
 def to_df(r):
     """"convert arrow table to Dataframe"""
     t = to_arrowTable(r)
     return t.to_pandas(use_threads=True)

+
 # wrap _chdb functions
 def query(sql, output_format="CSV", **kwargs):
     lower_output_format = output_format.lower()
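
The `engine_version` context line in this file slices the text form of a bytes object with `[3:-4]`. The sketch below shows why those offsets work and compares them with an explicit decode. The sample payload `b'"23.6.1.1"\n'` is an assumed shape for a quoted CSV value followed by a newline, not output captured from chdb, and both function names are hypothetical.

```python
def version_from_repr(raw: bytes) -> str:
    # str() on bytes gives its repr, e.g. b'"23.6.1.1"\n' as literal text.
    # [3:] drops the leading  b'"  and [:-4] drops the trailing  "\n'
    # (quote, backslash, letter n, apostrophe).
    return str(raw)[3:-4]


def version_from_decode(raw: bytes) -> str:
    # Alternative: decode the bytes, then strip the newline and CSV quotes.
    return raw.decode("utf-8").strip().strip('"')


if __name__ == "__main__":
    sample = b'"23.6.1.1"\n'  # assumed payload shape, for illustration only
    print(version_from_repr(sample))
    print(version_from_decode(sample))
```

The decode-based variant does not depend on the exact number of framing characters, so it would tolerate a payload with a different line ending; the slice-based form in the diff relies on the payload having exactly this shape.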

chdb/dataframe/__init__.py

Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
+# try import pyarrow and pandas, if failed, raise ImportError with suggestion
+try:
+    import pyarrow as pa
+    import pandas as pd
+except ImportError as e:
+    print(f'ImportError: {e}')
+    print('Please install pyarrow and pandas via "pip install pyarrow pandas"')
+    raise ImportError('Failed to import pyarrow or pandas') from None
+
+# check if pandas version >= 2.0.0
+if pd.__version__[0] < '2':
+    print('Please upgrade pandas to version 2.0.0 or higher to have better performance')
+
+from .query import *
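
The pandas check in this new file compares only the first character of `pd.__version__` with `'2'`. That works for `1.x` and `2.x`, but a version string with a two-digit major number would be misclassified. A minimal sketch of a numeric comparison follows, using plain strings so it runs without pandas installed. The names `major_version` and `needs_upgrade` are hypothetical and not part of chdb.

```python
def major_version(version: str) -> int:
    """Return the leading integer of a version string, or 0 if there is none."""
    head = version.split(".", 1)[0]
    digits = ""
    for ch in head:
        if not ch.isdigit():
            break
        digits += ch
    return int(digits) if digits else 0


def needs_upgrade(version: str, minimum_major: int = 2) -> bool:
    """True when the major version is below the minimum (2 in the diff)."""
    return major_version(version) < minimum_major


if __name__ == "__main__":
    for v in ("1.5.3", "2.0.0", "10.0.1"):
        print(v, needs_upgrade(v), v[0] < "2")
```

For the hypothetical `"10.0.1"`, the first-character comparison reports that an upgrade is needed while the numeric comparison does not.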
