chdb-io
diff --git a/.github/workflows/build_wheels.yml b/.github/workflows/build_wheels.yml
Lines changed: 5 additions & 5 deletions
diff --git a/README-zh.md b/README-zh.md
Lines changed: 40 additions & 1 deletion
diff --git a/README.md b/README.md
Lines changed: 49 additions & 3 deletions
diff --git a/chdb/__init__.py b/chdb/__init__.py
Lines changed: 5 additions & 1 deletion
diff --git a/chdb/dataframe/__init__.py b/chdb/dataframe/__init__.py
Lines changed: 14 additions & 0 deletions
@@ -22,7 +22,7 @@ jobs:
       fail-fast: false
       matrix:
         os: [ ubuntu-20.04 ]
-        python-version: [ "3.7", "3.8", "3.9", "3.10", "3.11"]
+        python-version: [ "3.8", "3.9", "3.10", "3.11"]
         # python-version: [ "3.7" ]
     env:
       RUNNER_OS: ${{ matrix.os }}
@@ -151,7 +151,7 @@ jobs:
       fail-fast: false
       matrix:
         os: [ macos-12 ]
-        # python-version: [ "3.7", "3.8", "3.9", "3.10", "3.11"]
+        # python-version: [ "3.8", "3.9", "3.10", "3.11"]
         python-version: [ "3.11" ]
     env:
       RUNNER_OS: ${{ matrix.os }}
@@ -273,7 +273,7 @@ jobs:
       fail-fast: false
       matrix:
         os: [ macos-11 ]
-        # python-version: [ "3.7", "3.8", "3.9", "3.10"]
+        # python-version: [ "3.8", "3.9", "3.10"]
         python-version: [ "3.11" ]
     env:
       RUNNER_OS: ${{ matrix.os }}
@@ -350,7 +350,7 @@ jobs:
           CIBW_DEBUG: 1
           CIBW_BEFORE_BUILD: "pip install -U pip tox pybind11 && bash -x gen_manifest.sh && bash chdb/build.sh"
           CIBW_BUILD_VERBOSITY: 3
-          CIBW_BUILD: "cp37-macosx_x86_64 cp38-macosx_x86_64 cp39-macosx_x86_64 cp310-macosx_x86_64"
+          CIBW_BUILD: "cp38-macosx_x86_64 cp39-macosx_x86_64 cp310-macosx_x86_64"
           CIBW_TEST_REQUIRES: "pyarrow pandas"
           CIBW_TEST_COMMAND: "cd {project} && make test"
         # with:
@@ -368,7 +368,7 @@ jobs:
       #     CIBW_DEBUG: 1
       #     CIBW_BEFORE_BUILD: "pip install -U pip tox pybind11 && bash -x gen_manifest.sh && bash chdb/build.sh"
       #     CIBW_BUILD_VERBOSITY: 3
-      #     CIBW_BUILD: "cp37-macosx_x86_64 cp38-macosx_x86_64 cp39-macosx_x86_64 cp310-macosx_x86_64 cp311-macosx_x86_64"
+      #     CIBW_BUILD: "cp38-macosx_x86_64 cp39-macosx_x86_64 cp310-macosx_x86_64 cp311-macosx_x86_64"
       #     CIBW_TEST_COMMAND: python -c "import chdb; res = chdb.query('select 1112222222,555', 'CSV'); print(res.get_memview().tobytes())"
       - name: Keep killall ccache and wait for ccache to finish
         if: always()
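
The `CIBW_BUILD` values in this workflow are space-separated build identifiers of the form `cp<major><minor>-<platform>`. As a minimal sketch of how such a string relates to the Python version matrix, the helper below derives it from a version list. The function name `cibw_build_selector` is hypothetical; it is not part of this repository or of cibuildwheel.

```python
def cibw_build_selector(python_versions, platform_tag="macosx_x86_64"):
    """Build a space-separated CIBW_BUILD string from versions like "3.8"."""
    identifiers = []
    for version in python_versions:
        major, minor = version.split(".")
        identifiers.append(f"cp{major}{minor}-{platform_tag}")
    return " ".join(identifiers)


# Python 3.7 is left out, matching the updated build matrix.
selector = cibw_build_selector(["3.8", "3.9", "3.10"])
print(selector)
```

Generating the selector from one list keeps the wheel targets and the test matrix from drifting apart when a Python version is dropped.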
 
@@ -41,7 +41,12 @@ pip install chdb
 python3 -m chdb "SELECT 1,'abc'" Pretty
 ```
 
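-Currently, chDB only supports the `query` function, which executes SQL and returns data in the desired format.
+
+There are three ways to use chdb: "raw file query (performance)", "advanced query (recommended)", and "DB-API":
+<details>
+    <summary><h4>🗂️ Raw file query</h4> (60+ formats, including Parquet, CSV, JSON, Arrow, and ORC)</summary>
+
+You can execute SQL and get the data back in the desired format.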
 
 ```python
 import chdb
@@ -61,6 +66,40 @@ res = chdb.query('select * from file("data.csv", CSV)', 'CSV');  print(str(res.g
 # See more at https://clickhouse.com/docs/en/interfaces/formats
 chdb.query('select * from file("data.parquet", Parquet)', 'Dataframe')
 ```
+</details>
+
+<details>
+    <summary><h4>🗂️ Advanced query</h4> (Pandas DataFrame, Parquet file/bytes, Arrow file/bytes)</summary>
+
+### Query a Pandas DataFrame
+```python
+import chdb.dataframe as cdf
+import pandas as pd
+tbl = cdf.Table(dataframe=pd.DataFrame({'a': [1, 2, 3], 'b': ['a', 'b', 'c']}))
+ret_tbl = tbl.query('select * from __table__')
+print(ret_tbl)
+print(ret_tbl.query('select b, sum(a) from __table__ group by b'))
+```
+</details>
+
+<details>
+    <summary><h4>🗂️ Python DB-API 2.0</h4></summary>
+
+```python
+import chdb.dbapi as dbapi
+print("chdb driver version: {0}".format(dbapi.get_client_info()))
+
+conn1 = dbapi.connect()
+cur1 = conn1.cursor()
+cur1.execute('select version()')
+print("description: ", cur1.description)
+print("data: ", cur1.fetchone())
+cur1.close()
+conn1.close()
+```
+</details>
+
+For more examples, see [examples](examples) and [tests](tests).
 
 ## Demos and Examples
 
 
@@ -43,7 +43,15 @@ pip install chdb
 python3 -m chdb "SELECT 1,'abc'" Pretty
 ```
 
-Currently, chDB only supports `query` function, which is used to execute SQL and return desired format data.
+<br>
+
+### Data Input
+The following methods are available to access on-disk and in-memory data formats:
+
+<details>
+    <summary><h4>🗂️ Query On File</h4> (Parquet, CSV, JSON, Arrow, ORC and 60+)</summary>
+
+You can execute SQL and get the results back in the desired format.
 
 ```python
 import chdb
@@ -63,6 +71,43 @@ res = chdb.query('select * from file("data.csv", CSV)', 'CSV');  print(str(res.g
 # See more in https://clickhouse.com/docs/en/interfaces/formats
 chdb.query('select * from file("data.parquet", Parquet)', 'Dataframe')
 ```
+</details>
+
+<details>
+    <summary><h4>🗂️ Query On Table</h4> (Pandas DataFrame, Parquet file/bytes, Arrow bytes) </summary>
+
+### Query On Pandas DataFrame
+```python
+import chdb.dataframe as cdf
+import pandas as pd
+tbl = cdf.Table(dataframe=pd.DataFrame({'a': [1, 2, 3], 'b': ['a', 'b', 'c']}))
+ret_tbl = tbl.query('select * from __table__')
+print(ret_tbl)
+print(ret_tbl.query('select b, sum(a) from __table__ group by b'))
+```
+</details>
+
+<details>
+    <summary><h4>🗂️ Python DB-API 2.0</h4></summary>
+
+```python
+import chdb.dbapi as dbapi
+print("chdb driver version: {0}".format(dbapi.get_client_info()))
+
+conn1 = dbapi.connect()
+cur1 = conn1.cursor()
+cur1.execute('select version()')
+print("description: ", cur1.description)
+print("data: ", cur1.fetchone())
+cur1.close()
+conn1.close()
+```
+</details>
+
+
+For more examples, see [examples](examples) and [tests](tests).
+
+<br>
 
 ## Demos and Examples
 
@@ -79,8 +124,9 @@ chdb.query('select * from file("data.parquet", Parquet)', 'Dataframe')
 ## Contributing
 Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are **greatly appreciated**.
 Here are some things you can help with:
-- [ ] Help me with Windows support, I don't know much about Windows toolchain.
-- [x] The Python Wrapper just have a `query` function. I want to add more functions to make it more convenient to use. like `toPandas`, `toNumpy` and so on.
+- [ ] Help test and report bugs
+- [ ] Help improve documentation
+- [ ] Help improve code quality and performance
 
 ## License
 AGPL-v3.0 or Commercial License, see [LICENSE](LICENSE.txt) for more information.
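
The DB-API example added to the README uses the standard PEP 249 sequence: connect, open a cursor, execute, read `description`, fetch, close. The sketch below shows the same sequence with the standard-library `sqlite3` module as a stand-in, so it runs without chdb installed. It is not chdb's implementation; `sqlite3.connect(":memory:")` simply takes the place of `dbapi.connect()`.

```python
import sqlite3

# sqlite3 is used only as a stand-in PEP 249 driver (assumption: chdb.dbapi
# follows the same connect/cursor/execute/fetch pattern shown in the README).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("select 1 as num, 'abc' as txt")

# description holds one 7-item tuple per result column; item 0 is the column name.
column_names = [col[0] for col in cur.description]
row = cur.fetchone()
print("columns:", column_names)
print("data:", row)

# Close the cursor before the connection, as in the README example.
cur.close()
conn.close()
```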
 
@@ -10,6 +10,7 @@
     cwd = os.getcwd()
     os.chdir(current_path)
     from . import _chdb  # noqa
+
     os.chdir(cwd)
     engine_version = str(_chdb.query("SELECT version();", "CSV").get_memview().tobytes())[3:-4]
 else:
@@ -22,6 +23,7 @@
 except:  # pragma: no cover
     __version__ = "unknown"
 
+
 # return pyarrow table
 def to_arrowTable(res):
     """convert res to arrow table"""
@@ -33,15 +35,17 @@ def to_arrowTable(res):
         print(f'ImportError: {e}')
         print('Please install pyarrow and pandas via "pip install pyarrow pandas"')
         raise ImportError('Failed to import pyarrow or pandas') from None
-        
+
     return pa.RecordBatchFileReader(res.get_memview()).read_all()
 
+
 # return pandas dataframe
 def to_df(r):
     """convert arrow table to Dataframe"""
     t = to_arrowTable(r)
     return t.to_pandas(use_threads=True)
 
+
 # wrap _chdb functions
 def query(sql, output_format="CSV", **kwargs):
     lower_output_format = output_format.lower()
 
@@ -0,0 +1,14 @@
+# Try to import pyarrow and pandas; if that fails, raise ImportError with an install hint
+try:
+    import pyarrow as pa
+    import pandas as pd
+except ImportError as e:
+    print(f'ImportError: {e}')
+    print('Please install pyarrow and pandas via "pip install pyarrow pandas"')
+    raise ImportError('Failed to import pyarrow or pandas') from None
+
+# warn if pandas is older than 2.0.0 (compare the major version numerically)
+if int(pd.__version__.split('.')[0]) < 2:
+    print('Please upgrade pandas to version 2.0.0 or higher for better performance')
+
+from .query import *
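
A version guard like the pandas check in this file can compare the major version as an integer and report through `warnings.warn` rather than `print`, so callers can filter or capture the message. This is a minimal sketch under assumptions: the helper names `major_version` and `warn_if_old` are hypothetical and not part of chdb, and the version string is assumed to start with a numeric major component.

```python
import re
import warnings


def major_version(version_string):
    """Return the leading integer of a version string such as "2.1.4"."""
    match = re.match(r"\d+", version_string)
    if match is None:
        raise ValueError(f"unrecognized version string: {version_string!r}")
    return int(match.group())


def warn_if_old(version_string, minimum_major=2):
    """Emit a UserWarning and return True when the major version is too old."""
    if major_version(version_string) < minimum_major:
        warnings.warn(
            f"pandas {version_string} detected; upgrade to "
            f"{minimum_major}.0.0 or higher for better performance"
        )
        return True
    return False
```

In the module this would be called as `warn_if_old(pd.__version__)` after the import succeeds.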