
Commit d17a7ad

Update api-docs.txt
1 parent 7d2a4a5 commit d17a7ad

1 file changed: +112 −7 lines changed

pointblank/data/api-docs.txt

Lines changed: 112 additions & 7 deletions
@@ -43,11 +43,13 @@ Validate(data: 'FrameT | Any', tbl_name: 'str | None' = None, label: 'str | None
 ----------
 data
     The table to validate, which could be a DataFrame object, an Ibis table object, a CSV
-    file path, or a Parquet file path. When providing a CSV or Parquet file path (as a string
-    or `pathlib.Path` object), the file will be automatically loaded using an available
-    DataFrame library (Polars or Pandas). Parquet input also supports glob patterns,
-    directories containing .parquet files, and Spark-style partitioned datasets. Read the
-    *Supported Input Table Types* section for details on the supported table types.
+    file path, a Parquet file path, or a database connection string. When providing a CSV or
+    Parquet file path (as a string or `pathlib.Path` object), the file will be automatically
+    loaded using an available DataFrame library (Polars or Pandas). Parquet input also supports
+    glob patterns, directories containing .parquet files, and Spark-style partitioned datasets.
+    Connection strings enable direct database access via Ibis with optional table specification
+    using the `::table_name` suffix. Read the *Supported Input Table Types* section for details
+    on the supported table types.
 tbl_name
     An optional name to assign to the input table object. If no value is provided, a name will
     be generated based on whatever information is available. This table name will be displayed
@@ -120,6 +122,7 @@ Validate(data: 'FrameT | Any', tbl_name: 'str | None' = None, label: 'str | None
 - CSV files (string path or `pathlib.Path` object with `.csv` extension)
 - Parquet files (string path, `pathlib.Path` object, glob pattern, directory with `.parquet`
   extension, or partitioned dataset)
+- Database connection strings (URI format with optional table specification)

 The table types marked with an asterisk need to be prepared as Ibis tables (with type of
 `ibis.expr.types.relations.Table`). Furthermore, the use of `Validate` with such tables requires
@@ -130,6 +133,20 @@ Validate(data: 'FrameT | Any', tbl_name: 'str | None' = None, label: 'str | None
 provided. The file will be automatically detected and loaded using the best available DataFrame
 library. The loading preference is Polars first, then Pandas as a fallback.

+Connection strings follow database URL formats and must also specify a table using the
+`::table_name` suffix. Examples include:
+
+```
+"duckdb:///path/to/database.ddb::table_name"
+"sqlite:///path/to/database.db::table_name"
+"postgresql://user:password@localhost:5432/database::table_name"
+"mysql://user:password@localhost:3306/database::table_name"
+"bigquery://project/dataset::table_name"
+"snowflake://user:password@account/database/schema::table_name"
+```
+
+When using connection strings, the Ibis library with the appropriate backend driver is required.
+
 Thresholds
 ----------
 The `thresholds=` parameter is used to set the failure-condition levels for all validation
@@ -512,6 +529,33 @@ Validate(data: 'FrameT | Any', tbl_name: 'str | None' = None, label: 'str | None

 Both Polars and Pandas handle partitioned datasets natively, so this works seamlessly with
 either DataFrame library. The loading preference is Polars first, then Pandas as a fallback.
+
+### Working with Database Connection Strings
+
+The `Validate` class supports database connection strings for direct validation of database
+tables. Connection strings must specify a table using the `::table_name` suffix:
+
+```python
+# Get path to a DuckDB database file from package data
+duckdb_path = pb.get_data_path("game_revenue", "duckdb")
+
+validation_9 = (
+    pb.Validate(
+        data=f"duckdb:///{duckdb_path}::game_revenue",
+        label="DuckDB Game Revenue Validation"
+    )
+    .col_exists(["player_id", "session_id", "item_revenue"])
+    .col_vals_gt(columns="item_revenue", value=0)
+    .interrogate()
+)
+
+validation_9
+```
+
+For comprehensive documentation on supported connection string formats, error handling, and
+installation requirements, see the [`connect_to_table()`](`pointblank.connect_to_table`)
+function. This function handles all the connection logic and provides helpful error messages
+when table specifications are missing or backend dependencies are not installed.


 Thresholds(warning: 'int | float | bool | None' = None, error: 'int | float | bool | None' = None, critical: 'int | float | bool | None' = None) -> None
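The f-string in the DuckDB example above interpolates a filesystem path into the connection string. A minimal, stdlib-only sketch of that assembly (the path below is a placeholder, not the real location returned by `pb.get_data_path()`):

```python
from pathlib import Path

# Placeholder path standing in for the value returned by pb.get_data_path();
# as_posix() keeps forward slashes regardless of the host OS.
duckdb_path = Path("data/game_revenue.ddb")
conn = f"duckdb:///{duckdb_path.as_posix()}::game_revenue"
print(conn)  # duckdb:///data/game_revenue.ddb::game_revenue
```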
@@ -8802,8 +8846,14 @@ preview(data: 'FrameT | Any', columns_subset: 'str | list[str] | Column | None'
 Parameters
 ----------
 data
-    The table to preview, which could be a DataFrame object or an Ibis table object. Read the
-    *Supported Input Table Types* section for details on the supported table types.
+    The table to preview, which could be a DataFrame object, an Ibis table object, a CSV
+    file path, a Parquet file path, or a database connection string. When providing a CSV or
+    Parquet file path (as a string or `pathlib.Path` object), the file will be automatically
+    loaded using an available DataFrame library (Polars or Pandas). Parquet input also supports
+    glob patterns, directories containing .parquet files, and Spark-style partitioned datasets.
+    Connection strings enable direct database access via Ibis with optional table specification
+    using the `::table_name` suffix. Read the *Supported Input Table Types* section for details
+    on the supported table types.
 columns_subset
     The columns to display in the table, by default `None` (all columns are shown). This can
     be a string, a list of strings, a `Column` object, or a `ColumnSelector` object. The latter
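The `data=` parameter's dispatch between CSV and Parquet paths hinges on the file extension. A hypothetical sketch of that distinction (`kind_of_path` is illustrative only, not a pointblank function):

```python
from pathlib import Path

# Illustrative helper (NOT pointblank API): classify an input path by its
# extension, the way the docs describe CSV vs. Parquet detection.
def kind_of_path(p) -> str:
    suffix = Path(p).suffix.lower()
    return {".csv": "csv", ".parquet": "parquet"}.get(suffix, "other")

print(kind_of_path("data/game_revenue.csv"))     # csv
print(kind_of_path(Path("data/sales.parquet")))  # parquet
```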
@@ -8854,12 +8904,34 @@ preview(data: 'FrameT | Any', columns_subset: 'str | list[str] | Column | None'
 - PySpark table (`"pyspark"`)*
 - BigQuery table (`"bigquery"`)*
 - Parquet table (`"parquet"`)*
+- CSV files (string path or `pathlib.Path` object with `.csv` extension)
+- Parquet files (string path, `pathlib.Path` object, glob pattern, directory with `.parquet`
+  extension, or partitioned dataset)
+- Database connection strings (URI format with optional table specification)

 The table types marked with an asterisk need to be prepared as Ibis tables (with type of
 `ibis.expr.types.relations.Table`). Furthermore, using `preview()` with these types of tables
 requires the Ibis library (`v9.5.0` or above) to be installed. If the input table is a Polars or
 Pandas DataFrame, the availability of Ibis is not needed.

+To use a CSV file, ensure that a string or `pathlib.Path` object with a `.csv` extension is
+provided. The file will be automatically detected and loaded using the best available DataFrame
+library. The loading preference is Polars first, then Pandas as a fallback.
+
+Connection strings follow database URL formats and must also specify a table using the
+`::table_name` suffix. Examples include:
+
+```
+"duckdb:///path/to/database.ddb::table_name"
+"sqlite:///path/to/database.db::table_name"
+"postgresql://user:password@localhost:5432/database::table_name"
+"mysql://user:password@localhost:3306/database::table_name"
+"bigquery://project/dataset::table_name"
+"snowflake://user:password@account/database/schema::table_name"
+```
+
+When using connection strings, the Ibis library with the appropriate backend driver is required.
+
 Examples
 --------
 It's easy to preview a table using the `preview()` function. Here's an example using the
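Each connection-string format above leads with a URL scheme that implies the required Ibis backend driver. A stdlib-only sketch of reading that scheme (the `backend_of` helper is hypothetical, not part of pointblank or Ibis):

```python
from urllib.parse import urlparse

# Hypothetical helper (NOT pointblank/Ibis API): strip any "::table_name"
# suffix, then read the URL scheme, which names the backend driver implied
# by the connection string.
def backend_of(conn: str) -> str:
    url = conn.rsplit("::", 1)[0] if "::" in conn else conn
    return urlparse(url).scheme

print(backend_of("postgresql://user:pw@localhost:5432/db::t"))  # postgresql
print(backend_of("duckdb:///analytics.ddb::sales"))             # duckdb
```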
@@ -8918,6 +8990,39 @@ preview(data: 'FrameT | Any', columns_subset: 'str | list[str] | Column | None'
     columns_subset=pb.col(pb.starts_with("item") | pb.matches("player"))
 )
 ```
+
+### Working with CSV Files
+
+The `preview()` function can directly accept CSV file paths, making it easy to preview data
+stored in CSV files without manual loading:
+
+You can also use a Path object to specify the CSV file:
+
+### Working with Parquet Files
+
+The `preview()` function can directly accept Parquet files and datasets in various formats:
+
+You can also use glob patterns and directories:
+
+```python
+# Multiple Parquet files with glob patterns
+pb.preview("data/sales_*.parquet")
+
+# Directory containing Parquet files
+pb.preview("parquet_data/")
+
+# Partitioned Parquet dataset
+pb.preview("sales_data/")  # Auto-discovers partition columns
+```
+
+### Working with Database Connection Strings
+
+The `preview()` function supports database connection strings for direct preview of database
+tables. Connection strings must specify a table using the `::table_name` suffix:
+
+For comprehensive documentation on supported connection string formats, error handling, and
+installation requirements, see the [`connect_to_table()`](`pointblank.connect_to_table`)
+function.


 col_summary_tbl(data: 'FrameT | Any', tbl_name: 'str | None' = None) -> 'GT'
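The glob pattern `"data/sales_*.parquet"` in the `preview()` hunk above follows standard shell-style wildcard matching. A stdlib-only sketch of which file names such a pattern selects (the file list is made up for illustration):

```python
from fnmatch import fnmatch

# Illustrative file names only; fnmatch applies the same shell-style
# wildcard rules that glob patterns like "data/sales_*.parquet" use.
files = ["data/sales_2023.parquet", "data/sales_2024.parquet", "data/readme.txt"]
matched = [f for f in files if fnmatch(f, "data/sales_*.parquet")]
print(matched)  # ['data/sales_2023.parquet', 'data/sales_2024.parquet']
```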
