Python connector: add first-class table handles and object-based Arrow/CDF read APIs #860

@zacdav-db

Description

Summary

The Python connector currently requires users to construct "<profile>#<share>.<schema>.<table>" strings for common read paths. That works, but it is awkward, error-prone, and makes object-oriented workflows difficult.

Arrow-native consumption is also underexposed. This makes integration with Arrow consumers such as DuckDB less direct than it should be, and it forces users toward eager materialization even when they want batch-oriented reads.

The same problem exists for change data feed: CDF is still only exposed through legacy free functions, so it does not participate in the new object model.

Motivation

Today, a representative Python workflow looks like this:

profile_file = "recipient.share"
table_url = profile_file + "#share.schema.table"
data = delta_sharing.load_as_pandas(table_url, limit=10)

Pain points:

  • Users must manually build and parse table_url strings.
  • The API shape does not reflect the underlying concepts already present in the connector (SharingClient, Table, snapshots).
  • Arrow-native use cases are not first-class.
  • Lazy batch-oriented consumption for engines like DuckDB is not easy to discover.
  • CDF is disconnected from the new table-oriented object model.
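The first pain point can be made concrete with a sketch of the string handling users currently write by hand. `parse_table_url` below is a hypothetical helper, not part of the connector; it only illustrates the parsing burden the URL format pushes onto callers.

```python
# Hypothetical sketch of the string handling users write by hand today.
# parse_table_url is illustrative only; it is not part of delta-sharing.
def parse_table_url(table_url: str) -> tuple[str, str, str, str]:
    """Split '<profile>#<share>.<schema>.<table>' into its four parts."""
    profile, _, fqn = table_url.partition("#")
    parts = fqn.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected '<share>.<schema>.<table>', got {fqn!r}")
    share, schema, table = parts
    return profile, share, schema, table

print(parse_table_url("recipient.share#share.schema.table"))
# → ('recipient.share', 'share', 'schema', 'table')
```

Every caller that needs the pieces back (for logging, caching, retries) repeats some variant of this, which is exactly what a first-class table handle would absorb.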

Proposal

Add an additive object-based API alongside the existing URL-based API.

Snapshot surface

client = delta_sharing.SharingClient("recipient.share")
table = client.table("share.schema.table")

pdf = table.snapshot(limit=10).to_pandas()
arrow_table = table.snapshot(limit=10).to_arrow()
batches = table.snapshot(limit=10).to_record_batches()
reader = table.snapshot(limit=10).to_record_batch_reader()

Also add a URL-based Arrow helper for parity:

arrow_table = delta_sharing.load_as_arrow("recipient.share#share.schema.table", limit=10)

CDF surface

client = delta_sharing.SharingClient("recipient.share")
table = client.table("share.schema.table")

changes = table.changes(starting_version=5)
pdf = changes.to_pandas()
arrow_table = changes.to_arrow()
reader = changes.to_record_batch_reader()

Design goals

  • Keep the existing URL-based APIs working unchanged.
  • Make the new API additive, not a replacement.
  • Keep query configuration on snapshot(...) and changes(...), with to_*() methods acting as materializers.
  • Support both eager Arrow materialization and lazy Arrow batch consumption.
  • Make it easy for engines like DuckDB to consume a RecordBatchReader directly.
  • Bring CDF into the same object model without changing legacy CDF semantics.

Compatibility requirements

This should not disrupt existing users.

  • load_as_pandas(...) remains supported.
  • load_table_changes_as_pandas(...) remains supported.
  • The "<profile>#<share>.<schema>.<table>" format remains supported.
  • New examples should demonstrate the object-based API.
  • Existing syntax should remain documented as a compatibility path.

Implementation notes

The implementation should make pandas one adapter over shared reader logic rather than the sole primary surface.

In particular:

  • SharingClient.table("share.schema.table") should return a first-class table handle.
  • table.snapshot(...) should configure snapshot reads.
  • to_arrow(), to_record_batches(), and to_record_batch_reader() should share a common Arrow read path.
  • table.changes(...) should mirror table.snapshot(...) and expose the same materializers.
  • Legacy CDF behavior should be preserved: only use delta format when explicitly requested.
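The layering above can be sketched in pure Python, with the REST and Arrow plumbing stubbed out as plain lists. Only the class and method names (`Table`, `snapshot`, `to_arrow`, `to_record_batches`) follow the proposal; everything else is illustrative.

```python
# Minimal sketch of the proposed layering: one shared read path, with the
# to_*() methods as materializers over it. Lists stand in for RecordBatches.
class Snapshot:
    def __init__(self, table, limit=None):
        self._table = table
        self._limit = limit

    def _read_batches(self):
        # The single shared read path all materializers build on.
        rows = self._table._rows
        if self._limit is not None:
            rows = rows[: self._limit]
        # Yield fixed-size "batches" standing in for Arrow RecordBatches.
        for i in range(0, len(rows), 2):
            yield rows[i : i + 2]

    def to_record_batches(self):
        return list(self._read_batches())

    def to_arrow(self):
        # Eager materialization is just concatenation of the lazy path.
        return [row for batch in self._read_batches() for row in batch]


class Table:
    def __init__(self, name, rows):
        self.name = name
        self._rows = rows

    def snapshot(self, limit=None):
        return Snapshot(self, limit=limit)


t = Table("share.schema.table", rows=[1, 2, 3, 4, 5])
print(t.snapshot(limit=3).to_arrow())    # → [1, 2, 3]
print(t.snapshot().to_record_batches())  # → [[1, 2], [3, 4], [5]]
```

Keeping query configuration on `snapshot(...)` and making every `to_*()` a thin materializer over one read path is what lets pandas, Arrow tables, and batch readers stay consistent with each other.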

Docs and examples

If we proceed, the PR should include:

  • Python README updates for the new table-handle, snapshot, Arrow, and CDF APIs.
  • Example updates showing snapshot-oriented pandas and Arrow syntax.
  • A new Arrow quickstart that demonstrates to_arrow, to_record_batches, to_record_batch_reader, and DuckDB integration.
  • Any extra example dependency requirements, such as duckdb, documented explicitly.

Validation

The PR should include:

  • Unit tests for Arrow table reads.
  • Unit tests for lazy RecordBatch and RecordBatchReader reads.
  • A regression test asserting the legacy load_as_pandas(...) result matches the new table-handle snapshot(...).to_pandas(...) result for the same table.
  • CDF tests covering table.changes(...).to_pandas(), to_arrow(), to_record_batches(), and to_record_batch_reader().

Open questions

Is client.table("share.schema.table") the right naming, and is table.changes(...) the right extension point for object-based CDF?
