
Data source Ingress (for Cosmograph.get_buffered_arrow_table or ingress arg of cosmo) #47

@thorwhalen

Description

See also issue: "get_buffered_arrow_table can be slow with big data: Possible improvements"

Extended Data Ingress Support

Beyond caching optimizations, we should implement flexible data adapters (also called data transformers or ingress handlers) that accept multiple input formats and normalize them to the widget's requirements.

Currently, the widget assumes Pandas DataFrames, but users may have data in various formats. Supporting multiple input types reduces boilerplate, improves usability, and enables performance optimizations when users can provide data closer to the target format.

Proposed Input Format Support

The widget should accept the following input types, listed roughly in order of increasing efficiency:

  1. Pandas DataFrame (current support) - Most common, but requires full conversion pipeline
  2. Apache Arrow Table (pa.Table) - Skip from_pandas() conversion, directly serialize to IPC
  3. Arrow RecordBatch - Similar efficiency to pa.Table
  4. Pre-serialized Arrow IPC bytes - Maximum efficiency, zero conversion overhead when data is already in target format
  5. File paths/URLs - Convenient for large datasets: "data.parquet", "s3://bucket/data.arrow"
  6. DuckDB queries - Enable direct graph construction from analytical queries
  7. Polars DataFrame - Native Arrow interop, efficient conversion

Implementation Approach

The data adapter should:

  • Type-dispatch based on input: check type and route to appropriate conversion path
  • Zero-copy where possible: leverage Arrow's zero-copy capabilities between formats
  • Lazy evaluation: for file paths/queries, only load data when needed
  • Format detection: infer format from file extensions or content sniffing

Benefits

Performance: Users working with Arrow-native formats (Parquet, Feather, Arrow IPC) or databases that support Arrow (DuckDB, BigQuery) can bypass expensive conversions entirely.

Ergonomics: Reduces user code from:

# Current - user must handle conversion
import pandas as pd
df = pd.read_parquet("graph_data.parquet")
widget.points = df

To:

# Proposed - direct path specification
widget.points = "graph_data.parquet"  # or pa.Table, or bytes, etc.

Scalability: Pre-serialized bytes input enables streaming architectures where data is prepared/cached separately from visualization, critical for large-scale deployments.


Note from @thorwhalen: I can implement these data ingress transformers to handle the various input formats and routing logic. This is a common pattern that significantly improves library flexibility and user experience.
