@littleKitchen

Summary

This PR implements the improvements outlined in #1398 to enhance the Jupyter notebook experience for Ballista.

Implementation Checklist

All items from the issue have been implemented:

  • Add example .ipynb notebooks to python/examples/

    • getting_started.ipynb - Basic connection and queries
    • dataframe_api.ipynb - DataFrame transformations
    • distributed_queries.ipynb - Multi-stage query examples
  • Document notebook support in Python README

    • Added comprehensive Jupyter section with examples for magic commands, HTML rendering, and plan visualization
  • Create ballista.jupyter module with magic commands

    • Full implementation with BallistaMagics class
    • Graceful fallback when IPython is not available
  • Add %ballista connect/status/tables/schema line magics

    • connect: Connect to Ballista cluster
    • status: Show connection status
    • tables: List registered tables
    • schema: Show table schema
    • disconnect: Disconnect from cluster
    • history: Show query history
  • Add %%sql cell magic

    • Line magic for single-line queries (%sql SELECT ...)
    • Cell magic for multi-line queries (%%sql)
    • Variable assignment support (%%sql my_result)
    • --no-display and --limit N options
  • Add explain_visual() method for query plan rendering

    • Generates DOT/SVG visualization of execution plans
    • Supports Jupyter _repr_html_ for inline display
    • Fallback HTML representation when graphviz is not installed
    • .save() method for exporting to files
  • Add progress indicator support for long-running queries

    • collect_with_progress() method on DataFrame
    • Callback support for custom progress handling
    • Jupyter-aware progress display
  • Consider JupySQL integration

    • Documented as alternative in README for users who prefer the JupySQL ecosystem
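
For reviewers who want a concrete picture of the `%%sql` options and the IPython fallback listed above, here is a minimal sketch. It is not the code in this PR: `parse_sql_magic_line`, `SqlMagicArgs` and `SketchSqlMagics` are hypothetical names, and the magic body only echoes what it parsed instead of querying a cluster. It assumes the header line of the cell magic has the shape `[variable] [--no-display] [--limit N]`; for the `%sql` line magic the line is the query itself, so this parser would not apply there.

```python
import argparse
import shlex
from dataclasses import dataclass
from typing import Optional

try:
    # IPython is optional; magics are only defined when it can be imported.
    from IPython.core.magic import Magics, cell_magic, magics_class
    HAS_IPYTHON = True
except ImportError:
    HAS_IPYTHON = False


@dataclass
class SqlMagicArgs:
    """Parsed header of a `%%sql` cell (hypothetical structure)."""
    variable: Optional[str]
    no_display: bool
    limit: Optional[int]


def parse_sql_magic_line(line: str) -> SqlMagicArgs:
    """Parse e.g. 'my_result --no-display --limit 10'."""
    parser = argparse.ArgumentParser(prog="%%sql", add_help=False)
    parser.add_argument("variable", nargs="?", default=None)
    parser.add_argument("--no-display", action="store_true")
    parser.add_argument("--limit", type=int, default=None)
    ns = parser.parse_args(shlex.split(line))
    return SqlMagicArgs(ns.variable, ns.no_display, ns.limit)


if HAS_IPYTHON:

    @magics_class
    class SketchSqlMagics(Magics):
        """Hypothetical stand-in for the PR's BallistaMagics class."""

        @cell_magic
        def sql(self, line, cell):
            args = parse_sql_magic_line(line)
            # A real implementation would run `cell` on the connected cluster;
            # this sketch returns a placeholder describing the request.
            result = {"query": cell.strip(), "limit": args.limit}
            if args.variable:
                # Store the result under the requested name in the notebook.
                self.shell.user_ns[args.variable] = result
            return None if args.no_display else result

    def load_ipython_extension(ipython):
        """Hook that `%load_ext` calls to register the magics."""
        ipython.register_magics(SketchSqlMagics)
```

Keeping the argument parsing in a plain function means it can be unit-tested without a running IPython kernel, which is one way the module could stay importable when IPython is missing.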

Additional Improvements

  • ExecutionPlanVisualization class for plan rendering with DOT/SVG conversion
  • tables() method on BallistaSessionContext for listing registered tables
  • Optional jupyter dependency group in pyproject.toml
  • Test coverage for the new features, with all 45 tests passing
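
The DOT/SVG rendering with an HTML fallback described for `explain_visual()` could look roughly like the sketch below. This is an illustration under stated assumptions, not the PR's `ExecutionPlanVisualization`: the class name `PlanVisualizationSketch`, its constructor taking a DOT string, and the choice to shell out to the Graphviz `dot` binary are all hypothetical.

```python
import html
import shutil
import subprocess
from pathlib import Path
from typing import Optional


class PlanVisualizationSketch:
    """Hypothetical wrapper around a plan expressed as Graphviz DOT text."""

    def __init__(self, dot_source: str):
        self.dot_source = dot_source

    def to_svg(self) -> Optional[str]:
        """Return SVG markup, or None if `dot` is missing or fails."""
        if shutil.which("dot") is None:
            return None
        try:
            proc = subprocess.run(
                ["dot", "-Tsvg"],
                input=self.dot_source,
                capture_output=True,
                text=True,
                check=True,
            )
        except (subprocess.CalledProcessError, OSError):
            return None
        # Drop the XML prolog so the markup can be embedded in HTML output.
        start = proc.stdout.find("<svg")
        return proc.stdout[start:] if start >= 0 else proc.stdout

    def _repr_html_(self) -> str:
        """Jupyter calls this to display the object inline."""
        svg = self.to_svg()
        if svg is not None:
            return svg
        # Fallback: show the escaped DOT text when Graphviz is unavailable.
        return "<pre>" + html.escape(self.dot_source) + "</pre>"

    def save(self, path: str) -> Path:
        """Write SVG for a .svg path, otherwise the raw DOT source."""
        target = Path(path)
        if target.suffix == ".svg":
            svg = self.to_svg()
            if svg is None:
                raise RuntimeError("Graphviz 'dot' is required to export SVG")
            target.write_text(svg)
        else:
            target.write_text(self.dot_source)
        return target
```

In a notebook, leaving such an object as the last expression of a cell would trigger `_repr_html_`, so the same object works with or without Graphviz installed.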

Usage Examples

# Load the extension
%load_ext ballista.jupyter

# Connect to a Ballista cluster
%ballista connect df://localhost:50050

# Execute SQL queries
%sql SELECT COUNT(*) FROM orders

%%sql my_result
SELECT customer_id, SUM(amount) as total
FROM orders
GROUP BY customer_id
ORDER BY total DESC
LIMIT 10

# Visualize the execution plan of an existing DataFrame `df`
df.explain_visual()

# Track progress while collecting a long-running query
batches = df.collect_with_progress()
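
To illustrate the callback support mentioned for `collect_with_progress()`, here is a minimal sketch of the pattern. The function name `collect_with_progress_sketch`, its parameters, and the `(completed, total)` callback signature are assumptions made for this example and may differ from the method added in this PR.

```python
import sys
from typing import Callable, Iterable, List, Optional, TypeVar

T = TypeVar("T")


def print_progress(completed: int, total: int) -> None:
    """Default handler: overwrite a single status line on stdout."""
    sys.stdout.write(f"\r{completed}/{total} partitions collected")
    sys.stdout.flush()


def collect_with_progress_sketch(
    parts: Iterable[T],
    total: int,
    callback: Optional[Callable[[int, int], None]] = None,
) -> List[T]:
    """Collect results one part at a time, reporting after each part."""
    report = callback if callback is not None else print_progress
    results: List[T] = []
    for completed, part in enumerate(parts, start=1):
        results.append(part)
        report(completed, total)
    return results


# Usage with a custom callback that records each progress event.
events = []
batches = collect_with_progress_sketch(
    ["batch-0", "batch-1", "batch-2"],
    total=3,
    callback=lambda done, total: events.append((done, total)),
)
```

Passing the handler as a plain callable keeps the collection loop independent of the display, so a notebook could supply a widget-based handler while scripts keep the text output.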

Testing

All 45 tests pass:

  • Existing tests: 6 passed
  • New Jupyter module tests: 20 passed
  • New notebook features tests: 19 passed

Closes #1398

…s and examples

This PR implements all items from the checklist in issue apache#1398:

## Implementation Checklist

- [x] Add example .ipynb notebooks to python/examples/
  - getting_started.ipynb - Basic connection and queries
  - dataframe_api.ipynb - DataFrame transformations
  - distributed_queries.ipynb - Multi-stage query examples

- [x] Document notebook support in Python README
  - Added comprehensive Jupyter section with examples

- [x] Create ballista.jupyter module with magic commands
  - Full implementation with BallistaMagics class

- [x] Add %ballista connect/status/tables/schema line magics
  - connect: Connect to Ballista cluster
  - status: Show connection status
  - tables: List registered tables
  - schema: Show table schema
  - disconnect: Disconnect from cluster
  - history: Show query history

- [x] Add %%sql cell magic
  - Line magic for single-line queries
  - Cell magic for multi-line queries
  - Variable assignment support
  - --no-display and --limit options

- [x] Add explain_visual() method for query plan rendering
  - Generates DOT/SVG visualization
  - Supports Jupyter _repr_html_
  - Fallback when graphviz not installed

- [x] Add progress indicator support for long-running queries
  - collect_with_progress() method
  - Callback support for custom progress handling
  - Jupyter-aware display

- [x] Consider JupySQL integration
  - Documented as alternative in README

## Additional Features

- ExecutionPlanVisualization class for plan rendering
- tables() method on BallistaSessionContext
- Optional jupyter dependency in pyproject.toml
- Comprehensive test coverage (45 tests passing)

Closes apache#1398

Successfully merging this pull request may close these issues.

Improve Jupyter notebook support with SQL magic commands and examples