| 1 | +# Generating SQL |
| 2 | + |
| 3 | +Suppose you want to write Polars syntax and translate it to SQL. |
| 4 | +For example, what's the SQL equivalent to: |
| 5 | + |
| 6 | +```python exec="1" source="above" session="generating-sql" |
| 7 | +import narwhals as nw |
| 8 | +from narwhals.typing import IntoFrameT |
| 9 | + |
| 10 | + |
| 11 | +def avg_monthly_price(df_native: IntoFrameT) -> IntoFrameT: |
| 12 | +    return ( |
| 13 | +        nw.from_native(df_native) |
| 14 | +        .group_by(nw.col("date").dt.truncate("1mo")) |
| 15 | +        .agg(nw.col("price").mean()) |
| 16 | +        .sort("date") |
| 17 | +        .to_native() |
| 18 | +    ) |
| 19 | +``` |
| 20 | + |
| 21 | +? |
| 22 | + |
| 23 | +There are several ways to find out. |
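Whichever route is used, the generated SQL should group rows by calendar month, average the price within each month, and order the result by month. As a point of comparison, here is a minimal plain-Python sketch of that computation. The helper name `avg_monthly_price_rows` and the sample rows are hypothetical and only serve as illustration; they are not part of Narwhals.

```python
from collections import defaultdict
from datetime import date
from statistics import fmean


def avg_monthly_price_rows(rows):
    """Average price per calendar month, sorted by month (illustrative helper)."""
    by_month = defaultdict(list)
    for day, price in rows:
        # Rough equivalent of truncating to "1mo": snap each date to the
        # first day of its month.
        by_month[day.replace(day=1)].append(price)
    # Sorting the (month, prices) pairs orders the output by month.
    return [(month, fmean(prices)) for month, prices in sorted(by_month.items())]


# Hypothetical sample data: two January rows and one February row.
rows = [
    (date(2024, 1, 5), 10.0),
    (date(2024, 1, 20), 20.0),
    (date(2024, 2, 3), 30.0),
]
result = avg_monthly_price_rows(rows)
print(result)
```

The SQL produced by each approach below should express the same three steps (truncate, aggregate, sort), even though the exact function names differ between dialects.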
| 24 | + |
| 25 | +## Via SQLFrame (most lightweight solution) |
| 26 | + |
| 27 | +The most lightweight solution, which requires neither heavy dependencies nor |
| 28 | +an actual table or dataframe, is SQLFrame. |
| 29 | + |
| 30 | +```python exec="1" source="above" session="generating-sql" result="sql" |
| 31 | +from sqlframe.standalone import StandaloneSession |
| 32 | + |
| 33 | +session = StandaloneSession.builder.getOrCreate() |
| 34 | +session.catalog.add_table("prices", column_mapping={"date": "date", "price": "float"}) |
| 35 | +df = nw.from_native(session.read.table("prices")) |
| 36 | + |
| 37 | +print(avg_monthly_price(df).sql(dialect="duckdb")) |
| 38 | +``` |
| 39 | + |
| 40 | +To print the SQL in a different dialect (say, Databricks), change the `dialect` argument: |
| 41 | + |
| 42 | +```python exec="1" source="above" session="generating-sql" result="sql" |
| 43 | +print(avg_monthly_price(df).sql(dialect="databricks")) |
| 44 | +``` |
| 45 | + |
| 46 | +## Via DuckDB |
| 47 | + |
| 48 | +You can also generate SQL directly from DuckDB. |
| 49 | + |
| 50 | +```python exec="1" source="above" session="generating-sql" result="sql" |
| 51 | +import duckdb |
| 52 | + |
| 53 | +conn = duckdb.connect() |
| 54 | +conn.sql("""CREATE TABLE prices (date DATE, price DOUBLE);""") |
| 55 | + |
| 56 | +df = nw.from_native(conn.table("prices")) |
| 57 | +print(avg_monthly_price(df).sql_query()) |
| 58 | +``` |
| 59 | + |
| 60 | +To make it look a bit prettier, we can pass it to [SQLGlot](https://github.com/tobymao/sqlglot): |
| 61 | + |
| 62 | +```python exec="1" source="above" session="generating-sql" result="sql" |
| 63 | +import sqlglot |
| 64 | + |
| 65 | +print(sqlglot.transpile(avg_monthly_price(df).sql_query(), pretty=True)[0]) |
| 66 | +``` |
| 67 | + |
| 68 | +## Via Ibis |
| 69 | + |
| 70 | +We can also use Ibis to generate SQL: |
| 71 | + |
| 72 | +```python exec="1" source="above" session="generating-sql" result="sql" |
| 73 | +import ibis |
| 74 | + |
| 75 | +t = ibis.table({"date": "date", "price": "double"}, name="prices") |
| 76 | +print(ibis.to_sql(avg_monthly_price(t))) |
| 77 | +``` |