diff --git a/docs/StardustDocs/d.tree b/docs/StardustDocs/d.tree index bfe90160fd..e9ebd5d9f9 100644 --- a/docs/StardustDocs/d.tree +++ b/docs/StardustDocs/d.tree @@ -12,6 +12,7 @@ + diff --git a/docs/StardustDocs/topics/guides/Guide-for-backend-SQL-developers.md b/docs/StardustDocs/topics/guides/Guide-for-backend-SQL-developers.md new file mode 100644 index 0000000000..0a2dff8872 --- /dev/null +++ b/docs/StardustDocs/topics/guides/Guide-for-backend-SQL-developers.md @@ -0,0 +1,225 @@ +# Kotlin DataFrame for SQL & Backend Developers + + +Quickly transition from SQL to Kotlin DataFrame: load your datasets, perform essential transformations, and visualize your results — directly within a Kotlin Notebook. + + + +Switching from SQL? Kotlin DataFrame makes it easy to load, process, analyze, and visualize your data — fully interactive and type-safe! + + + +Explore Kotlin DataFrame as a SQL or ORM user: read your data, transform columns, group or join tables, and build insightful visualizations with Kotlin Notebook. + + +This guide helps Kotlin backend developers with SQL experience quickly adapt to **Kotlin DataFrame**, mapping familiar SQL and ORM operations to DataFrame concepts. + +We recommend starting with [**Kotlin Notebook**](SetupKotlinNotebook.md) — an IDE-integrated tool similar to Jupyter Notebook. + +It lets you explore data interactively, render DataFrames, create plots, and use all your IDE features within the JVM ecosystem. + +If you plan to work on a Gradle project without a notebook, we recommend installing the library together with our [**experimental Kotlin compiler plugin**](Compiler-Plugin.md) (available since version 2.2.*). +This plugin generates type-safe schemas at compile time, tracking schema changes throughout your data pipeline. + + + +## Quick Setup + +To start working with Kotlin DataFrame in a Kotlin Notebook, run the cell with the next code: + +```kotlin +%useLatestDescriptors +%use dataframe +``` + +This will load all necessary DataFrame dependencies (of the latest stable version) and all imports, as well as DataFrame +rendering. Learn more [here](SetupKotlinNotebook.md#integrate-kotlin-dataframe). + +--- + +## 1. What is a DataFrame? + +If you’re used to SQL, a **DataFrame** is conceptually like a **table**: + +- **Rows**: ordered records of data +- **Columns**: named, typed fields +- **Schema**: a mapping of column names to types + +Kotlin DataFrame also supports [**hierarchical, JSON-like data**](hierarchical.md) — columns can contain *nested DataFrames* or *column groups*, allowing you to represent and transform tree-like structures without flattening. + +Unlike a relational DB table: + +- A DataFrame **lives in memory** — there’s no storage engine or transaction log +- It’s **immutable** — each operation produces a *new* DataFrame +- There is **no concept of foreign keys or relations** between DataFrames +- It can be created from *any* [source](Data-Sources.md): [CSV](CSV-TSV.md), [JSON](JSON.md), [SQL tables](SQL.md), [Apache Arrow](ApacheArrow.md), in-memory objects + +--- + +## 2. Reading Data From SQL + +Kotlin DataFrame integrates with JDBC, so you can bring SQL data into memory for analysis. + +| Approach | Example | +|------------------------------------|---------| +| **From a table** | `val df = DataFrame.readSqlTable(dbConfig, "customers")` | +| **From a SQL query** | `val df = DataFrame.readSqlQuery(dbConfig, "SELECT * FROM orders")` | +| **From a JDBC Connection** | `val df = connection.readDataFrame("SELECT * FROM orders")` | +| **From a ResultSet (extension)** | `val df = resultSet.readDataFrame(connection)` | + +```kotlin +import org.jetbrains.kotlinx.dataframe.io.DbConnectionConfig + +val dbConfig = DbConnectionConfig( + url = "jdbc:postgresql://localhost:5432/mydb", + user = "postgres", + password = "secret" +) + +// Table +val customers = DataFrame.readSqlTable(dbConfig, "customers") + +// Query +val salesByRegion = DataFrame.readSqlQuery(dbConfig, """ + SELECT region, SUM(amount) AS total + FROM sales + GROUP BY region +""") + +// From JDBC connection +connection.readDataFrame("SELECT * FROM orders") + +// From ResultSet +val rs = connection.createStatement().executeQuery("SELECT * FROM orders") +rs.readDataFrame(connection) +``` + +More information could be found [here](readSqlDatabases.md). + +## 3. Why It’s Not an ORM + +Frameworks like **Hibernate** or **Exposed**: +- Map DB tables to Kotlin objects (entities) +- Track object changes and sync them back to the database +- Focus on **persistence** and **transactions** + +Kotlin DataFrame: +- Has no persistence layer +- Doesn’t try to map rows to mutable entities +- Focuses on **in-memory analytics**, **transformations**, and **type-safe pipelines** +- The **main idea** is that the schema *changes together with your transformations* — and the [**Compiler Plugin**](Compiler-Plugin.md) updates the type-safe API automatically under the hood. + - You don’t have to manually define or recreate schemas every time — the plugin infers them dynamically from data or transformations. +- In ORMs, the mapping layer is **frozen** — schema changes require manual model edits and migrations. + +Think of Kotlin DataFrame as a **data analysis/ETL tool**, not an ORM. + +--- + +## 4. Key Differences from SQL & ORMs + +| Feature / Concept | SQL Databases (PostgreSQL, MySQL…) | ORM (Hibernate, Exposed…) | Kotlin DataFrame | +|------------------------------------|-------------------------------------|---------------------------|------------------| +| **Storage** | Persistent | Persistent | In-memory only | +| **Schema definition** | `CREATE TABLE` DDL | Defined in entity classes | Derived from data or transformations | +| **Schema change** | `ALTER TABLE` | Manual migration of entity classes | Automatic via transformations + Compiler Plugin | +| **Relations** | Foreign keys | Mapped via annotations | Not applicable | +| **Transactions** | Yes | Yes | Not applicable | +| **Indexes** | Yes | Yes (via DB) | Not applicable | +| **Data manipulation** | SQL DML (`INSERT`, `UPDATE`) | CRUD mapped to DB | Transformations only (immutable) | +| **Joins** | `JOIN` keyword | Eager/lazy loading | `.join()` / `.leftJoin()` DSL | +| **Grouping & aggregation** | `GROUP BY` | DB query with groupBy | `.groupBy().aggregate()` | +| **Filtering** | `WHERE` | Criteria API / query DSL | `.filter { ... }` | +| **Permissions** | `GRANT` / `REVOKE` | DB-level permissions | Not applicable | +| **Execution** | On DB engine | On DB engine | In JVM process | + +--- + +## 5. SQL → Kotlin DataFrame Cheatsheet + +### DDL Analogues + +| SQL DDL Command / Example | Kotlin DataFrame Equivalent | +|---------------------------|-----------------------------| +| **Create table:**
`CREATE TABLE person (name text, age int);` | `@DataSchema`
`interface Person {`
` val name: String`
` val age: Int`
`}` | +| **Add column:**
`ALTER TABLE sales ADD COLUMN profit numeric GENERATED ALWAYS AS (revenue - cost) STORED;` | `.add("profit") { revenue - cost }` | +| **Rename column:**
`ALTER TABLE sales RENAME COLUMN old_name TO new_name;` | `.rename { old_name }.into("new_name")` | +| **Drop column:**
`ALTER TABLE sales DROP COLUMN old_col;` | `.remove { old_col }` | +| **Modify column type:**
`ALTER TABLE sales ALTER COLUMN amount TYPE numeric;` | `.convert { amount }.to()` | + +### DDL Analogues (TODO: decide to remove first DDL section or this) + +| SQL DDL Command | Kotlin DataFrame Equivalent | +|--------------------------------|------------------------------------------------------------------| +| `CREATE TABLE` | Define `@DataSchema` interface or class
`@DataSchema`
`interface Person {`
` val name: String`
` val age: Int`
`}` | +| `ALTER TABLE ADD COLUMN` | `.add("newCol") { ... }` | +| `ALTER TABLE DROP COLUMN` | `.remove("colName")` | +| `ALTER TABLE RENAME COLUMN` | `.rename { oldName }.into("newName")` | +| `ALTER TABLE MODIFY COLUMN` | `.convert { colName }.to()` | + +--- + +### DML Analogues + +| SQL DML Command / Example | Kotlin DataFrame Equivalent | +|----------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------| +| `SELECT col1, col2` | `df.select { col1 and col2 }` | +| `WHERE amount > 100` | `df.filter { amount > 100 }` | +| `ORDER BY amount DESC` | `df.sortByDesc { amount }` | +| `GROUP BY region` | `df.groupBy { region }` | +| `SUM(amount)` | `.aggregate { sum(amount) }` | +| `JOIN` | `.join(otherDf) { id match right.id }` | +| `LIMIT 5` | `.take(5)` | +| **Pivot:**
`SELECT * FROM crosstab('SELECT region, year, SUM(amount) FROM sales GROUP BY region, year') AS ct(region text, y2023 int, y2024 int);` | `.pivot(region, year) { sum(amount) }` | +| **Explode array column:**
`SELECT id, unnest(tags) AS tag FROM products;` | `.explode { tags }` | +| **Update column:**
`UPDATE sales SET amount = amount * 1.2;` | `.update { amount }.with { it * 1.2 }` | + + +## 6. Example: SQL vs DataFrame Side-by-Side + +**SQL (PostgreSQL):** +```sql +SELECT region, SUM(amount) AS total +FROM sales +WHERE amount > 0 +GROUP BY region +ORDER BY total DESC +LIMIT 5; +``` + +```kotlin +sales.filter { amount > 0 } + .groupBy { region } + .aggregate { sum(amount).into("total") } + .sortByDesc { total } + .take(5) +``` + +## In conclusion + +- Kotlin DataFrame keeps the familiar SQL-style workflow (select → filter → group → aggregate) but makes it **type-safe** and fully integrated into Kotlin. +- The main focus is **readability**, schema change safety, and evolving API support via the [Compiler Plugin](Compiler-Plugin.md). +- It is neither a database nor an ORM — a DataFrame does not store data or manage transactions but works as an in-memory layer for analytics and transformations. +- It does not provide some SQL features (permissions, transactions, indexes) — but offers convenient tools for working with JSON-like structures and combining multiple data sources. +- Use Kotlin DataFrame as a **type-safe DSL** for post-processing, merging data sources, and analytics directly on the JVM, while keeping your code easily refactorable and IDE-assisted. + +## What's Next? +If you're ready to go through a complete example, we recommend our [Quickstart Guide](quickstart.md) +— you'll learn the basics of reading data, transforming it, and creating visualization step-by-step. + +Ready to go deeper? Check out what’s next: + +- 📘 **[Explore in-depth guides and various examples](Guides-And-Examples.md)** with different datasets, + API usage examples, and practical scenarios that help you understand the main features of Kotlin DataFrame. + +- 🛠️ **[Browse the operations overview](operations.md)** to learn what Kotlin DataFrame can do. + +- 🧠 **Understand the design** and core concepts in the [library overview](concepts.md). + +- 🔤 **[Learn more about Extension Properties](extensionPropertiesApi.md)** + and make working with your data both convenient and type-safe. + +- 💡 **[Use Kotlin DataFrame Compiler Plugin](Compiler-Plugin.md)** + for auto-generated column access in your IntelliJ IDEA projects. + +- 📊 **Master Kandy** for stunning and expressive DataFrame visualizations learning + [Kandy Documentation](https://kotlin.github.io/kandy).