Add SQL-to-Kotlin DataFrame transition guide for backend developers #1377

Open
wants to merge 1 commit into
base: master
1 change: 1 addition & 0 deletions docs/StardustDocs/d.tree
@@ -12,6 +12,7 @@
<toc-element topic="Kotlin-DataFrame-Features-in-Kotlin-Notebook.md">
<toc-element topic="Trobleshooting.md"/>
</toc-element>
<toc-element topic="Guide-for-backend-SQL-developers.md"/>
</toc-element>

<toc-element topic="Setup.md" accepts-web-file-names="gettingstarted">
225 changes: 225 additions & 0 deletions docs/StardustDocs/topics/guides/Guide-for-backend-SQL-developers.md
@@ -0,0 +1,225 @@
# Kotlin DataFrame for SQL & Backend Developers

<web-summary>
Quickly transition from SQL to Kotlin DataFrame: load your datasets, perform essential transformations, and visualize your results — directly within a Kotlin Notebook.
</web-summary>

<card-summary>
Switching from SQL? Kotlin DataFrame makes it easy to load, process, analyze, and visualize your data — fully interactive and type-safe!
</card-summary>

<link-summary>
Explore Kotlin DataFrame as a SQL or ORM user: read your data, transform columns, group or join tables, and build insightful visualizations with Kotlin Notebook.
</link-summary>

This guide helps Kotlin backend developers with SQL experience quickly adapt to **Kotlin DataFrame**, mapping familiar SQL and ORM operations to DataFrame concepts.
Collaborator

what does ORM mean? Object Relational...? I'm not familiar with the concept, maybe you could expand the abbreviation the first time? :)


We recommend starting with [**Kotlin Notebook**](SetupKotlinNotebook.md) — an IDE-integrated tool similar to Jupyter Notebook.
Collaborator

Are you sure? I suppose "Backend Developers" would rather use this in a Gradle project, not in KTNB. I mean, they can combine approaches (try doing what they need first in the notebook and then using this code in the project), but it seems we should also include information about setting it up in Gradle projects.

Collaborator

I agree, it's good to recommend notebooks, but we should also provide a Gradle-only option


It lets you explore data interactively, render DataFrames, create plots, and use all your IDE features within the JVM ecosystem.
Collaborator

*dataframes


If you plan to work on a Gradle project without a notebook, we recommend installing the library together with our [**experimental Kotlin compiler plugin**](Compiler-Plugin.md) (available since version 2.2.*).
This plugin generates type-safe schemas at compile time, tracking schema changes throughout your data pipeline.

<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.guides.QuickStartGuide-->

## Quick Setup
Collaborator

Add a tab here about setting up in Gradle, see comment above.


To start working with Kotlin DataFrame in a Kotlin Notebook, run the cell with the next code:
Collaborator

*a cell with the following code


```kotlin
%useLatestDescriptors
%use dataframe
```

This will load all necessary DataFrame dependencies (of the latest stable version) and all imports, as well as DataFrame
rendering. Learn more [here](SetupKotlinNotebook.md#integrate-kotlin-dataframe).

---

## 1. What is a DataFrame?
Collaborator

Please follow here (and everywhere in the guide) our spelling conventions;
https://kotlin.github.io/dataframe/spellingconventions.html

`DataFrame` as an object/type/class should be in backticks.
"dataframe" as a concept of tabular data (both abstract or concrete) should be written in lowercase.


If you’re used to SQL, a **DataFrame** is conceptually like a **table**:
Collaborator

same here


- **Rows**: ordered records of data
- **Columns**: named, typed fields
- **Schema**: a mapping of column names to types

Kotlin DataFrame also supports [**hierarchical, JSON-like data**](hierarchical.md) — columns can contain *nested DataFrames* or *column groups*, allowing you to represent and transform tree-like structures without flattening.
Collaborator

*nested dataframes

Collaborator

you can also link to DataColumn.md#framecolumn etc.


Unlike a relational DB table:

- A DataFrame **lives in memory** — there’s no storage engine or transaction log
Collaborator

*dataframe, or *`DataFrame` if you're talking about the instance of the class

- It’s **immutable** — each operation produces a *new* DataFrame
- There is **no concept of foreign keys or relations** between DataFrames
- It can be created from *any* [source](Data-Sources.md): [CSV](CSV-TSV.md), [JSON](JSON.md), [SQL tables](SQL.md), [Apache Arrow](ApacheArrow.md), in-memory objects
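
As a small illustration of the immutability point above, the following sketch builds a tiny in-memory dataframe and adds a column to it. The sample data and the `profit` column are invented for the example, and it uses String-based column access so it does not depend on generated extension properties. It is meant to be run in a notebook cell or script with the library on the classpath.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data, not taken from any real table
val sales = dataFrameOf("region", "revenue", "cost")(
    "EU", 120, 80,
    "US", 200, 150,
)

// add() returns a new dataframe; `sales` itself is left untouched
val withProfit = sales.add("profit") { "revenue"<Int>() - "cost"<Int>() }

println(sales.columnNames())      // the original three columns
println(withProfit.columnNames()) // the original columns plus "profit"
```

Because every operation returns a new value, intermediate results can be kept in separate `val`s and reused without defensive copies.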

---

## 2. Reading Data From SQL

Kotlin DataFrame integrates with JDBC, so you can bring SQL data into memory for analysis.

| Approach | Example |
Collaborator

If you hit ctrl+alt+L, the file will be reformatted, including this table :)

|------------------------------------|---------|
| **From a table** | `val df = DataFrame.readSqlTable(dbConfig, "customers")` |
| **From a SQL query** | `val df = DataFrame.readSqlQuery(dbConfig, "SELECT * FROM orders")` |
| **From a JDBC Connection** | `val df = connection.readDataFrame("SELECT * FROM orders")` |
| **From a ResultSet (extension)** | `val df = resultSet.readDataFrame(connection)` |

Collaborator

I'd use Korro here. That way you automatically check the examples against any API changes in the future.

```kotlin
import java.sql.DriverManager
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.io.*

val dbConfig = DbConnectionConfig(
    url = "jdbc:postgresql://localhost:5432/mydb",
    user = "postgres",
    password = "secret"
)

// Table
val customers = DataFrame.readSqlTable(dbConfig, "customers")

// Query
val salesByRegion = DataFrame.readSqlQuery(dbConfig, """
    SELECT region, SUM(amount) AS total
    FROM sales
    GROUP BY region
""")

// From a JDBC connection (opened here with plain JDBC for completeness)
val connection = DriverManager.getConnection(
    "jdbc:postgresql://localhost:5432/mydb", "postgres", "secret"
)
val orders = connection.readDataFrame("SELECT * FROM orders")

// From a ResultSet
val rs = connection.createStatement().executeQuery("SELECT * FROM orders")
val ordersFromResultSet = rs.readDataFrame(connection)
```

Collaborator

I'd add a visual example here - original DB schema image and Output dataframe.

More information could be found [here](readSqlDatabases.md).
Collaborator

*can be found


## 3. Why It’s Not an ORM

Frameworks like **Hibernate** or **Exposed**:
Collaborator

link to hibernate and exposed for people unfamiliar with these :)

- Map DB tables to Kotlin objects (entities)
- Track object changes and sync them back to the database
- Focus on **persistence** and **transactions**

Kotlin DataFrame:
- Has no persistence layer
- Doesn’t try to map rows to mutable entities
- Focuses on **in-memory analytics**, **transformations**, and **type-safe pipelines**
- The **main idea** is that the schema *changes together with your transformations* — and the [**Compiler Plugin**](Compiler-Plugin.md) updates the type-safe API automatically under the hood.
- You don’t have to manually define or recreate schemas every time — the plugin infers them dynamically from data or transformations.
Collaborator

*the data

- In ORMs, the mapping layer is **frozen** — schema changes require manual model edits and migrations.

Think of Kotlin DataFrame as a **data analysis/ETL tool**, not an ORM.
Collaborator

ETL?

Collaborator

I'm sorry for my server-side ignorance XD


---

## 4. Key Differences from SQL & ORMs

| Feature / Concept | SQL Databases (PostgreSQL, MySQL…) | ORM (Hibernate, Exposed…) | Kotlin DataFrame |
|------------------------------------|-------------------------------------|---------------------------|------------------|
| **Storage** | Persistent | Persistent | In-memory only |
| **Schema definition** | `CREATE TABLE` DDL | Defined in entity classes | Derived from data or transformations |
Collaborator

Kotlin DataFrame: -or defined manually as @DataSchema data classes or -interfaces. If that's too long, you could also say "or defined manually" and link to the correct page :)

| **Schema change** | `ALTER TABLE` | Manual migration of entity classes | Automatic via transformations + Compiler Plugin |
| **Relations** | Foreign keys | Mapped via annotations | Not applicable |
| **Transactions** | Yes | Yes | Not applicable |
| **Indexes** | Yes | Yes (via DB) | Not applicable |
Collaborator

This is a different type of index than "accessing a row based on index"?

| **Data manipulation** | SQL DML (`INSERT`, `UPDATE`) | CRUD mapped to DB | Transformations only (immutable) |
| **Joins** | `JOIN` keyword | Eager/lazy loading | `.join()` / `.leftJoin()` DSL |
Collaborator

link to the join page :), similar for the ones below

| **Grouping & aggregation** | `GROUP BY` | DB query with groupBy | `.groupBy().aggregate()` |
| **Filtering** | `WHERE` | Criteria API / query DSL | `.filter { ... }` |
| **Permissions** | `GRANT` / `REVOKE` | DB-level permissions | Not applicable |
| **Execution** | On DB engine | On DB engine | In JVM process |

---

## 5. SQL → Kotlin DataFrame Cheatsheet

### DDL Analogues
Collaborator

link to the dataframe concepts in this table :)


| SQL DDL Command / Example | Kotlin DataFrame Equivalent |
|---------------------------|-----------------------------|
| **Create table:**<br>`CREATE TABLE person (name text, age int);` | `@DataSchema`<br>`interface Person {`<br>` val name: String`<br>` val age: Int`<br>`}` |
Collaborator

code samples have weird formatting (no indentation). Can we put a full codeblock in a table? alternatively add non-breaking spaces in front of the vals :)

Collaborator

link to data schema page

| **Add column:**<br>`ALTER TABLE sales ADD COLUMN profit numeric GENERATED ALWAYS AS (revenue - cost) STORED;` | `.add("profit") { revenue - cost }` |
| **Rename column:**<br>`ALTER TABLE sales RENAME COLUMN old_name TO new_name;` | `.rename { old_name }.into("new_name")` |
| **Drop column:**<br>`ALTER TABLE sales DROP COLUMN old_col;` | `.remove { old_col }` |
| **Modify column type:**<br>`ALTER TABLE sales ALTER COLUMN amount TYPE numeric;` | `.convert { amount }.to<Double>()` |
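
The column operations in the table above can be tried on an in-memory dataframe without a database. This is a sketch only: the sample data is invented, the column names mirror the SQL examples in the table, and String-based column access is used so that no generated extension properties are required.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data whose column names mirror the SQL examples
val sales = dataFrameOf("old_name", "old_col", "revenue", "cost", "amount")(
    "a", 1, 120, 80, 10,
    "b", 2, 200, 150, 20,
)

val reshaped = sales
    .add("profit") { "revenue"<Int>() - "cost"<Int>() } // ADD COLUMN
    .rename("old_name").into("new_name")                // RENAME COLUMN
    .remove("old_col")                                  // DROP COLUMN
    .convert("amount").to<Double>()                     // ALTER COLUMN ... TYPE

println(reshaped.columnNames())
```

Each step returns a new dataframe, so the chain reads like a sequence of `ALTER TABLE` statements while `sales` stays unchanged.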

### DDL Analogues (TODO: decide to remove first DDL section or this)
Collaborator

link to the dataframe concepts in this table :)


| SQL DDL Command | Kotlin DataFrame Equivalent |
|--------------------------------|------------------------------------------------------------------|
| `CREATE TABLE` | Define `@DataSchema` interface or class <br>`@DataSchema`<br>`interface Person {`<br>` val name: String`<br>` val age: Int`<br>`}` |
| `ALTER TABLE ADD COLUMN` | `.add("newCol") { ... }` |
| `ALTER TABLE DROP COLUMN` | `.remove("colName")` |
| `ALTER TABLE RENAME COLUMN` | `.rename { oldName }.into("newName")` |
| `ALTER TABLE MODIFY COLUMN` | `.convert { colName }.to<NewType>()` |

Copilot AI Aug 13, 2025

This TODO comment indicates incomplete documentation structure. Either merge the duplicate DDL sections or remove one of them to avoid confusion.


---

### DML Analogues

| SQL DML Command / Example | Kotlin DataFrame Equivalent |
|----------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------|
| `SELECT col1, col2` | `df.select { col1 and col2 }` |
| `WHERE amount > 100` | `df.filter { amount > 100 }` |
| `ORDER BY amount DESC` | `df.sortByDesc { amount }` |
| `GROUP BY region` | `df.groupBy { region }` |
| `SUM(amount)` | `.aggregate { sum(amount) }` |
Collaborator

Use sum { amount }, i think sum(amount) is deprecated

Collaborator

another reason to use korro as much as possible :D

(though, I don't think korro can help inside tables like here, can it?)

| `JOIN` | `.join(otherDf) { id match right.id }` |
| `LIMIT 5` | `.take(5)` |
| **Pivot:** <br>`SELECT * FROM crosstab('SELECT region, year, SUM(amount) FROM sales GROUP BY region, year') AS ct(region text, y2023 int, y2024 int);` | `.groupBy { region }.pivot { year }.sum { amount }` |
| **Explode array column:** <br>`SELECT id, unnest(tags) AS tag FROM products;` | `.explode { tags }` |
| **Update column:** <br>`UPDATE sales SET amount = amount * 1.2;` | `.update { amount }.with { it * 1.2 }` |
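
To make the `JOIN` row above concrete, here is a minimal sketch that joins two small in-memory dataframes on a shared key column. The `customers` and `orders` data are hypothetical, and the join columns are passed by name rather than through the typed `match` DSL shown in the table.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data
val customers = dataFrameOf("customerId", "name")(
    1, "Alice",
    2, "Bob",
)
val orders = dataFrameOf("orderId", "customerId", "amount")(
    10, 1, 50,
    11, 1, 70,
    12, 3, 20, // no matching customer
)

// Inner join on the shared column (SQL: JOIN ... USING (customerId))
val joined = orders.join(customers, "customerId")

// Left join keeps the unmatched order and leaves "name" empty for it
val leftJoined = orders.leftJoin(customers, "customerId")

println(joined.rowsCount())
println(leftJoined.rowsCount())
```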


## 6. Example: SQL vs DataFrame Side-by-Side
Collaborator

*vs. or *V/S


**SQL (PostgreSQL):**
Collaborator

I wonder if WriterSide has a side-by-side-like view. like here https://www.jetbrains.com/exposed/, if not, it looks good like this too :)

```sql
SELECT region, SUM(amount) AS total
FROM sales
WHERE amount > 0
GROUP BY region
ORDER BY total DESC
LIMIT 5;
```

**Kotlin DataFrame:**
```kotlin
sales.filter { amount > 0 }
    .groupBy { region }
    .aggregate { sum { amount } into "total" }
    .sortByDesc { total }
    .take(5)
```
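
The Kotlin snippet above relies on generated column properties (`amount`, `region`, `total`). For readers who want to run the same pipeline without the compiler plugin or a database, the sketch below uses an in-memory dataframe and String-based column access. The data is made up for illustration.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data standing in for the `sales` table
val sales = dataFrameOf("region", "amount")(
    "EU", 100,
    "US", 250,
    "EU", 50,
    "US", -30, // dropped by the amount > 0 filter
    "APAC", 80,
)

val top = sales
    .filter { "amount"<Int>() > 0 }           // WHERE amount > 0
    .groupBy("region")                        // GROUP BY region
    .aggregate { sum("amount") into "total" } // SUM(amount) AS total
    .sortByDesc("total")                      // ORDER BY total DESC
    .take(5)                                  // LIMIT 5

println(top)
```

With this data the regions come out ordered as US, EU, APAC, since their totals are 250, 150, and 80.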

## In conclusion
Collaborator

*Conclusion


- Kotlin DataFrame keeps the familiar SQL-style workflow (select → filter → group → aggregate) but makes it **type-safe** and fully integrated into Kotlin.
- The main focus is **readability**, schema change safety, and evolving API support via the [Compiler Plugin](Compiler-Plugin.md).
Collaborator

what do you mean with "evolving API support"?

- It is neither a database nor an ORM — a DataFrame does not store data or manage transactions but works as an in-memory layer for analytics and transformations.
Collaborator

*dataframe

- It does not provide some SQL features (permissions, transactions, indexes) — but offers convenient tools for working with JSON-like structures and combining multiple data sources.
- Use Kotlin DataFrame as a **type-safe DSL** for post-processing, merging data sources, and analytics directly on the JVM, while keeping your code easily refactorable and IDE-assisted.

Collaborator

I miss something here about the disadvantages of DataFrame. We need to be honest too, if you have a large database with millions of rows, doing analysis with DF is likely not a good idea

## What's Next?
If you're ready to go through a complete example, we recommend our [Quickstart Guide](quickstart.md):
you'll learn the basics of reading data, transforming it, and creating visualizations step by step.

Ready to go deeper? Check out what’s next:

- 📘 **[Explore in-depth guides and various examples](Guides-And-Examples.md)** with different datasets,
Collaborator

@AndreiKingsley Aug 13, 2025

Instead of the first two points, it is better to refer to the quickstart guide, which will show the user the basics of working with DataFrame.
(simple logic: User just completed reading a DF from file -> "Ok, what should I do next?" -> The QS guide provides answers to this question! )

API usage examples, and practical scenarios that help you understand the main features of Kotlin DataFrame.

- 🛠️ **[Browse the operations overview](operations.md)** to learn what Kotlin DataFrame can do.

- 🧠 **Understand the design** and core concepts in the [library overview](concepts.md).

- 🔤 **[Learn more about Extension Properties](extensionPropertiesApi.md)**
and make working with your data both convenient and type-safe.

- 💡 **[Use Kotlin DataFrame Compiler Plugin](Compiler-Plugin.md)**
for auto-generated column access in your IntelliJ IDEA projects.

- 📊 **Master Kandy** for stunning and expressive DataFrame visualizations learning
Collaborator

learning

[Kandy Documentation](https://kotlin.github.io/kandy).