
Commit 8cc297c

Add SQL-to-Kotlin DataFrame transition guide for backend developers (#1377)
* Add SQL-to-Kotlin DataFrame transition guide for backend developers. Includes a comprehensive guide to help SQL and ORM users adapt to Kotlin DataFrame. Covers key concepts, equivalents for SQL/ORM operations, and practical examples. Updated TOC to include the new guide.
* Expand Kotlin DataFrame guide for SQL/ORM developers. Added details on setting up Kotlin DataFrame in Gradle projects, clarified DDL/DML equivalents, and enhanced examples. Updated TOC with a new Hibernate Interop reference and improved formatting consistency across sections.
1 parent fdd08c4 commit 8cc297c

3 files changed (+249 / -0 lines)


docs/StardustDocs/d.tree

Lines changed: 1 addition & 0 deletions
@@ -12,6 +12,7 @@
<toc-element topic="Kotlin-DataFrame-Features-in-Kotlin-Notebook.md">
<toc-element topic="Trobleshooting.md"/>
</toc-element>
+ <toc-element topic="Guide-for-backend-SQL-developers.md"/>
</toc-element>

<toc-element topic="Setup.md" accepts-web-file-names="gettingstarted">
Lines changed: 245 additions & 0 deletions
@@ -0,0 +1,245 @@
# Kotlin DataFrame for SQL & Backend Developers

<web-summary>
Quickly transition from SQL to Kotlin DataFrame: load your datasets, perform essential transformations, and visualize your results directly within a Kotlin Notebook.
</web-summary>

<card-summary>
Switching from SQL? Kotlin DataFrame makes it easy to load, process, analyze, and visualize your data, fully interactive and type-safe!
</card-summary>

<link-summary>
Explore Kotlin DataFrame as a SQL or ORM user: read your data, transform columns, group or join tables, and build insightful visualizations with Kotlin Notebook.
</link-summary>

This guide helps Kotlin backend developers with SQL experience quickly adapt to **Kotlin DataFrame** by mapping familiar
SQL and ORM operations to DataFrame concepts.

If you plan to work in a Gradle project without a Kotlin Notebook,
we recommend installing the library together with our [**experimental Kotlin compiler plugin**](Compiler-Plugin.md) (available since Kotlin 2.2.*).
This plugin generates type-safe schemas at compile time
and tracks schema changes throughout your data pipeline.

## Add Kotlin DataFrame Gradle dependency

You can read more about configuring the Gradle build in the [Gradle Setup Guide](SetupGradle.md).

In your Gradle build file (`build.gradle` or `build.gradle.kts`), add the Kotlin DataFrame library as a dependency:

<tabs>
<tab title="Kotlin DSL">

```kotlin
dependencies {
    implementation("org.jetbrains.kotlinx:dataframe:%dataFrameVersion%")
}
```

</tab>

<tab title="Groovy DSL">

```groovy
dependencies {
    implementation 'org.jetbrains.kotlinx:dataframe:%dataFrameVersion%'
}
```

</tab>
</tabs>
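The SQL examples in this guide connect to PostgreSQL, which also requires that database's JDBC driver on the classpath. A minimal Kotlin DSL sketch follows; the driver coordinates and version are an assumption, so use the driver and version that match your database:

```kotlin
dependencies {
    implementation("org.jetbrains.kotlinx:dataframe:%dataFrameVersion%")
    // JDBC driver for PostgreSQL (assumed coordinates/version; swap in the driver for your database)
    implementation("org.postgresql:postgresql:42.7.3")
}
```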

---

## 1. What is a dataframe?

If you’re used to SQL, a **dataframe** is conceptually like a **table**:

- **Rows**: ordered records of data
- **Columns**: named, typed fields
- **Schema**: a mapping of column names to types

Kotlin DataFrame also supports [**hierarchical, JSON-like data**](hierarchical.md):
columns can contain *[nested dataframes](DataColumn.md#framecolumn)* or *column groups*,
which lets you represent and transform tree-like structures without flattening them.
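For example, two flat columns can be nested under a single column group. This sketch uses String column names so it compiles without the Compiler Plugin; the sample data and the helper name `groupName` are hypothetical:

```kotlin
import org.jetbrains.kotlinx.dataframe.AnyFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical helper: nests "firstName" and "lastName" under a column group called "name"
fun groupName(people: AnyFrame): AnyFrame =
    people.group("firstName", "lastName").into("name")

fun main() {
    val people = dataFrameOf("firstName", "lastName", "age")(
        "Alice", "Smith", 30,
        "Bob", "Jones", 25,
    )
    val nested = groupName(people)
    // The top level now holds the "name" group and "age"
    println(nested.columnNames())
    // The group is itself a DataFrame containing the two original columns
    println(nested.getColumnGroup("name").columnNames())
}
```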

Unlike a relational DB table:

- A DataFrame object **lives in memory**: there’s no storage engine or transaction log.
- It’s **immutable**: each operation produces a *new* DataFrame.
- There is **no concept of foreign keys or relations** between DataFrames.
- It can be created from a wide range of [sources](Data-Sources.md): [CSV](CSV-TSV.md), [JSON](JSON.md), [SQL tables](SQL.md), [Apache Arrow](ApacheArrow.md),
  or in-memory objects.
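A minimal sketch of the last two points: building a DataFrame from in-memory Kotlin objects and showing that a transformation returns a new DataFrame while the original stays unchanged. The `Customer` class and its values are invented for this example; `toDataFrame()` reads the class properties via reflection, so no compiler plugin is needed here:

```kotlin
import org.jetbrains.kotlinx.dataframe.AnyFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical domain class used only for this example
data class Customer(val name: String, val city: String)

fun loadCustomers(): AnyFrame =
    listOf(Customer("Alice", "Berlin"), Customer("Bob", "Paris")).toDataFrame()

fun main() {
    val customers = loadCustomers()
    // `add` does not modify `customers`; it returns a new DataFrame with one extra column
    val withCountry = customers.add("country") { if ("city"<String>() == "Berlin") "DE" else "FR" }

    println(customers.columnNames())   // original schema is untouched
    println(withCountry.columnNames()) // new schema includes "country"
}
```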

---

## 2. Reading Data From SQL

Kotlin DataFrame integrates with JDBC, so you can bring SQL data into memory for analysis.

| Approach                         | Example                                                             |
|----------------------------------|---------------------------------------------------------------------|
| **From a table**                 | `val df = DataFrame.readSqlTable(dbConfig, "customers")`            |
| **From a SQL query**             | `val df = DataFrame.readSqlQuery(dbConfig, "SELECT * FROM orders")` |
| **From a JDBC Connection**       | `val df = connection.readDataFrame("SELECT * FROM orders")`         |
| **From a ResultSet (extension)** | `val df = resultSet.readDataFrame(connection)`                      |

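```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.io.DbConnectionConfig
import org.jetbrains.kotlinx.dataframe.io.readDataFrame
import org.jetbrains.kotlinx.dataframe.io.readSqlQuery
import org.jetbrains.kotlinx.dataframe.io.readSqlTable
import java.sql.DriverManager

val dbConfig = DbConnectionConfig(
    url = "jdbc:postgresql://localhost:5432/mydb",
    user = "postgres",
    password = "secret"
)

// Whole table
val customers = DataFrame.readSqlTable(dbConfig, "customers")

// Query: the database runs the aggregation, DataFrame receives the result
val salesByRegion = DataFrame.readSqlQuery(
    dbConfig,
    """
    SELECT region, SUM(amount) AS total
    FROM sales
    GROUP BY region
    """.trimIndent()
)

// From an existing JDBC connection (closed automatically by `use`)
DriverManager.getConnection(dbConfig.url, dbConfig.user, dbConfig.password).use { connection ->
    val orders = connection.readDataFrame("SELECT * FROM orders")

    // From a ResultSet produced by plain JDBC code
    connection.createStatement().use { statement ->
        val rs = statement.executeQuery("SELECT * FROM orders")
        val ordersFromResultSet = rs.readDataFrame(connection)
    }
}
```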

For more details, see [Reading from SQL databases](readSqlDatabases.md).
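To experiment without a running PostgreSQL server, you can point the same connection-based API at an in-memory database. This sketch assumes the H2 driver (`com.h2database:h2`) is on the classpath; the table, its rows, and the helper name `loadDemoCustomers` are hypothetical:

```kotlin
import org.jetbrains.kotlinx.dataframe.AnyFrame
import org.jetbrains.kotlinx.dataframe.io.readDataFrame
import java.sql.DriverManager

// Hypothetical helper: creates a tiny in-memory H2 table and reads it back as a DataFrame.
// The in-memory database disappears once the connection is closed.
fun loadDemoCustomers(): AnyFrame =
    DriverManager.getConnection("jdbc:h2:mem:demo").use { connection ->
        connection.createStatement().use { statement ->
            statement.executeUpdate("CREATE TABLE customers (id INT PRIMARY KEY, name VARCHAR(50))")
            statement.executeUpdate("INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob')")
        }
        // Same extension as in the table above, executed on the open connection
        connection.readDataFrame("SELECT * FROM customers")
    }

fun main() {
    val customers = loadDemoCustomers()
    println(customers.rowsCount())
}
```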
117+
118+
## 3. Why It’s Not an ORM
119+
120+
Frameworks like **[Hibernate](https://hibernate.org/orm/)** or **[Exposed](https://github.com/JetBrains/Exposed)**:
121+
122+
- Map DB tables to Kotlin objects (entities)
123+
- Track object changes and sync them back to the database
124+
- Focus on **persistence** and **transactions**
125+
126+
Kotlin DataFrame:
127+
128+
- Has no persistence layer
129+
- Doesn’t try to map rows to mutable entities
130+
- Focuses on **in-memory analytics**, **transformations**, and **type-safe pipelines**
131+
- The **main idea** is that the schema *changes together with your transformations* — and the [**Compiler Plugin
132+
**](Compiler-Plugin.md) updates the type-safe API automatically under the hood.
133+
- You don’t have to manually define or recreate schemas every time — the plugin infers them dynamically from the data or
134+
transformations.
135+
- In ORMs, the mapping layer is **frozen** — schema changes require manual model edits and migrations.
136+
137+
Think of Kotlin DataFrame as a **data analysis/ETL tool**, not an ORM.
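A minimal sketch of the schema following the transformation, assuming the Compiler Plugin is enabled; without it, the `revenue`, `cost`, and `profit` properties are not generated and the code does not compile. The `Sale` class and its values are hypothetical. Only the input schema is declared, and the `profit` column appears in the result type without editing any class:

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.annotations.DataSchema
import org.jetbrains.kotlinx.dataframe.api.*

// Input schema, declared once (hypothetical example class)
@DataSchema
data class Sale(val region: String, val revenue: Double, val cost: Double)

fun main() {
    val sales: DataFrame<Sale> = listOf(
        Sale("EU", 100.0, 60.0),
        Sale("US", 50.0, 20.0),
    ).toDataFrame()

    // No entity class is edited: the plugin infers a new schema with a `profit` column
    val withProfit = sales.add("profit") { revenue - cost }

    // `profit` is available as a typed property right after the transformation
    println(withProfit.profit.toList())
}
```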
138+
139+
---
140+
141+
## 4. Key Differences from SQL & ORMs
142+
143+
| Feature / Concept | SQL Databases (PostgreSQL, MySQL…) | ORM (Hibernate, Exposed…) | Kotlin DataFrame |
144+
|----------------------------|------------------------------------|------------------------------------|---------------------------------------------------------------------|
145+
| **Storage** | Persistent | Persistent | In-memory only |
146+
| **Schema definition** | `CREATE TABLE` DDL | Defined in entity classes | Derived from data or transformations or defined manually |
147+
| **Schema change** | `ALTER TABLE` | Manual migration of entity classes | Automatic via transformations + Compiler Plugin or defined manually |
148+
| **Relations** | Foreign keys | Mapped via annotations | Not applicable |
149+
| **Transactions** | Yes | Yes | Not applicable |
150+
| **DB Indexes** | Yes | Yes (via DB) | Not applicable |
151+
| **Data manipulation** | SQL DML (`INSERT`, `UPDATE`) | CRUD mapped to DB | Transformations only (immutable) |
152+
| **Joins** | `JOIN` keyword | Eager/lazy loading | [`.join()` / `.leftJoin()` DSL](join.md) |
153+
| **Grouping & aggregation** | `GROUP BY` | DB query with groupBy | [`.groupBy().aggregate()`](groupBy.md) |
154+
| **Filtering** | `WHERE` | Criteria API / query DSL | [`.filter { ... }`](filter.md) |
155+
| **Permissions** | `GRANT` / `REVOKE` | DB-level permissions | Not applicable |
156+
| **Execution** | On DB engine | On DB engine | In JVM process |

---

## 5. SQL → Kotlin DataFrame Cheatsheet

### DDL Analogues

| SQL DDL Command / Example                                                                                     | Kotlin DataFrame Equivalent                                                                  |
|---------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------|
| **Create table:**<br>`CREATE TABLE person (name text, age int);`                                              | `@DataSchema`<br>`interface Person {`<br>`    val name: String`<br>`    val age: Int`<br>`}` |
| **Add column:**<br>`ALTER TABLE sales ADD COLUMN profit numeric GENERATED ALWAYS AS (revenue - cost) STORED;` | `.add("profit") { revenue - cost }`                                                          |
| **Rename column:**<br>`ALTER TABLE sales RENAME COLUMN old_name TO new_name;`                                 | `.rename { old_name }.into("new_name")`                                                      |
| **Drop column:**<br>`ALTER TABLE sales DROP COLUMN old_col;`                                                  | `.remove { old_col }`                                                                        |
| **Modify column type:**<br>`ALTER TABLE sales ALTER COLUMN amount TYPE numeric;`                              | `.convert { amount }.to<Double>()`                                                           |
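Chained together on a small in-memory table, these DDL-style operations might look like the sketch below. It uses String column names (`"revenue"<Int>()` inside row expressions) so it compiles without the Compiler Plugin; the sample columns, values, and the helper name `migrate` are invented for illustration:

```kotlin
import org.jetbrains.kotlinx.dataframe.AnyFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical "ALTER TABLE" sequence; every call returns a new DataFrame
fun migrate(sales: AnyFrame): AnyFrame = sales
    .add("profit") { "revenue"<Int>() - "cost"<Int>() } // ADD COLUMN profit ... AS (revenue - cost)
    .rename("region").into("sales_region")              // RENAME COLUMN region TO sales_region
    .remove("old_col")                                  // DROP COLUMN old_col
    .convert("revenue").to<Double>()                    // ALTER COLUMN revenue TYPE double precision

fun main() {
    val sales = dataFrameOf("region", "revenue", "cost", "old_col")(
        "EU", 100, 60, "x",
        "US", 50, 20, "y",
    )
    val result = migrate(sales)
    println(result.columnNames())
    println(result["profit"].toList())
}
```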
171+
172+
---
173+
174+
### DML Analogues
175+
176+
| SQL DML Command / Example | Kotlin DataFrame Equivalent |
177+
|--------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------|
178+
| `SELECT col1, col2` | `df.select { col1 and col2 }` |
179+
| `WHERE amount > 100` | `df.filter { amount > 100 }` |
180+
| `ORDER BY amount DESC` | `df.sortByDesc { amount }` |
181+
| `GROUP BY region` | `df.groupBy { region }` |
182+
| `SUM(amount)` | `.aggregate { sum { amount } }` |
183+
| `JOIN` | `.join(otherDf) { id match right.id }` |
184+
| `LIMIT 5` | `.take(5)` |
185+
| **Pivot:** <br>`SELECT * FROM crosstab('SELECT region, year, SUM(amount) FROM sales GROUP BY region, year') AS ct(region text, y2023 int, y2024 int);` | `.pivot(region, year) { sum { amount } }` |
186+
| **Explode array column:** <br>`SELECT id, unnest(tags) AS tag FROM products;` | `.explode { tags }` |
187+
| **Update column:** <br>`UPDATE sales SET amount = amount * 1.2;` | `.update { amount }.with { it * 1.2 }` |
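Several of these rows combined on two small in-memory tables give the sketch below. It uses String column names and an explicit cast so it compiles without the Compiler Plugin (with the plugin, the typed selectors from the table above can replace the strings). The tables, column names, and the helpers `bigOrders` and `explodeTags` are made up for illustration:

```kotlin
import org.jetbrains.kotlinx.dataframe.AnyFrame
import org.jetbrains.kotlinx.dataframe.api.*

// SELECT id, name, amount * 1.2 AS amount
// FROM orders JOIN customers USING (customerId)
// WHERE amount > 100 ORDER BY amount DESC LIMIT 5
fun bigOrders(orders: AnyFrame, customers: AnyFrame): AnyFrame = orders
    .join(customers, "customerId")                   // inner join on the shared column
    .filter { "amount"<Double>() > 100 }              // WHERE
    .update("amount").with { (it as Double) * 1.2 }   // UPDATE-style change, returns a new DataFrame
    .sortByDesc("amount")                             // ORDER BY ... DESC
    .select("id", "name", "amount")                   // SELECT
    .take(5)                                          // LIMIT

// SELECT id, unnest(tags) AS tags FROM products
fun explodeTags(products: AnyFrame): AnyFrame = products.explode("tags")

fun main() {
    val orders = dataFrameOf("id", "customerId", "amount")(
        1, 10, 120.0,
        2, 11, 80.0,
        3, 10, 300.0,
    )
    val customers = dataFrameOf("customerId", "name")(
        10, "Alice",
        11, "Bob",
    )
    println(bigOrders(orders, customers))

    val products = dataFrameOf("id", "tags")(
        1, listOf("new", "sale"),
        2, listOf("clearance"),
    )
    println(explodeTags(products))
}
```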

## 6. Example: SQL vs. DataFrame Side-by-Side

**SQL (PostgreSQL):**

```sql
SELECT region, SUM(amount) AS total
FROM sales
WHERE amount > 0
GROUP BY region
ORDER BY total DESC
LIMIT 5;
```

**Kotlin DataFrame:**

```kotlin
sales.filter { amount > 0 }
    .groupBy { region }
    .aggregate { sum { amount } into "total" }
    .sortByDesc { total }
    .take(5)
```
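Without the Compiler Plugin, the typed properties `amount`, `region`, and `total` are not available, so the same query can be written with String column names instead. A runnable sketch; the sample rows and the helper name `topRegions` are hypothetical:

```kotlin
import org.jetbrains.kotlinx.dataframe.AnyFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Same pipeline as above, written with String column names (no Compiler Plugin needed)
fun topRegions(sales: AnyFrame): AnyFrame = sales
    .filter { "amount"<Int>() > 0 }              // WHERE amount > 0
    .groupBy("region")                           // GROUP BY region
    .aggregate { sum("amount") into "total" }    // SUM(amount) AS total
    .sortByDesc("total")                         // ORDER BY total DESC
    .take(5)                                     // LIMIT 5

fun main() {
    val sales = dataFrameOf("region", "amount")(
        "EU", 100,
        "US", 50,
        "EU", 30,
        "ASIA", 70,
        "US", -5,
    )
    println(topRegions(sales))
}
```

The negative `US` row is dropped by the filter before grouping, just as `WHERE` is applied before `GROUP BY` in SQL.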

## In Conclusion

- Kotlin DataFrame keeps the familiar SQL-style workflow (select → filter → group → aggregate) but makes it **type-safe**
  and fully integrated into Kotlin.
- The main focus is **readability** and schema-change safety via
  the [Compiler Plugin](Compiler-Plugin.md).
- It is neither a database nor an ORM: Kotlin DataFrame does not store data or manage transactions; it works as an in-memory
  layer for analytics and transformations.
- It does not provide some SQL features (permissions, transactions, indexes), but it offers convenient tools for working
  with JSON-like structures and combining multiple data sources.
- Use Kotlin DataFrame as a **type-safe DSL** for post-processing, merging data sources, and analytics directly on the
  JVM, while keeping your code easy to refactor and IDE-assisted.
- Use Kotlin DataFrame for small and medium-sized datasets that fit in JVM memory; for larger datasets, consider a more
  **performant** database engine.

## What's Next?

If you're ready to go through a complete example, we recommend our **[Quickstart Guide](quickstart.md)**,
where you'll learn the basics of reading data, transforming it, and creating visualizations step by step.

Ready to go deeper? Check out what’s next:

- 📘 **[Explore in-depth guides and various examples](Guides-And-Examples.md)** with different datasets,
  API usage examples, and practical scenarios that help you understand the main features of Kotlin DataFrame.

- 🛠️ **[Browse the operations overview](operations.md)** to learn what Kotlin DataFrame can do.

- 🧠 **Understand the design** and core concepts in the [library overview](concepts.md).

- 🔤 **[Learn more about Extension Properties](extensionPropertiesApi.md)**
  and make working with your data both convenient and type-safe.

- 💡 **[Use the Kotlin DataFrame Compiler Plugin](Compiler-Plugin.md)**
  for auto-generated column access in your IntelliJ IDEA projects.

- 📊 **Master Kandy** for stunning and expressive DataFrame visualizations
  with the [Kandy Documentation](https://kotlin.github.io/kandy).

docs/StardustDocs/topics/guides/Guides-And-Examples.md

Lines changed: 3 additions & 0 deletions
@@ -24,6 +24,8 @@ Explore our structured, in-depth guides to steadily improve your Kotlin DataFram

<img src="quickstart_preview.png" border-effect="rounded" width="705"/>

+ * [](Guide-for-backend-SQL-developers.md) — migration guide for backend developers with SQL/ORM experience moving to Kotlin DataFrame
+
* [](extensionPropertiesApi.md) — learn about extension properties for [`DataFrame`](DataFrame.md)
and make working with your data both convenient and type-safe.

@@ -60,6 +62,7 @@ and make working with your data both convenient and type-safe.
* [Apache Spark Interop (With Kotlin Spark API)](https://github.com/Kotlin/dataframe/tree/master/examples/idea-examples/unsupported-data-sources/src/main/kotlin/org/jetbrains/kotlinx/dataframe/examples/kotlinSpark)
* [Multik Interop](https://github.com/Kotlin/dataframe/tree/master/examples/idea-examples/unsupported-data-sources/src/main/kotlin/org/jetbrains/kotlinx/dataframe/examples/multik)
* [JetBrains Exposed Interop](https://github.com/Kotlin/dataframe/tree/master/examples/idea-examples/unsupported-data-sources/src/main/kotlin/org/jetbrains/kotlinx/dataframe/examples/exposed)
+ * [Hibernate ORM](https://github.com/Kotlin/dataframe/tree/master/examples/idea-examples/unsupported-data-sources/src/main/kotlin/org/jetbrains/kotlinx/dataframe/examples/hibernate)
* [OpenAPI Guide](https://github.com/Kotlin/dataframe/blob/master/examples/notebooks/json/KeyValueAndOpenApi.ipynb)
— learn how to parse and explore [OpenAPI](https://swagger.io) JSON structures using Kotlin DataFrame,
enabling structured access and intuitive analysis of complex API schemas (*experimental*, supports OpenAPI 3.0.0).
