Add SQL-to-Kotlin DataFrame transition guide for backend developers #1377
base: master
Includes a comprehensive guide to help SQL and ORM users adapt to Kotlin DataFrame. Covers key concepts, equivalents for SQL/ORM operations, and practical examples. Updated TOC to include the new guide.
Pull Request Overview
This PR adds a comprehensive guide for backend developers with SQL experience to transition to Kotlin DataFrame. The guide provides SQL-to-DataFrame mappings and explains key conceptual differences between SQL databases, ORMs, and Kotlin DataFrame.
- Introduces Kotlin DataFrame concepts through familiar SQL terminology
- Provides side-by-side comparisons of SQL commands and DataFrame operations
- Explains the differences between DataFrame, SQL databases, and ORMs
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| Guide-for-backend-SQL-developers.md | New comprehensive guide document with SQL-to-DataFrame mappings and conceptual explanations |
| d.tree | Adds the new guide to the documentation navigation structure |
| SQL | Kotlin DataFrame |
|---|---|
| `ALTER TABLE DROP COLUMN` | `.remove("colName")` |
| `ALTER TABLE RENAME COLUMN` | `.rename { oldName }.into("newName")` |
| `ALTER TABLE MODIFY COLUMN` | `.convert { colName }.to<NewType>()` |
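To make the DDL analogy concrete, here is a minimal sketch in plain Kotlin (no DataFrame dependency) that models a table as a map from column name to cell values. The `Table` alias and the `dropColumn`/`renameColumn`/`convertColumn` helpers are hypothetical and are not the Kotlin DataFrame API; the point they illustrate is that, unlike `ALTER TABLE`, each operation returns a new table and leaves the original unchanged.

```kotlin
// Hypothetical column-oriented table: column name -> cell values (insertion order kept).
typealias Table = Map<String, List<Any?>>

// Like ALTER TABLE ... DROP COLUMN: returns a new table without `name`.
fun Table.dropColumn(name: String): Table = this - name

// Like ALTER TABLE ... RENAME COLUMN: returns a new table, column order preserved.
fun Table.renameColumn(oldName: String, newName: String): Table =
    entries.associate { (k, v) -> (if (k == oldName) newName else k) to v }

// Like ALTER TABLE ... MODIFY COLUMN: converts every cell of one column with `f`.
fun <R> Table.convertColumn(name: String, f: (Any?) -> R): Table =
    mapValues { (k, v) -> if (k == name) v.map(f) else v }
```

Usage: `t.dropColumn("tmp").renameColumn("amount", "total").convertColumn("total") { (it as String).toInt() }` produces a new table while `t` still contains `tmp`.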
This TODO comment indicates incomplete documentation structure. Either merge the duplicate DDL sections or remove one of them to avoid confusion.
Please add information about the supported SQL databases (and how to integrate unsupported ones!), with references to https://kotlin.github.io/dataframe/sql.html.
This guide helps Kotlin backend developers with SQL experience quickly adapt to **Kotlin DataFrame**, mapping familiar SQL and ORM operations to DataFrame concepts.

We recommend starting with [**Kotlin Notebook**](SetupKotlinNotebook.md), an IDE-integrated tool similar to Jupyter Notebook.
Are you sure? I suppose "backend developers" would rather use this in a Gradle project than in KTNB. They can combine the approaches (try what they need in the notebook first, then use that code in the project), but it seems we should also include information about setting it up in Gradle projects.
I agree, it's good to recommend notebooks, but we should also provide a Gradle-only option
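```kotlin
// `connection` is an open java.sql.Connection to the database
val rs = connection.createStatement().executeQuery("SELECT * FROM orders")
// Convert the JDBC ResultSet into a dataframe
rs.readDataFrame(connection)
```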
I'd add a visual example here: the original DB schema image and the output dataframe.
Ready to go deeper? Check out what’s next:

- 📘 **[Explore in-depth guides and various examples](Guides-And-Examples.md)** with different datasets,
Instead of the first two points, it would be better to refer to the quickstart guide, which shows the user the basics of working with DataFrame.
(Simple logic: the user has just finished reading a DF from a file → "OK, what should I do next?" → the QS guide answers this question!)
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.guides.QuickStartGuide-->

## Quick Setup
Add a tab here about setting up in Gradle; see the comment above.
---

## 1. What is a DataFrame?
Please follow our spelling conventions here (and everywhere in the guide): https://kotlin.github.io/dataframe/spellingconventions.html
- `DataFrame` as an object/type/class should be in backticks.
- "dataframe" as a concept of tabular data (both abstract and concrete) should be written in lowercase.
It could be interesting to mention that columns can store any values, not just primitives or a predefined set of types: `List`, `File`, `Map`, user objects. DataFrame offers fully generic storage.
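The idea of fully generic column storage can be sketched with plain Kotlin: a column is just a name plus a list of values of any type. `Column`, `Customer`, and `sampleColumns` are hypothetical illustration types, not the Kotlin DataFrame API, which has its own column classes.

```kotlin
import java.io.File

// Hypothetical user type stored directly in column cells.
data class Customer(val name: String, val vip: Boolean)

// A typed column: a name plus one value per row, of any type T.
data class Column<T>(val name: String, val values: List<T>)

fun sampleColumns(): List<Column<*>> = listOf(
    Column("id", listOf(1, 2)),                                                // primitives
    Column("tags", listOf(listOf("new", "promo"), listOf("vip"))),             // List values
    Column("attachment", listOf(File("a.csv"), File("b.csv"))),                // File values (not opened)
    Column("meta", listOf(mapOf("src" to "web"), mapOf("src" to "api"))),      // Map values
    Column("customer", listOf(Customer("Ann", true), Customer("Bob", false))), // user objects
)
```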
It lets you explore data interactively, render DataFrames, create plots, and use all your IDE features within the JVM ecosystem.
*dataframes
To start working with Kotlin DataFrame in a Kotlin Notebook, run the cell with the next code:
*a cell with the following code
If you’re used to SQL, a **DataFrame** is conceptually like a **table**:
same here
- **Columns**: named, typed fields
- **Schema**: a mapping of column names to types
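The "schema as a mapping of column names to types" idea can be sketched in plain Kotlin by checking a column-oriented table against such a map. `Schema` and `violations` are hypothetical helpers for illustration; Kotlin DataFrame tracks the schema itself, and this is not how it represents one internally. Treating `null` as allowed everywhere is an assumption of this sketch.

```kotlin
import kotlin.reflect.KClass

// Schema: column name -> expected type, similar to a column list in CREATE TABLE.
typealias Schema = Map<String, KClass<*>>

// Returns the names of schema columns that are missing or hold a value of the wrong type.
// Nulls are accepted (assumption: every column is nullable, like a nullable SQL column).
fun violations(schema: Schema, columns: Map<String, List<Any?>>): List<String> =
    schema.filter { (name, type) ->
        val values = columns[name] ?: return@filter true // missing column
        values.any { it != null && !type.javaObjectType.isInstance(it) }
    }.keys.toList()
```

`javaObjectType` is used so that `Int::class` is checked against boxed `java.lang.Integer` values.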
Kotlin DataFrame also supports [**hierarchical, JSON-like data**](hierarchical.md): columns can contain *nested DataFrames* or *column groups*, allowing you to represent and transform tree-like structures without flattening.
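To picture the two nesting styles, here is a plain-Kotlin analogy with hypothetical data classes: `address` plays the role of a column group (sub-columns with one value per row), while `orders` plays the role of a nested dataframe (a whole table per row). This only mirrors the data shape; in Kotlin DataFrame these are represented by `ColumnGroup` and `FrameColumn` columns, not by data classes.

```kotlin
// Column-group analogy: like a JSON object field {"address": {"city": ..., "zip": ...}}.
data class Address(val city: String, val zip: String)

// Nested-table analogy: like a JSON array of objects {"orders": [{"sku": ..., "qty": ...}]}.
data class OrderLine(val sku: String, val qty: Int)

data class CustomerRow(val name: String, val address: Address, val orders: List<OrderLine>)

// Total quantity per customer, computed without flattening the nested "orders" tables.
fun totalQty(rows: List<CustomerRow>): Map<String, Int> =
    rows.associate { it.name to it.orders.sumOf { line -> line.qty } }
```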
*nested dataframes
you can also link to DataColumn.md#framecolumn etc.
Unlike a relational DB table:

- A DataFrame **lives in memory**: there’s no storage engine or transaction log
*dataframe, or *`DataFrame` if you're talking about an instance of the class
```kotlin
    .take(5)
```
## In conclusion
- Kotlin DataFrame keeps the familiar SQL-style workflow (select → filter → group → aggregate) but makes it **type-safe** and fully integrated into Kotlin.
- The main focus is **readability**, schema change safety, and evolving API support via the [Compiler Plugin](Compiler-Plugin.md).
What do you mean by "evolving API support"?
- It is neither a database nor an ORM: a DataFrame does not persist data or manage transactions; it works as an in-memory layer for analytics and transformations.
*dataframe
- It does not provide some SQL features (permissions, transactions, indexes), but it offers convenient tools for working with JSON-like structures and combining multiple data sources.
- Use Kotlin DataFrame as a **type-safe DSL** for post-processing, merging data sources, and analytics directly on the JVM, while keeping your code easily refactorable and IDE-assisted.
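As a rough picture of "combining multiple data sources" in memory, here is an inner join written with plain Kotlin collections. `DbOrder`, `ApiCustomer`, and `innerJoin` are hypothetical; Kotlin DataFrame provides its own `join` operations on dataframes, so treat this as an illustration of the join semantics only, not of that API.

```kotlin
// Two hypothetical sources: rows read from a database and rows parsed from a JSON API.
data class DbOrder(val customerId: Int, val amount: Int)
data class ApiCustomer(val id: Int, val name: String)

// Inner join, like:
//   SELECT c.name, o.amount FROM orders o JOIN customers c ON o.customerId = c.id
fun innerJoin(orders: List<DbOrder>, customers: List<ApiCustomer>): List<Pair<String, Int>> {
    val byId = customers.associateBy { it.id } // in-memory lookup on the join key
    // Orders without a matching customer are dropped, as in a SQL inner join.
    return orders.mapNotNull { o -> byId[o.customerId]?.let { c -> c.name to o.amount } }
}
```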
I'm missing something here about the disadvantages of DataFrame. We need to be honest too: if you have a large database with millions of rows, doing the analysis with DF is likely not a good idea.
- 💡 **[Use Kotlin DataFrame Compiler Plugin](Compiler-Plugin.md)**
  for auto-generated column access in your IntelliJ IDEA projects.

- 📊 **Master Kandy** for stunning and expressive DataFrame visualizations learning
Remove the stray word "learning" at the end of this line.
| SQL | Kotlin DataFrame |
|---|---|
| `WHERE amount > 100` | `df.filter { amount > 100 }` |
| `ORDER BY amount DESC` | `df.sortByDesc { amount }` |
| `GROUP BY region` | `df.groupBy { region }` |
| `SUM(amount)` | `.aggregate { sum(amount) }` |
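To check what each row of the table computes, here is the same WHERE → GROUP BY → SUM → ORDER BY pipeline written with plain Kotlin collections. `Order` and `totalsByRegion` are hypothetical; with Kotlin DataFrame the equivalent chain uses `filter`, `groupBy`, `aggregate`, and `sortByDesc` as listed above, operating on columns rather than on a `List` of objects.

```kotlin
// Hypothetical row type standing in for an `orders` table.
data class Order(val region: String, val amount: Int)

// SQL equivalent:
//   SELECT region, SUM(amount) AS total FROM orders
//   WHERE amount > 100 GROUP BY region ORDER BY total DESC
fun totalsByRegion(orders: List<Order>): List<Pair<String, Int>> =
    orders
        .filter { it.amount > 100 }                                    // WHERE amount > 100
        .groupBy { it.region }                                         // GROUP BY region
        .map { (region, rows) -> region to rows.sumOf { it.amount } } // SUM(amount)
        .sortedByDescending { it.second }                              // ORDER BY total DESC
```

For example, orders `EU 150`, `US 300`, `EU 50`, `EU 200`, `APAC 90` give `[(EU, 350), (US, 300)]`: the rows with amounts 50 and 90 are filtered out before grouping.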
Use `sum { amount }`; I think `sum(amount)` is deprecated.
Another reason to use korro as much as possible :D
(though I don't think korro can help inside tables like this one, can it?)