
Commit 27a7176

Update of the documentation for the 0.12.1 release (#556)
* Add SQL schema generation guide to gradleReference.md

  Extended the gradleReference documentation by adding a section on how to generate schema for existing SQL tables. This includes JDBC connection establishment and usage limitations and support for particular databases.

* Finished gradleReference.md changes

* Removed OpenAPI schemas section from schemasGradle.md file because it is a duplication of the information

* Add SQL schema import tutorial and update documentation

  Added a new document providing a step-by-step guide on importing SQL metadata as a schema in Gradle project. In addition, minor documentation enhancements were made across multiple files to ensure clarity and precision. An unnecessary OpenAPI schemas section was removed from the schemasGradle.md file due to duplication.

* Update SQL schema import guide and enhance documentation.

* The modification of the guide in the SQL schema import document provides a clearer instruction on how to import data. The change improves the language use in the guide, ensuring better understandability for users following the instructions for importing schema data from an SQL table or query.
1 parent f841b32 commit 27a7176


6 files changed: +240 −75 lines changed


docs/StardustDocs/topics/gradleReference.md

Lines changed: 86 additions & 14 deletions
@@ -10,7 +10,7 @@ dataframes {
 }
 }
 ```
-Note than name of the file and the interface are normalized: split by '_' and ' ' and joined to camel case.
+Note that the name of the file and the interface are normalized: split by '_' and ' ' and joined to CamelCase.
 You can set parsing options for CSV:
 ```kotlin
 dataframes {
@@ -23,24 +23,36 @@ dataframes {
 }
 }
 ```
-In this case output path will depend on your directory structure. For project with package `org.example` path will be `build/generated/dataframe/main/kotlin/org/example/dataframe/JetbrainsRepositories.Generated.kt
-`. Note that name of the Kotlin file is derived from the name of the data file with the suffix `.Generated` and the package
-is derived from the directory structure with child directory `dataframe`. The name of the **data schema** itself is `JetbrainsRepositories`. You could specify it explicitly:
+In this case, the output path will depend on your directory structure.
+For project with package `org.example` path will be `build/generated/dataframe/main/kotlin/org/example/dataframe/JetbrainsRepositories.Generated.kt
+`.
+
+Note that the name of the Kotlin file is derived from the name of the data file with the suffix
+`.Generated` and the package
+is derived from the directory structure with child directory `dataframe`.
+
+The name of the **data schema** itself is `JetbrainsRepositories`.
+You could specify it explicitly:
+
 ```kotlin
 schema {
 // output: build/generated/dataframe/main/kotlin/org/example/dataframe/MyName.Generated.kt
 data = "https://raw.githubusercontent.com/Kotlin/dataframe/master/data/jetbrains_repositories.csv"
 name = "MyName"
 }
 ```
-If you want to change default package for all schemas:
+
+If you want to change the default package for all schemas:
+
 ```kotlin
 dataframes {
 packageName = "org.example"
 // Schemas...
 }
 ```
+
 Then you can set packageName for specific schema exclusively:
+
 ```kotlin
 dataframes {
 
@@ -50,7 +62,9 @@ dataframes {
 }
 }
 ```
-If you want non-default name and package, consider using fully-qualified name:
+
+If you want non-default name and package, consider using fully qualified name:
+
 ```kotlin
 dataframes {
 // output: build/generated/dataframe/main/kotlin/org/example/data/OtherName.Generated.kt
@@ -60,7 +74,10 @@ dataframes {
 }
 }
 ```
-By default, plugin will generate output in specified source set. Source set could be specified for all schemas or for specific schema:
+
+By default, the plugin will generate output in a specified source set.
+Source set could be specified for all schemas or for specific schema:
+
 ```kotlin
 dataframes {
 packageName = "org.example"
@@ -76,7 +93,9 @@ dataframes {
 }
 }
 ```
-But if you need generated files in other directory, set `src`:
+
+If you need the generated files to be put in another directory, set `src`:
+
 ```kotlin
 dataframes {
 // output: schemas/org/example/test/OtherName.Generated.kt
@@ -87,10 +106,63 @@ dataframes {
 }
 }
 ```
+## Schema Definitions from SQL Databases
+
+To generate a schema for an existing SQL table,
+you need to define a few parameters to establish a JDBC connection:
+URL (passing to `data` field), username, and password.
+
+Also, the `tableName` parameter should be specified to convert the data from the table with that name to the dataframe.
+
+```kotlin
+dataframes {
+schema {
+data = "jdbc:mariadb://localhost:3306/imdb"
+name = "org.example.imdb.Actors"
+jdbcOptions {
+user = "root"
+password = "pass"
+tableName = "actors"
+}
+}
+}
+```
+
+To generate a schema for the result of an SQL query,
+you need to define the same parameters as before together with the SQL query to establish connection.
+
+```kotlin
+dataframes {
+schema {
+data = "jdbc:mariadb://localhost:3306/imdb"
+name = "org.example.imdb.TarantinoFilms"
+jdbcOptions {
+user = "root"
+password = "pass"
+sqlQuery = """
+SELECT name, year, rank,
+GROUP_CONCAT (genre) as "genres"
+FROM movies JOIN movies_directors ON movie_id = movies.id
+JOIN directors ON directors.id=director_id LEFT JOIN movies_genres ON movies.id = movies_genres.movie_id
+WHERE directors.first_name = "Quentin" AND directors.last_name = "Tarantino"
+GROUP BY name, year, rank
+ORDER BY year
+"""
+}
+}
+}
+```
+
+**NOTE:** This is an experimental functionality and, for now,
+we only support four databases: MariaDB, MySQL, PostgreSQL, and SQLite.
+
+Additionally, support for JSON and date-time types is limited.
+Please take this into consideration when using these functions.
 
 ## DSL reference
-Inside `dataframes` you can configure parameters that will apply to all schemas. Configuration inside `schema` will override these defaults for specific schema.
-Here is full DSL for declaring data schemas:
+Inside `dataframes` you can configure parameters that will apply to all schemas.
+Configuration inside `schema` will override these defaults for a specific schema.
+Here is the full DSL for declaring data schemas:
 
 ```kotlin
 dataframes {
@@ -101,8 +173,8 @@ dataframes {
 // KOTLIN SCRIPT: DataSchemaVisibility.INTERNAL DataSchemaVisibility.IMPLICIT_PUBLIC, DataSchemaVisibility.EXPLICIT_PUBLIC
 // GROOVY SCRIPT: 'internal', 'implicit_public', 'explicit_public'
 
-withoutDefaultPath() // disable default path for all schemas
-// i.e. plugin won't copy "data" property of the schemas to generated companion objects
+withoutDefaultPath() // disable a default path for all schemas
+// i.e., plugin won't copy "data" property of the schemas to generated companion objects
 
 // split property names by delimiters (arguments of this method), lowercase parts and join to camel case
 // enabled by default
@@ -125,8 +197,8 @@ dataframes {
 withNormalizationBy('_') // enable property names normalization for this schema and use these delimiters
 withoutNormalization() // disable property names normalization for this schema
 
-withoutDefaultPath() // disable default path for this schema
-withDefaultPath() // enable default path for this schema
+withoutDefaultPath() // disable the default path for this schema
+withDefaultPath() // enable the default path for this schema
 }
 }
 ```
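For context on the new SQL section above: once the plugin has generated the `org.example.imdb.Actors` interface from the `jdbcOptions` block, it can be combined with the library's SQL readers. The following sketch is not part of this commit; it assumes the `readSqlTable` overload that takes a `java.sql.Connection` (described in readSqlDatabases.md), assumes the usual `org.jetbrains.kotlinx.dataframe` import locations, and reuses the placeholder URL and credentials from the diff.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.cast
import org.jetbrains.kotlinx.dataframe.io.readSqlTable
import org.example.imdb.Actors // assumed to be generated by the Gradle plugin from the config above
import java.sql.DriverManager

fun main() {
    // Placeholder connection details, matching the jdbcOptions example in the diff.
    DriverManager.getConnection("jdbc:mariadb://localhost:3306/imdb", "root", "pass").use { connection ->
        // Read the table as an untyped frame, then cast it to the generated data schema.
        val actors: DataFrame<Actors> = DataFrame.readSqlTable(connection, "actors").cast<Actors>()
        println(actors.rowsCount())
    }
}
```

Casting to the generated interface is what turns the untyped result into a frame with typed, auto-completed column accessors.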

docs/StardustDocs/topics/readSqlDatabases.md

Lines changed: 14 additions & 1 deletion
@@ -80,8 +80,21 @@ val df = DataFrame.readSqlTable(dbConfig, tableName, 100)
 
 df.print()
 ```
+## Getting Started with Notebooks
 
+To use the latest version of the Kotlin DataFrame library
+and a specific version of the JDBC driver for your database (MariaDB is used as an example below) in your Notebook, run the following cell.
+
+```jupyter
+%use dataframe
+
+USE {
+dependencies("org.mariadb.jdbc:mariadb-java-client:$version")
+}
+```
 
+**NOTE:** The user should specify the version of the JDBC driver.
+
 ## Reading Specific Tables
 
 These functions read all data from a specific table in the database.
@@ -220,7 +233,7 @@ The versions with a limit parameter will only read up to the specified number of
 This function allows reading a ResultSet object from your SQL database
 and transforms it into an AnyFrame object.
 
-The `dbType: DbType` parameter specifies the type of our database (e.g., PostgreSQL, MySQL, etc),
+The `dbType: DbType` parameter specifies the type of our database (e.g., PostgreSQL, MySQL, etc.),
 supported by a library.
 Currently, the following classes are available: `H2, MariaDb, MySql, PostgreSql, Sqlite`.
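The second hunk above describes the `dbType: DbType` parameter of `readResultSet` only in prose, so here is a minimal sketch of how such a call might look. It is not part of this commit; the import paths, the bare `MariaDb` reference, and the exact overload are assumptions based on the description in the diff, so check the API reference of your library version before relying on them.

```kotlin
import org.jetbrains.kotlinx.dataframe.AnyFrame
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.io.db.MariaDb // assumed location of the DbType implementations
import org.jetbrains.kotlinx.dataframe.io.readResultSet
import java.sql.DriverManager

fun main() {
    // Placeholder connection details; use whatever JDBC URL and credentials your database needs.
    DriverManager.getConnection("jdbc:mariadb://localhost:3306/imdb", "root", "pass").use { connection ->
        connection.createStatement().use { statement ->
            statement.executeQuery("SELECT * FROM actors LIMIT 100").use { resultSet ->
                // The dbType argument tells the library which vendor-specific type mappings to apply.
                val df: AnyFrame = DataFrame.readResultSet(resultSet, MariaDb)
                println(df.rowsCount())
            }
        }
    }
}
```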

docs/StardustDocs/topics/schemas.md

Lines changed: 4 additions & 1 deletion
@@ -35,7 +35,10 @@ Here's a list of the most popular use cases with Data Schemas.
 Sometimes it is convenient to extract reusable code from Jupyter Notebook into the Kotlin JVM library.
 Schema interfaces should also be extracted if this code uses Custom Data Schemas.
 
-* [**Import OpenAPI Schemas in Gradle project**](schemasImportOpenApiGradle.md) <br/>
+* [**Schema Definitions from SQL Databases in Gradle Project**](schemasImportSqlGradle.md) <br/>
+When you need to take data from the SQL database.
+
+* [**Import OpenAPI Schemas in Gradle Project**](schemasImportOpenApiGradle.md) <br/>
 When you need to take data from the endpoint with OpenAPI Schema.
 
 * [**Import Data Schemas, e.g. from OpenAPI, in Jupyter**](schemasImportOpenApiJupyter.md) <br/>

docs/StardustDocs/topics/schemasGradle.md

Lines changed: 1 addition & 58 deletions
@@ -58,7 +58,7 @@ interface Person {
 }
 ```
 
-#### Execute assemble task to generate type-safe accessors for schemas:
+#### Execute the `assemble` task to generate type-safe accessors for schemas:
 
 <!---FUN useProperties-->
 
@@ -150,60 +150,3 @@ print(df.fullName.count { it.contains("kotlin") })
 ```
 
 <!---END-->
-
-### OpenAPI Schemas
-
-JSON schema inference is great, but it's not perfect. However, more and more APIs offer
-[OpenAPI (Swagger)](https://swagger.io/) specifications. Aside from API endpoints, they also hold
-[Data Models](https://swagger.io/docs/specification/data-models/) which include all the information about the types
-that can be returned from or supplied to the API. Why should we reinvent the wheel and write our own schema inference
-when we can use the one provided by the API? Not only will we now get the proper names of the types, but we will also
-get enums, correct inheritance and overall better type safety.
-
-First of all, you will need the extra dependency:
-
-```kotlin
-implementation("org.jetbrains.kotlinx:dataframe-openapi:$dataframe_version")
-```
-
-OpenAPI type schemas can be generated using both methods described above:
-
-```kotlin
-@file:ImportDataSchema(
-path = "https://petstore3.swagger.io/api/v3/openapi.json",
-name = "PetStore",
-)
-
-import org.jetbrains.kotlinx.dataframe.annotations.ImportDataSchema
-```
-
-```kotlin
-dataframes {
-schema {
-data = "https://petstore3.swagger.io/api/v3/openapi.json"
-name = "PetStore"
-}
-}
-```
-
-The only difference is that the name provided is now irrelevant, since the type names are provided by the OpenAPI spec.
-(If you were wondering, yes, the Kotlin DataFrame library can tell the difference between an OpenAPI spec and normal JSON data)
-
-After importing the data schema, you can now start to import any JSON data you like using the generated schemas.
-For instance, one of the types in the schema above is `PetStore.Pet` (which can also be
-explored [here](https://petstore3.swagger.io/)),
-so let's parse some Pets:
-
-```kotlin
-val df: DataFrame<PetStore.Pet> =
-PetStore.Pet.readJson("https://petstore3.swagger.io/api/v3/pet/findByStatus?status=available")
-```
-
-Now you will have a correctly typed [`DataFrame`](DataFrame.md)!
-
-You can also always ctrl+click on the `PetStore.Pet` type to see all the generated schemas.
-
-If you experience any issues with the OpenAPI support (since there are many gotchas and edge-cases when converting
-something as
-type-fluid as JSON to a strongly typed language), please open an issue on
-the [Github repo](https://github.com/Kotlin/dataframe/issues).

docs/StardustDocs/topics/schemasImportOpenApiGradle.md

Lines changed: 1 addition & 1 deletion
@@ -61,4 +61,4 @@ You can also always ctrl+click on the `PetStore.Pet` type to see all the generat
 If you experience any issues with the OpenAPI support (since there are many gotchas and edge-cases when converting
 something as
 type-fluid as JSON to a strongly typed language), please open an issue on
-the [Github repo](https://github.com/Kotlin/dataframe/issues).
+the [GitHub repo](https://github.com/Kotlin/dataframe/issues).
