| 1 | +[//]: # (title: Data Schemas) |
| 2 | + |
| 3 | +The Kotlin DataFrame library provides typed data access via |
| 4 | +[generation of extension properties](extensionPropertiesApi.md) for the type |
| 5 | +[`DataFrame<T>`](DataFrame.md) (as well as [`DataRow<T>`](DataRow.md)), where |
| 6 | +`T` is a marker class that represents the `DataSchema` of the [`DataFrame`](DataFrame.md). |
| 7 | + |
| 8 | +The schema of a [`DataFrame`](DataFrame.md) is a mapping from its column names to its column types. |
| 9 | +A data schema can be expressed as a Kotlin interface or class. If the dataframe is hierarchical, that is, it contains a |
| 10 | +[column group](DataColumn.md#columngroup) or a [column of dataframes](DataColumn.md#framecolumn), the data schema |
| 11 | +reflects this with a separate class for each column group or nested `DataFrame`. |
| 12 | + |
| 13 | +For example, consider a simple hierarchical dataframe from |
| 14 | +<resource src="example.csv"></resource>. |
| 15 | + |
| 16 | +This dataframe consists of two columns: `name`, which is a `String` column, and `info`, |
| 17 | +which is a [**column group**](DataColumn.md#columngroup) containing two nested |
| 18 | +[value columns](DataColumn.md#valuecolumn) — |
| 19 | +`age` of type `Int`, and `height` of type `Double`. |
| 20 | + |
| 21 | +<table> |
| 22 | + <thead> |
| 23 | + <tr> |
| 24 | + <th>name</th> |
| 25 | + <th colspan="2">info</th> |
| 26 | + </tr> |
| 27 | + <tr> |
| 28 | + <th></th> |
| 29 | + <th>age</th> |
| 30 | + <th>height</th> |
| 31 | + </tr> |
| 32 | + </thead> |
| 33 | + <tbody> |
| 34 | + <tr> |
| 35 | + <td>Alice</td> |
| 36 | + <td>23</td> |
| 37 | + <td>175.5</td> |
| 38 | + </tr> |
| 39 | + <tr> |
| 40 | + <td>Bob</td> |
| 41 | + <td>27</td> |
| 42 | + <td>160.2</td> |
| 43 | + </tr> |
| 44 | + </tbody> |
| 45 | +</table> |
| 46 | + |
| 47 | +The data schema corresponding to this dataframe can be represented like this: |
| 48 | + |
| 49 | +```kotlin |
| 50 | +// Data schema of the "info" column group |
| 51 | +@DataSchema |
| 52 | +data class Info( |
| 53 | + val age: Int, |
| 54 | + val height: Double |
| 55 | +) |
| 56 | + |
| 57 | +// Data schema of the entire dataframe |
| 58 | +@DataSchema |
| 59 | +data class Person( |
| 60 | + val info: Info, |
| 61 | + val name: String |
| 62 | +) |
| 63 | +``` |
| 64 | + |
| 65 | +[Extension properties](extensionPropertiesApi.md) for `DataFrame<Person>` |
| 66 | +are generated according to this schema and can be used to access columns and in operations: |
| 67 | + |
| 68 | +```kotlin |
| 69 | +// Assuming `df` has type DataFrame<Person> |
| 70 | + |
| 71 | +// Get "age" column from "info" group |
| 72 | +df.info.age |
| 73 | + |
| 74 | +// Select "name" and "height" columns |
| 75 | +df.select { name and info.height } |
| 76 | + |
| 77 | +// Filter rows by age value |
| 78 | +df.filter { info.age >= 18 } |
| 79 | +``` |
| 80 | + |
| 81 | + |
| 82 | +## Popular use cases with Data Schemas |
| 83 | + |
| 84 | +Here's a list of the most popular use cases with Data Schemas. |
| 85 | + |
| 86 | +* [**Data Schemas in Gradle projects**](schemasGradle.md) <br/> |
| 87 | + If you are developing a server application and building it with Gradle. |
| 88 | + |
| 89 | +* [**DataSchema workflow in Jupyter**](schemasJupyter.md) <br/> |
| 90 | + If you prefer Notebooks. |
| 91 | + |
| 92 | +* [**Schema inheritance**](schemasInheritance.md) <br/> |
| 93 | + It's worth knowing how to reuse Data Schemas generated earlier. |
| 94 | + |
| 95 | +* [**Custom Data Schemas**](schemasCustom.md) <br/> |
| 96 | + Sometimes it is necessary to create your own schema. |
| 97 | + |
| 98 | +* [**Use external Data Schemas in Jupyter**](schemasExternalJupyter.md) <br/> |
| 99 | + Sometimes it is convenient to extract reusable code from a Jupyter Notebook into a Kotlin JVM library. |
| 100 | + Schema interfaces should also be extracted if this code uses Custom Data Schemas. |
| 101 | + |
| 102 | +* [**Schema Definitions from SQL Databases in Gradle Project**](schemasImportSqlGradle.md) <br/> |
| 103 | + When you need to take data from an SQL database. |
| 104 | + |
| 105 | +* [**Import OpenAPI 3.0.0 Schemas (Experimental) in Gradle Project**](schemasImportOpenApiGradle.md) <br/> |
| 106 | + When you need to take data from an endpoint with an OpenAPI schema. |
| 107 | + |
| 108 | +* [**Import Data Schemas, e.g. from OpenAPI 3.0.0 (Experimental), in Jupyter**](schemasImportOpenApiJupyter.md) <br/> |
| 109 | + Similar to [importing OpenAPI Data Schemas in Gradle projects](schemasImportOpenApiGradle.md), |
| 110 | + you can also do this in Jupyter Notebooks. |