@@ -11,248 +11,33 @@ It ignores order of columns in [`DataFrame`](DataFrame.md), but tracks column hi

 In Jupyter environment compile-time [`DataFrame`](DataFrame.md) schema is synchronized with real-time data after every cell execution.

-In IDEA projects, you can use the [Gradle plugin](gradle.md#configuration) to extract schema from the dataset
+In IDEA projects, you can use the [Gradle plugin](schemasGradle.md#configuration) to extract schema from the dataset
 and generate extension properties.

-## DataSchema workflow in Jupyter

-After execution of cell
+## Popular use cases with Data Schemas

-<!---FUN createDfNullable-->
+Here's a list of the most popular use cases with Data Schemas.

-```kotlin
-val df = dataFrameOf("name", "age")(
-    "Alice", 15,
-    "Bob", null
-)
-```
+* [**Data Schemas in Gradle projects**](schemasGradle.md) <br/>
+If you are developing a server application and building it with Gradle.

-<!---END-->
+* [**DataSchema workflow in Jupyter**](schemasJupyter.md) <br/>
+If you prefer Notebooks.

-the following actions take place:
+* [**Schema inheritance**](schemasInheritance.md) <br/>
+It's worth knowing how to reuse Data Schemas generated earlier.

-1. Columns in `df` are analyzed to extract data schema
-2. Empty interface with [`DataSchema`](schema.md) annotation is generated:
+* [**Custom Data Schemas**](schemasCustom.md) <br/>
+Sometimes it is necessary to create your own schema.

-```kotlin
-@DataSchema
-interface DataFrameType
-```
+* [**Use external Data Schemas in Jupyter**](schemasExternalJupyter.md) <br/>
+Sometimes it is convenient to extract reusable code from a Jupyter Notebook into a Kotlin JVM library.
+Schema interfaces should also be extracted if this code uses Custom Data Schemas.

-3. Extension properties for this [`DataSchema`](schema.md) are generated:
+* [**Import OpenAPI Schemas in Gradle project**](schemasImportOpenApiGradle.md) <br/>
+When you need to retrieve data from an endpoint described by an OpenAPI Schema.

-```kotlin
-val ColumnsContainer<DataFrameType>.age: DataColumn<Int?> @JvmName("DataFrameType_age") get() = this["age"] as DataColumn<Int?>
-val DataRow<DataFrameType>.age: Int? @JvmName("DataFrameType_age") get() = this["age"] as Int?
-val ColumnsContainer<DataFrameType>.name: DataColumn<String> @JvmName("DataFrameType_name") get() = this["name"] as DataColumn<String>
-val DataRow<DataFrameType>.name: String @JvmName("DataFrameType_name") get() = this["name"] as String
-```
-
-Every column produces two extension properties:
-
-* Property for `ColumnsContainer<DataFrameType>` returns column
-* Property for `DataRow<DataFrameType>` returns cell value
-
-4. `df` variable is typed by schema interface:
-
-```kotlin
-val temp = df
-```
-
-```kotlin
-val df = temp.cast<DataFrameType>()
-```
-
-> _Note, that object instance after casting remains the same. See [cast](cast.md).
-
-To log all these additional code executions, use cell magic
-
-```
-%trackExecution -all
-```
-
-## Schema inheritance
-
-In order to reduce amount of generated code, previously generated [`DataSchema`](schema.md) interfaces are reused and only new
-properties are introduced
-
-Let's filter out all `null` values from `age` column and add one more column of type `Boolean`:
-
-```kotlin
-val filtered = df.filter { age != null }.add("isAdult") { age!! > 18 }
-```
-
-New schema interface for `filtered` variable will be derived from previously generated `DataFrameType`:
-
-```kotlin
-@DataSchema
-interface DataFrameType1 : DataFrameType
-```
-
-Extension properties for data access are generated only for new and overriden members of `DataFrameType1` interface:
-
-```kotlin
-val ColumnsContainer<DataFrameType1>.age: DataColumn<Int> get() = this["age"] as DataColumn<Int>
-val DataRow<DataFrameType1>.age: Int get() = this["age"] as Int
-val ColumnsContainer<DataFrameType1>.isAdult: DataColumn<Boolean> get() = this["isAdult"] as DataColumn<Boolean>
-val DataRow<DataFrameType1>.isAdult: String get() = this["isAdult"] as Boolean
-```
-
-Then variable `filtered` is cast to new interface:
-
-```kotlin
-val temp = filtered
-```
-
-```kotlin
-val filtered = temp.cast<DataFrameType1>()
-```
-
-## Custom data schemas
-
-You can define your own [`DataSchema`](schema.md) interfaces and use them in functions and classes to represent [`DataFrame`](DataFrame.md) with
-specific set of columns:
-
-```kotlin
-@DataSchema
-interface Person {
-    val name: String
-    val age: Int
-}
-```
-
-After execution of this cell in Jupyter or annotation processing in IDEA, extension properties for data access will be
-generated. Now we can use these properties to create functions for typed [`DataFrame`](DataFrame.md):
-
-```kotlin
-fun DataFrame<Person>.splitName() = split { name }.by(",").into("firstName", "lastName")
-fun DataFrame<Person>.adults() = filter { age > 18 }
-```
-
-In Jupyter these functions will work automatically for any [`DataFrame`](DataFrame.md) that matches `Person` schema:
-
-<!---FUN extendedDf-->
-
-```kotlin
-val df = dataFrameOf("name", "age", "weight")(
-    "Merton, Alice", 15, 60.0,
-    "Marley, Bob", 20, 73.5
-)
-```
-
-<!---END-->
-
-Schema of `df` is compatible with `Person`, so auto-generated schema interface will inherit from it:
-
-```kotlin
-@DataSchema(isOpen = false)
-interface DataFrameType : Person
-
-val ColumnsContainer<DataFrameType>.weight: DataColumn<Double> get() = this["weight"] as DataColumn<Double>
-val DataRow<DataFrameType>.weight: Double get() = this["weight"] as Double
-```
-
-Despite `df` has additional column `weight`, previously defined functions for `DataFrame<Person>` will work for it:
-
-<!---FUN splitNameWorks-->
-
-```kotlin
-df.splitName()
-```
-
-<!---END-->
-
-```text
-firstName lastName age weight
-Merton Alice 15 60.000
-Marley Bob 20 73.125
-```
-
-<!---FUN adultsWorks-->
-
-```kotlin
-df.adults()
-```
-
-<!---END-->
-
-```text
-name age weight
-Marley, Bob 20 73.5
-```
-
-In JVM project you will have to [cast](cast.md) [`DataFrame`](DataFrame.md) explicitly to the target interface:
-
-```kotlin
-df.cast<Person>().splitName()
-```
-
-## Use external data schemas in Jupyter
-
-Sometimes it is convenient to extract reusable code from Jupyter notebook into Kotlin JVM library. If this code
-uses [Custom data schemas](#custom-data-schemas), schema interfaces should also be extracted. In order to enable support
-them in Jupyter, you should register them in
-library [integration class](https://github.com/Kotlin/kotlin-jupyter/blob/master/docs/libraries.md) with `useSchema`
-function:
-
-```kotlin
-@DataSchema
-interface Person {
-    val name: String
-    val age: Int
-}
-
-fun DataFrame<Person>.countAdults() = count { it[Person::age] > 18 }
-
-@JupyterLibrary
-internal class Integration : JupyterIntegration() {
-
-    override fun Builder.onLoaded() {
-        onLoaded {
-            useSchema<Person>()
-        }
-    }
-}
-```
-
-After loading this library into Jupyter notebook, schema interfaces for all [`DataFrame`](DataFrame.md) variables that match `Person`
-schema will derive from `Person`
-
-<!---FUN createDf-->
-
-```kotlin
-val df = dataFrameOf("name", "age")(
-    "Alice", 15,
-    "Bob", 20
-)
-```
-
-<!---END-->
-
-Now `df` is assignable to `DataFrame<Person>` and `countAdults` is available:
-
-```kotlin
-df.countAdults()
-```
-
-## Import Data Schemas, e.g. from OpenAPI, in Jupyter
-
-Similar to [importing OpenAPI data schemas in Gradle projects](gradle.md#openapi-schemas), you can also
-do this in Jupyter notebooks. There is only a slight difference in notation:
-
-Import the schema using any path (`String`), `URL`, or `File`:
-
-```kotlin
-val PetStore = importDataSchema("https://petstore3.swagger.io/api/v3/openapi.json")
-```
-
-and then from next cell you run and onwards, you can call, for example:
-
-```kotlin
-val df = PetStore.Pet.readJson("https://petstore3.swagger.io/api/v3/pet/findByStatus?status=available")
-```
-
-So, very similar indeed!
-
-(Note: The type of `PetStore` will be generated as `PetStoreDataSchema`, but this doesn't affect the way you can use
-it.)
+* [**Import Data Schemas, e.g. from OpenAPI, in Jupyter**](schemasImportOpenApiJupyter.md) <br/>
+Similar to [importing OpenAPI Data Schemas in Gradle projects](schemasImportOpenApiGradle.md),
+you can also do this in Jupyter Notebooks.
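The removed Jupyter and schema-inheritance sections describe a mechanism rather than a single API call. They cover a marker interface per schema, extension properties that read columns by name, a derived interface that adds only new or narrowed accessors, and a `cast` that retypes the same object. The following is a minimal plain-Kotlin sketch of that mechanism. It assumes hypothetical `Frame` and `cast` stand-ins (not the Kotlin DataFrame types), omits the per-row `DataRow` accessors, and performs no code generation. It only mimics the shape of the generated code shown in the diff.

```kotlin
// Hypothetical stand-ins for illustration only; these are NOT the Kotlin DataFrame types.
// `out T` makes Frame<DataFrameType1> a subtype of Frame<DataFrameType>.
class Frame<out T>(val columns: Map<String, List<Any?>>)

// Reinterprets the same instance under another schema type; nothing is copied.
@Suppress("UNCHECKED_CAST")
fun <T> Frame<*>.cast(): Frame<T> = this as Frame<T>

// Marker interfaces playing the role of generated schema interfaces.
interface DataFrameType
interface DataFrameType1 : DataFrameType

// Accessors for the base schema. @JvmName keeps the JVM getter names distinct,
// because Frame<DataFrameType> and Frame<DataFrameType1> erase to the same JVM type.
@Suppress("UNCHECKED_CAST")
val Frame<DataFrameType>.name: List<String>
    @JvmName("DataFrameType_name") get() = columns.getValue("name") as List<String>

@Suppress("UNCHECKED_CAST")
val Frame<DataFrameType>.age: List<Int?>
    @JvmName("DataFrameType_age") get() = columns.getValue("age") as List<Int?>

// The derived schema only declares what is new or narrowed: `age` becomes non-null.
@Suppress("UNCHECKED_CAST")
val Frame<DataFrameType1>.age: List<Int>
    @JvmName("DataFrameType1_age") get() = columns.getValue("age") as List<Int>

@Suppress("UNCHECKED_CAST")
val Frame<DataFrameType1>.isAdult: List<Boolean>
    @JvmName("DataFrameType1_isAdult") get() = columns.getValue("isAdult") as List<Boolean>

fun main() {
    val temp = Frame<Any>(mapOf("name" to listOf("Alice", "Bob"), "age" to listOf(15, null)))
    val df = temp.cast<DataFrameType>()
    println(df.name)               // [Alice, Bob]
    println(df.age)                // [15, null]
    println((df as Any) === temp)  // true: cast returned the same object

    // Hand-built "filtered" data: the row with a null age dropped, `isAdult` added.
    val filtered = Frame<Any>(
        mapOf("name" to listOf("Alice"), "age" to listOf(15), "isAdult" to listOf(false))
    ).cast<DataFrameType1>()
    val ages: List<Int> = filtered.age // resolves to the more specific DataFrameType1 accessor
    println(ages)                  // [15]
    println(filtered.name)         // [Alice], inherited from the DataFrameType accessor
    println(filtered.isAdult)      // [false]
}
```

The `@JvmName` annotations play the same role as in the removed generated code: without them, both `age` getters would compile to the same JVM signature, because the type arguments of `Frame` are erased. Declaring `Frame` covariant (`out T`) is what lets `filtered.name` resolve to the base-schema accessor while `filtered.age` picks the more specific `DataFrameType1` one.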