
<!---IMPORT org.jetbrains.kotlinx.dataframe.samples.api.Access-->

[`DataFrame`](DataFrame.md) provides a DSL for selecting an arbitrary set of columns: the Columns Selection DSL.

Column selectors are used in many operations:

</tab>
</tabs>

#### Functions Overview:

##### First (Col), Last (Col), Single (Col)
`first {}`, `firstCol()`, `last {}`, `lastCol()`, `single {}`, `singleCol()`

Returns the first, last, or single column from the top level, a specified [column group](DataColumn.md#columngroup),
or a `ColumnSet` that matches the optional condition. If no column matches the condition,
a `NoSuchElementException` is thrown.

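As a rough illustration of these semantics, the sketch below models columns as a plain Kotlin list and delegates to the standard library's `first`, `last`, and `single`. `ToyColumn`, `toyFirst`, `toyLast`, and `toySingle` are hypothetical names, not part of the DataFrame API, and the model ignores column groups and `ColumnSet`s.

```kotlin
// Toy model (hypothetical, not the DataFrame API): a column is a name plus its values.
data class ToyColumn(val name: String, val values: List<Any?>)

// Mimic `first {}`, `last {}`, and `single {}` by delegating to the Kotlin stdlib,
// which throws NoSuchElementException when no element matches the predicate.
// (The stdlib `single` also throws IllegalArgumentException if several elements match.)
fun List<ToyColumn>.toyFirst(predicate: (ToyColumn) -> Boolean): ToyColumn = first(predicate)

fun List<ToyColumn>.toyLast(predicate: (ToyColumn) -> Boolean): ToyColumn = last(predicate)

fun List<ToyColumn>.toySingle(predicate: (ToyColumn) -> Boolean): ToyColumn = single(predicate)
```

For example, calling `toyFirst { it.name == "weight" }` on columns that contain no `weight` column throws `NoSuchElementException`, mirroring the behavior described above.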
##### Col
`col(name)`, `col(5)`, `this[5]`

Creates a [ColumnAccessor](DataColumn.md#column-accessors) (or `SingleColumn`) for the column identified by the given
argument, looked up at the top level or in a specified [column group](DataColumn.md#columngroup). The argument can be either an
index (`Int`) or a reference to a column (`String`, `ColumnPath`, `KProperty`, or `ColumnAccessor`;
any [AccessApi](apiLevels.md)).

##### Value Col, Frame Col, Col Group
`valueCol(name)`, `valueCol(5)`, `frameCol(name)`, `frameCol(5)`, `colGroup(name)`, `colGroup(5)`

Creates a [ColumnAccessor](DataColumn.md#column-accessors) (or `SingleColumn`) for a
[value column](DataColumn.md#valuecolumn) / [frame column](DataColumn.md#framecolumn) /
[column group](DataColumn.md#columngroup) identified by the given argument, looked up at the top level or in a
specified [column group](DataColumn.md#columngroup). The argument can be either an index (`Int`) or a reference
to a column (`String`, `ColumnPath`, `KProperty`, or `ColumnAccessor`; any [AccessApi](apiLevels.md)).
When you supply a column name, path, or index, these functions can be called either with or without a type argument.
They throw an `IllegalArgumentException` if the column found is not of the expected kind.

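The kind check can be pictured with the following toy model in plain Kotlin. `ToyKind`, `ToyColumn`, `toyCol`, `ensureKind`, `toyValueCol`, and `toyColGroup` are hypothetical; what the model does for an unknown name is its own assumption, since only the wrong-kind case is documented above.

```kotlin
// Toy model (hypothetical, not the DataFrame API) of the three column kinds.
enum class ToyKind { Value, Frame, Group }

data class ToyColumn(val name: String, val kind: ToyKind)

// Mimic `col(name)` / `col(index)`. Throwing NoSuchElementException for an unknown
// name is this model's choice; an out-of-range index throws IndexOutOfBoundsException.
fun List<ToyColumn>.toyCol(name: String): ToyColumn =
    firstOrNull { it.name == name } ?: throw NoSuchElementException("No column named '$name'")

fun List<ToyColumn>.toyCol(index: Int): ToyColumn = this[index]

// Mimic the kind check of `valueCol` / `frameCol` / `colGroup`:
// `require` throws IllegalArgumentException when the column has the wrong kind.
fun ToyColumn.ensureKind(expected: ToyKind): ToyColumn {
    require(kind == expected) { "Column '$name' is of kind $kind, not $expected" }
    return this
}

fun List<ToyColumn>.toyValueCol(name: String): ToyColumn = toyCol(name).ensureKind(ToyKind.Value)

fun List<ToyColumn>.toyColGroup(index: Int): ToyColumn = toyCol(index).ensureKind(ToyKind.Group)
```

Asking `toyValueCol("friends")` for a frame column therefore fails with `IllegalArgumentException`, which matches the documented behavior of `valueCol`.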
##### Cols
`cols {}`, `cols()`, `cols(colA, colB)`, `cols(1, 5)`, `cols(1..5)`, `[{}]`

Creates a subset of columns (`ColumnSet`) from the top level, a specified [column group](DataColumn.md#columngroup),
or a `ColumnSet`.
You can pass either a `ColumnFilter` or use any of the `vararg` overloads, which accept any [AccessApi](apiLevels.md).
When you supply column names, paths, or indices (or an index range), the function can be called either with or without a type argument.

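A toy model of the three main `cols` overloads on a plain list of column names might look like this. The `toyCols` functions are hypothetical stand-ins, and the model assumes that indices are resolved in the order they are passed.

```kotlin
// Toy model (hypothetical, not the DataFrame API): columns are identified by name only.

// Mimic `cols {}`: keep the columns that match a filter.
fun List<String>.toyCols(predicate: (String) -> Boolean): List<String> = filter(predicate)

// Mimic `cols(1, 5)`: pick columns by index, in the order the indices are given.
fun List<String>.toyCols(vararg positions: Int): List<String> = positions.map { this[it] }

// Mimic `cols(1..5)`: pick a contiguous index range, both ends included.
fun List<String>.toyCols(range: IntRange): List<String> = slice(range)
```

For columns `a` to `f`, `toyCols(1, 5)` yields `[b, f]` and `toyCols(1..3)` yields `[b, c, d]`.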
##### Range of Columns
`colA.."colB"`

Creates a `ColumnSet` containing all columns from `colA` to `colB` (inclusive) at the top level.
Columns inside [column groups](DataColumn.md#columngroup) are also supported, as long as both ends share the same
direct parent, and the two ends can be given using any combination of [AccessApi](apiLevels.md)s.

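The inclusive range and the same-parent rule can be sketched with column paths represented as lists of names. `toyRange` and the `layout` map are hypothetical, and rejecting reversed ranges is an assumption of this model, not documented behavior.

```kotlin
// Toy model (hypothetical, not the DataFrame API): a column is addressed by its path,
// e.g. listOf("name", "firstName") for "firstName" inside the group "name".
// `layout` maps each parent path (emptyList() = top level) to its child names, in order.
fun toyRange(layout: Map<List<String>, List<String>>, from: List<String>, to: List<String>): List<List<String>> {
    val parent = from.dropLast(1)
    require(parent == to.dropLast(1)) { "Both ends must share the same direct parent" }
    val siblings = layout[parent] ?: throw NoSuchElementException("Unknown parent $parent")
    val start = siblings.indexOf(from.last())
    val end = siblings.indexOf(to.last())
    // This model also requires both ends to exist and to be in left-to-right order.
    require(start >= 0 && end >= start) { "Invalid range ${from.last()}..${to.last()}" }
    // Both ends are included, as in `colA..colB`.
    return siblings.subList(start, end + 1).map { parent + it }
}
```

With a top level of `name, age, city, weight`, the range from `age` to `weight` contains `age`, `city`, and `weight`, while a range from `age` to `name.lastName` is rejected because the two ends have different parents.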
##### Value Columns, Frame Columns, Column Groups
`valueCols {}`, `valueCols()`, `frameCols {}`, `frameCols()`, `colGroups {}`, `colGroups()`

Creates a subset of columns (`ColumnSet`) from the top level, a specified [column group](DataColumn.md#columngroup),
or a `ColumnSet`, containing only the [value columns](DataColumn.md#valuecolumn) / [frame columns](DataColumn.md#framecolumn) /
[column groups](DataColumn.md#columngroup) that match the optional condition.

##### Cols of Kind
`colsOfKind(Value, Frame) {}`, `colsOfKind(Group, Frame)`

Creates a subset of columns (`ColumnSet`) from the top level, a specified [column group](DataColumn.md#columngroup),
or a `ColumnSet`, containing only columns of the specified kind(s) that match the optional condition.

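Under the assumption that `valueCols {}`, `frameCols {}`, and `colGroups {}` behave like `colsOfKind` restricted to a single kind, both sections can be modeled as one kind filter. `ToyKind`, `ToyColumn`, `toyColsOfKind`, and `toyValueCols` are hypothetical names used only for this sketch.

```kotlin
// Toy model (hypothetical, not the DataFrame API).
enum class ToyKind { Value, Frame, Group }

data class ToyColumn(val name: String, val kind: ToyKind)

// Mimic `colsOfKind(kind1, kind2, ...) {}`: keep columns whose kind is listed
// and that match the optional predicate (all of them when it is omitted).
fun List<ToyColumn>.toyColsOfKind(
    vararg kinds: ToyKind,
    predicate: (ToyColumn) -> Boolean = { true }
): List<ToyColumn> = filter { it.kind in kinds && predicate(it) }

// Under this model, `valueCols {}` is the single-kind shortcut for value columns.
fun List<ToyColumn>.toyValueCols(predicate: (ToyColumn) -> Boolean = { true }): List<ToyColumn> =
    toyColsOfKind(ToyKind.Value, predicate = predicate)
```
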
##### All (Cols)
`all()`, `allCols()`

Creates a `ColumnSet` containing all columns from the top level, a specified [column group](DataColumn.md#columngroup),
or a `ColumnSet`. This is the opposite of `none()` and equivalent to `cols()` without a filter.
Note: on [column groups](DataColumn.md#columngroup), `all` is named `allCols` instead to avoid confusion.

##### All (Cols) After, -Before, -From, -Up To
`allAfter(colA)`, `allBefore(colA)`, `allColsFrom(colA)`, `allColsUpTo(colA)`

Creates a `ColumnSet` containing a subset of the columns from the top level,
a specified [column group](DataColumn.md#columngroup), or a `ColumnSet`.
The subset includes:
- `all(Cols)Before(colA)`: all columns before the specified column, excluding that column.
- `all(Cols)After(colA)`: all columns after the specified column, excluding that column.
- `all(Cols)From(colA)`: all columns from the specified column onward, including that column.
- `all(Cols)UpTo(colA)`: all columns up to the specified column, including that column.

NOTE: In the Plain DSL and on [column groups](DataColumn.md#columngroup), the `{}` overloads of these functions
take a `ColumnSelector` (relative to the receiver).
On `ColumnSet`s, they take a `ColumnFilter` instead.

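The four boundaries can be modeled with index arithmetic on a plain list of column names. The `toyAll...` functions and `toyPositionOf` are hypothetical, they ignore the `{}` overloads, and failing on an unknown column is this model's own choice.

```kotlin
// Toy model (hypothetical, not the DataFrame API): columns are identified by name.
fun List<String>.toyPositionOf(column: String): Int =
    indexOf(column).also { require(it >= 0) { "No column named '$column'" } }

// Excludes the given column.
fun List<String>.toyAllBefore(column: String): List<String> = subList(0, toyPositionOf(column))

// Excludes the given column.
fun List<String>.toyAllAfter(column: String): List<String> = subList(toyPositionOf(column) + 1, size)

// Includes the given column.
fun List<String>.toyAllFrom(column: String): List<String> = subList(toyPositionOf(column), size)

// Includes the given column.
fun List<String>.toyAllUpTo(column: String): List<String> = subList(0, toyPositionOf(column) + 1)
```

For columns `a, b, c, d` and `colA = c`, this yields `[a, b]`, `[d]`, `[c, d]`, and `[a, b, c]` for before, after, from, and up to, respectively.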
##### Cols at any Depth
`colsAtAnyDepth {}`, `colsAtAnyDepth()`

Creates a `ColumnSet` containing all columns from the top level, a specified [column group](DataColumn.md#columngroup),
or a `ColumnSet`, at any depth, that satisfy the optional predicate. This means that columns (of all three kinds!)
nested inside [column groups](DataColumn.md#columngroup) are also included.
This function can be followed by another `ColumnSet` filter function, such as `colsOf<>()`, `single()`,
or `valueCols()`.

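The traversal can be pictured as a depth-first walk over a tree of columns. This toy model assumes pre-order (each group before its children) and treats frame columns as leaves; `ToyColumn`, `ToyLeaf`, `ToyGroup`, and `toyColsAtAnyDepth` are hypothetical and not part of the DataFrame API.

```kotlin
// Toy model (hypothetical, not the DataFrame API): a column is either a leaf
// (value or frame column) or a group containing further columns.
sealed class ToyColumn { abstract val name: String }

data class ToyLeaf(override val name: String, val values: List<Any?>) : ToyColumn()

data class ToyGroup(override val name: String, val children: List<ToyColumn>) : ToyColumn()

// Mimic `colsAtAnyDepth {}`: a pre-order depth-first walk that visits each column
// (groups included) before its children, keeping those that match the predicate.
fun List<ToyColumn>.toyColsAtAnyDepth(predicate: (ToyColumn) -> Boolean = { true }): List<ToyColumn> {
    val result = mutableListOf<ToyColumn>()
    fun visit(column: ToyColumn) {
        if (predicate(column)) result += column
        if (column is ToyGroup) column.children.forEach { visit(it) }
    }
    forEach { visit(it) }
    return result
}
```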
**For example:**

Depth-first search for the first column containing the value "Alice":

`df.select { colsAtAnyDepth().first { "Alice" in it.values() } }`

All columns at any depth, excluding the top level:

`df.select { colGroups().colsAtAnyDepth() }`

All [value-](DataColumn.md#valuecolumn) and [frame columns](DataColumn.md#framecolumn) at any depth:

`df.select { colsAtAnyDepth { !it.isColumnGroup() } }`

All value columns at any depth nested under a column group named "myColGroup":

`df.select { myColGroup.colsAtAnyDepth().valueCols() }`

**Converting from deprecated syntax:**

`dfs { condition }` -> `colsAtAnyDepth { condition }`

`allDfs(includeGroups = false)` -> `colsAtAnyDepth { includeGroups || !it.isColumnGroup() }`

`dfsOf<Type> { condition }` -> `colsAtAnyDepth().colsOf<Type> { condition }`

`cols { condition }.recursively()` -> `colsAtAnyDepth { condition }`

`first { condition }.rec()` -> `colsAtAnyDepth { condition }.first()`

`all().recursively()` -> `colsAtAnyDepth()`

##### Cols in Groups
`colsInGroups {}`, `colsInGroups()`

Creates a `ColumnSet` containing all columns nested in the [column groups](DataColumn.md#columngroup) at
the top level, in a specified [column group](DataColumn.md#columngroup), or in a `ColumnSet`, optionally filtered by a predicate.
This is useful if you want to select all columns that are "one level down".

This function was previously called `children()`.

**For example:**

To get the columns inside all [column groups](DataColumn.md#columngroup) in a [dataframe](DataFrame.md),
instead of having to write:

`df.select { colGroupA.cols() and colGroupB.cols() ... }`

you can use:

`df.select { colsInGroups() }`

or, with a filter:

`df.select { colsInGroups { "user" in it.name } }`

Similarly, you can take the columns inside all [column groups](DataColumn.md#columngroup) in a `ColumnSet`:

`df.select { colGroups { "my" in it.name }.colsInGroups() }`

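A minimal model of "one level down" on a tree of columns: only the direct children of groups are returned, nothing deeper, and columns that are not groups contribute nothing. `ToyColumn`, `ToyLeaf`, `ToyGroup`, and `toyColsInGroups` are hypothetical and not part of the DataFrame API.

```kotlin
// Toy model (hypothetical, not the DataFrame API).
sealed class ToyColumn { abstract val name: String }

data class ToyLeaf(override val name: String) : ToyColumn()

data class ToyGroup(override val name: String, val children: List<ToyColumn>) : ToyColumn()

// Mimic `colsInGroups {}`: the direct children of every group in the input,
// optionally filtered. Grandchildren are not included.
fun List<ToyColumn>.toyColsInGroups(predicate: (ToyColumn) -> Boolean = { true }): List<ToyColumn> =
    filterIsInstance<ToyGroup>().flatMap { it.children }.filter(predicate)
```

For a `user` group with the children `userId` and `address` (itself a group containing `city`), `toyColsInGroups()` returns `userId` and `address`, but not `city`.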
##### Take (Last) (Cols) (While)
`take(5)`, `takeLastCols(2)`, `takeLastWhile {}`, `takeColsWhile {}`

Creates a `ColumnSet` containing the first / last `n` columns from the top level,
a specified [column group](DataColumn.md#columngroup), or a `ColumnSet`; the `While` variants instead take
columns from the start / end for as long as they match the given condition.
Note: to avoid ambiguity, `take` is called `takeCols` when called on a [column group](DataColumn.md#columngroup).

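Assuming the column versions follow the same rules as Kotlin's `List.take`, `takeLast`, `takeWhile`, and `takeLastWhile`, the difference between the counted and the `While` variants can be seen on a plain list of names. `toyTakeVariants` is a hypothetical helper, not part of the DataFrame API.

```kotlin
// Toy model (hypothetical helper, not the DataFrame API): assumes the column versions
// follow Kotlin's List.take / takeLast / takeWhile / takeLastWhile.
fun toyTakeVariants(columns: List<String>, predicate: (String) -> Boolean): Map<String, List<String>> = mapOf(
    "take(2)" to columns.take(2),
    "takeLast(2)" to columns.takeLast(2),
    // Takes columns from the start until the first one that does not match.
    "takeWhile" to columns.takeWhile(predicate),
    // Takes columns from the end until the first one (seen from the end) that does not match.
    "takeLastWhile" to columns.takeLastWhile(predicate)
)
```

With columns `a1, a2, b, a3` and the condition "name starts with `a`", `takeWhile` yields `[a1, a2]` because it stops at `b`, whereas `takeLastWhile` yields only `[a3]`.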
##### Drop (Last) (Cols) (While)
`drop(5)`, `dropLastCols(2)`, `dropLastWhile {}`, `dropColsWhile {}`

Creates a `ColumnSet` without the first / last `n` columns from the top level,
a specified [column group](DataColumn.md#columngroup), or a `ColumnSet`; the `While` variants instead drop
columns from the start / end for as long as they match the given condition.
Note: to avoid ambiguity, `drop` is called `dropCols` when called on a [column group](DataColumn.md#columngroup).

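Under the same assumption (that the column versions follow Kotlin's `List.drop`, `dropLast`, and `dropWhile`), each drop function returns exactly what the matching take function leaves out. The `toySplit...` helpers below are hypothetical and only make that complement visible.

```kotlin
// Toy model (hypothetical helpers, not the DataFrame API): assumes the column versions
// follow Kotlin's List.drop / dropLast / dropWhile.

// First part: what `take(n)` keeps; second part: what `drop(n)` keeps.
fun <T> List<T>.toySplitAt(n: Int): Pair<List<T>, List<T>> = take(n) to drop(n)

// First part: what `dropLast(n)` keeps; second part: what `takeLast(n)` keeps.
fun <T> List<T>.toySplitLast(n: Int): Pair<List<T>, List<T>> = dropLast(n) to takeLast(n)

// First part: `takeWhile`; second part: `dropWhile`, which starts at the first non-match.
fun <T> List<T>.toySplitWhile(predicate: (T) -> Boolean): Pair<List<T>, List<T>> =
    takeWhile(predicate) to dropWhile(predicate)
```

For columns `a1, a2, b, a3`, `toySplitWhile { it.startsWith("a") }` returns `[a1, a2]` and `[b, a3]`.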
##### Select from [Column Group](DataColumn.md#columngroup)
`colGroupA.select {}`, `"colGroupA" {}`

Creates a `ColumnSet` containing the columns selected by the provided `ColumnsSelector` relative to the specified
[column group](DataColumn.md#columngroup). In practice, this means you open a new selection scope inside a
[column group](DataColumn.md#columngroup) and select columns from there.
The selected columns are referenced individually and "unpacked" from their parent
[column group](DataColumn.md#columngroup).

**For example:**

Select `myColGroup.someCol` and all `String` columns from `myColGroup`:

`df.select { myColGroup.select { someCol and colsOf<String>() } }`

More examples:

`df.select { "myGroupCol" { "colA" and expr("newCol") { colB + 1 } } }`

`df.select { "pathTo"["myGroupCol"].select { "colA" and "colB" } }`

`df.select { it["myGroupCol"].asColumnGroup()() { "colA" and "colB" } }`

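The "unpacking" can be illustrated with a toy column group whose selector only sees the group's own child names, while each result keeps its full path and is returned on its own. `ToyGroup`, `ToySelected`, and `toySelect` are hypothetical and not the DataFrame API.

```kotlin
// Toy model (hypothetical, not the DataFrame API): a group maps child names to values.
data class ToyGroup(val name: String, val children: Map<String, List<Any?>>)

// A selected column keeps its full path but is returned individually, without its parent group.
data class ToySelected(val path: List<String>, val values: List<Any?>)

// Mimic `group.select { ... }`: the selector runs relative to the group
// (it only sees the group's child names), and each chosen child is "unpacked".
fun ToyGroup.toySelect(selector: (Set<String>) -> List<String>): List<ToySelected> =
    selector(children.keys).map { child ->
        ToySelected(listOf(name, child), children.getValue(child))
    }
```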
#### Examples:

**Select columns by name:**

<!---FUN columnSelectors-->