The first argument of the `select:where:` message should be an array of column names. These names do not affect which rows are selected; they only determine which columns appear in the resulting data frame. The second argument should be a block with a boolean condition that is applied to each row of the data frame. Only the rows for which the block returns `true` are selected.

In your conditions you will reference the features of your observations. For example, in the Iris dataset you might want to select the flowers that belong to the `#setosa` species and have a sepal width equal to `3`. To make queries more readable, DataFrame provides a querying language: you declare the columns used in your conditions as arguments of the where-block and then use those arguments in the conditions. For example, the block `[ :species | species = #setosa ]` passed to `select:where:` is translated to `[ :row | (row atKey: #species) = #setosa ]` and applied to every row of the data frame. This means that every argument of the block you pass must correspond to a column name of your data frame.
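As a rough sketch, the Iris query described above might be written as follows. It assumes that `df` already holds the Iris data frame and that its columns are named `species`, `sepal_width`, `petal_width` and `petal_length`. These column names are illustrative assumptions, so adjust them to match your own data.

```smalltalk
"Assumption: df is a DataFrame holding the Iris dataset.
 Assumption: the column names below are hypothetical and must
 match the actual column names of df."
df select: #(petal_width petal_length)
   where: [ :species :sepal_width |
      "Both block arguments name columns of df"
      species = #setosa and: [ sepal_width = 3 ] ].

"Following the translation rule described above, the where-block
 should behave like this block evaluated on every row:
 [ :row |
    (row atKey: #species) = #setosa
       and: [ (row atKey: #sepal_width) = 3 ] ]"
```

The result should contain only the `petal_width` and `petal_length` columns, even though the condition tests `species` and `sepal_width`: the columns named in the first argument and the columns used in the where-block are independent of each other.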