Commit e0c205c

Implemented groupBy for DataFrame
1 parent d64cd2e commit e0c205c

8 files changed, +108 -5 lines changed
Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+as yet unclassified
+group: colNameOrArray by: colName
+
+    | left right |
+
+    left := colNameOrArray isArray
+        ifTrue: [ self columns: colNameOrArray ] "a DataFrame"
+        ifFalse: [ self column: colNameOrArray ]. "a DataSeries"
+
+    right := self column: colName.
+
+    ^ left groupBy: right.
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+as yet unclassified
+groupBy: colName
+
+    | groupedColNames |
+
+    "We exclude the column by which we are grouping"
+    groupedColNames := self columnNames copyWithout: colName.
+
+    ^ DataFrameGrouped
+        group: (self columns: groupedColNames)
+        by: (self column: colName)
Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+as yet unclassified
+group: aDataFrame by: aSeries
+
+    ^ self new split: aDataFrame by: aSeries
Lines changed: 17 additions & 1 deletion
@@ -1,4 +1,20 @@
 private
 apply: aBlock
 
-    "TODO"
+    | colNames numberOfRows numberOfColumns result |
+
+    colNames := (groups at: 1) columnNames.
+
+    numberOfRows := groups size.
+    numberOfColumns := colNames size.
+
+    result := DataFrame new: (numberOfRows @ numberOfColumns).
+    result rowNames: groups keys.
+    result columnNames: colNames.
+
+    groups doWithIndex: [ :df :i |
+        1 to: colNames size do: [ :j |
+            result at: i at: j put:
+                (aBlock value: (df columnAt: j)) ] ].
+
+    ^ result
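
The `apply:` method above reduces every column of every group to a single value and collects the results into one table keyed by group. A minimal sketch of that apply-and-combine step follows, using only core Pharo collections rather than `DataFrame`. The `groups` sample data and the `reducer` block are illustrative assumptions, not part of the commit.

```smalltalk
"Illustrative sketch only: plain collections stand in for the grouped data frames."
| groups reducer result |

"Each key maps to the values of one column within that group (sample data)."
groups := Dictionary new.
groups at: true put: #(1.609 2.789).
groups at: false put: #(8.788).

"Hypothetical aggregation block: the largest value in a group."
reducer := [ :values |
    values inject: values first into: [ :best :each | best max: each ] ].

"Apply the block to each group and combine the results under the same keys."
result := Dictionary new.
groups keysAndValuesDo: [ :key :values |
    result at: key put: (reducer value: values) ].

result at: true.  "2.789"
```

Any one-argument block that turns a collection into a single value could be used in place of `reducer`.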
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+printing
+printOn: aStream
+
+    super printOn: aStream.
+    aStream cr.
+
+    groups doWithIndex: [ :eachDataFrame :i |
+        (groups keys at: i) printOn: aStream.
+        aStream cr.
+        eachDataFrame printOn: aStream.
+
+        i = groups size
+            ifFalse: [ aStream cr; cr ] ]
Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
+private
+split: aDataFrame by: aSeries
+
+    | seriesUnique |
+
+    aDataFrame numberOfRows = aSeries size
+        ifFalse: [ SizeMismatch signal ].
+
+    seriesUnique := aSeries unique asArray.
+
+    groups := seriesUnique collect: [ :eachUnique |
+        | aList df |
+        aList := LinkedList new.
+
+        aSeries doWithIndex: [ :each :i |
+            each = eachUnique
+                ifTrue: [ aList add: (aDataFrame rowAt: i) ] ].
+
+        df := DataFrame fromRows: aList.
+        df columnNames: aDataFrame columnNames.
+        df ].
+
+    groups := groups asDataSeries.
+    groups keys: seriesUnique.
+
+    ^ self
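
`split:by:` scans the whole series once for every unique value. One possible alternative, sketched here with core Pharo collections only, builds all groups in a single pass using `at:ifAbsentPut:`. This is a different technique from the one in the commit, and `rows` and `keys` are illustrative stand-ins for the data frame rows and the grouping series.

```smalltalk
"Illustrative sketch only: a single-pass split, not the commit's implementation."
| rows keys groups |

"Sample rows and the matching grouping values, one key per row."
rows := #( ('Barcelona' 1.609) ('Dubai' 2.789) ('London' 8.788) ).
keys := #( true true false ).

"Append each row to the collection stored under its key, creating it on first use."
groups := Dictionary new.
keys doWithIndex: [ :key :i |
    (groups at: key ifAbsentPut: [ OrderedCollection new ])
        add: (rows at: i) ].

(groups at: true) size.   "2"
(groups at: false) size.  "1"
```

A plain `Dictionary` does not promise any key order, so a caller that needs the groups in order of first appearance would have to track the keys separately.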

DataFrame-Core.package/DataFrameGrouped.class/properties.json

Lines changed: 3 additions & 1 deletion
@@ -5,7 +5,9 @@
     "classinstvars" : [ ],
     "pools" : [ ],
     "classvars" : [ ],
-    "instvars" : [ ],
+    "instvars" : [
+        "groups"
+    ],
     "name" : "DataFrameGrouped",
     "type" : "normal"
 }

README.md

Lines changed: 22 additions & 3 deletions
@@ -31,19 +31,38 @@ series keys: #(k1 k2 k3).
 ```
 
 ### Creating a DataFrame
-The easiest way to create an object of `DataFrame` class is by passing it an array of rows or columns (an array of arrays). Creating a DataFrame from rows allows us to see how the data will be arranged in a table. It's very readable and can be handy if we need to write some simple examples for testing
+There are four ways of creating a data frame:
+1. Creating an empty data frame, then filling it with data
+2. Creating a data frame from an array of rows
+3. Creating a data frame from an array of columns
+4. Reading data from a file
+
+#### Creating an empty DataFrame
+You can create an empty instance of `DataFrame` using the `new` message
+
+```smalltalk
+df := DataFrame new.
+```
+The data can be added later using the `add:` message.
+```smalltalk
+df add: #('Barcelona' 1.609 true).
+```
+
+#### Creating a DataFrame from an array of rows
+This way is the best for creating simple examples for testing since you can see how the data will be arranged in your data frame.
 
 ```smalltalk
-df := DataFrame fromRows: #(
+df := DataFrame rows: #(
    ('Barcelona' 1.609 true)
    ('Dubai' 2.789 true)
    ('London' 8.788 false)).
 ```
 
+#### Creating a DataFrame from an array of columns
 We can do the same by passing an array of columns
 
 ```smalltalk
-df := DataFrame fromColumns: #(
+df := DataFrame rows: #(
    ('Barcelona' 'Dubai' 'London')
    (1.609 2.789 8.788)
    (true true false)).
