README.md
Lines changed: 26 additions & 47 deletions
@@ -82,24 +82,13 @@ Keep in mind that both `add:atKey:` and `atKey:put:` messages don't create a new
82
82
83
83
### Creating a DataFrame
84
84
There are four ways of creating a data frame:
85
-
1. Creating an empty data frame, then filling it with data
86
-
2. Creating a data frame from an array of rows
87
-
3. Creating a data frame from an array of columns
88
-
4. Reading data from a file
85
+
1. from an array of rows or columns
86
+
2. from a matrix
87
+
3. from a file
88
+
4. loading a built-in dataset
89
89
90
-
#### Creating an empty DataFrame
91
-
You can create an empty instance of `DataFrame` using the `new` message
92
-
93
-
```smalltalk
94
-
df := DataFrame new.
95
-
```
96
-
The data can be added later using the `add:` message.
97
-
```smalltalk
98
-
df add: #('Barcelona' 1.609 true).
99
-
```
100
-
101
-
#### Creating a DataFrame from an array of rows
102
-
This way is the best for creating simple examples for testing since you can see how the data will be arranged in your data frame.
90
+
#### 1. Creating a DataFrame from an array of rows or columns
91
+
The most straightforward way of creating a DataFrame is to pass all the data as an array of arrays to the `fromRows:` or `fromColumns:` message. Here is an example of initializing a DataFrame with rows:
103
92
104
93
```smalltalk
105
94
df := DataFrame fromRows: #(
@@ -108,8 +97,7 @@ df := DataFrame fromRows: #(
108
97
('London' 8.788 false)).
109
98
```
110
99
111
-
#### Creating a DataFrame from an array of columns
112
-
We can do the same by passing an array of columns
100
+
The same data frame can be created from an array of columns
This is the most common way of creating a data frame. You have some dataset in a file (CSV, Excel etc.) - just ask a `DataFrame` to read it. At this point only CSV files are supported, but very soon you will also be able to read the data from other formats.
123
-
124
-
```smalltalk
125
-
df := DataFrame fromCSV: 'path/to/your/file.csv'.
126
-
```
127
-
128
-
### Loading the built-in datasets
129
-
DataFrame provides several famous datasets for you to play with. They are compact and can be loaded with a simple message. At this point only two datasets are supported - [Iris flower dataset](https://en.wikipedia.org/wiki/Iris_flower_data_set) and a simplified [Boston Housing dataset](https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data).
109
+
Since the names of rows and columns are not provided, they are initialized with their default values: `(1 to: self numberOfRows)` and `(1 to: self numberOfColumns)`. Both `rowNames` and `columnNames` can be changed at any time by passing an array of new names to the corresponding accessor. Each array must contain exactly as many names as there are rows or columns, respectively.
130
110
131
111
```smalltalk
132
-
df := DataFrame loadIris.
133
-
df := DataFrame loadHousing.
112
+
df columnNames: #(City Population BeenThere).
113
+
df rowNames: #(A B C).
134
114
```
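The renaming rule can be checked with a minimal sketch. It assumes the three-row data frame from the `fromRows:` example and that `numberOfRows` and `numberOfColumns` can be sent to a data frame from outside (the text above only shows them sent to `self`):

```smalltalk
"Sketch only: rebuild the example data frame with default names."
df := DataFrame fromRows: #(
   ('Barcelona' 1.609 true)
   ('Dubai' 2.789 true)
   ('London' 8.788 false)).

"Assumption: these two messages are part of the public protocol."
df numberOfRows.      "3"
df numberOfColumns.   "3"

"One new name per column and one per row, so both arrays have 3 elements."
df columnNames: #(City Population BeenThere).
df rowNames: #(A B C).
```

Comparing the size of each array with `numberOfColumns` or `numberOfRows` before assigning it is a cheap way to catch a mismatch early.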
135
115
136
-
### Exploring the created DataFrame
137
-
If we print (Ctrl+P) the data frame that was created from an array of rows or columns as described in previous sections, we will see the following table
116
+
If you print (Ctrl+P) this data frame, you will see this pretty-printed table, which can be copied and pasted into letters, blog posts, and tutorials (such as this one)
138
117
139
118
```
140
-
| 1 2 3
141
-
---+-------------------------
142
-
1 | Barcelona 1.609 true
143
-
2 | Dubai 2.789 true
144
-
3 | London 8.788 false
119
+
| City Population BeenThere
120
+
---+----------------------------------
121
+
A | Barcelona 1.609 true
122
+
B | Dubai 2.789 true
123
+
C | London 8.788 false
145
124
```
146
125
147
-
As you can see, both row and column names were automatically set to numeric sequences. We can change them by passing an array of new names. This array must be of the same size as the number of rows or columns, respectively.
126
+
#### 3. Reading data from file
127
+
This is the most common way of creating a data frame. You have some dataset in a file (CSV, Excel etc.) - just ask a DataFrame to read it. At this point only CSV files are supported, but very soon you will also be able to read the data from other formats.
148
128
149
129
```smalltalk
150
-
df columnNames: #(City Population SomeBool).
151
-
df rowNames: #(A B C).
130
+
df := DataFrame fromCSV: 'path/to/your/file.csv'.
152
131
```
153
132
154
-
Now if we print our data frame, it will look like this
133
+
#### 4. Loading the built-in datasets
134
+
DataFrame provides several famous datasets for you to play with. They are compact and can be loaded with a simple message. At this point, three datasets can be loaded this way - the [Iris flower dataset](https://en.wikipedia.org/wiki/Iris_flower_data_set), a simplified [Boston Housing dataset](https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data), and a tipping dataset.
155
135
156
-
```
157
-
| City Population SomeBool
158
-
---+---------------------------------
159
-
A | Barcelona 1.609 true
160
-
B | Dubai 2.789 true
161
-
C | London 8.788 false
136
+
```smalltalk
137
+
DataFrame loadIris.
138
+
DataFrame loadHousing.
139
+
DataFrame loadTips.
162
140
```
163
141
142
+
### Exploring the created DataFrame
164
143
To get the dimensions of a data frame, its rows, and columns, we can say