Commit ba6fd4e

committed
Update README.md [ci skip]
1 parent 416130a commit ba6fd4e

1 file changed

+26
-47
lines changed

README.md

Lines changed: 26 additions & 47 deletions
Original file line number | Diff line number | Diff line change
@@ -82,24 +82,13 @@ Keep in mind that both `add:atKey:` and `atKey:put:` messages don't create a new
8282

8383
### Creating a DataFrame
8484
There are four ways of creating a data frame:
85-
1. Creating an empty data frame, then filling it with data
86-
2. Creating a data frame from an array of rows
87-
3. Creating a data frame from an array of columns
88-
4. Reading data from a file
85+
1. from an array of rows or columns
86+
2. from matrix
87+
3. from file
88+
4. loading a built-in dataset
8989

90-
#### Creating an empty DataFrame
91-
You can create an empty instance of `DataFrame` using the `new` message
92-
93-
```smalltalk
94-
df := DataFrame new.
95-
```
96-
The data can be added later using the `add:` message.
97-
```smalltalk
98-
df add: #('Barcelona' 1.609 true).
99-
```
100-
101-
#### Creating a DataFrame from an array of rows
102-
This way is the best for creating simple examples for testing since you can see how the data will be arranged in your data frame.
90+
#### 1. Creating a DataFrame from an array of rows or columns
91+
The easiest and most straightforward way of creating a DataFrame is by passing all data in an array of arrays to `fromRows:` or `fromColumns:` message. Here is an example of initializing a DataFrame with rows:
10392

10493
```smalltalk
10594
df := DataFrame fromRows: #(
@@ -108,8 +97,7 @@ df := DataFrame fromRows: #(
10897
('London' 8.788 false)).
10998
```
11099

111-
#### Creating a DataFrame from an array of columns
112-
We can do the same by passing an array of columns
100+
The same data frame can be created from the array of columns
113101

114102
```smalltalk
115103
df := DataFrame fromColumns: #(
@@ -118,49 +106,40 @@ df := DataFrame fromColumns: #(
118106
(true true false)).
119107
```
120108

121-
#### Reading data from a file
122-
This is the most common way of creating a data frame. You have some dataset in a file (CSV, Excel etc.) - just ask a `DataFrame` to read it. At this point only CSV files are supported, but very soon you will also be able to read the data from other formats.
123-
124-
```smalltalk
125-
df := DataFrame fromCSV: 'path/to/your/file.csv'.
126-
```
127-
128-
### Loading the built-in datasets
129-
DataFrame provides several famous datasets for you to play with. They are compact and can be loaded with a simple message. At this point only two datasets are supported - [Iris flower dataset](https://en.wikipedia.org/wiki/Iris_flower_data_set) and a simplified [Boston Housing dataset](https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data).
109+
Since the names of rows and columns are not provided, they are initialized with their default values: `(1 to: self numberOfRows)` and `(1 to: self numberOfColumns)`. Both `rowNames` and `columnNames` can always be changed by passing an array of new names to the corresponding accessor. The array passed to `rowNames:` must have as many elements as there are rows, and the array passed to `columnNames:` must have as many elements as there are columns.
130110

131111
```smalltalk
132-
df := DataFrame loadIris.
133-
df := DataFrame loadHousing.
112+
df columnNames: #(City Population BeenThere).
113+
df rowNames: #(A B C).
134114
```
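
The size rule for names can be sketched with the three-row example used in this README. This is only an illustration: it assumes that `columnNames` and `rowNames` also work as getters and that `numberOfRows` and `numberOfColumns` answer integers, which is how they are used in the default values mentioned above.

```smalltalk
"The same three rows that appear in the printed table of this README."
df := DataFrame fromRows: #(
   ('Barcelona' 1.609 true)
   ('Dubai' 2.789 true)
   ('London' 8.788 false)).

"Three columns, so three column names; three rows, so three row names."
df columnNames: #(City Population BeenThere).
df rowNames: #(A B C).

"Assumed getters: each comparison is expected to answer true
 when the number of names matches the corresponding dimension."
df columnNames size = df numberOfColumns.
df rowNames size = df numberOfRows.
```

Checking the sizes before assigning names is a cheap way to catch a mismatched array early, for example after adding a row or dropping a column.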
135115

136-
### Exploring the created DataFrame
137-
If we print (Ctrl+P) the data frame that was created from an array of rows or columns as described in previous sections, we will see the following table
116+
If you print (Ctrl+P) this data frame, you will see this pretty-printed table, which can be copied and pasted into letters, blog posts, and tutorials (such as this one):
138117

139118
```
140-
| 1 2 3
141-
---+-------------------------
142-
1 | Barcelona 1.609 true
143-
2 | Dubai 2.789 true
144-
3 | London 8.788 false
119+
| City Population BeenThere
120+
---+----------------------------------
121+
A | Barcelona 1.609 true
122+
B | Dubai 2.789 true
123+
C | London 8.788 false
145124
```
146125

147-
As you can see, both row and column names were automatically set to numeric sequences. We can change them by passing an array of new names. This array must be of the same size as the number of rows or columns, respectively.
126+
#### 3. Reading data from file
127+
This is the most common way of creating a data frame. You have some dataset in a file (CSV, Excel etc.) - just ask a DataFrame to read it. At this point only CSV files are supported, but very soon you will also be able to read the data from other formats.
148128

149129
```smalltalk
150-
df columnNames: #(City Population SomeBool).
151-
df rowNames: #(A B C).
130+
df := DataFrame fromCSV: 'path/to/your/file.csv'.
152131
```
153132

154-
Now if we print our data frame, it will look like this
133+
### 4. Loading the built-in datasets
134+
DataFrame provides several famous datasets for you to play with. They are compact and can be loaded with a simple message. At this point there are three datasets that can be loaded in this way - [Iris flower dataset](https://en.wikipedia.org/wiki/Iris_flower_data_set), a simplified [Boston Housing dataset](https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data), and a tipping dataset.
155135

156-
```
157-
| City Population SomeBool
158-
---+---------------------------------
159-
A | Barcelona 1.609 true
160-
B | Dubai 2.789 true
161-
C | London 8.788 false
136+
```smalltalk
137+
DataFrame loadIris.
138+
DataFrame loadHousing.
139+
DataFrame loadTips.
162140
```
163141

142+
### Exploring the created DataFrame
164143
To get the dimensions of a data frame, its rows, and columns, we can say
165144

166145
```smalltalk
