README.md
Lines changed: 24 additions & 5 deletions
Original file line number
Diff line number
Diff line change
@@ -87,7 +87,7 @@ There are four ways of creating a data frame:
87
87
3. from file
88
88
4. loading a built-in dataset
89
89
90
-
#### 1. Creating a DataFrame from an array of rows or columns
90
+
#### 1. Creating DataFrame from an array of rows or columns
91
91
The easiest and most straightforward way of creating a DataFrame is to pass all the data as an array of arrays to the `fromRows:` or `fromColumns:` message. Here is an example of initializing a DataFrame with rows:
92
92
93
93
```smalltalk
@@ -123,15 +123,34 @@ B | Dubai 2.789 true
123
123
C | London 8.788 false
124
124
```
125
125
126
+
#### 2. Creating DataFrame from a Matrix
127
+
By its nature, a DataFrame is similar to a matrix. It works like a table of values, supports matrix accessors such as `at:at:` and `at:at:put:`, and in some cases can be treated like a matrix. Some classes provide tabular data in matrix format, for example the TabularWorksheet class of the [Tabular]() package, which is used for reading XLSX files. To initialize a DataFrame from a matrix of values, use the `fromMatrix:` method:
128
+
129
+
```smalltalk
130
+
matrix := Matrix
131
+
rows: 3 columns: 3
132
+
contents:
133
+
#('Barcelona' 1.609 true
134
+
'Dubai' 2.789 true
135
+
'London' 8.788 false).
136
+
137
+
df := DataFrame fromMatrix: matrix.
138
+
```
139
+
140
+
Once again, the names of rows and columns are set to their default values.
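
The matrix accessors mentioned above can be tried on the resulting DataFrame. The snippet below is only a sketch: it assumes that `at:at:` and `at:at:put:` take the row index first and the column index second, as they do for `Matrix`, and the value written in the last line is made up for illustration.

```smalltalk
"Same 3x3 matrix as in the example above."
matrix := Matrix
    rows: 3 columns: 3
    contents:
        #('Barcelona' 1.609 true
          'Dubai' 2.789 true
          'London' 8.788 false).

df := DataFrame fromMatrix: matrix.

"Read one cell, assuming the first index is the row and the second is the column."
df at: 2 at: 1.

"Overwrite one cell with a made-up value."
df at: 3 at: 2 put: 8.9.
```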
141
+
126
142
#### 3. Reading data from file
127
-
This is the most common way of creating a data frame. You have some dataset in a file (CSV, Excel etc.) - just ask a DataFrame to read it. At this point only CSV files are supported, but very soon you will also be able to read the data from other formats.
143
+
In most real-world scenarios the data is located in a file or a database. Support for database connections will be added in future releases. Right now DataFrame provides methods for loading data from the two most common file formats, CSV and XLSX:
128
144
129
145
```smalltalk
130
-
df := DataFrame fromCSV: 'path/to/your/file.csv'.
146
+
DataFrame fromCSV: 'path/to/your/file.csv'.
147
+
DataFrame fromXLSX: 'path/to/your/file.xlsx'.
131
148
```
132
149
133
-
### 4. Loading the built-in datasets
134
-
DataFrame provides several famous datasets for you to play with. They are compact and can be loaded with a simple message. An this point there are three datasets that can be loaded in this way - [Iris flower dataset](https://en.wikipedia.org/wiki/Iris_flower_data_set), a simplified [Boston Housing dataset](https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data), and a tipping dataset.
150
+
Since JSON does not store data as a table, it is not possible to read such a file directly into a DataFrame. However, you can parse JSON using [NeoJSON](https://ci.inria.fr/pharo-contribution/job/EnterprisePharoBook/lastSuccessfulBuild/artifact/book-result/NeoJSON/NeoJSON.html) or any other library, construct an array of rows, and pass it to the `fromRows:` message, as described in the previous sections.
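
The JSON route described above could look roughly like this. It is a minimal sketch, not part of the library: the JSON text and its keys (`city`, `population`, `beach`) are hypothetical, and it assumes the input is an array of objects that all share the same keys. `NeoJSONReader fromString:` answers an Array of Dictionaries for such input.

```smalltalk
"Hypothetical JSON input: an array of objects with identical keys."
json := '[
  {"city": "Barcelona", "population": 1.609, "beach": true},
  {"city": "Dubai", "population": 2.789, "beach": true},
  {"city": "London", "population": 8.788, "beach": false}
]'.

"Parse the text into an Array of Dictionaries."
records := NeoJSONReader fromString: json.

"Read the values in a fixed key order so that every row has the same layout."
keys := #('city' 'population' 'beach').
rows := records collect: [ :record |
    keys collect: [ :key | record at: key ] ].

df := DataFrame fromRows: rows.
```

Fixing the key order explicitly matters because a Dictionary does not guarantee the order of its keys, so collecting the values directly could put them in different columns from one row to the next.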
151
+
152
+
#### 4. Loading the built-in datasets
153
+
DataFrame provides several famous datasets for you to play with. They are compact and can be loaded with a simple message. At this point there are three datasets that can be loaded in this way: the [Iris flower dataset](https://en.wikipedia.org/wiki/Iris_flower_data_set), a simplified [Boston Housing dataset](https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data), and the [Restaurant tipping dataset](https://vincentarelbundock.github.io/Rdatasets/doc/reshape2/tips.html).