README.md
Lines changed: 3 additions & 58 deletions
@@ -18,7 +18,7 @@ Data frames are one of the essential parts of the data science toolkit. They
18
18
A data frame is like a database inside a variable. It is an object which can be created, modified, copied, serialized, debugged, inspected, and garbage collected. It allows you to communicate with your data quickly and effortlessly, using just a few lines of code. The DataFrame project is similar to the [pandas](https://pandas.pydata.org/) library in Python or the built-in [data.frame](https://www.rdocumentation.org/packages/base/versions/3.5.3/topics/data.frame) class in R.
19
19
20
20
## Installation
21
-
The following script installs DataFrame into the Pharo image
21
+
To install DataFrame, open the Playground (`Ctrl+OW`) in a fresh Pharo image and execute the following Metacello script (select it and press the Do-it button or `Ctrl+D`):
22
22
23
23
```smalltalk
24
24
Metacello new
@@ -27,65 +27,10 @@ Metacello new
27
27
load.
28
28
```
29
29
30
+
## Simple example
31
+
30
32
## DataFrame Booklet
31
33
32
34
For more information, please read [Data Analysis Made Simple with Pharo DataFrame](https://github.com/SquareBracketAssociates/Booklet-DataFrame) - a booklet that serves as the main source of documentation for the DataFrame project. It describes the complete API of DataFrame and DataSeries data structures, and provides examples for each method.
This small example demonstrates how DataFrame can be used to collect and preprocess a dataset of methods. For more detailed information, read the [DataFrame booklet](https://github.com/SquareBracketAssociates/Booklet-DataFrame).
39
-
40
-
### Collecting all methods from the image
41
-
42
-
First, we collect an array of all methods in the image, that is, instances of the `CompiledMethod` class that belong to some package:
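The snippet this sentence introduces is collapsed in the diff. As a minimal sketch, assuming the standard Pharo reflection messages `allClassesAndTraits`, `methods`, and `package`, the collection step could look like the following. The variable name `methods` is taken from the next snippet; this is an illustration, not necessarily the code the README contained.

```smalltalk
"Sketch under stated assumptions, not the README's original code."

"Gather every compiled method defined by every class and trait in the image."
methods := Smalltalk allClassesAndTraits
	flatCollect: [ :aClass | aClass methods ].

"Keep only the methods that belong to some package."
methods := methods select: [ :method | method package isNotNil ].
```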
From each method, we extract its package name, class name, selector, and source code. We drop the first line of each method's source code, because that line holds the method's name:
48
-
```Smalltalk
49
-
rows := methods collect: [ :method |
50
-
{
51
-
method package name .
52
-
method methodClass name .
53
-
method selector .
54
-
method sourceCode copyAfter: Character cr
55
-
} ].
56
-
```
57
-
### Creating a DataFrame
58
-
We create a DataFrame and specify the names of its columns:
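The creation snippet is also collapsed in the diff. A minimal sketch, assuming the DataFrame class-side constructor `withRows:columnNames:` and the `rows` collection built above: the column names `methodName` and `className` appear elsewhere in this README, while `packageName` and `sourceCode` are hypothetical names chosen here to match the order of the extracted fields.

```smalltalk
"Sketch under stated assumptions: packageName and sourceCode are illustrative names."
"Each element of rows is { package name . class name . selector . source code }."
methodsData := DataFrame
	withRows: rows
	columnNames: #(packageName className methodName sourceCode).
```

Naming the columns up front lets later steps refer to them by symbol, as in `methodsData column: #methodName`.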
We add a new column with the number of arguments of each method. To do that, we count the occurrences of the `:` character in the method's name:
66
-
```Smalltalk
67
-
methodsData
68
-
addColumn: ((methodsData column: #methodName)
69
-
collect: [ :name | name occurrencesOf: $: ])
70
-
named: #numberOfArgs.
71
-
```
72
-
### Filtering data
73
-
Now we select only those methods that belong to the [Renraku](https://github.com/Uko/Renraku) package, have at least one argument, and have source code with fewer than 5 tokens:
First, we sort the methods by name, and then we sort the result by number of arguments in descending order:
82
-
```Smalltalk
83
-
renrakuMethods
84
-
sortBy: #methodName;
85
-
sortDescendingBy: #numberOfArgs.
86
-
```
87
-
### Selecting specific columns
88
-
We select only 4 columns (without className) and specify their order. If you inspect the result of this query, you will see a table similar to the one in the screenshot above.