Commit c892e30

looking on how to create pyjviz output example on github
1 parent 4962781 commit c892e30

8 files changed, with 475 additions and 2 deletions


README.md

Lines changed: 55 additions & 2 deletions
@@ -1,5 +1,9 @@
 # pyjviz
-visualization of pyjanitor method chains
+
+`pyjviz` is a Python package that gives programmers and data engineers visual support for their work with the `pyjanitor` package.
+`pyjviz` provides a simple way to see the flow of method call chains and their intermediate results.
+
+## Quick start
 
 To run the examples, install pyjanitor, rdflib and graphviz. After that you can install pyjviz:
 
@@ -11,8 +15,57 @@ cd examples/scripts
 python a0.py
 ```
 
-Resulting logs are in ~/.pyjviz/rdflog - visualized output stored in .png files.
+Resulting logs are in ~/.pyjviz/rdflog - visualized output stored in .svg files.
+
+## How does pyjviz help pyjanitor users?
+
+Consider the pyjanitor example why-janitor.py. A modified version is given below (also available here):
+
+```python
+import numpy as np
+import pandas as pd
+import janitor
+import pyjviz
+
+# Sample Data curated for this example
+company_sales = {
+    'SalesMonth': ['Jan', 'Feb', 'Mar', 'April'],
+    'Company1': [150.0, 200.0, 300.0, 400.0],
+    'Company2': [180.0, 250.0, np.nan, 500.0],
+    'Company3': [400.0, 500.0, 600.0, 675.0]
+}
+
+print(pd.DataFrame.from_dict(company_sales))
+#   SalesMonth  Company1  Company2  Company3
+# 0        Jan     150.0     180.0     400.0
+# 1        Feb     200.0     250.0     500.0
+# 2        Mar     300.0       NaN     600.0
+# 3      April     400.0     500.0     675.0
+
+with pyjviz.CB() as sg:
+    df = (
+        pd.DataFrame.from_dict(company_sales)
+        .remove_columns(["Company1"])
+        .dropna(subset=["Company2", "Company3"])
+        .rename_column("Company2", "Amazon")
+        .rename_column("Company3", "Facebook")
+        .add_column("Google", [450.0, 550.0, 800.0])
+    )
+
+    # Output looks like this:
+    # Out[15]:
+    #   SalesMonth  Amazon  Facebook  Google
+    # 0        Jan   180.0     400.0   450.0
+    # 1        Feb   250.0     500.0   550.0
+    # 3      April   500.0     675.0   800.0
+
+    # Comment out the line below to avoid spurious apply calls caused by the pandas printing implementation
+    print(df)
+
+pyjviz.save_dot(vertical=True)
+```
 
+Besides the usual output to stdout, the code will produce this SVG file.
 pyjviz visualizes pyjanitor method pipes (or chains) by dumping an RDF log of pyjanitor method calls into an RDF log file. The resulting RDF log file contains a graph of method calls in which the user can trace method execution, as well as user-defined data useful for visual inspection. Note that visualization of the pyjviz RDF log is not the main goal of the package. The Graphviz visualization available in the package is rather a reference implementation with quite limited capabilities. However, the RDF structure defined in rdflog.shacl.ttl could be used by a SPARQL processor for visualization and other tasks.
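The idea of tracing method execution through a graph of method calls can be sketched with plain Python tuples standing in for RDF triples. This is only an illustration: the predicate names (`method_name`, `input`, `output`) and the node identifiers are hypothetical and are not taken from rdflog.shacl.ttl, and a real SPARQL processor would express the same traversal as a query over the log file.

```python
# Hypothetical vocabulary: these names are illustrative only and are NOT
# the ones defined in rdflog.shacl.ttl.
Triple = tuple[str, str, str]

# A tiny log: each method call consumes one object state and produces another.
LOG: list[Triple] = [
    ("call1", "method_name", "remove_columns"),
    ("call1", "input", "state0"),
    ("call1", "output", "state1"),
    ("call2", "method_name", "dropna"),
    ("call2", "input", "state1"),
    ("call2", "output", "state2"),
    ("call3", "method_name", "rename_column"),
    ("call3", "input", "state2"),
    ("call3", "output", "state3"),
]


def objects(log: list[Triple], subject: str, predicate: str) -> list[str]:
    """Return every object o such that (subject, predicate, o) is in the log."""
    return [o for s, p, o in log if s == subject and p == predicate]


def subjects(log: list[Triple], predicate: str, obj: str) -> list[str]:
    """Return every subject s such that (s, predicate, obj) is in the log."""
    return [s for s, p, o in log if p == predicate and o == obj]


def trace_chain(log: list[Triple], start_state: str) -> list[str]:
    """Follow state -> call -> state links and collect method names in order."""
    names: list[str] = []
    seen: set[str] = set()
    state = start_state
    while state not in seen:  # guard against cycles in a malformed log
        seen.add(state)
        calls = subjects(log, "input", state)
        if not calls:
            break  # no call consumes this state: end of the chain
        call = calls[0]
        names.extend(objects(log, call, "method_name"))
        outputs = objects(log, call, "output")
        if not outputs:
            break
        state = outputs[0]
    return names


print(trace_chain(LOG, "state0"))
```

Starting from `state0`, the function walks the three calls in the order they were chained. The same pattern (match on predicate, follow the link) is what a graph query over the real log would do.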
 
 Obj is a representation of a pyjanitor object such as a pandas DataFrame. However, input args are not objects but rather object states. The state of an object is represented by the RDF class ObjState. The separation of object and object state is introduced so that pyjviz can visualize the situation where an object has multiple states used in a method chain due to in-place operations. Such practice is discouraged by most data packages but may still be used. In most cases an object has only the one state defined when it is created, so there is no difference between the object and its state, since there is a one-to-one correspondence (isomorphism). So in some contexts below, a reference to an object may imply the object state instead.
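The distinction between an object and its states can be illustrated with a small, self-contained sketch. The `Obj` and `ObjState` names follow the text above, but the fields, the `Tracker` class and its methods are assumptions made for illustration and do not reflect how pyjviz implements them. A plain list stands in for a DataFrame so that the example needs no dependencies.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ObjState:
    """One state of an object (hypothetical fields, for illustration only)."""
    obj_id: int
    version: int


@dataclass
class Obj:
    """An object together with every state it has gone through."""
    obj_id: int
    states: list = field(default_factory=list)

    def new_state(self) -> ObjState:
        state = ObjState(self.obj_id, len(self.states))
        self.states.append(state)
        return state


class Tracker:
    """Maps live Python objects to Obj records (hypothetical helper).

    Keys are id() values, so they are only meaningful while the tracked
    objects are still alive.
    """

    def __init__(self) -> None:
        self._objs: dict[int, Obj] = {}

    def current_state(self, py_obj) -> ObjState:
        obj = self._objs.get(id(py_obj))
        if obj is None:
            # First sighting: the object gets its single initial state.
            obj = Obj(obj_id=len(self._objs))
            self._objs[id(py_obj)] = obj
            obj.new_state()
        return obj.states[-1]

    def record_inplace_change(self, py_obj) -> ObjState:
        self.current_state(py_obj)  # make sure the object is registered
        return self._objs[id(py_obj)].new_state()


tracker = Tracker()
rows = [3, 1, 2]
s0 = tracker.current_state(rows)

chained = sorted(rows)            # returns a new object: new Obj, one state
c0 = tracker.current_state(chained)

rows.sort()                       # in-place: same Obj, second state
s1 = tracker.record_inplace_change(rows)

print(s0, s1, c0)
```

Here `chained` is in the common one-to-one case (one object, one state), while `rows` ends up with two states that share an `obj_id`, which is the situation the Obj/ObjState split is meant to make visible.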

doc/tmp/tmp1kgya8pf.html

Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
+<table border='1' class='dataframe'>
+  <thead>
+    <tr style='text-align: right;'>
+      <th></th>
+      <th>SalesMonth</th>
+      <th>Company2</th>
+      <th>Company3</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <th>0</th>
+      <td>Jan</td>
+      <td>180.0</td>
+      <td>400.0</td>
+    </tr>
+    <tr>
+      <th>1</th>
+      <td>Feb</td>
+      <td>250.0</td>
+      <td>500.0</td>
+    </tr>
+    <tr>
+      <th>3</th>
+      <td>April</td>
+      <td>500.0</td>
+      <td>675.0</td>
+    </tr>
+  </tbody>
+</table>

doc/tmp/tmp987a9g36.html

Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
+<table border='1' class='dataframe'>
+  <thead>
+    <tr style='text-align: right;'>
+      <th></th>
+      <th>SalesMonth</th>
+      <th>Company1</th>
+      <th>Company2</th>
+      <th>Company3</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <th>0</th>
+      <td>Jan</td>
+      <td>150.0</td>
+      <td>180.0</td>
+      <td>400.0</td>
+    </tr>
+    <tr>
+      <th>1</th>
+      <td>Feb</td>
+      <td>200.0</td>
+      <td>250.0</td>
+      <td>500.0</td>
+    </tr>
+    <tr>
+      <th>2</th>
+      <td>Mar</td>
+      <td>300.0</td>
+      <td>nan</td>
+      <td>600.0</td>
+    </tr>
+    <tr>
+      <th>3</th>
+      <td>April</td>
+      <td>400.0</td>
+      <td>500.0</td>
+      <td>675.0</td>
+    </tr>
+  </tbody>
+</table>

doc/tmp/tmp9bzyuxo5.html

Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
+<table border='1' class='dataframe'>
+  <thead>
+    <tr style='text-align: right;'>
+      <th></th>
+      <th>SalesMonth</th>
+      <th>Company2</th>
+      <th>Company3</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <th>0</th>
+      <td>Jan</td>
+      <td>180.0</td>
+      <td>400.0</td>
+    </tr>
+    <tr>
+      <th>1</th>
+      <td>Feb</td>
+      <td>250.0</td>
+      <td>500.0</td>
+    </tr>
+    <tr>
+      <th>2</th>
+      <td>Mar</td>
+      <td>nan</td>
+      <td>600.0</td>
+    </tr>
+    <tr>
+      <th>3</th>
+      <td>April</td>
+      <td>500.0</td>
+      <td>675.0</td>
+    </tr>
+  </tbody>
+</table>

doc/tmp/tmpf4mp97po.html

Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
+<table border='1' class='dataframe'>
+  <thead>
+    <tr style='text-align: right;'>
+      <th></th>
+      <th>SalesMonth</th>
+      <th>Amazon</th>
+      <th>Company3</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <th>0</th>
+      <td>Jan</td>
+      <td>180.0</td>
+      <td>400.0</td>
+    </tr>
+    <tr>
+      <th>1</th>
+      <td>Feb</td>
+      <td>250.0</td>
+      <td>500.0</td>
+    </tr>
+    <tr>
+      <th>3</th>
+      <td>April</td>
+      <td>500.0</td>
+      <td>675.0</td>
+    </tr>
+  </tbody>
+</table>

doc/tmp/tmphyl44fvd.html

Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
+<table border='1' class='dataframe'>
+  <thead>
+    <tr style='text-align: right;'>
+      <th></th>
+      <th>SalesMonth</th>
+      <th>Amazon</th>
+      <th>Facebook</th>
+      <th>Google</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <th>0</th>
+      <td>Jan</td>
+      <td>180.0</td>
+      <td>400.0</td>
+      <td>450.0</td>
+    </tr>
+    <tr>
+      <th>1</th>
+      <td>Feb</td>
+      <td>250.0</td>
+      <td>500.0</td>
+      <td>550.0</td>
+    </tr>
+    <tr>
+      <th>3</th>
+      <td>April</td>
+      <td>500.0</td>
+      <td>675.0</td>
+      <td>800.0</td>
+    </tr>
+  </tbody>
+</table>

doc/tmp/tmpul6mvb3o.html

Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
+<table border='1' class='dataframe'>
+  <thead>
+    <tr style='text-align: right;'>
+      <th></th>
+      <th>SalesMonth</th>
+      <th>Amazon</th>
+      <th>Facebook</th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <th>0</th>
+      <td>Jan</td>
+      <td>180.0</td>
+      <td>400.0</td>
+    </tr>
+    <tr>
+      <th>1</th>
+      <td>Feb</td>
+      <td>250.0</td>
+      <td>500.0</td>
+    </tr>
+    <tr>
+      <th>3</th>
+      <td>April</td>
+      <td>500.0</td>
+      <td>675.0</td>
+    </tr>
+  </tbody>
+</table>
