Commit d3e346b

added arrow between node into think-bayes2 example
1 parent a3c2b97 commit d3e346b

6 files changed: +124 -52 lines changed

README.md

Lines changed: 7 additions & 1 deletion
@@ -57,7 +57,13 @@ The [`result`][res] of run is SVG file with clickable nodes to provide the way t
 
 [res]: https://asmirnov69.github.io/pyjviz-poc/docs/why-janitor.py.ttl.dot.svg
 
-pyjviz visualization of pyjanitor method chains is based on dumping of RDF log of pyjanitor method calls into rdf log file. Resulting RDF log file contains graph of method calls where user could trace method execution as well as user-defined data useful for visual inspection. Note that visualisation of pyjviz RDF log is not a main goal of provided package. Graphviz visualization avaiable in the package is rather reference implementation with quite limited capablities. However RDF structure defined in rdflog.shacl.ttl could be used by SPARQL processor for visualization and other tasks.
+## How it works
+
+`pyjviz` visualization of `pyjanitor` method chains is based on dumping RDF triples that describe `pyjanitor` method calls into a log file. Each call of a pyjanitor-registered method (including native pandas method calls) is traced, and the relevant data is saved into the RDF log. The resulting RDF log file contains a graph of method calls in which the user can find the trace of method execution as well as other data useful for visual inspection. The structure of the RDF graph saved into the log file is described in rdflog.shacl.ttl.
+
+Note that visualization of the pyjviz RDF log is not the main goal of the package. The Graphviz-based visualization available in the package is a reference implementation with quite limited (but still useful) capabilities. The RDF log is kept in files and queried using the SPARQL implementation provided by rdflib.
+
+--------
 
 Obj is the representation of a pyjanitor object such as a pandas DataFrame. However, input args are not objects but object states. The state of an object is represented by the RDF class ObjState. Separating an object from its state lets pyjviz visualize situations where an object has multiple states used in a method chain due to in-place operations. Such practice is discouraged by most data packages but may still be used. In most cases an object has a single state, defined when the object is created, so there is no difference between the object and its state: the correspondence is one-to-one (an isomorphism). So in some contexts below, a reference to an object may imply the object state instead.
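The object/state split described in the README context line above can be illustrated with a small sketch. This is not pyjviz code: `TrackedObj` and `new_state` are hypothetical names, and a plain list stands in for a DataFrame. Only the attribute name `last_obj_state_uri` is borrowed from the `arrow` helper added in this commit.

```python
from dataclasses import dataclass, field


@dataclass
class TrackedObj:
    """Hypothetical tracker: one object identity, possibly many states."""
    obj_id: int
    state_uris: list = field(default_factory=list)

    def new_state(self):
        # Each in-place change is recorded as a fresh state URI.
        uri = f"<ObjState#{self.obj_id}-{len(self.state_uris)}>"
        self.state_uris.append(uri)
        return uri

    @property
    def last_obj_state_uri(self):
        return self.state_uris[-1]


data = [3, 1, 2]                      # stand-in for a DataFrame
tracked = TrackedObj(obj_id=id(data))
created = tracked.new_state()         # state when the object is first seen
data.sort()                           # in-place operation: same object, new state
after_sort = tracked.new_state()

print(len(tracked.state_uris))
print(tracked.last_obj_state_uri == after_sort)
```

When an object is never mutated in place, `state_uris` holds a single entry, which is the one-to-one case the README mentions.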

examples/notebooks/conditional-join-w-cmp.ipynb

Lines changed: 90 additions & 50 deletions
Large diffs are not rendered by default.

pyjviz/__init__.py

Lines changed: 1 addition & 0 deletions
@@ -3,6 +3,7 @@
 from .wb_stack_entries import *
 from .obj_utils import *
 from .viz import *
+from .rdf_node import arrow
 
 from .pf_pandas import enable_pf_pandas__

pyjviz/rdf_node.py

Lines changed: 11 additions & 0 deletions
@@ -1,6 +1,7 @@
 import uuid
 
 from . import fstriplestore
+from . import obj_tracking
 
 class RDFNode:
     def __init__(self, rdf_type, label):
@@ -14,3 +15,13 @@ def __init__(self, rdf_type, label):
         label_obj = f'"{self.label}"' if self.label else 'rdf:nil'
         rdfl.dump_triple(self.uri, "rdf:label", label_obj)
 
+def arrow(from_obj, arrow_label, to_obj):
+    rdfl = fstriplestore.triple_store
+    arrow_uri = f"<Arrow#{str(uuid.uuid4())}>"
+    rdfl.dump_triple(arrow_uri, "rdf:type", "<Arrow>")
+    t_from_obj, from_found = obj_tracking.tracking_store.get_tracking_obj(from_obj)
+    t_to_obj, to_found = obj_tracking.tracking_store.get_tracking_obj(to_obj)
+    rdfl.dump_triple(arrow_uri, "<arrow-from>", t_from_obj.last_obj_state_uri)
+    rdfl.dump_triple(arrow_uri, "<arrow-to>", t_to_obj.last_obj_state_uri)
+    rdfl.dump_triple(arrow_uri, "<arrow-label>", '"' + arrow_label + '"')
+
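To make the effect of `arrow` easier to review, here is a minimal self-contained sketch of the four triples it writes per call. `ListTripleStore` and `arrow_triples` are hypothetical stand-ins (the real code uses `fstriplestore.triple_store` and looks up state URIs through `obj_tracking`); the state URIs are passed in directly here as an assumption.

```python
import uuid


class ListTripleStore:
    """Hypothetical stand-in for the pyjviz triple store: collects triples in a list."""

    def __init__(self):
        self.triples = []

    def dump_triple(self, subj, pred, obj):
        self.triples.append((subj, pred, obj))


def arrow_triples(store, from_state_uri, arrow_label, to_state_uri):
    """Emit the four triples that describe one arrow between two object states."""
    arrow_uri = f"<Arrow#{uuid.uuid4()}>"
    store.dump_triple(arrow_uri, "rdf:type", "<Arrow>")
    store.dump_triple(arrow_uri, "<arrow-from>", from_state_uri)
    store.dump_triple(arrow_uri, "<arrow-to>", to_state_uri)
    store.dump_triple(arrow_uri, "<arrow-label>", f'"{arrow_label}"')
    return arrow_uri


store = ListTripleStore()
uri = arrow_triples(store, "<ObjState#a>", "prior to posterior", "<ObjState#b>")
for triple in store.triples:
    print(*triple)
```

Note that the label is wrapped in double quotes without escaping, as in the committed code, so a label that itself contains a double quote would likely produce a malformed literal.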

pyjviz/viz.py

Lines changed: 9 additions & 1 deletion
@@ -297,7 +297,15 @@ def dump_dot_code(g, vertical, show_objects, popup_output):
         print(f"""
         node_{uri_to_dot_id(to_obj)} -> node_{uri_to_dot_id(from_obj)} [label="{pred_s}", penwidth=2.5];
         """, file = out_fd)
-
+
+    if 1:
+        rq = "select ?from ?to ?arrow_label { [] rdf:type <Arrow>; <arrow-from> ?from; <arrow-to> ?to; <arrow-label> ?arrow_label }"
+        for from_obj, to_obj, arrow_label in g.query(rq, base = fstriplestore.base_uri):
+            print(f"""
+            node_{uri_to_dot_id(from_obj)} -> node_{uri_to_dot_id(to_obj)} [label="{arrow_label.toPython()}", penwidth=4.5];
+            """, file = out_fd)
+
+
 
     print("}", file = out_fd)
     return out_fd.getvalue()
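The new block turns each `(from, to, label)` query row into one DOT edge. A rough standalone sketch of that step follows; `uri_to_dot_id` here is a hypothetical reimplementation (the real helper lives in viz.py and may behave differently), and the label escaping is an addition that the committed code does not perform.

```python
import re


def uri_to_dot_id(uri):
    # Hypothetical stand-in: map every non-alphanumeric character to "_".
    return re.sub(r"[^0-9A-Za-z]", "_", uri)


def arrow_edges(rows):
    """Build one DOT edge statement per (from_uri, to_uri, label) row."""
    lines = []
    for from_uri, to_uri, label in rows:
        # Escape backslashes and double quotes so the label stays a valid DOT string.
        safe_label = label.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(
            f"node_{uri_to_dot_id(from_uri)} -> node_{uri_to_dot_id(to_uri)} "
            f'[label="{safe_label}", penwidth=4.5];'
        )
    return lines


rows = [("ObjState#1", "ObjState#2", 'says "hi"')]
print("\n".join(arrow_edges(rows)))
```

Without the escaping step, a label containing a double quote would end the DOT string early, which may be worth handling in `dump_dot_code` as well.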

rdflog.shacl.ttl

Lines changed: 6 additions & 0 deletions
@@ -7,6 +7,12 @@
 @prefix sh: <http://www.w3.org/ns/shacl#> .
 @prefix dash: <http://datashapes.org/dash#> .
 
+<Arrow> rdf:type rdfs:Class; rdf:type sh:NodeShape; dash:closedByType true;
+    sh:property [sh:path rdf:type; sh:minCount 1; sh:maxCount 1; sh:class <Arrow>];
+    sh:property [sh:path <arrow-from>; sh:minCount 1; sh:maxCount 1; sh:class <RDFNode>];
+    sh:property [sh:path <arrow-to>; sh:minCount 1; sh:maxCount 1; sh:class <RDFNode>];
+    sh:property [sh:path <arrow-label>; sh:minCount 1; sh:maxCount 1; sh:datatype xsd:string].
+
 <WithBlock> rdf:type rdfs:Class; rdf:type sh:NodeShape; dash:closedByType true;
     sh:property [sh:path rdf:type; sh:minCount 1; sh:maxCount 1; sh:class <WithBlock>];
     sh:property [sh:path rdf:label; sh:minCount 1; sh:maxCount 1; sh:datatype xsd:string];
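The `<Arrow>` shape asks for exactly one value per property (`sh:minCount 1; sh:maxCount 1`) and is closed by type. The sketch below checks only those cardinality and closedness rules in plain Python, as a rough approximation and not a SHACL validator. It assumes the predicate names written by `arrow` in rdf_node.py (`<arrow-from>`, `<arrow-to>`, `<arrow-label>`); `arrow_violations` is a hypothetical helper.

```python
from collections import Counter

# Predicates an <Arrow> node is expected to carry exactly once (assumed from rdf_node.py).
REQUIRED = ("rdf:type", "<arrow-from>", "<arrow-to>", "<arrow-label>")


def arrow_violations(triples, arrow_uri):
    """Return human-readable problems for one arrow node; an empty list means it passes."""
    counts = Counter(pred for subj, pred, _ in triples if subj == arrow_uri)
    problems = [
        f"{pred}: expected 1 value, found {counts[pred]}"
        for pred in REQUIRED
        if counts[pred] != 1
    ]
    # A closed shape rejects predicates that are not declared on it.
    problems += [
        f"{pred}: not allowed by the closed shape"
        for pred in counts
        if pred not in REQUIRED
    ]
    return problems


good = [
    ("<Arrow#1>", "rdf:type", "<Arrow>"),
    ("<Arrow#1>", "<arrow-from>", "<ObjState#a>"),
    ("<Arrow#1>", "<arrow-to>", "<ObjState#b>"),
    ("<Arrow#1>", "<arrow-label>", '"label"'),
]
print(arrow_violations(good, "<Arrow#1>"))
print(arrow_violations(good[:3], "<Arrow#1>"))
```

A check like this would flag any mismatch between the predicate names in the shape and the ones the Python code emits.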
