Commit 6a7e928

[pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
1 parent a822d4d commit 6a7e928

25 files changed: +276 -204 lines changed

README.md

Lines changed: 2 additions & 4 deletions
Original file line number | Diff line number | Diff line change
@@ -59,7 +59,7 @@ The [`resulting SVG file`][res] of the run shows method chain calls along with i
5959

6060
## How it works?
6161

62-
`pyjviz` provides the way to create logfile which contains RDF graph of program behaviour. The visualization features of pyjviz provided in the package itself are based on RDF graph translation to graphviz dot lanuage. `pyjanitor` method chains are represented using certain RDF data schema (ref here to shacl defs). Using `pandas` extentions API `pyjanitor` (and `pandas`) method call arguments and returns are saved into RDF log.
62+
`pyjviz` provides the way to create logfile which contains RDF graph of program behaviour. The visualization features of pyjviz provided in the package itself are based on RDF graph translation to graphviz dot lanuage. `pyjanitor` method chains are represented using certain RDF data schema (ref here to shacl defs). Using `pandas` extentions API `pyjanitor` (and `pandas`) method call arguments and returns are saved into RDF log.
6363

6464
> **Note**
6565
> Visualisation of pyjviz RDF graph is not a main goal of provided package. Graphviz-based visualization avaiable in the package is rather reference implementation with quite limited (but still useful) capablities.
@@ -68,12 +68,10 @@ Python objects from `pyjviz` point of view have `object identity` and `object st
6868

6969
E.g. the simplest form of pandas dataframe 'carbon copy' can be obtained via using output of method head() then converted to HTML format - result of df.head().to_html() call. More comprehensive CC would be dataframe plot as generated by .plot method and saved as byte sequence. Note that 'carbon copy' is not necessary capture all details of original object state. If one need to have precise object state she would have to use CC class which guarantee that. CC like that would be based on .to_csv method in example above.
7070

71-
The way how particular call argument/return or other python objects are saved into RDF log is specified using CCGlance `carbon copy` class. For pandas dataframe it will save just shape of dataframe and its head() output serialized as HTML. If user wants to have other CC of the object it is always possible to use .cc() method ((ref here, rename .pin() to .cc())
71+
The way how particular call argument/return or other python objects are saved into RDF log is specified using CCGlance `carbon copy` class. For pandas dataframe it will save just shape of dataframe and its head() output serialized as HTML. If user wants to have other CC of the object it is always possible to use .cc() method ((ref here, rename .pin() to .cc())
7272

7373
--------
7474

7575
Obj is representation of pyjanitor object like pandas DataFrame. However input args are not objects rather object states. The state of object is represeneted by RDF class ObjState. The idea to separate object and object state is introduced to enable pyjviz to visualize situation when object has mutliple states used in method chain due to in-place operations. Such practice is discouraged by most of data packages but still may be used. In most cases where object has only state defined when object is created there is not difference betwen object and object state since there is one-to-one correspondence (isomorfism). So in some context below refernce to an object may imply object state instead.
7676

7777
pyjviz also introduce MethodCall RDF class. It represents pyjanitor method call. MethodCall object has incoming links from input objects and outgoing link an object representing retirn object.
78-
79-
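
The README text in this hunk separates an object from its states and links them through method calls. A minimal sketch of that idea follows, assuming nothing about pyjviz internals: the `ObjState` and `MethodCall` names are borrowed from the README, but the fields, the `Tracker` class, and the use of `repr()` as a cheap 'carbon copy' are hypothetical illustrations, not the package's API.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class ObjState:
    """One observed state of an object (hypothetical stand-in for the RDF class)."""
    obj_id: int   # identity of the underlying Python object
    version: int  # 0 for the first observed state, +1 for each later one
    glance: str   # cheap 'carbon copy' of the state; here simply repr()


@dataclass
class MethodCall:
    """A call with links from its input states and to its returned state."""
    name: str
    inputs: List[ObjState]
    output: ObjState


class Tracker:
    """Hypothetical recorder: one identity may accumulate several states."""

    def __init__(self) -> None:
        self._latest: Dict[int, ObjState] = {}
        self._keepalive: List[Any] = []  # hold references so ids are not reused
        self.calls: List[MethodCall] = []

    def state_of(self, obj: Any) -> ObjState:
        glance = repr(obj)
        prev = self._latest.get(id(obj))
        if prev is not None and prev.glance == glance:
            return prev  # unchanged object: reuse the known state
        if prev is None:
            self._keepalive.append(obj)
        version = 0 if prev is None else prev.version + 1
        state = ObjState(id(obj), version, glance)
        self._latest[id(obj)] = state
        return state

    def record(self, name: str, func: Callable[..., Any], *args: Any) -> Any:
        inputs = [self.state_of(a) for a in args]  # snapshot before the call
        result = func(*args)
        self.calls.append(MethodCall(name, inputs, self.state_of(result)))
        return result


def sort_in_place(xs: list) -> list:
    xs.sort()  # mutates its argument, so the same object gains a new state
    return xs


tracker = Tracker()
data = [3, 1, 2]
tracker.record("sorted", sorted, data)                # returns a new object
tracker.record("sort_in_place", sort_in_place, data)  # same object, new state
```

In the first recorded call the input and output belong to different objects, so object and state coincide one-to-one. In the second, the input and output share an identity but differ in version, which is the in-place situation the README describes.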

doc/tmp/tmp2eu_4xcg.html

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -31,4 +31,4 @@
3131
<td>800.0</td>
3232
</tr>
3333
</tbody>
34-
</table>
34+
</table>

doc/tmp/tmpc3kafay7.html

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -27,4 +27,4 @@
2727
<td>675.0</td>
2828
</tr>
2929
</tbody>
30-
</table>
30+
</table>

doc/tmp/tmpd_ffzcp4.html

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -33,4 +33,4 @@
3333
<td>675.0</td>
3434
</tr>
3535
</tbody>
36-
</table>
36+
</table>

doc/tmp/tmpij_bu_q6.html

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -27,4 +27,4 @@
2727
<td>675.0</td>
2828
</tr>
2929
</tbody>
30-
</table>
30+
</table>

doc/tmp/tmpqji_p6l7.html

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -27,4 +27,4 @@
2727
<td>675.0</td>
2828
</tr>
2929
</tbody>
30-
</table>
30+
</table>

doc/tmp/tmpt16kh8sh.html

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -38,4 +38,4 @@
3838
<td>675.0</td>
3939
</tr>
4040
</tbody>
41-
</table>
41+
</table>

examples/image-processing/image-processing.py

Lines changed: 63 additions & 40 deletions
Original file line numberDiff line numberDiff line change
@@ -18,92 +18,115 @@
1818
# apply overload causes problems in printing of dataframes
1919
# so it is included here to make nested call visualization work
2020
old_apply = pd.Series.apply
21+
22+
2123
@pf.register_series_method
2224
def apply(s: pd.Series, func) -> pd.Series:
2325
ret = old_apply(s, func)
2426
return ret
2527

28+
2629
@pf.register_series_method
2730
def load_images(file_pathes: pd.Series) -> pd.DataFrame:
28-
#ipdb.set_trace()
31+
# ipdb.set_trace()
2932
df = pd.DataFrame()
3033
for file_path in file_pathes:
3134
x_image = imread(file_path)
3235
im_name = os.path.basename(file_path)
33-
df = df.append({'im_name': im_name, 'image': x_image}, ignore_index = True)
34-
36+
df = df.append(
37+
{"im_name": im_name, "image": x_image}, ignore_index=True
38+
)
39+
3540
return df
3641

42+
3743
@pf.register_dataframe_method
3844
def subplot(df: pd.DataFrame, *, image_col, title_col, title):
3945
return df
4046

47+
4148
@pf.register_dataframe_method
4249
def binarize_images(df: pd.DataFrame, thresholding_method) -> pd.DataFrame:
43-
df['gray_leaf'] = df.image.apply(rgb2gray)
44-
df['binarized'] = None
50+
df["gray_leaf"] = df.image.apply(rgb2gray)
51+
df["binarized"] = None
4552
for t in df.itertuples():
4653
thresh = thresholding_method(t.gray_leaf)
47-
df.at[t.Index, 'binarized'] = (t.gray_leaf < thresh)
54+
df.at[t.Index, "binarized"] = t.gray_leaf < thresh
4855
return df
4956

57+
5058
@pf.register_dataframe_method
5159
def morphology(df: pd.DataFrame) -> pd.DataFrame:
52-
df['closed'] = df.binarized.apply(area_closing)
53-
df['opened'] = df.closed.apply(area_opening)
60+
df["closed"] = df.binarized.apply(area_closing)
61+
df["opened"] = df.closed.apply(area_opening)
5462
return df
5563

64+
5665
@pf.register_dataframe_method
5766
def labeling(df):
58-
df['label_im'] = df.opened.apply(label)
59-
df['regions'] = df.label_im.apply(regionprops)
67+
df["label_im"] = df.opened.apply(label)
68+
df["regions"] = df.label_im.apply(regionprops)
6069
return df
6170

71+
6272
@pf.register_dataframe_method
6373
def get_properties_of_each_region(df: pd.DataFrame) -> pd.DataFrame:
64-
properties = ['area','convex_area','bbox_area',
65-
'major_axis_length', 'minor_axis_length',
66-
'perimeter', 'equivalent_diameter',
67-
'mean_intensity', 'solidity', 'eccentricity']
74+
properties = [
75+
"area",
76+
"convex_area",
77+
"bbox_area",
78+
"major_axis_length",
79+
"minor_axis_length",
80+
"perimeter",
81+
"equivalent_diameter",
82+
"mean_intensity",
83+
"solidity",
84+
"eccentricity",
85+
]
6886
res_df = []
6987
for t in df.itertuples():
70-
#ipdb.set_trace()
71-
p_df = pd.DataFrame(regionprops_table(t.label_im, t.gray_leaf, properties=properties))
88+
# ipdb.set_trace()
89+
p_df = pd.DataFrame(
90+
regionprops_table(t.label_im, t.gray_leaf, properties=properties)
91+
)
7292
p_df = p_df[(p_df.index != 0) & (p_df.area > 100)]
73-
p_df['im_name'] = t.im_name
93+
p_df["im_name"] = t.im_name
7494
res_df.append(p_df)
7595
return pd.concat(res_df)
7696

97+
7798
@pf.register_dataframe_method
7899
def apply_feature_engeneering(df: pd.DataFrame) -> pd.DataFrame:
79-
df['ratio_length'] = (df['major_axis_length'] / df['minor_axis_length'])
80-
df['perimeter_ratio_major'] = (df['perimeter'] / df['major_axis_length'])
81-
df['perimeter_ratio_minor'] = (df['perimeter'] / df['minor_axis_length'])
82-
df['area_ratio_convex'] = df['area'] / df['convex_area']
83-
df['area_ratio_bbox'] = df['area'] / df['bbox_area']
84-
df['peri_over_dia'] = df['perimeter'] / df['equivalent_diameter']
85-
final_df = df[df.drop('type', axis=1).columns].astype(float)
100+
df["ratio_length"] = df["major_axis_length"] / df["minor_axis_length"]
101+
df["perimeter_ratio_major"] = df["perimeter"] / df["major_axis_length"]
102+
df["perimeter_ratio_minor"] = df["perimeter"] / df["minor_axis_length"]
103+
df["area_ratio_convex"] = df["area"] / df["convex_area"]
104+
df["area_ratio_bbox"] = df["area"] / df["bbox_area"]
105+
df["peri_over_dia"] = df["perimeter"] / df["equivalent_diameter"]
106+
final_df = df[df.drop("type", axis=1).columns].astype(float)
86107
final_df = final_df.replace(np.inf, 0)
87-
final_df['type'] = df['type']
108+
final_df["type"] = df["type"]
88109
return final_df
89110

111+
90112
file_pathes = pd.Series(glob.glob("dataset/*.jpg"))
91113

92114
with pyjviz.CB("initial-phase") as cc:
93-
initial_phase_df = (file_pathes
94-
.load_images()#.subplot(image_col = 'image', title_col = 'im_name', title = '(Original Image by Gino Borja, AIM)')
95-
.binarize_images(threshold_otsu)#.subplot(image_col = 'binarized', title_col = file_pathes, title = 'binarized')
96-
.morphology()#.subplot(image_col = 'opened', title_col = file_pathes, title = 'opened')
97-
.labeling()#.subplot(image_col = 'label_im', title_col = file_pathes, title = 'labeled')
98-
)
99-
if 1:
115+
initial_phase_df = (
116+
file_pathes.load_images() # .subplot(image_col = 'image', title_col = 'im_name', title = '(Original Image by Gino Borja, AIM)')
117+
.binarize_images(
118+
threshold_otsu
119+
) # .subplot(image_col = 'binarized', title_col = file_pathes, title = 'binarized')
120+
.morphology() # .subplot(image_col = 'opened', title_col = file_pathes, title = 'opened')
121+
.labeling() # .subplot(image_col = 'label_im', title_col = file_pathes, title = 'labeled')
122+
)
123+
if 1:
100124
with pyjviz.CB("build-features"):
101-
final_df = (initial_phase_df
102-
.get_properties_of_each_region()
103-
.assign(type = lambda x: x.im_name.apply(lambda x: x.split('.')[0]))
104-
.drop(columns = 'im_name')
105-
.apply_feature_engeneering()
106-
)
107-
108-
pyjviz.save_dot(vertical = True)
125+
final_df = (
126+
initial_phase_df.get_properties_of_each_region()
127+
.assign(type=lambda x: x.im_name.apply(lambda x: x.split(".")[0]))
128+
.drop(columns="im_name")
129+
.apply_feature_engeneering()
130+
)
109131

132+
pyjviz.save_dot(vertical=True)

examples/scripts/a-1.py

Lines changed: 13 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -7,34 +7,36 @@
77
import typing
88
import pandas as pd
99

10-
TestDF = typing.NewType('TestDF', pd.DataFrame)
11-
TestDF.columns = ['a']
10+
TestDF = typing.NewType("TestDF", pd.DataFrame)
11+
TestDF.columns = ["a"]
12+
1213

1314
@pf.register_dataframe_method
1415
def a0(df: pd.DataFrame) -> TestDF:
1516
print("a0")
1617
return pd.DataFrame(df)
17-
#return df
18+
# return df
19+
1820

1921
if __name__ == "__main__":
2022
print(TestDF, TestDF.__name__, TestDF.__supertype__)
2123
print(TestDF.columns)
2224

23-
df = pd.DataFrame({'a': range(10)})
25+
df = pd.DataFrame({"a": range(10)})
2426

25-
#ipdb.set_trace()
27+
# ipdb.set_trace()
2628

27-
#df.obj_chain_path = "c"
28-
#pyjviz.curr_methods_chain_path = "c"
29+
# df.obj_chain_path = "c"
30+
# pyjviz.curr_methods_chain_path = "c"
2931
with pyjviz.CB("c"):
3032
df0 = df.a0()
3133
df1 = df.a0().a0()
3234
df2 = df.a0()
33-
34-
#df.obj_chain_path = None
35+
36+
# df.obj_chain_path = None
3537
print(df.describe())
36-
37-
#df1.obj_chain_path = None
38+
39+
# df1.obj_chain_path = None
3840
print(df1.describe())
3941

4042
pyjviz.save_dot()

examples/scripts/a0.py

Lines changed: 11 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,4 @@
1-
#import ipdb
1+
# import ipdb
22
import janitor
33
import pandas_flavor as pf
44
import pyjviz
@@ -7,29 +7,30 @@
77
import typing
88
import pandas as pd
99

10-
TestDF = typing.NewType('TestDF', pd.DataFrame)
11-
TestDF.columns = ['a']
10+
TestDF = typing.NewType("TestDF", pd.DataFrame)
11+
TestDF.columns = ["a"]
12+
1213

1314
@pf.register_dataframe_method
1415
def a0(df: pd.DataFrame) -> TestDF:
15-
#ipdb.set_trace()
16+
# ipdb.set_trace()
1617
print("a0")
1718
return pd.DataFrame(df)
18-
#return df
19+
# return df
20+
1921

20-
if __name__ == "__main__":
22+
if __name__ == "__main__":
2123
print(TestDF, TestDF.__name__, TestDF.__supertype__)
2224
print(TestDF.columns)
2325

24-
df = pd.DataFrame({'a': range(10)})
26+
df = pd.DataFrame({"a": range(10)})
2527

2628
with pyjviz.CB("c") as C:
2729
df0 = df.a0()
2830
df1 = df.a0().a0()
2931
df2 = df.a0()
3032

31-
#ipdb.set_trace()
33+
# ipdb.set_trace()
3234
print(df1.describe())
3335

34-
pyjviz.save_dot(show_objects = True)
35-
36+
pyjviz.save_dot(show_objects=True)
