Commit a934d6b

AlvarBer committed
Release 0.6-beta
End of the second iteration
2 parents 42a3905 + 000d35f commit a934d6b

38 files changed, +829 -324 lines changed

docs/Makefile

Lines changed: 8 additions & 8 deletions
@@ -1,9 +1,9 @@
11
PDF := persimmon.pdf # PDF Main Target
2-
MARKDOWN := introduction.md state_of_the_art.md workflow.md milestones.md \
3-
risk_analysis.md interface.md implementation.md type_checking.md \
4-
postmortem.md # Markdown files
5-
MARKDOWN_COMPLUTENSE := introduction.md focus.md state_of_the_art.md \
6-
workflow.md milestones.md risk_analysis.md interface.md implementation.md \
2+
MARKDOWN := introduction.md literature.md workflow.md milestones.md \
3+
risk.md interface.md implementation.md type_checking.md \
4+
evaluation.md postmortem.md # Markdown files
5+
MARKDOWN_COMPLUTENSE := introduction.md focus.md literature.md \
6+
workflow.md milestones.md risk.md interface.md implementation.md \
77
type_checking.md analysis.md postmortem.md
88
APPENDICES := package_organization.md how.md # Appendix after bibliography
99
METADATA := metadata.yaml # Metadata files (Author, Date, Title, etc..)
@@ -30,7 +30,7 @@ $(PDF): $(MARKDOWN) $(APPENDIX) $(TEMPLATE) $(IMAGES) $(BIBLIOGRAPHY) $(CSL) $(M
3030
pandoc --smart --standalone --latex-engine xelatex --template $(TEMPLATE) \
3131
--bibliography $(BIBLIOGRAPHY) --csl $(CSL) --table-of-contents \
3232
--top-level-division chapter --metadata date:"$(shell date +%Y/%m/%d)" \
33-
--metadata sansfont:"Helvetica Neue LT Com" \
33+
--metadata sansfont:"Helvetica Neue LT Com" --highlight-style breezedark\
3434
$(METADATA) $(MARKDOWN) --include-after-body $(APPENDIX) -o $@
3535

3636
complutense: $(MARKDOWN_COMPLUTENSE) $(APPENDIX) $(TEMPLATE) $(IMAGES) $(BIBLIOGRAPHY) $(CSL) $(METADATA)
@@ -54,8 +54,8 @@ twocol: $(MARKDOWN) $(APPENDIX) $(TEMPLATE) $(IMAGES) $(BIBLIOGRAPHY) $(CSL) $(M
5454
# For standalone images
5555
images/%.pdf: graphs/%.tex
5656
xelatex $< > /dev/null
57-
mv $*.pdf images/
58-
rm -f $*.log $*.aux
57+
@mv $*.pdf images/
58+
@rm -f $*.log $*.aux
5959

6060
$(APPENDIX): $(APPENDICES)
6161
pandoc --smart --no-tex-ligatures --top-level-division chapter $(APPENDICES) -o $@

docs/graphs/hierarchical.tex

Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
1+
\documentclass{standalone}
2+
3+
\usepackage{tikz}
4+
5+
\begin{document}
6+
\begin{tikzpicture}[sibling distance=5em]
7+
\node {Blackboard}
8+
child { node {Block α}
9+
child { node {OutPin} } }
10+
child { node {Connection} }
11+
child { node {Block β}
12+
child { node {InPin} } };
13+
\end{tikzpicture}
14+
\end{document}

docs/graphs/logical.tex

Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
1+
\documentclass{standalone}
2+
3+
\usepackage{tikz}
4+
\usetikzlibrary{positioning}
5+
6+
\begin{document}
7+
\begin{tikzpicture}
8+
\node at (0, 0) (c) {Connection};
9+
\node [left = 1cm of c] (alpha) {Block α};
10+
\node [below = 1cm of alpha] (o) {OutPin};
11+
\draw [<->] (alpha) -- (o);
12+
\node [below left = 1cm and -1cm of c] (e) {end};
13+
\draw [<-] (o) -- (e);
14+
\node [right = 1cm of c] (beta) {Block β};
15+
\node [below = 1cm of beta] (i) {InPin};
16+
\draw [<->] (beta) -- (i);
17+
\node [below right = 1cm and -1cm of c] (s) {start};
18+
\draw [->] (s) -- (i);
19+
\draw [->] (c) -- (e);
20+
\draw [->] (c) -- (s);
21+
\draw [->] (o) -- (c);
22+
\draw [->] (i) -- (c);
23+
\end{tikzpicture}
24+
\end{document}

docs/persimmon.bib

Lines changed: 15 additions & 0 deletions
@@ -41,6 +41,21 @@ @online{sexy
4141
urldate = {2017-02-25},
4242
}
4343

44+
@online{hunt,
45+
title = {The Hunt For Unicorn Data Scientists Lifts Salaries For All Data Analytics Professionals},
46+
author = {Gil Press},
47+
url = {https://www.forbes.com/sites/gilpress/2015/10/09/the-hunt-for-unicorn-data-scientists-lifts-salaries-for-all-data-analytics-professionals/#38147ccc5258},
48+
year = {2015},
49+
urldate = {2017-04-03},
50+
}
51+
52+
@online{unicorn,
53+
title = {Data scientists: 'As rare as unicorns'},
54+
author = {Jeanne G. Harris and Ray Eitel-Porter},
55+
url = {https://www.theguardian.com/media-network/2015/feb/12/data-scientists-as-rare-as-unicorns},
56+
year = {2015},
57+
urldate = {2017-04-03},
58+
}
4459

4560
% Data Visualization
4661
@online{principles,

docs/src/evaluation.md

Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
1+
Evaluation
2+
==========
3+
4+
This chapter explains the evaluation process and how the survey was
5+
designed.
6+
7+
Method
8+
------
9+
The evaluation is based on in-place data collection, mainly a questionnaire, plus some
10+
additional information that is harvested by the system (mainly timings).
11+
12+
The questionnaire selected is the System Usability Scale.
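
For reference, the System Usability Scale is conventionally scored from ten answers on a 1 to 5 scale: odd-numbered items contribute `answer - 1`, even-numbered items contribute `5 - answer`, and the sum is multiplied by 2.5 to give a score between 0 and 100. The sketch below shows that arithmetic. The helper name `sus_score` is an assumption made for illustration and is not part of Persimmon.

```python
def sus_score(responses):
    """Compute a System Usability Scale score (0-100) from ten 1-5 answers."""
    if len(responses) != 10:
        raise ValueError("SUS expects exactly ten responses")
    if any(not 1 <= answer <= 5 for answer in responses):
        raise ValueError("each response must be between 1 and 5")

    total = 0
    for index, answer in enumerate(responses, start=1):
        # Odd items are positively worded, even items are negatively worded.
        total += (answer - 1) if index % 2 == 1 else (5 - answer)
    # Scale the 0-40 raw sum to the usual 0-100 range.
    return total * 2.5
```

A participant answering 3 to every item would get `sus_score([3] * 10)`, the midpoint of the scale.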
13+
14+
15+
Proposed tasks
16+
--------------
17+
The evaluation is composed of three different closed tasks.
18+
19+
* First task is the creation of a simple workflow, the objective of
20+
this task being to introduce Persimmon to the participants in the simplest
21+
terms.
22+
- First the participants have to load the iris file, using the csv input
23+
block and navigating the filesystem to get the file `iris.csv`.
24+
- Then they have to spawn the SVM block and connect the previous input
25+
block to this block, they do not need to change any of the parameters
26+
of the block.
27+
- After the SVM block has been placed, a cross validation block has to be
28+
spawned and connected to the result of the SVM block.
29+
- Finally the result of the cross validation has to be connected to a
30+
print output block.
31+
* Second task is modifying the previous workflow to create a more complex
32+
workflow. It is only slightly more complex than the previous one, but it
33+
introduces the concept of re-cabling to the participants.
34+
- Add a prediction block.
35+
- Save to file.
36+
* Third and final task. This one involves adding hyper-parameter tuning,
37+
which in turn means providing a dictionary with desired parameters.
38+
<!--
39+
- Create an entirely new workflow, either by putting it on the same
40+
blackboard or on a new one.
41+
-->
42+
- Use `gridsearch` for hyper-parameter tuning.
43+
- Use print output block again to return best hyper-parameters.
44+
45+
<!-- Actual evaluation -->

docs/src/implementation.md

Lines changed: 120 additions & 6 deletions
@@ -1,12 +1,13 @@
11
Implementation
22
==============
3-
<!-- High level overview + low level overview -->
3+
4+
The system is implemented in Python, using the `Kivy` framework for the
5+
frontend and multiple scientific tools such as `NumPy`, `SciPy`, `pandas` and
6+
most importantly `scikit-learn` for the backend.
47

58

69
First Iteration
710
---------------
8-
![Sketch of the first interface](images/sketch_1.png)
9-
1011
For the first iteration the priority was to get a proof of concept in order to
1112
see where the difficulties can appear, with a few simple classifiers and
1213
cross-validation techniques. As such a button-based interface with very limited
@@ -20,7 +21,7 @@ Trees, but gives good results in wide variety of problems.
2021
All these classifiers have few parameters on their respective sklearn
2122
implementations, and for this prototype the interface did not allow modifying
2223
any of them, as it would have cluttered it and it was not a necessary feature.
23-
Also all of them are classifiers, as it simplies the interface, since
24+
Also all of them are classifiers, as it simplifies the interface, since
2425
regressors and clustering have some incompatibilities.
2526

2627
Apart from the temporary interface the backend had to be built. Since the
@@ -33,8 +34,6 @@ executed those.
3334

3435
Second Iteration
3536
----------------
36-
![Sketch of the second interface](images/sketch_2.png)
37-
3837
For the second interface the drag and drop feel was the main priority.
3938
As such after developing the tab panel draggable boxes were developed, these
4039
boxes needed to be connected through pins.
@@ -100,6 +99,121 @@ receives it).
10099

101100
For more information about internal package distribution check appendix A.
102101

102+
103+
Making a Connection
104+
-------------------
105+
One of the most complex parts is the connection, reconnection and deletion of
106+
connections between blocks. It involves several actors, asynchronous callbacks
107+
and a very strong coupling between all elements.
108+
109+
![Widget Tree](images/hierarchical.pdf)
110+
111+
In order to understand how connections are made it is necessary to understand
112+
how `Kivy` handles input.
113+
At surface level `Kivy` follows the traditional event-based input management,
114+
with the event propagating downwards from the root.
115+
However, while traditionally input events are only passed down to components
116+
that are on the event position, `Kivy` passes the events to almost all children
117+
by default. This is done because on phones (one of `Kivy`'s targets is Android)
118+
gestures tend to start outside the actual widget they intend to affect.
119+
120+
In `Kivy` there are three main input events: `on_touch_down`, which is called
121+
when a touch begins (a finger or mouse button is pressed), `on_touch_move`, which fires when the touch is
122+
moved, i.e. a finger moves across the screen, or in this case when the mouse
123+
moves, and `on_touch_up` that is fired when the touch is released.
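
The propagation rule described above can be illustrated with a small plain-Python model. It is only a sketch of the idea and does not use `Kivy` itself: the classes `Touch`, `Widget` and `Pin` and the methods `dispatch` and `handle` are hypothetical stand-ins. The point it shows is that every child receives the event regardless of position, so each widget has to check for itself (here with `collide_point`) whether the touch concerns it.

```python
class Touch:
    """Hypothetical stand-in for a touch event carrying a position."""

    def __init__(self, x, y):
        self.pos = (x, y)


class Widget:
    """Toy widget: forwards every event to its children, wherever the touch is."""

    def __init__(self, name, x, y, width, height):
        self.name = name
        self.box = (x, y, width, height)
        self.children = []
        self.seen = []  # names of the events this widget has received

    def add_widget(self, child):
        self.children.append(child)

    def collide_point(self, x, y):
        left, bottom, width, height = self.box
        return left <= x <= left + width and bottom <= y <= bottom + height

    def handle(self, event, touch):
        """Return True to consume the event; the base widget never does."""
        return False

    def dispatch(self, event, touch):
        self.seen.append(event)
        if self.handle(event, touch):
            return True
        # Children are offered the event until one of them consumes it.
        return any(child.dispatch(event, touch) for child in self.children)


class Pin(Widget):
    def handle(self, event, touch):
        # The pin itself decides whether the touch is relevant to it.
        return event == "on_touch_down" and self.collide_point(*touch.pos)


blackboard = Widget("blackboard", 0, 0, 100, 100)
out_pin = Pin("out_pin", 10, 10, 5, 5)
in_pin = Pin("in_pin", 50, 50, 5, 5)
blackboard.add_widget(out_pin)
blackboard.add_widget(in_pin)

# The touch lands on in_pin, yet out_pin is notified too and must ignore it.
consumed = blackboard.dispatch("on_touch_down", Touch(52, 52))
missed = blackboard.dispatch("on_touch_down", Touch(90, 90))
```

In this model `out_pin` sees the first touch even though it lies outside its bounds, which is why the connection logic has to test the position explicitly before starting or finishing a connection.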
124+
125+
Let's represent the possible actions as use cases: the outer \* represents
126+
`on_touch_down`, - represents `on_touch_move`, and the inner \* `on_touch_up`:
127+
128+
* (On pin) Start a connection
129+
* (On connection) Modify a connection
130+
- Follow cursor
131+
- (On pin) Typecheck
132+
* (On a pin) Establish connection if possible
133+
* (Elsewhere) Remove connection
134+
135+
Logic is split in two big cases, creating a connection and modifying an
136+
existing one.
137+
Creating a connection involves creating one end of the connection, both
138+
visually and logically and preparing the line that will follow the cursor.
139+
On the other hand modifying a connection means removing the end that is being
140+
touched.
141+
These two cases can be handled by different classes, pin in the first case and
142+
connection for the last.
143+
Moving and finishing the connection are the same.
144+
145+
Without getting too deep into implementation details, ends cannot just be
146+
removed: there are visual binds that have to be unbound, and when a connection
147+
is destroyed (this only happens inside `on_touch_up`, but it can be either
148+
the pin's or the blackboard's `on_touch_up`, depending on whether the connection is
149+
destroyed because the pin violates type safety or there is no pin under the
150+
cursor respectively) it has to unbind the logical connections of the pins
151+
themselves.
152+
For this reason connection has high-level functions that do the unbind, rebind
153+
and deletion of ends, as long as the necessary elements are passed (dependency
154+
injection pattern).
155+
156+
![Connections between elements](images/logical.pdf)
157+
158+
159+
Intermediate Representation
160+
---------------------------
161+
The visual blocks represent a visual-dataflow language, however the backend
162+
uses a simpler representation of the relations between the blocks, this in turn
163+
helps decoupling backend and frontend.
164+
165+
The frontend blocks are translated by the function `to_ir`, which merely performs
166+
trivial transformations to achieve the desired intermediate representation
167+
and runs in $\mathcal{O}(n)$, with $n$ being the number of pins.
168+
169+
Let's represent the types in a more strongly typed language than Python.
170+
171+
~~~haskell
172+
type Id = Int -- The hash is an integer
173+
data Inputs = Inputs {origin :: Id, block :: Id}
174+
data Blocks = Blocks {inputs :: [Id], function :: IO a -> IO a,
175+
outputs :: [Id]}
176+
data Outputs = Outputs {destinations :: [Id], block :: Id}
177+
data IR = IR {inputs :: Map Id Inputs, blocks :: Map Id Blocks,
178+
outputs :: Map Id Outputs}
179+
~~~
180+
181+
As we can see in the Haskell definition, the intermediate representation is
182+
just three Maps, one for blocks, one for input pins and one for output pins.
183+
But the maps do not contain the pins themselves, merely unique hashes (Int in
184+
this case).
185+
This reflects the fact that pins model only relationships, not state.
186+
The only non-hash values in `IR` are the block functions.
187+
These functions are indeed impure, but earlier in the literature review it was
188+
established that dataflow programming was mainly side-effect free, so why do
189+
they involve side effects?
190+
191+
There are actually two reasons. First, in the actual Python programs these
192+
types do not exist, at least not in an enforceable way, so when translating
193+
them to Haskell the `function` field represents the "worst case", that is to
194+
say only a few functions will actually end up producing side-effects.
195+
The second and more important reason is that blocks actually execute
196+
themselves, meaning the block function does not have parameters: it relies on
197+
getting the values from the input pins and sets the values of the output
198+
pins, leaving us with the work of setting those input pins and retrieving
199+
results from the output pins.
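
To make the shape of the intermediate representation and the "blocks execute themselves" convention more concrete, here is a minimal Python sketch. It is written under stated assumptions and is not taken from the Persimmon code base: the names `InputEntry`, `BlockEntry`, `OutputEntry`, `Pin`, `Constant`, `AddOne` and `run_block` are hypothetical. It mirrors the three maps keyed by integer hashes, and shows a parameterless block function that reads its input pins and writes its output pins, with the caller moving values in and out through a dictionary.

```python
from collections import namedtuple

# Hypothetical Python mirror of the three maps described above.
InputEntry = namedtuple("InputEntry", ["origin", "block"])
BlockEntry = namedtuple("BlockEntry", ["inputs", "function", "outputs"])
OutputEntry = namedtuple("OutputEntry", ["destinations", "block"])
IR = namedtuple("IR", ["inputs", "blocks", "outputs"])


class Pin:
    def __init__(self):
        self.val = None


class Constant:
    """Block with no inputs that writes a fixed value to its output pin."""

    def __init__(self, value):
        self.value = value
        self.out_pin = Pin()

    def function(self):
        self.out_pin.val = self.value


class AddOne:
    """Block whose function takes no parameters and returns nothing."""

    def __init__(self):
        self.in_pin = Pin()
        self.out_pin = Pin()

    def function(self):
        # Side effect: read the input pin, write the output pin.
        self.out_pin.val = self.in_pin.val + 1


def run_block(ir, block_id, pins, results):
    """Set the input pins from `results`, run the block, collect its outputs."""
    entry = ir.blocks[block_id]
    for in_id in entry.inputs:
        pins[in_id].val = results[ir.inputs[in_id].origin]
    entry.function()
    for out_id in entry.outputs:
        results[out_id] = pins[out_id].val


source, add_one = Constant(41), AddOne()
# References to the pins are kept so values can be moved in and out of them.
pins = {hash(p): p for p in (source.out_pin, add_one.in_pin, add_one.out_pin)}
ir = IR(
    inputs={hash(add_one.in_pin): InputEntry(hash(source.out_pin), hash(add_one))},
    blocks={
        hash(source): BlockEntry([], source.function, [hash(source.out_pin)]),
        hash(add_one): BlockEntry(
            [hash(add_one.in_pin)], add_one.function, [hash(add_one.out_pin)]
        ),
    },
    outputs={
        hash(source.out_pin): OutputEntry([hash(add_one.in_pin)], hash(source)),
        hash(add_one.out_pin): OutputEntry([], hash(add_one)),
    },
)

results = {}
run_block(ir, hash(source), pins, results)
run_block(ir, hash(add_one), pins, results)
final_value = results[hash(add_one.out_pin)]
```

Here the blocks are run by hand in dependency order; how the real backend orders execution is not covered by this sketch.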
200+
201+
This goes against the previously stated "pins represent relationships, not
202+
state", in fact an alternative implementation was created in which the
203+
function returned a tuple of results, and it was then the compiler's job to
204+
associate the output pins to each of the elements on the tuple. This was done
205+
using the same current mechanism, saving into a dictionary, the difference
206+
being that while currently the values appear on the output pins and have to be
207+
moved into the dictionary (or otherwise a reference to the pin itself must be
208+
kept on the dictionary) on this case the values were fed directly to the
209+
algorithm.
210+
However this proved limiting: the code became more complex since more checks had
211+
to be done, there was no obvious advantage, and side-effects did not disappear
212+
but merely became harder to perform.
213+
214+
<!-- Talk about function composition -->
215+
216+
103217
[^blackboard]: Blackboard is how the canvas where the blocks and connections
104218
are lay down.
105219
[^MVC]: Model View Controller is a software pattern.

docs/src/interface.md

Lines changed: 38 additions & 0 deletions
@@ -1,11 +1,49 @@
11
Interface Design
22
================
33

4+
The main way users interact with the system is through the visual interface, and
5+
as such it is very important that all the information and operations available are
6+
easily accessible in an intuitive manner, removing the need for extensive
7+
training with the software.
8+
49
Colour Palette
510
--------------
11+
<!-- Talk about hsv and all that fluff, color brewer 2? -->
612

713
Typography
814
----------
15+
The default font for Kivy is Roboto, and for a good reason, as one of Kivy's
16+
targets is Android, which has Roboto as the most commonly used font.
17+
Roboto is a neo-grotesque sans-serif with a modern robotic feel, it really
18+
feels at home on mobile screens, and it is also used on other Google products
19+
and websites.
20+
However on the desktop it feels a bit too cold and ubiquitous; John Gruber
21+
calls it "Google's Arial".
22+
A better solution would be platform-dependent, as the Mac default choice,
23+
Helvetica, has trouble rendering in some Windows and Linux desktop environments.
24+
For this reason Roboto was left as the choice for font rendering.
25+
926

1027
Sketches
1128
--------
29+
![Sketch of the first interface](images/sketch_1.png)
30+
31+
On the first interface there was a focus on getting a model done as soon as
32+
possible. For this reason the interface had to be easy to implement and easy
33+
to use, with few navigation steps required to perform all possible actions
34+
so as to allow for quick debugging.
35+
This meant sacrificing flexibility in favour of usability; because the
36+
algorithms implemented were so few, the button-based interface worked as intended
37+
for this prototype.
38+
No special considerations were taken for color palettes,
39+
shapes or any other kind of visual aid.
40+
41+
![Sketch of the second interface](images/sketch_2.png)
42+
For the second iteration however the extensibility had to be present, meaning
43+
the old interface was not reusable for the new functionality.
44+
The block-based interface gives a lot more control to the final user; still,
45+
some underlying mechanisms such as optional parameters or saving into file were
46+
not present.
47+
48+
<!-- Third interface: drag and drop blocks? Bubble? Code execution
49+
visualization? Type safety indicators? -->
