This repository was archived by the owner on Aug 25, 2024. It is now read-only.

Commit 86e4a0f

John Andersen (pdxjohnny) authored and committed
docs: Add about page and update shouldi
Signed-off-by: John Andersen <[email protected]>
1 parent f6fda93 commit 86e4a0f

9 files changed: +1183 -124 lines changed


CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -36,6 +36,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - shouldi example got a bandit operation which tells users not to install if
 there are more than 5 issues of high severity and confidence.
 - dev service got the ability to run a single operation in a standalone fashion.
+- About page to docs.
 ### Changed
 - feature/codesec became its own branch, binsec
 - BaseOrchestratorContext `run_operations` strict is default to true. With

docs/about.rst

Lines changed: 91 additions & 0 deletions
@@ -0,0 +1,91 @@
+About
+=====
+
+This project is for you if:
+
+- You want to apply machine learning by just bringing your data
+
+- No need to deal with the various specifics of ML libraries
+
+- You want to do machine learning on a new problem that you don't have a dataset
+  for, so you need to generate it.
+
+- You want to harness the power of directed graph execution to write testable,
+  maintainable code.
+
+Machine Learning
+----------------
+
+Python was chosen because of the machine learning community’s preference towards
+it. In addition to the data flow side of DFFML, there is a machine learning
+focused side. It provides a standardized way of defining, training, and using
+models. It also allows for wrapping existing models so as to expose them via the
+standardized API. Models can then be integrated into data flows as operations.
+This enables trivial layering of models to create complex features. See
+:ref:`plugin_models` for existing models and usage.
+
+Data Flows - Directed Graph Execution
+-------------------------------------
+
+The idea behind this project is to provide a way to link together various new
+or existing pieces of code and run them via an orchestration engine that
+forwards the data between them all. This is similar to a microservice
+architecture, but with orchestration performed according to a directed graph.
+This offers greater flexibility in that interaction between services can easily
+be modified without changing code, only the graph (known as the dataflow).
+
+This is an example of the dataflow for a meta static analysis tool for Python,
+``shouldi``. We take the package name (package) and feed it through operations,
+which are just functions (but could be anything, for instance a SaaS web API
+endpoint). All the data generated by running these operations is queryable,
+allowing us to structure the output in whatever way is most fitting for our
+application.
+
+.. image:: /images/shouldi-dataflow.png
+
+Consistent API
+--------------
+
+DFFML decouples the interface through which the flow is accessed from the flow
+itself. For instance, data flows can be run via the library, HTTP API, CLI, or
+any communication channel (next targets are Slack and IRC). Data flows are also
+asynchronous in nature, allowing them to be used to build any event-driven
+application (Chat, IoT data, etc.). The way in which operations are defined and
+executed by the orchestrator will let us take existing API endpoints and code in
+other languages and combine them into one cohesive workflow. The architecture
+itself is programming-language agnostic; the first implementation has been
+written in Python.
+
+Plugins
+-------
+
+We take a community-driven approach to content. The architecture is plugin
+based, which means anyone can swap out any piece by writing their own plugin
+and publishing it to the Python Package Index. This means that developers can
+publish operations and machine learning models that work out of the box with
+everything else maintained as a part of the core repository and with other
+developers' models and operations. :doc:`tutorials/index` show how to create your
+own plugins.
+
+Team
+----
+
+We have a team of volunteers working on the project. We hold weekly meetings
+and have a mailing list and chat. If you want to get involved, ask questions, or
+get help getting started, see :doc:`community`.
+
+We participated in Google Summer of Code 2019 under the Python Software
+Foundation. A big thanks to our students, Yash and Sudharsana!
+
+- `GSoC 2019 Student Contributions <https://github.com/intel/dffml/wiki/GSoC-2019#student-contributions>`_
+
+Users
+-----
+
+The following is a list of organizations and projects using DFFML. Please let us
+know if you are using DFFML and we'll add you to the list. If you want help
+using DFFML, see the :doc:`community` page.
+
+- Intel
+
+- Open Source Software dependency security viability analysis
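
The "Data Flows" section of the about page in this diff describes operations as plain functions, linked by a directed graph, with an orchestrator forwarding data between them. The sketch below illustrates that idea in plain Python using only the standard library. It is not DFFML's API: the `run_dataflow` helper, the three operations, their made-up return values, and the dictionary layout of `DATAFLOW` are all hypothetical and exist only for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


# Hypothetical operations: ordinary functions, as the about page describes.
def fetch_metadata(package):
    # Stand-in for a real lookup (for example a web API call).
    return {"name": package, "files": 3}


def count_issues(metadata):
    # Stand-in for a static analysis run; the number is made up.
    return metadata["files"] * 2


def decide(issues):
    return "install" if issues <= 5 else "do not install"


# The "dataflow": each output names the operation that produces it and the
# inputs that operation consumes. Rewiring means editing only this mapping.
DATAFLOW = {
    "metadata": (fetch_metadata, ["package"]),
    "issues": (count_issues, ["metadata"]),
    "verdict": (decide, ["issues"]),
}


def run_dataflow(dataflow, seed):
    """Run operations in dependency order, forwarding data between them."""
    results = dict(seed)
    # Map every output to the set of inputs it depends on.
    graph = {out: set(inputs) for out, (_, inputs) in dataflow.items()}
    for name in TopologicalSorter(graph).static_order():
        if name in results:
            continue  # seed input supplied by the caller
        func, inputs = dataflow[name]
        results[name] = func(*(results[i] for i in inputs))
    return results


results = run_dataflow(DATAFLOW, {"package": "example-package"})
print(results["verdict"])
```

Because `run_dataflow` keeps every intermediate value in `results`, the caller can read any of them afterwards, which loosely mirrors the about page's point that all data generated by the operations can be queried.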

docs/community.rst

Lines changed: 0 additions & 11 deletions
@@ -20,21 +20,10 @@ the agenda.
 - `Meeting Minutes <https://docs.google.com/document/d/16u9Tev3O0CcUDe2nfikHmrO3Xnd4ASJ45myFgQLpvzM/>`_
 - `Recordings <https://www.youtube.com/channel/UCorEDRWGikwBH3dsJdDK1qA>`_
 
-The meeting link will be posted in the Gitter channel shortly before starting.
 To join the meeting, click on the Google calendar below for the day you are
 trying to join, then click "more details" which will open your Google calendar,
 then click "Join Hangouts".
 
-The goal of DFFML is to build a community driven library of plugins for dataset
-generation and model definition. So that we as developers and researchers can
-quickly and easily plug and play various pieces of data with various model
-implementations.
-
-The more we build up the library of plugins (which anyone can maintain, they
-don't have to be contributed upstream unless you want to) the more variations on
-model implementations, feature data generators, and database backend
-abstractions, we all have to work with.
-
 If you have developed a plugin for DFFML and would like it listed in the docs,
 please open an `issue <https://github.com/intel/dffml/issues/new?assignees=&labels=documentation&template=new_plugin.md&title=plugin%3A+new%3A+>`_.
 
