| 1 | +About |
| 2 | +===== |
| 3 | + |
| 4 | +This project is for you if: |
| 5 | + |
| 6 | +- You want to apply machine learning by just bringing your data |
| 7 | + |
| 8 | + - No need to deal with the various specifics of ML libraries |
| 9 | + |
| 10 | +- You want to do machine learning on a new problem that you don't have a dataset |
| 11 | + for, so you need to generate it. |
| 12 | + |
| 13 | +- You want to harness the power of directed graph execution to write testable, |
| 14 | + maintainable code. |
| 15 | + |
| 16 | +Machine Learning |
| 17 | +---------------- |
| 18 | + |
| 19 | +Python was chosen because of the machine learning community’s preference towards |
| 20 | +it. In addition to the data flow side of DFFML, there is a machine learning |
| 21 | +focused side. It provides a standardized way of defining, training, and using |
| 22 | +models. It also allows for wrapping existing models so as to expose them via the |
| 23 | +standardized API. Models can then be integrated into data flows as operations. |
| 24 | +This enables trivial layering of models to create complex features. See |
| 25 | +:ref:`plugin_models` for existing models and usage. |
| 26 | + |
| 27 | +Data Flows - Directed Graph Execution |
| 28 | +------------------------------------- |
| 29 | + |
| 30 | +The idea behind this project is to provide a way to link together various new |
| 31 | +or existing pieces of code and run them via an orchestration engine that |
| 32 | +forwards the data between them all. This is similar to a microservice |
| 33 | +architecture, but with the orchestration performed according to a directed |
| 34 | +graph. This offers greater flexibility: the interaction between services can |
| 35 | +be modified by changing only the graph (known as the dataflow), not the code. |
| 36 | + |
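As a rough illustration of the idea above, here is a minimal sketch of graph-driven orchestration in plain Python. It is not DFFML's API: the operation names (``fetch_length``, ``is_short``), the ``DATAFLOW`` mapping, and the ``run`` helper are all hypothetical, chosen only to show how data can be forwarded between functions according to a graph.

```python
import asyncio


# Hypothetical operations: ordinary async functions with one input each.
async def fetch_length(package: str) -> int:
    return len(package)


async def is_short(length: int) -> bool:
    return length < 10


# Hypothetical dataflow: output name -> (operation, name of the input it needs).
# Rewiring the graph means editing this mapping, not the operations.
DATAFLOW = {
    "length": (fetch_length, "package"),
    "short": (is_short, "length"),
}


async def run(dataflow, seed):
    """Run every operation once its input is available, forwarding results."""
    results = dict(seed)
    pending = dict(dataflow)
    while pending:
        # Operations whose input has already been produced can run now.
        ready = {
            out: (fn, src) for out, (fn, src) in pending.items() if src in results
        }
        if not ready:
            raise ValueError("unsatisfiable inputs: " + ", ".join(sorted(pending)))
        # Run all ready operations concurrently.
        values = await asyncio.gather(
            *(fn(results[src]) for fn, src in ready.values())
        )
        for out, value in zip(ready, values):
            results[out] = value
            del pending[out]
    return results


results = asyncio.run(run(DATAFLOW, {"package": "dffml"}))
```

Every intermediate value ends up in ``results``, so the caller can pick out whichever outputs it needs. Pointing ``"short"`` at a different input would only require editing ``DATAFLOW``.
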
| 37 | +This is an example of the dataflow for a meta static analysis tool for Python, |
| 38 | +``shouldi``. We take the package name (package) and feed it through operations, |
| 39 | +which are just functions (but could be anything, some SaaS web API endpoint for |
| 40 | +instance). All the data generated by running these operations is queryable, |
| 41 | +allowing us to structure the output in whatever way is most fitting for our |
| 42 | +application. |
| 43 | + |
| 44 | +.. image:: /images/shouldi-dataflow.png |
| 45 | + |
| 46 | +Consistent API |
| 47 | +-------------- |
| 48 | + |
| 49 | +DFFML decouples the interface through which the flow is accessed from the flow |
| 50 | +itself. For instance, data flows can be run via the library, HTTP API, CLI, or |
| 51 | +any communication channel (next targets are Slack and IRC). Data flows are also |
| 52 | +asynchronous in nature, allowing them to be used to build any event driven |
| 53 | +application (Chat, IoT data, etc.). The way in which operations are defined and |
| 54 | +executed by the orchestrator will let us take existing API endpoints and code in |
| 55 | +other languages and combine them into one cohesive workflow. The architecture |
| 56 | +itself is programming language agnostic; the first implementation is |
| 57 | +written in Python. |
| 58 | + |
| 59 | +Plugins |
| 60 | +------- |
| 61 | + |
| 62 | +We take a community driven approach to content. The architecture is plugin based, |
| 63 | +which means anyone can swap out any piece by writing their own plugin and |
| 64 | +publishing it to the Python Package Index. This means that developers can |
| 65 | +publish operations and machine learning models that work out of the box with |
| 66 | +everything else maintained as a part of the core repository and with other |
| 67 | +developers' models and operations. :doc:`tutorials/index` show how to create your |
| 68 | +own plugins. |
| 69 | + |
| 70 | +Team |
| 71 | +---- |
| 72 | + |
| 73 | +We have a team of volunteers working on the project. We hold weekly meetings |
| 74 | +and have a mailing list and chat. If you want to get involved, ask questions, or |
| 75 | +get help getting started, see :doc:`community`. |
| 76 | + |
| 77 | +We participated in Google Summer of Code 2019 under the Python Software |
| 78 | +Foundation. A big thanks to our students, Yash and Sudharsana! |
| 79 | + |
| 80 | +- `GSoC 2019 Student Contributions <https://github.com/intel/dffml/wiki/GSoC-2019#student-contributions>`_ |
| 81 | + |
| 82 | +Users |
| 83 | +----- |
| 84 | + |
| 85 | +The following is a list of organizations and projects using DFFML. Please let us |
| 86 | +know if you are using DFFML and we'll add you to the list. If you want help |
| 87 | +using DFFML, see the :doc:`community` page. |
| 88 | + |
| 89 | +- Intel |
| 90 | + |
| 91 | + - Open Source Software dependency security viability analysis |