This repository was archived by the owner on Aug 25, 2024. It is now read-only.

Commit 3c4a530

docs: contributing: gsoc: 2022: Update with ideas
Signed-off-by: John Andersen <[email protected]>
1 parent a5db9a4 commit 3c4a530

File tree

5 files changed

+492
-0
lines changed

docs/contributing/gsoc/2022/automl.md

Lines changed: 82 additions & 0 deletions
@@ -0,0 +1,82 @@
1+
# Implementing AutoML
2+
AutoML (Automated Machine Learning), as the name suggests, automates the
process of solving problems with machine learning. It is most helpful for
people who are unfamiliar with machine learning or with the programming
involved, and it aims to make any task involving machine learning more
efficient.
8+
The primary objective is to create a model whose config has two properties.
The first is the set of models to use for hyperparameter tuning. The second is
the set of models that we should attempt to tune (via the first set). The
default values for these properties result in using all installed models to
try to tune all installed model plugins.
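As a rough illustration of the two config properties described above, here is a minimal sketch. The names `AutoMLConfig`, `tuners`, `targets`, and `installed_models` are hypothetical and not part of DFFML, and the discovery of installed model plugins is faked with a fixed list.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


def installed_models() -> List[str]:
    # Hypothetical stand-in for discovering installed model plugins.
    return ["model_a", "model_b", "model_c"]


@dataclass
class AutoMLConfig:
    # Models used to perform hyperparameter tuning (None means all installed).
    tuners: Optional[List[str]] = None
    # Models whose hyperparameters we attempt to tune (None means all installed).
    targets: Optional[List[str]] = None

    def pairs(self) -> List[Tuple[str, str]]:
        # Every (tuner, target) combination the AutoML model would try.
        tuners = self.tuners if self.tuners is not None else installed_models()
        targets = self.targets if self.targets is not None else installed_models()
        return [(tuner, target) for tuner in tuners for target in targets]
```

With no arguments, `AutoMLConfig().pairs()` covers every installed model against every installed model; passing explicit lists narrows the search to the reduced set discussed in the first phase.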
14+
15+
- To start, we should define a reduced set of models (not all the ones we have).
16+
We'll implement AutoML supporting only this reduced set. The first phase of
17+
this project will be to make sure that one model can be used to tune
18+
hyperparameters of another model.
19+
- The next phase will be to tune two models using the same tuning model. This
  will be followed by tuning two models using two tuning models, which amounts
  to doing the previous task twice, with a different tuning model the second
  time.
23+
24+
- The following phase will be to go through each model in each model plugin we
25+
have and see which ones have issues being tuned using the approach taken in the
26+
previous phase. This phase will help us determine which properties or methods
27+
we may need to add to models to help them self identify and thereby indicate
28+
their requirements for hyperparameter tuning, or maybe their inherent lack of
29+
support for it.
30+
31+
- The final phase will be to implement hyperparameter tuning for N by N models,
32+
after implementing what we found to be gaps in the previous phase.
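The phases above can be pictured with a small, framework-free sketch. Everything here is an assumption made for illustration: "targets" are plain scoring functions standing in for models being tuned, "tuners" are simple search strategies standing in for tuning models, and `tune_all` is a hypothetical helper that applies every tuner to every target (the N by N case).

```python
import itertools
import random
from typing import Callable, Dict, List, Tuple

# A target maps a hyperparameter config to a score (higher is better).
Config = Dict[str, float]
Target = Callable[[Config], float]
Space = Dict[str, List[float]]
Tuner = Callable[[Target, Space], Tuple[Config, float]]


def grid_tuner(target: Target, space: Space) -> Tuple[Config, float]:
    # Exhaustively score every combination of hyperparameter values.
    names = sorted(space)
    configs = [
        dict(zip(names, values))
        for values in itertools.product(*(space[name] for name in names))
    ]
    best = max(configs, key=target)
    return best, target(best)


def random_tuner(target: Target, space: Space) -> Tuple[Config, float]:
    # Score a fixed number of randomly sampled configs (seeded for repeatability).
    rng = random.Random(0)
    configs = [
        {name: rng.choice(values) for name, values in space.items()}
        for _ in range(20)
    ]
    best = max(configs, key=target)
    return best, target(best)


def tune_all(
    tuners: Dict[str, Tuner], targets: Dict[str, Target], space: Space
) -> Dict[Tuple[str, str], Tuple[Config, float]]:
    # N by N: apply every tuner to every target.
    return {
        (tuner_name, target_name): tuner(target, space)
        for tuner_name, tuner in tuners.items()
        for target_name, target in targets.items()
    }


# Toy targets whose best hyperparameters are known in advance.
def target_a(config: Config) -> float:
    return -((config["lr"] - 0.1) ** 2)


def target_b(config: Config) -> float:
    return -((config["lr"] - 1.0) ** 2) - (config["depth"] - 3) ** 2


space = {"lr": [0.01, 0.1, 1.0], "depth": [1, 2, 3]}
results = tune_all(
    {"grid": grid_tuner, "random": random_tuner},
    {"target_a": target_a, "target_b": target_b},
    space,
)
```

Using one tuner with one target corresponds to the first phase; adding entries to either dictionary corresponds to the later phases.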
33+
Due to the shortened GSoC cycle, we may not complete all of these phases.
Which phase we aim to reach will be decided as we approach the selection
process.
36+
37+
## Skills
38+
39+
- Python
40+
- Intermediate Machine Learning
41+
- Experience with various machine learning frameworks (AutoML frameworks would
42+
be a plus)
43+
44+
## Difficulty
45+
46+
Intermediate/Hard
47+
48+
## Estimated Time Required
49+
50+
350 hours
51+
52+
## Related Readings
53+
54+
- https://github.com/intel/dffml/blob/master/docs/contributing/gsoc/2022/
55+
- https://scikit-learn.org/stable/model_selection.html#model-selection
56+
- https://www.automl.org/automl/
57+
58+
## Getting Started
59+
60+
- Read the contributing guidelines
61+
- https://intel.github.io/dffml/master/contributing/index.html
62+
- Go through the quickstart
63+
- https://intel.github.io/dffml/master/quickstart/model.html
64+
- Go through the model tutorials
65+
- https://intel.github.io/dffml/master/tutorials/models/
66+
- Go through the model plugins
67+
- https://intel.github.io/dffml/master/plugins/dffml_model.html
68+
69+
## Potential Mentors
70+
71+
- [John Andersen](https://github.com/pdxjohnny)
72+
- [Yash Lamba](https://github.com/yashlamba)
73+
- [Saksham Arora](https://github.com/sakshamarora1)
74+
75+
## Tracking and Discussion
76+
77+
This project is related to the following issues. Please discuss and ask
78+
questions in the issue comments. Please also ping mentors on
79+
[Gitter](https://gitter.im/dffml/community) when you post on the following
80+
issues so that they are sure to see that you've commented.
81+
82+
- https://github.com/intel/dffml/issues/968
Lines changed: 96 additions & 0 deletions
@@ -0,0 +1,96 @@
1+
# Refactor of DataFlows to Include Event Type
2+
3+
A large part of DFFML is the concept of a DataFlow.
4+
The goal of this project is to change the orchestrator context's `run` so that
it yields three objects: the context, the event type, and the results.
7+
8+
Currently we have a lot of code that looks like this:
9+
```python
# `[... inputs ...]` is a placeholder for the list of inputs.
for ctx, results in run(dataflow, [... inputs ...]):
    print("The results of", ctx, "are", results)
```
14+
15+
When this project is over, those `for` loops will look like this:
16+
```python
# `[... inputs ...]` is a placeholder for the list of inputs.
for ctx, event, data in run(dataflow, [... inputs ...]):
    if event == EventType.OUTPUT:
        print("The results of", ctx, "are", data)
    elif event == EventType.INPUT:
        print("An input entered network for context", ctx, ":", data)
```
24+
25+
The way things currently work is that the `run` function `yield`s when the
26+
context is finished running. It `yield`s the context that was running and the
27+
results.
28+
We need to add another part to a DataFlow so that we can yield `Input`s. The
event type would be `INPUT`, and in the DataFlow we should add a section for
events. In the events section for `INPUT` events we could specify when an
input should be yielded. We would use the inputs section to specify which
transitions between operations should be yielded.
34+
This will let us run a DataFlow and `yield` not only the results, but also the
data that is moving through the network while the DataFlow is running.
Developers can then build applications that show the progress of a DataFlow as
it runs.
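To make the idea concrete, here is a minimal, synchronous toy generator that yields `(ctx, event, data)` tuples. It is a sketch, not DFFML's orchestrator: the `run` function below, its `pipeline` and `input_events` parameters, and the enum values are all hypothetical stand-ins chosen for illustration.

```python
import enum
from typing import Any, Callable, Collection, Dict, Iterator, List, Tuple


class EventType(enum.Enum):
    INPUT = "input"
    OUTPUT = "output"


Operation = Tuple[str, Callable[[Any], Any]]
Event = Tuple[str, EventType, Dict[str, Any]]


def run(
    pipeline: List[Operation],
    seed: Any,
    ctx: str = "example-context",
    input_events: Collection[str] = (),
) -> Iterator[Event]:
    # Hypothetical stand-in: operations form a simple chain rather than a
    # real DataFlow network.
    value = seed
    for name, operation in pipeline:
        if name in input_events:
            # Report the value entering this operation while the flow runs.
            yield ctx, EventType.INPUT, {name: value}
        value = operation(value)
    # The final results are still yielded once the context finishes.
    yield ctx, EventType.OUTPUT, {"result": value}


pipeline = [("double", lambda x: x * 2), ("increment", lambda x: x + 1)]
events = list(run(pipeline, 5, input_events={"increment"}))
```

Here `input_events` plays the role of the proposed events section: only transitions into the named operations produce an `INPUT` event, so callers that care only about results can leave it empty.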
39+
40+
## Skills
41+
42+
- Python
43+
- Refactoring a large codebase
44+
- Asyncio knowledge would be very helpful here
45+
46+
## Difficulty
47+
48+
Intermediate/Hard
49+
50+
## Estimated Time Required
51+
52+
350 hours
53+
54+
## Related Readings
55+
56+
- https://intel.github.io/dffml/master/contributing/gsoc/2022/index.html
57+
58+
## Getting Started
59+
60+
- Read the contributing guidelines
61+
- https://intel.github.io/dffml/master/contributing/index.html
62+
- Go through the quickstart
63+
- https://intel.github.io/dffml/master/quickstart/model.html
64+
- Go through the data flow related docs and tutorials
65+
- https://intel.github.io/dffml/master/tutorials/dataflows/index.html
66+
- https://intel.github.io/dffml/master/examples/integration.html
67+
- https://intel.github.io/dffml/master/examples/shouldi.html
68+
- https://intel.github.io/dffml/master/examples/dataflows.html
69+
- https://intel.github.io/dffml/master/examples/mnist.html
70+
- https://intel.github.io/dffml/master/examples/flower17/flower17.html
71+
- https://intel.github.io/dffml/master/examples/webhook/index.html
72+
- Read about what data flows are and how they work
73+
- https://intel.github.io/dffml/master/concepts/index.html#dataflows
74+
- https://intel.github.io/dffml/master/concepts/dataflow.html
75+
- Come up with a basic example where the user will see inputs moving through the
76+
network.
77+
- Make it simple and include a few operations.
78+
- Get the DataFlow running.
79+
- Look at the code in `dffml/df/memory.py` and understand how it relates to the
80+
docs covering DataFlows conceptually.
81+
- Plan out what all needs to change within `dffml/df/memory.py` and the other
82+
code and examples that would change as a result.
83+
84+
## Potential Mentors
85+
86+
- [John Andersen](https://github.com/pdxjohnny)
87+
- [Saksham Arora](https://github.com/sakshamarora1)
88+
89+
## Tracking and Discussion
90+
91+
This project is related to the following issues. Please discuss and ask
92+
questions in the issue comments. Please also ping mentors on
93+
[Gitter](https://gitter.im/dffml/community) when you post on the following
94+
issues so that they are sure to see that you've commented.
95+
96+
- https://github.com/intel/dffml/issues/919
Lines changed: 66 additions & 0 deletions
@@ -0,0 +1,66 @@
1+
# Time-Series Forecasting and Anomaly Detection
2+
Time-series forecasting and anomaly detection are of paramount importance in
many real-world problems, such as infrastructure monitoring and stock
exchanges. Currently, DFFML has limited integration with time-series data
formats and limited implementations of relevant models. This project aims to
add basic support for commonly used datasets, models, and data
pre-processing/cleaning methods for time-series data.
9+
This project consists of four phases (documentation and testing are required in
every phase):
11+
1. Implementation of Common Datasets:
   - Related Issue: #1319
   - Add at least 2 datasets.
2. Implementation/Updating of Operations:
   - Related Issue: #1321
   - Add at least 2 operations.
3. Implementation of Relevant Models:
   - Related Issue: #1320
   - Add at least 3 models.
4. Add documentation and examples showing all three previous phases in action
   together. You can use Jupyter notebooks, or rST with Python scripts and CLI
   examples, for this.
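To give a feel for the kind of logic these phases involve, here is a minimal, standard-library-only sketch of a naive forecast and a rolling z-score anomaly check. The function names, the window size, and the threshold are assumptions made for illustration; they are not existing DFFML operations or models.

```python
import statistics
from typing import List, Sequence


def moving_average_forecast(series: Sequence[float], window: int) -> float:
    # Forecast the next value as the mean of the last `window` observations.
    if window < 1 or len(series) < window:
        raise ValueError("window must be between 1 and len(series)")
    return statistics.fmean(series[-window:])


def rolling_zscore_anomalies(
    series: Sequence[float], window: int, threshold: float = 3.0
) -> List[int]:
    # Flag indices whose value is far from the mean of the preceding window.
    anomalies = []
    for index in range(window, len(series)):
        history = series[index - window:index]
        mean = statistics.fmean(history)
        stdev = statistics.pstdev(history)
        if stdev == 0:
            # Flat window: the z-score is undefined, so skip this point.
            continue
        if abs(series[index] - mean) / stdev > threshold:
            anomalies.append(index)
    return anomalies


readings = [10, 11, 10, 11, 10, 50, 11, 10]
flagged = rolling_zscore_anomalies(readings, window=4)
forecast = moving_average_forecast([1, 2, 3, 4], window=2)
```

A real contribution would wrap logic like this in the project's own dataset, operation, and model interfaces, with the documentation and tests each phase requires.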
24+
25+
## Skills
26+
27+
- Python
28+
- Familiarity with Machine Learning
- Experience with various machine learning frameworks (PyTorch and TensorFlow
  would be a plus)
31+
32+
## Difficulty
33+
34+
Beginner
35+
36+
## Estimated Time Required
37+
38+
350 hours
39+
40+
## Related Readings
41+
42+
- Related readings and links have also been added in the related issues mentioned above.
43+
- https://intel.github.io/dffml/master/examples/data_cleanup/data_cleanup.html
44+
- https://intel.github.io/dffml/master/examples/icecream_sales.html
45+
- https://intel.github.io/dffml/master/examples/or_covid_data_by_county.html
46+
- https://intel.github.io/dffml/master/api/source/dataset/base.html
47+
- https://intel.github.io/dffml/master/api/source/dataset/iris.html
48+
49+
## Getting Started
50+
51+
- Read the contributing guidelines
52+
- https://intel.github.io/dffml/master/contributing/index.html
53+
- Go through the quickstart
54+
- https://intel.github.io/dffml/master/quickstart/model.html
55+
- Go through the model tutorials
56+
- https://intel.github.io/dffml/master/tutorials/models/
57+
- Go through the model plugins
58+
- https://intel.github.io/dffml/master/plugins/dffml_model.html
59+
- Go through Jupyter Notebook Examples (they also have video walkthroughs available)
60+
- https://intel.github.io/dffml/master/examples/notebooks/index.html
61+
- You don't need to go through all of them. Just get a feel for running a few.
62+
63+
## Potential Mentors
64+
65+
- [John Andersen](https://github.com/pdxjohnny)
66+
- [Saahil Ali](https://github.com/programmer290399)
