Roadmap
Timo Denk edited this page Apr 15, 2019
- Write basic data crawler
- Implement datamodule-system
- Create dataset version 1 according to specification
- Create Kubernetes setup
- Create dataset version 2 according to specification
- Create setup for inference website
- Background
  - Familiarize with CNN-based screenshot processing (differences from natural image processing)
  - Validate the capabilities of DeepMind's graph nets library
  - Identify potential loss terms for relative/absolute ranking
  - Survey further papers published in the field
  - Read about set-to-vec techniques
- Prototype. Development of a pagerank estimator that is a single CNN making a prediction based on a single screenshot. (Requirement: dataset v1 with "domain, screenshot, rank")
  - Split into train, validation, and test sets
  - Weight webpages to compensate for imbalance (the dataset contains more low-ranked websites, e.g. ranked 100k-200k, than websites ranked 10k-20k)
  - Find an architecture that works reasonably well
  - The resulting architecture serves as a baseline
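
The split and weighting steps of the prototype could look roughly like the following. This is a minimal sketch, not the project's implementation: the function names `split_dataset` and `bucket_weights`, the 80/10/10 split, and the bucket width are all assumptions chosen for illustration. Each sample is weighted by the inverse frequency of its rank bucket, so sparsely populated buckets (e.g. ranks 10k-20k) count as much as densely populated ones.

```python
import random
from collections import Counter


def split_dataset(domains, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle once with a fixed seed, then cut into train/validation/test."""
    items = list(domains)
    random.Random(seed).shuffle(items)
    n_val = int(len(items) * val_frac)
    n_test = int(len(items) * test_frac)
    val = items[:n_val]
    test = items[n_val:n_val + n_test]
    train = items[n_val + n_test:]
    return train, val, test


def bucket_weights(ranks, bucket_size=10_000):
    """Return one weight per sample, inversely proportional to the number
    of samples in the same rank bucket (hypothetical bucketing scheme).
    The weights sum to len(ranks), so the mean weight is 1."""
    buckets = [(rank - 1) // bucket_size for rank in ranks]
    counts = Counter(buckets)
    n_buckets = len(counts)
    total = len(ranks)
    return [total / (n_buckets * counts[b]) for b in buckets]
```

The weights can then scale the per-sample loss or drive a weighted sampler; which of the two works better here is an open question.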
- Graph Network. Development of a graph network that takes a domain graph as its input. (Requirement: dataset v2 with "domain, screenshots, link graph structure, rank")
  - Delve deep into the graph nets library.
  - Implement a graph network for testing purposes, working on a toy dataset.
  - Implement a graph network for the actual problem at hand.
- UI for an inference website
- Paper. Fewer than 10 pages summarizing the ML aspects of the project
- Week 8 (starting Feb 18th): Implement the dataset v2 class and the required data structures. Train on v1 with the loss from Burges et al. (2005): Learning to Rank using Gradient Descent.
- Week 9 (starting Feb 25th): Implement a graph nets library for PyTorch and write a unit test on a toy task to ensure it is working properly. Inspiration for such toy tasks can be found in the demo notebooks.
- Week 10 (starting Mar 4th): Continue with the work from week 9.
- Week 11 (starting Mar 11th): Implement the graph network that processes dataset v2 and train on it.
- Week 12 (starting Mar 18th): Run more trainings and experiment with network modifications, preprocessing methods, etc.
- Week 13 (starting Mar 25th): Further training runs
- Week 14 (starting Apr 1st): Filter visualization, interpretation
- Week 15 (starting Apr 8th): Documentation backbone and first 20 pages
- Week 16 (starting Apr 15th): Documentation, 20 more pages
- Week 17 (starting Apr 22nd): Documentation
- Week 18 (starting Apr 29th): Documentation
- Week 19 (starting May 6th): Safety margin
- Week 20 (starting May 13th): Safety margin
- Submission May 20th
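
The pairwise loss from Burges et al. scheduled for week 8 can be sketched as follows. For a pair of samples with model scores s_i and s_j and a target probability P that i should rank above j, the cost is log(1 + exp(o)) - P * o with o = s_i - s_j. The code below is an illustrative pure-Python version, not the project's training code. The function names are hypothetical, and it assumes that a smaller rank number means a better website that should receive a higher score.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def ranknet_pair_loss(score_i, score_j, target_prob):
    """Cross-entropy between the target probability that i outranks j
    and the modeled probability sigmoid(score_i - score_j)."""
    o = score_i - score_j
    return softplus(o) - target_prob * o


def ranknet_loss(scores, ranks):
    """Mean pairwise loss over all unordered pairs in a batch.
    Assumption: a smaller rank number should get a higher score."""
    total, n_pairs = 0.0, 0
    for i in range(len(scores)):
        for j in range(i + 1, len(scores)):
            if ranks[i] == ranks[j]:
                target = 0.5
            else:
                target = 1.0 if ranks[i] < ranks[j] else 0.0
            total += ranknet_pair_loss(scores[i], scores[j], target)
            n_pairs += 1
    return total / n_pairs if n_pairs else 0.0
```

Because `target_prob` may be any value in [0, 1], the same pair loss also accepts soft (non-discrete) targets.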
1. #1 [normal]: Train with non-discrete ground truth matrix
2. #3 [time-intense]: Pre-train vs. fine-tune vs. end-to-end
3. #5 [fast]: Graph network (GN) with averaging, GN with max pooling, GN with 1 core block, GN with 3 core blocks (with and without weight sharing)
4. #4 [fast]: Best of (2.) with different b values
5. #3 [fast]: Best of (2.) with different edge choices, namely fully-connected, bi-directional, default
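
The edge choices and the averaging/max-pooling variants named in these experiments could be prototyped along the following lines. This is a sketch under assumptions: `build_edges` and `aggregate` are hypothetical helpers, "default" is taken to mean the directed link graph as crawled, "bi-directional" adds the reverse of every link, and "fully-connected" connects every ordered pair of distinct nodes. The roadmap does not define these terms, so the actual experiments may differ.

```python
from itertools import permutations


def build_edges(n_nodes, links, mode="default"):
    """Return a sorted list of directed (sender, receiver) edges."""
    if mode == "default":
        edges = set(links)
    elif mode == "bi-directional":
        edges = set(links) | {(dst, src) for src, dst in links}
    elif mode == "fully-connected":
        edges = set(permutations(range(n_nodes), 2))
    else:
        raise ValueError(f"unknown edge mode: {mode}")
    return sorted(edges)


def aggregate(vectors, how="mean"):
    """Element-wise mean or max over a non-empty list of equal-length vectors."""
    columns = list(zip(*vectors))
    if how == "mean":
        return [sum(col) / len(col) for col in columns]
    if how == "max":
        return [max(col) for col in columns]
    raise ValueError(f"unknown aggregation: {how}")
```

For example, `build_edges(3, [(0, 1), (1, 2)], mode="bi-directional")` doubles the two crawled links into four directed edges, while `mode="fully-connected"` ignores the links and yields all six ordered pairs.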