Ml sklearn structure (integration to the AB)#385
Open
tzamalisp wants to merge 66 commits intometabrainz:masterfrom
Open
Ml sklearn structure (integration to the AB)#385tzamalisp wants to merge 66 commits intometabrainz:masterfrom
tzamalisp wants to merge 66 commits intometabrainz:masterfrom
Conversation
PEP 8 issues fix 02
remove gaia best models params
issue fixed - tracks in gt file do not exist in low-level
predict with MBID only (not the whole API low-level url))
add sklearn requirements.txt
create gaia evaluation method
change name of the model directory
load updated files
change the path saves - reduce print/logging messages in evaluation
requirements.txt --> lowercase
remove requirements.txt
add requirements.txt with lowercase
use of now.isoformat()
reports creation - datetime to start of the report, code improvements
review updates 01
syntax for documentation strings as in AB fixed
ground_truth file typo fixed
use of os.makedirs(full_path, exist_ok=True)
create results_dict = {....} in a single call
simplify processing step dict creation - documentation added
add lower-case in all process steps params
save best model after training to the whole data - add README
split evaluation in separate methods
split grid classification into separate methods
single logger setup
change logging set from int to str
dynamic project yaml saving - update readme predict section
add new arguments in readme
readme - add predict invoking
readme MBID in prediction
create project - default values declaration - documentation
relative imports
update predict readme and script arguments
import classification_project
readme path file required
readme, how ti works session - remove requirements jupyter notebook
requirements.txt - add requests, remove tensorflow
readme - add how training and predicting modes work
add new dockerfile for ML tool
readme - update doc only for parameters
add exports_directory in config when specified in arguments
sklearn model inserted in AB
|
Hello @tzamalisp! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
Comment last updated at 2021-07-08 11:27:54 UTC |
36eb75e to
7b81889
Compare
Also move the selection of the tool to the advanced svm settings
This reverts commit 89d3276.
The export_path parameter should contain the complete path where the results should be stored. Everywhere we use exports_dir, we append it to the export_path. If its a matter of user convenience, we can add it back of the command line but merge it with export_path asap.
Making a class to just call a util method does not seem useful. Also, the directories are always created by the ClassifcationTaskManager and DatasetExplorer so we can just refer to their path with os.path.join elsewher. We already have a similar function in AB utils but that is used on Python 2.7 which does not have exist_ok arg available. Hence, keeping this.
Revert the change from previous commit for logging directory because we setup logging before trying to create all other output directories.
Instead of setting up logger again and again, we pass the logger around manually. In subsequent steps, this will be entirely replaced to utilise python's hierarchical logging. Can't do this just now because the file handler requires the dataset name.
This reverts commit 581b713.
Eliminate unused methods and move useful ones to individual methods.
We never really access these dicts other than iterating by calling `.items()` so the defaultdict are redundant and only add confusion.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
New Machine Learning Infrastructure - Tool Integration (Google Summer of Code 2020)
In this Pull Request (PR), the main goal is the integration of the Machine Learning tool that is built, into the AcousticBrainz platform. Several aspects have been completed or have to be completed in order for the tool to be fully integrated into the AcousticBrainz project. This PR follows the previous PR, "New Machine Learning Infrastructure - Tool Development and Code Review by the Mentor" (closed), which is related to the phase of the GSoC of the tool development and the code review by the Mentor.
The implemented Machine Learning tool and the corresponding branch where the integration part takes place can be found at the forked repository of the AcousticBrainz platform of the link below. This link includes also a detailed description of how the tool works:
Till now, the tool has been developed successfully, and several models have been trained offline based on some datasets that were provided by the Mentor. The training process works successfully with the performance results (e.g. the accuracy of the model) being almost the same with the old machine learning infrastructure which is provided by the gaia tool.
In the integration part, a new docker image, the
server_dataset_evaluator_sklearnis created in order for the new ML infrastructure can run in Python 3 with the corresponding libraries dependencies. The previous tool runs in Python 2. Several other compatibility issues (as the commits depict) are solved so there will be no conflicts between the simultaneous execution of Python 2 and Python 3, based on the tool that the AcoustiBrainz platform uses. Additionally, the user interface with the relevant back-end services is updated. Now, the user has the ability to choose which ML tool will execute the music classification task and train the model (gaia or sklearn).The next steps that work have to be done are:
predictionsfunctionality.Finally, during the GSoC project period, some other PRs for code changes were created and implemented that correspond to: