README.md (+22 -28: 22 additions & 28 deletions)
@@ -9,31 +9,25 @@ OSS tools covered:
  -[gitbase](https://docs.sourced.tech/gitbase)
  -[bblfsh](https://doc.bblf.sh)
  -[BigARTM](http://bigartm.org)
- -[OpenNMT](http://opennmt.net)
+ -[PyTorch](https://pytorch.org)

  <details>
  <summary>Abstract</summary>

- > Machine Learning on Source Code (MLonCode) is an emerging and exciting research domain which stands at the sweet spot between deep learning, natural language processing, social science, and programming.
+ > Machine Learning on Source Code (MLonCode) is an emerging research domain which stands at the intersection of deep learning, natural language processing, software engineering and programming language communities.
  >
- > During this 2 hours workshop, we are going to show you how to extract insights from code bases—step by step—by shedding light on those crucial aspects:
+ > During this 3h30 workshop, we will review recent Software Engineering tasks that benefit from applying Machine Learning, with a focus on hands-on experience on:
+ > - extracting data from real source code
+ > - developing multiple Machine Learning models
+ > - for a particular task of source code summarization (or function name suggestion).
  >
- > - What information is available in your code
- > - How to extract this information
- > - What can you do with this knowledge: what are the tasks solvable by MLonCode
- > - Which models can be used to solve them
+ > At the end of the workshop participants will build 2 working models on a real dataset, producing near state-of-the-art results. Practical skill of extracting information from source code as well as modelling different aspects of it are going to be acquired.
  >
- > To get our hands dirty, we will solve several example tasks, using source{d}, an open source stack to gain insights from codebases:
- >
- > - Suggest function names automatically
- > - Cluster developers
- > - Search projects by similarity
- >
- > Prerequisites: a laptop with Docker installed. We will provide an image to all participants.
+ > Prerequisites: familiarity with the basics of DeepLearning, a laptop with Docker installed

  </details>

- Slides: on [gDrive](https://docs.google.com/presentation/d/1vF0JMagmXXzn-h-OaJu6CsDt78oSQSg58YFJsBUaHxk/edit#slide=id.g4f0d75b8b4_0_0)
+ Slides: on [gDrive](#TBD)

  ## Prerequisites

@@ -57,7 +51,7 @@ Run bblfsh
  docker run \
  --detach \
  --rm \
- --name devfest_bblfshd \
+ --name amld_bblfshd \
  --privileged \
  --publish 9432:9432 \
  bblfsh/bblfshd:v2.15.0-drivers \
@@ -70,10 +64,10 @@ Run gitbase
  docker run \
  --detach \
  --rm \
- --name devfest_gitbase \
+ --name amld_gitbase \
  --publish 3306:3306 \
- --link devfest_bblfshd:devfest_bblfshd \
- --env BBLFSH_ENDPOINT=devfest_bblfshd:9432 \
+ --link amld_bblfshd:amld_bblfshd \
+ --env BBLFSH_ENDPOINT=amld_bblfshd:9432 \
  --env MAX_MEMORY=1024 \
  --volume $(pwd)/repos/git-data:/opt/repos \
  srcd/gitbase:v0.24.0-rc2
@@ -84,13 +78,13 @@ Run the jupyter image
  ```shell
  docker run \
  --rm \
- --name devfest_jupyter \
+ --name amld_jupyter \
  --publish 8888:8888 \
- --link devfest_bblfshd:devfest_bblfshd \
- --link devfest_gitbase:devfest_gitbase \
- --volume $(pwd)/notebooks:/devfest/notebooks \
- --volume $(pwd)/repos:/devfest/repos \
- mloncode/devfest
+ --link amld_bblfshd:amld_bblfshd \
+ --link amld_gitbase:amld_gitbase \
+ --volume $(pwd)/notebooks:/amld/notebooks \
+ --volume $(pwd)/repos:/amld/repos \
+ mloncode/amld
  ```

  <details>
@@ -116,16 +110,16 @@ make

  We are going to use top 50 repositories from [Apache Software Foundation](https://www.apache.org) though this workshop.

- [Notebook 1: data collection pipeline](http://127.0.0.1:8888/notebooks/Download%20repositories.ipynb)
+ [Notebook 1: data collection pipeline](http://127.0.0.1:8888/notebooks/Download%20repositories.ipynb) ([example](notebooks/Download%20repositories.ipynb))

  ### 2. Project and Developer Similarities

  Build a vector model for projects and developers using [Topic Modelling](https://en.wikipedia.org/wiki/Topic_model) of code identifiers.

- [Notebook 2: project and developer similarities](http://127.0.0.1:8888/notebooks/Project%20and%20Developer%20Similarity.ipynb)
+ [Notebook 2: project and developer similarities](http://127.0.0.1:8888/notebooks/Project%20and%20Developer%20Similarity.ipynb) ([example](notebooks/Project%20and%20Developer%20Similarity.ipynb))

  ### 3. Function Name Suggestion

  Train a NMT [seq2seq model](https://towardsdatascience.com/nlp-sequence-to-sequence-networks-part-2-seq2seq-model-encoderdecoder-model-6c22e29fd7e1) for predicting method names based on identifiers in method bodies.

- [Notebook 2: function name suggestion](http://127.0.0.1:8888/notebooks/Name%20suggestion.ipynb)
+ [Notebook 2: function name suggestion](http://127.0.0.1:8888/notebooks/Name%20suggestion.ipynb) ([example](notebooks/Name%20suggestion.ipynb))
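
Note on section 3 of the README: the function name suggestion task predicts method names from the identifiers in method bodies. As a rough illustration of what one training pair for such a seq2seq model could look like, here is a minimal sketch. It is not taken from the workshop notebooks: splitting identifiers into lowercase subtokens is an assumption made for this example, and the helper names `split_identifier` and `make_training_pair` are hypothetical.

```python
import re

# Assumption: identifiers are broken into subtokens on camelCase,
# snake_case and digit boundaries. Alternatives are tried in order:
# an acronym followed by a capitalized word, a (capitalized) word,
# an all-caps run, a run of digits.
_SUBTOKEN_RE = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|[0-9]+")


def split_identifier(identifier):
    """Split one identifier into lowercase subtokens (hypothetical helper)."""
    return [token.lower() for token in _SUBTOKEN_RE.findall(identifier)]


def make_training_pair(function_name, body_identifiers):
    """Build one (source, target) token pair for a seq2seq model.

    source: subtokens of every identifier found in the method body
    target: subtokens of the method name to be predicted
    """
    source = [
        token
        for identifier in body_identifiers
        for token in split_identifier(identifier)
    ]
    target = split_identifier(function_name)
    return source, target
```

For example, `make_training_pair("readConfigFile", ["configPath", "open", "file_handle"])` yields the source sequence `config path open file handle` and the target sequence `read config file`.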