Commit b7cba7b

doc: small README facelift
Signed-off-by: Alexander Bezzubov <[email protected]>
1 parent 57777d0 commit b7cba7b

README.md

Lines changed: 22 additions & 28 deletions
@@ -9,31 +9,25 @@ OSS tools covered:
 - [gitbase](https://docs.sourced.tech/gitbase)
 - [bblfsh](https://doc.bblf.sh)
 - [BigARTM](http://bigartm.org)
-- [OpenNMT](http://opennmt.net)
+- [PyTourch](https://pytorch.org)

 <details>
 <summary>Abstract</summary>

-> Machine Learning on Source Code (MLonCode) is an emerging and exciting research domain which stands at the sweet spot between deep learning, natural language processing, social science, and programming.
+> Machine Learning on Source Code (MLonCode) is an emerging research domain which stands at the > intersection of deep learning, natural language processing, software engineering and programming > language communities.
 >
-> During this 2 hours workshop, we are going to show you how to extract insights from code bases—step by step—by shedding light on those crucial aspects:
+> During this 3.5 hours workshop, we will review the recent SE tasks that benefit from applying ML and focus the hands-on experience on:
+> - extracting data from the real source code and
+> - developing multiple different ML models
+> - for a particular task of source code summarization (or function name suggestion).
 >
-> - What information is available in your code
-> - How to extract this information
-> - What can you do with this knowledge: what are the tasks solvable by MLonCode
-> - Which models can be used to solve them
+> At the end of the workshop participants will build 2 working models on a real dataset, producing > near state-of-the-art results. Practical skill of extracting information from source code as well > as modeling different aspects of it are going to be acquired.
 >
-> To get our hands dirty, we will solve several example tasks, using source{d}, an open source stack to gain insights from codebases:
->
-> - Suggest function names automatically
-> - Cluster developers
-> - Search projects by similarity
->
-> Prerequisites: a laptop with Docker installed. We will provide an image to all participants.
+> Prerequisites: familiarity with the basics of DeepLearning, a laptop with Docker installed

 </details>

-Slides: on [gDrive](https://docs.google.com/presentation/d/1vF0JMagmXXzn-h-OaJu6CsDt78oSQSg58YFJsBUaHxk/edit#slide=id.g4f0d75b8b4_0_0)
+Slides: on [gDrive](#TBD)

 ## Prerequisites

@@ -57,7 +51,7 @@ Run bblfsh
 docker run \
 --detach \
 --rm \
---name devfest_bblfshd \
+--name amld_bblfshd \
 --privileged \
 --publish 9432:9432 \
 bblfsh/bblfshd:v2.15.0-drivers \
@@ -70,10 +64,10 @@ Run gitbase
 docker run \
 --detach \
 --rm \
---name devfest_gitbase \
+--name amld_gitbase \
 --publish 3306:3306 \
---link devfest_bblfshd:devfest_bblfshd \
---env BBLFSH_ENDPOINT=devfest_bblfshd:9432 \
+--link amld_bblfshd:amld_bblfshd \
+--env BBLFSH_ENDPOINT=amld_bblfshd:9432 \
 --env MAX_MEMORY=1024 \
 --volume $(pwd)/repos/git-data:/opt/repos \
 srcd/gitbase:v0.24.0-rc2
@@ -84,13 +78,13 @@ Run the jupyter image
 ```shell
 docker run \
 --rm \
---name devfest_jupyter \
+--name amld_jupyter \
 --publish 8888:8888 \
---link devfest_bblfshd:devfest_bblfshd \
---link devfest_gitbase:devfest_gitbase \
---volume $(pwd)/notebooks:/devfest/notebooks \
---volume $(pwd)/repos:/devfest/repos \
-mloncode/devfest
+--link amld_bblfshd:amld_bblfshd \
+--link amld_gitbase:amld_gitbase \
+--volume $(pwd)/notebooks:/amld/notebooks \
+--volume $(pwd)/repos:/amld/repos \
+mloncode/amld
 ```

 <details>
@@ -116,16 +110,16 @@ make

 We are going to use top 50 repositories from [Apache Software Foundation](https://www.apache.org) though this workshop.

-[Notebook 1: data collection pipeline](http://127.0.0.1:8888/notebooks/Download%20repositories.ipynb)
+[Notebook 1: data collection pipeline](http://127.0.0.1:8888/notebooks/Download%20repositories.ipynb) ([example](notebooks/Download%20repositories.ipynb))

 ### 2. Project and Developer Similarities

 Build a vector model for projects and developers using [Topic Modelling](https://en.wikipedia.org/wiki/Topic_model) of code identifiers.

-[Notebook 2: project and developer similarities](http://127.0.0.1:8888/notebooks/Project%20and%20Developer%20Similarity.ipynb)
+[Notebook 2: project and developer similarities](http://127.0.0.1:8888/notebooks/Project%20and%20Developer%20Similarity.ipynb) ([example](notebooks/Project%20and%20Developer%20Similarity.ipynb))

 ### 3. Function Name Suggestion

 Train a NMT [seq2seq model](https://towardsdatascience.com/nlp-sequence-to-sequence-networks-part-2-seq2seq-model-encoderdecoder-model-6c22e29fd7e1) for predicting method names based on identifiers in method bodies.

-[Notebook 2: function name suggestion](http://127.0.0.1:8888/notebooks/Name%20suggestion.ipynb)
+[Notebook 2: function name suggestion](http://127.0.0.1:8888/notebooks/Name%20suggestion.ipynb) ([example](notebooks/Name%20suggestion.ipynb))
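
Sections 2 and 3 of the README above both start from code identifiers: topic modelling consumes bags of identifier tokens, and the seq2seq model maps identifiers in a method body to the method name. The notebooks themselves are not part of this diff, so the following is only a minimal sketch of what such preprocessing might look like. The helper names (`split_identifier`, `bag_of_subtokens`, `name_suggestion_pair`) and the splitting rules are assumptions made for illustration, not the workshop's actual code.

```python
import re
from collections import Counter

# Hypothetical splitting rule (an assumption, not taken from the notebooks):
# acronym runs, capitalized or lowercase words, remaining uppercase runs, digits.
_SUBTOKEN = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+")


def split_identifier(identifier):
    """Split a snake_case or camelCase identifier into lowercase subtokens."""
    return [m.group(0).lower() for m in _SUBTOKEN.finditer(identifier)]


def bag_of_subtokens(identifiers):
    """Count subtokens over a collection of identifiers.

    A bag like this is the kind of input a topic model could consume.
    """
    counts = Counter()
    for identifier in identifiers:
        counts.update(split_identifier(identifier))
    return counts


def name_suggestion_pair(function_name, body_identifiers):
    """Build one (source, target) training pair for a seq2seq model.

    source: subtokens of the identifiers found in the method body
    target: subtokens of the method name to be predicted
    """
    source = [
        token
        for identifier in body_identifiers
        for token in split_identifier(identifier)
    ]
    target = split_identifier(function_name)
    return source, target


if __name__ == "__main__":
    source, target = name_suggestion_pair(
        "readConfigFile", ["configPath", "open", "file_handle"]
    )
    print(source)  # ['config', 'path', 'open', 'file', 'handle']
    print(target)  # ['read', 'config', 'file']
```

Splitting into subtokens keeps the vocabulary small and lets a model share statistics between names such as `configPath` and `config_file`. A real pipeline would extract the identifiers from parsed source (for example via bblfsh) rather than from hand-written lists.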
