Please create one issue per bullet point when you start working on it, and edit this message to add a link to that issue.

- [ ] Avoid using bblfsh and gitbase for preprocessing
- [ ] Make it simpler (move code to an external file, use collapsible cells)
- [ ] Add visualizations and other improvements (better clustering, better search)