- [x] N-grams, TF-IDF
- [ ] word embeddings
- [x] n-gram embeddings for SVC and get feature importance
- [ ] clustering by topic (BERTopic?)
- [ ] check maximum token length
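
For reference, a rough sketch of what the n-gram / TF-IDF item computes, using only the standard library. It assumes whitespace tokenization, lowercasing, and the plain unsmoothed `idf = log(N / df)`; the helper names `ngrams` and `tfidf` and the toy corpus are hypothetical, and a library vectorizer would likely differ in smoothing and normalization.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the contiguous n-grams (as tuples) of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf(docs, n=1):
    """Return one {n-gram: weight} dict per document.

    tf  = count of the n-gram / total n-grams in the document
    idf = log(N / df), unsmoothed (an assumption for this sketch)
    """
    # Assumption: lowercase + whitespace split stands in for real tokenization.
    counts_per_doc = [Counter(ngrams(doc.lower().split(), n)) for doc in docs]

    # Document frequency: number of documents containing each n-gram.
    df = Counter()
    for counts in counts_per_doc:
        df.update(counts.keys())

    n_docs = len(docs)
    weights = []
    for counts in counts_per_doc:
        total = sum(counts.values())
        weights.append({
            gram: (count / total) * math.log(n_docs / df[gram])
            for gram, count in counts.items()
        })
    return weights


docs = ["the cat sat", "the dog sat", "the cat ran"]  # hypothetical toy corpus
weights = tfidf(docs, n=1)
```

An n-gram that appears in every document (here `("the",)`) gets weight zero under this unsmoothed formula, while an n-gram unique to one document gets the largest weight.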