Commit 2f47b92
Maarten Grootendorst
v0.10.0 (#492)
* Use any dimensionality reduction technique instead of UMAP
* Use any clustering technique instead of HDBSCAN
* Add a CountVectorizer page with tips and tricks on how to create topic representations that fit your use case
* Added pages on how to use other dimensionality reduction and clustering algorithms
* Additional instructions on how to reduce outliers in the FAQ
* Fixed `None` being returned for probabilities when transforming unseen documents
* Replaced all instances of `arg:` with `Arguments:` for consistency
* Before saving a fitted BERTopic instance, we remove the stopwords in the fitted CountVectorizer model as it can get quite large due to the number of words that end in stopwords if `min_df` is set to a value larger than 1
* Set `"hdbscan>=0.8.28"` to prevent numpy issues
* Update gensim dependency to `>=4.0.0` (#371)
* Fix topic 0 not appearing in visualizations (#472)
* Fix #506
* Fix #4291 parent 681ac26 commit 2f47b92
File tree
26 files changed
+933
-271
lines changed- bertopic
- backend
- plotting
- docs
- getting_started
- clustering
- countvectorizer
- custom_embeddings
- dim_reduction
- embeddings
- models
- tips_and_tricks
- visualization
- tests
26 files changed
+933
-271
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
| 1 | + | |
| 2 | + | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
75 | 75 | | |
76 | 76 | | |
77 | 77 | | |
| 78 | + | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1 | 1 | | |
2 | 2 | | |
3 | | - | |
| 3 | + | |
4 | 4 | | |
5 | 5 | | |
6 | 6 | | |
| |||
Large diffs are not rendered by default.
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
53 | 53 | | |
54 | 54 | | |
55 | 55 | | |
| 56 | + | |
56 | 57 | | |
57 | 58 | | |
| 59 | + | |
58 | 60 | | |
59 | 61 | | |
| 62 | + | |
60 | 63 | | |
61 | 64 | | |
62 | 65 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
48 | 48 | | |
49 | 49 | | |
50 | 50 | | |
51 | | - | |
52 | | - | |
| 51 | + | |
| 52 | + | |
53 | 53 | | |
54 | 54 | | |
55 | 55 | | |
| |||
58 | 58 | | |
59 | 59 | | |
60 | 60 | | |
61 | | - | |
| 61 | + | |
62 | 62 | | |
63 | 63 | | |
64 | 64 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
46 | 46 | | |
47 | 47 | | |
48 | 48 | | |
| 49 | + | |
| 50 | + | |
49 | 51 | | |
50 | 52 | | |
51 | 53 | | |
52 | | - | |
| 54 | + | |
53 | 55 | | |
54 | | - | |
| 56 | + | |
55 | 57 | | |
56 | 58 | | |
57 | 59 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
56 | 56 | | |
57 | 57 | | |
58 | 58 | | |
| 59 | + | |
| 60 | + | |
59 | 61 | | |
60 | 62 | | |
61 | 63 | | |
62 | | - | |
| 64 | + | |
63 | 65 | | |
64 | | - | |
| 66 | + | |
65 | 67 | | |
66 | 68 | | |
67 | 69 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
57 | 57 | | |
58 | 58 | | |
59 | 59 | | |
| 60 | + | |
| 61 | + | |
60 | 62 | | |
61 | | - | |
| 63 | + | |
62 | 64 | | |
63 | | - | |
| 65 | + | |
64 | 66 | | |
65 | | - | |
| 67 | + | |
66 | 68 | | |
67 | 69 | | |
68 | 70 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
43 | 43 | | |
44 | 44 | | |
45 | 45 | | |
| 46 | + | |
| 47 | + | |
46 | 48 | | |
47 | 49 | | |
48 | 50 | | |
49 | | - | |
| 51 | + | |
50 | 52 | | |
51 | | - | |
| 53 | + | |
52 | 54 | | |
53 | 55 | | |
54 | 56 | | |
| |||
0 commit comments