Commit 829b46b

Update bertopic.md: replaced all instances of RLFH to RLHF (#1471)
1 parent 579948c commit 829b46b

1 file changed

+2
-2
lines changed

bertopic.md

Lines changed: 2 additions & 2 deletions
@@ -199,7 +199,7 @@ The benefits of this integration are particularly notable for production use cas
 In addition to the Hugging Face Hub integration, BERTopic now supports serialization using the [safetensors library](https://huggingface.co/docs/safetensors/). Safetensors is a new simple format for storing tensors safely (instead of pickle), which is still fast (zero-copy). We’re excited to see more and more libraries leveraging safetensors for safe serialization. You can read more about a recent audit of the library in this [blog post](https://huggingface.co/blog/safetensors-security-audit).


-### An example of using BERTopic to explore RLFH datasets
+### An example of using BERTopic to explore RLHF datasets

 To illustrate some of the power of BERTopic let's look at an example of how it can be used to monitor changes in topics in datasets used to train chat models.

@@ -211,7 +211,7 @@ BERTopic gives us various ways of visualizing a dataset. We can see the top 8 to

 ![Words associated with top 8 topics](https://huggingface.co/datasets/huggingface/documentation-images/resolve/2d1113254a370972470d42e122df150f3551cc07/blog/BERTopic/topic_word_scores.png)

-[databricks/databricks-dolly-15k](https://huggingface.co/datasets/databricks/databricks-dolly-15k) is another dataset that can be used to train an RLFH model. The approach taken to creating this dataset was quite different from the OpenAssistant Conversations dataset since it was created by employees of Databricks instead of being crowd sourced via volunteers. Perhaps we can use our trained BERTopic model to compare the topics across these two datasets?
+[databricks/databricks-dolly-15k](https://huggingface.co/datasets/databricks/databricks-dolly-15k) is another dataset that can be used to train an RLHF model. The approach taken to creating this dataset was quite different from the OpenAssistant Conversations dataset since it was created by employees of Databricks instead of being crowd sourced via volunteers. Perhaps we can use our trained BERTopic model to compare the topics across these two datasets?

 The new BERTopic Hub integrations mean we can load this trained model and apply it to new examples.
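
Reviewer note: the last context line of the second hunk says the trained model can be loaded from the Hub and applied to new examples. A minimal sketch of what that might look like follows. It is not taken from `bertopic.md`. The helper names `load_hub_model` and `topic_distribution` are hypothetical, the repository id is a placeholder, and the sketch assumes that `bertopic` is installed and that `BERTopic.load` accepts a Hub repository id.

```python
from collections import Counter


def load_hub_model(repo_id):
    """Load a BERTopic model that was pushed to the Hugging Face Hub.

    Hypothetical helper. `repo_id` is whatever repository holds the
    trained model; the diff above does not name it.
    """
    # Imported lazily so topic_distribution() works without bertopic installed.
    from bertopic import BERTopic

    return BERTopic.load(repo_id)


def topic_distribution(topic_model, docs):
    """Return the share of documents assigned to each topic id.

    `topic_model` is any object whose transform(docs) returns a
    (topics, probabilities) pair, as a fitted BERTopic model does.
    """
    docs = list(docs)
    if not docs:
        return {}
    topics, _probs = topic_model.transform(docs)
    counts = Counter(int(t) for t in topics)
    total = sum(counts.values())
    return {topic: n / total for topic, n in counts.most_common()}


# Usage sketch (needs network access plus the bertopic and datasets packages;
# the "instruction" column name is an assumption about the Dolly dataset):
#
#   from datasets import load_dataset
#   model = load_hub_model("<user>/<model-repo>")
#   dolly = load_dataset("databricks/databricks-dolly-15k", split="train")
#   print(topic_distribution(model, dolly["instruction"]))
```

Comparing the resulting shares with those from the dataset the model was trained on is one simple way to see where the two datasets differ. Topic id `-1` is BERTopic's outlier bucket, so a large share there suggests the new data is poorly covered by the existing topics.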
