Merged
2 changes: 1 addition & 1 deletion .gitattributes
@@ -1 +1 @@
*.ipynb linguist-documentation
*.ipynb linguist-documentation
2 changes: 1 addition & 1 deletion .github/ISSUE_TEMPLATE/bug_report.yml
@@ -33,7 +33,7 @@ body:
value: |
```python
from bertopic import BERTopic

```

- type: input
32 changes: 0 additions & 32 deletions .github/workflows/lint.yml

This file was deleted.

10 changes: 9 additions & 1 deletion .github/workflows/testing.yml
@@ -11,6 +11,14 @@ on:
- dev

jobs:
lint:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
# Ref: https://github.com/pre-commit/action
- uses: pre-commit/action@…

build:
runs-on: ubuntu-latest
strategy:
@@ -25,7 +33,7 @@ jobs:
python-version: ${{ matrix.python-version }}
- name: Install dependencies
run: |
python -m pip install --upgrade pip
python -m pip install --upgrade pip
pip install -e ".[test]"
- name: Run Checking Mechanisms
run: make check
2 changes: 1 addition & 1 deletion .gitignore
@@ -84,4 +84,4 @@ venv.bak/
.DS_Store

# mkdocs
site/
site/
20 changes: 20 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,20 @@
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v5.0.0
hooks:
- id: trailing-whitespace
exclude: |
(?x)^(
README.md|
docs/
)$
- id: end-of-file-fixer
exclude_types: [html, svg]
- id: check-yaml
- id: check-added-large-files
- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.9.9
hooks:
- id: ruff
args: [--fix, --show-fixes, --exit-non-zero-on-fix]
- id: ruff-format
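
A note on the `trailing-whitespace` exclude pattern above: the sketch below checks what that verbose regex matches. It assumes pre-commit applies `exclude` to each repository-relative path with a regex search (that is an assumption about pre-commit's behavior, not something this diff shows), and the sample paths are hypothetical.

```python
import re

# The exclude pattern from the trailing-whitespace hook, copied verbatim.
# (?x) enables verbose mode, so the newlines and indentation are ignored.
EXCLUDE = r"""(?x)^(
    README.md|
    docs/
)$"""

pattern = re.compile(EXCLUDE)


def is_excluded(path: str) -> bool:
    """Return True if the path matches the exclude pattern.

    Assumption: matching is done with a regex search per file path.
    """
    return pattern.search(path) is not None


# Hypothetical paths, for illustration only.
for path in ["README.md", "docs/", "docs/index.md", "bertopic/_bertopic.py"]:
    print(f"{path!r}: excluded={is_excluded(path)}")
```

Under that assumption, the trailing `$` means `docs/` only matches the literal string `docs/`, so a file such as `docs/index.md` would still be checked. If the intent is to skip everything under `docs/`, a pattern like `docs/.*` inside the group may be worth considering.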
14 changes: 7 additions & 7 deletions CONTRIBUTING.md
@@ -1,14 +1,14 @@
# Contributing to BERTopic

Hi! Thank you for considering contributing to BERTopic. With the modular nature of BERTopic, many new add-ons, backends, representation models, sub-models, and LLMs, can quickly be added to keep up with the incredibly fast-pacing field.
Hi! Thank you for considering contributing to BERTopic. With the modular nature of BERTopic, many new add-ons, backends, representation models, sub-models, and LLMs, can quickly be added to keep up with the incredibly fast-pacing field.

Whether contributions are new features, better documentation, bug fixes, or improvement on the repository itself, anything is appreciated!

## 📚 Guidelines

### 🤖 Contributing Code

To contribute to this project, we follow an `issue -> pull request` approach for main features and bug fixes. This means that any new feature, bug fix, or anything else that touches on code directly needs to start from an issue first. That way, the main discussion about what needs to be added/fixed can be done in the issue before creating a pull request. This makes sure that we are on the same page before you start coding your pull request. If you start working on an issue, please assign it to yourself but do so after there is an agreement with the maintainer, [@MaartenGr](https://github.com/MaartenGr).
To contribute to this project, we follow an `issue -> pull request` approach for main features and bug fixes. This means that any new feature, bug fix, or anything else that touches on code directly needs to start from an issue first. That way, the main discussion about what needs to be added/fixed can be done in the issue before creating a pull request. This makes sure that we are on the same page before you start coding your pull request. If you start working on an issue, please assign it to yourself but do so after there is an agreement with the maintainer, [@MaartenGr](https://github.com/MaartenGr).

When there is agreement on the assigned approach, a pull request can be created in which the fix/feature can be added. This follows a ["fork and pull request"](https://docs.github.com/en/get-started/quickstart/contributing-to-projects) workflow.
Please do not try to push directly to this repo unless you are a maintainer.
@@ -19,7 +19,7 @@ There are exceptions to the `issue -> pull request` approach that are typically
* Docstrings
* etc.

There is a large focus on documentation in this repository, so please make sure to add extensive descriptions of features when creating the pull request.
There is a large focus on documentation in this repository, so please make sure to add extensive descriptions of features when creating the pull request.

Note that the main focus of pull requests and code should be:
* Easy readability
@@ -28,7 +28,7 @@ Note that the main focus of pull requests and code should be:

## 🚀 Quick Start

To start contributing, make sure to first start from a fresh environment. Using an environment manager, such as `conda` or `pyenv` helps in making sure that your code is reproducible and tracks the versions you have in your environment.
To start contributing, make sure to first start from a fresh environment. Using an environment manager, such as `conda` or `pyenv` helps in making sure that your code is reproducible and tracks the versions you have in your environment.

If you are using conda, you can approach it as follows:

@@ -53,12 +53,12 @@ If you believe an error is incorrectly flagged, use a [`# noqa:` comment to supp

## 🤓 Collaborative Efforts

When you run into any issue with the above or need help to start with a pull request, feel free to reach out in the issues! As with all repositories, this one has its particularities as a result of the maintainer's view. Each repository is quite different and so will their processes.
When you run into any issue with the above or need help to start with a pull request, feel free to reach out in the issues! As with all repositories, this one has its particularities as a result of the maintainer's view. Each repository is quite different and so will their processes.

## 🏆 Recognition

If your contribution has made its way into a new release of BERTopic, you will be given credit in the changelog of the new release! Regardless of the size of the contribution, any help is greatly appreciated.
If your contribution has made its way into a new release of BERTopic, you will be given credit in the changelog of the new release! Regardless of the size of the contribution, any help is greatly appreciated.

## 🎈 Release

BERTopic tries to mostly follow [semantic versioning](https://semver.org/) for its new releases. Even though BERTopic has been around for a few years now, it is still pre-1.0 software. With the rapid chances in the field and as a way to keep up, this versioning is on purpose. Backwards-compatibility is taken into account but integrating new features and thereby keeping up with the field takes priority. Especially since BERTopic focuses on modularity, flexibility is necessary.
BERTopic tries to mostly follow [semantic versioning](https://semver.org/) for its new releases. Even though BERTopic has been around for a few years now, it is still pre-1.0 software. With the rapid chances in the field and as a way to keep up, this versioning is on purpose. Backwards-compatibility is taken into account but integrating new features and thereby keeping up with the field takes priority. Especially since BERTopic focuses on modularity, flexibility is necessary.
2 changes: 1 addition & 1 deletion LICENSE
@@ -18,4 +18,4 @@ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
SOFTWARE.
4 changes: 2 additions & 2 deletions bertopic/_bertopic.py
@@ -603,7 +603,7 @@ def transform(
)

# Transform without hdbscan_model and umap_model using only cosine similarity
- elif type(self.hdbscan_model) == BaseCluster:
+ elif type(self.hdbscan_model) is BaseCluster:
logger.info("Predicting topic assignments through cosine similarity of topic and document embeddings.")
sim_matrix = cosine_similarity(embeddings, np.array(self.topic_embeddings_))
predictions = np.argmax(sim_matrix, axis=1) - self._outliers
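
For readers unfamiliar with this branch, here is a dependency-free sketch of the prediction step shown in the hunk above. The diff itself uses `cosine_similarity` and NumPy. The function names, the toy vectors, and the assumption that row 0 of the topic embeddings belongs to the outlier topic `-1` when the offset is 1 are illustrative, not taken from the source.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors (a zero vector would divide by zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def predict_topics(doc_embeddings, topic_embeddings, outliers):
    """Assign each document to its most similar topic row, shifted by the outlier offset.

    Mirrors `np.argmax(sim_matrix, axis=1) - self._outliers` from the diff.
    """
    predictions = []
    for doc in doc_embeddings:
        sims = [cosine(doc, topic) for topic in topic_embeddings]
        best_row = max(range(len(sims)), key=sims.__getitem__)
        predictions.append(best_row - outliers)
    return predictions


# Hypothetical data: row 0 is assumed to be the outlier topic (-1).
topic_embeddings = [[0.0, 1.0], [1.0, 0.0], [-1.0, 0.0]]
docs = [[0.9, 0.1], [-0.8, 0.2], [0.1, 0.9]]

print(predict_topics(docs, topic_embeddings, outliers=1))  # prints [0, 1, -1]
```

Subtracting the offset converts a row index into a topic id, so a document closest to row 0 ends up labelled `-1` under the stated assumption.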
@@ -3584,7 +3584,7 @@ def merge_models(cls, models, min_similarity: float = 0.7, embedding_model=None)

# Replace embedding model if one is specifically chosen
verbose = any([model.verbose for model in models])
- if embedding_model is not None and type(merged_model.embedding_model) == BaseEmbedder:
+ if embedding_model is not None and type(merged_model.embedding_model) is BaseEmbedder:
merged_model.embedding_model = select_backend(embedding_model, verbose=verbose)
return merged_model
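
The two `==` to `is` changes in this file keep the exact-type check and only change how the types are compared, presumably to satisfy the newly added Ruff lint. The sketch below shows why an exact-type check is not interchangeable with `isinstance` here. The classes are hypothetical stand-ins, not BERTopic's own.

```python
class BaseCluster:
    """Hypothetical stand-in for a placeholder base class."""


class CustomCluster(BaseCluster):
    """Hypothetical subclass representing a real, user-supplied model."""


def is_exact_base(model) -> bool:
    # Identity comparison on the type object: True only for BaseCluster itself,
    # never for subclasses.
    return type(model) is BaseCluster


base, custom = BaseCluster(), CustomCluster()

print(is_exact_base(base))              # True
print(is_exact_base(custom))            # False
print(isinstance(custom, BaseCluster))  # True: isinstance also accepts subclasses
```

Switching to `isinstance` would change behavior for subclasses, so `is` is the narrower fix when the branch should only trigger for the bare base class.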

10 changes: 5 additions & 5 deletions bertopic/_save_utils.py
@@ -61,10 +61,10 @@

# {MODEL_NAME}

This is a [BERTopic](https://github.com/MaartenGr/BERTopic) model.
BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.
This is a [BERTopic](https://github.com/MaartenGr/BERTopic) model.
BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

## Usage
## Usage

To use this model, please install BERTopic:

@@ -88,9 +88,9 @@

<details>
<summary>Click here for an overview of all topics.</summary>

{TOPICS}

</details>

## Training hyperparameters
2 changes: 1 addition & 1 deletion bertopic/representation/_litellm.py
@@ -8,7 +8,7 @@


DEFAULT_PROMPT = """
I have a topic that contains the following documents:
I have a topic that contains the following documents:
[DOCUMENTS]
The topic is described by the following keywords: [KEYWORDS]
Based on the information above, extract a short topic label in the following format: