Commit 6ed123e

Set up pre-commit hooks (#2283)
1 parent 0c930d2 commit 6ed123e

61 files changed: +1078 -1088 lines changed

.gitattributes

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-*.ipynb linguist-documentation
+*.ipynb linguist-documentation

.github/ISSUE_TEMPLATE/bug_report.yml

Lines changed: 1 addition & 1 deletion
@@ -33,7 +33,7 @@ body:
       value: |
         ```python
         from bertopic import BERTopic
-
+
         ```

   - type: input

.github/workflows/lint.yml

Lines changed: 0 additions & 32 deletions
This file was deleted.

.github/workflows/testing.yml

Lines changed: 9 additions & 1 deletion
@@ -11,6 +11,14 @@ on:
       - dev

 jobs:
+  lint:
+    runs-on: ubuntu-latest
+    steps:
+    - uses: actions/checkout@v4
+    - uses: actions/setup-python@v5
+    # Ref: https://github.com/pre-commit/action
+    - uses: pre-commit/[email protected]
+
   build:
     runs-on: ubuntu-latest
     strategy:
@@ -25,7 +33,7 @@ jobs:
         python-version: ${{ matrix.python-version }}
     - name: Install dependencies
       run: |
-        python -m pip install --upgrade pip
+        python -m pip install --upgrade pip
         pip install -e ".[test]"
     - name: Run Checking Mechanisms
       run: make check

.gitignore

Lines changed: 1 addition & 1 deletion
@@ -84,4 +84,4 @@ venv.bak/
 .DS_Store

 # mkdocs
-site/
+site/

.pre-commit-config.yaml

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
+repos:
+  - repo: https://github.com/pre-commit/pre-commit-hooks
+    rev: v5.0.0
+    hooks:
+      - id: trailing-whitespace
+        exclude: |
+          (?x)^(
+            README.md|
+            docs/
+          )$
+      - id: end-of-file-fixer
+        exclude_types: [html, svg]
+      - id: check-yaml
+      - id: check-added-large-files
+  - repo: https://github.com/astral-sh/ruff-pre-commit
+    rev: v0.9.9
+    hooks:
+      - id: ruff
+        args: [--fix, --show-fixes, --exit-non-zero-on-fix]
+      - id: ruff-format
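
The `exclude` value on the `trailing-whitespace` hook is a Python regular expression in verbose mode (`(?x)`). Below is a minimal sketch of how that pattern behaves, assuming pre-commit applies it with `re.search` to repository-relative paths, as its documentation describes. The helper name `is_excluded` and the sample paths are hypothetical and not part of this commit.

```python
import re

# Pattern copied from the `exclude` key of the trailing-whitespace hook.
# In verbose mode (?x), whitespace and newlines inside the pattern are ignored.
EXCLUDE = re.compile(
    r"""(?x)^(
        README.md|
        docs/
    )$"""
)


def is_excluded(path: str) -> bool:
    """Return True if the pattern matches the path.

    Assumption: pre-commit uses re.search on repo-relative paths.
    """
    return EXCLUDE.search(path) is not None


# Hypothetical sample paths, chosen only to illustrate the anchors.
for path in ["README.md", "docs/", "docs/index.md", "bertopic/_bertopic.py"]:
    print(f"{path}: excluded={is_excluded(path)}")
```

Under that assumption, the trailing `$` means the `docs/` alternative only matches the literal string `docs/`, so a path such as `docs/index.md` would not be excluded. The commit does not say whether that is the intended behaviour.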

CONTRIBUTING.md

Lines changed: 7 additions & 7 deletions
@@ -1,14 +1,14 @@
 # Contributing to BERTopic

-Hi! Thank you for considering contributing to BERTopic. With the modular nature of BERTopic, many new add-ons, backends, representation models, sub-models, and LLMs, can quickly be added to keep up with the incredibly fast-pacing field.
+Hi! Thank you for considering contributing to BERTopic. With the modular nature of BERTopic, many new add-ons, backends, representation models, sub-models, and LLMs, can quickly be added to keep up with the incredibly fast-pacing field.

 Whether contributions are new features, better documentation, bug fixes, or improvement on the repository itself, anything is appreciated!

 ## 📚 Guidelines

 ### 🤖 Contributing Code

-To contribute to this project, we follow an `issue -> pull request` approach for main features and bug fixes. This means that any new feature, bug fix, or anything else that touches on code directly needs to start from an issue first. That way, the main discussion about what needs to be added/fixed can be done in the issue before creating a pull request. This makes sure that we are on the same page before you start coding your pull request. If you start working on an issue, please assign it to yourself but do so after there is an agreement with the maintainer, [@MaartenGr](https://github.com/MaartenGr).
+To contribute to this project, we follow an `issue -> pull request` approach for main features and bug fixes. This means that any new feature, bug fix, or anything else that touches on code directly needs to start from an issue first. That way, the main discussion about what needs to be added/fixed can be done in the issue before creating a pull request. This makes sure that we are on the same page before you start coding your pull request. If you start working on an issue, please assign it to yourself but do so after there is an agreement with the maintainer, [@MaartenGr](https://github.com/MaartenGr).

 When there is agreement on the assigned approach, a pull request can be created in which the fix/feature can be added. This follows a ["fork and pull request"](https://docs.github.com/en/get-started/quickstart/contributing-to-projects) workflow.
 Please do not try to push directly to this repo unless you are a maintainer.
@@ -19,7 +19,7 @@ There are exceptions to the `issue -> pull request` approach that are typically
 * Docstrings
 * etc.

-There is a large focus on documentation in this repository, so please make sure to add extensive descriptions of features when creating the pull request.
+There is a large focus on documentation in this repository, so please make sure to add extensive descriptions of features when creating the pull request.

 Note that the main focus of pull requests and code should be:
 * Easy readability
@@ -28,7 +28,7 @@ Note that the main focus of pull requests and code should be:

 ## 🚀 Quick Start

-To start contributing, make sure to first start from a fresh environment. Using an environment manager, such as `conda` or `pyenv` helps in making sure that your code is reproducible and tracks the versions you have in your environment.
+To start contributing, make sure to first start from a fresh environment. Using an environment manager, such as `conda` or `pyenv` helps in making sure that your code is reproducible and tracks the versions you have in your environment.

 If you are using conda, you can approach it as follows:

@@ -53,12 +53,12 @@ If you believe an error is incorrectly flagged, use a [`# noqa:` comment to supp

 ## 🤓 Collaborative Efforts

-When you run into any issue with the above or need help to start with a pull request, feel free to reach out in the issues! As with all repositories, this one has its particularities as a result of the maintainer's view. Each repository is quite different and so will their processes.
+When you run into any issue with the above or need help to start with a pull request, feel free to reach out in the issues! As with all repositories, this one has its particularities as a result of the maintainer's view. Each repository is quite different and so will their processes.

 ## 🏆 Recognition

-If your contribution has made its way into a new release of BERTopic, you will be given credit in the changelog of the new release! Regardless of the size of the contribution, any help is greatly appreciated.
+If your contribution has made its way into a new release of BERTopic, you will be given credit in the changelog of the new release! Regardless of the size of the contribution, any help is greatly appreciated.

 ## 🎈 Release

-BERTopic tries to mostly follow [semantic versioning](https://semver.org/) for its new releases. Even though BERTopic has been around for a few years now, it is still pre-1.0 software. With the rapid chances in the field and as a way to keep up, this versioning is on purpose. Backwards-compatibility is taken into account but integrating new features and thereby keeping up with the field takes priority. Especially since BERTopic focuses on modularity, flexibility is necessary.
+BERTopic tries to mostly follow [semantic versioning](https://semver.org/) for its new releases. Even though BERTopic has been around for a few years now, it is still pre-1.0 software. With the rapid chances in the field and as a way to keep up, this versioning is on purpose. Backwards-compatibility is taken into account but integrating new features and thereby keeping up with the field takes priority. Especially since BERTopic focuses on modularity, flexibility is necessary.

LICENSE

Lines changed: 1 addition & 1 deletion
@@ -18,4 +18,4 @@ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
-SOFTWARE.
+SOFTWARE.

bertopic/_bertopic.py

Lines changed: 2 additions & 2 deletions
@@ -603,7 +603,7 @@ def transform(
             )

         # Transform without hdbscan_model and umap_model using only cosine similarity
-        elif type(self.hdbscan_model) == BaseCluster:
+        elif type(self.hdbscan_model) is BaseCluster:
             logger.info("Predicting topic assignments through cosine similarity of topic and document embeddings.")
             sim_matrix = cosine_similarity(embeddings, np.array(self.topic_embeddings_))
             predictions = np.argmax(sim_matrix, axis=1) - self._outliers
@@ -3584,7 +3584,7 @@ def merge_models(cls, models, min_similarity: float = 0.7, embedding_model=None)

         # Replace embedding model if one is specifically chosen
         verbose = any([model.verbose for model in models])
-        if embedding_model is not None and type(merged_model.embedding_model) == BaseEmbedder:
+        if embedding_model is not None and type(merged_model.embedding_model) is BaseEmbedder:
             merged_model.embedding_model = select_backend(embedding_model, verbose=verbose)
         return merged_model

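
Both changes in this file replace `type(x) == Cls` with `type(x) is Cls`, which is the form lint rules such as Ruff's E721 ask for when an exact type match is intended. The sketch below shows how an exact-type check differs from `isinstance`. The classes `PlaceholderCluster` and `RealCluster` are hypothetical stand-ins, not BERTopic names.

```python
class PlaceholderCluster:
    """Hypothetical stand-in for a do-nothing base class."""


class RealCluster(PlaceholderCluster):
    """Hypothetical subclass that represents an actual model."""


def is_placeholder(model: object) -> bool:
    # Exact type match: True only for the base class itself, never for subclasses.
    return type(model) is PlaceholderCluster


base, real = PlaceholderCluster(), RealCluster()

print(is_placeholder(base))                  # True
print(is_placeholder(real))                  # False
print(isinstance(real, PlaceholderCluster))  # True
```

Because `isinstance` also accepts subclasses, swapping it in would change behaviour wherever a subclass instance is passed. The identity comparison keeps the original exact-match semantics.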

bertopic/_save_utils.py

Lines changed: 5 additions & 5 deletions
@@ -61,10 +61,10 @@

 # {MODEL_NAME}

-This is a [BERTopic](https://github.com/MaartenGr/BERTopic) model.
-BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.
+This is a [BERTopic](https://github.com/MaartenGr/BERTopic) model.
+BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

-## Usage
+## Usage

 To use this model, please install BERTopic:

@@ -88,9 +88,9 @@

 <details>
 <summary>Click here for an overview of all topics.</summary>
-
+
 {TOPICS}
-
+
 </details>

 ## Training hyperparameters
