Commit 6073046

committed
Some documentation stuff
1 parent cfef4e6 commit 6073046

11 files changed

+178
-96
lines changed

.github/pull_request_template.md

Lines changed: 7 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1,8 +1,12 @@
11
# PR Type
2-
[Feature | Fix | Documentation | Other() ]
2+
[Feature | Fix | Documentation | Other ]
33

44
# Short Description
5-
...
5+
6+
Clickup Ticket(s): Link(s) if applicable.
7+
8+
Add a short description of what is in this PR.
69

710
# Tests Added
8-
...
11+
12+
Describe the tests that have been added to ensure the code's correctness, if applicable.

.pre-commit-config.yaml

Lines changed: 15 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -27,12 +27,18 @@ repos:
2727
- repo: https://github.com/astral-sh/ruff-pre-commit
2828
rev: 'v0.11.13'
2929
hooks:
30-
- id: ruff
30+
- id: ruff-check
3131
args: [--fix, --exit-non-zero-on-fix]
3232
types_or: [python, jupyter]
3333
- id: ruff-format
3434
types_or: [python, jupyter]
3535

36+
- repo: https://github.com/jsh9/pydoclint
37+
rev: '0.6.6'
38+
hooks:
39+
- id: pydoclint
40+
args: [--style=google, --check-return-types=True, --exclude=tests]
41+
3642
- repo: https://github.com/pre-commit/mirrors-mypy
3743
rev: v1.16.0
3844
hooks:
@@ -71,6 +77,14 @@ repos:
7177
pass_filenames: false
7278
always_run: true
7379

80+
- repo: local
81+
hooks:
82+
- id: mypy legacy type check
83+
name: mypy legacy type check
84+
entry: python mypy_disallow_legacy_types.py
85+
language: python
86+
pass_filenames: true
87+
7488
ci:
7589
autofix_commit_msg: |
7690
[pre-commit.ci] Add auto fixes from pre-commit.com hooks

CONTRIBUTING.md

Lines changed: 65 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -1,26 +1,86 @@
1-
# Contributing to aieng-template
1+
# Contributing to MIDST Toolkit
22

3-
Thanks for your interest in contributing to the aieng-template!
3+
Thanks for your interest in contributing to the MIDST Toolkit!
44

55
To submit PRs, please fill out the PR template along with the PR. If the PR
66
fixes an issue, don't forget to link the PR to the issue!
77

88
## Pre-commit hooks
99

10-
Once the python virtual environment is setup, you can run pre-commit hooks using:
10+
```bash
11+
pre-commit install
12+
```
1113

14+
To run the checks, some of which will automatically re-format your code to fit the standards, you can run
1215
```bash
1316
pre-commit run --all-files
1417
```
18+
It can also be run on a subset of files by omitting the `--all-files` option and pointing to specific files or folders.
19+
20+
If you're using VS Code for development, pre-commit should set up git hooks that execute the pre-commit checks each
21+
time you check code into your branch through the integrated source-control as well. This will ensure that each of your
22+
commits conforms to the desired format before they are run remotely, without you needing to remember to run the checks
23+
before pushing to a remote. If this isn't done automatically, you can find instructions for setting up these hooks
24+
manually online.
1525

1626
## Coding guidelines
1727

1828
For code style, we recommend the [PEP 8 style guide](https://peps.python.org/pep-0008/).
1929

20-
For docstrings we use [numpy format](https://numpydoc.readthedocs.io/en/latest/format.html).
30+
For code documentation, we try to adhere to the Google docstring style
31+
(See [here](https://google.github.io/styleguide/pyguide.html), Section: Comments and Doc-strings). The implementation
32+
of an extensive set of comments for the code in this repository is a work-in-progress. However, we are continuing to
33+
work towards a better commented state for the code. For development, as stated in the style guide,
34+
__any non-trivial or non-obvious methods added to the library should have a doc string__. For our library this
35+
applies only to code added to the main library in `midst_toolkit`. Examples, research code, and tests need not
36+
incorporate the strict rules of documentation, though clarifying and helpful comments in that code are also
37+
__strongly encouraged__.
38+
39+
> [!NOTE]
40+
> As a matter of convention choice, classes are documented through their `__init__` functions rather than at the
41+
> "class" level.
42+
43+
If you are using VS Code, a very helpful extension called autoDocstring is available to facilitate the creation of
44+
properly formatted docstrings (see its [VS Code page](https://marketplace.visualstudio.com/items?itemName=njpwerner.autodocstring)
45+
and [documentation](https://github.com/NilsJPWerner/autoDocstring)). This tool will automatically generate a docstring
46+
template when starting a docstring with triple quotation marks (`"""`). To get the correct format, the following
47+
settings should be prescribed in your VS Code settings JSON:
48+
49+
```json
50+
{
51+
"autoDocstring.customTemplatePath": "",
52+
"autoDocstring.docstringFormat": "google",
53+
"autoDocstring.generateDocstringOnEnter": true,
54+
"autoDocstring.guessTypes": true,
55+
"autoDocstring.includeExtendedSummary": false,
56+
"autoDocstring.includeName": false,
57+
"autoDocstring.logLevel": "Info",
58+
"autoDocstring.quoteStyle": "\"\"\"",
59+
"autoDocstring.startOnNewLine": true
60+
}
61+
```
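
As an illustration of the docstring conventions described above, here is a minimal sketch of Google-style docstrings with the class documented through its `__init__` method. The class `ScoreThreshold` and its `flag` method are hypothetical and are not part of `midst_toolkit`. The summary line placement follows the docstrings committed in this repository.

```python
class ScoreThreshold:
    def __init__(self, threshold: float, strict: bool = False) -> None:
        """Hypothetical helper that flags scores at or above a threshold.

        Args:
            threshold (float): Cutoff that scores are compared against.
            strict (bool, optional): If True, a score equal to the threshold is not flagged.
                Defaults to False.
        """
        self.threshold = threshold
        self.strict = strict

    def flag(self, scores: list[float]) -> list[bool]:
        """Flag each score that reaches the threshold.

        Args:
            scores (list[float]): Scores to compare against the threshold.

        Returns:
            list[bool]: One flag per input score, in the same order.
        """
        if self.strict:
            # Strict mode: only scores strictly greater than the threshold are flagged.
            return [score > self.threshold for score in scores]
        return [score >= self.threshold for score in scores]
```

Note that the class itself carries no docstring; the description lives in `__init__`, which is the convention stated in the note above.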
2162

2263
We use [ruff](https://docs.astral.sh/ruff/) for code formatting and static code
23-
analysis. Ruff checks various rules including [flake8](https://docs.astral.sh/ruff/faq/#how-does-ruff-compare-to-flake8). The pre-commit hooks show errors which you need to fix before submitting a PR.
64+
analysis. Ruff checks various rules including
65+
[flake8](https://docs.astral.sh/ruff/faq/#how-does-ruff-compare-to-flake8). The pre-commit hooks show errors which
66+
you need to fix before submitting a PR.
2467

2568
Last but not least, we use type hints in our code, which are then checked using
2669
[mypy](https://mypy.readthedocs.io/en/stable/).
70+
71+
**Note**: We use the modern type-hint syntax available in Python 3.10 and above. See some of the
72+
[documentation here](https://mypy.readthedocs.io/en/stable/builtin_types.html).
73+
74+
For example, this means that we're using `list[str], tuple[int, int], tuple[int, ...], dict[str, int], type[C]` as
75+
built-in types and `Iterable[int], Sequence[bool], Mapping[str, int], Callable[[...], ...]` from collections.abc
76+
(as now recommended by mypy).
77+
78+
We also use the new Optional and Union specification style:
79+
```python
80+
Optional[typing_stuff] -> typing_stuff | None
81+
Union[typing1, typing2] -> typing1 | typing2
82+
Optional[Union[typing1, typing2]] -> typing1 | typing2 | None
83+
```
84+
85+
There is a custom script that enforces this style. It is not infallible, so if there is an issue with it, please fix it or
86+
report it to us.

README.md

Lines changed: 4 additions & 36 deletions
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,4 @@
1-
# AI Engineering template (with uv)
1+
# MIDST Toolkit
22

33
----------------------------------------------------------------------------------------
44

@@ -8,10 +8,9 @@
88
[![codecov](https://codecov.io/github/VectorInstitute/midst-toolkit/graph/badge.svg?token=83MYFZ3UPA)](https://codecov.io/github/VectorInstitute/midst-toolkit)
99
![GitHub License](https://img.shields.io/github/license/VectorInstitute/midst-toolkit)
1010

11-
A template repo for AI Engineering projects (using ``python``) and ``uv``. This
12-
template is like our original AI Engineering [template](https://github.com/VectorInstitute/aieng-template),
13-
however, unlike how that template uses poetry, this one uses uv for dependency
14-
management (as well as packaging and publishing).
11+
A toolkit for facilitating membership inference attack (MIA) resiliency testing on diffusion-model-based synthetic tabular data. Many of the attacks
12+
included in this toolkit are based on the most successful ones used in the
13+
[2025 SaTML MIDST Competition](https://vectorinstitute.github.io/MIDST/).
1514

1615
## 🧑🏿‍💻 Developing
1716

@@ -40,34 +39,3 @@ run:
4039
```bash
4140
uv sync --no-group docs
4241
```
43-
44-
If you're coming from `poetry` then you'll notice that the virtual environment
45-
is actually stored in the project root folder and is by default named as `.venv`.
46-
The other important note is that while `poetry` uses a "flat" layout of the project,
47-
`uv` opts for the the "src" layout. (For more info, see [here](https://packaging.python.org/en/latest/discussions/src-layout-vs-flat-layout/))
48-
49-
### Poetry to UV
50-
51-
The table below provides the `uv` equivalent counterparts for some of the more
52-
common `poetry` commands.
53-
54-
| Poetry | UV |
55-
|------------------------------------------------------|---------------------------------------------|
56-
| `poetry new <project-name>` # creates new project | `uv init <project-name>` |
57-
| `poetry install` # installs existing project | `uv sync` |
58-
| `poetry install --with docs,test` | `uv sync --group docs --group test` |
59-
| `poetry add numpy` | `uv add numpy` |
60-
| `poetry add pytest pytest-asyncio --groups dev` | `uv add pytest pytest-asyncio --groups dev` |
61-
| `poetry remove numpy` | `uv remove numpy` |
62-
| `poetry lock` | `uv lock` |
63-
| `poetry run <cmd>` # runs cmd with the project venv | `uv run <cmd>` |
64-
| `poetry build` | `uv build` |
65-
| `poetry publish` | `uv publish` |
66-
| `poetry cache clear pypi --all` | `uv cache clean` |
67-
68-
For the full list of `uv` commands, you can visit the official [docs](https://docs.astral.sh/uv/reference/cli/#uv).
69-
70-
### Tidbit
71-
72-
If you're curious about what "uv" stands for, it appears to have been more or
73-
less chosen [randomly](https://github.com/astral-sh/uv/issues/1349#issuecomment-1986451785).

mkdocs.yml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -34,7 +34,7 @@ plugins:
3434
handlers:
3535
python:
3636
options:
37-
docstring_style: numpy
37+
docstring_style: google
3838
members_order: source
3939
separate_signature: true
4040
show_overloads: true

mypy_disallow_legacy_types.py

Lines changed: 68 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,68 @@
1+
import re
2+
import sys
3+
from collections.abc import Set
4+
5+
6+
# Files that we want to skip with this check. Currently holds only a placeholder entry, so no real files are skipped.
7+
files_to_ignore: Set[str] = {"Empty"}
8+
file_types_to_ignore: Set[str] = {".png", ".pkl", ".pt", ".md", ".svg", ".ico"}
9+
# List of disallowed types to search for that should no longer be imported from the typing library. These types
10+
# have been migrated to either collections.abc or into core python
11+
disallowed_types = [
12+
"Union",
13+
"Optional",
14+
"List",
15+
"Dict",
16+
"Sequence",
17+
"Set",
18+
"Callable",
19+
"Iterable",
20+
"Hashable",
21+
"Generator",
22+
"Tuple",
23+
"Mapping",
24+
"Type",
25+
]
26+
27+
28+
type_or = "|".join(disallowed_types)
29+
comma_separated_types = ", ".join(disallowed_types)
30+
file_suffixes = "|".join(re.escape(suffix) for suffix in file_types_to_ignore)
31+
file_type_regex = rf".*({file_suffixes})$"
32+
33+
34+
def filter_files_to_ignore(file_paths: list[str]) -> list[str]:
35+
    file_paths = [file_path for file_path in file_paths if file_path not in files_to_ignore]
36+
    file_paths = [file_path for file_path in file_paths if not re.match(file_type_regex, file_path)]
37+
    return file_paths
38+
39+
40+
def construct_same_line_import_regex() -> str:
41+
    return rf"from typing import ([^\n]*?, )*({type_or})(\n|, [^\n]*?\n)"
42+
43+
44+
def construct_multi_line_import_regex() -> str:
45+
    return rf"from typing import \(\n(\s{{4}}.*,\n)*\s{{4}}({type_or}),\n(\s{{4}}.*,\n)*\)$"
46+
47+
48+
same_line_import_re = construct_same_line_import_regex()
49+
multi_line_import_re = construct_multi_line_import_regex()
50+
51+
52+
def discover_legacy_imports(file_paths: list[str]) -> None:
53+
    file_paths = filter_files_to_ignore(file_paths=file_paths)
54+
    for file_path in file_paths:
55+
        with open(file_path, mode="r") as file_handle:
56+
            file_contents = file_handle.read()
57+
            same_line_match = re.search(same_line_import_re, file_contents, flags=re.MULTILINE)
58+
            multi_line_match = re.search(multi_line_import_re, file_contents, flags=re.MULTILINE)
59+
            if same_line_match or multi_line_match:
60+
                raise ValueError(
61+
                    f"A legacy mypy type is being imported in file {file_path}. "
62+
                    f"Disallowed imports from the typing library are: {comma_separated_types}"
63+
                )
64+
65+
66+
if __name__ == "__main__":
67+
    file_relative_paths = sys.argv[1:]
68+
    discover_legacy_imports(file_relative_paths)
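
To see what the two patterns in this script do and do not catch, the following self-contained sketch rebuilds them and runs them against a few in-memory snippets. The regular expressions are copied from the script; the shortened type list and the helper `has_legacy_import` are assumptions made for illustration only.

```python
import re

# Shortened list for illustration; the script uses a longer one.
disallowed_types = ["Union", "Optional", "List", "Dict"]
type_or = "|".join(disallowed_types)

# Same patterns as in the script.
same_line_import_re = rf"from typing import ([^\n]*?, )*({type_or})(\n|, [^\n]*?\n)"
multi_line_import_re = rf"from typing import \(\n(\s{{4}}.*,\n)*\s{{4}}({type_or}),\n(\s{{4}}.*,\n)*\)$"


def has_legacy_import(source: str) -> bool:
    """Return True if either pattern finds a disallowed import from typing."""
    same_line = re.search(same_line_import_re, source, flags=re.MULTILINE)
    multi_line = re.search(multi_line_import_re, source, flags=re.MULTILINE)
    return bool(same_line or multi_line)


# Detected: single-line and parenthesized multi-line imports.
print(has_legacy_import("from typing import Any, Optional\n"))
print(has_legacy_import("from typing import (\n    Any,\n    Optional,\n)\n"))

# Not flagged: allowed imports.
print(has_legacy_import("from typing import Any\n"))
print(has_legacy_import("from collections.abc import Sequence\n"))

# Missed by these patterns: no trailing newline, or attribute-style usage.
print(has_legacy_import("from typing import Optional"))
print(has_legacy_import("import typing\nx: typing.Optional[int] = None\n"))
```

The last two cases show limitations of a regex-based check: the single-line pattern expects a newline after the import, and neither pattern looks at `typing.Optional`-style attribute access.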

pyproject.toml

Lines changed: 4 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -64,6 +64,7 @@ extra_checks = true
6464

6565
[tool.ruff]
6666
include = ["*.py", "pyproject.toml", "*.ipynb"]
67+
exclude = ["mypy_disallow_legacy_types.py"]
6768
line-length = 119
6869

6970
[tool.ruff.format]
@@ -97,6 +98,8 @@ ignore = [
9798
"E501", # line too long
9899
"D203", # 1 blank line required before class docstring
99100
"D213", # Multi-line docstring summary should start at the second line
101+
"D100", # Ignore module level docstrings requirement
102+
"D104", # Ignore package level docstrings requirement
100103
"PLR2004", # Replace magic number with named constant
101104
"PLR0913", # Too many arguments
102105
"COM812", # Missing trailing comma
@@ -111,10 +114,9 @@ ignore-names = ["X*", "setUp"]
111114

112115
[tool.ruff.lint.isort]
113116
lines-after-imports = 2
114-
line_length = 119
115117

116118
[tool.ruff.lint.pydocstyle]
117-
convention = "numpy"
119+
convention = "google"
118120

119121
[tool.ruff.lint.pycodestyle]
120122
max-doc-length = 119

src/midst_toolkit/__init__.py

Lines changed: 3 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -1,11 +1,7 @@
1-
"""Top level module."""
2-
3-
41
def hello() -> str:
5-
"""UV's hello world.
2+
"""Hello function.
63
7-
Returns
8-
-------
9-
str: A friendly hello.
4+
Returns:
5+
str: Hello world
106
"""
117
return "Hello from midst-toolkit!"

src/midst_toolkit/bar.py

Lines changed: 5 additions & 20 deletions
Original file line numberDiff line numberDiff line change
@@ -1,25 +1,10 @@
1-
"""bar module."""
2-
3-
41
def bar(foo: str) -> str:
5-
"""Return input concatenated with 'bar'.
6-
7-
Parameters
8-
----------
9-
foo : str
10-
Input string to be concatenated with 'bar'.
11-
12-
Returns
13-
-------
14-
str
15-
Concatenated string.
2+
"""Bar function.
163
17-
Examples
18-
--------
19-
>>> bar("foo")
20-
'barfoo'
21-
>>> bar("baz")
22-
'barbaz'
4+
Args:
5+
foo (str): Foo string
236
7+
Returns:
8+
str: A modified string
249
"""
2510
return f"bar{foo}"

src/midst_toolkit/foo.py

Lines changed: 5 additions & 20 deletions
Original file line numberDiff line numberDiff line change
@@ -1,25 +1,10 @@
1-
"""foo module."""
2-
3-
41
def foo(bar: str) -> str:
5-
"""Return input concatenated with 'foo'.
6-
7-
Parameters
8-
----------
9-
bar : str
10-
Input string to be concatenated with 'foo'.
11-
12-
Returns
13-
-------
14-
str
15-
Concatenated string.
2+
"""Foo function.
163
17-
Examples
18-
--------
19-
>>> foo("bar")
20-
'foobar'
21-
>>> foo("baz")
22-
'foobaz'
4+
Args:
5+
bar (str): Bar string
236
7+
Returns:
8+
str: String
249
"""
2510
return f"foo{bar}"
