Commit e1d69bd

initial commit
0 parents  commit e1d69bd

68 files changed, 13,374 additions and 0 deletions

.cursorrules

Lines changed: 74 additions & 0 deletions
@@ -0,0 +1,74 @@
+# AGENTS.md
+
+Problem definition → small, safe change → change review → refactor — repeat the loop.
+
+## Mandatory Rules
+
+- Before changing anything, read the relevant files end to end, including all call/reference paths.
+- Keep tasks, commits, and PRs small.
+- If you make assumptions, record them in the Issue/PR/ADR.
+- Never commit or log secrets; validate all inputs and encode/normalize outputs.
+- Avoid premature abstraction and use intention-revealing names.
+- Compare at least two options before deciding.
+
+## Mindset
+
+- Think like a senior engineer.
+- Don’t jump in on guesses or rush to conclusions.
+- Always evaluate multiple approaches; write one line each for pros/cons/risks, then choose the simplest solution.
+
+## Code & File Reference Rules
+
+- Read files thoroughly from start to finish (no partial reads).
+- Before changing code, locate and read definitions, references, call sites, related tests, docs/config/flags.
+- Do not change code without having read the entire file.
+- Before modifying a symbol, run a global search to understand pre/postconditions and leave a 1–3 line impact note.
+
+## Required Coding Rules
+
+- Before coding, write a Problem 1-Pager: Context / Problem / Goal / Non-Goals / Constraints.
+- Enforce limits: file ≤ 300 LOC, function ≤ 50 LOC, parameters ≤ 5, cyclomatic complexity ≤ 10. If exceeded, split/refactor.
+- Prefer explicit code; no hidden “magic.”
+- Follow DRY, but avoid premature abstraction.
+- Isolate side effects (I/O, network, global state) at the boundary layer.
+- Catch only specific exceptions and present clear user-facing messages.
+- Use structured logging and do not log sensitive data (propagate request/correlation IDs when possible).
+- Account for time zones and DST.
+
+## Testing Rules
+
+- New code requires new tests; bug fixes must include a regression test (write it to fail first).
+- Tests must be deterministic and independent; replace external systems with fakes/contract tests.
+- Include ≥1 happy path and ≥1 failure path in e2e tests.
+- Proactively assess risks from concurrency/locks/retries (duplication, deadlocks, etc.).
+
+## Security Rules
+
+- Never leave secrets in code/logs/tickets.
+- Validate, normalize, and encode inputs; use parameterized operations.
+- Apply the Principle of Least Privilege.
+
+## Clean Code Rules
+
+- Use intention-revealing names.
+- Each function should do one thing.
+- Keep side effects at the boundary.
+- Prefer guard clauses first.
+- Symbolize constants (no hardcoding).
+- Structure code as Input → Process → Return.
+- Report failures with specific errors/messages.
+- Make tests serve as usage examples; include boundary and failure cases.
+
+## Anti-Pattern Rules
+
+- Don’t modify code without reading the whole context.
+- Don’t expose secrets.
+- Don’t ignore failures or warnings.
+- Don’t introduce unjustified optimization or abstraction.
+- Don’t overuse broad exceptions.
+
+## Other rules
+- 5_audiodiarization/ and movie/ contain the original codebase for character-aware audiovisual subtitling.
+- Our goal is to refactor the code, make it faster, and remove unnecessary functions.
+- Refer to images/ for an overview of the pipeline.
+- Make data processing and loading as efficient as possible.

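The size limits listed under "Required Coding Rules" in .cursorrules (function ≤ 50 LOC, parameters ≤ 5) can be checked mechanically. The following is a minimal sketch, not part of this commit, assuming the code under review is Python 3.8+ source. The name `find_limit_violations` and both constants are hypothetical, and "LOC" is interpreted here as physical lines spanned by the `def`, which the rules file does not specify.

```python
import ast

MAX_FUNCTION_LOC = 50  # assumption: mirrors "function ≤ 50 LOC"
MAX_PARAMETERS = 5     # assumption: mirrors "parameters ≤ 5"


def find_limit_violations(source: str) -> list:
    """Return (function_name, problem) tuples for functions that exceed the limits."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        # Physical lines from the `def` line to the last line of the body,
        # counting blank lines and comments in between.
        length = node.end_lineno - node.lineno + 1
        args = node.args
        param_count = (
            len(args.posonlyargs)
            + len(args.args)
            + len(args.kwonlyargs)
            + (1 if args.vararg else 0)
            + (1 if args.kwarg else 0)
        )
        if length > MAX_FUNCTION_LOC:
            violations.append((node.name, f"{length} lines > {MAX_FUNCTION_LOC}"))
        if param_count > MAX_PARAMETERS:
            violations.append((node.name, f"{param_count} parameters > {MAX_PARAMETERS}"))
    return violations
```

A check like this could run over each file in pipeline/ and utils/ before review. Cyclomatic complexity and the 300-line file limit would need separate handling.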
.github/workflows/tests.yml

Lines changed: 128 additions & 0 deletions
@@ -0,0 +1,128 @@
+name: Tests
+
+on:
+  push:
+    branches: [ main, develop ]
+  pull_request:
+    branches: [ main, develop ]
+
+jobs:
+  test:
+    runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        python-version: ["3.8", "3.9", "3.10"]
+
+    steps:
+    - uses: actions/checkout@v3
+
+    - name: Set up Python ${{ matrix.python-version }}
+      uses: actions/setup-python@v4
+      with:
+        python-version: ${{ matrix.python-version }}
+
+    - name: Cache pip packages
+      uses: actions/cache@v3
+      with:
+        path: ~/.cache/pip
+        key: ${{ runner.os }}-pip-${{ hashFiles('requirements.txt') }}
+        restore-keys: |
+          ${{ runner.os }}-pip-
+
+    - name: Install system dependencies
+      run: |
+        sudo apt-get update
+        sudo apt-get install -y ffmpeg
+
+    - name: Install Python dependencies
+      run: |
+        python -m pip install --upgrade pip
+        pip install -r requirements.txt
+        pip install pytest pytest-cov pytest-xdist
+
+    - name: Run unit tests
+      run: |
+        pytest tests/unit -v --cov=pipeline --cov=utils --cov-report=xml --cov-report=term-missing
+
+    - name: Run integration tests
+      run: |
+        pytest tests/integration -v -m "not slow"
+
+    - name: Upload coverage reports
+      uses: codecov/codecov-action@v3
+      with:
+        file: ./coverage.xml
+        flags: unittests
+        name: codecov-umbrella
+
+  lint:
+    runs-on: ubuntu-latest
+
+    steps:
+    - uses: actions/checkout@v3
+
+    - name: Set up Python
+      uses: actions/setup-python@v4
+      with:
+        python-version: "3.9"
+
+    - name: Install linting tools
+      run: |
+        python -m pip install --upgrade pip
+        pip install flake8 black isort mypy
+
+    - name: Run flake8
+      run: |
+        flake8 pipeline utils --max-line-length=100 --extend-ignore=E203,W503
+
+    - name: Check black formatting
+      run: |
+        black --check pipeline utils tests
+
+    - name: Check import sorting
+      run: |
+        isort --check-only pipeline utils tests
+
+    - name: Run mypy type checking
+      run: |
+        mypy pipeline utils --ignore-missing-imports
+
+  docs:
+    runs-on: ubuntu-latest
+
+    steps:
+    - uses: actions/checkout@v3
+
+    - name: Set up Python
+      uses: actions/setup-python@v4
+      with:
+        python-version: "3.9"
+
+    - name: Install documentation tools
+      run: |
+        python -m pip install --upgrade pip
+        pip install sphinx sphinx-rtd-theme
+
+    - name: Build documentation
+      run: |
+        cd docs || mkdir docs
+        echo "Documentation build would go here"
+
+  security:
+    runs-on: ubuntu-latest
+
+    steps:
+    - uses: actions/checkout@v3
+
+    - name: Run Trivy vulnerability scanner
+      uses: aquasecurity/trivy-action@master
+      with:
+        scan-type: 'fs'
+        scan-ref: '.'
+        format: 'sarif'
+        output: 'trivy-results.sarif'
+
+    - name: Upload Trivy scan results
+      uses: github/codeql-action/upload-sarif@v2
+      with:
+        sarif_file: 'trivy-results.sarif'

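The "Cache pip packages" step in tests.yml keys the cache on a hash of requirements.txt and falls back to any key with the `<os>-pip-` prefix. The sketch below illustrates that key-plus-prefix-fallback idea in plain Python. It is an illustration only: `pip_cache_key` and `restore_cache` are hypothetical names, the digest is a plain SHA-256 of the file and is not claimed to equal what `hashFiles` produces, and the fallback returns the first prefix match rather than reproducing how actions/cache picks among several candidates.

```python
import hashlib
from pathlib import Path
from typing import Iterable, Optional


def pip_cache_key(runner_os: str, requirements: Path) -> str:
    """Build a key that changes whenever the requirements file changes."""
    digest = hashlib.sha256(requirements.read_bytes()).hexdigest()
    return f"{runner_os}-pip-{digest}"


def restore_cache(key: str, restore_prefix: str, stored_keys: Iterable[str]) -> Optional[str]:
    """Return the stored key to restore from, or None on a full cache miss."""
    stored = list(stored_keys)
    # Exact hit: dependencies are unchanged since the cache was saved.
    if key in stored:
        return key
    # Partial hit: reuse an older cache for the same OS and let pip
    # download only what has changed.
    for candidate in stored:
        if candidate.startswith(restore_prefix):
            return candidate
    return None
```

The practical effect is that editing requirements.txt invalidates the exact key while still allowing a warm start from an older pip cache.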
.gitignore

Lines changed: 177 additions & 0 deletions
@@ -0,0 +1,177 @@
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+*$py.class
+
+# C extensions
+*.so
+
+# Distribution / packaging
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+images/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+pip-wheel-metadata/
+share/python-wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+MANIFEST
+
+# PyInstaller
+*.manifest
+*.spec
+
+# Installer logs
+pip-log.txt
+pip-delete-this-directory.txt
+
+# Unit test / coverage reports
+htmlcov/
+.tox/
+.nox/
+.coverage
+.coverage.*
+.cache
+nosetests.xml
+coverage.xml
+*.cover
+*.py,cover
+.hypothesis/
+.pytest_cache/
+
+# Translations
+*.mo
+*.pot
+
+# Django stuff:
+*.log
+local_settings.py
+db.sqlite3
+db.sqlite3-journal
+
+# Flask stuff:
+instance/
+.webassets-cache
+
+# Scrapy stuff:
+.scrapy
+
+# Sphinx documentation
+docs/_build/
+
+# PyBuilder
+target/
+
+# Jupyter Notebook
+.ipynb_checkpoints
+
+# IPython
+profile_default/
+ipython_config.py
+
+# pyenv
+.python-version
+
+# pipenv
+Pipfile.lock
+
+# PEP 582
+__pypackages__/
+
+# Celery stuff
+celerybeat-schedule
+celerybeat.pid
+
+# SageMath parsed files
+*.sage.py
+
+# Environments
+.env
+.venv
+env/
+venv/
+ENV/
+env.bak/
+venv.bak/
+
+# Spyder project settings
+.spyderproject
+.spyproject
+
+# Rope project settings
+.ropeproject
+
+# mkdocs documentation
+/site
+
+# mypy
+.mypy_cache/
+.dmypy.json
+dmypy.json
+
+# Pyre type checker
+.pyre/
+
+# Project specific
+data/intermediate/
+data/temp/
+data/output/
+logs/
+*.pkl
+*.h5
+*.hdf5
+*.mp4
+*.wav
+*.avi
+*.mov
+*.pth
+*.pt
+
+# IDE
+.vscode/
+.idea/
+*.swp
+*.swo
+*~
+
+# OS
+.DS_Store
+Thumbs.db
+
+# Temporary files
+tmp/
+temp/
+*.tmp
+*.bak
+
+# Model weights (too large for git)
+models/
+weights/
+
+
+# Personal notes
+notes.txt
+TODO.txt
+
+# Credentials
+*.key
+*.pem
+
+movie/
+1/
+exp/
+.claude/
+5_audiodiarization/
+sample_data/
+exp/

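To see what the project-specific entries in .gitignore exclude, the sketch below approximates two gitignore rules in Python: a pattern ending in `/` with no other slash (such as `models/`) matches a directory of that name at any depth, while one with an inner slash (such as `data/output/`) is anchored to the directory holding the .gitignore. This is a deliberately simplified model, not Git's matcher. `is_ignored` and `PATTERNS` are hypothetical names, only a subset of the file's patterns is included, and negation, `**`, and leading-slash rules are not handled.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Subset of the "Project specific" and "Model weights" entries.
PATTERNS = ["*.pkl", "*.mp4", "*.wav", "*.pth", "*.pt",
            "models/", "weights/", "data/output/"]


def is_ignored(path: str, patterns=PATTERNS) -> bool:
    """Approximate whether a repo-relative file path would be ignored."""
    parts = PurePosixPath(path).parts
    if not parts:
        return False
    for pattern in patterns:
        if pattern.endswith("/"):
            dir_parts = tuple(pattern.rstrip("/").split("/"))
            if len(dir_parts) == 1:
                # No inner slash: any ancestor directory with this name matches.
                if dir_parts[0] in parts[:-1]:
                    return True
            elif len(parts) > len(dir_parts) and parts[:len(dir_parts)] == dir_parts:
                # Inner slash: anchored at the repository root.
                return True
        elif fnmatchcase(parts[-1], pattern):
            # Plain glob: compared against the file name at any depth.
            return True
    return False
```

One consequence worth noting: because `models/` is unanchored, a source package directory named `models` anywhere in the tree would be ignored as well, for example `src/models/net.py`.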