Commit bbdcc96

[#10] Add Thai language alignment support (#34)
* [ghi-10] Add Thai language alignment support

  Add Thai (th) to supported alignment languages using the airesearch/wav2vec2-large-xlsr-53-th model. The model is loaded on-demand via whisperx.load_align_model's model_name parameter, downloaded automatically on first Thai transcription.
* [ghi-10] Add implementation plan for Thai alignment
* [ghi-10] Add YAML frontmatter to transcription quality spec

  Add status, story_ids, and last_verified fields to enable automated spec drift detection.
* #10 Update commit prefix to use GitHub issue format
* Add .coverage to gitignore
* Update Argus framework to feature/58-spec-drift-detection
* Add vocabulary and post-processing quality spec

  Companion spec to transcription-quality-improvements.md covering persistent custom vocabulary, phonetic post-processing correction, optional local LLM correction, and Thai text normalization.
* Fix ruff formatting in test_transcriber.py
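The on-demand loading described in the first bullet could look roughly like the sketch below. It is not the code from this commit: `ALIGN_MODEL_OVERRIDES`, `resolve_align_model_name`, and `load_alignment` are hypothetical names chosen for illustration. Only the model identifier and the `model_name` parameter of `whisperx.load_align_model` come from the commit message.

```python
from typing import Optional

# Hypothetical override table. Only the Thai entry is taken from the commit
# message; the dictionary itself is an assumption made for this sketch.
ALIGN_MODEL_OVERRIDES = {
    "th": "airesearch/wav2vec2-large-xlsr-53-th",
}


def resolve_align_model_name(language_code: str) -> Optional[str]:
    """Return an explicit alignment model name, or None to use the library default."""
    return ALIGN_MODEL_OVERRIDES.get(language_code)


def load_alignment(language_code: str, device: str = "cpu"):
    """Load the alignment model only when a transcription actually needs it."""
    # Imported lazily so the lookup above works without whisperx installed.
    import whisperx

    return whisperx.load_align_model(
        language_code=language_code,
        device=device,
        model_name=resolve_align_model_name(language_code),
    )
```

Keeping the lookup separate from the loader means the language-to-model mapping can be unit tested without downloading any model.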
1 parent 968f0b9 commit bbdcc96

22 files changed, +788 -73 lines changed

.argus/config.yml

Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -3,7 +3,7 @@
33

44
# PM Tool
55
pm_tool: github_issues
6-
commit_prefix: ghi
6+
commit_prefix: "#"
77
github_issue_types_supported: false
88
# github_repo: owner/repo # Optional — auto-detected from git remote origin
99
# github_toolsets: # Default: [issues, users]

.claude/agents/architect.md

Lines changed: 106 additions & 17 deletions
@@ -65,6 +65,91 @@ Read `.argus/codebase/CONCERNS.md` if it exists.
6565

6666

6767
**Layer 1B — Stack-Specific Best Practices:**
68+
# JavaScript Best Practices
69+
70+
## Architecture
71+
72+
- Organize code into clear directories: `adapters/` (network), `components/` (UI), `helpers/` (utilities), `screens/` (pages), `services/` (business logic), `config/` (environment)
73+
- Create one adapter per API resource — extend a base adapter for shared config (headers, auth)
74+
- Use the constructor pattern for components: `_bind()`, `_setup()`, `_addEventListeners()` as private methods
75+
- Create one initializer file per component, export all through `initializers/index.js`
76+
- Manifest files include only initializers and screens (in that order)
77+
- Use ES6 classes over prototype-based patterns
78+
- Use `const` by default, `let` when reassignment is needed, never `var`
79+
- Prefer strict equality (`===` / `!==`) everywhere — only use `==` for intentional `null`/`undefined` coalescing
80+
- Use optional chaining (`?.`) and nullish coalescing (`??`) instead of manual truthy checks or lodash `_.get`
81+
- Use `structuredClone()` for deep copies — avoid JSON parse/stringify round-trips or spread-based shallow clones
82+
- Keep functions small and single-purpose — extract helpers when a function exceeds ~30 lines
83+
- Use early returns to reduce nesting depth and improve readability
84+
- Prefer named exports over default exports for better refactoring and tree-shaking support
85+
- Use native `#private` class fields over underscore conventions when runtime privacy is needed
86+
87+
## Performance
88+
89+
- Favor `map()`, `filter()`, `reduce()` over traditional `for` loops for clarity and optimization
90+
- Use template literals over string concatenation for readability and performance
91+
- Use arrow functions for callbacks and function expressions
92+
- Import only what is needed — avoid importing entire modules when destructured imports are available
93+
- Group imports by: built-in modules, external packages, internal modules (alphabetical within each group)
94+
- Never block the event loop — offload CPU-intensive work to Web Workers or worker threads in Node.js
95+
- Minimize DOM reads/writes and batch DOM mutations to avoid layout thrashing
96+
- Use `requestAnimationFrame` for visual updates instead of `setTimeout`/`setInterval`
97+
- Avoid memory leaks: remove event listeners on teardown, nullify references in closures, use `WeakRef`/`WeakMap` for caches
98+
- Use `AbortController` to cancel in-flight fetch requests and avoid stale responses
99+
- Prefer `for...of` over `forEach` when early exit (`break`/`return`) is needed
100+
- Debounce or throttle high-frequency events (scroll, resize, input) to limit unnecessary computation
101+
102+
## Security
103+
104+
- Use strict equality (`===` and `!==`) except when intentionally comparing against `null`/`undefined`
105+
- Sanitize all user input before DOM insertion — never use `innerHTML` with untrusted content
106+
- Validate and sanitize URL inputs before using in redirects or link hrefs
107+
- Store secrets in environment variables, never in client-side code
108+
- Use Content Security Policy headers to prevent XSS attacks
109+
- Prevent prototype pollution: freeze prototypes on critical objects, validate JSON keys, avoid recursive merge without safeguards
110+
- Write ReDoS-safe regular expressions — avoid nested quantifiers and unbounded repetition on overlapping patterns
111+
- Never use `eval()`, `Function()` constructor, or `setTimeout`/`setInterval` with string arguments
112+
- Validate and sanitize all server-side inputs — never trust client-side validation alone
113+
- Use `Object.create(null)` for lookup maps to avoid prototype chain interference
114+
- Set `HttpOnly`, `Secure`, and `SameSite` flags on cookies to mitigate XSS and CSRF
115+
- Pin dependency versions and audit regularly with `npm audit` to catch known vulnerabilities
116+
117+
## Testing
118+
119+
- Use Cypress for E2E testing with TypeScript support
120+
- Configure `baseUrl` in Cypress config to avoid repeating URLs across tests
121+
- Name test files with `.spec.ts` suffix in camelCase
122+
- Use `data-test-id` attributes for selectors instead of CSS classes or HTML structure
123+
- Use `cy.intercept()` (v6.0+) to stub network requests with fixtures
124+
- Create reusable custom commands for repeated test workflows
125+
- Write `describe` blocks with the filename, `context` blocks with "when/given", and `it` blocks in imperative mood without "should"
126+
- Test async code with `async`/`await` — avoid `.then()` chains in tests for readability
127+
- Mock timers (`jest.useFakeTimers` or `vi.useFakeTimers`) when testing `setTimeout`/`setInterval` logic
128+
- Test error paths and edge cases, not just happy paths — verify thrown errors, rejected promises, and boundary inputs
129+
- Keep tests isolated: no shared mutable state between test cases, reset mocks in `beforeEach`/`afterEach`
130+
- Aim for deterministic tests — avoid reliance on real time, network, or random values
131+
132+
## Common Anti-Patterns
133+
134+
- `innerHTML with user content` → Use textContent or sanitize with DOMPurify (Critical)
135+
- `unhandled promise rejections` → Always attach `.catch()` or use `try`/`catch` in async functions — unhandled rejections crash Node.js (Critical)
136+
- `eval or Function constructor` → Use safe alternatives like JSON.parse or AST-based transforms (Critical)
137+
- `unbounded recursive merge on user input` → Validate keys and use `Object.create(null)` to prevent prototype pollution (Critical)
138+
- `var declarations` → Use `const` or `let` for block scoping and clarity (Major)
139+
- `== for equality checks` → Use `===` to avoid type coercion bugs (Major)
140+
- `floating promises` → Always `await` or return promises — a missing `await` silently swallows errors (Major)
141+
- `callback hell` → Refactor nested callbacks to async/await or named functions (Major)
142+
- `memory leaks from unremoved listeners` → Remove event listeners in cleanup/teardown lifecycle hooks (Major)
143+
- `network calls in components` → Extract to adapter classes with base adapter pattern (Major)
144+
- `string concatenation with +` → Use template literals for readability (Minor)
145+
- `for loops for array transformation` → Use `map`, `filter`, `reduce` for declarative code (Minor)
146+
- `typeof null === 'object'` → Guard with explicit `value !== null` check before typeof (Minor)
147+
- `CSS/HTML selectors in tests` → Use data-test-id attributes for resilient test selectors (Minor)
148+
- `missing trailing commas` → Add trailing commas in multi-line arrays/objects for cleaner diffs (Nit)
149+
- `_singleUnderscore private without enforcement` → Document intent but consider WeakMap or # private fields (Nit)
150+
- `missing semicolons` → Include semicolons at the end of each statement (Nit)
151+
152+
68153
# Python Best Practices
69154

70155
## Architecture
@@ -147,7 +232,7 @@ Read `.argus/conventions.md` if it exists.
147232
Read the spec file referenced in the story.
148233

149234
### Project Configuration
150-
- Stacks: python
235+
- Stacks: javascript, python
151236
- Test coverage target: 80%
152237

153238
## Behavior
@@ -157,26 +242,30 @@ Read the spec file referenced in the story.
157242
1. Read all changed files in the feature branch
158243
2. Load all four convention layers
159244
3. For each changed file, check against conventions from Layer 1 through Layer 4
160-
4. Categorize each finding by severity
161-
5. For Critical and Major issues: fix directly on the branch
162-
6. For Minor issues: fix if the fix is unambiguous, comment otherwise
163-
7. For Nit issues: comment only
164-
8. When layers conflict, follow the higher layer and flag the conflict
245+
4. Categorize each finding by severity and act per the "Review Severity Levels" table below
246+
5. Before making any commits, follow the "Commit Rules" below
247+
6. When layers conflict, follow the higher layer and flag the conflict
165248

166249
## Review Severity Levels
167250

168251
| Severity | Behavior | Examples |
169-
|----------|----------|----------|
170-
| **Critical** | Blocks merge. Fix automatically if possible, otherwise flag for human. | Security vulnerabilities, data loss risks, broken functionality |
171-
| **Major** | Fix automatically. | Convention violations, performance anti-patterns, missing error handling |
172-
| **Minor** | Fix automatically if trivial, comment otherwise. | Code organization, naming suggestions, minor style |
173-
| **Nit** | Comment only. Do not fix. | Style preferences, alternative approaches, cosmetic |
174-
175-
### Rules
176-
- Critical issues must be resolved before PR can merge
177-
- Major issues are fixed directly on the branch (push commits)
178-
- Minor issues are fixed only when the fix is unambiguous
179-
- Nit comments are informational — never block merge or auto-fix
252+
| -------- | -------- | -------- |
253+
| **Critical** | Blocks merge until resolved. Auto-fix if possible, otherwise flag for human. | Security vulnerabilities, data loss risks, broken functionality |
254+
| **Major** | Auto-fix. | Convention violations, performance anti-patterns, missing error handling |
255+
| **Minor** | Auto-fix if unambiguous, comment otherwise. | Code organization, naming suggestions, minor style |
256+
| **Nit** | Comment only. Never blocks merge or auto-fix. | Style preferences, alternative approaches, cosmetic |
257+
258+
> All auto-fixes are pushed as commits directly to the branch, following the "Commit Rules".
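As a rough illustration of the severity table above, the sketch below maps each level to the action the table describes. `Severity`, `review_action`, and `blocks_merge` are hypothetical names, not part of Argus, and the two boolean flags are assumptions standing in for the reviewer's judgment.

```python
from enum import Enum


class Severity(Enum):
    CRITICAL = "Critical"
    MAJOR = "Major"
    MINOR = "Minor"
    NIT = "Nit"


def review_action(
    severity: Severity,
    fix_is_unambiguous: bool = True,
    auto_fixable: bool = True,
) -> str:
    """Return the action for a finding, following the severity table."""
    if severity is Severity.CRITICAL:
        # Auto-fix if possible, otherwise hand over to a human.
        return "auto-fix" if auto_fixable else "flag-for-human"
    if severity is Severity.MAJOR:
        return "auto-fix"
    if severity is Severity.MINOR:
        return "auto-fix" if fix_is_unambiguous else "comment"
    # Nit: comment only, never auto-fix.
    return "comment"


def blocks_merge(severity: Severity) -> bool:
    """Only Critical findings block the merge until resolved."""
    return severity is Severity.CRITICAL
```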
259+
260+
261+
## Commit Rules
262+
263+
- Prefix all commits with `[ghi-{story-id}]`
264+
- Present tense, capitalize first word, no period
265+
- Atomic commits — one logical change per commit
266+
- If you find yourself using "and" in the message, split into separate commits
267+
- Single-line title only — no body, no description
268+
- Never add `Co-Authored-By` trailers to commits
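The mechanical parts of the Commit Rules above could be checked with something like the following sketch. `check_commit_message` is a hypothetical helper, not part of this commit, and it assumes the story id is numeric. Tense and atomicity cannot be checked this way and are left to the reviewer.

```python
import re
from typing import List

# Assumes the `[ghi-{story-id}]` prefix from the rules, with a numeric story id.
TITLE_PATTERN = re.compile(r"^\[ghi-(\d+)\] (.+)$")


def check_commit_message(message: str) -> List[str]:
    """Return a list of rule violations; an empty list means the message passes."""
    problems: List[str] = []
    lines = message.strip().splitlines()

    if "co-authored-by:" in message.lower():
        problems.append("remove Co-Authored-By trailer")
    if len(lines) != 1:
        problems.append("use a single-line title with no body")

    title = lines[0] if lines else ""
    match = TITLE_PATTERN.match(title)
    if not match:
        problems.append("prefix the title with [ghi-{story-id}]")
        return problems

    subject = match.group(2)
    if not subject[0].isupper():
        problems.append("capitalize the first word")
    if subject.endswith("."):
        problems.append("drop the trailing period")
    if re.search(r"\band\b", subject, flags=re.IGNORECASE):
        problems.append("split into separate commits instead of using 'and'")
    return problems
```

For example, `check_commit_message("[ghi-10] Add Thai language alignment support")` returns an empty list.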
180269

181270

182271
### Conflict Resolution

.claude/agents/business-analyst.md

Lines changed: 2 additions & 2 deletions
@@ -30,8 +30,8 @@ Read `.argus/codebase/ARCHITECTURE.md` if it exists.
3030
Read `.argus/codebase/INTEGRATIONS.md` if it exists.
3131

3232
### Project Configuration
33-
- PM tool: github_projects
34-
- Stacks: python
33+
- PM tool: github_issues
34+
- Stacks: javascript, python
3535

3636
## Behavior
3737

.claude/agents/codebase-mapper.md

Lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@ You are a Codebase Mapper agent. Your job is to analyze the existing codebase fo
1313
## Context
1414

1515
### Project Configuration
16-
- Stacks: python
16+
- Stacks: javascript, python
1717

1818
## Behavior
1919

.claude/agents/debug.md

Lines changed: 1 addition & 1 deletion
@@ -21,7 +21,7 @@ Read `.argus/codebase/ARCHITECTURE.md` if it exists.
2121
Read `.argus/codebase/STRUCTURE.md` if it exists.
2222

2323
### Project Configuration
24-
- Stacks: python
24+
- Stacks: javascript, python
2525

2626
## Guardrails
2727

.claude/agents/product-manager.md

Lines changed: 15 additions & 2 deletions
@@ -38,8 +38,8 @@ Read the spec file referenced in the story.
3838
Read the spec file referenced in the story.
3939

4040
### Project Configuration
41-
- PM tool: github_projects
42-
- Commit prefix: gh
41+
- PM tool: github_issues
42+
- Commit prefix: ghi
4343

4444
### PM Tool Hierarchy Mapping
4545

@@ -49,6 +49,7 @@ Read the spec file referenced in the story.
4949
| Linear | Project || Issue |
5050
| Jira | Epic || Story |
5151
| GitHub Projects | Milestone || Issue |
52+
| GitHub Issues | Milestone || Issue |
5253

5354
**GitHub Projects specifics:**
5455
- Two-level hierarchy only: Milestones (Initiatives) and Issues (User Stories)
@@ -57,6 +58,18 @@ Read the spec file referenced in the story.
5758
- Use the GitHub MCP server for all GitHub Projects operations (structured tool calls are more token-efficient than CLI output parsing)
5859
- Fall back to `gh` CLI via Bash only for operations not covered by MCP tools (e.g., complex GraphQL queries via `gh api graphql`)
5960

61+
**GitHub Issues specifics:**
62+
- Two-level hierarchy only: Milestones (Initiatives) and Issues (User Stories)
63+
- No status workflow tracking — issues are simply open or closed
64+
- No Project board required — uses plain GitHub Issues with Milestones for grouping
65+
- Use the GitHub MCP server for all GitHub Issues operations (toolsets: `issues`, `users`)
66+
- Fall back to `gh` CLI via Bash only for operations not covered by MCP tools
67+
- **Milestones:** Before creating issues, check existing milestones via `gh api repos/{owner}/{repo}/milestones`. Reuse a Milestone if one with the same name exists — never create duplicates. Create new ones via `gh api repos/{owner}/{repo}/milestones -f title="..." -f description="..."`
68+
- **Categorization:** Check `github_issue_types_supported` in `.argus/config.yml`:
69+
- When `true`: set the Issue Type — Argus "feature" → **Feature**, "bug" → **Bug**, "chore" → **Task**. If type assignment fails at runtime, fall back to the corresponding label instead
70+
- When `false` (or not set): apply labels (`feature`, `bug`, `chore`). Ensure the label exists before applying; create it if missing
71+
- **Issue creation:** Assign each issue to its Milestone, apply the Issue Type or label, and reference dependencies as cross-references (`#N`)
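The categorization rule in the list above can be sketched as a small lookup. `ISSUE_TYPE_BY_ARGUS_TYPE` and `categorize` are hypothetical names for illustration. The type mapping and the label fallback come from the bullets, while the returned dictionary shape is an assumption.

```python
from typing import Dict

# Argus type -> GitHub Issue Type, as listed in the "Categorization" bullet.
ISSUE_TYPE_BY_ARGUS_TYPE = {
    "feature": "Feature",
    "bug": "Bug",
    "chore": "Task",
}


def categorize(argus_type: str, issue_types_supported: bool = False) -> Dict[str, str]:
    """Decide whether an issue gets an Issue Type or a plain label.

    `issue_types_supported` mirrors `github_issue_types_supported` in
    `.argus/config.yml`; it defaults to False, matching "or not set".
    """
    if argus_type not in ISSUE_TYPE_BY_ARGUS_TYPE:
        raise ValueError(f"unknown Argus type: {argus_type!r}")

    if issue_types_supported:
        return {
            "issue_type": ISSUE_TYPE_BY_ARGUS_TYPE[argus_type],
            # Label to apply instead if type assignment fails at runtime.
            "fallback_label": argus_type,
        }
    return {"label": argus_type}
```

With the configuration shown in this commit (`github_issue_types_supported: false`), a "chore" story would simply receive the `chore` label.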
72+
6073

6174
## Behavior
6275

.claude/agents/product-researcher.md

Lines changed: 2 additions & 2 deletions
@@ -21,8 +21,8 @@ The BA agent provides:
2121
- Any relevant context from the user interview
2222

2323
### Project Configuration
24-
- PM tool: github_projects
25-
- Stacks: python
24+
- PM tool: github_issues
25+
- Stacks: javascript, python
2626

2727
## Behavior
2828

.claude/agents/quality-assurance.md

Lines changed: 1 addition & 1 deletion
@@ -38,7 +38,7 @@ Read `.argus/codebase/TESTING.md` if it exists.
3838
Read `.argus/codebase/CONVENTIONS.md` if it exists.
3939

4040
### Project Configuration
41-
- Stacks: python
41+
- Stacks: javascript, python
4242
- Test coverage target: 80%
4343

4444
## Tools
