docs, issue templates, makefile fix

leo-aa88 · leo-aa88 · commit 7c5f4c6dd481 · 2026-03-16T13:18:25.000-03:00
diff --git a/.github/ISSUE_TEMPLATE/bug_report.yml b/.github/ISSUE_TEMPLATE/bug_report.yml
@@ -0,0 +1,75 @@
+name: Bug report
+description: Something is broken or producing wrong output
+labels: ["bug"]
+body:
+  - type: markdown
+    attributes:
+      value: |
+        Thanks for taking the time to report a bug.
+        The most useful thing you can provide is the exact command you ran and the full output.
+
+  - type: textarea
+    id: command
+    attributes:
+      label: Command
+      description: The exact raglogs command that produced the problem.
+      placeholder: "raglogs compare --since 30m --baseline 24h"
+    validations:
+      required: true
+
+  - type: textarea
+    id: expected
+    attributes:
+      label: Expected behaviour
+      description: What did you expect to happen?
+    validations:
+      required: true
+
+  - type: textarea
+    id: actual
+    attributes:
+      label: Actual behaviour
+      description: What actually happened? Paste the full output, including any error messages or tracebacks.
+      render: text
+    validations:
+      required: true
+
+  - type: textarea
+    id: log_sample
+    attributes:
+      label: Log sample (if relevant)
+      description: |
+        If the bug is about incorrect clustering, normalization, or timeline/compare output,
+        paste a few representative log lines. Redact any sensitive values.
+      render: json
+
+  - type: input
+    id: version
+    attributes:
+      label: raglogs version
+      description: Output of `pip show raglogs` or the git commit hash.
+      placeholder: "0.1.0 / abc1234"
+    validations:
+      required: false
+
+  - type: input
+    id: python
+    attributes:
+      label: Python version
+      description: Output of `python --version`.
+      placeholder: "3.12.3"
+    validations:
+      required: false
+
+  - type: dropdown
+    id: llm
+    attributes:
+      label: LLM provider
+      description: Which LLM provider is configured?
+      options:
+        - disabled (no LLM)
+        - openai
+        - ollama
+        - other OpenAI-compatible endpoint
+    validations:
+      required: false
diff --git a/.github/ISSUE_TEMPLATE/config.yml b/.github/ISSUE_TEMPLATE/config.yml
@@ -0,0 +1,5 @@
+blank_issues_enabled: false
+contact_links:
+  - name: Question
+    url: https://github.com/leo-aa88/raglogs/issues/new?labels=question&template=
+    about: Not a bug or feature request? Open a plain issue with the question label.
diff --git a/.github/ISSUE_TEMPLATE/feature_request.yml b/.github/ISSUE_TEMPLATE/feature_request.yml
@@ -0,0 +1,55 @@
+name: Feature request
+description: Suggest a new command, flag, adapter, or behaviour
+labels: ["enhancement"]
+body:
+  - type: markdown
+    attributes:
+      value: |
+        Feature requests are welcome. The most convincing ones include a concrete
+        example of the problem and what the output would look like.
+
+  - type: textarea
+    id: problem
+    attributes:
+      label: What problem does this solve?
+      description: Describe the situation where you hit this limitation.
+      placeholder: "When an incident spans multiple services, I can't easily see..."
+    validations:
+      required: true
+
+  - type: textarea
+    id: proposed
+    attributes:
+      label: Proposed solution
+      description: |
+        What would you want the command, flag, or output to look like?
+        A concrete example is more useful than a general description.
+      placeholder: |
+        raglogs compare --service api --service billing-worker --since 30m --baseline 24h
+
+        Output:
+          ...
+    validations:
+      required: true
+
+  - type: textarea
+    id: alternatives
+    attributes:
+      label: Alternatives considered
+      description: Other approaches you considered, and why they fall short.
+
+  - type: dropdown
+    id: area
+    attributes:
+      label: Area
+      options:
+        - New CLI command
+        - New flag on existing command
+        - Log source adapter (Datadog, Loki, CloudWatch, etc.)
+        - Normalization / clustering
+        - LLM integration
+        - HTTP API
+        - Output format
+        - Other
+    validations:
+      required: true
diff --git a/.github/ISSUE_TEMPLATE/normalization.yml b/.github/ISSUE_TEMPLATE/normalization.yml
@@ -0,0 +1,57 @@
+name: Normalization / clustering issue
+description: Log messages that should cluster together do not, or unrelated messages are grouped into the same cluster
+labels: ["normalization"]
+body:
+  - type: markdown
+    attributes:
+      value: |
+        Normalization quality directly affects how useful raglogs is.
+        The more concrete examples you provide, the easier this is to fix.
+
+  - type: dropdown
+    id: problem_type
+    attributes:
+      label: Problem type
+      options:
+        - Messages that should cluster together are producing separate clusters
+        - Unrelated messages are being grouped into the same cluster
+        - A dynamic value (ID, IP, timestamp) is not being stripped
+        - A meaningful value is being incorrectly stripped
+    validations:
+      required: true
+
+  - type: textarea
+    id: raw_messages
+    attributes:
+      label: Raw log messages
+      description: Paste the actual log lines as they appear in your logs.
+      render: text
+    validations:
+      required: true
+
+  - type: textarea
+    id: normalized
+    attributes:
+      label: What they normalize to (if known)
+      description: |
+        You can check normalization with:
+        ```python
+        from src.core.normalization.normalize import normalize_message
+        print(normalize_message("your message here"))
+        ```
+      render: text
+
+  - type: textarea
+    id: expected_normalization
+    attributes:
+      label: What they should normalize to
+      description: The normalized form you would expect, so that the messages fingerprint correctly.
+      render: text
+    validations:
+      required: true
+
+  - type: textarea
+    id: context
+    attributes:
+      label: Additional context
+      description: Log format, service type, or anything else that helps.
diff --git a/CODE_OF_CONDUCT.md b/CODE_OF_CONDUCT.md
@@ -0,0 +1,30 @@
+# Code of Conduct
+
+## Our commitment
+
+raglogs is a small open-source tool. Everyone who participates — opening issues, submitting pull requests, commenting on discussions — is expected to treat others with basic respect.
+
+We are not trying to build a movement or a community with a charter. We are trying to build useful software. The bar is simple: engage in good faith, focus on the work, and don't be a jerk.
+
+## What this means in practice
+
+**Do:**
+- Critique code, ideas, and approaches directly and honestly
+- Ask questions when something is unclear
+- Disagree — including strongly — on technical grounds
+- Accept that maintainers have final say on direction
+
+**Don't:**
+- Attack people instead of ideas
+- Harass, threaten, or demean anyone
+- Flood issues with off-topic noise
+
+## Scope
+
+This applies to the GitHub repository: issues, pull requests, discussions, and code review comments.
+
+## Enforcement
+
+If something is genuinely harmful — harassment, threats, sustained bad-faith disruption — maintainers will remove the content and, if necessary, block the account. There is no formal appeals process for a project of this size.
+
+For anything that crosses a line, contact the maintainer directly via GitHub.
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -0,0 +1,144 @@
+# Contributing to raglogs
+
+raglogs is an incident analysis CLI tool. Contributions are welcome — bug fixes, new log adapters, normalization improvements, and documentation are all useful.
+
+This document covers how to get set up, what the codebase expects, and how to submit changes.
+
+---
+
+## Getting started
+
+```bash
+git clone https://github.com/leo-aa88/raglogs
+cd raglogs
+python -m venv .venv && source .venv/bin/activate
+pip install -r requirements.txt && pip install -e .
+cp .env.example .env
+# Edit .env — set RAGLOGS_DB_URL at minimum
+docker compose up postgres -d
+raglogs init
+```
+
+Run the demo to verify everything works:
+
+```bash
+raglogs demo
+raglogs timeline --since 2h
+raglogs compare --since 2h --baseline 24h
+```
+
+---
+
+## Project structure
+
+```
+src/core/
+  clustering/     Fingerprint grouping, importance scoring, baseline comparison
+  compare/        Window diffing — new, disappeared, increased, decreased
+  explain/        Evidence assembly, confidence, templates, summarizer
+  ingestion/      Ingestion orchestration and batch persistence
+  llm/            Provider abstraction (OpenAI, Ollama, noop)
+  normalization/  Message normalization, fingerprinting, trigger patterns
+  parsing/        JSON and text parsers, field alias resolution
+  retrieval/      Keyword-based question answering
+  timeline/       Causal timeline reconstruction
+src/cli/commands/ One file per CLI command
+src/api/routes/   FastAPI route handlers
+src/db/           SQLAlchemy models and session management
+src/utils/        Time window parsing, hashing helpers
+tests/unit/       Pure unit tests — no database required
+tests/integration/ Full pipeline tests — require running Postgres
+```
+
+The pipeline flows: ingest → normalize → fingerprint → cluster → baseline compare → evidence assembly → explain / timeline / compare.
+
+---
+
+## Running tests
+
+Unit tests require no database:
+
+```bash
+make test-unit
+# or
+pytest tests/unit/
+```
+
+Integration tests require a running Postgres instance:
+
+```bash
+docker compose up postgres -d
+make test-int
+# or
+pytest tests/integration/
+```
+
+All tests must pass before a pull request will be merged. If you are adding new functionality, add tests for it.
+
+---
+
+## Code style
+
+- Python 3.11+
+- Type annotations on all function signatures
+- `ruff` for linting, `black` for formatting
+
+```bash
+make lint
+make format
+```
+
+No hard rules on line length beyond what `black` enforces. Prefer explicit over clever. Avoid abstractions that exist only to save lines.
+
+---
+
+## Areas where contributions are most useful
+
+**Normalization improvements**
+
+The normalization step (`src/core/normalization/patterns.py`) determines clustering quality. If you have real-world log formats that normalize poorly — producing too many clusters for the same error, or collapsing unrelated errors — a fix there has high leverage. Include before/after examples and a test in `tests/unit/test_normalization.py`.
+
+**Log source adapters**
+
+New adapters go in `src/adapters/`. Each adapter yields `ParsedLogLine` objects; the rest of the pipeline is source-agnostic. Adapters that would be useful: Datadog, Loki, Kubernetes pod logs, CloudWatch.
+
+**Trigger patterns**
+
+New trigger event patterns go in `TRIGGER_PATTERNS` in `src/core/normalization/patterns.py`. A good trigger pattern is specific enough to avoid false positives and general enough to match common variants across log formats.
+
+**Bug fixes**
+
+Check the issue tracker. Bugs with a reproducing log sample are easiest to fix.
+
+---
+
+## Submitting a pull request
+
+1. Fork the repository and create a branch from `main`
+2. Make your changes with tests
+3. Run `make lint` and `make test-unit` — both must pass
+4. Open a pull request with a clear description of what changed and why
+5. Reference any related issue
+
+Pull requests that are purely cosmetic (reformatting with no functional change) will not be merged.
+
+Keep pull requests focused. A PR that fixes a bug and adds an unrelated feature is harder to review and slower to merge than two separate PRs.
+
+---
+
+## Commit messages
+
+No strict format required. Be clear about what changed and why. One-line messages are fine for small changes. For anything non-trivial:
+
+```
+Short summary (under 72 chars)
+
+Longer explanation of what changed and why, if not obvious from the
+diff. Reference the issue number if applicable.
+```
+
+---
+
+## Questions
+
+Open a GitHub issue with the `question` label. Asking whether a contribution would be accepted before you invest time in it is a good use of an issue.
diff --git a/Makefile b/Makefile
@@ -69,6 +69,7 @@ demo: db-up
 	alembic upgrade head
 	raglogs demo
 	raglogs timeline --since 2h
+	raglogs compare --since 30m --baseline 24h
 
 ingest:
 	raglogs ingest $(SAMPLE)