test: add comprehensive tests for catalog extra fields handling (#87)

jontsai · web-flow · commit 50140dd68054 · 2025-12-17T09:34:55.000-08:00
## Summary

Add a test suite verifying that the `Metadata` class accepts extra fields from dbt, so jobs do not fail when dbt adds new fields to its artifacts.

## Motivation

As discussed in the previous PR, dbt frequently adds new fields to its schema (like `invocation_started_at`). With `extra="allow"` already merged to main, we now need tests to:
1. Document the expected behavior
2. Prevent regressions if someone changes it back to `extra="forbid"`
3. Validate that extra fields are accepted, stored in `__pydantic_extra__`, and included in `model_dump()` output

## Changes

### Tests Added (`tests/test_vendor/test_catalog_v1.py`)
- ✅ Extra fields are accepted without validation errors
- ✅ Extra fields are stored in `__pydantic_extra__`
- ✅ `model_dump()` includes extra fields
- ✅ Works with no extra fields (backwards compatible)
- ✅ Works with only extra fields
- ✅ Handles the specific case of the `invocation_started_at` field
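
For reviewers less familiar with Pydantic's `extra="allow"`, here is a minimal sketch of the behavior the tests above pin down. `MetadataSketch` is a hypothetical stand-in, not the vendored `Metadata` class, and its two declared fields are an assumption chosen only for illustration.

```python
from typing import Optional

from pydantic import BaseModel, ConfigDict


class MetadataSketch(BaseModel):
    """Hypothetical stand-in for the vendored Metadata model."""

    # Same setting the PR relies on: unknown keys are kept, not rejected.
    model_config = ConfigDict(extra="allow")

    dbt_version: Optional[str] = None
    invocation_id: Optional[str] = None


sketch = MetadataSketch(
    dbt_version="1.9.0",
    invocation_started_at="2025-11-05T09:59:00Z",  # not declared on the model
)

# Undeclared keys land in __pydantic_extra__ ...
extras = sketch.__pydantic_extra__

# ... and are emitted alongside the declared fields by model_dump().
dumped = sketch.model_dump()
```

Declared fields that were not supplied keep their defaults, which is why payloads with no extra fields continue to validate.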

### Developer Experience Improvements
- **Makefile added** with common targets (`make test`, `make test-vendor`, `make test-cov`, etc.)
- **CONTRIBUTING.rst updated** with comprehensive test running instructions
- **Fixed import conflicts** by renaming `tests/vendor` to `tests/test_vendor`

## Test Results

All 6 tests pass ✅

**Red/Green verification:**
- With `extra="forbid"`: 5 tests fail with clear error messages ❌
- With `extra="allow"`: All tests pass ✅
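
The red/green check can be reproduced in isolation with a sketch like the one below. `ForbidSketch`, `AllowSketch`, and `accepts` are hypothetical names introduced for illustration; the real verification was done by toggling the setting on the vendored `Metadata` class.

```python
from typing import Optional

from pydantic import BaseModel, ConfigDict, ValidationError


class ForbidSketch(BaseModel):
    """Hypothetical model mirroring the old, strict configuration."""

    model_config = ConfigDict(extra="forbid")
    dbt_version: Optional[str] = None


class AllowSketch(BaseModel):
    """Hypothetical model mirroring the configuration now on main."""

    model_config = ConfigDict(extra="allow")
    dbt_version: Optional[str] = None


def accepts(model_cls, data):
    """Return True if model_cls validates data, False on ValidationError."""
    try:
        model_cls(**data)
    except ValidationError:
        return False
    return True


payload = {
    "dbt_version": "1.9.0",
    "invocation_started_at": "2025-11-05T09:59:00Z",  # unknown to both models
}

red = accepts(ForbidSketch, payload)    # rejected under extra="forbid"
green = accepts(AllowSketch, payload)   # accepted under extra="allow"
```

A payload without unknown keys validates under both settings, which is presumably why one of the six tests (the no-extra-fields case) still passes with `extra="forbid"`.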

## Coverage

- **catalog_v1.py coverage:** 100%
- **Overall project coverage:** 88.74% (unchanged)

## CI Integration

Tests will automatically run in GitHub Actions across:
- Python 3.10, 3.11, 3.12, PyPy 3.9
- Pydantic 2.8, 2.10
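
To find the tox environment for a given combination locally, the matrix can be enumerated with a sketch like this. Only `py310-pydantic210-nocov` and `py310-pydantic210-cover` appear in this PR's CONTRIBUTING.rst changes; the other factor spellings (`py311`, `py312`, `pypy39`, `pydantic28`) and the assumption that CI runs the full cross product are guesses, so check `tox.ini` or `tox -l` before relying on them.

```python
from itertools import product

# Factor spellings are assumptions extrapolated from the two env names
# shown in CONTRIBUTING.rst; verify against tox.ini.
PYTHON_FACTORS = ["py310", "py311", "py312", "pypy39"]
PYDANTIC_FACTORS = ["pydantic28", "pydantic210"]
COVERAGE_FACTORS = ["cover", "nocov"]


def tox_env_names():
    """Build every python-pydantic-coverage env name as a dash-joined string."""
    combos = product(PYTHON_FACTORS, PYDANTIC_FACTORS, COVERAGE_FACTORS)
    return ["-".join(parts) for parts in combos]


envs = tox_env_names()
```

Any resulting name can be passed to `python3 -m tox -e <env>`, provided that environment is actually defined in the project.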

## Checklist

- [x] Tests added and passing
- [x] Red/Green verification performed
- [x] Documentation added (CONTRIBUTING.rst)
- [x] Developer tooling improved (Makefile)
- [x] No breaking changes
- [x] Minimal, focused tests (no bloat)
diff --git a/CONTRIBUTING.rst b/CONTRIBUTING.rst
@@ -73,8 +73,80 @@ For merging, you should:
 3. Add a note to ``CHANGELOG.rst`` about the changes.
 4. Add yourself to ``AUTHORS.rst``.
 
-Tips
-----
+Running Tests
+-------------
+
+Quick Start (Using Make)
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+The easiest way to run tests::
+
+    # First time setup - create virtual environment
+    make venv
+    source .venv/bin/activate
+
+    # Install dependencies
+    make install
+
+    # Run all tests
+    make test
+
+    # Run just the catalog vendor tests
+    make test-vendor
+
+    # Run tests with coverage
+    make test-cov
+
+Manual Test Commands
+~~~~~~~~~~~~~~~~~~~~
+
+If you prefer to run pytest directly::
+
+    # Activate virtual environment
+    source .venv/bin/activate
+
+    # Run catalog extra fields tests
+    python -m pytest tests/test_vendor/test_catalog_v1.py -v
+
+All Test Commands
+~~~~~~~~~~~~~~~~~
+
+::
+
+    # Run all catalog vendor tests
+    python -m pytest tests/test_vendor/ -v
+
+    # Run specific test file
+    python -m pytest tests/test_vendor/test_catalog_v1.py -v
+
+    # Run specific test class
+    python -m pytest tests/test_vendor/test_catalog_v1.py::TestMetadataExtraFields -v
+
+    # Run specific test method
+    python -m pytest tests/test_vendor/test_catalog_v1.py::TestMetadataExtraFields::test_metadata_accepts_extra_fields -v
+
+    # Run with more verbose output
+    python -m pytest tests/test_vendor/test_catalog_v1.py -vv
+
+    # Run and show print statements
+    python -m pytest tests/test_vendor/test_catalog_v1.py -v -s
+
+    # Run all tests in the project
+    python -m pytest tests/ -v
+
+Using tox
+~~~~~~~~~
+
+The GitHub Actions CI uses tox to run tests across multiple Python and Pydantic versions::
+
+    # Run tests with Python 3.10 and Pydantic 2.10 (no coverage)
+    python3 -m tox -e py310-pydantic210-nocov
+
+    # Run tests with coverage
+    python3 -m tox -e py310-pydantic210-cover
+
+    # Run specific tests with tox
+    python3 -m tox -e py310-pydantic210-nocov -- tests/test_vendor/test_catalog_v1.py
 
 To run a subset of tests::
 
@@ -83,3 +155,14 @@ To run a subset of tests::
 To run all the test environments in *parallel*::
 
     tox -p auto
+
+Continuous Integration
+~~~~~~~~~~~~~~~~~~~~~~
+
+Tests run automatically on every push and pull request via GitHub Actions (``.github/workflows/github-actions.yml``).
+
+The CI runs tests across:
+
+* Python versions: 3.10, 3.11, 3.12, PyPy 3.9
+* Pydantic versions: 2.8, 2.10
+* With and without coverage reports
diff --git a/Makefile b/Makefile
@@ -0,0 +1,54 @@
+.PHONY: help venv install test test-vendor test-cov test-all clean lint format
+
+## help - Display help about make targets for this Makefile
+help:
+	@cat Makefile | grep '^## ' --color=never | cut -c4- | sed -e "`printf 's/ - /\t- /;'`" | column -s "`printf '\t'`" -t
+
+## venv - Create virtual environment
+venv:
+	python3 -m venv .venv
+	.venv/bin/pip install --upgrade pip
+	@echo ""
+	@echo "Virtual environment created. Activate with:"
+	@echo "  source .venv/bin/activate"
+
+## install - Install package and dependencies in development mode
+install:
+	pip install -e .
+	pip install pytest pytest-cov tox pre-commit ruff
+
+## test - Run tests quickly
+test:
+	python -m pytest tests/ -v
+
+## test-vendor - Run catalog vendor tests
+test-vendor:
+	python -m pytest tests/test_vendor/test_catalog_v1.py -v
+
+## test-cov - Run tests with coverage report
+test-cov:
+	python -m pytest --cov=src --cov-report=term-missing --cov-report=html tests/ -v
+
+## test-all - Run full test suite with tox (all Python/Pydantic versions)
+test-all:
+	tox
+
+## lint - Run code quality checks
+lint:
+	pre-commit run --all-files
+
+## format - Format code with ruff
+format:
+	ruff format src/ tests/
+
+## clean - Remove build artifacts and cache
+clean:
+	rm -rf build/
+	rm -rf dist/
+	rm -rf *.egg-info
+	rm -rf .pytest_cache/
+	rm -rf .tox/
+	rm -rf htmlcov/
+	rm -rf .coverage
+	find . -type d -name __pycache__ -exec rm -rf {} +
+	find . -type f -name '*.pyc' -delete
diff --git a/tests/test_vendor/__init__.py b/tests/test_vendor/__init__.py
diff --git a/tests/test_vendor/test_catalog_v1.py b/tests/test_vendor/test_catalog_v1.py
@@ -0,0 +1,113 @@
+"""Tests for catalog v1 parser, specifically testing extra fields handling."""
+import pytest
+
+from vendor.dbt_artifacts_parser.parsers.catalog.catalog_v1 import Metadata
+
+
+class TestMetadataExtraFields:
+    """Test that Metadata class accepts extra fields from dbt."""
+
+    def test_metadata_accepts_extra_fields(self):
+        """Test that metadata accepts fields not explicitly defined in the model."""
+        # Test with a new field that dbt might add in the future
+        data = {
+            "dbt_schema_version": "https://schemas.getdbt.com/dbt/catalog/v1.json",
+            "dbt_version": "1.9.0",
+            "generated_at": "2025-11-05T10:00:00Z",
+            "invocation_id": "test-invocation-123",
+            "invocation_started_at": "2025-11-05T09:59:00Z",  # New field
+            "new_future_field": "some_value",  # Another potential future field
+        }
+
+        # This should not raise a validation error
+        metadata = Metadata(**data)
+
+        # Verify that known fields are accessible normally
+        assert metadata.dbt_schema_version == "https://schemas.getdbt.com/dbt/catalog/v1.json"
+        assert metadata.dbt_version == "1.9.0"
+        assert metadata.generated_at == "2025-11-05T10:00:00Z"
+        assert metadata.invocation_id == "test-invocation-123"
+
+    def test_metadata_extra_fields_in_pydantic_extra(self):
+        """Test that extra fields are stored in __pydantic_extra__."""
+        data = {
+            "dbt_version": "1.9.0",
+            "invocation_started_at": "2025-11-05T09:59:00Z",
+            "new_field_1": "value1",
+            "new_field_2": 123,
+        }
+
+        metadata = Metadata(**data)
+
+        # Extra fields should be stored in __pydantic_extra__
+        assert metadata.__pydantic_extra__ is not None
+        assert "invocation_started_at" in metadata.__pydantic_extra__
+        assert "new_field_1" in metadata.__pydantic_extra__
+        assert "new_field_2" in metadata.__pydantic_extra__
+        assert metadata.__pydantic_extra__["invocation_started_at"] == "2025-11-05T09:59:00Z"
+        assert metadata.__pydantic_extra__["new_field_1"] == "value1"
+        assert metadata.__pydantic_extra__["new_field_2"] == 123
+
+    def test_metadata_model_dump_includes_extra_fields(self):
+        """Test that model_dump() includes extra fields."""
+        data = {
+            "dbt_version": "1.9.0",
+            "invocation_id": "test-123",
+            "invocation_started_at": "2025-11-05T09:59:00Z",
+            "future_field": "future_value",
+        }
+
+        metadata = Metadata(**data)
+        dumped = metadata.model_dump()
+
+        # All fields including extra should be in the dump
+        assert dumped["dbt_version"] == "1.9.0"
+        assert dumped["invocation_id"] == "test-123"
+        assert dumped["invocation_started_at"] == "2025-11-05T09:59:00Z"
+        assert dumped["future_field"] == "future_value"
+
+    def test_metadata_with_no_extra_fields(self):
+        """Test that metadata works normally when no extra fields are provided."""
+        data = {
+            "dbt_version": "1.9.0",
+            "generated_at": "2025-11-05T10:00:00Z",
+        }
+
+        metadata = Metadata(**data)
+
+        assert metadata.dbt_version == "1.9.0"
+        assert metadata.generated_at == "2025-11-05T10:00:00Z"
+
+    def test_metadata_with_only_extra_fields(self):
+        """Test that metadata accepts data with only extra fields (all known fields are Optional)."""
+        data = {
+            "some_new_field": "value",
+            "another_new_field": 42,
+        }
+
+        # This should work since all defined fields are Optional
+        metadata = Metadata(**data)
+
+        assert metadata.__pydantic_extra__["some_new_field"] == "value"
+        assert metadata.__pydantic_extra__["another_new_field"] == 42
+
+    def test_invocation_started_at_as_extra_field(self):
+        """Test the specific case of invocation_started_at being handled as an extra field."""
+        # This is the real-world scenario: dbt adds invocation_started_at
+        data = {
+            "dbt_schema_version": "https://schemas.getdbt.com/dbt/catalog/v1.json",
+            "dbt_version": "1.9.0",
+            "generated_at": "2025-11-05T10:00:00Z",
+            "invocation_id": "abc-123-def-456",
+            "invocation_started_at": "2025-11-05T09:55:30.123456Z",
+        }
+
+        # Should not raise ValidationError
+        metadata = Metadata(**data)
+
+        # The field should be accessible via __pydantic_extra__
+        assert metadata.__pydantic_extra__["invocation_started_at"] == "2025-11-05T09:55:30.123456Z"
+
+        # And should be included in model_dump()
+        dumped = metadata.model_dump()
+        assert dumped["invocation_started_at"] == "2025-11-05T09:55:30.123456Z"