Commit 565edbe

rverk (Rob Verkuijlen) and claude authored
Feature/total progress (#19)
* Improve progress logging in basisprofiel app

  Add total counts for missing and outdated KvK numbers at the start of each processing step, so it's clear how far behind the mirror is. Add get_outdated_kvk_nummers_count() to BasisProfielReader.

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Improve progress logging in vestigingen app

  Add total counts for missing and outdated vestigingen at the start of each processing step. Add get_missing_kvk_nummers_count() and get_outdated_vestigingen_count() to KvKVestigingenReader.

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Improve progress logging in vestigingsprofiel app

  Add total counts for missing and outdated vestigingsprofielen at the start of each processing step. Add three count methods to VestigingsProfielReader. Also fix W1203 violations (f-strings in logger.info calls).

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* .claude in .gitignore for now.

---------

Co-authored-by: Rob Verkuijlen <rob.verkuijlen@outlook.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
1 parent 6f1d719 commit 565edbe
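
The commit message describes one recurring pattern: before each processing step, log the total backlog next to the size of the batch that is about to be processed. A minimal sketch of that idea follows. It is not the repository's code: `FakeReader`, its two methods, and `log_progress` are hypothetical stand-ins for the real reader classes, and the data is made up.

```python
import logging

logger = logging.getLogger("progress_sketch")


class FakeReader:
    """Hypothetical stand-in for a reader class; keeps the backlog in memory."""

    def __init__(self, backlog: list[str]) -> None:
        self._backlog = backlog

    def get_missing_count(self) -> int:
        # Total backlog, independent of any fetch limit.
        return len(self._backlog)

    def get_missing(self, limit: int) -> list[str]:
        # Only the next batch, capped at `limit`.
        return self._backlog[:limit]


def log_progress(reader: FakeReader, limit: int) -> tuple[int, int]:
    """Log total versus batch size and return both numbers."""
    total = reader.get_missing_count()
    batch = reader.get_missing(limit)
    # Lazy %s formatting, as the project's coding standards ask for.
    logger.info("Missing: %s total, processing %s", total, len(batch))
    return total, len(batch)
```

Calling `log_progress(FakeReader(["1", "2", "3", "4", "5"]), limit=3)` logs one line and returns the total and the batch size, which is the information the new log messages expose.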

8 files changed: +189 -30 lines changed

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -4,6 +4,7 @@
 *.pyproj.user
 settings.json
 .idea/
+.claude/
 
 #Remove test input/output
 *.zip

CLAUDE.md

Lines changed: 83 additions & 0 deletions
@@ -0,0 +1,83 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## Project Overview
+
+KVK-Connect is a Python library and Docker microservice suite for integrating with the Dutch Chamber of Commerce (KVK) API. It serves three purposes:
+1. A pip-installable package for fetching KVK data without deep API knowledge
+2. A local mirror of KVK data (basisprofiel, vestigingen, vestigingsprofiel) kept up-to-date in a database
+3. An optional mutation service that polls KVK API for company changes and syncs them to the local mirror
+
+## Commands
+
+Before every commit, always run `just check-all` twice. The first run may auto-fix files (ruff); the second run validates the result is clean. Only commit if the second run passes fully.
+
+```bash
+# Install dependencies
+just install # uv sync
+
+# Quality checks
+just test # pytest
+just cov # pytest with coverage report
+just lint # ruff check + format
+just typing # pyright type checking
+just check-all # all checks (lint, cov, typing, pre-commit)
+
+# Run a single test
+uv run pytest tests/path/to/test_file.py::test_function_name
+
+# Docker
+just docker-build # build containers
+just docker-up # start all services
+just docker-down # stop all services
+```
+
+## Architecture
+
+### Three Model Layers
+
+Data flows through three distinct model layers (all in `src/kvk_connect/models/`):
+- **API models** (`models/api/`): Raw dataclasses deserialized from KVK API JSON responses. Each has `from_dict()` and `to_dict()` methods.
+- **Domain models** (`models/domain/`): Business logic representations, also dataclasses with `from_dict()`/`to_dict()`.
+- **ORM models** (`models/orm/`): SQLAlchemy mapped tables for database persistence.
+
+### Mappers and Services
+
+`src/kvk_connect/mappers/` contains functions that convert API models → Domain models. `src/kvk_connect/services/record_service.py` (`KVKRecordService`) is the public-facing high-level API that orchestrates fetching and mapping. Both `KVKApiClient` and `KVKRecordService` are exported from `kvk_connect.__init__` as the library's public API.
+
+### Database Layer
+
+`src/kvk_connect/db/` has separate reader (`*_reader.py`) and writer (`*_writer.py`) classes. Schema is auto-initialized via `ensure_database_initialized()` in `db/init.py` — no Alembic. Use direct SQL for any schema changes.
+
+### Docker Apps
+
+Five Docker services in `apps/`, each with a `main.py` and `Dockerfile`:
+- `gateway`: NGINX rate limiter (port 8080) — all apps route API calls through this
+- `basisprofiel`, `vestigingen`, `vestigingsprofiel`: Fetch and persist respective KVK data types
+- `mutatie-reader`: Polls KVK mutation API and writes change signals to DB
+
+Apps depend on each other in order: mutatie-reader → basisprofiel → vestigingen → vestigingsprofiel. Compose files: `docker-compose.local.yaml` (SQLite/local) and `docker-compose.db.yaml` (PostgreSQL).
+
+### Error Handling Pattern
+
+`src/kvk_connect/exceptions.py` defines two error types:
+- `KVKPermanentError`: Company doesn't exist (e.g., error code IPD0005) — long retry delay (24h default)
+- `KVKTemporaryError`: Temporary unavailability (e.g., IPD1002, IPD1003) — short retry delay (10m)
+
+`KVKApiClient` uses `@global_rate_limit()` decorator (`utils/rate_limit.py`) on all API methods.
+
+## Coding Standards
+
+- **Python 3.13+**, PEP 8, full type hints everywhere
+- Use `Mapped[T]` for SQLAlchemy ORM models
+- All dataclasses must have `from_dict()` static method and `to_dict()` using `asdict()`
+- Use `from __future__ import annotations` for forward references in dataclasses
+- Log with lazy `%s` formatting, not f-strings: `logging.info('Fetched %d records', count)` (Pylint W1203)
+- Use `uv` for package management, not `pip`
+- Database agnostic via SQLAlchemy — no Redis, no Alembic
+- Business domain terms in comments/docstrings may use Dutch; all code (variables, functions) in English
+
+### Git Worktrees
+
+When using git worktrees, create them **inside the project folder** (e.g., `.worktrees/`).
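
The coding standards in CLAUDE.md require every dataclass to expose a `from_dict()` static method and a `to_dict()` built on `asdict()`. The sketch below shows one way that convention could look. `CompanyExample` and its two fields are hypothetical and do not come from the repository; the return annotation is quoted so the block runs without `from __future__ import annotations`.

```python
from dataclasses import asdict, dataclass
from typing import Any, Optional


@dataclass
class CompanyExample:
    """Hypothetical model used only to illustrate the convention."""

    kvk_nummer: str
    naam: Optional[str] = None

    @staticmethod
    def from_dict(data: dict[str, Any]) -> "CompanyExample":
        # Missing optional keys fall back to None instead of raising.
        return CompanyExample(
            kvk_nummer=str(data["kvk_nummer"]),
            naam=data.get("naam"),
        )

    def to_dict(self) -> dict[str, Any]:
        # asdict() converts the dataclass, including nested ones, to plain dicts.
        return asdict(self)
```

A round trip such as `CompanyExample.from_dict(payload).to_dict()` returns a plain dictionary with the same keys, which keeps serialization symmetrical across the API and domain layers.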

apps/basisprofiel/main.py

Lines changed: 5 additions & 10 deletions
@@ -152,24 +152,19 @@ def process_csv(csv_path: str, kvk_client: KVKApiClient, writer: BasisProfielWri
 
 
 def process_missing(kvk_client: KVKApiClient, writer: BasisProfielWriter, reader: BasisProfielReader) -> int:
-    logger.info("Finding missing KvK nummers...")
-
     count_missing = reader.get_missing_kvk_nummers_count()
-    logger.info("Total missing KvK nummers: %s", count_missing)
-
     missing_kvk_nummers = reader.get_missing_kvk_nummers(KVK_FETCH_LIMIT)
-    description = f"{len(missing_kvk_nummers)} missing KvK nummer(s) (limit: {KVK_FETCH_LIMIT})"
+    logger.info("Missing KvK nummers: %s total, processing %s", count_missing, len(missing_kvk_nummers))
 
-    return process_kvk_nummers(missing_kvk_nummers, description, kvk_client, writer)
+    return process_kvk_nummers(missing_kvk_nummers, "missing", kvk_client, writer)
 
 
 def process_outdated(kvk_client: KVKApiClient, writer: BasisProfielWriter, reader: BasisProfielReader) -> int:
-    logger.info("Finding outdated KvK nummers...")
+    count_outdated = reader.get_outdated_kvk_nummers_count()
     outdated_kvk_nummers = reader.get_outdated_kvk_nummers()
-    logger.debug("Found outdated kvk records: %s", outdated_kvk_nummers)
+    logger.info("Outdated KvK nummers: %s total, processing %s", count_outdated, len(outdated_kvk_nummers))
 
-    description = f"{len(outdated_kvk_nummers)} outdated KvK nummer(s)"
-    return process_kvk_nummers(outdated_kvk_nummers, description, kvk_client, writer)
+    return process_kvk_nummers(outdated_kvk_nummers, "outdated", kvk_client, writer)
 
 
 def run_daemon(kvk_client: KVKApiClient, engine, batch_size: int, interval: int) -> None:

apps/vestigingen/main.py

Lines changed: 6 additions & 7 deletions
@@ -69,15 +69,17 @@ def process_csv_kvk(csv_path: str, kvk_client: KVKApiClient, writer: KvKVestigin
 
 
 def process_missing(kvk_client: KVKApiClient, writer: KvKVestigingenWriter, reader: KvKVestigingenReader) -> int:
+    count_missing = reader.get_missing_kvk_nummers_count()
     missing_kvk_nummers = reader.get_missing_kvk_nummers()
-    description = f"{len(missing_kvk_nummers)} missing KvK nummer(s)"
-    return process_kvk_nummers(missing_kvk_nummers, description, kvk_client, writer)
+    logger.info("Missing vestigingen: %s total, processing %s", count_missing, len(missing_kvk_nummers))
+    return process_kvk_nummers(missing_kvk_nummers, "missing", kvk_client, writer)
 
 
 def process_outdated(kvk_client: KVKApiClient, writer: KvKVestigingenWriter, reader: KvKVestigingenReader) -> int:
+    count_outdated = reader.get_outdated_vestigingen_count()
     outdated_kvk_nummers = reader.get_outdated_vestigingen()
-    description = f"{len(outdated_kvk_nummers)} outdated vestigingen"
-    return process_kvk_nummers(outdated_kvk_nummers, description, kvk_client, writer)
+    logger.info("Outdated vestigingen: %s total, processing %s", count_outdated, len(outdated_kvk_nummers))
+    return process_kvk_nummers(outdated_kvk_nummers, "outdated", kvk_client, writer)
 
 
 def get_kvk_vestigingen(kvk_nummer: str, kvk_client: KVKApiClient) -> KvKVestigingsNummersDomain | None:
@@ -98,10 +100,7 @@ def run_daemon(kvk_client: KVKApiClient, engine, batch_size: int, interval: int)
     with KvKVestigingenWriter(engine, batch_size=batch_size) as writer:
         reader = KvKVestigingenReader(engine)
 
-        logger.info("Updating known Vestigingen...")
         count += process_outdated(kvk_client, writer, reader)
-
-        logger.info("Updating missing Vestigingen...")
         count += process_missing(kvk_client, writer, reader)
 
         writer.flush()

apps/vestigingsprofiel/main.py

Lines changed: 12 additions & 10 deletions
@@ -96,21 +96,23 @@ def process_csv_vestiging(csv_path: str, kvk_client: KVKApiClient, writer: Vesti
 
 
 def process_missing(kvk_client: KVKApiClient, writer: VestigingsProfielWriter, reader: VestigingsProfielReader) -> int:
+    count_missing = reader.get_vestigingen_zonder_vestigingsprofielen_count()
     missing_profielen = reader.get_vestigingen_zonder_vestigingsprofielen()
-    description = f"{len(missing_profielen)} vestigingen without VestigingsProfielen"
-    return process_vestigingen(missing_profielen, description, kvk_client, writer)
+    logger.info("Missing vestigingsprofielen: %s total, processing %s", count_missing, len(missing_profielen))
+    return process_vestigingen(missing_profielen, "missing", kvk_client, writer)
 
 
 def process_outdated(kvk_client: KVKApiClient, writer: VestigingsProfielWriter, reader: VestigingsProfielReader) -> int:
-    outdated_vestigingen = reader.get_outdated_vestigingen()
-    description = f"{len(outdated_vestigingen)} outdated vestigingsprofielen (from kvk vestigingen)"
-    logger.info(description)
-
+    count_outdated = reader.get_outdated_vestigingen_count()
+    count_outdated_signaal = reader.get_outdated_vestigingen_signaal_count()
     outdated_vestigingen_signaal = reader.get_outdated_vestigingen_signaal()
-    description = f"{len(outdated_vestigingen_signaal)} outdated vestigingsprofielen (from signaal)"
-    logger.info(description)
-
-    return process_vestigingen(outdated_vestigingen_signaal, description, kvk_client, writer)
+    logger.info("Outdated vestigingsprofielen from vestigingen: %s total", count_outdated)
+    logger.info(
+        "Outdated vestigingsprofielen from signaal: %s total, processing %s",
+        count_outdated_signaal,
+        len(outdated_vestigingen_signaal),
+    )
+    return process_vestigingen(outdated_vestigingen_signaal, "outdated", kvk_client, writer)
 
 
 def run_daemon(kvk_client: KVKApiClient, engine, batch_size: int, interval: int) -> None:
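
The commit message mentions W1203 fixes in this app, and CLAUDE.md asks for lazy `%s` formatting instead of f-strings in log calls. The sketch below illustrates why the distinction matters, using only the standard `logging` module. `CountingValue` and the logger name are hypothetical helpers invented for the demonstration.

```python
import logging


class CountingValue:
    """Hypothetical helper that counts how often it is rendered to text."""

    def __init__(self) -> None:
        self.renders = 0

    def __str__(self) -> str:
        self.renders += 1
        return "value"


logger = logging.getLogger("lazy_logging_sketch")
logger.setLevel(logging.WARNING)  # INFO records are filtered out

eager = CountingValue()
# The f-string is evaluated before logger.info() is even called,
# so the value is rendered although the record is then dropped.
logger.info(f"Processing {eager}")

lazy = CountingValue()
# With lazy formatting the arguments are only rendered if the record
# is actually emitted; at WARNING level this one never is.
logger.info("Processing %s", lazy)
```

After running this, `eager.renders` is 1 and `lazy.renders` is 0: the lazy form skips the formatting work whenever the log level filters the message out.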

src/kvk_connect/db/basisprofiel_reader.py

Lines changed: 18 additions & 1 deletion
@@ -79,7 +79,8 @@ def get_outdated_kvk_nummers(self, limit: int = 1000) -> list[str]:
                     SignaalORM.timestamp > BasisProfielORM.last_updated,
                     SignaalORM.vestigingsnummer.is_(None),  # Only basisprofiel updates, no vestigingsprofielen
                     BasisProfielORM.niet_leverbaar_code.is_(None),  # Do not update tombstones
-                    BasisProfielORM.retry_after.is_(None) | (BasisProfielORM.retry_after <= func.now()),  # No active block
+                    BasisProfielORM.retry_after.is_(None)
+                    | (BasisProfielORM.retry_after <= func.now()),  # No active block
                 )
                 .distinct()
                 .limit(limit)  # fetch at most `limit` new ones at a time
@@ -88,6 +89,22 @@ def get_outdated_kvk_nummers(self, limit: int = 1000) -> list[str]:
             result = session.execute(stmt).scalars().all()
             return list(result)
 
+    def get_outdated_kvk_nummers_count(self) -> int:
+        """Return the total number of KVK numbers with a signaal newer than the stored basisprofiel."""
+        with Session(self.engine) as session:
+            stmt = (
+                select(func.count(func.distinct(SignaalORM.kvknummer)))
+                .join(BasisProfielORM, SignaalORM.kvknummer == BasisProfielORM.kvk_nummer)
+                .where(
+                    SignaalORM.timestamp > BasisProfielORM.last_updated,
+                    SignaalORM.vestigingsnummer.is_(None),
+                    BasisProfielORM.niet_leverbaar_code.is_(None),
+                    BasisProfielORM.retry_after.is_(None) | (BasisProfielORM.retry_after <= func.now()),
+                )
+            )
+            result = session.execute(stmt).scalar()
+            return result or 0
+
     def kvk_nummer_exists(self, kvk_nummer: str) -> bool:
         """Check if KVK number exists in basisprofiel.
 
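
The new count method relies on `COUNT(DISTINCT ...)` over a join, so a KVK number with several newer signals is still counted once. The sketch below reproduces that idea in plain SQL with the standard `sqlite3` module rather than SQLAlchemy. The table layout is a simplified assumption, the rows are invented, and the `niet_leverbaar_code` and `retry_after` filters are left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Simplified, assumed schema: only the columns the query needs.
    CREATE TABLE basisprofiel (kvk_nummer TEXT PRIMARY KEY, last_updated TEXT);
    CREATE TABLE signaal (kvknummer TEXT, vestigingsnummer TEXT, timestamp TEXT);

    INSERT INTO basisprofiel VALUES ('111', '2024-01-01'), ('222', '2024-01-01');
    INSERT INTO signaal VALUES
        ('111', NULL, '2024-02-01'),  -- newer than the profile
        ('111', NULL, '2024-03-01'),  -- second newer signal, same company
        ('222', NULL, '2023-12-01');  -- older than the profile
    """
)


def count_outdated(connection: sqlite3.Connection) -> int:
    """Count distinct companies that have at least one newer signal."""
    row = connection.execute(
        """
        SELECT COUNT(DISTINCT s.kvknummer)
        FROM signaal AS s
        JOIN basisprofiel AS b ON s.kvknummer = b.kvk_nummer
        WHERE s.timestamp > b.last_updated
          AND s.vestigingsnummer IS NULL
        """
    ).fetchone()
    return row[0] or 0
```

Without `DISTINCT`, company `111` would be counted twice here, which would overstate how far behind the mirror is.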

src/kvk_connect/db/kvkvestigingen_reader.py

Lines changed: 24 additions & 1 deletion
@@ -1,4 +1,4 @@
-from sqlalchemy import select
+from sqlalchemy import func, select
 from sqlalchemy.engine import Engine
 from sqlalchemy.orm import Session
 
@@ -25,6 +25,29 @@ def get_missing_kvk_nummers(self, limit: int = 1000) -> list[str]:
             result = session.execute(stmt).scalars().all()
             return list(result)
 
+    def get_missing_kvk_nummers_count(self) -> int:
+        """Return the total number of KVK numbers that are in basisprofielen but not yet in kvkvestigingen."""
+        with Session(self.engine) as session:
+            stmt = (
+                select(func.count(func.distinct(BasisProfielORM.kvk_nummer)))
+                .select_from(BasisProfielORM)
+                .outerjoin(VestigingenORM, BasisProfielORM.kvk_nummer == VestigingenORM.kvk_nummer)
+                .where(VestigingenORM.kvk_nummer.is_(None))
+            )
+            result = session.execute(stmt).scalar()
+            return result or 0
+
+    def get_outdated_vestigingen_count(self) -> int:
+        """Return the total number of KVK numbers whose vestigingen are outdated."""
+        with Session(self.engine) as session:
+            stmt = (
+                select(func.count(func.distinct(BasisProfielORM.kvk_nummer)))
+                .join(VestigingenORM, BasisProfielORM.kvk_nummer == VestigingenORM.kvk_nummer)
+                .where(BasisProfielORM.last_updated > VestigingenORM.last_updated)
+            )
+            result = session.execute(stmt).scalar()
+            return result or 0
+
     def get_outdated_vestigingen(self, limit: int = 1000) -> list[str]:
         """Return a list of unique kvknummers whose vestigingen are outdated.
 
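
`get_missing_kvk_nummers_count()` finds rows that exist in one table but not in another by combining an outer join with an `IS NULL` filter, a pattern often called an anti-join. A minimal sketch of the same pattern with the standard `sqlite3` module follows. The two-table schema and the rows are assumptions made for illustration, not the project's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Simplified, assumed schema for illustration only.
    CREATE TABLE basisprofiel (kvk_nummer TEXT PRIMARY KEY);
    CREATE TABLE vestigingen (kvk_nummer TEXT, vestigingsnummer TEXT);

    INSERT INTO basisprofiel VALUES ('111'), ('222'), ('333');
    INSERT INTO vestigingen VALUES ('111', 'a'), ('111', 'b');
    """
)


def count_missing(connection: sqlite3.Connection) -> int:
    """Count companies that have no row at all in vestigingen."""
    row = connection.execute(
        """
        SELECT COUNT(DISTINCT b.kvk_nummer)
        FROM basisprofiel AS b
        LEFT OUTER JOIN vestigingen AS v ON b.kvk_nummer = v.kvk_nummer
        WHERE v.kvk_nummer IS NULL
        """
    ).fetchone()
    return row[0] or 0
```

The outer join keeps every `basisprofiel` row, and the `IS NULL` filter retains only those that found no match, so companies `222` and `333` are counted while `111` is not.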

src/kvk_connect/db/vestigingenprofiel_reader.py

Lines changed: 40 additions & 1 deletion
@@ -1,4 +1,4 @@
-from sqlalchemy import select
+from sqlalchemy import func, select
 from sqlalchemy.orm import Session
 
 from kvk_connect.models.orm.signaal_orm import SignaalORM
@@ -31,6 +31,45 @@ def get_vestigingen_zonder_vestigingsprofielen(self, limit: int = 1000) -> list[
             result = session.execute(stmt)
             return [row[0] for row in result.fetchall()]
 
+    def get_vestigingen_zonder_vestigingsprofielen_count(self) -> int:
+        """Return the total number of vestigingsnummers without a vestigingsprofiel."""
+        with Session(self.engine) as session:
+            stmt = (
+                select(func.count(func.distinct(VestigingenORM.vestigingsnummer)))
+                .outerjoin(
+                    VestigingsProfielORM, VestigingenORM.vestigingsnummer == VestigingsProfielORM.vestigingsnummer
+                )
+                .where(VestigingsProfielORM.vestigingsnummer.is_(None))
+                .where(VestigingenORM.vestigingsnummer != VestigingenORM.SENTINEL_VESTIGINGSNUMMER)
+            )
+            result = session.execute(stmt).scalar()
+            return result or 0
+
+    def get_outdated_vestigingen_count(self) -> int:
+        """Return the total number of vestigingsnummers whose vestiging is newer than the vestigingsprofiel."""
+        with Session(self.engine) as session:
+            stmt = (
+                select(func.count(func.distinct(VestigingenORM.vestigingsnummer)))
+                .join(VestigingsProfielORM, VestigingenORM.vestigingsnummer == VestigingsProfielORM.vestigingsnummer)
+                .where(VestigingenORM.last_updated > VestigingsProfielORM.last_updated)
+                .where(VestigingenORM.vestigingsnummer != VestigingenORM.SENTINEL_VESTIGINGSNUMMER)
+            )
+            result = session.execute(stmt).scalar()
+            return result or 0
+
+    def get_outdated_vestigingen_signaal_count(self) -> int:
+        """Return the total number of vestigingsnummers with a signaal newer than the vestigingsprofiel."""
+        with Session(self.engine) as session:
+            stmt = (
+                select(func.count(func.distinct(SignaalORM.vestigingsnummer)))
+                .join(VestigingsProfielORM, SignaalORM.vestigingsnummer == VestigingsProfielORM.vestigingsnummer)
+                .where(
+                    SignaalORM.timestamp > VestigingsProfielORM.last_updated, SignaalORM.vestigingsnummer.is_not(None)
+                )
+            )
+            result = session.execute(stmt).scalar()
+            return result or 0
+
     def get_outdated_vestigingen(self, limit: int = 1000) -> list[str]:
         """Return a list of vestigingsnummers whose vestiging is newer than the last_updated of the vestigingsprofiel."""
 
