GitHub Copilot Instructions for Matrikkel Java Project

Project Overview

This is a generic Java Spring Boot application for integrating with the Norwegian Matrikkel SOAP API. The project handles bulk download of cadastral data (matrikkelenheter) from any Norwegian municipality and stores it in PostgreSQL.

Key Technologies

  • Java 17 with Spring Boot 3.2.0
  • JAX-WS for SOAP client (automatic serialization)
  • PostgreSQL with Spring Data JPA + Hibernate
  • Maven for build management
  • Flyway for database migrations

Critical Implementation Details

SOAP Client - MatrikkelBubbleId Handling

⚠️ IMPORTANT: When working with Matrikkel API:

  • Use MatrikkelBubbleId for cursor-based pagination
  • Always set snapshotVersion to a far-future date (9999-01-01) to avoid permission errors
  • JAX-WS handles automatic XML serialization - NO manual XML needed!

// Correct way to create MatrikkelContext
MatrikkelContext context = new MatrikkelContext();
SnapshotVersion snapshotVersion = new SnapshotVersion();
ZonedDateTime futureDate = ZonedDateTime.of(9999, 1, 1, 0, 0, 0, 0, ZoneId.of("Europe/Oslo"));
snapshotVersion.setTimestamp(futureDate);
context.setSnapshotVersion(snapshotVersion);

Package Structure

no.matrikkel/
├── config/          # Spring configuration (DB, SOAP clients, properties)
├── client/          # SOAP client wrappers
│   └── generated/   # Auto-generated from WSDL (don't edit!)
├── domain/
│   ├── entity/      # JPA entities for database
│   └── dto/         # Data transfer objects
├── repository/      # Spring Data JPA repositories
├── service/         # Business logic layer
├── mapper/          # Entity ↔ DTO converters
└── cli/             # Command-line interface

Code Generation

  • WSDL files are in src/main/resources/wsdl/
  • Run mvn compile to generate Java classes from WSDL
  • Generated classes go to target/generated-sources/wsimport/
  • NEVER edit generated classes - regenerate from WSDL instead

CRITICAL: WSDL Package Consolidation ✅

ALL WSDL services MUST use the same package name!

Since StoreService and NedlastningService use the SAME XSD schema, they generate identical classes. If they use different package names, type casting between packages will FAIL at runtime!

Solution implemented:

<!-- pom.xml - ALL services use nedlastning package -->
<execution>
    <id>wsimport-store</id>
    <goals><goal>wsimport</goal></goals>
    <configuration>
        <packageName>no.matrikkel.client.generated.nedlastning</packageName>
        <wsdlFiles>
            <wsdlFile>${project.basedir}/src/main/resources/wsdl/StoreServiceWS.wsdl</wsdlFile>
        </wsdlFiles>
    </configuration>
</execution>

Result:

  • StoreService returns nedlastning.Grunneiendom
  • NedlastningService returns nedlastning.Grunneiendom
  • They are now THE SAME CLASS → casting works! ✅

Critical error prevented:

// ❌ BEFORE (different packages):
store.Grunneiendom cannot be cast to nedlastning.Matrikkelenhet
// Even though store.Grunneiendom extends store.Matrikkelenhet!

// ✅ AFTER (unified package):
nedlastning.Grunneiendom extends nedlastning.Matrikkelenhet
// Casting works perfectly!
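
The failure mode can be reproduced without any generated code. The following is a minimal sketch in which the nested holder classes `Store` and `Nedlastning` are hypothetical stand-ins for two wsimport target packages; the real generated classes are far larger, but the type relationship is the same: identically named classes from different packages are unrelated types.

```java
// Hypothetical stand-ins for wsimport output; not the real generated classes.
class PackageConsolidationDemo {

    // Simulates classes generated into a separate "store" package.
    static class Store {
        static class Matrikkelenhet { }
        static class Grunneiendom extends Matrikkelenhet { }
    }

    // Simulates the same schema generated again into the "nedlastning" package.
    static class Nedlastning {
        static class Matrikkelenhet { }
        static class Grunneiendom extends Matrikkelenhet { }
    }

    /** Returns true only if the object belongs to the nedlastning class hierarchy. */
    static boolean isNedlastningMatrikkelenhet(Object fromService) {
        return fromService instanceof Nedlastning.Matrikkelenhet;
    }

    public static void main(String[] args) {
        Object fromStore = new Store.Grunneiendom();
        Object fromNedlastning = new Nedlastning.Grunneiendom();

        System.out.println(isNedlastningMatrikkelenhet(fromStore));       // false
        System.out.println(isNedlastningMatrikkelenhet(fromNedlastning)); // true

        try {
            // Same simple name, same shape, different type: the cast fails at runtime.
            Nedlastning.Matrikkelenhet m = (Nedlastning.Matrikkelenhet) fromStore;
            System.out.println("unexpected: " + m);
        } catch (ClassCastException e) {
            System.out.println("ClassCastException: store type is not a nedlastning type");
        }
    }
}
```

Generating every service into one package removes the second hierarchy entirely, which is why the consolidated configuration makes the cast succeed.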

Database Conventions

  • Entity table names: matrikkel_* (e.g., matrikkel_matrikkelenheter)
  • Use @Index annotations for frequently queried columns
  • All entities should have sist_lastet_ned timestamp
  • Use Flyway migrations in src/main/resources/db/migration/

API Credentials

  • Username: Configured per municipality (e.g., [municipality]_test for Bergen test environment)
  • Environment URLs:
    • Test: https://wsweb-test.matrikkel.no/matrikkel-ws-v1.0/
    • Prod: https://wsweb.matrikkel.no/matrikkel-ws-v1.0/
  • Always use Basic Authentication via JAX-WS BindingProvider
  • Credentials should be externalized in environment variables or .env file
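
As a rough illustration of the Basic Authentication rule, the sketch below fills a JAX-WS request context with credentials. It works on a plain `Map<String, Object>` so that it runs without any SOAP dependency; in the application the map would come from `((BindingProvider) port).getRequestContext()`. The class name `MatrikkelCredentials` is hypothetical, and the property keys assume the Jakarta XML Web Services namespace (an older `javax.xml.ws` stack uses the same keys with a `javax.` prefix).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; not part of the project or of any library.
class MatrikkelCredentials {

    // String values of the BindingProvider constants in Jakarta XML Web Services 3+.
    // Assumption: the project uses the jakarta.* namespace (Spring Boot 3).
    static final String USERNAME_PROPERTY = "jakarta.xml.ws.security.auth.username";
    static final String PASSWORD_PROPERTY = "jakarta.xml.ws.security.auth.password";
    static final String ENDPOINT_ADDRESS_PROPERTY = "jakarta.xml.ws.service.endpoint.address";

    /** Writes Basic Authentication credentials and the endpoint URL into a request context. */
    static void apply(Map<String, Object> requestContext,
                      String username, String password, String endpointUrl) {
        requestContext.put(USERNAME_PROPERTY, username);
        requestContext.put(PASSWORD_PROPERTY, password);
        requestContext.put(ENDPOINT_ADDRESS_PROPERTY, endpointUrl);
    }

    public static void main(String[] args) {
        // Demo values only; real credentials should come from the environment.
        Map<String, Object> requestContext = new HashMap<>();
        apply(requestContext, "demo_user", "demo_password",
              "https://wsweb-test.matrikkel.no/matrikkel-ws-v1.0/");
        System.out.println("Properties set: " + requestContext.size());
    }
}
```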

Running Commands with Environment Variables

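⚠️ CRITICAL: Always load the .env file before running Maven commands!

The variables must be exported so that Maven and the application it starts can see them. Sourcing .env on its own only sets shell variables (unless every line already starts with export), and the export/xargs form breaks on comment lines (lines starting with #), so those must be filtered out:

# ❌ WRONG - Variables are set in the current shell but not exported to Maven
source .env && mvn spring-boot:run

# ✅ CORRECT - Filter out comments with grep, then export each KEY=value pair
# (this form does not handle values that contain spaces)
export $(grep -v '^#' .env | xargs) && mvn spring-boot:run -Dspring-boot.run.arguments="--import --kommune=1103"

# Alternative: set -a exports everything that is sourced (comments and quoted values are fine)
set -a && source .env && set +a && mvn spring-boot:run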

Why this matters:

  • .env contains database credentials (DB_HOST, DB_PORT, DB_USERNAME, DB_PASSWORD)
  • .env contains API credentials (MATRIKKEL_API_USERNAME, MATRIKKEL_API_PASSWORD)
  • Spring Boot reads these via ${DB_HOST} placeholders in application.yml
  • Without loading .env, the application will fail to connect to database/API
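
A missing variable usually surfaces late, as a connection failure. One possible safeguard is a check that lists every missing variable up front. The sketch below is hypothetical (the class `EnvCheck` is not part of the project) and uses only the variable names listed above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical startup check; the variable names are taken from the list above.
class EnvCheck {

    static final List<String> REQUIRED = List.of(
            "DB_HOST", "DB_PORT", "DB_USERNAME", "DB_PASSWORD",
            "MATRIKKEL_API_USERNAME", "MATRIKKEL_API_PASSWORD");

    /** Returns the required variable names that are absent or blank, in declaration order. */
    static List<String> missing(Map<String, String> env) {
        List<String> missing = new ArrayList<>();
        for (String name : REQUIRED) {
            String value = env.get(name);
            if (value == null || value.isBlank()) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> missing = missing(System.getenv());
        if (missing.isEmpty()) {
            System.out.println("All required environment variables are set.");
        } else {
            System.out.println("Missing environment variables (was .env loaded?): " + missing);
        }
    }
}
```

Taking the environment as a `Map` rather than calling `System.getenv()` inside the method keeps the check easy to unit test.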

Shell script pattern:

#!/bin/bash
set -a
source .env
set +a

# Now run Maven commands
mvn spring-boot:run -Dspring-boot.run.arguments="--import --kommune=1103"

Batch Processing

  • Default batch size: 5000 (API maximum)
  • Use cursor-based pagination with findObjekterEtterId()
  • Check if batch.size() < maxBatchSize to detect last batch
  • Implement retry logic for transient failures
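
The pagination rules above can be sketched in plain Java. This is an illustration under stated assumptions, not the project's implementation: `PageFetcher` is a hypothetical stand-in for a wrapper around findObjekterEtterId(), IDs are modelled as `Long` values, and every exception is treated as transient for simplicity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

class CursorPager {

    /** Hypothetical stand-in for a client wrapper around findObjekterEtterId(). */
    interface PageFetcher {
        List<Long> fetchAfter(long lastId, int maxBatchSize) throws Exception;
    }

    /** Fetches batches until one comes back smaller than maxBatchSize. */
    static List<Long> fetchAll(PageFetcher fetcher, int maxBatchSize, int maxAttempts) {
        if (maxBatchSize < 1 || maxAttempts < 1) {
            throw new IllegalArgumentException("maxBatchSize and maxAttempts must be at least 1");
        }
        List<Long> all = new ArrayList<>();
        long cursor = 0L; // assumption: real IDs are positive, so 0 means "from the start"
        while (true) {
            List<Long> batch = fetchWithRetry(fetcher, cursor, maxBatchSize, maxAttempts);
            all.addAll(batch);
            if (batch.size() < maxBatchSize) {
                return all; // short or empty batch: this was the last one
            }
            cursor = batch.get(batch.size() - 1); // continue after the last ID received
        }
    }

    private static List<Long> fetchWithRetry(PageFetcher fetcher, long cursor,
                                             int maxBatchSize, int maxAttempts) {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return fetcher.fetchAfter(cursor, maxBatchSize);
            } catch (Exception e) {
                last = e; // simplification: every failure is retried
                if (attempt < maxAttempts) {
                    sleep(100L * attempt); // linear backoff between attempts
                }
            }
        }
        throw new IllegalStateException("Giving up after " + maxAttempts + " attempts", last);
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting to retry", e);
        }
    }

    public static void main(String[] args) {
        List<Long> source = LongStream.rangeClosed(1, 12).boxed().collect(Collectors.toList());
        PageFetcher inMemory = (lastId, max) -> source.stream()
                .filter(id -> id > lastId).limit(max).collect(Collectors.toList());
        System.out.println(fetchAll(inMemory, 5, 3).size()); // 12 (batches of 5, 5 and 2)
    }
}
```

When the total is an exact multiple of the batch size, the loop makes one extra call that returns an empty batch, which also satisfies the "smaller than maxBatchSize" check.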

Transaction Management - CRITICAL ⚠️

Per-Batch Commits Pattern (for large imports):

// ❌ WRONG - Outer transaction "swallows" inner @Transactional commits
@Transactional
public void importAll() {
    for (var batch : batches) {
        // saveBatch is @Transactional (default REQUIRED), so it joins the outer
        // transaction and nothing is committed until importAll() returns
        personService.saveBatch(batch);
    }
}

// ✅ CORRECT - No outer transaction, inner @Transactional commits immediately
public void importAll() {
    for (var batch : batches) {
        // saveBatch is @Transactional and starts its own transaction,
        // so each batch is committed as soon as the call returns
        personService.saveBatch(batch);
    }
}

Key Rules:

  1. Remove @Transactional from methods that call batch-saving services
  2. Each batch-saving service method should have its own @Transactional
  3. Avoid TransactionTemplate wrapping around batch operations
  4. Use @Transactional(propagation = Propagation.REQUIRES_NEW) if you must nest

Performance:

  • Person batch saving: 500 per batch = 583/sec (vs 1.8/sec with wrong pattern!)
  • Linking optimization: Use in-memory Maps instead of N+1 queries (348k→3 queries)
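
The linking optimization amounts to loading the lookup side once and resolving references in memory. A minimal sketch, assuming simplified hypothetical records (`Person`, `Eierforhold`, `Link`) in place of the project's JPA entities:

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class InMemoryLinker {

    // Hypothetical simplified shapes; the real entities carry many more fields.
    record Person(long matrikkelPersonId, String nummer) { }
    record Eierforhold(long id, long matrikkelPersonId) { }
    record Link(long eierforholdId, Person owner) { }

    /** Resolves each eierforhold to its person with one map lookup instead of one query. */
    static List<Link> link(List<Eierforhold> eierforhold, List<Person> personer) {
        // Built once; on duplicate IDs the first person wins.
        Map<Long, Person> personById = personer.stream()
                .collect(Collectors.toMap(Person::matrikkelPersonId, Function.identity(),
                        (first, duplicate) -> first));

        return eierforhold.stream()
                .filter(e -> personById.containsKey(e.matrikkelPersonId())) // drop unresolved owners
                .map(e -> new Link(e.id(), personById.get(e.matrikkelPersonId())))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Person> personer = List.of(new Person(1L, "123456789"), new Person(2L, "987654321"));
        List<Eierforhold> eierforhold = List.of(
                new Eierforhold(10L, 1L), new Eierforhold(11L, 2L), new Eierforhold(12L, 99L));
        System.out.println(link(eierforhold, personer).size()); // 2 (owner 99 is unknown)
    }
}
```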

Two-Phase Import Architecture 🔄

Phase 1 (--base-import): Download matrikkelenheter + personer

mvn spring-boot:run -Dspring-boot.run.arguments="--import --kommune=4601 --base-import"

  • Downloads ALL matrikkelenheter for kommune
  • Fetches person data (eierforhold)
  • SKIPS bygninger/bruksenheter/adresser (saves time!)
  • Use case: Build base dataset once

Phase 1 with Person Filter (RECOMMENDED): Server-side filtered download ✨

mvn spring-boot:run -Dspring-boot.run.arguments="--import --kommune=1103 --personnummer=964965226"

  • SERVER-SIDE FILTERING: Uses MatrikkelenhetService.findMatrikkelenheter() to get filtered IDs
  • EFFICIENT FETCHING: Uses StoreService.getObjects() to fetch ONLY those matrikkelenheter
  • Example: Downloaded 4,744 matrikkelenheter in 12 seconds (10 batches × 500 objects)
  • NO bulk download of entire kommune needed!
  • Use case: Extract data for specific persons/organizations efficiently

Phase 2 (--filter-existing): Selective bygning/bruksenhet fetching

mvn spring-boot:run -Dspring-boot.run.arguments="--filter-existing --kommune=4601 --personnummer=964338531"

  • Loads existing matrikkelenheter from database
  • Applies filters (personnummer/organisasjonsnummer/IDs)
  • Fetches bruksenheter with API-side filtering (matrikkelenhetfilter) ✅ EFFICIENT!
  • Fetches bygninger for entire kommune, then filters client-side ⚠️ Less efficient
  • Fetches adresser for those bruksenheter
  • Use case: Extract detailed data for specific organizations

Why This Architecture?

  • Avoids downloading ALL bygninger/bruksenheter unnecessarily
  • Supports targeted data extraction for organizations
  • Enables incremental dataset building
  • Much faster for filtered queries
  • Server-side filtering dramatically reduces data transfer

API Filtering Support - CRITICAL UPDATE (2025-10-21):

✅ MATRIKKELENHETER - TWO-STEP PATTERN (WORKS PERFECTLY!):

  1. MatrikkelenhetService.findMatrikkelenheter(MatrikkelenhetsokModel) with nummerForPerson filter
    • Returns filtered MatrikkelenhetIdList (server-side filtering!)
    • Example: Found 4,744 matrikkelenheter for person 964965226 in kommune 1103
  2. StoreService.getObjects(MatrikkelBubbleIdList) → Returns full Matrikkelenhet objects
    • Fetches ONLY the specific matrikkelenheter by ID
    • Example: Downloaded 4,744 objects in 10 batches (500/batch) in ~12 seconds

Implementation:

// Step 1: Server-side filtering
MatrikkelenhetsokModel sokModel = new MatrikkelenhetsokModel();
sokModel.setKommunenummer(kommunenummer);
sokModel.setNummerForPerson(personnummer);
MatrikkelenhetIdList idList = matrikkelenhetService.findMatrikkelenheter(sokModel, context);

// Step 2: Fetch full objects in batches
List<Long> ids = idList.getMatrikkelenhetIdList().stream()
    .map(MatrikkelenhetId::getValue)
    .collect(Collectors.toList());

// Batch fetch with StoreService (500 per batch recommended)
for (int i = 0; i < ids.size(); i += 500) {
    List<Long> batchIds = ids.subList(i, Math.min(i + 500, ids.size()));
    MatrikkelBubbleIdList bubbleIdList = new MatrikkelBubbleIdList();
    // ... convert Long IDs to MatrikkelenhetId and add to bubbleIdList ...
    MatrikkelBubbleObjectList objects = storeService.getObjects(bubbleIdList, context);
    // ... process objects ...
}

Performance Impact:

  • Kommune 1103 with person filter 964965226:
    • ✅ MatrikkelenhetService: 4,744 IDs found (server-side filtered, ~1 second)
    • ✅ StoreService.getObjects(): 4,744 objects downloaded (10 batches × 500, ~12 seconds)
    • ❌ OLD (bulk download): Would download ALL ~15,000 matrikkelenheter in the kommune
    • Result: only the 4,744 matching matrikkelenheter are transferred instead of ~15,000 (roughly a 68% reduction, given the figures above)

❌ NedlastningService - DO NOT USE for individual fetches:

  • NedlastningService.findMatrikkelenhetById() → SOAP error (MatrikkelBubbleId mapping issue)
  • NedlastningService with JSON matrikkelenhetfilter → API completely ignores it!
  • Use NedlastningService ONLY for bulk kommune downloads without filtering

✅ BRUKSENHETER - TWO-STEP PATTERN (ACTUALLY WORKS!):

  1. BruksenhetService.findBruksenheterForMatrikkelenheter(MatrikkelenhetIdList) → Returns MatrikkelenhetIdTilBruksenhetIdsMap
  2. StoreService.getObjects(MatrikkelBubbleIdList) → Returns full Bruksenhet objects

Implementation:

// Step 1: Get bruksenhet IDs
MatrikkelenhetIdList matrikkelenhetIdList = new MatrikkelenhetIdList();
// ... populate list ...
MatrikkelenhetIdTilBruksenhetIdsMap resultMap = 
    bruksenhetService.findBruksenheterForMatrikkelenheter(matrikkelenhetIdList, context);

// Step 2: Fetch full objects
MatrikkelBubbleIdList bubbleIdList = new MatrikkelBubbleIdList();
// ... convert BruksenhetId to MatrikkelBubbleId ...
MatrikkelBubbleObjectList objects = storeService.getObjects(bubbleIdList, context);
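
Between the two steps, the per-matrikkelenhet result has to be flattened into one list of IDs. The sketch below shows one way to do that while removing duplicates, so that no object is requested twice. It models the result as a plain `Map<Long, List<Long>>`, which is an assumption; the generated MatrikkelenhetIdTilBruksenhetIdsMap type has its own accessors that are not shown here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class IdMapFlattener {

    /**
     * Collects all IDs from a "matrikkelenhet ID to object IDs" map into one list,
     * keeping first-seen order and dropping duplicates.
     */
    static List<Long> uniqueIds(Map<Long, List<Long>> idsPerMatrikkelenhet) {
        Set<Long> unique = new LinkedHashSet<>();
        for (List<Long> ids : idsPerMatrikkelenhet.values()) {
            unique.addAll(ids);
        }
        return new ArrayList<>(unique);
    }

    public static void main(String[] args) {
        // Hypothetical sample data: ID 101 is referenced by two matrikkelenheter.
        Map<Long, List<Long>> result = new LinkedHashMap<>();
        result.put(1L, List.of(100L, 101L));
        result.put(2L, List.of(101L, 102L));
        System.out.println(uniqueIds(result)); // [100, 101, 102]
    }
}
```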

Performance Impact:

  • Bergen Kommune Phase 2 with 85 matrikkelenheter:
    • ❌ OLD (NedlastningService): 263,764 bruksenheter downloaded (ALL in kommune!)
    • ✅ NEW (BruksenhetService): ~100-200 bruksenheter (actually filtered!)

✅ ADRESSER (Addresses) - TWO-STEP PATTERN (WORKS!):

  1. AdresseService.findAdresserForMatrikkelenheter(MatrikkelenhetIdList) → Returns MatrikkelenhetIdTilAdresseIdsMap
  2. StoreService.getObjects(MatrikkelBubbleIdList) → Returns full Adresse objects

Implementation:

// Step 1: Get adresse IDs
MatrikkelenhetIdList matrikkelenhetIdList = new MatrikkelenhetIdList();
// ... populate list ...
MatrikkelenhetIdTilAdresseIdsMap resultMap = 
    adresseService.findAdresserForMatrikkelenheter(matrikkelenhetIdList, context);

// Step 2: Fetch full objects
MatrikkelBubbleIdList bubbleIdList = new MatrikkelBubbleIdList();
// ... convert AdresseId to MatrikkelBubbleId ...
MatrikkelBubbleObjectList objects = storeService.getObjects(bubbleIdList, context);

⚠️ CRITICAL SEQUENCING: Download ALL veger BEFORE adresser!

// Phase 2 import order:
fetchAndSaveVegData(kommunenummer);        // 1. Download ALL streets (bulk)
fetchAndSaveAdresseData(matrikkelenheter); // 2. Download filtered addresses

  • Adresser reference Veg entities via foreign key
  • AdresseMapper.toEntity() requires Veg to exist in database
  • Missing veger causes all addresses to be skipped!
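
The effect of the ordering can be illustrated with a simplified mapper. This is a sketch only: the records below are hypothetical stand-ins, and the real AdresseMapper.toEntity() is assumed to look up Veg in the database rather than in a map.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class AdresseMappingSketch {

    // Hypothetical simplified shapes of the API object and the stored entities.
    record Veg(long vegId, String navn) { }
    record ApiAdresse(long adresseId, long vegId, int husnummer) { }
    record AdresseEntity(long adresseId, Veg veg, int husnummer) { }
    record MappingResult(List<AdresseEntity> mapped, int skipped) { }

    /** Maps addresses whose Veg is already known; counts the rest as skipped. */
    static MappingResult mapAll(List<ApiAdresse> adresser, Map<Long, Veg> vegerById) {
        List<AdresseEntity> mapped = new ArrayList<>();
        int skipped = 0;
        for (ApiAdresse adresse : adresser) {
            Veg veg = vegerById.get(adresse.vegId());
            if (veg == null) {
                skipped++; // foreign key target missing, so the address cannot be stored
                continue;
            }
            mapped.add(new AdresseEntity(adresse.adresseId(), veg, adresse.husnummer()));
        }
        return new MappingResult(mapped, skipped);
    }

    public static void main(String[] args) {
        List<ApiAdresse> adresser = List.of(new ApiAdresse(1L, 10L, 5), new ApiAdresse(2L, 11L, 7));

        // Veger not downloaded yet: every address is skipped.
        System.out.println(mapAll(adresser, Map.of()).skipped()); // 2

        // Veger downloaded first: every address maps.
        Map<Long, Veg> veger = Map.of(10L, new Veg(10L, "Storgata"), 11L, new Veg(11L, "Kirkegata"));
        System.out.println(mapAll(adresser, veger).skipped()); // 0
    }
}
```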

Performance Impact:

  • Bergen Kommune Phase 2 with 85 matrikkelenheter:
    • ❌ POTENTIAL (NedlastningService): ~50,000 adresser (all in kommune)
    • ✅ ACTUAL (AdresseService): 37 adresser (API-filtered) = 99.9% reduction!
    • Veger: 1,944 downloaded (bulk, once per kommune)
    • Mapping: 37/37 vegadresser mapped successfully (0 skipped)

Address Types:

  • Vegadresse: Street address (most common) - Fully supported
  • Matrikkeladresse: Cadastral address (rare) - Entity not implemented, skipped in mapper

Debugging Phase 2:

# Run with verbose logging to see what's happening
./test_phase2_verbose.sh

Look for these log messages:

  • "🛣️ Fetching ALL veger (streets) for kommune X..."
  • "Downloaded X veger from API"
  • "Finding adresser for X matrikkelenheter using two-step pattern"
  • "Step 1 complete: Found X unique adresser"
  • "Step 2 complete: Downloaded X adresser from StoreService (API-filtered)"
  • "Mapped X adresser: Y vegadresser, Z matrikkeladresser (A skipped)"
  • "✅ Successfully saved X adresser for kommune Y (two-step pattern)"

Person Number Filtering - Critical Discovery 🔍

⚠️ IMPORTANT: Matrikkel API stores person numbers in Person.nummer (base table), NOT in specialized fields!

Database Schema:

matrikkel_personer (base table)
├── id (JPA primary key)
├── matrikkel_person_id (API PersonId)
└── nummer ← fødselsnummer OR organisasjonsnummer stored here!

matrikkel_fysiske_personer (subclass)
└── fodselsnummer ← ALWAYS NULL! (API doesn't provide it separately)

matrikkel_juridiske_personer (subclass)
└── organisasjonsnummer ← ALWAYS NULL! (API doesn't provide it separately)

Filtering Pattern:

// ❌ WRONG - These fields are NULL!
juridiskPersonRepository.findByOrganisasjonsnummer(nummer);
fysiskPersonRepository.findByFodselsnummer(nummer);

// ✅ CORRECT - Use Person.nummer (universal field)
personRepository.findByNummer(nummer);

Phase 2 Filtering Flow:

  1. Try PersonService API: findPersonIdByNummer(nummer) → may return 404!
  2. Fallback: Query database personRepository.findByNummer(nummer)
  3. Cast to FysiskPerson or JuridiskPerson based on type
  4. Find eierforhold using JPA foreign keys (fysisk_person_id, juridisk_person_entity_id)
  5. Filter matrikkelenheter by eierforhold

Why API Returns 404:

  • PersonService API may have access restrictions
  • Person data exists in database from Phase 1 (StoreService)
  • Database fallback ensures filtering works regardless of API availability
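
Steps 1 and 2 of the filtering flow follow a generic "try the API, then fall back to the database" shape. A minimal sketch, assuming both lookups can be expressed as functions returning `Optional` and that an API failure surfaces as a runtime exception; `PersonLookup` and its parameters are hypothetical names, not project code.

```java
import java.util.Optional;
import java.util.function.Function;

class PersonLookup {

    /**
     * Tries the API lookup first; if it fails or finds nothing,
     * falls back to the database lookup.
     */
    static <T> Optional<T> findWithFallback(String nummer,
                                            Function<String, Optional<T>> apiLookup,
                                            Function<String, Optional<T>> databaseLookup) {
        try {
            Optional<T> fromApi = apiLookup.apply(nummer);
            if (fromApi.isPresent()) {
                return fromApi;
            }
        } catch (RuntimeException e) {
            // Assumption: access restrictions or "not found" arrive as exceptions.
            // A real implementation would log the failure with context here.
        }
        return databaseLookup.apply(nummer);
    }

    public static void main(String[] args) {
        // Simulated API failure: the database result is used instead.
        Optional<Long> personId = PersonLookup.<Long>findWithFallback(
                "123456789",
                nummer -> { throw new IllegalStateException("API unavailable"); },
                nummer -> Optional.of(42L));
        System.out.println(personId.orElseThrow()); // 42
    }
}
```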

Logging

  • Use @Slf4j from Lombok
  • Log batch progress: log.info("Batch {}: Mottok {} objekter", batchNumber, batch.size())
  • Debug level for SOAP request/response details
  • Error logs should include context (kommune, batch number, etc.)

Testing Guidelines

  • Unit tests: Mock SOAP clients and repositories
  • Integration tests: Use Testcontainers for PostgreSQL
  • Test data: Use kommunenummer "4601" (Bergen) for examples
  • Verify SOAP serialization in integration tests

Configuration Files

  • application.yml - Main configuration
  • application-dev.yml - Development overrides
  • application-prod.yml - Production overrides
  • .env - Local development credentials (NOT in git!)

Common Pitfalls to Avoid

  1. ❌ Don't manually serialize MatrikkelBubbleId to XML - JAX-WS does this
  2. ❌ Don't use ddl-auto: create - use Flyway migrations
  3. ❌ Don't forget to set snapshotVersion in MatrikkelContext
  4. ❌ Don't commit credentials to git - use environment variables
  5. ❌ Don't run Maven commands without loading .env - use export $(grep -v '^#' .env | xargs) first!
  6. ❌ Don't edit generated SOAP client classes
  7. ❌ Don't wrap batch operations in outer @Transactional - kills per-batch commits!
  8. ❌ Don't query FysiskPerson.fodselsnummer or JuridiskPerson.organisasjonsnummer - they're NULL!
  9. ❌ Don't assume PersonService API always works - implement database fallback
  10. ❌ Don't use N+1 queries for linking - load into Map first (O(1) lookup)
  11. ❌ Don't use NedlastningService JSON filters - they don't work! Use dedicated service methods
  12. ❌ Don't use matrikkelenhetfilter with NedlastningService - API ignores it completely!
  13. ❌ Don't download adresser before veger - AdresseMapper needs Veg entities in database!
  14. ❌ Don't use NedlastningService.findMatrikkelenhetById() - causes SOAP error! Use StoreService.getObjects() instead!
  15. ❌ CRITICAL: Don't generate WSDL services into different packages! - StoreService and NedlastningService MUST use same package (nedlastning) to avoid ClassCastException at runtime!

When Adding New WSDL Services

  1. Add WSDL file to src/main/resources/wsdl/
  2. Add <execution> in pom.xml for wsimport
  3. CRITICAL: Use no.matrikkel.client.generated.nedlastning package for ALL services that share XSD schemas!
  4. Create wrapper class in client/ package
  5. Register bean in SoapClientConfig
  6. Run mvn clean compile to generate classes

Code Review Checklist

  • Proper error handling and logging
  • Transactions on service methods
  • Database indexes for new queries
  • Unit and integration tests
  • No hardcoded credentials
  • Javadoc on public methods
  • Consistent naming conventions

Related Documentation

  • Full setup guide: JAVA_PROJECT_SETUP_GUIDE.md
  • API documentation: docs/API_DOCUMENTATION.md
  • Database schema: docs/DATABASE_SCHEMA.md

Questions?

Refer to the comprehensive setup guide in JAVA_PROJECT_SETUP_GUIDE.md for detailed examples and explanations.