Granular exception handling for SML/SMP lookup failures #68
harsha-amarasiri wants to merge 1 commit into OxalisCommunity:master
Did you notice the error at the ELMA SMP yesterday? It returned a 404 code, but it was actually a Tomcat exception, fixed this morning. The world is not ideal. So even if the SMP says 404, it DOES NOT mean a permanent error. Actually, only NotFoundException can be considered "permanent" - because in 15 minutes it can actually be published. Everything is temporary :) Never give up and always retry :)
Hi @dladlk, thanks for the feedback. Sorry, I did not know about the ELMA issue. I was looking at it from the perspective of retrying / re-queuing within a short time window, for resiliency. So "permanent" actually meant unavailable within that window, until the config changes are made and published. I understand the wording is ambiguous and can be misinterpreted. I will improve the documentation and comments to be more precise and unambiguous. Is there a design/implementation-level concern that you see?
Resolves #586, relates to Oxalis-AS4#158, #497, #666, vefa-peppol#56.
Production monitoring shows that lookup/delivery failures were thrown as a
generic LookupException, some wrapping FileNotFoundException, which makes it difficult to distinguish permanent failures from transient ones. As a result, implementing or attempting retry is challenging.
This change introduces a domain-specific exception taxonomy, derived from the
discussion in Oxalis-AS4#158:
LookupException (base - existing, backward compatible)
├── PeppolResourceException        - permanent, do not retry
│       e.g. participant not in SML, SMP returns 404
├── PeppolInfrastructureException  - transient, retry may help
│       e.g. SMP 5xx, DNS SERVFAIL, DNS REFUSED
└── NetworkFailureException        - transient, retry may help
        e.g. socket timeout, connection refused, unknown host
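
The taxonomy can be sketched in Java roughly as follows. This is a minimal illustration, not the code in this PR: `LookupException` here is a local stand-in for the existing base type, the constructors are assumed, and `RetryAdvice` is a hypothetical helper showing how a caller might branch on the subtype within a short retry window.

```java
// Stand-in for the existing base exception (assumed shape, for illustration only).
class LookupException extends Exception {
    LookupException(String message, Throwable cause) {
        super(message, cause);
    }
}

// Resource/configuration problem, e.g. participant not in SML or SMP returns 404.
class PeppolResourceException extends LookupException {
    PeppolResourceException(String message, Throwable cause) {
        super(message, cause);
    }
}

// Peppol infrastructure problem, e.g. SMP 5xx, DNS SERVFAIL, DNS REFUSED.
class PeppolInfrastructureException extends LookupException {
    PeppolInfrastructureException(String message, Throwable cause) {
        super(message, cause);
    }
}

// Network problem, e.g. socket timeout, connection refused, unknown host.
class NetworkFailureException extends LookupException {
    NetworkFailureException(String message, Throwable cause) {
        super(message, cause);
    }
}

// Hypothetical helper: decides whether an immediate retry may help.
final class RetryAdvice {
    private RetryAdvice() {
    }

    static boolean retryMayHelp(LookupException e) {
        return e instanceof PeppolInfrastructureException
                || e instanceof NetworkFailureException;
    }
}
```

Because every subtype still extends `LookupException`, existing `catch (LookupException e)` blocks keep working unchanged.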
DNS layer (BdxlLocator, BusdoxLocator):
- HOST_NOT_FOUND / TYPE_NOT_FOUND → PeppolResourceException
- TRY_AGAIN → PeppolInfrastructureException (with UDP→TCP fallback)
- UNRECOVERABLE → PeppolInfrastructureException
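
A rough sketch of the DNS mapping above, including the UDP to TCP fallback on TRY_AGAIN. The `DnsResult` and `FailureKind` enums and the `DnsFailureMapper` class are hypothetical names introduced for illustration; the real locators work with the DNS library's own result codes and throw the exception types directly.

```java
import java.util.function.Function;

// Hypothetical mirror of the DNS result codes named in the list above.
enum DnsResult { SUCCESSFUL, HOST_NOT_FOUND, TYPE_NOT_FOUND, TRY_AGAIN, UNRECOVERABLE }

// Hypothetical classification; RESOURCE and INFRASTRUCTURE correspond to
// PeppolResourceException and PeppolInfrastructureException.
enum FailureKind { NONE, RESOURCE, INFRASTRUCTURE }

final class DnsFailureMapper {
    private DnsFailureMapper() {
    }

    // Maps a DNS result code to the failure category described in the list.
    static FailureKind classify(DnsResult result) {
        switch (result) {
            case SUCCESSFUL:
                return FailureKind.NONE;
            case HOST_NOT_FOUND:
            case TYPE_NOT_FOUND:
                return FailureKind.RESOURCE;
            case TRY_AGAIN:
            case UNRECOVERABLE:
                return FailureKind.INFRASTRUCTURE;
            default:
                throw new IllegalArgumentException("Unknown DNS result: " + result);
        }
    }

    // Runs the query over UDP first (argument false); on TRY_AGAIN it repeats
    // the query once over TCP (argument true) before classifying the outcome.
    static FailureKind lookupWithTcpFallback(Function<Boolean, DnsResult> query) {
        DnsResult result = query.apply(false);
        if (result == DnsResult.TRY_AGAIN) {
            result = query.apply(true);
        }
        return classify(result);
    }
}
```

For example, `DnsFailureMapper.lookupWithTcpFallback(useTcp -> useTcp ? DnsResult.SUCCESSFUL : DnsResult.TRY_AGAIN)` yields `FailureKind.NONE`, because the TCP attempt succeeds.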
HTTP layer (ApacheFetcher, UrlFetcher):
- 404 → PeppolResourceException
- 500/502/503/504 → PeppolInfrastructureException
- Socket/timeout/DNS errors → NetworkFailureException
- FileNotFoundException is no longer thrown anywhere
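
The HTTP mapping can be illustrated with a small classifier. This is a sketch under assumptions: `HttpFailureKind` and `HttpFailureMapper` are hypothetical names, and the `UNCLASSIFIED` bucket for status codes and I/O errors not listed above is an assumption, since the commit message does not say how they are handled.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.SocketTimeoutException;
import java.net.UnknownHostException;

// Hypothetical categories; the first three correspond to PeppolResourceException,
// PeppolInfrastructureException and NetworkFailureException.
enum HttpFailureKind { RESOURCE, INFRASTRUCTURE, NETWORK, UNCLASSIFIED }

final class HttpFailureMapper {
    private HttpFailureMapper() {
    }

    // Classifies an HTTP status code returned by the SMP.
    static HttpFailureKind classifyStatus(int status) {
        switch (status) {
            case 404:
                return HttpFailureKind.RESOURCE;
            case 500:
            case 502:
            case 503:
            case 504:
                return HttpFailureKind.INFRASTRUCTURE;
            default:
                // Assumption: anything not listed in the mapping is left unclassified.
                return HttpFailureKind.UNCLASSIFIED;
        }
    }

    // Classifies an I/O failure raised before any HTTP response was received.
    static HttpFailureKind classifyIo(IOException e) {
        if (e instanceof SocketTimeoutException
                || e instanceof ConnectException
                || e instanceof UnknownHostException) {
            return HttpFailureKind.NETWORK;
        }
        return HttpFailureKind.UNCLASSIFIED;
    }
}
```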
LookupClient:
- Removed dead catch(FileNotFoundException) blocks
- PeppolResourceException caught and enriched with receiver/doctype context
- All other exceptions propagate with correct type preserved
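
The LookupClient behaviour described above could look something like the sketch below. The exception classes are local stand-ins, and `EndpointFetcher`, `EnrichingLookup` and the message format are hypothetical; only the pattern (catch the resource failure, add receiver and document type, let everything else propagate untouched) is taken from the list.

```java
// Local stand-ins for the exception types (assumed constructors).
class LookupException extends Exception {
    LookupException(String message, Throwable cause) {
        super(message, cause);
    }
}

class PeppolResourceException extends LookupException {
    PeppolResourceException(String message, Throwable cause) {
        super(message, cause);
    }
}

class NetworkFailureException extends LookupException {
    NetworkFailureException(String message, Throwable cause) {
        super(message, cause);
    }
}

// Hypothetical abstraction over the SML/SMP lookup steps.
interface EndpointFetcher {
    String fetch(String receiver, String documentType) throws LookupException;
}

final class EnrichingLookup {
    private EnrichingLookup() {
    }

    static String lookup(EndpointFetcher fetcher, String receiver, String documentType)
            throws LookupException {
        try {
            return fetcher.fetch(receiver, documentType);
        } catch (PeppolResourceException e) {
            // Same exception type, with receiver/doctype context added to the message.
            throw new PeppolResourceException(
                    String.format("Lookup failed for receiver '%s', document type '%s': %s",
                            receiver, documentType, e.getMessage()),
                    e);
        }
        // Other LookupException subtypes are not caught, so they propagate as-is.
    }
}
```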
Pull Request Description
Resolves #586, relates to oxalis#586, oxalis#666, Oxalis-AS4#158, vefa-peppol#56
Lookup failures are reported as a generic LookupException, sometimes wrapping FileNotFoundException, making it difficult to distinguish resource/configuration failures from network-related failures. As a result, implementing or attempting retry is also challenging.

This change introduces a domain-specific exception taxonomy. The exceptions are derived from the discussion in Oxalis-AS4#158.
DNS layer (BdxlLocator, BusdoxLocator):
- HOST_NOT_FOUND / TYPE_NOT_FOUND -> PeppolResourceException
- TRY_AGAIN -> PeppolInfrastructureException (with UDP->TCP fallback)
- UNRECOVERABLE -> PeppolInfrastructureException

HTTP layer (ApacheFetcher, UrlFetcher):

LookupClient:
- Removed dead catch(FileNotFoundException) blocks

Design Notes
Test Coverage
- PeppolResourceException
- BdxlLocatorTest/BusdoxLocatorTest: DNS result code -> exception type mapping
- ApacheFetcherTest: HTTP status code -> exception type mapping (WireMock)
- FileNotFoundException regression guards across all layers

Type of Pull Request
Type of Change
Pull Request Checklist:
- mvn clean install before commit and all tests run successfully
- master branch