Conversation
Eliminate the extractEntryUrls function and its associated tests as they are no longer needed. Update the startHandler to use a hardcoded entry URL instead. This simplifies the code and reduces complexity in the parsing logic. Signed-off-by: Manuel Ruck <git@manuelruck.de>
- Introduced a new configuration module to manage environment variables. - Replaced hardcoded values with configurable options for conference year, week, and limits. - Added tests to ensure default values and overrides work as expected.
652716d to
58830ca
Compare
There was a problem hiding this comment.
Pull Request Overview
This pull request removes dynamic conference week URL extraction logic and replaces it with a hardcoded configuration-based approach. The change simplifies the codebase by eliminating HTML parsing for URL discovery and instead uses centralized configuration to construct the target URL directly.
- Removes the
url-parser.tsmodule and its associated URL extraction functionality - Replaces dynamic URL discovery with hardcoded URL construction using configuration values
- Introduces a new centralized configuration system to manage environment variables and defaults
Reviewed Changes
Copilot reviewed 10 out of 12 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| services/cron-jobs/import-conference-week-details/src/services/parsers/url-parser.ts | File deleted - contained URL extraction functions |
| services/cron-jobs/import-conference-week-details/src/services/parsers/index.ts | Removes export of deleted url-parser module |
| services/cron-jobs/import-conference-week-details/src/services/parsers/tests/url-parser.test.ts | File deleted - contained tests for removed URL parser |
| services/cron-jobs/import-conference-week-details/src/services/parsers/tests/index.test.ts | Removes tests for deleted URL parser functions |
| services/cron-jobs/import-conference-week-details/src/services/html-parser.test.ts | Removes URL parser integration tests and import statements |
| services/cron-jobs/import-conference-week-details/src/routes.ts | Replaces dynamic URL extraction with hardcoded URL construction using config |
| services/cron-jobs/import-conference-week-details/src/index.ts | Updates to use new config module instead of direct environment variable access |
| services/cron-jobs/import-conference-week-details/src/crawler.ts | Updates to use config module for maxRequestsPerCrawl setting |
| services/cron-jobs/import-conference-week-details/src/config.ts | New centralized configuration module with type safety and validation |
| services/cron-jobs/import-conference-week-details/src/config.test.ts | Test coverage for the new configuration module |
| // Re-export defaults for documentation / potential external tooling | ||
| export const defaults = DEFAULTS; | ||
|
|
||
| export type { RawEnv }; |
There was a problem hiding this comment.
[nitpick] The RawEnv type export at the end of the file is inconsistent with the export placement pattern used elsewhere in the file. Consider moving this export closer to the RawEnv interface definition (around line 19) for better code organization.
| // Recursively freeze an object (simple deep freeze for plain objects/arrays) | ||
| const deepFreeze = <T>(obj: T): T => { | ||
| if (obj && typeof obj === 'object' && !Object.isFrozen(obj)) { | ||
| Object.freeze(obj); | ||
| for (const key of Object.getOwnPropertyNames(obj)) { | ||
| const value = (obj as Record<string, unknown>)[key]; | ||
| if (value && typeof value === 'object') deepFreeze(value); | ||
| } | ||
| } | ||
| return obj; | ||
| }; |
There was a problem hiding this comment.
[nitpick] The deepFreeze function implements a generic recursive freezing utility that could be reused across the codebase. Consider extracting this to a shared utility module to follow the DRY principle and improve maintainability.
| // Build initial detail URL from central configuration | ||
| const absoluteUrl = new URL( | ||
| `/apps/plenar/plenar/conferenceweekDetail.form?year=${config.conference.year}&week=${config.conference.week}&limit=${config.conference.limit}`, | ||
| 'https://www.bundestag.de', |
There was a problem hiding this comment.
[nitpick] The base URL 'https://www.bundestag.de' is hardcoded here but also appears in the crawler configuration. Consider moving this to the config module to maintain a single source of truth and improve maintainability.
| 'https://www.bundestag.de', | |
| config.conference.baseUrl, |
This pull request removes the logic and code related to dynamically extracting conference week detail URLs from HTML pages, and instead hardcodes a specific conference week detail URL for processing. It also removes associated parser functions and their tests, simplifying the codebase and focusing the workflow on a single, known URL.
Routing and URL Extraction Changes:
startHandlerfunction inroutes.ts.extractEntryUrlsimport and usage fromroutes.ts, as well as related logging for dynamic URL extraction. [1] [2]Parser Module Cleanup:
url-parser.tsmodule, including theextractEntryUrls,isValidConferenceWeekUrl, andgetEntryPageUrlfunctions, and stopped re-exporting them from the main parser index. [1] [2]Test Suite Simplification:
html-parser.test.tsand the dedicatedurl-parser.test.tsfiles. [1] [2] fixes: fix import conference week details #682