Skip to content

Fix/import conference week details#683

Merged
ManAnRuck merged 3 commits intomasterfrom
fix/import-conference-week-details
Sep 10, 2025
Merged

Fix/import conference week details#683
ManAnRuck merged 3 commits intomasterfrom
fix/import-conference-week-details

Conversation

@ManAnRuck
Copy link
Member

This pull request removes the logic and code related to dynamically extracting conference week detail URLs from HTML pages, and instead hardcodes a specific conference week detail URL for processing. It also removes associated parser functions and their tests, simplifying the codebase and focusing the workflow on a single, known URL.

Routing and URL Extraction Changes:

  • Removed the extraction of conference week detail URLs from the start page and replaced it with a hardcoded URL in the startHandler function in routes.ts.
  • Deleted the extractEntryUrls import and usage from routes.ts, as well as related logging for dynamic URL extraction. [1] [2]

Parser Module Cleanup:

  • Removed the url-parser.ts module, including the extractEntryUrls, isValidConferenceWeekUrl, and getEntryPageUrl functions, and stopped re-exporting them from the main parser index. [1] [2]
  • Updated parser module integration tests to remove checks for the deleted URL parser functions.

Test Suite Simplification:

@ManAnRuck ManAnRuck requested a review from Copilot September 10, 2025 18:01
@ManAnRuck ManAnRuck self-assigned this Sep 10, 2025

This comment was marked as outdated.

Eliminate the extractEntryUrls function and its associated tests as they are no longer needed.
Update the startHandler to use a hardcoded entry URL instead.
This simplifies the code and reduces complexity in the parsing logic.

Signed-off-by: Manuel Ruck <git@manuelruck.de>
- Introduced a new configuration module to manage environment variables.
- Replaced hardcoded values with configurable options for conference year, week, and limits.
- Added tests to ensure default values and overrides work as expected.
@ManAnRuck ManAnRuck force-pushed the fix/import-conference-week-details branch from 652716d to 58830ca Compare September 10, 2025 18:14
@ManAnRuck ManAnRuck requested a review from Copilot September 10, 2025 18:15
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This pull request removes dynamic conference week URL extraction logic and replaces it with a hardcoded configuration-based approach. The change simplifies the codebase by eliminating HTML parsing for URL discovery and instead uses centralized configuration to construct the target URL directly.

  • Removes the url-parser.ts module and its associated URL extraction functionality
  • Replaces dynamic URL discovery with hardcoded URL construction using configuration values
  • Introduces a new centralized configuration system to manage environment variables and defaults

Reviewed Changes

Copilot reviewed 10 out of 12 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
services/cron-jobs/import-conference-week-details/src/services/parsers/url-parser.ts File deleted - contained URL extraction functions
services/cron-jobs/import-conference-week-details/src/services/parsers/index.ts Removes export of deleted url-parser module
services/cron-jobs/import-conference-week-details/src/services/parsers/tests/url-parser.test.ts File deleted - contained tests for removed URL parser
services/cron-jobs/import-conference-week-details/src/services/parsers/tests/index.test.ts Removes tests for deleted URL parser functions
services/cron-jobs/import-conference-week-details/src/services/html-parser.test.ts Removes URL parser integration tests and import statements
services/cron-jobs/import-conference-week-details/src/routes.ts Replaces dynamic URL extraction with hardcoded URL construction using config
services/cron-jobs/import-conference-week-details/src/index.ts Updates to use new config module instead of direct environment variable access
services/cron-jobs/import-conference-week-details/src/crawler.ts Updates to use config module for maxRequestsPerCrawl setting
services/cron-jobs/import-conference-week-details/src/config.ts New centralized configuration module with type safety and validation
services/cron-jobs/import-conference-week-details/src/config.test.ts Test coverage for the new configuration module

// Re-export defaults for documentation / potential external tooling
export const defaults = DEFAULTS;

export type { RawEnv };
Copy link

Copilot AI Sep 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The RawEnv type export at the end of the file is inconsistent with the export placement pattern used elsewhere in the file. Consider moving this export closer to the RawEnv interface definition (around line 19) for better code organization.

Copilot uses AI. Check for mistakes.
Comment on lines +68 to +78
// Recursively freeze an object (simple deep freeze for plain objects/arrays)
const deepFreeze = <T>(obj: T): T => {
if (obj && typeof obj === 'object' && !Object.isFrozen(obj)) {
Object.freeze(obj);
for (const key of Object.getOwnPropertyNames(obj)) {
const value = (obj as Record<string, unknown>)[key];
if (value && typeof value === 'object') deepFreeze(value);
}
}
return obj;
};
Copy link

Copilot AI Sep 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The deepFreeze function implements a generic recursive freezing utility that could be reused across the codebase. Consider extracting this to a shared utility module to follow the DRY principle and improve maintainability.

Copilot uses AI. Check for mistakes.
// Build initial detail URL from central configuration
const absoluteUrl = new URL(
`/apps/plenar/plenar/conferenceweekDetail.form?year=${config.conference.year}&week=${config.conference.week}&limit=${config.conference.limit}`,
'https://www.bundestag.de',
Copy link

Copilot AI Sep 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The base URL 'https://www.bundestag.de' is hardcoded here but also appears in the crawler configuration. Consider moving this to the config module to maintain a single source of truth and improve maintainability.

Suggested change
'https://www.bundestag.de',
config.conference.baseUrl,

Copilot uses AI. Check for mistakes.
@ManAnRuck ManAnRuck merged commit 013e821 into master Sep 10, 2025
8 checks passed
@ManAnRuck ManAnRuck deleted the fix/import-conference-week-details branch September 10, 2025 18:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

fix import conference week details

2 participants