feat(conference-week): update procedure voteDate after crawl #707
Merged
Add a new utility to derive and batch-update Procedure.voteDate from saved conference week sessions (last 5 weeks) and call it after persisting conference weeks. Continue execution if the voteDate update fails and log results.

Also:
- Restrict crawler navigation to forward-only (enqueue next week only)
- Add comprehensive docs (docs/votedate-flow.md) and update README with features/documentation link
- Import and wire updateProcedureVoteDates in the main run flow
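The "continue execution if the voteDate update fails" behavior described above could look roughly like the sketch below. The wrapper name `runVoteDateUpdate`, the `VoteDateUpdater` type, and the assumption that the utility resolves to a count of updated procedures are all hypothetical; the actual signature of `updateProcedureVoteDates` is not visible on this page.

```typescript
// Hypothetical stand-in for the real utility: assumed to resolve to the
// number of procedures whose voteDate was updated.
type VoteDateUpdater = () => Promise<number>;

// Runs the voteDate update as a non-fatal step: a failure is logged and
// swallowed so the surrounding import run can continue.
async function runVoteDateUpdate(
  update: VoteDateUpdater,
  log: (msg: string) => void = console.log,
): Promise<number | null> {
  try {
    const updated = await update();
    log(`voteDate update finished: ${updated} procedures updated`);
    return updated;
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    log(`voteDate update failed, continuing: ${reason}`);
    return null; // signal failure to the caller without throwing
  }
}
```

Returning `null` instead of rethrowing keeps the decision about what to do next with the caller, which matches the "log results and carry on" intent stated in the commit message.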
Pull Request Overview
This PR adds automatic voteDate synchronization from conference weeks to procedures and optimizes the crawler to focus on upcoming votes by crawling forward only.
Key changes:
- New `updateProcedureVoteDates` utility that syncs vote dates from conference week sessions to procedures
- Modified crawler to only enqueue future weeks (removed backward crawling)
- Comprehensive documentation added explaining the voteDate flow with Mermaid diagrams
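As a rough illustration of the forward-only crawling change, the sketch below enqueues only the next week and ignores any previous-week link. The `WeekNavigation` shape, the `weekUrl` helper, and the `example.org` URL pattern are assumptions made for this example; the real route handler in `src/routes.ts` is not shown here.

```typescript
// Hypothetical shape of the navigation data found on a conference week page.
interface WeekNavigation {
  previousYear?: number;
  previousWeek?: number;
  nextYear?: number;
  nextWeek?: number;
}

// Hypothetical URL builder; the crawler's real URL pattern is not shown here.
const weekUrl = (year: number, week: number): string =>
  `https://example.org/conference-weeks?year=${year}&week=${week}`;

// Returns the URLs to enqueue for a page. previousYear/previousWeek are
// deliberately ignored so the crawl only ever moves forward in time.
function urlsToEnqueue(nav: WeekNavigation): string[] {
  if (nav.nextYear === undefined || nav.nextWeek === undefined) {
    return []; // no following week linked: the crawl ends here
  }
  return [weekUrl(nav.nextYear, nav.nextWeek)];
}
```

Because each page contributes at most one new URL, the crawl becomes a simple forward chain rather than a walk in both directions.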
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/utils/update-vote-dates.ts | New utility to batch update procedure voteDates from the last 5 conference weeks |
| src/index.ts | Integrates voteDate update step after conference weeks are saved to DB |
| src/routes.ts | Removes backward crawling by not enqueueing previousYear/previousWeek URLs |
| docs/votedate-flow.md | Extensive documentation of the voteDate data flow with diagrams and troubleshooting |
| README.md | Updated feature list to highlight voteDate functionality |
services/cron-jobs/import-conference-week-details/src/utils/update-vote-dates.ts
Split updateProcedureVoteDates into smaller pure functions (extractProcedureIdsFromSession, groupProcedureIdsByDate, fetchRecentConferenceWeeks, updateProceduresVoteDate) and keep the orchestration in updateProcedureVoteDates. Add comprehensive unit tests for extraction and grouping logic to improve correctness and prevent regressions. Export ISession type from the ConferenceWeekDetail model and adjust package.json lint scripts to run lint:es and lint:ts. These changes improve testability, readability and separation of concerns.

Signed-off-by: Manuel Ruck <git@manuelruck.de>
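To make the split into pure functions more concrete, here is a minimal sketch of what the extraction and grouping steps might look like. The function names come from the commit message, but their signatures and the `SessionSketch` shape (a date plus topics carrying an `isVote` flag and `procedureIds`) are assumptions; the real `ISession` type is not reproduced on this page.

```typescript
// Hypothetical, simplified session shape; the real ISession type may differ.
interface SessionSketch {
  date: string; // ISO date of the session, e.g. "2025-10-16"
  topics: { isVote: boolean; procedureIds: string[] }[];
}

// Collects the unique procedure IDs from all topics marked as votes.
function extractProcedureIdsFromSession(session: SessionSketch): string[] {
  const ids = new Set<string>();
  for (const topic of session.topics) {
    if (!topic.isVote) continue; // only vote topics contribute a voteDate
    for (const id of topic.procedureIds) ids.add(id);
  }
  return Array.from(ids);
}

// Groups procedure IDs by session date so each date needs one batch update.
function groupProcedureIdsByDate(
  sessions: SessionSketch[],
): Map<string, string[]> {
  const grouped = new Map<string, Set<string>>();
  for (const session of sessions) {
    const ids = extractProcedureIdsFromSession(session);
    if (ids.length === 0) continue; // skip sessions without votes
    const bucket = grouped.get(session.date) ?? new Set<string>();
    ids.forEach((id) => bucket.add(id));
    grouped.set(session.date, bucket);
  }
  const result = new Map<string, string[]>();
  grouped.forEach((set, date) => result.set(date, Array.from(set)));
  return result;
}
```

Since neither function touches the database, both can be unit tested with plain object fixtures, which is the testability benefit the commit message points to.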
This pull request introduces significant improvements to the conference week details import package, focusing on automated detection and updating of vote dates for Bundestag procedures. The main changes include the addition of a comprehensive data flow documentation, a new utility for updating vote dates, and refinements to the crawling strategy to prioritize upcoming sessions and votes.
Vote date automation and crawling improvements
- Added `updateProcedureVoteDates` in `src/utils/update-vote-dates.ts`, which automatically updates the `voteDate` field in procedures based on the last five conference weeks with sessions, improving performance and ensuring current votes are prioritized.
- Updated `src/index.ts` to call `updateProcedureVoteDates` after saving conference weeks, logging the number of procedures updated and handling errors gracefully.
- Changed `src/routes.ts` to only crawl forward to future weeks, focusing on upcoming sessions and votes rather than historical data.

Documentation and developer guidance
- Updated `README.md` with a new summary emphasizing conference week scraping, vote detection, procedure linking, and automated vote date updates. Added a link to the new vote date flow documentation.
- Added `docs/votedate-flow.md`, detailed technical documentation with diagrams explaining the full vote date data flow, crawler strategy, validation logic, data models, performance optimizations, and troubleshooting steps for common issues.
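The "last five conference weeks with sessions" selection mentioned in the summary can be sketched in a database-agnostic way as below. The `WeekSketch` shape and the `pickRecentWeeksWithSessions` name are hypothetical, and the sketch assumes "last" means most recent by year and week number; the PR's `fetchRecentConferenceWeeks` presumably does the equivalent as a database query, which is not shown here.

```typescript
// Hypothetical, simplified conference week record.
interface WeekSketch {
  year: number;
  week: number;
  sessions: unknown[];
}

// Picks the most recent weeks that actually contain sessions.
// Assumption: "recent" is ordered by (year, week) descending.
function pickRecentWeeksWithSessions(
  weeks: WeekSketch[],
  limit = 5,
): WeekSketch[] {
  return weeks
    .filter((w) => w.sessions.length > 0) // filter() copies, so sort() below does not mutate the input
    .sort((a, b) => b.year - a.year || b.week - a.week)
    .slice(0, limit);
}
```

Bounding the window to a handful of weeks keeps the number of procedures touched per run small, which is the performance argument the summary makes.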