
Commit 7d84182

Copilot and pethers authored
fix: correct data quality bugs in 34 database view definitions and ensure JPA backward compatibility (#8486)
* Initial plan

* fix: correct data quality issues in 4 database views (party_summary, behavioral_trends, role_member views)

ROOT CAUSE 5: Wrong join in view_riksdagen_party_summary - used dsc.hjid = dprc.hjid (PK coincidence) instead of the correct FK dsc.document_person_reference_co_1 = dprc.hjid. This caused total_documents of ~6 for the M party instead of the expected ~292K.

ROOT CAUSE 6: Wrong motion type filter - used label LIKE '%motion%', but actual labels are codes like 'MJ408'. Now uses sub_type: Partimotion, Enskild motion, Kommittémotion.

ROOT CAUSE 7: Hardcoded zeros for total_collaborative_motions, total_follow_up_motions, party/committee/individual_focused_members, and highly_collaborative_members. Now computed from actual sub_type and document profile data.

ROOT CAUSE 8: Wrong status filter in view_politician_behavioral_trends - used rule_violation.status = 'ACTIVE', but the actual enum is OK/MINOR/MAJOR/CRITICAL.

ROOT CAUSE 9: Non-existent document types in role_member views - 'ip'/'frs' don't exist, and 'bet'/'yttr' have no person linkage. Replaced with sub_type classification.
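To illustrate ROOT CAUSE 5, here is a minimal sketch using Python's built-in sqlite3 module rather than PostgreSQL. The two-table schema and its rows are a hypothetical simplification (the party column sits directly on the container table for brevity); only the table aliases and the two join conditions come from this commit. Because both tables use small sequential surrogate keys, the PK-coincidence join still returns rows, just the wrong ones, which is why the bug produced a plausible-looking but far too small total.

```python
import sqlite3

# Hypothetical, heavily simplified schema: the real tables have more columns,
# and party membership actually lives in a separate person-reference table.
SCHEMA = """
CREATE TABLE document_person_reference_container (
    hjid  INTEGER PRIMARY KEY,
    party TEXT NOT NULL
);
CREATE TABLE document_status_container (
    hjid INTEGER PRIMARY KEY,
    document_person_reference_co_1 INTEGER
        REFERENCES document_person_reference_container (hjid)
);
INSERT INTO document_person_reference_container VALUES (1, 'M'), (2, 'S'), (3, 'M');
-- Surrogate keys 1..5 overlap the container keys 1..3 purely by coincidence.
INSERT INTO document_status_container VALUES (1, 3), (2, 3), (3, 1), (4, 2), (5, 3);
"""

# Both conditions are fixed literals, never user input.
PK_COINCIDENCE = "dsc.hjid = dprc.hjid"                        # buggy
FK_JOIN = "dsc.document_person_reference_co_1 = dprc.hjid"     # fixed


def documents_per_party(conn, join_condition):
    """Count status-container rows per party under the given join condition."""
    sql = f"""
        SELECT dprc.party, COUNT(*) AS total_documents
        FROM document_status_container AS dsc
        JOIN document_person_reference_container AS dprc
          ON {join_condition}
        GROUP BY dprc.party
        ORDER BY dprc.party
    """
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
print(documents_per_party(conn, PK_COINCIDENCE))  # [('M', 2), ('S', 1)]
print(documents_per_party(conn, FK_JOIN))         # [('M', 4), ('S', 1)]
```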
Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
Agent-Logs-Url: https://github.com/Hack23/cia/sessions/2fe8d220-26b9-4554-8614-0d9c6ce7ab3a

* fix: correct data quality in 11 additional database views (vote case, doc joins, type filters)

Session 2 adds changesets 1.79-005 through 1.79-015, fixing:

ROOT CAUSE 10: Vote case sensitivity in mv_annual_voting_metrics (materialized view)
- 'Ja'/'Nej'/'Avstår'/'Frånvarande' never match the UPPERCASE data
- Cascades to view_riksdagen_election_year_behavioral_patterns
- Fixed: avg_yes_rate, avg_no_rate, avg_abstain_rate are now populated

ROOT CAUSE 11: Vote case sensitivity in seasonal quarterly activity
- 'Frånvarande' never matches the UPPERCASE data, so attendance_rate is always ~100%
- q_stddev_attendance = 0, attendance_z_score = 0
- Cascades to seasonal_activity_patterns, seasonal_anomaly_detection, q4_election_year_comparison

ROOT CAUSE 12: Wrong document join (dsc.hjid = dprc.hjid) in 5 more views
- view_risk_score_evolution, view_ministry_effectiveness_trends, view_ministry_risk_evolution, view_ministry_productivity_matrix, view_politician_risk_summary (stale in schema)
- Fixed: document_count is now populated in risk_score_evolution

ROOT CAUSE 13: Non-existent document type 'ds' in ministry views
- document_data only has: mot, bet, prop, kammakt
- Fixed: replaced 'ds' with 'bet' (committee reports)

ROOT CAUSE 14: Wrong sub_type filters in committee_productivity views
- committee_document_data sub_types: bet, (empty), utl, ap
- Not 'mot' or 'prop'

ROOT CAUSE 15: Vote case sensitivity in crisis resilience & momentum
- view_riksdagen_crisis_resilience_indicators: crisis_absence_rate = 0
- view_riksdagen_party_momentum_analysis: participation_rate = 0

Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
Agent-Logs-Url: https://github.com/Hack23/cia/sessions/56b292be-89fa-4381-8f3c-c8f89326b6cc

* fix: add 3 more view fixes (role_tier case, network hardcoded zeros) and regenerate schema via pg_dump

Session 3 adds changesets 1.79-016 through 1.79-018:

ROOT CAUSE 16: role_tier case sensitivity in 2 views
- view_riksdagen_politician_role_evolution generates UPPERCASE role_tier (MINISTER, SPEAKER, PARTY_LEADER, COMMITTEE_CHAIR, COMMITTEE_MEMBER)
- view_riksdagen_pre_election_quarterly_activity compared lowercase → politicians_with_new_roles and leadership_appointments always 0
- view_riksdagen_election_proximity_trends compared lowercase → leadership_role_count and committee_assignment_count always 0

ROOT CAUSE 17: Hardcoded placeholder zeros in network analysis
- view_election_cycle_network_analysis had literal (0)::bigint for influential_politicians, avg_network_centrality, power_broker_count
- Replaced with calculated values from vote participation data

All 18 changesets applied via Liquibase (mvn liquibase:update). full_schema.sql regenerated strictly via pg_dump per maintenance guide.

Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
Agent-Logs-Url: https://github.com/Hack23/cia/sessions/f18a1e39-e11f-4b98-97f2-13d2f9bc52f8

* fix: correct integer division in power_broker_count calculation and regenerate schema via pg_dump

Code review identified integer division truncation in 1.79-018, where LEAST(a,b)/GREATEST(a,b) always truncates to 0 for unequal non-negative values. Fixed to use LEAST(a,b) directly as the power broker count metric. full_schema.sql regenerated strictly via pg_dump per maintenance guide.

Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
Agent-Logs-Url: https://github.com/Hack23/cia/sessions/f18a1e39-e11f-4b98-97f2-13d2f9bc52f8

* fix: replace non-existent doc type 'frs' with 'kammakt' in committee_role_member view and regenerate schema via pg_dump

Changeset 1.79-019 fixes view_riksdagen_committee_role_member, where the initiatives count used the non-existent document type 'frs' (written questions). Only mot, bet, prop, kammakt exist in document_data. Replaced with 'kammakt'.
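Root causes 10, 11, 15 and 16 share one mechanism: `=` on text is an exact, case-sensitive comparison, so a title-case literal silently matches nothing and every derived rate collapses to 0. A minimal sketch with Python's sqlite3, whose default BINARY collation behaves here like PostgreSQL's usual deterministic collations; the one-column vote_data table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vote_data (vote TEXT NOT NULL)")
# Hypothetical sample rows: stored vote values are UPPERCASE, as the commit describes.
conn.executemany(
    "INSERT INTO vote_data (vote) VALUES (?)",
    [("JA",), ("JA",), ("NEJ",), ("FRÅNVARANDE",)],
)


def absence_rate(conn, absent_label):
    """Share of votes equal to absent_label; '=' on TEXT is case-sensitive here."""
    sql = """
        SELECT 1.0 * SUM(CASE WHEN vote = ? THEN 1 ELSE 0 END) / COUNT(*)
        FROM vote_data
    """
    return conn.execute(sql, (absent_label,)).fetchone()[0]


print(absence_rate(conn, "Frånvarande"))  # 0.0  (title case never matches)
print(absence_rate(conn, "FRÅNVARANDE"))  # 0.25 (matches the stored value)
```

Attendance is 1 minus this rate, which is why the title-case version reported attendance of ~100% and a standard deviation of 0.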
full_schema.sql regenerated strictly via pg_dump per maintenance guide. All 110 views now pass a scan for known data quality issues:
- No title-case vote comparisons (all use UPPERCASE JA/NEJ/FRÅNVARANDE/AVSTÅR)
- No non-existent document types (ip/frs/yttr/ds)
- No wrong rule_violation status ('ACTIVE')
- No wrong document-to-person join chains
- No lowercase role_tier comparisons

Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
Agent-Logs-Url: https://github.com/Hack23/cia/sessions/6d225996-45b0-4582-a083-7557a830c95a

* fix: replace non-temporal view_party_performance_metrics joins with temporal data in 4 views and fix career trajectory cross-join

Changesets 1.79-020 to 1.79-023:
- view_riksdagen_party_electoral_trends: seat_count_proxy and documents_produced were constant per party (taken from the non-temporal view_party_performance_metrics), making stddev_seats_party = 0 and all seat_change/documents_change columns 0. Fixed: derive them from party_total_votes/number_ballots (temporal) and the politician_document matview per year.
- view_riksdagen_party_longitudinal_performance: the same non-temporal join caused membership_change, documents_change, and discipline_change to always be 0. Fixed: same temporal data approach.
- view_riksdagen_politician_career_trajectory: empty (0 rows) due to expensive cross-joins between vote_data (3.7M rows), assignment_data, and doc refs without time constraints. Fixed: pre-aggregate leadership_roles and documents_authored in separate CTEs to avoid a Cartesian product.
- view_election_cycle_comparative_analysis: the CROSS JOIN to the non-temporal view_party_performance_metrics made change_performance_pct and change_documents_pct always 0. Fixed: use temporal data from ballot/doc views.

full_schema.sql regenerated via pg_dump per maintenance guide.
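The career trajectory fix relies on a general rule: joining several one-to-many tables on the same key before aggregating multiplies rows (votes × assignments × documents per politician), which inflates counts and, at 3.7M vote rows, makes the query prohibitively expensive. A minimal sketch with Python's sqlite3; the three tiny tables, their columns, and the 'Ordförande' leadership filter are hypothetical stand-ins for vote_data, assignment_data and the document references.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE vote_data       (person_id TEXT, ballot_id TEXT);
CREATE TABLE assignment_data (person_id TEXT, role_code TEXT);
CREATE TABLE document_ref    (person_id TEXT, doc_id TEXT);
INSERT INTO vote_data       VALUES ('p1', 'b1'), ('p1', 'b2'), ('p1', 'b3');
INSERT INTO assignment_data VALUES ('p1', 'Ordförande'), ('p1', 'Ledamot');
INSERT INTO document_ref    VALUES ('p1', 'd1'), ('p1', 'd2');
""")

# Joining every one-to-many table before aggregating multiplies the rows:
# 3 votes x 2 assignments x 2 documents = 12 joined rows for one politician.
NAIVE = """
SELECT v.person_id, COUNT(*) AS joined_rows, COUNT(d.doc_id) AS documents_authored
FROM vote_data v
JOIN assignment_data a ON a.person_id = v.person_id
JOIN document_ref d    ON d.person_id = v.person_id
GROUP BY v.person_id
"""

# Aggregating each source in its own CTE first keeps one row per politician.
PRE_AGGREGATED = """
WITH votes AS (
    SELECT person_id, COUNT(*) AS total_votes
    FROM vote_data GROUP BY person_id
), roles AS (
    SELECT person_id, COUNT(*) AS leadership_roles
    FROM assignment_data WHERE role_code = 'Ordförande' GROUP BY person_id
), docs AS (
    SELECT person_id, COUNT(*) AS documents_authored
    FROM document_ref GROUP BY person_id
)
SELECT v.person_id,
       v.total_votes,
       COALESCE(r.leadership_roles, 0)   AS leadership_roles,
       COALESCE(d.documents_authored, 0) AS documents_authored
FROM votes v
LEFT JOIN roles r ON r.person_id = v.person_id
LEFT JOIN docs d  ON d.person_id = v.person_id
"""

print(conn.execute(NAIVE).fetchall())           # [('p1', 12, 12)]
print(conn.execute(PRE_AGGREGATED).fetchall())  # [('p1', 3, 1, 2)]
```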
Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
Agent-Logs-Url: https://github.com/Hack23/cia/sessions/75b03ed1-2a37-439b-b6af-c4773b4a6c6f

* fix: address PR review comments - fix column aliases, add missing JPA columns, recreate index

Changesets 1.79-024 to 1.79-029:
- view_committee_productivity (1.79-024): fix the column alias mapping (reports→reports_count, other_documents→motions_count) so that it matches the JPA entity
- view_riksdagen_party_role_member (1.79-025): document that total_interpellations/total_written_questions now contain party_motions/committee_motions, since interpellations don't exist in the data
- view_riksdagen_committee_role_member (1.79-026): same documentation fix
- view_riksdagen_party_electoral_trends (1.79-027): add 13 missing columns required by the JPA entity (rank_by_engagement, rank_by_effectiveness, quartile_by_performance, electoral_trend, party_size_category, seat_forecast, performance_forecast, election_readiness_score, is_pre_election_period, is_election_period, is_post_election_period, trend_position_seats, electoral_tier)
- view_riksdagen_party_longitudinal_performance (1.79-028): add 29 missing columns required by the JPA entity (rank_by_win_rate, rank_by_participation, percentile_win_rate, percentile_participation, percentile_approval, quartile_by_win_rate, quartile_by_overall_performance, etc.)
- Recreate missing index idx_mv_annual_voting_metrics_year (1.79-029)

full_schema.sql regenerated via pg_dump per maintenance guide.
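Missing view columns like those added in 1.79-027 and 1.79-028 only surface when the JPA entity is queried. A hedged sketch of a pre-check that compares a view's columns against the set an entity expects, using sqlite3's `PRAGMA table_info` (which also lists the columns of a view); in PostgreSQL the equivalent lookup would go through `information_schema.columns`. The required-column set, the `party_year` table, and the view body are hypothetical and far smaller than the real ones.

```python
import sqlite3

# Hypothetical subset of the columns a JPA entity maps; the real entity
# for view_riksdagen_party_electoral_trends lists many more.
REQUIRED_BY_ENTITY = {"party", "election_year", "rank_by_engagement", "electoral_tier"}

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE party_year (party TEXT, election_year INTEGER);
CREATE VIEW view_riksdagen_party_electoral_trends AS
    SELECT party, election_year FROM party_year;
""")


def missing_columns(conn, view_name, required):
    """Return required column names that the view does not expose, sorted."""
    # view_name is a trusted identifier; PRAGMA arguments cannot be bound as parameters.
    # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
    present = {row[1] for row in conn.execute(f"PRAGMA table_info({view_name})")}
    return sorted(required - present)


print(missing_columns(conn, "view_riksdagen_party_electoral_trends", REQUIRED_BY_ENTITY))
# ['electoral_tier', 'rank_by_engagement']
```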
Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
Agent-Logs-Url: https://github.com/Hack23/cia/sessions/fb2bc0cc-7181-44bd-a83e-42e636f4c200

* fix: address second round PR review - fix JPA type mismatches, midterm z-score, and rebellion detection

Changesets 1.79-030 to 1.79-033:
- view_riksdagen_party_electoral_trends (1.79-030): Fix JPA type mismatches: seat_change_absolute→bigint (JPA Long), seat_forecast→text (JPA String), performance_forecast→text (JPA String), trend_position_seats→text (JPA String), projected_seat_change→numeric (JPA BigDecimal)
- view_riksdagen_party_longitudinal_performance (1.79-031): Fix JPA type mismatches: membership_change→bigint (JPA Long), trend_position→text (JPA String)
- view_election_cycle_comparative_analysis (1.79-032): Add midterm_stddev_docs to election_baseline and use it for the non-election-year document z-score calculation, instead of incorrectly using election_stddev_docs for midterm years
- view_riksdagen_crisis_resilience_indicators (1.79-033): Fix meaningless rebellion detection that compared the vote value (JA/NEJ) to the party code (S/M/SD). Now uses the rebel flag from view_riksdagen_vote_data_ballot_politician_summary, which determines rebellion by comparing the individual vote against the party majority.

full_schema.sql regenerated via pg_dump per maintenance guide.

Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
Agent-Logs-Url: https://github.com/Hack23/cia/sessions/0e2de8c8-1029-478a-91e4-95713201c8c5

* fix: add midterm_stddev_docs to behavioral patterns view and restore view comments

Changeset 1.79-034:
- view_riksdagen_election_year_behavioral_patterns: Added midterm_stddev_docs to the election_baseline CTE and used it for the non-election-year document z-score calculation, instead of incorrectly using election_stddev_docs for midterm years. Mirrors the fix applied to view_election_cycle_comparative_analysis in 1.79-032.
- Added COMMENT ON VIEW for view_riksdagen_election_year_behavioral_patterns and view_riksdagen_politician_career_trajectory for DB documentation.

full_schema.sql regenerated via pg_dump per maintenance guide, with no manual edits.

Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
Agent-Logs-Url: https://github.com/Hack23/cia/sessions/ff4d9fee-b7ed-4efc-b435-b1c9357447a7

* fix: regenerate full_schema.sql via pg_dump after validating all 34 changesets applied

All 34 changesets (1.79-001 through 1.79-034) validated as applied. Liquibase status confirms the database is up to date. full_schema.sql regenerated strictly via pg_dump per maintenance guide; it is never edited manually.

Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
Agent-Logs-Url: https://github.com/Hack23/cia/sessions/15ef740e-875d-4797-ba50-e9fc6c077b01

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
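The midterm z-score fix in 1.79-032 and 1.79-034 comes down to dividing by the spread of the right population. A minimal sketch in Python; the document counts are invented, and centring midterm years on a midterm mean is an assumption, since the commit only names the stddev columns. `statistics.stdev` is the sample standard deviation, like PostgreSQL's `stddev`. When election-year counts vary more than midterm counts, as in this sample, dividing a midterm deviation by election_stddev_docs shrinks the z-score toward 0.

```python
from statistics import mean, stdev

# Hypothetical yearly document counts for one party (not real Riksdagen data).
election_docs = [120, 150, 180]  # election years
midterm_docs = [85, 90, 95]      # non-election years

baseline = {
    "midterm_avg_docs": mean(midterm_docs),
    "election_stddev_docs": stdev(election_docs),
    "midterm_stddev_docs": stdev(midterm_docs),
}


def midterm_docs_z_score(docs, stddev_key):
    """z-score of a non-election year's document count against the midterm mean."""
    stddev = baseline[stddev_key]
    if stddev == 0:
        return 0.0  # avoid division by zero when all years are identical
    return (docs - baseline["midterm_avg_docs"]) / stddev


print(midterm_docs_z_score(100, "election_stddev_docs"))  # ~0.33: wrong spread, z understated
print(midterm_docs_z_score(100, "midterm_stddev_docs"))   # 2.0: uses the midterm spread
```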
1 parent d1f3dd8 commit 7d84182


3 files changed (+5919, -1237 lines)
