@@ -133,84 +133,130 @@ The app handles shutdown and stops the Mongo container cleanly.

Repeat for every tablet.

-## 7) Data Export and Analysis Procedure
+## 7) Data Migration + Analysis Pipeline (Unified 01..07)

-Use this when you need CSV/JSON outputs for strategy and picklist work.
+This workflow is now config-driven and runs through ordered scripts with a CSV handoff between stages:

-### 7.1 Keep backend running
+1. `01_extract_source.py`
+2. `02_clean_normalize.py`
+3. `03_feature_engineering.py`
+4. `04_team_aggregation.py`
+5. `05_picklist_scores.py`
+6. `06_export_app_payloads.py`
+7. `07_seed_fake_data.py` (optional, controlled by config/flags)

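The ordered handoff can be sketched roughly as below. This is a hypothetical runner, not the real `run_pipeline.py`; only the stage filenames come from the list above.

```python
# Hypothetical orchestration sketch; the real run_pipeline.py may differ.
import subprocess
import sys

STAGES = [
    "01_extract_source.py",
    "02_clean_normalize.py",
    "03_feature_engineering.py",
    "04_team_aggregation.py",
    "05_picklist_scores.py",
    "06_export_app_payloads.py",
    "07_seed_fake_data.py",  # optional stage
]

def stages_to_run(run_stage_07=False):
    """Select stages in order; stage 07 only runs when explicitly enabled."""
    return [s for s in STAGES if run_stage_07 or not s.startswith("07")]

def run(run_stage_07=False):
    for script in stages_to_run(run_stage_07):
        # Each stage reads the previous stage's output files and writes its own.
        subprocess.run([sys.executable, script], check=True)
```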
-In terminal #1 (repo root):
+### 7.1 One-time schema migration (recommended before the first 2026 event run)
+
+From repo root:

```powershell
-npm run start
+npm run --workspace server migrate-match-schema
```

-### 7.2 Run analysis script
+The migration writes its report to:

-In terminal #2:
+- `server/static/match-schema-migration-report.json`
+
+### 7.2 Python setup (once per machine)

```powershell
cd ScoutingApp2026\data-analysis
python -m venv venv
.\venv\Scripts\Activate.ps1
pip install -r requirements.txt
-python export_csv.py
```

-### 7.3 Output locations
+### 7.3 Configure pipeline behavior
+
+Edit:

-Primary outputs go to:
+- `data-analysis/pipeline_config.json`

-- `data-analysis/output`
+Main knobs:

-Legacy CSV outputs are also written to:
+- `source.mode`: `mongo` or `fake`
+- `source.mongo_url` / `source.db`
+- `paths.output_dir`
+- `analysis.metrics` (enabled flags, weights, direction)
+- `analysis.timeline_bin_sec`
+- `fake_data.*` (including `run_stage_07` and `seed_mongo`)

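For orientation, a `pipeline_config.json` could look something like the sketch below. The key names follow the knob list above; every value (and the example metric name) is illustrative, not a documented default.

```json
{
  "source": {
    "mode": "mongo",
    "mongo_url": "mongodb://localhost:27017/",
    "db": "test"
  },
  "paths": {
    "output_dir": "output"
  },
  "analysis": {
    "metrics": {
      "auto_points": { "enabled": true, "weight": 1.0, "direction": "higher_is_better" }
    },
    "timeline_bin_sec": 5
  },
  "fake_data": {
    "run_stage_07": false,
    "seed_mongo": false
  }
}
```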
-- `data-analysis/match_raw_2026.csv`
-- `data-analysis/super_raw_2026.csv`
-- `data-analysis/pit_2026.csv`
-- `data-analysis/team_agg_2026.csv`
-- `data-analysis/metric_summary_2026.csv`
+### 7.4 Run the full pipeline (real Mongo data)

-### 7.4 Optional analysis flags
+Keep the server running in terminal #1:

```powershell
-python export_csv.py --mongo-url mongodb://localhost:27017/ --db test --output-dir .\output
+npm run start
```

-## 8) Generate Fake Data (for Testing Picklist/Recon)
+Then in terminal #2:

-There are two fake-data paths.
+```powershell
+cd ScoutingApp2026\data-analysis
+.\venv\Scripts\Activate.ps1
+python run_pipeline.py --source-mode mongo
+```

-### 8.1 Database fake scouting data (recommended)
+### 7.5 Run the full pipeline (fake source, with fake-generation stage)

-Populates Mongo collections with synthetic match/pit/leaderboard entries.
+```powershell
+cd ScoutingApp2026\data-analysis
+.\venv\Scripts\Activate.ps1
+python run_pipeline.py --source-mode fake --run-stage-07
+```

-From repo root:
+Optional: seed Mongo during stage 07:

```powershell
-npm run --workspace server gen-fake-data
+python run_pipeline.py --source-mode fake --run-stage-07 --seed-mongo
```

-Optional environment overrides (PowerShell examples):
+### 7.6 Pipeline outputs
+
+All outputs are written to `data-analysis/output` (or the configured `paths.output_dir`):
+
+- `00_pipeline_report.json`
+- `01_match_raw.csv`, `01_pit_raw.csv`, `01_raw_snapshot.json`
+- `02_match_clean.csv`, `02_pit_clean.csv`, `02_validation_report.csv`
+- `03_match_features.csv`, `03_timeseries_long.csv`, `03_auto_path_points.csv`
+- `04_team_aggregates.csv`
+- `05_picklist_scores.csv`, `05_metric_contributions.csv`
+- `06_picklist_payload.json`, `06_team_profiles.json`
+- `07_seed_report.json` (only when stage 07 runs)
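As a sanity check after a run, you can verify that the expected files landed in the output directory. This is a throwaway helper, not part of the pipeline; only the filenames come from the list above.

```python
from pathlib import Path

# Filenames as listed in section 7.6; stage 07's report is conditional.
EXPECTED_OUTPUTS = [
    "00_pipeline_report.json",
    "01_match_raw.csv", "01_pit_raw.csv", "01_raw_snapshot.json",
    "02_match_clean.csv", "02_pit_clean.csv", "02_validation_report.csv",
    "03_match_features.csv", "03_timeseries_long.csv", "03_auto_path_points.csv",
    "04_team_aggregates.csv",
    "05_picklist_scores.csv", "05_metric_contributions.csv",
    "06_picklist_payload.json", "06_team_profiles.json",
]

def missing_outputs(output_dir, ran_stage_07=False):
    """Return expected output files that are absent from output_dir."""
    expected = EXPECTED_OUTPUTS + (["07_seed_report.json"] if ran_stage_07 else [])
    out = Path(output_dir)
    return [name for name in expected if not (out / name).exists()]
```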
+
+The picklist app reads the analyzed payload from:
+
+- `data-analysis/output/06_picklist_payload.json`
+- API route: `GET /data/retrieve/analyzed`
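A minimal client-side sketch of fetching the analyzed payload from the route above; the base URL is an assumption, so adjust it to wherever the server actually listens.

```python
import json
import urllib.request

BASE_URL = "http://localhost:3000"  # assumption: replace with your server's address

def analyzed_url(base=BASE_URL):
    """Build the URL for the analyzed-payload route."""
    return f"{base}/data/retrieve/analyzed"

def fetch_analyzed(base=BASE_URL):
    """GET the stage 06 picklist payload from a running server."""
    with urllib.request.urlopen(analyzed_url(base)) as resp:
        return json.load(resp)
```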
+
+### 7.7 Legacy command compatibility
+
+`python export_csv.py` now forwards to `run_pipeline.py` and uses the same config/flags.
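Such a forwarding shim can be as small as the sketch below; this is illustrative, and the real `export_csv.py` may differ.

```python
import sys

def forward(argv):
    """Build the command that re-runs the unified pipeline with the same flags."""
    return [sys.executable, "run_pipeline.py", *argv]

# export_csv.py would then subprocess.run(forward(sys.argv[1:]), check=True)
```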
+
+## 8) Fake Data Options
+
+### 8.1 Pipeline-native fake data (recommended)
+
+Use stage 07 directly:

```powershell
-$env:FAKE_MATCH_COUNT='80'
-$env:FAKE_TEAM_COUNT='40'
-$env:FAKE_SCOUTER_COUNT='16'
-$env:FAKE_CLEAR='true'
-$env:FAKE_INCLUDE_PIT='true'
-$env:FAKE_INCLUDE_LEADERBOARD='true'
-$env:FAKE_INCLUDE_AUTO_PATH='true'
-npm run --workspace server gen-fake-data
+cd data-analysis
+.\venv\Scripts\Activate.ps1
+python 07_seed_fake_data.py
```

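The kind of synthetic generation stage 07 performs can be sketched as below. The field names and score ranges are hypothetical, not the real `07_seed_fake_data.py` schema.

```python
import random

def fake_match_entry(team, match_number, rng=random):
    """One synthetic match-scouting row with made-up metric fields."""
    return {
        "team": team,
        "match": match_number,
        "auto_points": rng.randint(0, 20),     # hypothetical field
        "teleop_points": rng.randint(0, 40),   # hypothetical field
    }

def fake_event(teams, match_count):
    """One entry per team per match, in schedule order."""
    return [
        fake_match_entry(team, m)
        for m in range(1, match_count + 1)
        for team in teams
    ]
```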
-### 8.2 Static analysis JSON file
+Or via the orchestrator:

-Writes `server/static/output_analysis.json`.
+```powershell
+python run_pipeline.py --source-mode fake --run-stage-07
+```

-From repo root:
+### 8.2 Legacy server fake scripts (optional / dev-only)
+
+These still exist for server-side testing:

```powershell
+npm run --workspace server gen-fake-data
npm run --workspace server gen-fake-json
```

@@ -281,7 +327,10 @@ Writes: `client/src/assets/matchSchedule.json`

### 10.3 Generate team metadata/colors/avatars

-Requires `server/static/output_analysis.json` (generate with `gen-fake-json` or provide your own).
+Requires either:
+
+- `data-analysis/output/06_team_profiles.json` (preferred; generated by pipeline stage 06), or
+- `server/static/output_analysis.json` (legacy fallback).
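The preferred-then-fallback selection implied above can be sketched as a small helper; the function itself is hypothetical, only the two paths come from the list.

```python
from pathlib import Path

def team_info_source(repo_root="."):
    """Prefer the stage 06 profiles; fall back to the legacy analysis JSON."""
    preferred = Path(repo_root, "data-analysis", "output", "06_team_profiles.json")
    legacy = Path(repo_root, "server", "static", "output_analysis.json")
    for candidate in (preferred, legacy):
        if candidate.exists():
            return candidate
    raise FileNotFoundError(
        "run pipeline stage 06, or generate the legacy JSON with gen-fake-json"
    )
```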

```powershell
npm run --workspace server gen-team-info
@@ -328,17 +377,22 @@ npm run start
# Dev run
npm run dev

-# Analysis
+# Migration
+npm run --workspace server migrate-match-schema
+
+# Analysis pipeline (real data)
cd data-analysis
python -m venv venv
.\venv\Scripts\Activate.ps1
pip install -r requirements.txt
-python export_csv.py
+python run_pipeline.py --source-mode mongo

-# Fake data
+# Fake data (pipeline-native)
+python run_pipeline.py --source-mode fake --run-stage-07
+
+# Legacy fake-data scripts (optional)
cd ..
npm run --workspace server gen-fake-data
-npm run --workspace server gen-fake-json

# Event utilities
npm run --workspace server download-teams
@@ -370,7 +424,3 @@ npm run build --workspace database

- Ensure the backend is running (`npm run start`) or the Mongo container is up.
- Check the Mongo URL (`mongodb://localhost:27017/`).
-
-### `sendExport` script note
-
-- `server/scripts/sendExport.ts` currently connects to `mongodb://0.0.0.0:27107/` (a port typo vs `27017`); update that file before relying on this script.