@@ -57,23 +57,58 @@ graph TB
 **Data Flow**: Raw data (FB) → Prepared features (FB) → Trained models (FA) → Real-time predictions → Alerts
 
 **Storage Strategy**:
-- **FA (FlashArray X70R3)**: Low-latency model serving and storage
-- **FB (FlashBlade S200)**: High-throughput parallel I/O for data processing
-- **S3**: Model versioning and raw data archival
+
+This architecture leverages Pure Storage's dual-protocol approach, placing each workload on the storage tier that matches its I/O characteristics:
+
+### File Storage (NFS Mounts)
+- **FA**: `/root/ebiser/nvidia.financial.fraud.detection`
+  - **Protocol**: NFS file mount
+  - **Optimized for**: Low-latency random I/O (<1 ms read latency)
+  - **Use case**: Real-time model serving, where inference requests require immediate model access
+  - **Pods**: Pod 3 (writes models), Pod 4 (reads models for serving)
+
+- **FB**: `/mnt/fsaai-shared/ebiser`
+  - **Protocol**: NFS file mount
+  - **Optimized for**: High-throughput parallel I/O (>5 GB/s)
+  - **Use case**: Bulk data processing, where multiple GPU workers read and write large datasets simultaneously
+  - **Pods**: Pod 1 (writes raw data), Pod 2 (reads/writes features), Pod 3 (reads training data)
+
+### Object Storage (S3 Protocol)
+- **FB S3 Endpoint**: `s3://fraud-detection-bucket`
+  - **Protocol**: S3 API on FlashBlade
+  - **Optimized for**: Archival, versioning, and cross-region access
+  - **Use case**: Long-term storage of raw-data archives and model versions for compliance and rollback
+  - **Pods**: Pod 1 (archives raw data), Pod 3 (versions trained models)
+
+**Mount Configuration**:
+```bash
+# FlashArray (FA) - low-latency NFS mount
+mount -t nfs fa-array.example.com:/volume/fraud-models \
+  ~/ebiser/nvidia.financial.fraud.detection
+
+# FlashBlade (FB) - high-throughput NFS mount
+mount -t nfs fb-array.example.com:/export/fraud-data \
+  /mnt/fsaai-shared/ebiser
+
+# FlashBlade S3 - configure the endpoint in .env
+S3_ENDPOINT=https://fb-array.example.com
+```
+
+This separation ensures that high-throughput ETL operations (data prep, feature engineering) don't interfere with latency-sensitive inference serving, while S3 provides durable archival storage.
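Pod 3 versions trained models into the `fraud-detection-bucket` described above. A minimal sketch of a key-naming helper for that layout follows; the bucket name comes from this README, while the `models/...` key scheme and the helper names are illustrative assumptions, not the project's actual code:

```python
# Hypothetical helper for Pod 3's model-versioning step: builds the
# object key under which a trained model would be archived in the
# FlashBlade S3 bucket. The bucket name is from the README; the key
# layout is an assumption for illustration.

BUCKET = "fraud-detection-bucket"

def model_version_key(model_name: str, version: int,
                      filename: str = "model.json") -> str:
    """Return a key like 'models/fraud_xgboost/v0003/model.json'."""
    return f"models/{model_name}/v{version:04d}/{filename}"

def s3_uri(key: str) -> str:
    """Full S3 URI for a key in the archival bucket."""
    return f"s3://{BUCKET}/{key}"
```

Any S3 client pointed at the `S3_ENDPOINT` from `.env` (for example, boto3 with a custom `endpoint_url`) could then upload the model artifact under `s3_uri(model_version_key(...))`.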
 
 ---
 
-## Technology Stack
+## Infrastructure and Technology Stack
 
 - **GPUs**: 2x NVIDIA L40S (48GB each)
 - **Data Processing**: RAPIDS (cuDF, cuGraph)
 - **ML Training**: cuXGBoost, PyTorch
 - **Inference**: NVIDIA Triton Inference Server
 - **Orchestration**: Docker Compose
 - **Storage**:
-  - **FA**: Low-latency
-  - **FB**: file + S3 protocol
-  - **S3**: Object storage for archival and versioning
+  - **FA (FlashArray X70R3)**: Low-latency file storage
+  - **FB (FlashBlade S200)**: Parallel I/O, file + S3 protocol
+  - **S3**: Object storage for archival and versioning
 
 ---
 
@@ -85,9 +120,8 @@ graph TB
 # Required Hardware
 - 2x NVIDIA L40S GPUs (48GB each)
 - 1024 GB RAM (512 GB per CPU)
-- Pure Storage FlashArray X70R3 (FA)
-- Pure Storage FlashBlade S200 (FB)
-- 4x 25Gb Cisco VIC NICs
+- Pure Storage FlashArray (FA)
+- Pure Storage FlashBlade (FB)
 
 # Required Software
 - Ubuntu 22.04.5 LTS
@@ -222,6 +256,65 @@ curl -X POST http://localhost:8000/v2/models/fraud_xgboost/infer \
 
 ---
 
+## Docker Compose Configuration
+
+```yaml
+version: '3.8'
+
+services:
+  data-gather:
+    build: ./pods/1-data-gather
+    volumes:
+      - ./data:/data
+
+  data-prep:
+    build: ./pods/2-data-prep
+    volumes:
+      - ./data:/data
+    deploy:
+      resources:
+        reservations:
+          devices:
+            - driver: nvidia
+              count: 2
+              capabilities: [gpu]
+
+  model-build:
+    build: ./pods/3-model-build
+    volumes:
+      - ./data:/data
+    deploy:
+      resources:
+        reservations:
+          devices:
+            - driver: nvidia
+              count: 2
+              capabilities: [gpu]
+
+  inference:
+    build: ./pods/4-inference
+    ports:
+      - "8000:8000"  # HTTP
+      - "8001:8001"  # gRPC
+      - "8002:8002"  # Metrics
+    volumes:
+      - ./data:/data
+    deploy:
+      resources:
+        reservations:
+          devices:
+            - driver: nvidia
+              count: 2
+              capabilities: [gpu]
+
+  notification:
+    build: ./pods/5-notification
+    ports:
+      - "5000:5000"
+```
+
+---
+
 ## Data Flow
 
 ### Storage Paths
@@ -263,7 +356,7 @@ s3://fraud-detection-bucket/
 
 1. **Pod 1** generates synthetic transactions → **FB** `/raw_data/` + **S3** archive
 2. **Pod 2** reads from **FB**, processes with RAPIDS → **FB** `/prep_output/`
-3. **Pod 3** reads features from **FB**, trains models → **FA** `/model_repository/` + **S3** versions
+3. **Pod 3** reads features from **FB**, trains models → **FA** `/model_repository/` + **FB S3** versions
 4. **Pod 4** loads models from **FA**, serves predictions via Triton
 5. **Pod 5** receives alerts from Pod 4 when fraud detected
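The hand-off in steps 4-5 can be sketched as a small alerting shim. Only the notification service's port (5000) and the `fraud_xgboost` model name come from this README; the threshold value, payload fields, and `/alert` route are illustrative assumptions:

```python
# Illustrative sketch of the Pod 4 -> Pod 5 hand-off: when a prediction
# served by Triton exceeds a fraud threshold, an alert payload would be
# posted to the notification service on port 5000. The threshold, the
# payload shape, and the /alert route are assumptions, not the
# project's actual API.

FRAUD_THRESHOLD = 0.9  # assumed cut-off for raising an alert
NOTIFY_URL = "http://notification:5000/alert"  # assumed route

def should_alert(fraud_probability: float,
                 threshold: float = FRAUD_THRESHOLD) -> bool:
    """Decide whether a scored transaction warrants a Pod 5 alert."""
    return fraud_probability >= threshold

def build_alert(transaction_id: str, fraud_probability: float) -> dict:
    """Assemble the (assumed) alert payload sent to the notifier."""
    return {
        "transaction_id": transaction_id,
        "fraud_probability": fraud_probability,
        "model": "fraud_xgboost",
    }

# In Pod 4 this would be wired to the Triton response, e.g.:
#   if should_alert(p):
#       requests.post(NOTIFY_URL, json=build_alert(tx_id, p))
```

Keeping the decision logic in pure functions like these makes the threshold easy to tune and test independently of the HTTP plumbing.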
 
@@ -365,30 +458,16 @@ docker-compose down
 
 ---
 
-## 📞 Contact(s)
-
-**Project Maintainers**: Emir Biser and Ed Hsu - your friendly AAI FSAs
-
-- 📧 Email: ebiser@purestorage.com and ehsu@purestorage.com
+## License
 
-**Repository**: [https://github.com/yourusername/nvidia-fraud-detection-pipeline](https://github.com/yourusername/nvidia-fraud-detection-pipeline)
+Apache License 2.0 - see [LICENSE](LICENSE) file
 
 ---
 
-## 🎯 Roadmap
+## Contact
 
-- [ ] Add streaming data ingestion support (Kafka integration)
-- [ ] Implement A/B testing for model versions
-- [ ] Add automated model retraining pipeline
-- [ ] Integrate with MLflow for experiment tracking
-- [ ] Support for additional GPU architectures (A100, H100)
-- [ ] Add comprehensive benchmark suite
-- [ ] Develop web-based monitoring dashboard
+**Repository**: [https://github.com/yourusername/nvidia-fraud-detection-pipeline](https://github.com/yourusername/nvidia-fraud-detection-pipeline)
 
 ---
 
-## License
-
-Apache License 2.0 - see [LICENSE](LICENSE) file
-
----
+**Built for High-Performance Fraud Detection with Docker & NVIDIA L40S GPUs**