Final Project for Pergudangan Data (Data Warehousing, SD25-31007), Data Science Study Program, Faculty of Science, Institut Teknologi Sumatera, Academic Year 2024/2025
| NIM | Name | Role | Contribution | Email |
|---|---|---|---|---|
| 123450093 | Syahrialdi Rachim Akbar (Aldi) | Project Lead & Database Designer | ERD, Schema Design, DDL Scripts | [email protected] |
| 123450026 | Zahra Putri Salsabilla | ETL Developer & Data Engineer | ETL Procedures, Data Quality | [email protected] |
| 123450039 | Feby Angelina | BI Developer & Documentation | Documentation, Mapping, Sample Data | [email protected] |
Data Mart Biro Akademik Umum (BAU) ITERA is a Business Intelligence solution for the General Academic Bureau (Biro Akademik Umum) of ITERA. The project integrates and analyzes data from several source systems to support data-driven operational and strategic decisions.
- Integrate data from 6 source systems (SIMASTER, Inventaris, SIMPEG, Layanan, Monitoring, Unit Organisasi)
- Provide a dimensional model (Star Schema) for efficient data analysis
- Build interactive dashboards for KPI monitoring
- Implement a robust and scalable ETL process
- Support the core business processes of BAU ITERA
Dimensions (7 tables):
- `dim.waktu` - Time dimension (2020-2030)
- `dim.pegawai` - Employee dimension (SCD Type 2)
- `dim.unit_organisasi` - Organizational hierarchy
- `dim.jenis_surat` - Document types & SLA
- `dim.jenis_layanan` - Service types & SLA
- `dim.jenis_aset` - Asset types & specifications
- `dim.lokasi` - Location details
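To illustrate what a time dimension covering 2020-2030 might contain, here is a minimal Python sketch. The column names (`date_key`, `full_date`, `is_weekend`, and so on) are hypothetical and are not taken from the actual `dim.waktu` definition; treating the range as inclusive of both 2020-01-01 and 2030-12-31 is also an assumption.

```python
from datetime import date, timedelta


def build_time_dimension(start: date, end: date) -> list[dict]:
    """Return one row per calendar day between start and end (inclusive)."""
    rows = []
    current = start
    while current <= end:
        rows.append({
            # Hypothetical integer key in YYYYMMDD form
            "date_key": int(current.strftime("%Y%m%d")),
            "full_date": current,
            "year": current.year,
            "quarter": (current.month - 1) // 3 + 1,
            "month": current.month,
            "day": current.day,
            # weekday(): Monday=0 ... Saturday=5, Sunday=6
            "is_weekend": current.weekday() >= 5,
        })
        current += timedelta(days=1)
    return rows


# Assumption: the range is inclusive on both ends.
rows = build_time_dimension(date(2020, 1, 1), date(2030, 12, 31))
print(len(rows), rows[0]["date_key"], rows[-1]["date_key"])
```

Pre-generating the rows keeps date logic (quarter, weekend flag) out of every analytical query; the fact tables only need to store the key.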
Facts (3 tables):
- `fact.surat` - Correspondence transactions (Grain: per letter)
- `fact.layanan` - Service requests & performance (Grain: per ticket)
- `fact.aset` - Asset inventory snapshots (Grain: per asset per month)
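The employee dimension is described as SCD Type 2, meaning a changed attribute closes the current row and adds a new version instead of overwriting it. The sketch below shows that idea on an in-memory list. It is not the project's stored procedure: the column names (`nip`, `nama`, `unit`, `valid_from`, `valid_to`, `is_current`) and the sample values are hypothetical.

```python
from datetime import date

HIGH_DATE = date(9999, 12, 31)  # conventional marker for "still valid"
TRACKED = ("nama", "unit")      # hypothetical attributes whose history is kept


def apply_scd2(dimension: list[dict], incoming: dict, today: date) -> None:
    """Apply one source record to the dimension (in place) using SCD Type 2 rules."""
    current = next(
        (r for r in dimension if r["nip"] == incoming["nip"] and r["is_current"]),
        None,
    )
    if current and all(current[c] == incoming[c] for c in TRACKED):
        return  # nothing changed, keep the current version
    if current:
        # Close the old version instead of overwriting it
        current["valid_to"] = today
        current["is_current"] = False
    dimension.append({
        "pegawai_key": len(dimension) + 1,  # surrogate key
        "nip": incoming["nip"],             # business key from the source
        **{c: incoming[c] for c in TRACKED},
        "valid_from": today,
        "valid_to": HIGH_DATE,
        "is_current": True,
    })


dim_pegawai: list[dict] = []
apply_scd2(dim_pegawai, {"nip": "P001", "nama": "Budi", "unit": "BAU"}, date(2024, 1, 1))
apply_scd2(dim_pegawai, {"nip": "P001", "nama": "Budi", "unit": "Keuangan"}, date(2025, 3, 1))
for row in dim_pegawai:
    print(row["pegawai_key"], row["unit"], row["valid_from"], row["valid_to"], row["is_current"])
```

Fact rows keep pointing at the surrogate key that was current when they were loaded, which is what preserves history in reports.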
| Komponen | Teknologi |
|---|---|
| Database | PostgreSQL 14 / Azure SQL Database |
| ETL | Python (Pandas) & SQL Stored Procedures |
| Management Tools | pgAdmin & Azure Data Studio |
| BI Tools | Tableau Desktop (macOS compatible) |
| Cloud | Azure VM (Docker Container) |
| Version Control | Git & GitHub |
| Modeling Approach | Kimball Dimensional Modeling (Star Schema) |
graph TD
subgraph Sources
S1[SIMASTER]
S2[Inventaris]
S3[SIMPEG]
S4[Layanan]
end
subgraph PostgreSQL_Docker
STG[(Staging Area)]
ETL[Stored Procedures]
DW[(Data Warehouse)]
end
S1 -->|CSV Import| STG
S2 -->|CSV Import| STG
S3 -->|CSV Import| STG
S4 -->|CSV Import| STG
STG -->|Master ETL| ETL
ETL -->|Transform & Load| DW
TUBES_Pergudangan-Data_Kelompok-19/
├── README.md                              # This file
├── .gitignore
│
├── etl/                                   # ETL Components
│   ├── sample_data/                       # Sample CSV Data (400+ rows)
│   │   ├── stg_inventaris.csv
│   │   ├── stg_layanan.csv
│   │   ├── stg_simpeg.csv
│   │   ├── stg_simaster_surat.csv
│   │   └── stg_unit_kerja.csv
│   └── scripts/                           # Python Generators
│       └── generate_dummy_data.py
│
├── docs/                                  # Complete documentation
│   ├── 01-requirements/                   # Mission 1 Documents
│   ├── 02-design/                         # Mission 1 & 2 Design Documents
│   ├── 03-implementation/                 # Mission 2 Technical Docs
│   │   ├── Data Quality Report.pdf        # Mission 2 test results
│   │   ├── Performance Test Results.pdf
│   │   └── Technical Documentation.pdf
│   └── 04-deployment/                     # Mission 3 Deployment Docs
│       ├── 01_Production_Database_Credentials.md
│       ├── 02_Deployment_Documentation.md
│       ├── 03_Operations_Manual.md
│       └── Mission_3_Presentation.pptx
│
├── sql/                                   # SQL Scripts (PostgreSQL)
│   ├── 01_Create_Database.sql             # Schema setup
│   ├── 02_Create_Dimensions.sql           # Dim tables + Seeding
│   ├── 03_Create_Facts.sql                # Fact tables
│   ├── 04_Create_Indexes.sql              # Optimization
│   ├── 05_Create_Partitions.sql           # Partitioning
│   ├── 06_Create_Staging.sql              # Validation views
│   ├── 07_ETL_Procedures.sql              # Main ETL Logic
│   ├── 08_Data_Quality_Checks.sql         # DQ Logic
│   ├── 09_Test_Queries.sql                # Performance tests
│   ├── 10_Security.sql                    # RBAC
│   ├── 11_Backup.sql                      # Backup ops
│   └── 12_Run_ETL_Pipeline.sql            # ONE-CLICK DEMO
│
├── dashboards/                            # BI Dashboards
│   └── dashboard_kelompok_DW19.twb        # Tableau Workbook
│
└── tests/                                 # Testing Scripts
Follow this "Zero-Friction" guide to build and run the Data Mart automatically in your local environment.
- Make sure PostgreSQL 14+ or Docker is installed.
  - For Docker: `docker run --name datamart_bau -e POSTGRES_PASSWORD=password -p 5432:5432 -d postgres:14`
- Open a PostgreSQL client (pgAdmin or psql).
- Create a new database named `datamart_bau_itera`.
- Open the `sql/` folder in this repository.
- Run the following SQL scripts in order:
  - `01_Create_Database.sql` (creates schemas & tables)
  - `02_Create_Dimensions.sql` (creates dimensions & seeds reference data)
  - `03_Create_Facts.sql` (creates fact tables)
  - `04_Create_Indexes.sql` (index optimization)
  - `05_Create_Partitions.sql` (fact table partitioning)
  - `06_Create_Staging.sql` (monitoring & validation views)
  - `07_ETL_Procedures.sql` (main ETL engine)
  - `08_Data_Quality_Checks.sql` (data quality validation procedures)
  - `10_Security.sql` (users & roles)
  - `11_Backup.sql` (backup procedures)
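For anyone who prefers the command line to clicking through a client, the script order above can be turned into a list of `psql` invocations. This is a sketch only: it assumes `psql` is on the PATH, that connection details come from the usual defaults or environment, and that the scripts live in `sql/`. The helper name `psql_commands` is made up for this example.

```python
import shlex

# Setup scripts in the order listed above.
# 09_Test_Queries.sql and 12_Run_ETL_Pipeline.sql are run later, so they are left out.
SETUP_SCRIPTS = [
    "01_Create_Database.sql",
    "02_Create_Dimensions.sql",
    "03_Create_Facts.sql",
    "04_Create_Indexes.sql",
    "05_Create_Partitions.sql",
    "06_Create_Staging.sql",
    "07_ETL_Procedures.sql",
    "08_Data_Quality_Checks.sql",
    "10_Security.sql",
    "11_Backup.sql",
]


def psql_commands(database: str, scripts: list[str], sql_dir: str = "sql") -> list[str]:
    """Build one `psql -f` command per script, preserving the given order."""
    commands = []
    for name in scripts:
        parts = [
            "psql",
            "-v", "ON_ERROR_STOP=1",  # make psql exit on the first SQL error
            "-d", shlex.quote(database),
            "-f", shlex.quote(f"{sql_dir}/{name}"),
        ]
        commands.append(" ".join(parts))
    return commands


for cmd in psql_commands("datamart_bau_itera", SETUP_SCRIPTS):
    print(cmd)
```

Printing the commands rather than executing them keeps the sketch safe to run anywhere; piping the output into a shell is left to the reader.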
The system needs raw data to work.
- Use psql or pgAdmin to import the CSV files.
- Select a CSV file from the `etl/sample_data/` folder.
- Leave the target table name at its DEFAULT (matching the CSV file name).
  - Example: file `stg_unit_kerja.csv` -> table `stg_unit_kerja`
- Repeat for all 5 CSV files.
Note: Our ETL script has a "Smart Ingestion" feature that automatically detects these imported tables.
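The file-name-to-table-name rule above is simple enough to script. The sketch below derives the table name from each sample file and prints a psql `\copy` line for it. It assumes the target tables already exist, that the CSV files have a header row, and that the commands are run from the repository root; none of this is confirmed by the project scripts, and the helper names are hypothetical.

```python
from pathlib import PurePosixPath

SAMPLE_FILES = [
    "etl/sample_data/stg_inventaris.csv",
    "etl/sample_data/stg_layanan.csv",
    "etl/sample_data/stg_simpeg.csv",
    "etl/sample_data/stg_simaster_surat.csv",
    "etl/sample_data/stg_unit_kerja.csv",
]


def table_name_for(csv_path: str) -> str:
    """Table name = file name without directory and extension."""
    return PurePosixPath(csv_path).stem


def copy_command(csv_path: str) -> str:
    """Build a psql \\copy meta-command that loads the file client-side."""
    table = table_name_for(csv_path)
    # Assumption: the first CSV row holds the column names
    return f"\\copy {table} FROM '{csv_path}' WITH (FORMAT csv, HEADER true)"


for path in SAMPLE_FILES:
    print(copy_command(path))
```

`\copy` reads the file on the client machine, so it also works when the database runs inside a Docker container.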
After the data is imported, run the following automation script:
- Open the file `sql/12_Run_ETL_Pipeline.sql`.
- Execute it with psql or pgAdmin.
This script automatically:
- ✅ Resets the staging status.
- ✅ Runs the Master ETL Stored Procedure (moves data from Staging -> DW).
- ✅ Displays the number of rows successfully loaded.
- ✅ Runs the Data Quality Checks and displays the report.
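As an illustration of the kind of checks a data quality step can run, the sketch below computes a completeness percentage and lists orphaned foreign keys on in-memory rows. It does not reproduce the logic in `08_Data_Quality_Checks.sql` or how the project's quality score is calculated; the column names and sample rows are invented for the example.

```python
def completeness(rows: list[dict], required: list[str]) -> float:
    """Percentage of rows in which every required column is non-empty."""
    if not rows:
        return 100.0
    ok = sum(
        all(row.get(col) not in (None, "") for col in required)
        for row in rows
    )
    return round(100.0 * ok / len(rows), 1)


def orphan_keys(fact_rows: list[dict], dim_rows: list[dict], fk: str, pk: str) -> list:
    """Foreign-key values in the fact rows with no matching dimension row."""
    known = {row[pk] for row in dim_rows}
    return sorted({row[fk] for row in fact_rows if row[fk] not in known})


# Hypothetical sample data
dim_unit = [{"unit_key": 1}, {"unit_key": 2}]
fact_surat = [
    {"surat_id": "S-001", "unit_key": 1, "tanggal": "2025-01-10"},
    {"surat_id": "S-002", "unit_key": 3, "tanggal": "2025-01-11"},  # unknown unit
    {"surat_id": "S-003", "unit_key": 2, "tanggal": ""},            # missing date
    {"surat_id": "S-004", "unit_key": 2, "tanggal": "2025-01-12"},
]

print(completeness(fact_surat, ["surat_id", "tanggal"]))          # 75.0
print(orphan_keys(fact_surat, dim_unit, "unit_key", "unit_key"))  # [3]
```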
To see evidence of query performance:
- Open the file `sql/09_Test_Queries.sql`.
- Execute it.
- Check the reported query execution times.
- Business Requirements - Goals, scope, business processes
- Data Sources - Source system inventory, volume, refresh rate
- KPI Definitions - KPI definitions and targets
- ERD Diagram - Entity Relationship Diagram
- Dimensional Model - Star Schema visualization
- Data Dictionary - Column definitions, data types, constraints
- Bus Matrix - Dimensi vs Fact mapping
- Source-to-Target Mapping - Field-level mapping
- ETL Strategy - Load strategy, SCD policy, logging
- Technical Documentation Mission 2 (NEW)
- ETL Process Flow (NEW)
- ETL Architecture Diagram (NEW)
- ETL Mapping Spreadsheet (NEW)
- ETL Documentation - Detailed ETL procedures
- Sample Data (400+ rows) (NEW)
- Test Results
- Production Database Credentials (NEW)
- Deployment Documentation (NEW)
- Operations Manual (NEW)
- Tableau Dashboard (NEW)
- Mission 3 Presentation (NEW)
- Automated data validation
- Referential integrity checks
- Business rule validation
- Completeness & consistency checks
- Comprehensive error logging via the `etl_log` schema
- Data quality metrics tracking
- Overall quality score: 94.2%
- Optimized indexing strategy (B-tree, composite indexes)
- 42 performance indexes deployed
- Partitioning for large tables
- Materialized views for reporting
- Query optimization (<1ms response time)
- Incremental ETL loads
- SCD Type 2 for slowly changing dimensions
- ETL execution logging
- Data quality metrics dashboard
- Performance dashboards
- Error tracking & alerting
- Audit trails
- Row count validation
- Role-Based Access Control (RBAC)
- Row-Level Security (RLS)
- Data masking for sensitive fields (PII)
- Encrypted connections (SSL/TLS)
- Audit logging for all modifications
- Backup automation
| Metric | Value |
|---|---|
| Source Systems | 6 databases (SIMASTER, Inventaris, SIMPEG, Layanan, Monitoring, Unit Org) |
| Schemas | 8 (stg, dim, fact, etl, etl_log, dw, analytics, reports) |
| Dimension Tables | 7 tables |
| Fact Tables | 3 tables |
| Performance Indexes | 42 indexes |
| ETL Procedures | 6 procedures |
| Analytical Views | 5 views |
| Sample Data Records | 400+ rows |
| ETL Mappings | 83+ field-level mappings |
| SQL Scripts | 20+ files |
| Documentation | 70+ KB markdown |
| Test Coverage | Unit + Integration + Data Quality tests |
| Time Dimension Range | 2020-2030 (10 years) |
| Data Quality Score | 94.2% |
| Query Response Time | <1ms |
- Create feature branch: `git checkout -b feature/nama-fitur`
- Commit changes: `git commit -m "Add: deskripsi fitur"`
- Push to branch: `git push origin feature/nama-fitur`
- Create Pull Request
- Code review & merge
Add: Add a new feature
Fix: Fix a bug
Update: Update an existing feature
Docs: Update documentation
Test: Add or fix tests
Refactor: Refactor code without changing functionality
Style: Formatting changes (whitespace, indentation)
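The prefix convention above could be checked automatically, for example from a Git `commit-msg` hook. The following is a minimal sketch of such a check, not something shipped in this repository. Requiring exactly `Prefix: ` followed by non-empty text is an assumption about how strictly the convention is meant.

```python
import re

ALLOWED_PREFIXES = ("Add", "Fix", "Update", "Docs", "Test", "Refactor", "Style")

# Prefix, colon, one space, then at least one non-space character
_PATTERN = re.compile(r"^(%s): \S.*$" % "|".join(ALLOWED_PREFIXES))


def is_valid_commit_message(message: str) -> bool:
    """Return True if the first line follows the `Prefix: description` convention."""
    lines = message.splitlines()
    first_line = lines[0] if lines else ""
    return bool(_PATTERN.match(first_line))


print(is_valid_commit_message("Add: deskripsi fitur"))  # True
print(is_valid_commit_message("added some stuff"))      # False
```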
| Mission | Period | Status | Deliverables |
|---|---|---|---|
| Mission 1 | Week 1-4 | ✅ Complete | Business Requirements, Data Sources, ERD, Dimensional Model, Data Dictionary, Bus Matrix, ETL Strategy, Database Bootstrap |
| Mission 2 | Week 5-8 | ✅ Complete | DDL Scripts, ETL Procedures, Indexes, Sample Data (400 rows), Technical Documentation, ETL Mapping, Testing |
| Mission 3 | Week 9-12 | ✅ Complete | Tableau Dashboard, Production Deployment, Operations Manual, Documentation, Final Presentation |
- ✅ Business Requirements Document
- ✅ Data Sources Inventory
- ✅ ERD (Star Schema)
- ✅ Dimensional Model
- ✅ Bus Matrix
- ✅ Data Dictionary
- ✅ Source-to-Target Mapping
- ✅ ETL Strategy
- ✅ Database Bootstrap (PostgreSQL)
- ✅ Create Database Script (idempotent)
- ✅ Create Dimensions Tables
- ✅ Create Facts Tables
- ✅ Create Staging Tables
- ✅ Create Indexes & Constraints
- ✅ ETL Stored Procedures
- ✅ Sample Data (400+ rows)
- ✅ Technical Documentation
- ✅ ETL Mapping Spreadsheet
- ✅ Unit & Integration Tests
- ✅ Tableau BI Dashboard
- ✅ Production Deployment to Azure VM
- ✅ Production Database Credentials & Security
- ✅ Deployment Documentation
- ✅ Operations Manual
- ✅ Final Presentation (19 slides)
Database & Infrastructure
- ✅ PostgreSQL 14 deployed in Docker on Azure VM
- ✅ 8 schemas created with 30+ tables
- ✅ 42 performance indexes deployed
- ✅ 6 ETL stored procedures operational
- ✅ 5 analytical views created
- ✅ Audit trail and logging infrastructure
Security & Access Control
- ✅ Role-Based Access Control (RBAC) implemented
- ✅ 3 user roles with distinct permissions
- ✅ Password-based authentication configured
- ✅ Encrypted audit trail enabled
Business Intelligence
- ✅ Tableau Dashboard File (dashboard_kelompok_DW19.twb)
- ✅ Developed on macOS (Tableau Desktop 2025.2 compatible)
- ✅ Ready for Tableau Server/Public publishing
Data Quality & Operations
- ✅ Overall quality score: 94.2%
- ✅ Automated validation procedures
- ✅ Daily startup checklist documented
- ✅ Monitoring & alerts framework
- ✅ Backup & recovery procedures established
- ✅ Troubleshooting guide completed
Infrastructure Details:
- Host: Azure Virtual Machine (104.43.93.28:5432)
- Engine: PostgreSQL 14.19
- Deployment: Docker Container
- Storage: Docker named volume with daily backups
- Database: datamart_bau_itera
Schemas Deployed:
- `stg` - Staging area
- `dim` - Dimension tables
- `fact` - Fact tables
- `etl` - ETL processes
- `etl_log` - Logging & audit
- `dw` - Data warehouse
- `analytics` - Analytical views
- `reports` - Reporting views
| Metric | Value | Status |
|---|---|---|
| Query Response Time | <1ms | ✅ Excellent |
| Index Coverage | 42 indexes | ✅ Complete |
| Data Quality Score | 94.2% | ✅ Good |
| Database Size | 50MB (schema) | ✅ Optimal |
| Connection Pool | Stable | ✅ Healthy |
| Uptime Target | 99.5% | ✅ Achievable |
Documentation Files (Markdown):
- `01_Production_Database_Credentials.md` - Database access, user accounts, security
- `02_Deployment_Documentation.md` - Complete deployment guide (~8,000 words)
- `03_Operations_Manual.md` - Day-to-day procedures (~7,000 words)
BI & Presentation Files:
- `dashboard_kelompok_DW19.twb` - Tableau workbook (313 KB)
- `Mission_3_Presentation.pptx` - Professional presentation (19 slides, 5.2 MB)
Total Package Size: ~5.4 MB
Connection Command:
`psql -h 104.43.93.28 -U datamart_user -d datamart_bau_itera`
Default User Accounts:
| User | Password | Role |
|---|---|---|
| datamart_user | Kelompok19@2025! | Application User |
| user_bi | BiPassItera2025! | BI User |
| user_etl | EtlPassItera2025! | ETL Admin |
| postgres | Kelompok19@2025! | Postgres Admin |
Before Using:
- ⚠️ All passwords in documentation are examples
- ⚠️ Change passwords in production environment
- ⚠️ Restrict database access via firewall
- ⚠️ Enable SSL/TLS for remote connections
- ⚠️ Configure automated backups on deployment
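In line with the advice above about changing passwords and enabling SSL/TLS, one option is to keep the password out of scripts and build the connection string from the environment. The sketch below assembles a libpq-style connection URI with `sslmode=require`. The environment variable name `DATAMART_PASSWORD`, the helper name, and the host `db.example.org` are all hypothetical.

```python
import os
from typing import Mapping, Optional
from urllib.parse import quote


def connection_uri(
    host: str,
    database: str,
    user: str,
    port: int = 5432,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Build a PostgreSQL connection URI, reading the password from the environment."""
    env = os.environ if env is None else env
    # Hypothetical variable name; raises KeyError if it is not set
    password = env["DATAMART_PASSWORD"]
    # Percent-encode user and password so characters like '@' or '!' stay valid in a URI
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}?sslmode=require"
    )


# Example with an explicit mapping instead of the real environment
print(connection_uri(
    "db.example.org", "datamart_bau_itera", "datamart_user",
    env={"DATAMART_PASSWORD": "p@ss w0rd!"},
))
```

Passing `env` explicitly keeps the function easy to test; in normal use it falls back to `os.environ`.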
Known Limitations:
- ℹ️ Fact tables empty (awaiting source data)
- ℹ️ Dashboard in development mode
- ℹ️ Historical data not yet loaded
- ℹ️ ETL scheduling not automated
- ℹ️ Mobile interfaces not yet implemented
Future Enhancements:
- Automated ETL job scheduling
- Real-time data streaming capability
- Advanced analytics and ML models
- Mobile dashboard versions
- API exposure for third-party integration
For Database Administrators:
→ Read: 02_Deployment_Documentation.md
- Complete deployment process
- Architecture overview
- SQL script execution details
- Performance testing results
- Troubleshooting guide
For Operations Team:
→ Read: 03_Operations_Manual.md
- Daily startup procedures
- ETL pipeline execution
- Monitoring & alerting
- Backup & recovery
- User management
- Common issues & solutions
For Business Users:
→ View: Mission_3_Presentation.pptx
- Executive summary
- Architecture overview
- Results & achievements
- Next steps & roadmap
For Security Review:
→ Read: 01_Production_Database_Credentials.md
- User accounts & roles
- Access control matrix
- Security considerations
- Compliance & audit trail
All components verified and operational:
- ✅ Database connectivity (localhost & remote)
- ✅ Schema creation (8 schemas, 30+ tables)
- ✅ Index creation (42 performance indexes)
- ✅ ETL procedures (6 procedures created)
- ✅ Analytical views (5 views operational)
- ✅ User access (3 roles configured)
- ✅ Security controls (RBAC implemented)
- ✅ Audit logging (Trail enabled)
✅ Database deployed to production
✅ All schemas and tables created
✅ ETL processes implemented
✅ Analytical views available
✅ Security and access control configured
✅ Documentation completed
✅ Backup procedures established
✅ Dashboard framework ready
✅ Team coordination successful
✅ Professional quality deliverables
[Lecturer Name]
Email: [[email protected]]
Syahrialdi Rachim Akbar (Aldi) - Project Lead & Database Designer
Email: [email protected]
Zahra Putri Salsabilla - ETL Developer & Data Engineer
Email: [email protected]
Feby Angelina - BI Developer & Documentation
Email: [email protected]
- Course Lecturer: [Lecturer Name] - Pergudangan Data (Data Warehousing) course (SD25-31007)
- Lab Assistant: [Assistant Name]
- Institut Teknologi Sumatera - Data Science Study Program
- Biro Akademik Umum ITERA - Domain knowledge & business requirements
- Kimball Group - Dimensional modeling methodology
This project was developed for academic purposes as part of the Pergudangan Data (Data Warehousing) course (SD25-31007), Data Science Study Program, Faculty of Science, Institut Teknologi Sumatera.
© 2025 Tim Kelompok 19 - Data Mart BAU ITERA. All rights reserved.
Last Updated: December 1, 2025
Version: 3.0 (Mission 3 Complete - All Deliverables Ready)
Status: ✅ READY FOR SUBMISSION
"Turning raw data into actionable insights through collaboration, modeling, and analytics."
- Tim Kelompok 19, Data Mart BAU ITERA
- Full Documentation
- ETL Architecture Diagram
- Sample Data
- Test Results
- Mission 3 Deployment Docs
- Tableau Dashboard
- Report Issues
Star this repo if you find it useful!
