Hi there! I'm Alya, a Data Platform Engineer with a passion for cloud architecture and scalable data solutions. With a Master's degree in Computational Science from Institut Teknologi Bandung and hands-on experience at multinational companies, I specialize in building robust, cloud-native data platforms that transform raw data into business intelligence.
My journey from Cloud Engineer to Data Platform Engineer has given me a unique perspective on designing enterprise-scale data architectures. I bridge the gap between infrastructure and analytics, building pipelines that process millions of records while maintaining reliability, security, and cost-efficiency.
Currently at LinkNet, I design and manage 20+ production Airflow DAGs, orchestrate multi-cloud data migrations (GCP → AWS), and implement real-time CDC pipelines using Kafka and Debezium. I'm driven by the challenge of architecting data platforms that scale seamlessly and empower data-driven decision making.
🎯 Current Focus: Cloud Data Architecture | Multi-Cloud Platforms (GCP, AWS) | Real-Time Data Streaming | DataOps & MLOps
💡 Aspiration: Solution Architect specializing in Cloud Data Platforms and Enterprise Data Architecture
Description: End-to-end automated ML pipeline for customer lifetime value prediction and segmentation using RFM analysis. Built with Medallion architecture on GCP.
Key Features:
- Automated ETL pipeline with Astronomer Airflow
- Data transformation with dbt (Bronze → Silver → Gold layers)
- ML model training and deployment (Random Forest, XGBoost, SVM)
- Data quality checks with Soda
- Interactive dashboard with Looker Studio
- FastAPI for model serving
Tech Stack: Python, BigQuery, Astronomer, dbt, Docker, Looker Studio, Vertex AI, FastAPI
Impact: Automated customer segmentation reducing manual analysis time by 80%, enabling data-driven marketing strategies.
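As a rough illustration of the RFM analysis mentioned above, the sketch below derives recency, frequency, and monetary values from a list of transactions and assigns a simple rule-based segment. It is not the project's actual code: the function names, the sample data, the thresholds, and the segment labels are all assumptions, and the real pipeline runs on BigQuery and dbt with trained models rather than fixed rules.

```python
from collections import defaultdict
from datetime import date


def rfm_table(transactions, as_of):
    """Compute RFM values per customer.

    transactions: iterable of (customer_id, order_date, amount) tuples.
    as_of: the reference date used to measure recency.
    """
    last_seen = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer_id, order_date, amount in transactions:
        # Recency is based on the most recent order date per customer.
        if customer_id not in last_seen or order_date > last_seen[customer_id]:
            last_seen[customer_id] = order_date
        frequency[customer_id] += 1
        monetary[customer_id] += amount
    return {
        cid: {
            "recency_days": (as_of - last_seen[cid]).days,
            "frequency": frequency[cid],
            "monetary": monetary[cid],
        }
        for cid in last_seen
    }


def segment(row, recency_cutoff=30, frequency_cutoff=2):
    """Assign a hypothetical segment label; the cutoffs are illustrative only."""
    recent = row["recency_days"] <= recency_cutoff
    frequent = row["frequency"] >= frequency_cutoff
    if recent and frequent:
        return "loyal"
    if recent:
        return "new"
    if frequent:
        return "at_risk"
    return "lapsed"


# Made-up sample data, for demonstration only.
SAMPLE = [
    ("c1", date(2024, 1, 5), 120.0),
    ("c1", date(2024, 1, 25), 80.0),
    ("c2", date(2023, 10, 1), 40.0),
]
table = rfm_table(SAMPLE, as_of=date(2024, 1, 31))
segments = {cid: segment(row) for cid, row in table.items()}
```

In a model-based setup, the three RFM columns would typically serve as input features rather than being thresholded by hand.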
Description: Scalable ETL pipeline for university attendance data processing with Docker containerization.
Key Features:
- Three-layer architecture (Staging → Warehouse → Mart)
- Python-based Extract, Transform, Load modules
- PostgreSQL database with normalized schema
- Docker deployment for easy scalability
- Automated weekly attendance reporting
Tech Stack: Python, PostgreSQL, Docker, pandas
Architecture: Bronze (Staging) → Silver (Data Warehouse) → Gold (Data Mart)
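The three layers above can be sketched in a few lines. This is a minimal illustration under stated assumptions: it uses Python's built-in `sqlite3` as a stand-in for PostgreSQL so that it runs without a database server, and the table names, column names, and sample rows are hypothetical rather than taken from the project.

```python
import sqlite3

# Made-up raw rows: (student_id, class_date, status), including messy values.
RAW_ROWS = [
    (" S001 ", "2024-03-04", "Present"),
    ("S001", "2024-03-05", "absent"),
    ("S002", "2024-03-04", "PRESENT"),
    ("S002", "2024-03-04", "PRESENT"),  # duplicate record
]


def run_pipeline(conn, rows):
    """Load rows through staging, warehouse, and mart layers; return the mart."""
    cur = conn.cursor()

    # Staging (Bronze): land the raw rows unchanged.
    cur.execute(
        "CREATE TABLE stg_attendance (student_id TEXT, class_date TEXT, status TEXT)"
    )
    cur.executemany("INSERT INTO stg_attendance VALUES (?, ?, ?)", rows)

    # Warehouse (Silver): trim identifiers, normalize status, drop duplicates.
    cur.execute(
        """
        CREATE TABLE dw_attendance AS
        SELECT DISTINCT TRIM(student_id) AS student_id,
               class_date,
               LOWER(status) AS status
        FROM stg_attendance
        """
    )

    # Mart (Gold): weekly attendance summary per student.
    cur.execute(
        """
        CREATE TABLE mart_weekly_attendance AS
        SELECT student_id,
               strftime('%Y-%W', class_date) AS year_week,
               SUM(status = 'present') AS days_present,
               COUNT(*) AS days_recorded
        FROM dw_attendance
        GROUP BY student_id, year_week
        """
    )
    conn.commit()
    return cur.execute(
        "SELECT student_id, year_week, days_present, days_recorded "
        "FROM mart_weekly_attendance ORDER BY student_id"
    ).fetchall()


conn = sqlite3.connect(":memory:")
summary = run_pipeline(conn, RAW_ROWS)
```

Keeping the raw rows untouched in staging means the cleaning rules in the warehouse step can be changed and re-run without re-extracting the source data.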
Description: Data ingestion pipeline from local PostgreSQL to BigQuery using Python transformation and Cloud SQL.
Key Features:
- CSV to PostgreSQL data loading
- Data transformation and enrichment with pandas
- Cloud SQL integration
- BigQuery data warehouse setup
- Banking fraud detection dataset processing
Tech Stack: Python, PostgreSQL, Google Cloud SQL, BigQuery, pandas
Data Flow: Local CSV → PostgreSQL → Transformation → Cloud SQL → BigQuery
Description: Interactive stock market analysis dashboard visualizing market trends and trading volumes.
Key Features:
- Real-time stock price visualization
- Trading volume analysis
- Market trend indicators
- Interactive filtering and drill-down capabilities
- Performance metrics and comparisons
Tech Stack: Tableau, Python for data preprocessing
Insights: Enables quick identification of market patterns and trading opportunities through visual analytics.
Description: Production-ready CI/CD pipeline automating AWS Glue job deployment using Azure DevOps with multi-environment support (DEV/PRD).
Key Features:
- Automated Change Detection: Intelligent git diff analysis to identify modified Glue jobs
- Multi-Stage Deployment: Detect Changes → Sync to S3 → Deploy Glue Jobs
- Queue Management: Persistent deployment queue handling failures and rollbacks
- Configuration-Driven: YAML-based job configuration with environment variable substitution
- Smart Updates: Compares existing Glue job configurations to avoid unnecessary updates
- S3 Tables Integration: Automatically creates tables in S3 Tables catalog using Apache Iceberg
Pipeline Architecture:
Feature Branch → DEV Branch (Auto-Deploy to DEV) → PRD Branch (Auto-Deploy to PRD)
Tech Stack: AWS Glue, AWS S3, Azure DevOps Pipelines, Python, boto3, PyYAML, Apache Iceberg
Impact: Reduced manual deployment time from hours to minutes while ensuring consistent configurations across environments and zero-downtime deployments.
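The change detection, configuration substitution, and smart-update ideas listed above could look roughly like the sketch below. It is an illustration, not the pipeline's actual code: the `glue_jobs/<job_name>/...` repository layout, the function names, and the sample configuration are assumptions. The sketch works on plain dictionaries and path lists, so it omits the git, YAML, and AWS calls that the real pipeline makes.

```python
from pathlib import PurePosixPath
from string import Template


def changed_jobs(diff_paths, jobs_root="glue_jobs"):
    """Map changed file paths (e.g. the output of `git diff --name-only`)
    to job names, assuming a hypothetical glue_jobs/<job_name>/... layout."""
    jobs = set()
    for raw in diff_paths:
        parts = PurePosixPath(raw).parts
        if len(parts) >= 3 and parts[0] == jobs_root:
            jobs.add(parts[1])
    return sorted(jobs)


def render_config(config, env):
    """Substitute ${VAR} placeholders in string values, recursing into dicts.
    A missing variable raises KeyError, so a bad deployment fails early."""
    rendered = {}
    for key, value in config.items():
        if isinstance(value, dict):
            rendered[key] = render_config(value, env)
        elif isinstance(value, str):
            rendered[key] = Template(value).substitute(env)
        else:
            rendered[key] = value
    return rendered


def needs_update(desired, existing):
    """Return True when any desired setting differs from the deployed one.
    Keys that exist only in `existing` are ignored."""
    return any(existing.get(key) != value for key, value in desired.items())


# Made-up inputs, for demonstration only.
paths = [
    "glue_jobs/orders_etl/job.py",
    "glue_jobs/orders_etl/config.yaml",
    "glue_jobs/users_etl/config.yaml",
    "README.md",
]
config = {
    "Name": "orders_etl_${ENV}",
    "Command": {"ScriptLocation": "s3://${BUCKET}/orders_etl/job.py"},
    "NumberOfWorkers": 2,
}
jobs = changed_jobs(paths)
rendered = render_config(config, {"ENV": "dev", "BUCKET": "example-bucket"})
```

Comparing only the keys present in the desired configuration keeps server-managed fields, such as creation timestamps, from triggering unnecessary updates.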
Description: Comprehensive SQL analysis of an online movie rental platform, exploring customer behavior and movie popularity.
Key Insights:
- Customer demographics and preferences analysis
- Revenue patterns by period and category
- Actor popularity impact on movie rentals
- Subscription behavior analysis
Tech Stack: SQLite, Jupyter Notebook, SQL
Read the detailed article: SQL Project: Analyzing Online Movie Rental
Leading Indonesian telecommunications provider - First Media & Link Net Fiber brands
🔹 Cloud Data Architecture & Multi-Cloud Migration
- Orchestrated migration of 100+ tables from GCP to AWS, architecting robust data pipelines using AWS Glue, Lake Formation, and MWAA (Managed Workflows for Apache Airflow)
- Optimized metadata governance and query performance, reducing data access latency by 40%
🔹 Data Pipeline Orchestration & Automation
- Designed and managed 20+ production Airflow DAGs with SLA monitoring, ensuring 99.9% pipeline reliability for analytics and reporting
- Automated complex ETL workflows on GCP using Python, Java, and shell scripts with Dataproc and Dataflow
🔹 Real-Time Data Streaming Architecture
- Implemented CDC pipeline using Apache Kafka and Debezium to stream real-time data changes from SQL Server to downstream APIs
- Enabled event-driven architecture, reducing data latency from hours to seconds
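To illustrate what consuming such a CDC stream involves, the sketch below applies Debezium-style change events to an in-memory copy of a table. It relies on the standard Debezium event envelope (`op`, `before`, `after`), but the function name, the `id` key field, and the sample events are assumptions, and the Kafka consumer itself is left out so the example runs standalone.

```python
import json


def apply_change(state, raw_event, key_field="id"):
    """Apply one Debezium-style change event to `state`, a dict keyed by
    primary key. `key_field` is a hypothetical primary-key column name."""
    if raw_event is None:
        # Tombstone message (null value) that may follow a delete.
        return state
    event = json.loads(raw_event)
    # With schemas enabled, the envelope is nested under "payload".
    payload = event.get("payload", event)
    op = payload["op"]
    if op in ("c", "r", "u"):  # create, snapshot read, update
        row = payload["after"]
        state[row[key_field]] = row
    elif op == "d":  # delete
        state.pop(payload["before"][key_field], None)
    else:
        raise ValueError(f"unsupported op: {op!r}")
    return state


# Made-up events, for demonstration only.
events = [
    json.dumps({"op": "c", "before": None, "after": {"id": 1, "plan": "basic"}}),
    json.dumps({"op": "c", "before": None, "after": {"id": 2, "plan": "basic"}}),
    json.dumps(
        {
            "op": "u",
            "before": {"id": 1, "plan": "basic"},
            "after": {"id": 1, "plan": "premium"},
        }
    ),
    json.dumps({"op": "d", "before": {"id": 2, "plan": "basic"}, "after": None}),
    None,
]
state = {}
for raw in events:
    apply_change(state, raw)
```

A downstream API would apply each event as it arrives instead of waiting for a batch export.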
🔹 Multi-Source Data Integration
- Built end-to-end ETL pipelines ingesting data from on-premise sources (MySQL, Oracle, PostgreSQL, SAP DB, SQL Server)
- Maintained multi-layer data warehouse architecture in BigQuery for efficient analytics
🔹 Cross-Functional Collaboration
- Partnered with data analysts and business teams for ad-hoc analysis, delivering actionable insights supporting data-driven strategies
Tech Stack: AWS (Glue, Lake Formation, MWAA), GCP (BigQuery, Dataproc, Dataflow), Apache Airflow, Apache Kafka, Debezium, Python, Java, SQL
Digital transformation company - Cloud solutions & migrations (Singapore HQ)
🔹 Cloud Infrastructure & Platform Management
- Maintained and optimized 20+ internal AWS EC2 servers for Content Website Platform, utilizing Load Balancers and Auto Scaling Groups
- Ensured seamless operations, proactive troubleshooting, and continuous infrastructure enhancements
🔹 Security & Compliance
- Remediated 500+ vulnerability findings using Nexpose scanner, fortifying cloud security
- Achieved 70% reduction in security incidents through systematic vulnerability management
🔹 DevOps & Multi-Platform Services
- Empowered developer teams by streamlining installation, configuration, and troubleshooting for 5 CMS platforms (WordPress, SWIIIT, Sitecore, Sitefinity, SharePoint) across Linux and Windows environments
🔹 Documentation & Knowledge Management
- Documented 100+ complex technical issues as RFC, Incident Reports, Build Docs, and Wiki documentation
- Mentored 2 new team members while collaborating on deployment tasks
🔹 Cost Optimization
- Reviewed and optimized AWS resource configurations, significantly reducing idle resource costs for the Content Website Platform (CWP) project
Tech Stack: AWS (EC2, ELB, Auto Scaling, S3, RDS), Linux, Windows Server, CMS platforms, Nexpose, Infrastructure as Code
Intensive 4-month comprehensive data engineering program
Selected as 1 of 20 students awarded a fully funded scholarship for Data Fellowship Batch 12
Best Capstone Project - Led a team of 5 in building a customer segmentation model:
- Project: "Building Customer Segmentation for Effective Personalized Marketing"
- Built end-to-end ML pipeline with automated customer lifetime value prediction
- Achieved 85% model accuracy using Random Forest and XGBoost algorithms
Key Technical Learning:
- Developed production-grade data pipelines ingesting data from GCP data lakes
- Performed transformations using Apache Airflow, dbt, Apache Kafka, Apache NiFi, BigQuery
- Created data visualizations and dashboards with Looker and Tableau
- Used dbt for data transformation and modeling, with Google BigQuery as the data warehouse
- Implemented Medallion architecture (Bronze → Silver → Gold layers)
- Applied DataOps best practices including data quality testing with Soda
Skills Gained: Data Pipeline Design, Cloud Data Architecture, Real-Time Streaming, Data Modeling, Data Visualization, MLOps
View all course completion certificates: certificate.md
Master of Science in Computational Science
Institut Teknologi Bandung (ITB) | 2020 - 2023
Bachelor of Science in Physics
Institut Teknologi Bandung (ITB) | 2014 - 2019
I'm always excited to collaborate on data projects, discuss new opportunities, or exchange ideas about data engineering and analytics!
Download my resume: CV
Last updated: November 2025