Alya Mutiara Firdausyi - Data Portfolio 🚀

👋 About Me

Hi there! I'm Alya, a Data Platform Engineer with a passion for Cloud Architecture and scalable data solutions. With a Master's degree in Computational Science from Institut Teknologi Bandung and hands-on experience at multinational companies, I specialize in building robust, cloud-native data platforms that transform raw data into business intelligence.

My journey from Cloud Engineer to Data Platform Engineer has given me a unique perspective on designing enterprise-scale data architectures. I bridge the gap between infrastructure and analytics, building pipelines that process millions of records while maintaining reliability, security, and cost-efficiency.

Currently at LinkNet, I design and manage 20+ production Airflow DAGs, orchestrate multi-cloud data migrations (GCP ↔ AWS), and implement real-time CDC pipelines using Kafka and Debezium. I'm driven by the challenge of architecting data platforms that scale seamlessly and empower data-driven decision-making.

🎯 Current Focus: Cloud Data Architecture | Multi-Cloud Platforms (GCP, AWS) | Real-Time Data Streaming | DataOps & MLOps

💡 Aspiration: Solution Architect specializing in Cloud Data Platforms and Enterprise Data Architecture

🛠️ Technical Skills

Data Engineering & Orchestration

Cloud Platforms & Data Services

Infrastructure & DevOps

Data Analysis & Visualization

Databases & Storage

Machine Learning & Analytics

📂 Featured Projects

🔷 Data Engineering Projects

1. Customer Segmentation & CLV Prediction Pipeline

Description: End-to-end automated ML pipeline for customer lifetime value prediction and segmentation using RFM analysis. Built with Medallion architecture on GCP.

Key Features:

Automated ETL pipeline with Astronomer Airflow
Data transformation with dbt (Bronze → Silver → Gold layers)
ML model training and deployment (Random Forest, XGBoost, SVM)
Data quality checks with Soda
Interactive dashboard with Looker Studio
FastAPI for model serving

Tech Stack: Python, BigQuery, Astronomer, dbt, Docker, Looker Studio, Vertex AI, FastAPI

Impact: Automated customer segmentation reducing manual analysis time by 80%, enabling data-driven marketing strategies.
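
To illustrate the RFM step, here is a minimal sketch in plain Python. It is not the project's actual code: the `(customer_id, order_date, amount)` row layout, the rank-based three-bin scoring, the sample orders, and the function names `rfm_table`, `score`, and `rfm_scores` are all assumptions made for the example.

```python
from datetime import date


def rfm_table(orders, today):
    """Aggregate (customer_id, order_date, amount) rows into R, F, M values."""
    table = {}
    for customer_id, order_date, amount in orders:
        row = table.setdefault(
            customer_id, {"last": order_date, "frequency": 0, "monetary": 0.0}
        )
        row["last"] = max(row["last"], order_date)
        row["frequency"] += 1
        row["monetary"] += amount
    return {
        cid: {
            "recency": (today - row["last"]).days,  # days since last order
            "frequency": row["frequency"],
            "monetary": row["monetary"],
        }
        for cid, row in table.items()
    }


def score(values, higher_is_better=True, bins=3):
    """Rank-based score from 1 (worst) to `bins` (best) for each customer."""
    ordered = sorted(values, key=values.get, reverse=not higher_is_better)
    return {cid: 1 + (rank * bins) // len(ordered) for rank, cid in enumerate(ordered)}


def rfm_scores(orders, today):
    """Return {customer_id: (R, F, M)}; a low recency earns a high R score."""
    table = rfm_table(orders, today)
    r = score({c: v["recency"] for c, v in table.items()}, higher_is_better=False)
    f = score({c: v["frequency"] for c, v in table.items()})
    m = score({c: v["monetary"] for c, v in table.items()})
    return {c: (r[c], f[c], m[c]) for c in table}


# Hypothetical sample data, for illustration only.
orders = [
    ("A", date(2024, 6, 1), 120.0),
    ("A", date(2024, 6, 20), 80.0),
    ("A", date(2024, 6, 28), 50.0),
    ("B", date(2024, 3, 15), 40.0),
    ("C", date(2024, 5, 10), 300.0),
    ("C", date(2024, 5, 30), 20.0),
]
scores = rfm_scores(orders, today=date(2024, 7, 1))
print(scores)
```

In a pipeline like the one described, scores of this kind would typically be computed in the Gold layer and then used as inputs for segmentation and CLV models.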

2. Attendance ETL Pipeline

Description: Scalable ETL pipeline for university attendance data processing with Docker containerization.

Key Features:

Three-layer architecture (Staging → Warehouse → Mart)
Python-based Extract, Transform, Load modules
PostgreSQL database with normalized schema
Docker deployment for easy scalability
Automated weekly attendance reporting

Tech Stack: Python, PostgreSQL, Docker, pandas

Architecture: Bronze (Staging) → Silver (Data Warehouse) → Gold (Data Mart)
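
The three-layer flow can be sketched as follows. This is a simplified illustration, not the project's code: it swaps in Python's built-in `sqlite3` for PostgreSQL so the example runs anywhere, and the table names (`stg_attendance`, `dw_attendance`), columns, and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Staging: raw rows loaded as-is.
    CREATE TABLE stg_attendance (
        student TEXT, course TEXT, session_date TEXT, status TEXT
    );
    -- Warehouse: cleaned rows, one per student/course/session.
    CREATE TABLE dw_attendance (
        student TEXT, course TEXT, session_date TEXT, present INTEGER,
        UNIQUE (student, course, session_date)
    );
    """
)

# Extract: hypothetical raw records with messy status values and a duplicate.
raw_rows = [
    ("s001", "PHY101", "2024-03-04", "Present"),
    ("s001", "PHY101", "2024-03-06", "absent"),
    ("s002", "PHY101", "2024-03-04", " present "),
    ("s002", "PHY101", "2024-03-04", " present "),
]
conn.executemany("INSERT INTO stg_attendance VALUES (?, ?, ?, ?)", raw_rows)

# Transform: normalize the status flag; the UNIQUE constraint drops duplicates.
conn.execute(
    """
    INSERT OR IGNORE INTO dw_attendance
    SELECT student, course, session_date, LOWER(TRIM(status)) = 'present'
    FROM stg_attendance
    """
)

# Mart: weekly attendance rate per student, ready for reporting.
weekly = conn.execute(
    """
    SELECT student, strftime('%Y-%W', session_date) AS week, AVG(present) AS rate
    FROM dw_attendance
    GROUP BY student, week
    ORDER BY student
    """
).fetchall()

for student, week, rate in weekly:
    print(student, week, rate)
```

Keeping the staging table untouched means the warehouse and mart layers can be rebuilt from raw data whenever a cleaning rule changes.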

3. BigQuery Data Ingestion Pipeline

Description: Data ingestion pipeline from local PostgreSQL to BigQuery using Python transformation and Cloud SQL.

Key Features:

CSV to PostgreSQL data loading
Data transformation and enrichment with pandas
Cloud SQL integration
BigQuery data warehouse setup
Banking fraud detection dataset processing

Tech Stack: Python, PostgreSQL, Google Cloud SQL, BigQuery, pandas

Data Flow: Local CSV → PostgreSQL → Transformation → Cloud SQL → BigQuery

4. Stock Market Dashboard

Description: Interactive stock market analysis dashboard visualizing market trends and trading volumes.

Key Features:

Real-time stock price visualization
Trading volume analysis
Market trend indicators
Interactive filtering and drill-down capabilities
Performance metrics and comparisons

Tech Stack: Tableau, Python for data preprocessing

Insights: Enables quick identification of market patterns and trading opportunities through visual analytics.

5. CI/CD Pipeline for AWS Glue Jobs

Description: Production-ready CI/CD pipeline automating AWS Glue job deployment using Azure DevOps with multi-environment support (DEV/PRD).

Key Features:

Automated Change Detection: Intelligent git diff analysis to identify modified Glue jobs
Multi-Stage Deployment: Detect Changes → Sync to S3 → Deploy Glue Jobs
Queue Management: Persistent deployment queue handling failures and rollbacks
Configuration-Driven: YAML-based job configuration with environment variable substitution
Smart Updates: Compares existing Glue job configurations to avoid unnecessary updates
S3 Tables Integration: Automatically creates tables in S3 Tables catalog using Apache Iceberg

Pipeline Architecture:

Feature Branch → DEV Branch (Auto-Deploy to DEV) → PRD Branch (Auto-Deploy to PRD)

Tech Stack: AWS Glue, AWS S3, Azure DevOps Pipelines, Python, boto3, PyYAML, Apache Iceberg

Impact: Reduced manual deployment time from hours to minutes while ensuring consistent configurations across environments and zero-downtime deployments.
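
The change-detection, configuration, and smart-update ideas listed above can be sketched with the standard library alone. Everything here is an assumption for illustration: the `jobs/<job_name>/...` repository layout, the function names, the bucket name, and the flat dictionary standing in for a parsed YAML job config. The actual pipeline may structure all of these differently.

```python
from pathlib import PurePosixPath
from string import Template


def changed_jobs(diff_output, jobs_root="jobs"):
    """Map `git diff --name-only` output to the job folders it touches."""
    jobs = set()
    for line in diff_output.splitlines():
        parts = PurePosixPath(line.strip()).parts
        # Only files under jobs/<job_name>/ count as a job change.
        if len(parts) >= 3 and parts[0] == jobs_root:
            jobs.add(parts[1])
    return sorted(jobs)


def render_config(config, env):
    """Substitute ${VAR} placeholders in string values; raises KeyError if unset."""
    return {
        key: Template(value).substitute(env) if isinstance(value, str) else value
        for key, value in config.items()
    }


def needs_update(desired, existing):
    """True when any desired setting differs from the deployed job's settings."""
    return any(existing.get(key) != value for key, value in desired.items())


# Hypothetical diff output and job configuration.
diff_output = (
    "jobs/sales_daily/script.py\n"
    "jobs/sales_daily/config.yaml\n"
    "docs/README.md\n"
    "jobs/churn/script.py\n"
)
config = {
    "script_location": "s3://${ARTIFACT_BUCKET}/jobs/sales_daily/script.py",
    "workers": 2,
}
desired = render_config(config, {"ARTIFACT_BUCKET": "example-dev-bucket"})
deployed = {
    "script_location": "s3://example-dev-bucket/jobs/sales_daily/script.py",
    "workers": 2,
    "timeout": 60,
}

print(changed_jobs(diff_output))
print(needs_update(desired, deployed))
```

Comparing only the keys present in the desired config lets the deployed job keep service-managed settings (such as the `timeout` above) without triggering a redundant update.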

🔷 Data Analysis Projects

6. Online Rental Movies Analysis

Description: Comprehensive SQL analysis of online movie rental platform exploring customer behavior and movie popularity.

Key Insights:

Customer demographics and preferences analysis
Revenue patterns by period and category
Actor popularity impact on movie rentals
Subscription behavior analysis

Tech Stack: SQLite, Jupyter Notebook, SQL

📝 Read the detailed article: SQL Project: Analyzing Online Movie Rental

Professional Experience

Data Platform Engineer | LinkNet (PT Link Net Tbk) | Aug 2024 - Present

Leading Indonesian telecommunications provider - First Media & Link Net Fiber brands

🔹 Cloud Data Architecture & Multi-Cloud Migration

Orchestrated migration of 100+ tables from GCP to AWS, architecting robust data pipelines using AWS Glue, Lake Formation, and MWAA (Managed Workflows for Apache Airflow)
Optimized metadata governance and query performance, reducing data access latency by 40%

🔹 Data Pipeline Orchestration & Automation

Designed and managed 20+ production Airflow DAGs with SLA monitoring, ensuring 99.9% pipeline reliability for analytics and reporting
Automated complex ETL workflows on GCP using Python, Java, and shell scripts with Dataproc and Dataflow

🔹 Real-Time Data Streaming Architecture

Implemented CDC pipeline using Apache Kafka and Debezium to stream real-time data changes from SQL Server to downstream APIs
Enabled event-driven architecture, reducing data latency from hours to seconds
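
As a rough sketch of what a downstream consumer of such a CDC stream does, the example below applies Debezium-style change events to an in-memory replica. It relies on the standard Debezium envelope fields `op`, `before`, and `after`; the `id` key column, the sample events, and the `apply_change` function are hypothetical, and the Kafka consumer loop itself is left out.

```python
import json


def apply_change(state, raw_value, key_field="id"):
    """Apply one Debezium-style change event to a dict keyed by `key_field`."""
    if raw_value is None:  # tombstone record that can follow a delete
        return state
    event = json.loads(raw_value)
    # With schemas enabled, the envelope is nested under "payload".
    payload = event.get("payload", event)
    op = payload["op"]
    if op in ("c", "r", "u"):  # create, snapshot read, update
        row = payload["after"]
        state[row[key_field]] = row
    elif op == "d":  # delete: only the "before" image is populated
        state.pop(payload["before"][key_field], None)
    return state


# Hypothetical message values, in the order a consumer would receive them.
events = [
    json.dumps({"payload": {"op": "c", "before": None,
                            "after": {"id": 1, "status": "new"}}}),
    json.dumps({"payload": {"op": "u", "before": {"id": 1, "status": "new"},
                            "after": {"id": 1, "status": "active"}}}),
    json.dumps({"payload": {"op": "c", "before": None,
                            "after": {"id": 2, "status": "new"}}}),
    json.dumps({"payload": {"op": "d", "before": {"id": 2, "status": "new"},
                            "after": None}}),
    None,
]

replica = {}
for value in events:
    apply_change(replica, value)

print(replica)
```

Because each event carries the full row image, the handler is idempotent for creates and updates, so replaying a message after a consumer restart does not corrupt the replica.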

🔹 Multi-Source Data Integration

Built end-to-end ETL pipelines ingesting data from on-premise sources (MySQL, Oracle, PostgreSQL, SAP DB, SQL Server)
Maintained multi-layer data warehouse architecture in BigQuery for efficient analytics

🔹 Cross-Functional Collaboration

Partnered with data analysts and business teams for ad-hoc analysis, delivering actionable insights supporting data-driven strategies

Tech Stack: AWS (Glue, Lake Formation, MWAA), GCP (BigQuery, Dataproc, Dataflow), Apache Airflow, Apache Kafka, Debezium, Python, Java, SQL

Cloud Engineer | Xtremax Teknologi Indonesia | Mar 2019 - Oct 2022

Digital transformation company - Cloud solutions & migrations (Singapore HQ)

🔹 Cloud Infrastructure & Platform Management

Maintained and optimized 20+ internal AWS EC2 servers for Content Website Platform, utilizing Load Balancers and Auto Scaling Groups
Ensured seamless operations, proactive troubleshooting, and continuous infrastructure enhancements

🔹 Security & Compliance

Remediated 500+ vulnerability findings identified by the Nexpose scanner, fortifying cloud security
Achieved 70% reduction in security incidents through systematic vulnerability management

🔹 DevOps & Multi-Platform Services

Empowered developer teams by streamlining installation, configuration, and troubleshooting for 5 CMS platforms (WordPress, SWIIIT, Sitecore, Sitefinity, SharePoint) across Linux and Windows environments

🔹 Documentation & Knowledge Management

Documented 100+ complex technical issues as RFCs, incident reports, build docs, and wiki pages
Mentored 2 new team members while collaborating on deployment tasks

🔹 Cost Optimization

Reviewed and optimized AWS resource configurations, reducing idle resource costs for the Content Website Platform (CWP) project

Tech Stack: AWS (EC2, ELB, Auto Scaling, S3, RDS), Linux, Windows Server, CMS platforms, Nexpose, Infrastructure as Code

🎓 Training & Workshops

Data Engineering Fellowship | IYKRA | Mar 2024 - Jul 2024

Intensive 4-month data engineering program

🏆 Selected as 1 of 20 students awarded a fully funded scholarship for Data Fellowship Batch 12

🏆 Best Capstone Project - Led a team of 5 in building a customer segmentation model:

Project: "Building Customer Segmentation for Effective Personalized Marketing"
Built end-to-end ML pipeline with automated customer lifetime value prediction
Achieved 85% model accuracy using Random Forest and XGBoost algorithms

Key Technical Learning:

Developed production-grade data pipelines ingesting data from GCP data lakes
Performed transformations using Apache Airflow, dbt, Apache Kafka, Apache NiFi, and BigQuery
Created data visualizations and dashboards with Looker and Tableau
Leveraged dbt for data transformation and modeling in Google BigQuery as data warehouse
Implemented Medallion architecture (Bronze → Silver → Gold layers)
Applied DataOps best practices including data quality testing with Soda

Skills Gained: Data Pipeline Design, Cloud Data Architecture, Real-Time Streaming, Data Modeling, Data Visualization, MLOps

📜 Certifications

Professional Certifications

AWS Certified Data Engineer - Associate
Issued: Aug 2025
Valid until: Aug 2028

AWS Certified Solutions Architect - Associate
Issued: 2022
Valid until: 2025

Microsoft Azure Fundamentals
Issued: 2023
Valid until: 2025

Additional Certifications

View all course completion certificates: certificate.md

🎓 Education

🎓 Master of Science in Computational Science
Institut Teknologi Bandung (ITB) | 2020 - 2023

🎓 Bachelor of Science in Physics
Institut Teknologi Bandung (ITB) | 2014 - 2019

📫 Let's Connect!

I'm always excited to collaborate on data projects, discuss new opportunities, or exchange ideas about data engineering and analytics!

📄 Download my resume: CV

Last updated: November 2025
