
alyamutiara/Data-Portfolio


Alya Mutiara Firdausyi - Data Portfolio 🚀

LinkedIn Medium Email

👋 About Me

Hi there! I'm Alya, a Data Platform Engineer with a passion for cloud architecture and scalable data solutions. With a Master's degree in Computational Science from Institut Teknologi Bandung and hands-on experience at multinational companies, I specialize in building robust, cloud-native data platforms that turn raw data into business intelligence.

My journey from Cloud Engineer to Data Platform Engineer has given me a unique perspective on designing enterprise-scale data architectures. I bridge the gap between infrastructure and analytics, building pipelines that process millions of records while maintaining reliability, security, and cost-efficiency.

Currently at LinkNet, I design and manage 20+ production Airflow DAGs, orchestrate multi-cloud data migrations (GCP ↔ AWS), and implement real-time CDC pipelines using Kafka and Debezium. I'm driven by the challenge of architecting data platforms that scale seamlessly and support data-driven decision-making.

🎯 Current Focus: Cloud Data Architecture | Multi-Cloud Platforms (GCP, AWS) | Real-Time Data Streaming | DataOps & MLOps

💡 Aspiration: Solution Architect specializing in Cloud Data Platforms and Enterprise Data Architecture


🛠️ Technical Skills

Data Engineering & Orchestration

Python SQL Apache Airflow dbt Apache Kafka Apache NiFi

Cloud Platforms & Data Services

AWS GCP Azure BigQuery AWS Glue Dataproc Dataflow

Infrastructure & DevOps

Docker Kubernetes Terraform Linux

Data Analysis & Visualization

Tableau Looker Excel Power Query

Databases & Storage

PostgreSQL MySQL Oracle SQL Server SAP

Machine Learning & Analytics

scikit-learn Pandas NumPy Java


📂 Featured Projects

🔷 Data Engineering Projects

Python BigQuery Airflow dbt ML

Description: End-to-end automated ML pipeline for customer lifetime value prediction and segmentation using RFM analysis. Built with Medallion architecture on GCP.

Key Features:

  • Automated ETL pipeline with Astronomer Airflow
  • Data transformation with dbt (Bronze → Silver → Gold layers)
  • ML model training and deployment (Random Forest, XGBoost, SVM)
  • Data quality checks with Soda
  • Interactive dashboard with Looker Studio
  • FastAPI for model serving

Tech Stack: Python, BigQuery, Astronomer, dbt, Docker, Looker Studio, Vertex AI, FastAPI

Impact: Automated customer segmentation reducing manual analysis time by 80%, enabling data-driven marketing strategies.
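For readers unfamiliar with RFM analysis, the idea behind the segmentation can be sketched as below. This is an illustration only, not the project's code: the `rfm_table` function, the `(customer_id, purchase_date, amount)` tuple layout, and the sample transactions are all hypothetical, and the project itself is described as running on BigQuery and dbt rather than plain Python.

```python
from collections import defaultdict
from datetime import date


def rfm_table(transactions, as_of):
    """Return {customer_id: (recency_days, frequency, monetary)}.

    transactions: iterable of (customer_id, purchase_date, amount) tuples
    (a hypothetical layout chosen for this sketch).
    """
    last_purchase = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)

    for customer_id, purchase_date, amount in transactions:
        # Recency is based on the most recent purchase per customer.
        previous = last_purchase.get(customer_id)
        if previous is None or purchase_date > previous:
            last_purchase[customer_id] = purchase_date
        frequency[customer_id] += 1        # number of purchases
        monetary[customer_id] += amount    # total spend

    return {
        cid: ((as_of - last_purchase[cid]).days, frequency[cid], monetary[cid])
        for cid in last_purchase
    }


# Hypothetical sample data, for illustration only.
transactions = [
    ("c1", date(2024, 1, 5), 120.0),
    ("c1", date(2024, 3, 1), 80.0),
    ("c2", date(2023, 11, 20), 40.0),
]
rfm = rfm_table(transactions, as_of=date(2024, 3, 31))
```

Each customer ends up with three numbers (days since last purchase, purchase count, total spend) that can then be bucketed into segments or fed to a model as features.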


Python PostgreSQL Docker

Description: Scalable ETL pipeline for university attendance data processing with Docker containerization.

Key Features:

  • Three-layer architecture (Staging → Warehouse → Mart)
  • Python-based Extract, Transform, Load modules
  • PostgreSQL database with normalized schema
  • Docker deployment for easy scalability
  • Automated weekly attendance reporting

Tech Stack: Python, PostgreSQL, Docker, pandas

Architecture: Bronze (Staging) → Silver (Data Warehouse) → Gold (Data Mart)
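A minimal sketch of the Staging → Warehouse → Mart flow is shown below. It is an assumption-laden illustration rather than the project's implementation: it uses Python's built-in `sqlite3` in place of PostgreSQL so that it runs without a database server, and the table names, columns, and sample rows are hypothetical.

```python
import sqlite3


def run_pipeline(rows):
    """Load raw rows into staging, clean into warehouse, aggregate into mart."""
    conn = sqlite3.connect(":memory:")  # stand-in for PostgreSQL in this sketch
    cur = conn.cursor()

    # Staging (Bronze): raw rows loaded as-is, everything as text.
    cur.execute(
        "CREATE TABLE stg_attendance (student_id TEXT, week TEXT, status TEXT)"
    )
    cur.executemany("INSERT INTO stg_attendance VALUES (?, ?, ?)", rows)

    # Warehouse (Silver): trimmed, typed, deduplicated, blank IDs dropped.
    cur.execute(
        """
        CREATE TABLE dw_attendance AS
        SELECT DISTINCT
            TRIM(student_id)                   AS student_id,
            CAST(week AS INTEGER)              AS week,
            (LOWER(TRIM(status)) = 'present')  AS present
        FROM stg_attendance
        WHERE student_id IS NOT NULL AND TRIM(student_id) <> ''
        """
    )

    # Mart (Gold): one row per week for reporting.
    cur.execute(
        """
        CREATE TABLE mart_weekly_attendance AS
        SELECT week, COUNT(*) AS students, SUM(present) AS present_count
        FROM dw_attendance
        GROUP BY week
        """
    )

    result = cur.execute(
        "SELECT week, students, present_count "
        "FROM mart_weekly_attendance ORDER BY week"
    ).fetchall()
    conn.close()
    return result


# Hypothetical sample rows: one duplicate, one padded ID, one blank ID.
sample_rows = [
    ("s1", "1", "Present"),
    ("s1", "1", "Present"),
    ("s2", "1", "absent"),
    (" s1 ", "2", "present"),
    ("", "2", "present"),
]
weekly = run_pipeline(sample_rows)
```

Keeping each layer as its own table means a bad load can be re-run from staging without touching the source files.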


Python BigQuery PostgreSQL GCP

Description: Data ingestion pipeline from local PostgreSQL to BigQuery using Python transformation and Cloud SQL.

Key Features:

  • CSV to PostgreSQL data loading
  • Data transformation and enrichment with pandas
  • Cloud SQL integration
  • BigQuery data warehouse setup
  • Banking fraud detection dataset processing

Tech Stack: Python, PostgreSQL, Google Cloud SQL, BigQuery, pandas

Data Flow: Local CSV → PostgreSQL → Transformation → Cloud SQL → BigQuery


Tableau Python Analytics

Description: Interactive stock market analysis dashboard visualizing market trends and trading volumes.

Key Features:

  • Real-time stock price visualization
  • Trading volume analysis
  • Market trend indicators
  • Interactive filtering and drill-down capabilities
  • Performance metrics and comparisons

Tech Stack: Tableau, Python for data preprocessing

Insights: Enables quick identification of market patterns and trading opportunities through visual analytics.


AWS Glue Azure DevOps Python S3

Description: Production-ready CI/CD pipeline automating AWS Glue job deployment using Azure DevOps with multi-environment support (DEV/PRD).

Key Features:

  • Automated Change Detection: Intelligent git diff analysis to identify modified Glue jobs
  • Multi-Stage Deployment: Detect Changes → Sync to S3 → Deploy Glue Jobs
  • Queue Management: Persistent deployment queue handling failures and rollbacks
  • Configuration-Driven: YAML-based job configuration with environment variable substitution
  • Smart Updates: Compares existing Glue job configurations to avoid unnecessary updates
  • S3 Tables Integration: Automatically creates tables in S3 Tables catalog using Apache Iceberg

Pipeline Architecture:

Feature Branch → DEV Branch (Auto-Deploy to DEV) → PRD Branch (Auto-Deploy to PRD)

Tech Stack: AWS Glue, AWS S3, Azure DevOps Pipelines, Python, boto3, PyYAML, Apache Iceberg

Impact: Reduced manual deployment time from hours to minutes while ensuring consistent configurations across environments and zero-downtime deployments.
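The change-detection and configuration steps listed above could look roughly like the sketch below. It is not the pipeline's actual code: the `glue_jobs/<job_name>/...` repository layout, the function names, and the `${ENV}` placeholder are assumptions made for illustration, and `string.Template` stands in for whatever substitution mechanism the real YAML loader uses.

```python
import subprocess
from pathlib import PurePosixPath
from string import Template

JOBS_DIR = "glue_jobs"  # hypothetical layout: glue_jobs/<job_name>/<files>


def changed_files(base_ref, head_ref="HEAD"):
    """List files that differ between two git refs (requires a git checkout)."""
    completed = subprocess.run(
        ["git", "diff", "--name-only", base_ref, head_ref],
        check=True,
        capture_output=True,
        text=True,
    )
    return [line for line in completed.stdout.splitlines() if line]


def changed_jobs(paths, jobs_dir=JOBS_DIR):
    """Map changed file paths to the sorted list of job names they belong to."""
    jobs = set()
    for path in paths:
        parts = PurePosixPath(path).parts
        # Only files inside glue_jobs/<job_name>/ count as a job change.
        if len(parts) >= 3 and parts[0] == jobs_dir:
            jobs.add(parts[1])
    return sorted(jobs)


def render_config(text, env):
    """Fill ${VAR} placeholders; raises KeyError if a variable is missing."""
    return Template(text).substitute(env)


# Hypothetical inputs, as `git diff --name-only` might report them.
paths = [
    "glue_jobs/orders/job.py",
    "glue_jobs/orders/config.yaml",
    "glue_jobs/customers/job.py",
    "README.md",
]
jobs_to_deploy = changed_jobs(paths)
rendered = render_config("script_location: s3://${ENV}-glue-scripts/orders/job.py", {"ENV": "dev"})
```

Separating `changed_files` (which shells out to git) from `changed_jobs` (pure path logic) keeps the mapping easy to test without a repository.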


🔷 Data Analysis Projects

SQL Jupyter

Description: Comprehensive SQL analysis of online movie rental platform exploring customer behavior and movie popularity.

Key Insights:

  • Customer demographics and preferences analysis
  • Revenue patterns by period and category
  • Actor popularity impact on movie rentals
  • Subscription behavior analysis

Tech Stack: SQLite, Jupyter Notebook, SQL

📝 Read the detailed article: SQL Project: Analyzing Online Movie Rental


Professional Experience

Data Platform Engineer | LinkNet (PT Link Net Tbk) | Aug 2024 - Present

Leading Indonesian telecommunications provider - First Media & Link Net Fiber brands

🔹 Cloud Data Architecture & Multi-Cloud Migration

  • Orchestrated migration of 100+ tables from GCP to AWS, architecting robust data pipelines using AWS Glue, Lake Formation, and MWAA (Managed Workflows for Apache Airflow)
  • Optimized metadata governance and query performance, reducing data access latency by 40%

🔹 Data Pipeline Orchestration & Automation

  • Designed and managed 20+ production Airflow DAGs with SLA monitoring, ensuring 99.9% pipeline reliability for analytics and reporting
  • Automated complex ETL workflows on GCP using Python, Java, and shell scripts with Dataproc and Dataflow

🔹 Real-Time Data Streaming Architecture

  • Implemented CDC pipeline using Apache Kafka and Debezium to stream real-time data changes from SQL Server to downstream APIs
  • Enabled event-driven architecture, reducing data latency from hours to seconds

🔹 Multi-Source Data Integration

  • Built end-to-end ETL pipelines ingesting data from on-premise sources (MySQL, Oracle, PostgreSQL, SAP DB, SQL Server)
  • Maintained multi-layer data warehouse architecture in BigQuery for efficient analytics

🔹 Cross-Functional Collaboration

  • Partnered with data analysts and business teams for ad-hoc analysis, delivering actionable insights supporting data-driven strategies

Tech Stack: AWS (Glue, Lake Formation, MWAA), GCP (BigQuery, Dataproc, Dataflow), Apache Airflow, Apache Kafka, Debezium, Python, Java, SQL


Cloud Engineer | Xtremax Teknologi Indonesia | Mar 2019 - Oct 2022

Digital transformation company - Cloud solutions & migrations (Singapore HQ)

🔹 Cloud Infrastructure & Platform Management

  • Maintained and optimized 20+ internal AWS EC2 servers for Content Website Platform, utilizing Load Balancers and Auto Scaling Groups
  • Ensured seamless operations, proactive troubleshooting, and continuous infrastructure enhancements

🔹 Security & Compliance

  • Remediated 500+ vulnerability findings using Nexpose scanner, fortifying cloud security
  • Achieved 70% reduction in security incidents through systematic vulnerability management

🔹 DevOps & Multi-Platform Services

  • Empowered developer teams by streamlining installation, configuration, and troubleshooting for 5 CMS platforms (WordPress, SWIIIT, Sitecore, Sitefinity, SharePoint) across Linux and Windows environments

🔹 Documentation & Knowledge Management

  • Documented 100+ complex technical issues as RFC, Incident Reports, Build Docs, and Wiki documentation
  • Mentored 2 new team members while collaborating on deployment tasks

🔹 Cost Optimization

  • Reviewed and optimized AWS resource configurations, significantly reducing idle resource costs for CWP project

Tech Stack: AWS (EC2, ELB, Auto Scaling, S3, RDS), Linux, Windows Server, CMS platforms, Nexpose, Infrastructure as Code


🎓 Training & Workshops

Data Engineering Fellowship | IYKRA | Mar 2024 - Jul 2024

Intensive 4-month comprehensive data engineering program

🏆 Selected as 1 of 20 students awarded fully-funded scholarship for Data Fellowship Batch 12

🏆 Best Capstone Project - Led team of 5 in building customer segmentation model:

  • Project: "Building Customer Segmentation for Effective Personalized Marketing"
  • Built end-to-end ML pipeline with automated customer lifetime value prediction
  • Achieved 85% model accuracy using Random Forest and XGBoost algorithms

Key Technical Learning:

  • Developed production-grade data pipelines ingesting data from GCP data lakes
  • Performed transformations using Apache Airflow, dbt, Apache Kafka, Apache NiFi, BigQuery
  • Created data visualizations and dashboards with Looker and Tableau
  • Leveraged dbt for data transformation and modeling in Google BigQuery as data warehouse
  • Implemented Medallion architecture (Bronze → Silver → Gold layers)
  • Applied DataOps best practices including data quality testing with Soda

Skills Gained: Data Pipeline Design, Cloud Data Architecture, Real-Time Streaming, Data Modeling, Data Visualization, MLOps


📜 Certifications

Professional Certifications


AWS Certified Data Engineer - Associate
Issued: Aug 2025
Valid until: Aug 2028

AWS Certified Solutions Architect - Associate
Issued: 2022
Valid until: 2025

Microsoft Azure Fundamentals
Issued: 2023
Valid until: 2025

Additional Certifications

View all course completion certificates: certificate.md


🎓 Education

🎓 Master of Science in Computational Science
Institut Teknologi Bandung (ITB) | 2020 - 2023

🎓 Bachelor of Science in Physics
Institut Teknologi Bandung (ITB) | 2014 - 2019


📫 Let's Connect!

I'm always excited to collaborate on data projects, discuss new opportunities, or exchange ideas about data engineering and analytics!

LinkedIn Medium Email GitHub

📄 Download my resume: CV


Last updated: November 2025
