Lab 1-1: MLOps Environment Setup

📋 Lab Overview

Item         Details
Duration     65 min
Difficulty   ⭐⭐ (beginner to intermediate)
Goal         Verify the AWS EKS-based MLOps platform environment and validate your Tenant
Audience     15 trainees (user01 to user15) + 1 instructor (user20)

🎯 Learning Objectives

In this lab you will:

  • Connect to the AWS EKS cluster and check its status
  • Verify the Kubeflow Tenant configuration (Profile, Namespace, ServiceAccount)
  • Connect to the MLflow Tracking Server and check the PodDefault
  • Verify the AWS S3 and ECR storage configuration
  • Understand the overall architecture of the MLOps platform

🏗️ Lab Structure

Lab 1-1: MLOps Environment Setup (65 min)
│
├── Preparation (10 min)
│   ├── Check required tools
│   ├── Set environment variables
│   ├── Configure AWS credentials
│   └── Connect to the EKS cluster
│
├── Part 1: Kubeflow Tenant Verification (20 min)
│   ├── Check that the Namespace exists
│   ├── Check the Profile and owner email
│   ├── Check the ServiceAccounts
│   ├── Check the ResourceQuota
│   └── Test permission isolation
│
├── Part 2: MLflow Environment Verification (20 min)
│   ├── Check MLflow Server status
│   ├── Check the PostgreSQL backend
│   ├── Check the PodDefault settings
│   └── Test MLflow UI port forwarding
│
└── Part 3: AWS Storage Check (15 min)
    ├── Check the S3 bucket
    ├── Check the ECR registry
    ├── Check the MLflow artifacts folder
    └── Understand the overall architecture

📁 File Structure

lab1-1_mlops-environment-setup/
├── README.md                          # ⭐ This file (lab guide)
├── verify_all.sh                      # 🔧 Combined verification script
│
├── 1_kubeflow_setup/
│   └── verify_kubeflow.sh             # Part 1: Kubeflow verification script
│
├── 2_mlflow_setup/
│   └── verify_mlflow.sh               # Part 2: MLflow verification script
│
└── 3_storage_setup/
    └── verify_storage.sh              # Part 3: Storage verification script

🎯 Tenant Configuration

Each of the 15 trainees and the instructor gets an isolated MLOps environment of their own.

┌────────────────────┬──────────────────────────────────┬──────────────────────────────┐
│ Profile            │ Owner Email                      │ Resources                    │
├────────────────────┼──────────────────────────────────┼──────────────────────────────┤
│ profile-user01     │ user01@mlops.local               │ CPU 8, Memory 16Gi           │
│ profile-user02     │ user02@mlops.local               │ CPU 8, Memory 16Gi           │
│ profile-user03     │ user03@mlops.local               │ CPU 8, Memory 16Gi           │
│ profile-user04     │ user04@mlops.local               │ CPU 8, Memory 16Gi           │
│ profile-user05     │ user05@mlops.local               │ CPU 8, Memory 16Gi           │
│ profile-user06     │ user06@mlops.local               │ CPU 8, Memory 16Gi           │
│ profile-user07     │ user07@mlops.local               │ CPU 8, Memory 16Gi           │
│ profile-user08     │ user08@mlops.local               │ CPU 8, Memory 16Gi           │
│ profile-user09     │ user09@mlops.local               │ CPU 8, Memory 16Gi           │
│ profile-user10     │ user10@mlops.local               │ CPU 8, Memory 16Gi           │
│ profile-user11     │ user11@mlops.local               │ CPU 8, Memory 16Gi           │
│ profile-user12     │ user12@mlops.local               │ CPU 8, Memory 16Gi           │
│ profile-user13     │ user13@mlops.local               │ CPU 8, Memory 16Gi           │
│ profile-user14     │ user14@mlops.local               │ CPU 8, Memory 16Gi           │
│ profile-user15     │ user15@mlops.local               │ CPU 8, Memory 16Gi           │
│ profile-user20     │ user20@mlops.local (instructor)  │ CPU 16, Memory 32Gi ⭐       │
└────────────────────┴──────────────────────────────────┴──────────────────────────────┘

🔧 Preparation (10 min)

Step 0-1: Check Required Tools

The following tools must be installed before you start this lab:

# 1. Check the AWS CLI version
aws --version
# Expected output: aws-cli/2.x.x Python/3.x.x ...

# 2. Check the kubectl version
kubectl version --client
# Expected output: Client Version: v1.27.x ...

# 3. Check the Git version
git --version
# Expected output: git version 2.x.x

If a tool is missing, install it:

Tool      macOS                   Windows
AWS CLI   brew install awscli     AWS CLI installer (MSI)
kubectl   brew install kubectl    choco install kubernetes-cli
Git       brew install git        Git for Windows
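After installing, you can check all three tools in one pass with a small helper like the one below. This is a sketch and is not part of the lab repository. The function name check_required_tools is hypothetical, and it only checks that each command is on your PATH. It does not check versions.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not in the lab repo): report which required CLIs are on PATH.
# With no arguments it checks the three tools from Step 0-1; pass names to override.
check_required_tools() {
  local tools=("$@")
  if [ "${#tools[@]}" -eq 0 ]; then
    tools=(aws kubectl git)
  fi
  local missing=0 tool
  for tool in "${tools[@]}"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "✅ $tool: $(command -v "$tool")"
    else
      echo "❌ $tool: not found, install it first"
      missing=$((missing + 1))
    fi
  done
  # Exit status = number of missing tools (0 means everything is installed)
  return "$missing"
}

# Usage:
#   check_required_tools
```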

Step 0-2: Set Environment Variables

⚠️ Very important: enter your own two-digit user number exactly (e.g. 07, not 7)! These variables only apply to the current terminal session.

macOS / Linux:

# Set your user number (e.g. 01, 02, 03 ... 15, 20)
export USER_NUM="01"  # ⚠️ Be sure to change this to your own number!

# Derive the related environment variables
export NAMESPACE="kubeflow-user${USER_NUM}"
export S3_BUCKET="mlops-training-user${USER_NUM}"
export AWS_REGION="ap-northeast-2"

# Check the environment variables
echo "User number: $USER_NUM"
echo "Namespace: $NAMESPACE"
echo "S3 bucket: $S3_BUCKET"

Windows PowerShell:

# Set your user number
$env:USER_NUM = "01"  # ⚠️ Be sure to change this to your own number!

# Derive the related environment variables
$env:NAMESPACE = "kubeflow-user$env:USER_NUM"
$env:S3_BUCKET = "mlops-training-user$env:USER_NUM"
$env:AWS_REGION = "ap-northeast-2"

# Check the environment variables
echo "User number: $env:USER_NUM"
echo "Namespace: $env:NAMESPACE"
echo "S3 bucket: $env:S3_BUCKET"
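A mistyped USER_NUM (for example 1 instead of 01) silently produces a name such as kubeflow-user1 that does not exist, and every later check then fails. The sketch below catches this early. It is a hypothetical validate_user_num helper, not part of the lab scripts, and it accepts only the numbers in the Tenant table (01 to 15, and 20).

```bash
#!/usr/bin/env bash
# Hypothetical helper (not in the lab repo): reject user numbers outside the
# Tenant table (01-15 for trainees, 20 for the instructor).
validate_user_num() {
  case "$1" in
    0[1-9]|1[0-5]|20) return 0 ;;
    *)
      echo "❌ Invalid USER_NUM '$1': use a two-digit number 01-15 (or 20)" >&2
      return 1
      ;;
  esac
}

# Usage (macOS / Linux), after the exports above:
#   validate_user_num "$USER_NUM" && echo "OK: $NAMESPACE / $S3_BUCKET"
```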

Step 0-3: Configure AWS Credentials

Have the AWS Access Key and Secret Key provided by the instructor ready.

# Configure AWS credentials
aws configure

# Prompts:
# AWS Access Key ID: (provided by the instructor)
# AWS Secret Access Key: (provided by the instructor)
# Default region name: ap-northeast-2
# Default output format: json

# Verify the configuration
aws sts get-caller-identity

Expected output (your IDs will differ):

{
    "UserId": "AIDAXXXXXXXXXXXXXXXXX",
    "Account": "123456789012",
    "Arn": "arn:aws:iam::123456789012:user/mlops-training"
}
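To confirm that the credentials work without reading the JSON by eye, you could use a sketch like the following. check_aws_identity is a hypothetical helper and is not part of the lab scripts. It relies only on the standard --query and --output options of the AWS CLI, and on the fact that AWS account IDs are 12 digits. The trailing comments show the non-interactive aws configure set form for the region and output prompts.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not in the lab repo): confirm the AWS CLI can authenticate
# and print the account ID it is using.
check_aws_identity() {
  local account
  # --query/--output text return just the Account field, e.g. 123456789012
  account=$(aws sts get-caller-identity --query Account --output text 2>/dev/null) || {
    echo "❌ AWS credentials are not working; run 'aws configure' again" >&2
    return 1
  }
  # AWS account IDs are always 12 digits
  if [[ "$account" =~ ^[0-9]{12}$ ]]; then
    echo "✅ Authenticated to AWS account $account"
  else
    echo "❌ Unexpected account ID: '$account'" >&2
    return 1
  fi
}

# Non-interactive alternative to the region/output prompts of 'aws configure':
#   aws configure set region ap-northeast-2
#   aws configure set output json
```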

Step 0-4: Connect to the EKS Cluster

# Add the EKS cluster to your kubeconfig
aws eks update-kubeconfig \
    --region ap-northeast-2 \
    --name mlops-training-cluster

# Check the connection
kubectl cluster-info

Expected output:

Kubernetes control plane is running at https://XXXXX.ap-northeast-2.eks.amazonaws.com
CoreDNS is running at https://XXXXX.ap-northeast-2.eks.amazonaws.com/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
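kubectl cluster-info only shows that some cluster answers. The hypothetical check_kube_context helper sketched below also confirms that the current context is the lab cluster. It assumes the default context name written by aws eks update-kubeconfig, which is the cluster ARN ending in :cluster/mlops-training-cluster. If you passed --alias, this check will fail and needs adjusting.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not in the lab repo): make sure kubectl points at the lab
# cluster. By default 'aws eks update-kubeconfig' names the context after the
# cluster ARN, which ends in ":cluster/<cluster-name>".
check_kube_context() {
  local cluster="${1:-mlops-training-cluster}"
  local ctx
  ctx=$(kubectl config current-context 2>/dev/null) || {
    echo "❌ No current kubectl context; run 'aws eks update-kubeconfig' first" >&2
    return 1
  }
  case "$ctx" in
    *":cluster/$cluster")
      echo "✅ Current context: $ctx" ;;
    *)
      echo "❌ Current context '$ctx' is not $cluster" >&2
      return 1 ;;
  esac
}
```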

Step 0-5: Clone the Lab Repository

# Clone the GitHub repository
git clone https://github.com/fastcampusdevmlops/ha-mlops-pipeline.git
cd ha-mlops-pipeline

# Move to the Lab 1-1 directory
cd day1/lab1-1_mlops-environment-setup

🔵 Part 1: Kubeflow Tenant Verification (20 min)

Overview

A Kubeflow Tenant gives each user an isolated MLOps environment. In this section you verify that your own Tenant is configured correctly.

Run the Verification Script

cd 1_kubeflow_setup

# Make the script executable
chmod +x verify_kubeflow.sh

# Set the environment variable and run the script
export USER_NUM="01"  # Change to your own number
./verify_kubeflow.sh

검증 ν•­λͺ©

Step 검증 ν•­λͺ© μ„€λͺ…
1 Namespace kubeflow-user{XX} λ„€μž„μŠ€νŽ˜μ΄μŠ€ 쑴재 확인
2 Profile profile-user{XX} ν”„λ‘œν•„ 및 Owner email 일치 확인
3 ServiceAccount default-editor, default-viewer SA 확인
4 ResourceQuota CPU, Memory ν• λ‹ΉλŸ‰ 확인
5 RoleBinding μ‚¬μš©μž κΆŒν•œ μ„€μ • 확인
6 PodDefault MLflow, Pipeline μ ‘κ·Ό μ„€μ • 확인
7 λ¦¬μ†ŒμŠ€ μƒνƒœ Pods, Services, PVC ν˜„ν™© 확인
8 κΆŒν•œ 격리 λ‹€λ₯Έ μ‚¬μš©μž Namespace μ ‘κ·Ό 차단 확인
9 μ‹œμŠ€ν…œ μƒνƒœ Kubeflow μ£Όμš” μ»΄ν¬λ„ŒνŠΈ μƒνƒœ 확인
10 μ΅œμ’… νŒλ‹¨ μ‹€μŠ΅ κ°€λŠ₯ μ—¬λΆ€ νŒλ‹¨

Manual Verification Commands

To check manually without the script:

# 1. Check the Namespace
kubectl get namespace kubeflow-user${USER_NUM}

# 2. Check the Profile and print its owner email
kubectl get profile profile-user${USER_NUM}
kubectl get profile profile-user${USER_NUM} -o jsonpath='{.spec.owner.name}'

# 3. Check the ServiceAccounts
kubectl get serviceaccount -n kubeflow-user${USER_NUM}

# 4. Check the ResourceQuota
kubectl get resourcequota -n kubeflow-user${USER_NUM}

# 5. Check the PodDefaults
kubectl get poddefaults -n kubeflow-user${USER_NUM}

# 6. Permission isolation test: access another user's namespace (this should FAIL)
kubectl get pods -n kubeflow-user02  # example for user01; use any number other than your own
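Checks 2 and 6 above can be turned into pass/fail results. The two functions below are hypothetical helpers and do not reproduce the logic of verify_kubeflow.sh. check_isolation uses kubectl auth can-i --quiet, which reports through its exit status whether your identity may list pods in the given namespace.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not in the lab repo) for two of the manual checks above.

# Check 2: the Profile owner must be user{XX}@mlops.local
check_profile_owner() {
  local num="$1" expected owner
  expected="user${num}@mlops.local"
  owner=$(kubectl get profile "profile-user${num}" -o jsonpath='{.spec.owner.name}') || return 1
  if [ "$owner" = "$expected" ]; then
    echo "✅ Owner email: $owner"
  else
    echo "⚠️  Owner email mismatch: expected $expected, got '$owner'" >&2
    return 1
  fi
}

# Check 6: listing pods in someone else's namespace should be denied.
# 'kubectl auth can-i --quiet' exits 0 if allowed and non-zero otherwise.
# Note: a connection error also exits non-zero, so confirm the connection first.
check_isolation() {
  local other_ns="$1"
  if kubectl auth can-i list pods -n "$other_ns" --quiet 2>/dev/null; then
    echo "⚠️  Access to $other_ns is allowed (check permissions)" >&2
    return 1
  fi
  echo "✅ Access to $other_ns is denied (isolation OK)"
}

# Usage (as user01):
#   check_profile_owner "$USER_NUM"
#   check_isolation kubeflow-user02
```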

Expected Result

============================================================
Verification Summary
============================================================

  ✅ Passed: 10
  ❌ Failed: 0
  ⚠️  Warnings: 0
  📊 Score: 10/10

🎉 All checks passed!

Next step: proceed to Part 2 (MLflow Environment Verification).

🟢 Part 2: MLflow Environment Verification (20 min)

Overview

MLflow is a platform for tracking ML experiments, managing models, and deploying them. In this section you check the connection to the MLflow server and your Tenant-specific settings.

Run the Verification Script

cd ../2_mlflow_setup

# Make the script executable
chmod +x verify_mlflow.sh

# Set the environment variable and run the script
export USER_NUM="01"  # Change to your own number
./verify_mlflow.sh

검증 ν•­λͺ©

Step 검증 ν•­λͺ© μ„€λͺ…
1 Namespace Kubeflow Namespace 쑴재 확인
2 Profile Profile 및 Owner email 확인
3 S3 버킷 mlops-training-user{XX} 버킷 확인
4 ECR λ ˆμ§€μŠ€νŠΈλ¦¬ mlops-training/user{XX}* 확인
5 MLflow PodDefault access-mlflow PodDefault 확인
6 MLflow Server MLflow Tracking Server μƒνƒœ 확인
7 κΆŒν•œ 격리 Namespace κ°„ μ ‘κ·Ό 차단 확인

Check the MLflow Server

# Check the MLflow pod status (in your own namespace)
kubectl get pods -n kubeflow-user${USER_NUM} -l app=mlflow-server

# Check the MLflow service
kubectl get svc -n kubeflow-user${USER_NUM} | grep mlflow

# Check PostgreSQL status (MLflow backend, in the mlflow-system namespace)
kubectl get pods -n mlflow-system -l app=postgres
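If the pod is still starting, you can wait for it instead of re-running kubectl get pods. The hypothetical wait_for_mlflow helper below is a sketch built on kubectl wait, using the same app=mlflow-server label as the command above. Note that kubectl wait exits with an error if no pod matches the label.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not in the lab repo): block until the MLflow server pod
# reports Ready, instead of polling 'kubectl get pods' by hand.
wait_for_mlflow() {
  local ns="${1:-kubeflow-user${USER_NUM}}"   # defaults to your own namespace
  local timeout="${2:-120s}"
  kubectl wait pod -l app=mlflow-server -n "$ns" \
    --for=condition=Ready --timeout="$timeout"
}

# Usage:
#   wait_for_mlflow                       # your namespace, 120 s timeout
#   wait_for_mlflow kubeflow-user01 300s
```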

Test Access to the MLflow UI

# Port-forward the MLflow UI (the command keeps running; press Ctrl+C to stop it)
kubectl port-forward svc/mlflow-server -n kubeflow-user${USER_NUM} 5000:5000

# Then open this address in your browser:
# http://localhost:5000
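kubectl port-forward keeps running in the foreground, so run this check from a second terminal. The sketch below is a hypothetical helper that calls the server's /health endpoint through the forwarded port. It assumes the deployed MLflow version exposes /health; recent MLflow tracking servers answer that endpoint with OK.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not in the lab repo): check the port-forwarded MLflow server
# from a second terminal while 'kubectl port-forward' runs in the first.
# Assumes the deployed MLflow version serves /health (answers "OK").
check_mlflow_ui() {
  local url="${1:-http://localhost:5000}"
  local body
  # -f: fail on HTTP errors, -sS: quiet but still print errors
  body=$(curl -fsS --max-time 5 "$url/health") || {
    echo "❌ No response from $url; is the port-forward still running?" >&2
    return 1
  }
  if [ "$body" = "OK" ]; then
    echo "✅ MLflow is reachable at $url"
  else
    echo "⚠️  Unexpected /health response: '$body'" >&2
    return 1
  fi
}
```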

Check the PodDefault

A PodDefault injects environment variables into matching pods, so a Jupyter Notebook that selects it can connect to MLflow without manual configuration.

# Inspect the MLflow PodDefault
kubectl get poddefault access-mlflow -n kubeflow-user${USER_NUM} -o yaml

Environment variables to look for:

env:
- name: MLFLOW_TRACKING_URI
  value: "http://mlflow-server.kubeflow-user{XX}.svc.cluster.local:5000"
- name: MLFLOW_S3_ENDPOINT_URL
  value: "https://s3.ap-northeast-2.amazonaws.com"
- name: AWS_DEFAULT_REGION
  value: "ap-northeast-2"
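Instead of reading the whole YAML, you can extract a single variable with a jsonpath filter. The hypothetical check_tracking_uri helper below compares MLFLOW_TRACKING_URI with the in-cluster address shown above. It assumes the variables live under spec.env of the PodDefault, as in the snippet.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not in the lab repo): read MLFLOW_TRACKING_URI from the
# access-mlflow PodDefault and compare it with the expected in-cluster address.
check_tracking_uri() {
  local num="$1" ns uri expected
  ns="kubeflow-user${num}"
  expected="http://mlflow-server.${ns}.svc.cluster.local:5000"
  # The jsonpath filter selects the env entry whose name is MLFLOW_TRACKING_URI
  uri=$(kubectl get poddefault access-mlflow -n "$ns" \
    -o jsonpath='{.spec.env[?(@.name=="MLFLOW_TRACKING_URI")].value}') || return 1
  if [ "$uri" = "$expected" ]; then
    echo "✅ MLFLOW_TRACKING_URI=$uri"
  else
    echo "❌ MLFLOW_TRACKING_URI is '$uri', expected $expected" >&2
    return 1
  fi
}
```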

Expected Result

============================================================
  Verification Summary
============================================================

   ✅ Passed: 7
   ❌ Failed: 0
   ⚠️  Warnings: 0
   📊 Score: 7/7

🎉 All checks passed!

   Next step: proceed to Part 3 (AWS Storage Check).

🟡 Part 3: AWS Storage Check (15 min)

Overview

This part checks the AWS storage (S3 and ECR) used by the MLOps platform.

Run the Verification Script

cd ../3_storage_setup

# Make the script executable
chmod +x verify_storage.sh

# Set the environment variable and run the script
export USER_NUM="01"  # Change to your own number
./verify_storage.sh

검증 ν•­λͺ©

Step 검증 ν•­λͺ© μ„€λͺ…
1 S3 버킷 mlops-training-user{XX} 버킷 쑴재 및 μ ‘κ·Ό 확인
2 ECR λ ˆμ§€μŠ€νŠΈλ¦¬ mlops-training/user{XX} λ ˆμ§€μŠ€νŠΈλ¦¬ 확인
3 MLflow Artifacts S3 MLflow Artifacts 폴더 확인
4 Pipeline Artifacts Kubeflow Pipeline Artifacts 폴더 확인
5 μ•„ν‚€ν…μ²˜ 전체 μŠ€ν† λ¦¬μ§€ μ•„ν‚€ν…μ²˜ μš”μ•½
6 데이터 흐름 ν•™μŠ΅ β†’ μ €μž₯ β†’ 배포 흐름 μ„€λͺ…

Check the S3 Bucket

# Check that the S3 bucket exists (returns an error if it does not exist or you lack access)
aws s3api head-bucket --bucket mlops-training-user${USER_NUM} --region ap-northeast-2

# List the bucket contents
aws s3 ls s3://mlops-training-user${USER_NUM}/ --region ap-northeast-2

# Check the MLflow artifacts folder
aws s3 ls s3://mlops-training-user${USER_NUM}/mlflow-artifacts/ --region ap-northeast-2
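To confirm both artifact folders from the storage architecture in one step, the sketch below lists the bucket's top-level prefixes. check_artifact_prefixes is a hypothetical helper. S3 has no real folders: a prefix such as mlflow-artifacts/ only appears once at least one object exists under it, so a freshly created bucket may report it as missing.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not in the lab repo): check that the bucket contains the
# mlflow-artifacts/ and kubeflow-pipeline-artifacts/ prefixes.
check_artifact_prefixes() {
  local bucket="$1" prefixes p missing=0
  # List only the top-level "folders" (common prefixes) of the bucket
  prefixes=$(aws s3api list-objects-v2 --bucket "$bucket" --delimiter / \
    --query 'CommonPrefixes[].Prefix' --output text \
    --region "${AWS_REGION:-ap-northeast-2}") || return 1
  for p in mlflow-artifacts/ kubeflow-pipeline-artifacts/; do
    # --output text separates values with tabs; put one prefix per line for grep -x
    if printf '%s\n' "$prefixes" | tr '\t' '\n' | grep -qx -- "$p"; then
      echo "✅ s3://$bucket/$p"
    else
      echo "❌ Missing: s3://$bucket/$p"
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   check_artifact_prefixes "mlops-training-user${USER_NUM}"
```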

Check the ECR Registry

# Check the ECR repository
aws ecr describe-repositories \
    --repository-names mlops-training/user${USER_NUM} \
    --region ap-northeast-2

# Log in to ECR (requires Docker to be installed and running)
aws ecr get-login-password --region ap-northeast-2 | \
    docker login --username AWS --password-stdin \
    $(aws sts get-caller-identity --query Account --output text).dkr.ecr.ap-northeast-2.amazonaws.com
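Images you build are pushed to this repository. The hypothetical ecr_image_uri helper sketched below assembles the full image URI from the same registry host used in the login command. The default tag latest and the image name in the usage comments are assumptions chosen for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not in the lab repo): build the full image URI for your
# ECR repository, using the same registry host as the login command above.
ecr_image_uri() {
  local account="$1" num="$2" tag="${3:-latest}"
  local region="${AWS_REGION:-ap-northeast-2}"
  echo "${account}.dkr.ecr.${region}.amazonaws.com/mlops-training/user${num}:${tag}"
}

# Usage: tag and push a local image (the image name 'my-model' is hypothetical)
#   ACCOUNT=$(aws sts get-caller-identity --query Account --output text)
#   IMAGE=$(ecr_image_uri "$ACCOUNT" "$USER_NUM" v1)
#   docker tag my-model:v1 "$IMAGE"
#   docker push "$IMAGE"
```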

Storage Architecture

┌─────────────────────────────────────────────────────────────┐
│              AWS MLOps Storage Architecture                 │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌─────────────────┐         ┌──────────────────┐           │
│  │   Kubeflow      │         │     MLflow       │           │
│  │   Pipeline      │────────▶│  Tracking Server │           │
│  │                 │         │    (Port 5000)   │           │
│  └────────┬────────┘         └────────┬─────────┘           │
│           │                           │                     │
│           │                  ┌────────▼─────────┐           │
│           │                  │   PostgreSQL     │           │
│           │                  │  (Metadata DB)   │           │
│           │                  └──────────────────┘           │
│           │                                                 │
│  ┌────────▼───────────────────────────────┐                 │
│  │  AWS S3 (Artifacts)                    │◀────────────────│
│  │  Model & Artifact Store                │                 │
│  │                                        │                 │
│  │  📁 mlops-training-user{XX}/           │                 │
│  │     ├── mlflow-artifacts/              │                 │
│  │     └── kubeflow-pipeline-artifacts/   │                 │
│  └────────────────────────────────────────┘                 │
│                                                             │
│  ┌────────────────────────────────────────┐                 │
│  │  AWS ECR (Container Registry)          │◀────────────────│
│  │  Container Images                      │                 │
│  │                                        │                 │
│  │  📦 mlops-training/user{XX}            │                 │
│  └────────────────────────────────────────┘                 │
│                                                             │
└─────────────────────────────────────────────────────────────┘

데이터 흐름

1. ν•™μŠ΅ μ‹€ν–‰
   └─▢ MLflow Tracking
       β”œβ”€β–Ά S3: Model 파일, Artifacts μ €μž₯
       └─▢ PostgreSQL: Metrics, Parameters 기둝

2. λͺ¨λΈ 배포
   β”œβ”€β–Ά S3: λͺ¨λΈ 파일 쑰회
   β”œβ”€β–Ά ECR: μΆ”λ‘  μ„œλ²„ 이미지 μ €μž₯
   └─▢ KServe: InferenceService 생성

3. νŒŒμ΄ν”„λΌμΈ μ‹€ν–‰
   β”œβ”€β–Ά ECR: μ»΄ν¬λ„ŒνŠΈ 이미지 μ‚¬μš©
   β”œβ”€β–Ά S3: μž…λ ₯ 데이터 λ‘œλ“œ
   └─▢ S3: μ‹€ν–‰ κ²°κ³Ό μ €μž₯

🔧 Combined Verification Script

Verify the Whole Environment

To verify all three Parts in one run:

cd ..  # back to the lab1-1_mlops-environment-setup directory

# Run the combined verification script
chmod +x verify_all.sh
export USER_NUM="01"  # Change to your own number
./verify_all.sh

Contents of verify_all.sh

#!/bin/bash
# Lab 1-1 combined verification script

# Work from the script's own directory so the relative paths resolve wherever it is run from
cd "$(dirname "$0")" || exit 1

echo "============================================================"
echo "  Lab 1-1: Combined MLOps Environment Verification"
echo "============================================================"

# Prompt for the user number if it is not already set
if [ -z "$USER_NUM" ]; then
    read -p "Enter your user number (e.g. 01): " USER_NUM
    export USER_NUM
fi

echo ""
echo "👤 User: user${USER_NUM}"
echo ""

# Each part runs in a subshell, so a failed cd cannot break the following parts.
# Running through 'bash' means the sub-scripts do not need execute permission.

# Part 1: Kubeflow verification
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
echo "[Part 1] Kubeflow Tenant Verification"
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
(cd 1_kubeflow_setup && bash ./verify_kubeflow.sh)
PART1_RESULT=$?

# Part 2: MLflow verification
echo ""
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
echo "[Part 2] MLflow Environment Verification"
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
(cd 2_mlflow_setup && bash ./verify_mlflow.sh)
PART2_RESULT=$?

# Part 3: Storage verification
echo ""
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
echo "[Part 3] AWS Storage Verification"
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
(cd 3_storage_setup && bash ./verify_storage.sh)
PART3_RESULT=$?

# Final result
echo ""
echo "============================================================"
echo "  Final Verification Result"
echo "============================================================"
echo ""

TOTAL_FAIL=$((PART1_RESULT + PART2_RESULT + PART3_RESULT))

if [ "$TOTAL_FAIL" -eq 0 ]; then
    echo "🎉 All checks passed!"
    echo ""
    echo "Next step: proceed to Lab 1-2 (Hello Pipeline)."
    exit 0
else
    echo "⚠️  Some checks failed."
    echo ""
    echo "Review the failed Part(s) and contact the instructor."
    # Exit with 1 rather than the raw sum, which would wrap around above 255
    exit 1
fi

🛠️ Troubleshooting

Problem 1: kubectl commands fail

Symptom:

error: You must be logged in to the server (Unauthorized)

Fix:

# Check that the AWS CLI is using the credentials from Step 0-3
aws sts get-caller-identity

# Refresh kubeconfig
aws eks update-kubeconfig \
    --region ap-northeast-2 \
    --name mlops-training-cluster

# Check the connection
kubectl cluster-info

문제 2: Profile Owner Email 뢈일치

증상:

⚠️  Owner email 뢈일치!
   μ˜ˆμƒ: user07@mlops.local
   μ‹€μ œ: user@example.com

ν•΄κ²°: κ°•μ‚¬μ—κ²Œ λ¬Έμ˜ν•˜μ—¬ Profile Owner μˆ˜μ •μ„ μš”μ²­ν•˜μ„Έμš”.

# 강사가 μ‹€ν–‰ν•  λͺ…λ Ήμ–΄
kubectl patch profile profile-user07 --type=merge \
    -p '{"spec":{"owner":{"name":"user07@mlops.local"}}}'

문제 3: S3 버킷 μ ‘κ·Ό κ±°λΆ€

증상:

An error occurred (AccessDenied) when calling the ListBuckets operation

ν•΄κ²°:

# AWS 자격증λͺ… μž¬μ„€μ •
aws configure

# 자격증λͺ… 확인
aws sts get-caller-identity

문제 4: PodDefault μ—†μŒ

증상:

❌ MLflow PodDefault μ—†μŒ: access-mlflow

ν•΄κ²°: κ°•μ‚¬μ—κ²Œ λ¬Έμ˜ν•˜μ—¬ PodDefault 생성을 μš”μ²­ν•˜μ„Έμš”.

문제 5: MLflow Server μ—°κ²° μ‹€νŒ¨

증상:

❌ MLflow Serverλ₯Ό 찾을 수 μ—†μŠ΅λ‹ˆλ‹€

ν•΄κ²°:

# MLflow νŒŒλ“œ μƒνƒœ 확인
kubectl get pods -n mlflow-system

# νŒŒλ“œ 둜그 확인
kubectl logs -n mlflow-system -l app=mlflow-server

문제 6: λ‹€λ₯Έ μ‚¬μš©μž Namespace μ ‘κ·Ό κ°€λŠ₯

증상:

⚠️  λ‹€λ₯Έ λ„€μž„μŠ€νŽ˜μ΄μŠ€ μ ‘κ·Ό κ°€λŠ₯ (κΆŒν•œ 확인 ν•„μš”)

ν•΄κ²°: 이 경우 RBAC 섀정이 μ˜¬λ°”λ₯΄μ§€ μ•Šμ„ 수 μžˆμŠ΅λ‹ˆλ‹€. κ°•μ‚¬μ—κ²Œ NetworkPolicy 및 RBAC μ„€μ • 확인을 μš”μ²­ν•˜μ„Έμš”.


✅ Completion Checklist

Preparation

  • AWS CLI installed and credentials configured
  • kubectl installed and connected to the EKS cluster
  • Environment variables set (USER_NUM, NAMESPACE, S3_BUCKET)
  • GitHub repository cloned

Part 1: Kubeflow Tenant

  • Namespace exists (kubeflow-user{XX})
  • Profile and owner email verified (user{XX}@mlops.local)
  • ServiceAccounts present (default-editor, default-viewer)
  • ResourceQuota verified
  • Permission isolation test passed

Part 2: MLflow Environment

  • MLflow Server running
  • PostgreSQL running
  • MLflow PodDefault present (access-mlflow)
  • Pipeline PodDefault present (access-ml-pipeline)
  • MLflow UI reachable through port forwarding

Part 3: AWS Storage

  • S3 bucket exists (mlops-training-user{XX})
  • ECR registry verified
  • ECR login succeeded

📚 Next Steps

If every check passed, continue to the next lab:

➡️ Lab 1-2: Hello World Pipeline

cd ../lab1-2_hello-pipeline

In Lab 1-2 you build your first ML pipeline with Kubeflow Pipelines.


📞 Support

If you run into a problem, send the instructor the following:

  1. Your user number (e.g. 07)
  2. A screenshot of the verification script output
  3. The full error message
  4. The commands you ran
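For items 2 and 3, a saved log file is often easier to share than a screenshot. A minimal sketch follows, using a hypothetical run_and_log helper. It shows the output as usual, writes a copy with tee, and returns the script's own exit status via PIPESTATUS, since the pipeline status alone would reflect tee and almost always report success.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not in the lab repo): run a verification script, show its
# output, and save a copy to a log file you can send to the instructor.
run_and_log() {
  local logfile="$1"
  shift
  "$@" 2>&1 | tee "$logfile"
  # PIPESTATUS[0] holds the exit status of the first command in the pipeline
  return "${PIPESTATUS[0]}"
}

# Usage:
#   run_and_log "verify_all_user${USER_NUM}.log" ./verify_all.sh
```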

© 2025 Hyundai AutoEver MLOps Training