# Automatic Poster Layer Decomposition with Qwen-Image-Layered (9/10): Deployment and Performance Tuning

## Production Architecture

Moving from the development environment to production, we add the following pieces:

```
[Internet]
    ↓
[Nginx] - SSL, rate limiting, static files
    ↓
[FastAPI] × 3 instances - load balanced
    ↓
[Redis Cluster] - job queue + cache
    ↓
[Worker] × 2 instances - GPU instances
    ↓
[Storage] - S3 or NFS
```

## Docker Containerization

### Backend Dockerfile

`Dockerfile.api`:

```dockerfile
# CUDA base image (GPU support)
FROM nvidia/cuda:12.1.0-runtime-ubuntu22.04

# Install Python
RUN apt-get update && apt-get install -y \
    python3.11 \
    python3-pip \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app

# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the application
COPY app/ ./app/
COPY .env .env

# Model cache volume (avoids re-downloading on restart)
VOLUME ["/app/models"]

# Run FastAPI
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000", "--workers", "2"]
```

### Worker Dockerfile

`Dockerfile.worker`:

```dockerfile
FROM nvidia/cuda:12.1.0-runtime-ubuntu22.04

# Install Python
RUN apt-get update && apt-get install -y \
    python3.11 \
    python3-pip \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# PyTorch with CUDA
RUN pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121

COPY app/ ./app/
COPY worker.py .
COPY .env .env

VOLUME ["/app/models", "/app/storage"]

# Requires a GPU
CMD ["python3", "worker.py"]
```

### docker-compose.yml

```yaml
version: '3.8'

services:
  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
    volumes:
      - redis-data:/data
    command: redis-server --appendonly yes

  api:
    build:
      context: .
      dockerfile: Dockerfile.api
    ports:
      - "8000:8000"
    depends_on:
      - redis
    environment:
      - REDIS_HOST=redis
      - REDIS_PORT=6379
    volumes:
      - ./storage:/app/storage
      - model-cache:/app/models

  worker:
    build:
      context: .
      dockerfile: Dockerfile.worker
    depends_on:
      - redis
    environment:
      - REDIS_HOST=redis
    volumes:
      - ./storage:/app/storage
      - model-cache:/app/models
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

volumes:
  redis-data:
  model-cache:
```
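Once the stack is up (`docker compose up -d --build`), it is worth smoke-testing the full upload-to-result path before putting Nginx in front. Below is a minimal sketch: the `/api/upload` route matches what the Nginx config proxies, but the `/api/jobs/{job_id}` status route, its response fields, and the test image filename are assumptions — adjust them to your actual API from the earlier posts.

```python
# smoke_test.py - end-to-end check of the composed stack.
# NOTE: the job-status route and response fields below are illustrative
# assumptions; only /api/upload is taken from this series' Nginx config.
import time

import requests

BASE = "http://localhost:8000"

# Upload a test poster (any local image works)
with open("test_poster.png", "rb") as f:
    resp = requests.post(f"{BASE}/api/upload", files={"file": f})
resp.raise_for_status()
job_id = resp.json()["job_id"]  # assumed response field
print(f"Submitted job {job_id}")

# Poll until the worker finishes (or give up after 5 minutes)
for _ in range(60):
    status = requests.get(f"{BASE}/api/jobs/{job_id}").json()  # assumed route
    if status.get("status") in ("completed", "failed"):
        print(f"Job finished: {status}")
        break
    time.sleep(5)
else:
    raise TimeoutError("Job did not finish within 5 minutes")
```

If this round-trips successfully, the Redis queue, the worker's GPU access, and the shared storage volume are all wired correctly.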
## Nginx Configuration

`nginx.conf`:

```nginx
upstream api_backend {
    least_conn;
    server api:8000 max_fails=3 fail_timeout=30s;
}

# Rate limiting
limit_req_zone $binary_remote_addr zone=upload_limit:10m rate=5r/m;

server {
    listen 80;
    server_name poster-decomposer.example.com;

    # Redirect HTTP to HTTPS
    return 301 https://$server_name$request_uri;
}

server {
    listen 443 ssl http2;
    server_name poster-decomposer.example.com;

    # SSL certificates
    ssl_certificate /etc/nginx/ssl/fullchain.pem;
    ssl_certificate_key /etc/nginx/ssl/privkey.pem;
    ssl_protocols TLSv1.2 TLSv1.3;

    # Maximum upload size
    client_max_body_size 10M;

    # Static files
    location / {
        root /var/www/frontend;
        try_files $uri $uri/ /index.html;
    }

    # API proxy
    location /api/ {
        proxy_pass http://api_backend;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

        # Long timeouts for long-running jobs
        proxy_read_timeout 300s;
        proxy_connect_timeout 300s;
    }

    # WebSocket
    location /ws/ {
        proxy_pass http://api_backend;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }

    # Rate-limit uploads
    location /api/upload {
        limit_req zone=upload_limit burst=3 nodelay;
        proxy_pass http://api_backend;
    }

    # Result files
    location /results/ {
        alias /var/www/storage/results/;
        expires 1d;
        add_header Cache-Control "public, immutable";
    }
}
```

## Performance Tuning

### FastAPI Settings

```python
# main.py
from fastapi import FastAPI
from fastapi.middleware.gzip import GZipMiddleware
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()

# Gzip compression
app.add_middleware(GZipMiddleware, minimum_size=1000)

# CORS
app.add_middleware(
    CORSMiddleware,
    allow_origins=["https://poster-decomposer.example.com"],
    allow_methods=["GET", "POST"],
    allow_headers=["*"],
)

# Worker count at launch:
#   uvicorn app.main:app --workers 4
# or, if gunicorn manages the processes:
#   gunicorn app.main:app -w 4 -k uvicorn.workers.UvicornWorker
```

### Redis Tuning

`redis.conf`:

```conf
# Memory cap
maxmemory 2gb
maxmemory-policy allkeys-lru

# Persistence off (nothing critical is stored here)
save ""
appendonly no

# Maximum client connections
maxclients 10000
```

### GPU Memory Management

```python
import gc

import torch

# Memory allocation strategy: cap this process at 90% of GPU memory
torch.cuda.set_per_process_memory_fraction(0.9)

def cleanup_gpu_memory():
    """Free cached allocations so the next job starts clean."""
    torch.cuda.empty_cache()
    gc.collect()

# Call when a job finishes
async def process_job(job_id, job_data):
    try:
        ...  # run the decomposition pipeline
    finally:
        cleanup_gpu_memory()
```
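Even with the 90% cap, an unusually large poster can still trip an out-of-memory error mid-job. A common pattern is to catch the OOM, clean up, and retry once at reduced resolution instead of failing the job outright. A minimal sketch follows — `run_pipeline` and its `scale` parameter are hypothetical stand-ins for the decomposition entry point from the earlier posts:

```python
import torch

async def process_job_with_fallback(job_id, job_data):
    """Retry once at half resolution if the GPU runs out of memory."""
    try:
        # run_pipeline / scale are hypothetical names for this series' pipeline
        return await run_pipeline(job_data, scale=1.0)
    except torch.cuda.OutOfMemoryError:
        # Drop cached allocations before retrying at a smaller size
        cleanup_gpu_memory()
        return await run_pipeline(job_data, scale=0.5)
    finally:
        cleanup_gpu_memory()
```

The half-resolution retry trades output quality for reliability; whether that trade-off is acceptable depends on your product requirements.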
## Monitoring

### Prometheus Metrics

```python
# metrics.py
import torch
from fastapi import Response
from prometheus_client import Counter, Histogram, Gauge, generate_latest

# Metric definitions
jobs_total = Counter('jobs_total', 'Total jobs processed', ['status'])
job_duration = Histogram('job_duration_seconds', 'Job processing time')
gpu_memory = Gauge('gpu_memory_bytes', 'GPU memory usage')
queue_length = Gauge('queue_length', 'Job queue length')

# FastAPI endpoint (uses the `app` instance from main.py)
@app.get("/metrics")
async def metrics():
    # Update GPU memory
    if torch.cuda.is_available():
        gpu_memory.set(torch.cuda.memory_allocated())

    # Update queue length (JobQueue from the queue post earlier in the series)
    queue = JobQueue()
    queue_length.set(queue.get_queue_length())

    return Response(generate_latest(), media_type="text/plain")
```

### Grafana Dashboard

`grafana-dashboard.json`:

```json
{
  "dashboard": {
    "title": "Poster Decomposer Monitoring",
    "panels": [
      {
        "title": "Job Success Rate",
        "targets": [{ "expr": "rate(jobs_total{status='completed'}[5m]) / rate(jobs_total[5m])" }]
      },
      {
        "title": "Average Job Duration",
        "targets": [{ "expr": "rate(job_duration_seconds_sum[5m]) / rate(job_duration_seconds_count[5m])" }]
      },
      {
        "title": "GPU Memory Usage",
        "targets": [{ "expr": "gpu_memory_bytes / 1024^3" }]
      },
      {
        "title": "Queue Length",
        "targets": [{ "expr": "queue_length" }]
      }
    ]
  }
}
```

## Error Tracking

### Sentry Integration

```python
import sentry_sdk
from sentry_sdk.integrations.fastapi import FastApiIntegration

sentry_sdk.init(
    dsn="https://xxx@sentry.io/xxx",
    integrations=[FastApiIntegration()],
    traces_sample_rate=0.1,  # trace 10% of requests
    environment="production"
)

# Errors are captured automatically
```

## Autoscaling

### Kubernetes Deployment

`k8s/deployment.yaml`:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: poster-decomposer-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
        - name: api
          image: poster-decomposer-api:latest
          resources:
            requests:
              memory: "2Gi"
              cpu: "1000m"
            limits:
              memory: "4Gi"
              cpu: "2000m"
          ports:
            - containerPort: 8000
---
apiVersion: v1
kind: Service
metadata:
  name: api-service
spec:
  selector:
    app: api
  ports:
    - port: 80
      targetPort: 8000
  type: LoadBalancer
---
# GPU workers (separate node pool)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: poster-decomposer-worker
spec:
  replicas: 2
  selector:
    matchLabels:
      app: worker
  template:
    metadata:
      labels:
        app: worker
    spec:
      nodeSelector:
        cloud.google.com/gke-accelerator: nvidia-tesla-t4
      containers:
        - name: worker
          image: poster-decomposer-worker:latest
          resources:
            limits:
              nvidia.com/gpu: 1
```

### Horizontal Pod Autoscaler

Note that the CPU metric works out of the box, but the `queue_length` pods metric only works if a custom metrics adapter (e.g., prometheus-adapter) exposes it to the Kubernetes metrics API.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: poster-decomposer-api
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Pods
      pods:
        metric:
          name: queue_length
        target:
          type: AverageValue
          averageValue: "5"
```

## Cost Optimization

### Spot Instances (AWS/GCP)

```python
# Add graceful shutdown to worker.py
import signal
import sys

should_shutdown = False

def handle_shutdown(signum, frame):
    """Handle the spot-instance termination signal."""
    global should_shutdown
    print("Received shutdown signal, finishing current job...")
    # Finish only the current job, then exit
    should_shutdown = True

signal.signal(signal.SIGTERM, handle_shutdown)

async def worker_loop():
    while not should_shutdown:
        ...  # process jobs
    print("Worker shut down gracefully")
    sys.exit(0)
```
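One caveat: a bare EC2 spot instance does not send SIGTERM to your process by itself — that signal usually comes from an orchestrator (Kubernetes draining the node, ECS, or a node termination handler). If the worker runs directly on a spot VM, you can instead poll the instance metadata service for the interruption notice and set the same flag. A minimal sketch, assuming IMDSv1 is enabled (IMDSv2 additionally requires a session token):

```python
# spot_watcher.py - poll the EC2 metadata service for a spot interruption
# notice (roughly two minutes of warning) and raise the same shutdown flag.
import asyncio

import requests

SPOT_ACTION_URL = "http://169.254.169.254/latest/meta-data/spot/instance-action"

async def watch_spot_interruption():
    global should_shutdown
    while not should_shutdown:
        try:
            # 404 means no interruption scheduled; 200 means we are being reclaimed
            resp = requests.get(SPOT_ACTION_URL, timeout=1)
            if resp.status_code == 200:
                print("Spot interruption notice received, draining...")
                should_shutdown = True
        except requests.RequestException:
            pass  # metadata service unreachable; retry on the next tick
        await asyncio.sleep(5)
```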
pass print("Worker shutdown gracefully") sys.exit(0) ``` ### 사용량 기반 스케일링 ```python # auto_scaler.py class WorkerAutoScaler: def __init__(self): self.queue = JobQueue() self.k8s_client = kubernetes.client.AppsV1Api() async def scale_loop(self): """큐 길이 기반 Worker 자동 스케일""" while True: queue_len = self.queue.get_queue_length() if queue_len > 20: # Worker 증가 self.scale_workers(4) elif queue_len < 5: # Worker 감소 self.scale_workers(1) await asyncio.sleep(60) # 1분마다 체크 def scale_workers(self, count: int): self.k8s_client.patch_namespaced_deployment_scale( name="poster-decomposer-worker", namespace="default", body={"spec": {"replicas": count}} ) ``` ## 백업 및 복구 ### 주기적 백업 ```bash #!/bin/bash # backup.sh # Redis 백업 redis-cli SAVE cp /var/lib/redis/dump.rdb /backups/redis_$(date +%Y%m%d).rdb # 결과 파일 S3 동기화 aws s3 sync /var/www/storage/results s3://poster-decomposer-results/ # 7일 이상 된 백업 삭제 find /backups -name "redis_*.rdb" -mtime +7 -delete ``` Cron: ``` 0 2 * * * /opt/scripts/backup.sh ``` ## 다음 단계 v10에서는 **프로덕션 운영 가이드**를 다룬다: - 일반적인 장애 대응 - 성능 이슈 디버깅 - 사용자 피드백 수집 - 향후 확장 로드맵 배포가 완료되었으니, 안정적인 운영 방법을 정리하자. --- **이전 글**: [후처리 및 최적화 (8/10)](./qwen-image-layered-v8.md) **다음 글**: [프로덕션 운영 가이드 (10/10)](./qwen-image-layered-v10.md)