# Qwen-Image-Layered로 포스터 자동 레이어 분해 (6/10): Vertex AI 통합

## Gemini Vision API 활용 전략

Qwen-Image-Layered는 레이어를 분해하지만, 각 레이어가 무엇인지 설명하지 않는다. 사용자는 `layer_0.png`, `layer_1.png` 파일만 받게 된다.

**사용자 경험 개선**:
- "이 레이어는 배경입니다"
- "이 레이어는 메인 타이틀 텍스트입니다"
- "최적 레이어 수는 5개입니다"

이를 위해 Google Vertex AI의 Gemini Vision API를 통합한다.

## Vertex AI 세팅

### Google Cloud 프로젝트 생성

```bash
# gcloud CLI 설치 (이미 설치되어 있다면 생략)
curl https://sdk.cloud.google.com | bash
exec -l $SHELL

# 로그인
gcloud auth login

# 프로젝트 생성
gcloud projects create qwen-layered-001 --name="Qwen Layered"

# 프로젝트 설정
gcloud config set project qwen-layered-001

# Vertex AI API 활성화
gcloud services enable aiplatform.googleapis.com
```

### 서비스 계정 생성

```bash
# 서비스 계정 생성
gcloud iam service-accounts create qwen-layered-sa \
    --display-name="Qwen Layered Service Account"

# Vertex AI 권한 부여
gcloud projects add-iam-policy-binding qwen-layered-001 \
    --member="serviceAccount:qwen-layered-sa@qwen-layered-001.iam.gserviceaccount.com" \
    --role="roles/aiplatform.user"

# 키 생성
gcloud iam service-accounts keys create service-account-key.json \
    --iam-account=qwen-layered-sa@qwen-layered-001.iam.gserviceaccount.com

# .env에 추가
echo "GOOGLE_APPLICATION_CREDENTIALS=./service-account-key.json" >> .env
echo "GOOGLE_CLOUD_PROJECT=qwen-layered-001" >> .env
echo "VERTEX_AI_LOCATION=us-central1" >> .env
```

## Vertex AI 클라이언트 구현

`models/gemini_analyzer.py`:

```python
from google.cloud import aiplatform
from vertexai.generative_models import GenerativeModel, Part, Image as VertexImage
import base64
from typing import List, Dict
from PIL import Image
import io

class GeminiLayerAnalyzer:
    def __init__(self):
        """Gemini Vision 클라이언트 초기화"""
        # Vertex AI 초기화
        project_id = os.getenv("GOOGLE_CLOUD_PROJECT")
        location = os.getenv("VERTEX_AI_LOCATION", "us-central1")

        aiplatform.init(project=project_id, location=location)

        # Gemini 1.5 Flash 모델
        self.model = GenerativeModel("gemini-1.5-flash")

        print(f"✓ Gemini Vision 클라이언트 초기화 완료 (Project: {project_id})")

    async def analyze_layer(self, layer_image: Image.Image, layer_index: int) -> str:
        """단일 레이어 분석 및 설명 생성"""

        # PIL Image → bytes
        img_byte_arr = io.BytesIO()
        layer_image.save(img_byte_arr, format='PNG')
        img_bytes = img_byte_arr.getvalue()

        # Vertex AI Image 객체
        vertex_image = VertexImage.from_bytes(img_bytes)

        # 프롬프트
        prompt = f"""
이 이미지는 포스터를 레이어로 분해한 결과 중 레이어 {layer_index}입니다.
RGBA 형식이므로 투명한 부분이 있을 수 있습니다.

이 레이어에 포함된 요소를 한국어로 간단히 설명해주세요.
5-10단어 이내로 핵심만 작성하세요.

예시:
- "파란색 그라디언트 배경"
- "메인 타이틀 텍스트 (AI Seminar)"
- "로봇 일러스트"
- "하단 주최사 로고"

설명:
"""

        # Gemini API 호출
        response = await self.model.generate_content_async([
            prompt,
            vertex_image
        ])

        description = response.text.strip()
        return description

    async def analyze_all_layers(
        self,
        layers: List[Image.Image]
    ) -> List[str]:
        """모든 레이어 분석"""
        descriptions = []

        for i, layer in enumerate(layers):
            print(f"  Analyzing layer {i}...")
            description = await self.analyze_layer(layer, i)
            descriptions.append(description)

        return descriptions

    async def suggest_layer_count(
        self,
        original_image: Image.Image
    ) -> Dict[str, any]:
        """최적 레이어 수 추천"""

        # 이미지 → bytes
        img_byte_arr = io.BytesIO()
        original_image.save(img_byte_arr, format='PNG')
        img_bytes = img_byte_arr.getvalue()

        vertex_image = VertexImage.from_bytes(img_bytes)

        prompt = """
이 포스터 이미지를 분석하여 레이어 분해 시 최적 레이어 수를 추천해주세요.

다음 JSON 형식으로 응답하세요:
{
  "recommended_layers": 5,
  "reason": "배경, 메인 이미지, 타이틀, 부제목, 로고로 구분 가능",
  "complexity": "medium"
}

complexity는 "simple", "medium", "complex" 중 하나입니다.

JSON:
"""

        response = await self.model.generate_content_async([
            prompt,
            vertex_image
        ])

        # JSON 파싱
        import json
        import re

        text = response.text
        # JSON 부분만 추출
        json_match = re.search(r'\{.*\}', text, re.DOTALL)
        if json_match:
            result = json.loads(json_match.group())
            return result
        else:
            # 파싱 실패 시 기본값
            return {
                "recommended_layers": 5,
                "reason": "표준 레이어 구조",
                "complexity": "medium"
            }

    async def evaluate_quality(
        self,
        original_image: Image.Image,
        layers: List[Image.Image]
    ) -> Dict[str, any]:
        """레이어 분해 품질 평가"""

        # 원본 이미지와 레이어 재합성 이미지 비교
        from PIL import ImageChops

        # 레이어 합성
        composite = Image.new("RGBA", layers[0].size, (0, 0, 0, 0))
        for layer in layers:
            composite = Image.alpha_composite(composite, layer)

        # 원본과 비교
        original_rgb = original_image.convert("RGBA")
        diff = ImageChops.difference(original_rgb, composite)

        # 차이 계산
        import numpy as np
        diff_array = np.array(diff)
        avg_diff = np.mean(diff_array)

        # 품질 점수 (0-100)
        quality_score = max(0, 100 - (avg_diff / 255 * 100))

        return {
            "quality_score": round(quality_score, 2),
            "avg_pixel_diff": round(avg_diff, 2),
            "evaluation": "Good" if quality_score > 80 else "Fair" if quality_score > 60 else "Poor"
        }
```

## 워커에 Gemini 통합

`worker.py` 수정:

```python
from app.models.gemini_analyzer import GeminiLayerAnalyzer

async def process_job(job_id: str, job_data: dict):
    """작업 처리 (Gemini 통합)"""
    queue = JobQueue()
    decomposer = QwenDecomposer()
    gemini = GeminiLayerAnalyzer()  # 추가
    settings = get_settings()

    try:
        # ... 이전 코드 (레이어 분해) ...

        # Gemini로 레이어 분석
        queue.update_job(
            job_id,
            progress=90,
            message="AI로 레이어 설명 생성 중..."
        )

        descriptions = await gemini.analyze_all_layers(layers)

        # 레이어 정보 생성 (설명 포함)
        layer_info = []
        for i, layer in enumerate(layers):
            filename = f"layer_{i}.png"
            filepath = os.path.join(result_dir, filename)
            layer.save(filepath, format="PNG")

            size_kb = os.path.getsize(filepath) / 1024

            layer_info.append({
                "index": i,
                "filename": filename,
                "url": f"/results/{job_id}/{filename}",
                "size_kb": round(size_kb, 2),
                "description": descriptions[i]  # Gemini 설명
            })

        # 품질 평가
        original_image = Image.open(job_data["image_path"])
        quality = await gemini.evaluate_quality(original_image, layers)

        # 완료 (품질 정보 포함)
        queue.update_job(
            job_id,
            status="completed",
            progress=100,
            message=f"완료! (품질: {quality['quality_score']}/100)",
            layers=layer_info
        )

    except Exception as e:
        # ... 에러 처리 ...
```

## API 엔드포인트 추가

`api/analyze.py` (신규):

```python
from fastapi import APIRouter, HTTPException, UploadFile, File
from app.models.gemini_analyzer import GeminiLayerAnalyzer
from PIL import Image
import io

router = APIRouter()

@router.post("/analyze/suggest-layers")
async def suggest_layer_count(file: UploadFile = File(...)):
    """최적 레이어 수 추천"""
    gemini = GeminiLayerAnalyzer()

    # 이미지 로드
    contents = await file.read()
    image = Image.open(io.BytesIO(contents))

    # Gemini 분석
    result = await gemini.suggest_layer_count(image)

    return {
        "recommended_layers": result["recommended_layers"],
        "reason": result["reason"],
        "complexity": result["complexity"]
    }
```

`main.py`에 라우터 추가:

```python
from app.api import analyze

app.include_router(analyze.router, prefix="/api", tags=["analyze"])
```

## 테스트

### 레이어 수 추천 테스트

```bash
curl -X POST http://localhost:8000/api/analyze/suggest-layers \
  -F "file=@poster.jpg"

# 응답:
{
  "recommended_layers": 5,
  "reason": "배경 그라디언트, 메인 이미지, 타이틀, 날짜 정보, 로고로 구성",
  "complexity": "medium"
}
```

### 레이어 설명 확인

```bash
# 분해 작업 생성
JOB_ID=$(curl -X POST http://localhost:8000/api/decompose \
  -H "Content-Type: application/json" \
  -d '{"file_id": "uuid", "num_layers": 5, "resolution": 1024}' \
  | jq -r '.job_id')

# 완료 대기
sleep 60

# 결과 확인
curl http://localhost:8000/api/status/$JOB_ID | jq '.layers'

# 응답:
[
  {
    "index": 0,
    "filename": "layer_0.png",
    "url": "/results/uuid/layer_0.png",
    "size_kb": 523.4,
    "description": "파란색에서 보라색으로 그라디언트 배경"
  },
  {
    "index": 1,
    "filename": "layer_1.png",
    "url": "/results/uuid/layer_1.png",
    "size_kb": 312.8,
    "description": "중앙의 로봇 일러스트"
  },
  {
    "index": 2,
    "filename": "layer_2.png",
    "url": "/results/uuid/layer_2.png",
    "size_kb": 156.2,
    "description": "AI Seminar 메인 타이틀 텍스트"
  },
  {
    "index": 3,
    "filename": "layer_3.png",
    "url": "/results/uuid/layer_3.png",
    "size_kb": 89.5,
    "description": "날짜 및 장소 정보 텍스트"
  },
  {
    "index": 4,
    "filename": "layer_4.png",
    "url": "/results/uuid/layer_4.png",
    "size_kb": 67.1,
    "description": "하단 주최사 로고"
  }
]
```

## 비용 최적화

Gemini Vision API는 유료 서비스다. 비용 최적화 전략:

### 1. 캐싱

```python
import hashlib
import json

class GeminiLayerAnalyzer:
    def __init__(self):
        self.cache = {}  # 간단한 메모리 캐시

    def _get_image_hash(self, image: Image.Image) -> str:
        """이미지 해시 생성"""
        img_bytes = io.BytesIO()
        image.save(img_bytes, format='PNG')
        return hashlib.md5(img_bytes.getvalue()).hexdigest()

    async def analyze_layer(self, layer_image: Image.Image, layer_index: int) -> str:
        # 캐시 확인
        img_hash = self._get_image_hash(layer_image)
        cache_key = f"{img_hash}_{layer_index}"

        if cache_key in self.cache:
            print(f"  Cache hit for layer {layer_index}")
            return self.cache[cache_key]

        # API 호출
        description = await self._call_gemini_api(layer_image, layer_index)

        # 캐시 저장
        self.cache[cache_key] = description

        return description
```

### 2. 선택적 분석

```python
# 사용자가 요청할 때만 분석
@router.post("/decompose")
async def decompose_image(
    file_id: str = Body(...),
    job_params: JobCreate = Body(...),
    analyze_with_ai: bool = Body(default=False)  # 추가
):
    ...
```

### 3. Batch 처리

```python
# 여러 레이어를 한 번의 API 호출로
async def analyze_all_layers_batch(self, layers: List[Image.Image]) -> List[str]:
    """배치 분석 (비용 절감)"""
    # 모든 레이어를 하나의 이미지로 합성
    grid_image = create_grid_image(layers)

    prompt = """
이 이미지는 여러 레이어를 그리드로 배열한 것입니다.
각 레이어를 간단히 설명하세요.

응답 형식:
1. [레이어 0 설명]
2. [레이어 1 설명]
...
"""

    response = await self.model.generate_content_async([prompt, grid_image])

    # 파싱
    descriptions = parse_numbered_list(response.text)
    return descriptions
```

### 4. Gemini Flash vs Pro 선택

```python
class GeminiLayerAnalyzer:
    def __init__(self, use_flash=True):
        # Flash: 저렴, 빠름
        # Pro: 비싸지만 더 정확
        model_name = "gemini-1.5-flash" if use_flash else "gemini-1.5-pro"
        self.model = GenerativeModel(model_name)
```

**가격 비교** (2024년 12월 기준):

```
Gemini 1.5 Flash:
- Input: $0.000125 / 1K chars
- Output: $0.000375 / 1K chars

Gemini 1.5 Pro:
- Input: $0.00125 / 1K chars (10배)
- Output: $0.005 / 1K chars (13배)
```

**권장**: 대부분 Flash로 충분. Pro는 복잡한 분석에만.

## 비용 모니터링

```python
# monitoring.py
from google.cloud import billing_v1

class CostMonitor:
    def __init__(self):
        self.billing_client = billing_v1.CloudBillingClient()

    def get_month_to_date_cost(self, project_id: str):
        """월간 누적 비용 조회"""
        # 구현 생략 (Google Cloud Billing API 사용)
        pass

    def check_budget_alert(self, threshold_usd: float):
        """예산 초과 알림"""
        current_cost = self.get_month_to_date_cost()

        if current_cost > threshold_usd:
            # Slack/Email 알림
            send_alert(f"Vertex AI 비용 초과: ${current_cost}")
```

## 대체 전략: 로컬 Vision 모델

Vertex AI 비용이 부담스럽다면 로컬 모델 사용:

### LLaVA (Local Vision-Language Model)

```python
from transformers import LlavaNextProcessor, LlavaNextForConditionalGeneration

class LocalVisionAnalyzer:
    def __init__(self):
        self.processor = LlavaNextProcessor.from_pretrained(
            "llava-hf/llava-v1.6-mistral-7b-hf"
        )
        self.model = LlavaNextForConditionalGeneration.from_pretrained(
            "llava-hf/llava-v1.6-mistral-7b-hf",
            torch_dtype=torch.float16,
            device_map="auto"
        )

    def analyze_layer(self, image: Image.Image, prompt: str) -> str:
        inputs = self.processor(prompt, image, return_tensors="pt").to("cuda")
        output = self.model.generate(**inputs, max_new_tokens=100)
        description = self.processor.decode(output[0], skip_special_tokens=True)
        return description
```

**장점**: 무료, 오프라인 가능
**단점**: 품질 낮음, GPU 메모리 추가 필요

## 다음 단계

v7에서는 **웹 인터페이스 구현**을 다룬다:
- HTML/CSS/JavaScript UI
- Drag & Drop 업로드
- WebSocket 실시간 진행률
- 레이어 프리뷰 및 다운로드

백엔드 + AI가 완성되었으니, 사용자 인터페이스를 만들자.

---

**이전 글**: [FastAPI 백엔드 구축 (5/10)](./qwen-image-layered-v5.md)

**다음 글**: [웹 인터페이스 구현 (7/10)](./qwen-image-layered-v7.md)