# APNG Lip Sync Tool Devlog - v4: Face/Mouth Region Detection

## Overview

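v4 uses Gemini Vision to automatically detect the face and mouth regions in an uploaded image. Besides bounding boxes for the face and mouth, the model is asked for the eye positions, the face angle, the art style, and a confidence score.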

## Implementation: face_detector.py

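```python
import json
import mimetypes
from pathlib import Path

from google import genai
from google.genai import types

# Reads the API key from the GEMINI_API_KEY / GOOGLE_API_KEY environment variable.
client = genai.Client()
# GEMINI_MODEL is assumed to be defined in the project's config (not shown here).

def detect_face_regions(image_path: str | Path) -> dict:
    """Detect face and mouth regions using Gemini Vision."""
    prompt = """Analyze this face image and identify the regions.
    Return a JSON object with:
    - face: {x, y, width, height}
    - mouth: {x, y, width, height}
    - eyes: {left: {x, y}, right: {x, y}}
    - face_angle: frontal/profile/three_quarter
    - art_style: realistic/anime/cartoon
    - confidence: 0-1
    """

    # Attach the image as inline bytes, guessing the MIME type from the extension.
    path = Path(image_path)
    mime_type = mimetypes.guess_type(path.name)[0] or "image/png"
    image_part = types.Part.from_bytes(data=path.read_bytes(), mime_type=mime_type)

    response = client.models.generate_content(
        model=GEMINI_MODEL,
        contents=[prompt, image_part],
        # Request raw JSON so json.loads() does not trip over Markdown fences.
        config=types.GenerateContentConfig(response_mime_type="application/json"),
    )
    return json.loads(response.text)
```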
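Model output is not guaranteed to follow the requested schema, so it can help to check the reply before the coordinates are used. The sketch below is an illustration under stated assumptions, not part of the tool as written: it strips an optional Markdown code fence around the reply, requires the `face` and `mouth` entries, and converts them into a small `Box` dataclass. `Box` and `parse_detection` are hypothetical names, and the code assumes `x`/`y` denote the top-left corner in pixels.

```python
import json
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box; assumes (x, y) is the top-left corner in pixels."""
    x: float
    y: float
    width: float
    height: float


# Matches an optional Markdown code fence (three backticks, optional "json" tag).
_FENCE = re.compile(r"^\s*`{3}(?:json)?\s*(.*?)\s*`{3}\s*$", re.DOTALL)


def parse_detection(text: str) -> dict:
    """Parse a detector reply and convert the face/mouth entries into Box objects."""
    match = _FENCE.match(text)
    if match:
        text = match.group(1)
    data = json.loads(text)

    for key in ("face", "mouth"):
        region = data.get(key)
        if not isinstance(region, dict):
            raise ValueError(f"missing or malformed '{key}' region")
        # A missing coordinate key raises KeyError here.
        box = Box(**{k: float(region[k]) for k in ("x", "y", "width", "height")})
        if box.width <= 0 or box.height <= 0:
            raise ValueError(f"'{key}' region has a non-positive size")
        data[key] = box
    return data
```

Raising early keeps malformed detections from silently producing empty or inverted crops further down the pipeline.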

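## Image Validation

`validate_face_image()` runs the detector and turns its output into recommendations; the image counts as valid only when no recommendation is raised.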

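```python
def validate_face_image(image_path) -> dict:
    """Validate that an image is suitable for lip sync."""
    result = detect_face_regions(image_path)

    # Pull out the fields the checks below depend on.
    face_angle = result.get("face_angle")
    confidence = float(result.get("confidence", 0.0))

    recommendations = []
    # Only frontal and three-quarter views pass this check.
    if face_angle not in ("frontal", "three_quarter"):
        recommendations.append("Use frontal view for best results")

    # Low model confidence usually points to a hard-to-read image.
    if confidence < 0.7:
        recommendations.append("Image quality may affect results")

    return {
        "valid": len(recommendations) == 0,
        "recommendations": recommendations,
    }
```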
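Because `validate_face_image()` calls the API before applying its rules, the rules themselves are awkward to unit-test. One option, sketched here as a hypothetical refactor (the name `recommendations_for` and the two constants are not from the tool), is to move the checks into a pure function that takes an already-parsed detection result. The accepted angles and the 0.7 threshold mirror the validation code above.

```python
from typing import Any

# Hypothetical constants mirroring the rules in validate_face_image().
ACCEPTED_ANGLES = frozenset({"frontal", "three_quarter"})
MIN_CONFIDENCE = 0.7


def recommendations_for(result: dict[str, Any]) -> list[str]:
    """Apply the lip-sync suitability rules to a parsed detection result."""
    recommendations = []
    # A missing or unknown face_angle fails the same way a profile view does.
    if result.get("face_angle") not in ACCEPTED_ANGLES:
        recommendations.append("Use frontal view for best results")
    # A missing confidence counts as 0, so it always triggers the warning.
    if float(result.get("confidence", 0.0)) < MIN_CONFIDENCE:
        recommendations.append("Image quality may affect results")
    return recommendations
```

`validate_face_image()` would then reduce to calling `detect_face_regions()` and returning `{"valid": not recs, "recommendations": recs}`, while the rules can be tested with plain dictionaries.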

## Applications

- Automatically validate image quality on upload
- Generate precise visemes from the mouth-region coordinates
- Automatically detect the art style (anime vs. realistic)
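For viseme generation, the detected mouth box usually needs some margin before it is cropped or drawn over. A minimal sketch, assuming `x`/`y` are the top-left corner in pixels and using a hypothetical default padding of 20% per side: it expands the mouth box and clamps it to the image bounds, returning a `(left, upper, right, lower)` tuple, the box format Pillow's `Image.crop()` accepts. `padded_mouth_box` is a hypothetical helper, not part of the tool.

```python
def padded_mouth_box(
    mouth: dict,
    image_size: tuple[int, int],
    pad_ratio: float = 0.2,
) -> tuple[int, int, int, int]:
    """Expand a mouth box by pad_ratio on every side and clamp it to the image.

    Assumes mouth = {"x", "y", "width", "height"} with (x, y) as the top-left
    corner in pixels. Returns (left, upper, right, lower).
    """
    img_w, img_h = image_size
    pad_x = mouth["width"] * pad_ratio
    pad_y = mouth["height"] * pad_ratio

    # Grow the box outward, then clamp each edge to the image bounds.
    left = max(0, round(mouth["x"] - pad_x))
    upper = max(0, round(mouth["y"] - pad_y))
    right = min(img_w, round(mouth["x"] + mouth["width"] + pad_x))
    lower = min(img_h, round(mouth["y"] + mouth["height"] + pad_y))

    if right <= left or lower <= upper:
        raise ValueError("mouth box lies outside the image")
    return left, upper, right, lower
```

With Pillow, the result can be passed straight to `image.crop(box)`; padding proportionally to the box size keeps the margin consistent across image resolutions.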

---

*Next: v5 - Defining the standard mouth-shape set for lip sync*