
Health Food Advertisement Review System

Date: 2024-12-06 Topic: Multi-scenario compliance checking with VL+LLM


Background

Today I built a health food advertisement review system that checks three types of compliance violations: disease treatment claims, non-pharmaceutical statements, and proper identification marks.


The Three Review Scenarios

  1. Disease Prevention/Treatment Claims: Health foods cannot claim to prevent or treat diseases
  2. Non-Pharmaceutical Statement: Must clearly state the product is not a medicine
  3. Identification Mark: Must display proper certification marks (like the "Blue Hat" symbol)

Data Model Design

The system uses Pydantic models: a shared base judgment for the fields common to every check, plus scenario-specific models that are composed into the final result:

```python
from typing import List, Optional

from pydantic import BaseModel

# Base judgment - fields common to all three checks
class BaseJudgement(BaseModel):
    is_or_not: bool
    reason: str
    reference: str

# Scenario-specific models
class DiseaseFunctionCheck(BaseModel):
    has_disease_function: bool
    disease_claims: List[str]
    is_valid: bool

# Combined response (BaseResponse is the project's shared response
# envelope; HealthFoodAnalysisResult aggregates the per-scenario checks)
class HealthFoodAnalysisResponse(BaseResponse):
    data: Optional[HealthFoodAnalysisResult]
```
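As a minimal sketch of how an LLM reply gets validated against the base model (assuming Pydantic v2; `parse_judgement` is a hypothetical helper, not part of the original code):

```python
from typing import Optional

from pydantic import BaseModel, ValidationError

class BaseJudgement(BaseModel):
    is_or_not: bool
    reason: str
    reference: str

# Parse a raw LLM reply into the judgment model; return None when the
# reply is not valid JSON or does not match the schema, so callers can
# fall back to an error response instead of crashing.
def parse_judgement(raw: str) -> Optional[BaseJudgement]:
    try:
        return BaseJudgement.model_validate_json(raw)
    except ValidationError:
        return None

good = parse_judgement(
    '{"is_or_not": false, "reason": "claims to cure disease", '
    '"reference": "Advertising Law"}'
)
bad = parse_judgement('{"is_or_not": "maybe"}')  # missing fields -> None
```

Returning `None` rather than raising keeps schema failures inside the JSON parsing layer described below.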

LLM Prompt Design

The prompt needs to produce structured JSON output:

```python
HEALTH_FOOD_LLM_PROMPT = """
You are a health food advertisement review assistant.
Analyze the following text and check for compliance issues.

Output JSON strictly in this format:
{
    "is_or_not": false,
    "reason": "Review found issues...",
    "reference": "Regulatory basis..."
}
"""
```

Key considerations:

  • Define exact JSON structure
  • Specify field types explicitly
  • Handle Chinese character encoding
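The encoding point is easy to trip over: Python's `json.dumps` escapes non-ASCII characters by default, which makes Chinese review results unreadable in logs. A small illustration (the judgment dict here is hypothetical):

```python
import json

# Hypothetical judgment as the LLM might return it, with a Chinese reason.
judgement = {
    "is_or_not": False,
    "reason": "广告宣称可以治疗疾病",  # "the ad claims to treat disease"
    "reference": "《中华人民共和国广告法》",  # PRC Advertising Law
}

# Default serialization escapes every Chinese character to \uXXXX...
escaped = json.dumps(judgement)

# ...while ensure_ascii=False keeps the text human-readable.
readable = json.dumps(judgement, ensure_ascii=False)
```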

Multi-Layer Error Handling

1. File Validation Layer   → Check file type and size
2. OCR Processing Layer    → Handle text recognition failures
3. JSON Parsing Layer      → Handle format errors
4. Business Logic Layer    → Handle validation failures

Each layer catches and transforms errors into appropriate responses.
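The four layers can be sketched as a single pipeline in which each layer catches its own failure mode and converts it into a structured response (function name, codes, and messages are illustrative, not from the original system; a plain `decode` stands in for real OCR):

```python
import json

def review(file_bytes: bytes) -> dict:
    # 1. File validation layer: reject empty/unsupported input up front.
    if not file_bytes:
        return {"code": 400, "msg": "empty or unsupported file"}

    # 2. OCR processing layer: text extraction can fail on bad input
    #    (decode is a stand-in for the real OCR call).
    try:
        text = file_bytes.decode("utf-8")
    except UnicodeDecodeError:
        return {"code": 422, "msg": "text recognition failed"}

    # 3. JSON parsing layer: the LLM may not return valid JSON.
    try:
        judgement = json.loads(text)
    except json.JSONDecodeError:
        return {"code": 502, "msg": "LLM output was not valid JSON"}

    # 4. Business logic layer: JSON may parse but still miss required fields.
    if "is_or_not" not in judgement:
        return {"code": 502, "msg": "missing required field"}

    return {"code": 200, "data": judgement}
```

Because every layer returns a response rather than re-raising, the caller only ever sees one shape of result.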


Today's Reflection

Building a multi-scenario review system forced me to think carefully about model design. The key insight was using composition over inheritance - each scenario check is a separate model that gets combined into the final result.

The prompt engineering was tricky. LLMs tend to be "creative" with output format, so I had to be very explicit about the JSON structure. Including example outputs in the prompt noticeably improved consistency.

One challenge: the LLM sometimes mixed Chinese and English in responses. Adding explicit language instructions helped, but it's not 100% reliable.


Further Learning

  • Pydantic validators and computed fields
  • Few-shot prompting for structured output
  • Chinese text processing in LLMs
  • Regulatory compliance system design