Date: 2024-12-06 Topic: Multi-scenario compliance checking with VL+LLM
Today I built a health food advertisement review system that checks three types of compliance violations: disease treatment claims, non-pharmaceutical statements, and proper identification marks.
- Disease Prevention/Treatment Claims: Health foods cannot claim to prevent or treat diseases
- Non-Pharmaceutical Statement: Must clearly state the product is not a medicine
- Identification Mark: Must display proper certification marks (like the "Blue Hat" symbol)
The system uses Pydantic models, with a shared base class for the common judgment fields and separate models per scenario:

```python
from typing import List, Optional
from pydantic import BaseModel

# Base judgment - common to all checks
class BaseJudgement(BaseModel):
    is_or_not: bool
    reason: str
    reference: str

# Scenario-specific models
class DiseaseFunctionCheck(BaseModel):
    has_disease_function: bool
    disease_claims: List[str]
    is_valid: bool

# Combined response (BaseResponse and HealthFoodAnalysisResult
# are defined elsewhere in the project)
class HealthFoodAnalysisResponse(BaseResponse):
    data: Optional[HealthFoodAnalysisResult]
```

The prompt needs to produce structured JSON output:
```python
HEALTH_FOOD_LLM_PROMPT = """
You are a health food advertisement review assistant.
Analyze the following text and check for compliance issues.
Output JSON strictly in this format:
{
    "is_or_not": false,
    "reason": "Review found issues...",
    "reference": "Regulatory basis..."
}
"""
```

Key considerations:
- Define exact JSON structure
- Specify field types explicitly
- Handle Chinese character encoding
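To make those considerations concrete, here is a sketch of a tolerant parser for the prompt's output. The fence-stripping heuristic and the function name `parse_llm_judgement` are my assumptions, not from the original system:

```python
import json
from typing import Optional

from pydantic import BaseModel, ValidationError

class BaseJudgement(BaseModel):
    is_or_not: bool
    reason: str
    reference: str

def parse_llm_judgement(raw: str) -> Optional[BaseJudgement]:
    """Parse the model's reply into a BaseJudgement, or None on failure."""
    text = raw.strip()
    # LLMs often wrap JSON in ```json fences despite explicit instructions
    if text.startswith("```"):
        text = text.strip("`").strip()
        if text.startswith("json"):
            text = text[len("json"):]
    try:
        # json.loads handles Chinese characters natively (JSON is Unicode)
        return BaseJudgement(**json.loads(text))
    except (json.JSONDecodeError, ValidationError, TypeError):
        return None
```

Returning `None` instead of raising keeps the decision about how to respond in the business-logic layer.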
Error handling is organized into four layers:
1. File Validation Layer → Check file type and size
2. OCR Processing Layer → Handle text recognition failures
3. JSON Parsing Layer → Handle format errors
4. Business Logic Layer → Handle validation failures
Each layer catches and transforms errors into appropriate responses.
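The layering can be sketched with typed exceptions that the top level maps to response codes. The class names, codes, and the 10 MB limit are assumptions for illustration, not the system's actual values:

```python
class ReviewError(Exception):
    code = 500  # default: internal error

class FileValidationError(ReviewError):
    code = 400  # bad file type or size

class OCRError(ReviewError):
    code = 502  # text recognition failed

MAX_FILE_SIZE = 10 * 1024 * 1024  # assumed 10 MB limit

def validate_file(filename: str, size: int) -> None:
    if not filename.lower().endswith((".jpg", ".png", ".pdf")):
        raise FileValidationError(f"unsupported file type: {filename}")
    if size > MAX_FILE_SIZE:
        raise FileValidationError("file too large")

def review(filename: str, size: int) -> dict:
    try:
        validate_file(filename, size)
        # ... OCR, LLM call, JSON parsing, business checks would follow
        return {"code": 200, "msg": "ok"}
    except ReviewError as e:
        # transform the typed error into an appropriate API response
        return {"code": e.code, "msg": str(e)}
```

Each lower layer only raises; only the outermost function decides what the client sees.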
Building a multi-scenario review system forced me to think carefully about model design. The key insight was using composition over inheritance - each scenario check is a separate model that gets combined into the final result.
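That composition might look like the following sketch. The field names and the overall `is_compliant` property are my assumptions about how the per-scenario models get combined:

```python
from typing import List
from pydantic import BaseModel

class BaseJudgement(BaseModel):
    is_or_not: bool
    reason: str
    reference: str

class DiseaseFunctionCheck(BaseModel):
    has_disease_function: bool
    disease_claims: List[str]
    is_valid: bool

# Hypothetical composed result: one field per scenario check
class HealthFoodAnalysisResult(BaseModel):
    disease_check: DiseaseFunctionCheck
    non_drug_statement: BaseJudgement
    identification_mark: BaseJudgement

    @property
    def is_compliant(self) -> bool:
        # The advertisement passes only if every scenario check passes
        return (
            self.disease_check.is_valid
            and self.non_drug_statement.is_or_not
            and self.identification_mark.is_or_not
        )
```

Adding a fourth scenario is then just one more field, with no changes to the existing checks.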
The prompt engineering was tricky. LLMs tend to be "creative" with output format, so I had to be very explicit about JSON structure. Including example outputs in the prompt helped consistency.
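One way to embed an example output is a few-shot pair appended to the prompt. This wording is hypothetical, not the prompt actually used; the sanity check matters because a malformed example teaches the model a broken format:

```python
import json

# Hypothetical few-shot addition: a worked input/output pair pins down
# the JSON shape more reliably than a schema description alone
FEW_SHOT_EXAMPLE = """
Input: "This tea can cure diabetes."
Output:
{"is_or_not": false, "reason": "Claims the product treats a disease (diabetes).", "reference": "Health foods may not claim disease prevention or treatment."}
"""

# Verify the embedded example is itself valid JSON before shipping the prompt
example_output = FEW_SHOT_EXAMPLE.split("Output:")[1].strip()
parsed = json.loads(example_output)
```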
One challenge: the LLM sometimes mixed Chinese and English in responses. Adding explicit language instructions helped, but it's not 100% reliable.
Topics to dig into next:
- Pydantic validators and computed fields
- Few-shot prompting for structured output
- Chinese text processing in LLMs
- Regulatory compliance system design