Implement flag (report) system for content moderation #192

@dahlia

Note

This issue is based on the Korean specification document Hackers' Pub 신고(flag) 기능 기획서 (Hackers' Pub flag feature specification).

Summary

This issue proposes implementing a comprehensive flag (report) system for Hackers' Pub that allows users to report content or accounts that violate the code of conduct. The system should support the federated nature of ActivityPub while protecting reporter anonymity and ensuring fair treatment of reported users.

Core philosophy

The flag system's ultimate purpose is education and growth, not creating a sterile community of perfect users. Reports should help individuals reflect on their behavior and grow as better community members.

Expulsion is the last resort. The system encourages a graduated approach:

  1. Awareness — The reported user learns their behavior may be problematic
  2. Reflection — They understand why the behavior is problematic
  3. Improvement — They modify their behavior and participate harmoniously
  4. Sanctions — Applied only when there's no willingness to improve or for severe violations

Design principles

Reporter protection

  • Reporter identity must be kept strictly confidential
  • Only moderators can access reporter information
  • The reported user only sees the report reason and violated rules, never who reported them
  • This protection is essential for the reporting system to function effectively

Right to know for reported users

  • Reported users have the right to know why they were reported
  • They receive information about which code of conduct provisions were violated
  • They see which content was flagged
  • This enables them to understand and improve their behavior

Flexible code of conduct referencing

  • The code of conduct is a living document that evolves over time
  • Report reasons should not be hardcoded to specific clause numbers
  • The system records the code of conduct version (Git commit hash) at report time
  • LLM analysis references the current full code of conduct text dynamically
  • Original reporter-written reasons are always preserved

Transparent processing

  • Reported users are informed of actions taken and reasons
  • Moderator decision rationale is recorded

Important

Reporter notification requires careful consideration. Notifying reporters of outcomes could enable malicious actors to probe moderation boundaries through repeated reports. The system should:

  • Distinguish between internal reports (from local users) and external reports (from remote instances via ActivityPub Flag)
  • Consider limiting detailed outcome notifications to trusted reporters or omitting them entirely
  • At minimum, avoid revealing specific sanction details that could be exploited

Graduated sanctions

  • Sanctions are proportional to violation severity and frequency
  • Warning → Content censorship → Temporary suspension → Permanent suspension
  • Violation history accumulates and affects subsequent sanction levels
  • Severe violations may skip stages for immediate strong action

Report targets

Content reports

Users can report individual pieces of content:

Article reports

  • Target: Long-form blog-style posts
  • Location: "Report" option in post footer or overflow menu
  • Collected data:
    • Article ID and permalink
    • Author information
    • Content snapshot at report time (evidence preservation)
    • Reporter's written reason

Note reports

  • Target: Short microblog-style posts
  • Location: "Report" option in overflow menu
  • Collected data: Same as articles

User reports

For problematic behavior patterns across multiple posts:

  • Use cases:
    • Persistent problematic behavior across multiple posts
    • Individual posts are borderline, but the overall pattern is problematic
    • Profile itself (name, bio, avatar) violates code of conduct
  • Location: "Report user" option in profile page overflow menu
  • Collected data:
    • User ID and profile link
    • Profile snapshot at report time
    • Reporter's written reason
    • (Optional) Related content links

Remote content and users

Content and users from other ActivityPub instances can be reported identically:

  • Federated timeline content affects Hackers' Pub users
  • Actions affect display/federation within Hackers' Pub
  • Optional: Reporter may explicitly opt in to forward the report to the remote instance via ActivityPub Flag activity (see Cross-instance report forwarding)

Report process

Report flow

```mermaid
flowchart TB
    start[User clicks Report<br>on content/user]
    form[Report form displayed]
    reason[Write reason<br>in free-form text]
    submit[Submit report]
    llm[LLM analyzes reason]
    coc[(Code of Conduct)]
    save[Save report<br>with pending status]
    notify_mod[Notify moderators]
    notify_reporter[Send confirmation<br>to reporter]

    start --> form --> reason --> submit --> llm
    coc -.-> llm
    llm --> save --> notify_mod --> notify_reporter
```

Report form

The form should be simple while collecting necessary information.

Required field: Report reason (free-form text)

Please explain why you are reporting this content/user.
You don't need to know the specific code of conduct provisions.
Feel free to describe what felt uncomfortable or problematic.

[                                        ]
[                                        ]
[                                        ]

Minimum 10 characters required.

Rationale:

  • Users shouldn't need to know all code of conduct provisions
  • Free-form input captures richer context
  • LLM analyzes the reason and matches relevant provisions

Optional field: Additional content links (for user reports)

If there are other related content items, add links here. (Optional)

[Add link +]

Optional field: Cross-instance forwarding (for remote content/users)

When reporting content or users from other ActivityPub instances, reporters may choose to forward the report:

☐ Also send this report to the remote instance (@user@remote.example)

Note: If enabled, a Flag activity will be sent to the remote server's
moderators. Your identity will NOT be revealed to the remote instance;
only the report reason and target will be shared.

This is opt-in by the reporter, not automatic.

LLM-based code of conduct matching

When a report is submitted, the LLM analyzes the reason and identifies relevant code of conduct provisions.

Warning

LLM analysis is a reference tool for moderators, not an automated decision system. LLMs can exhibit biases, particularly against marginalized communities. Research by Timnit Gebru, DAIR Institute, and others has documented how AI systems can perpetuate harmful biases. The matching results should:

  • Be treated as one input among many, not as authoritative judgments
  • Always be reviewed and validated by human moderators
  • Never be used for automated actions without human oversight
  • Be monitored for patterns of bias over time

Matching process

  1. Input composition:

    • Reporter's written reason
    • Current version of full code of conduct
    • Reported content (if applicable)
  2. LLM analysis:

    • Analyze relevance between report reason and code of conduct provisions
    • Identify relevant provisions with confidence scores
    • Generate analysis summary
  3. Result storage:

    • List of matched provisions with confidence scores
    • LLM analysis summary
    • Code of conduct version identifier at report time
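
The stored matching result might be shaped like the following TypeScript sketch. The field names and the 0.5 confidence threshold are illustrative assumptions, not a decided API:

```typescript
// Hypothetical shapes for the stored LLM matching result.
interface MatchedProvision {
  provision: string;   // e.g. a section heading from the code of conduct
  confidence: number;  // 0.0 – 1.0, as returned by the LLM
}

interface LlmAnalysis {
  matches: MatchedProvision[];
  summary: string;     // LLM-generated analysis summary
  cocVersion: string;  // Git commit hash of the code of conduct at report time
}

// The moderator UI might surface only reasonably confident matches while the
// full result stays stored for audit; the threshold value is an assumption.
function confidentMatches(a: LlmAnalysis, threshold = 0.5): MatchedProvision[] {
  return a.matches
    .filter((m) => m.confidence >= threshold)
    .sort((x, y) => y.confidence - x.confidence);
}
```

Storing the full result while filtering only at display time keeps the audit trail intact even if the display threshold changes later.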

Code of conduct version management

  • Use Git commit hash of code of conduct file as version identifier
  • Store version identifier with report record
  • Moderators can reference the specific version when reviewing
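
A minimal sketch of version stamping, assuming the code of conduct lives in a single Git-tracked file; the command shown in the comment and the validation regex are illustrative:

```typescript
// A full Git commit hash is 40 lowercase hex characters.
const COC_HASH_RE = /^[0-9a-f]{40}$/;

// Guard against storing something that is not a commit hash (e.g. a tag name)
// in the report record's coc_version column.
function isValidCocVersion(hash: string): boolean {
  return COC_HASH_RE.test(hash);
}

// In a deployment this might shell out once at startup, e.g.:
//   git log -n 1 --format=%H -- CODE_OF_CONDUCT.md
// and cache the result for stamping onto each new report.
```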

Matching result usage

  • Results serve as reference material for moderator review
  • Moderators can modify or override matching results
  • Final confirmed violations are communicated to reported user

Duplicate report handling

Multiple reports for the same content/user:

  • Reports for the same target are grouped into a single "report case"
  • Each report's reason is preserved individually
  • Displayed to moderators with report count
  • Higher report count increases priority

Rationale:

  • Multiple independent reporters finding the same issue suggests severity
  • Combining diverse report reasons enables more accurate judgment
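
The grouping rule could be sketched as a find-or-create step. The in-memory shapes below stand in for database queries and are illustrative only:

```typescript
// Minimal in-memory stand-in for the flag_case store.
interface FlagCase {
  id: string;
  targetId: string;
  status: string;
  reportCount: number;
}

// Only open cases absorb new reports; a report against an already-resolved
// target starts a fresh case.
function fileReport(
  cases: FlagCase[],
  targetId: string,
  nextId: () => string,
): FlagCase {
  let found = cases.find(
    (k) =>
      k.targetId === targetId &&
      k.status !== "resolved" &&
      k.status !== "dismissed",
  );
  if (!found) {
    found = { id: nextId(), targetId, status: "pending", reportCount: 0 };
    cases.push(found);
  }
  found.reportCount++; // higher count raises priority in the moderator queue
  return found;
}
```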

Report history

Reporters can check the status of their submitted reports:

  • Viewable: Report target, date, their written reason, processing status (pending/reviewing/resolved)
  • Not viewable: Specific action details, existence of other reporters, reported user's appeal content

Note

To prevent malicious actors from using reports to probe moderation boundaries, detailed outcome information (e.g., whether action was taken, what type of sanction) is intentionally limited.

Moderator processing

Moderation flow

```mermaid
flowchart TB
    pending[Report received<br>pending status]
    check[Moderator reviews report]
    reviewing[Status: reviewing]

    subgraph review [Review Process]
        r1[Check reported content]
        r2[Review report reasons]
        r3[Check LLM analysis]
        r4[Review user history]
        r5[Understand context]
    end

    decision{Decision}
    dismiss[Dismiss]
    warn[Warning]
    action[Sanction]

    notify[Record action & notify<br>• Action to reported user<br>• Flag to remote server if opted in]

    pending --> check --> reviewing --> review
    review --> decision
    decision --> dismiss
    decision --> warn
    decision --> action
    dismiss --> notify
    warn --> notify
    action --> notify
```

Report states

| State | Description |
| --- | --- |
| `pending` | Report received, awaiting review |
| `reviewing` | Moderator is reviewing |
| `resolved` | Processing complete (action taken) |
| `dismissed` | Dismissed (not a violation) |
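
The state model could be enforced with a small transition guard; the allowed transition set below is inferred from the states above and is an assumption:

```typescript
// Allowed report-state transitions; resolved and dismissed are terminal.
const TRANSITIONS: Record<string, string[]> = {
  pending: ["reviewing"],
  reviewing: ["resolved", "dismissed"],
  resolved: [],
  dismissed: [],
};

// Returns false for unknown states as well as disallowed moves.
function canTransition(from: string, to: string): boolean {
  return TRANSITIONS[from]?.includes(to) ?? false;
}
```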

Review checklist

Moderators review the following comprehensively:

Report information

  • Reporter's written reason
  • LLM-matched code of conduct provisions (as reference only; verify independently)
  • Report count (for duplicates)
  • Each reporter's reason (for duplicates)

Content information

  • Reported content original text
  • Content context (comment thread, etc.)
  • Snapshot at report time (if modified/deleted)

User information

  • Reported user's previous violation history
  • Previous warning/sanction records
  • Account creation date and activity period
  • Local/remote user status

Action options

| Action | Description | Criteria |
| --- | --- | --- |
| Dismiss | Judged not a violation | No code of conduct violation found |
| Warning | Send warning message | Minor violation, first offense |
| Content censorship | Hide the content | The content itself is problematic |
| Temporary suspension | Suspend account for a period | Repeated violations or moderate severity |
| Permanent suspension | Permanently suspend account | Severe violation or persistent malicious behavior |

Required inputs for actions

When taking action, moderators must record:

  • Violation provisions (final confirmation)
  • Action rationale (detailed description of judgment basis)
  • Message to reported user
  • (For temporary suspension) Suspension period: start date – end date

Reported user process

Report notification

Reported users are notified that a report was filed against them and why.

Notification timing

  • Not immediately notified: No notification right after report submission (prevents stress from frivolous reports)
  • Notified when: After moderator reviews and decides on action; optionally for dismissed reports (moderator discretion for educational purposes)

Notification content for warnings/sanctions

A report was received regarding your [content/account] and upon review,
it was determined to violate the code of conduct. The following action has been taken.

Violation:
[Related code of conduct provisions]

Target content:
[Content link if applicable]

Action:
[Warning / Content censorship / N-day suspension / Permanent suspension]

Moderator message:
[Moderator's written explanation]

If you have objections to this action, you can file an appeal
using the button below.

[File Appeal]

Optional notification for dismissals

A report was received regarding your [content/account], but upon review,
it was determined not to constitute a code of conduct violation.

However, please note that some community members may have felt uncomfortable.

Related information:
[Brief explanation]

Information visibility for reported users

| Information | Visible |
| --- | --- |
| That a report was made | Yes |
| Code of conduct provisions cited as violated | Yes |
| Target content | Yes |
| Action details and duration | Yes |
| Moderator's judgment rationale | Yes |
| Reporter identity | No |
| Original report reason written by reporter | No |
| Number of reports | No |

Restrictions during sanctions

Content censorship

  • Content hidden from timeline and search
  • Accessible via permalink with censorship notice displayed
  • Author can still view their own content

Temporary suspension

  • Cannot create new content
  • Cannot write comments
  • Cannot react
  • Cannot follow/unfollow
  • Can view existing content
  • Can receive DMs but cannot send
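
The restrictions above amount to a capability check per account state. A minimal sketch, with capability names invented for illustration:

```typescript
type Capability = "create" | "comment" | "react" | "follow" | "view" | "sendDm";

// Capabilities denied during a temporary suspension; viewing existing
// content and receiving DMs remain allowed per the list above.
const SUSPENDED_DENIED: Set<Capability> = new Set([
  "create",
  "comment",
  "react",
  "follow",
  "sendDm",
]);

function isAllowed(state: "active" | "suspended", cap: Capability): boolean {
  if (state === "active") return true;
  return !SUSPENDED_DENIED.has(cap);
}
```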

Permanent suspension

  • Cannot access account
  • All features disabled
  • Existing content hidden

Appeal process

Appeal flow

```mermaid
flowchart TB
    start[Reported user<br>files appeal]
    write[Write appeal reason]
    submit[Submit appeal]
    review[Moderator review<br>preferably different moderator]

    decision{Decision}
    reject[Dismiss appeal]
    uphold[Maintain action]
    modify[Modify action]

    notify[Notify result<br>• To reported user]

    start --> write --> submit --> review --> decision
    decision --> reject
    decision --> uphold
    decision --> modify
    reject --> notify
    uphold --> notify
    modify --> notify
```

Appeal eligibility

  • Only the sanctioned user can appeal
  • One appeal per action
  • Deadline: Within 14 days of action notification
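
The three eligibility rules above can be expressed as one predicate; the option names are illustrative:

```typescript
// 14 days, matching the appeal deadline above.
const APPEAL_WINDOW_MS = 14 * 24 * 60 * 60 * 1000;

function canAppeal(opts: {
  appellantId: string;
  sanctionedUserId: string;
  existingAppealCount: number; // appeals already filed against this action
  notifiedAt: Date;            // when the action notification was sent
  now: Date;
}): boolean {
  return (
    opts.appellantId === opts.sanctionedUserId &&            // only the sanctioned user
    opts.existingAppealCount === 0 &&                        // one appeal per action
    opts.now.getTime() - opts.notifiedAt.getTime() <= APPEAL_WINDOW_MS
  );
}
```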

Appeal form

Appeal reason:
[Explain why you believe this action is unjust]

Additional context or evidence:
[Provide any context or information you believe was not
considered in the decision]

[Submit]

Appeal review

  • Preferably reviewed by a different moderator than the original decision-maker
  • Comprehensive review of original report, action rationale, and appeal content
  • Check for new information or context

Decision options

  • Appeal dismissed: Original action maintained
  • Action reduced: Changed to lighter action (e.g., suspension → warning)
  • Action withdrawn: Action canceled and record corrected
  • Action increased: Rare cases where a more severe violation is discovered during the appeal

Result notification

To reported user

Here is the result of your appeal review.

Decision: [Appeal dismissed / Action reduced / Action withdrawn]

Judgment rationale:
[Moderator's review result explanation]

(If applicable) Modified action:
[New action details]

Penalty system

Penalty types and criteria

Warning

  • Description: Lightest action informing of violation and requesting prevention of recurrence
  • Criteria: Minor code of conduct violation, first offense with no apparent malice, violation due to mistake or ignorance
  • Effect: Warning message sent, warning history recorded for future reference, may be removed from history after certain period (e.g., 1 year)
  • Accumulation: 3 warnings automatically triggers review for stronger action (but no automatic sanction; requires moderator judgment)
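
The accumulation rule could be sketched as follows, assuming the 1-year removal period applies per warning; the function only flags the account for review, it does not sanction:

```typescript
// Warnings older than one year drop out of the active count.
const WARNING_TTL_MS = 365 * 24 * 60 * 60 * 1000;

// True when three or more active warnings have accumulated, i.e. the case
// should be surfaced to a moderator for review (no automatic sanction).
function needsReview(warningDates: Date[], now: Date): boolean {
  const active = warningDates.filter(
    (d) => now.getTime() - d.getTime() <= WARNING_TTL_MS,
  );
  return active.length >= 3;
}
```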

Content censorship

  • Description: Hide specific content from public areas
  • Criteria: The content itself violates code of conduct, the specific content is problematic rather than user's overall behavior
  • Effect: Content excluded from timeline, search, recommendations; permalink retained but shows censorship notice; Delete activity may be sent to federated servers

Temporary suspension

  • Description: Restrict account activity for a period
  • Criteria: Repeated violations despite warnings, moderate severity violation, immediate activity stop needed but not warranting permanent suspension
  • Duration: Minimum 1 day – Maximum 90 days
    • Minor repeated violations: 1–7 days
    • Moderate violations: 7–30 days
    • Severe violations (first offense): 30–90 days
  • Effect: Cannot create new content, cannot interact (reactions, comments), can view existing content, full functionality restored upon suspension end
  • For remote users: Federation blocked for the period within Hackers' Pub; if reporter opted in, notification sent to remote server admin via ActivityPub Flag activity

Permanent suspension

  • Description: Most severe action permanently deactivating the account
  • Criteria: Very severe code of conduct violation (hate speech, illegal content), same violation repeated after temporary suspension, clear malicious intent to harm community confirmed
  • Effect: Cannot log into account, all features disabled, public content hidden, profile page shows suspension notice
  • For remote users: Permanent federation block with Hackers' Pub; if reporter opted in, notification sent to remote server admin via ActivityPub Flag activity
  • Recovery: Permanent suspension is not restored in principle; in exceptional cases, re-review request possible after sufficient time

Penalty history management

| Penalty | Retention period | Notes |
| --- | --- | --- |
| Warning | 1 year | Excluded from history if no additional violations for 1 year |
| Content censorship | Indefinite | Maintained as long as content exists |
| Temporary suspension | Indefinite | Record maintained; elapsed time considered in judgment |
| Permanent suspension | Indefinite | |

ActivityPub federation handling

Overview

Hackers' Pub is part of a decentralized network using the ActivityPub protocol. The flag system must operate smoothly in this environment.

Flag activity

The ActivityPub specification defines the Flag activity for propagating reports across the federated network.

```json
{
  "@context": "https://www.w3.org/ns/activitystreams",
  "type": "Flag",
  "actor": "https://hackerspub.example/users/moderator",
  "object": [
    "https://remote.example/users/reported_user",
    "https://remote.example/posts/problematic_post"
  ],
  "content": "Violation of Code of Conduct: harassment"
}
```

Cross-instance report forwarding

When a local user reports remote content or users, they may choose to forward the report to the remote instance.

Important

Cross-instance report forwarding is opt-in by the reporter, not automatic. This design choice:

  • Respects reporter agency: some reporters may not want their report shared beyond Hackers' Pub
  • Prevents potential harassment: automatic forwarding could be weaponized
  • Aligns with privacy expectations: reporters should control where their report goes

Forwarding process (when opted in)

  1. Reporter explicitly checks the forwarding option in the report form
  2. After moderator action (not immediately upon report submission), the Flag activity is sent
  3. The Flag activity includes the violation reason but NOT the reporter's identity
  4. Action by the remote server is at their discretion
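
Step 3 can be made structural: the outgoing activity is built from the moderator actor and the case data, so the reporter's identity cannot leak by construction. A sketch mirroring the JSON example earlier in this issue (field values are illustrative):

```typescript
// Compose the outgoing Flag activity after moderator action. The actor is
// the instance's moderator actor, never the reporter.
function buildFlagActivity(opts: {
  moderatorActor: string; // e.g. this instance's moderator actor URI
  targets: string[];      // reported user and/or content URIs
  reason: string;         // violation reason (not the reporter's text verbatim)
}) {
  return {
    "@context": "https://www.w3.org/ns/activitystreams",
    type: "Flag",
    actor: opts.moderatorActor, // reporter identity is intentionally absent
    object: opts.targets,
    content: opts.reason,
  };
}
```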

Remote content report processing

Report receipt

  1. Local user reports remote content/user
  2. Report saved to Hackers' Pub database
  3. Moderator reviews as with regular reports

Action application

  1. Within Hackers' Pub:

    • Hide/delete local cache of content
    • Block federation with user (temporary/permanent)
  2. Remote server notification (only if reporter opted in):

    • Send Flag activity to remote server
    • Action by remote server is at their discretion

Handling incoming Flag activities

When a Flag activity is received from another server:

  1. Receive and parse Flag activity
  2. Verify if target is local user/content
  3. Notify moderator as external report
  4. Moderator reviews and acts according to own judgment
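
Step 2 above (verifying the target is local) might look like this sketch; the origin constant is an assumption for illustration:

```typescript
// Assumed local origin for this instance.
const LOCAL_ORIGIN = "https://hackerspub.example";

// Extract the locally-hosted targets from an incoming Flag activity.
// Non-Flag activities and purely remote targets yield an empty list.
function localTargets(activity: {
  type: string;
  object: string | string[];
}): string[] {
  if (activity.type !== "Flag") return [];
  const objects = Array.isArray(activity.object)
    ? activity.object
    : [activity.object];
  return objects.filter((o) => new URL(o).origin === LOCAL_ORIGIN);
}
```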

External report display:

[External Report] Received from remote.example

Target: @localuser's content
Reason: "Violation of our community guidelines"

* This report was received from an external server.
  Please judge according to our own code of conduct.

Note

External reports (incoming Flag activities) are treated differently from internal reports. They provide less context and come from unknown moderation cultures, so moderators should apply additional scrutiny.

Mastodon compatibility

Mastodon is the most widely used ActivityPub implementation:

  • Support Mastodon's Flag activity format
  • Consider integration with Mastodon admin API (future)
  • Receive and process reports sent from Mastodon

Notification system

Notification types

| Type | Recipient | Content |
| --- | --- | --- |
| `flag_received` | Moderator | New report received |
| `action_taken` | Reported user | Action has been taken |
| `appeal_received` | Moderator | Appeal received |
| `appeal_resolved` | Reported user | Appeal processing complete |
| `suspension_ending` | Reported user | Suspension end approaching |

Notification channels

  • In-app notification: Default method
  • Email: For important notifications (actions, suspensions)
  • ActivityPub: For remote users, sent to their server

Privacy and security

Reporter anonymity protection

  • Filter reporter information from API responses
  • Display reporter information only in moderator UI
  • Mask reporter information in logs (as needed)

Data access control

| Role | Accessible information |
| --- | --- |
| Regular user | Only their own report history |
| Reported user | Actions and reasons concerning themselves (excluding reporter info) |
| Moderator | All report information (including reporter info) |

Content snapshots

Reasons for saving content snapshots at report time:

  • Preserve original evidence even if reported user modifies/deletes content
  • Maintain records for fair judgment
  • Reference material for appeals

Retention period:

  • Minimum 1 year after case closure
  • Longer retention if legally required
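
Snapshot capture could be sketched as below; the fields mirror the `content_snapshot` table drafted later in this issue, and the post shape is illustrative:

```typescript
interface ContentSnapshot {
  flagId: string;
  postId: string;
  content: string;
  contentHtml: string;
  metadata: { authorHandle: string; publishedAt: string };
  createdAt: string;
}

// Copy the content at report time so later edits or deletion by the
// reported user cannot destroy the evidence.
function takeSnapshot(
  flagId: string,
  post: {
    id: string;
    content: string;
    contentHtml: string;
    authorHandle: string;
    publishedAt: string;
  },
  now: Date,
): ContentSnapshot {
  return {
    flagId,
    postId: post.id,
    content: post.content,
    contentHtml: post.contentHtml,
    metadata: { authorHandle: post.authorHandle, publishedAt: post.publishedAt },
    createdAt: now.toISOString(),
  };
}
```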

Abuse prevention

False report prevention

  • Limit repeated reports of same target by same user
  • Possible sanctions for false reporters
  • Monitor report patterns

Report flood prevention

  • Rate limiting for many reports in short time
  • Alert moderators of abnormal patterns
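
Rate limiting could start as a simple sliding window per reporter; the limit and window values below are placeholders, not decided policy:

```typescript
// Sliding-window limiter: at most `limit` reports per reporter within
// `windowMs` milliseconds.
function makeReportLimiter(limit: number, windowMs: number) {
  const submissions = new Map<string, number[]>(); // reporterId -> timestamps
  return function allow(reporterId: string, now: number): boolean {
    const recent = (submissions.get(reporterId) ?? []).filter(
      (t) => now - t < windowMs,
    );
    if (recent.length >= limit) {
      submissions.set(reporterId, recent);
      return false; // over the limit; also a signal to alert moderators
    }
    recent.push(now);
    submissions.set(reporterId, recent);
    return true;
  };
}
```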

Moderator dashboard

Dashboard overview

The moderator dashboard is the central hub for report management.

Main screens:

  1. Pending reports list
  2. Report detail and processing screen
  3. Appeals list
  4. Statistics and analysis
  5. Currently sanctioned users list

Report list screen

The main report list provides moderators with an at-a-glance view of all pending reports, prioritized by urgency.

Key elements:

  • Header: Title with a link to statistics view
  • Filters: Dropdowns for status (all/pending/reviewing/resolved), state, and sort order; plus a search box
  • High priority section: Reports with 5+ flags are highlighted with a warning indicator and displayed at the top
  • Report cards: Each card shows:
    • Priority indicator (red/yellow/green based on report count)
    • Target identifier (user handle or content reference)
    • Report count in parentheses
    • Summary of report reasons
    • Time since first report

```
┌─────────────────────────────────────────────────────────┐
│  Report Management                         [View Stats] │
├─────────────────────────────────────────────────────────┤
│  Filter: [All ▼] [Pending ▼] [Newest ▼]   Search: [____]│
├─────────────────────────────────────────────────────────┤
│                                                         │
│  ⚠️ High Priority (5+ reports)                          │
│  ┌─────────────────────────────────────────────────┐    │
│  │ 🔴 @user123's content (7 reports)               │    │
│  │    "Hate speech", "Discriminatory language" +5  │    │
│  │    First report: 2 hours ago                    │    │
│  └─────────────────────────────────────────────────┘    │
│                                                         │
│  Regular Reports                                        │
│  ┌─────────────────────────────────────────────────┐    │
│  │ 🟡 @remote@other.server user (2 reports)        │    │
│  │    "Spam behavior"                              │    │
│  │    First report: 5 hours ago                    │    │
│  └─────────────────────────────────────────────────┘    │
│  ┌─────────────────────────────────────────────────┐    │
│  │ 🟢 @newuser's comment (1 report)                │    │
│  │    "Inappropriate language"                     │    │
│  │    Report: 1 day ago                            │    │
│  └─────────────────────────────────────────────────┘    │
│                                                         │
└─────────────────────────────────────────────────────────┘
```

Statistics screen

The statistics screen helps moderators understand reporting trends and measure moderation effectiveness.

Key features:

  • Period selector: Dropdown to set query range (e.g., last 7 days, 30 days, 90 days, or custom range)
  • Summary metrics: High-level KPIs for quick assessment
  • Action distribution: Breakdown of how reports were resolved, useful for identifying patterns (e.g., high dismissal rate may indicate unclear reporting guidelines)
  • Violation types: Most common violation categories, helps prioritize community guidelines and education efforts
  • Export option: Ability to export data for external analysis (future consideration)

Summary

These metrics provide a quick health check of the moderation queue:

| Item | Value |
| --- | --- |
| Total reports | 127 |
| Processed | 98 (77%) |
| Average processing time | 4.2 hours |

Action distribution

Shows how reports were resolved. A healthy distribution typically shows most reports resulting in dismissals or warnings, with severe sanctions being rare. Unusual patterns (e.g., very high dismissal rate) may indicate issues with reporting guidelines or user education.

| Action | Count | Ratio |
| --- | --- | --- |
| Dismissed | 45 | 46% |
| Warning | 38 | 39% |
| Content censorship | 10 | 10% |
| Temporary suspension | 4 | 4% |
| Permanent suspension | 1 | 1% |

Violation types (top 5)

Identifies the most common types of violations reported. This data helps:

  • Prioritize which code of conduct sections need clearer communication
  • Identify emerging problem areas (e.g., sudden spike in spam reports)
  • Guide community education efforts
  • Inform decisions about automated detection tools

| Rank | Type | Count |
| --- | --- | --- |
| 1 | Spam/advertising | 32 |
| 2 | Hate speech | 24 |
| 3 | Harassment | 18 |
| 4 | Inappropriate content | 12 |
| 5 | Misinformation | 8 |

Future considerations

Automation features (for future consideration)

  • Auto-hide: Temporarily hide content when reports exceed threshold before moderator review
  • AI-based pre-filtering: Automatic detection of obvious violations
  • Auto-spam handling: Automatic action for obvious spam

Caution

Automation features risk false positives and should be introduced carefully.

Community participation

  • Trusted reporters: Higher weight for reports from users with accurate report history
  • Community moderators: Consider community moderator system to distribute moderator burden

Multilingual support

  • Auto-translate report reasons (when moderator uses different language)
  • Integration with multilingual code of conduct versions
  • Multilingual templates for action notification messages

Legal requirements

  • Procedures for data preservation/provision per legal requests
  • Separate procedure for copyright infringement reports (DMCA, etc.)
  • Law enforcement cooperation procedures

Database schema (draft)

Design rationale

The schema separates individual reports (flag) from cases (flag_case) to handle duplicate reports elegantly. When multiple users report the same content, each report is preserved individually (maintaining reporter anonymity and capturing diverse perspectives), while moderators work with a single unified case.

Key design decisions:

  • Separation of flag and flag_case: Allows multiple reports to be grouped without losing individual report data. This is essential for both accurate statistics and protecting reporter privacy.
  • Immutable actions: Actions are recorded as separate records rather than updating the case, creating a full audit trail. If a case is reopened or an appeal changes the outcome, the history is preserved.
  • Content snapshots: Stored separately to preserve evidence even if the original content is modified or deleted. This ensures fair judgment and supports appeals.
  • Code of conduct versioning: The coc_version field captures the exact version of the code of conduct at report time, ensuring reports are evaluated in their original context even as rules evolve.

Tables

flag table

Stores individual reports submitted by users. Each report is a separate record, even if multiple users report the same content.

Design notes:

  • reporter_id is nullable to support anonymous reports from external ActivityPub servers where we may not have a local account
  • target_post_id is nullable because user reports (as opposed to content reports) don't target a specific post
  • llm_analysis uses JSONB to flexibly store structured analysis results without schema constraints, allowing the LLM output format to evolve
  • coc_version stores the Git commit hash rather than a version number, providing an immutable reference to the exact code of conduct text
  • forward_to_remote tracks whether the reporter opted in to cross-instance forwarding

| Column | Type | Description |
| --- | --- | --- |
| `id` | `uuid` | Primary key |
| `reporter_id` | `uuid` | Foreign key to account (nullable for anonymous/external) |
| `target_account_id` | `uuid` | Foreign key to account (reported user) |
| `target_post_id` | `uuid` | Foreign key to post (nullable; for content reports) |
| `reason` | `text` | Reporter's written reason |
| `coc_version` | `varchar` | Code of conduct Git commit hash at report time |
| `llm_analysis` | `jsonb` | LLM matching results |
| `status` | `varchar` | `pending`, `reviewing`, `resolved`, `dismissed` |
| `case_id` | `uuid` | Foreign key to flag_case (for grouping duplicates) |
| `forward_to_remote` | `boolean` | Whether reporter opted in to forward to remote instance |
| `created_at` | `timestamptz` | Report submission time |
| `updated_at` | `timestamptz` | Last update time |

flag_case table

Groups related reports into a single case for moderator review. This is the primary entity moderators interact with.

Design notes:

  • Cases are created automatically when the first report for a target is submitted
  • Subsequent reports for the same target are linked to the existing case
  • assigned_moderator_id enables workload distribution and prevents conflicts when multiple moderators are active
  • resolved_at is separate from created_at to track resolution time metrics

| Column | Type | Description |
| --- | --- | --- |
| `id` | `uuid` | Primary key |
| `target_account_id` | `uuid` | Foreign key to account |
| `target_post_id` | `uuid` | Foreign key to post (nullable) |
| `status` | `varchar` | Case status |
| `assigned_moderator_id` | `uuid` | Foreign key to account (assigned moderator) |
| `created_at` | `timestamptz` | Case creation time |
| `resolved_at` | `timestamptz` | Case resolution time |

flag_action table

Records actions taken by moderators. Actions are immutable—if a decision changes, a new action record is created rather than updating the existing one.

Design notes:

  • Immutable records create a complete audit trail for accountability and appeals
  • violated_provisions is an array to support cases where multiple provisions are violated
  • rationale is required for transparency and to support consistent decision-making across moderators
  • message_to_user is stored separately from rationale because the internal rationale may contain details not appropriate to share with the reported user
  • Suspension timestamps are nullable since they only apply to suspension actions

| Column | Type | Description |
| --- | --- | --- |
| `id` | `uuid` | Primary key |
| `case_id` | `uuid` | Foreign key to flag_case |
| `moderator_id` | `uuid` | Foreign key to account (moderator who took action) |
| `action_type` | `varchar` | `dismiss`, `warning`, `censor`, `suspend`, `ban` |
| `violated_provisions` | `text[]` | List of violated code of conduct provisions |
| `rationale` | `text` | Moderator's judgment rationale |
| `message_to_user` | `text` | Message sent to reported user |
| `suspension_start` | `timestamptz` | Suspension start (for temporary suspension) |
| `suspension_end` | `timestamptz` | Suspension end (for temporary suspension) |
| `created_at` | `timestamptz` | Action time |

flag_appeal table

Stores appeals filed by users against moderation actions.

Design notes:

  • Links to flag_action rather than flag_case because users appeal specific actions, and a case may have multiple actions over time
  • reviewer_id is tracked separately to ensure appeals are reviewed by a different moderator when possible
  • result is separate from status because a resolved appeal has both a completion status and an outcome
  • Both the appellant's reason and the reviewer's rationale are preserved for transparency

| Column | Type | Description |
| ------ | ---- | ----------- |
| id | uuid | Primary key |
| action_id | uuid | Foreign key to flag_action |
| appellant_id | uuid | Foreign key to account (person filing appeal) |
| reason | text | Appeal reason |
| additional_context | text | Additional context or evidence |
| status | varchar | pending, reviewing, resolved |
| result | varchar | dismissed, reduced, withdrawn, increased |
| reviewer_id | uuid | Foreign key to account (reviewing moderator) |
| review_rationale | text | Review result rationale |
| created_at | timestamptz | Appeal submission time |
| resolved_at | timestamptz | Appeal resolution time |
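
The "different moderator when possible" rule for reviewer_id can be expressed as a small selection function. This is an illustrative sketch (the function name and fallback policy are assumptions, not part of the spec): prefer any moderator other than the one who took the appealed action, and fall back to the original moderator only when no one else is available.

```typescript
// Pick a reviewer for an appeal. Prefers a moderator other than the
// one who took the appealed action; falls back to that moderator only
// when no other moderator exists.
function pickReviewer(
  moderatorIds: string[],
  actingModeratorId: string,
): string | undefined {
  const others = moderatorIds.filter((id) => id !== actingModeratorId);
  return others[0] ??
    (moderatorIds.includes(actingModeratorId) ? actingModeratorId : undefined);
}
```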

content_snapshot table

Preserves the state of reported content at the time of the report.

Design notes:

  • Essential for fair judgment—reported users may edit or delete content after being reported
  • Both text and HTML versions are stored because some violations may only be apparent in the rendered form (e.g., misleading link text)
  • metadata captures contextual information like author display name, avatar, and timestamps that may change or be lost if the account is modified
  • Linked to flag rather than flag_case because different reporters may see different versions of content if it's edited between reports

| Column | Type | Description |
| ------ | ---- | ----------- |
| id | uuid | Primary key |
| flag_id | uuid | Foreign key to flag |
| post_id | uuid | Foreign key to post |
| content | text | Content text at snapshot time |
| content_html | text | Content HTML at snapshot time |
| metadata | jsonb | Additional metadata (author info, timestamps, etc.) |
| created_at | timestamptz | Snapshot creation time |
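
Snapshot creation amounts to copying the post's current state at report time, so later edits or deletion cannot change what moderators see. A minimal sketch with hypothetical types (`Post`, `ContentSnapshot`) mirroring the columns above:

```typescript
interface Post {
  id: string;
  content: string;
  contentHtml: string;
  authorName: string;
}

interface ContentSnapshot {
  flagId: string;
  postId: string;
  content: string;
  contentHtml: string;
  metadata: Record<string, unknown>;
  createdAt: Date;
}

// Copy the post's state into an immutable snapshot tied to one flag.
// Both text and HTML are kept because some violations are only visible
// in the rendered form (e.g., misleading link text).
function snapshotPost(flagId: string, post: Post, now: Date): ContentSnapshot {
  return {
    flagId,
    postId: post.id,
    content: post.content,
    contentHtml: post.contentHtml,
    metadata: { authorName: post.authorName },
    createdAt: now,
  };
}
```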

Indexes

  • flag(target_account_id) — For querying reports by target user
  • flag(target_post_id) — For querying reports by target content
  • flag(status) — For filtering by status
  • flag(case_id) — For grouping by case
  • flag_case(status) — For moderator queue
  • flag_case(assigned_moderator_id) — For moderator workload
  • flag_action(case_id) — For case action history
  • flag_appeal(action_id) — For finding appeals by action

Implementation phases

Phase 1: Core infrastructure

  • Database schema and migrations
  • Basic flag model and CRUD operations
  • Content snapshot functionality

Phase 2: User-facing features

  • Report UI for articles, notes, and users
  • Report form with free-form reason input
  • Report history page for users

Phase 3: Moderator tools

  • Moderator dashboard
  • Report review and action interface
  • Statistics and analytics

Phase 4: LLM integration

  • Code of conduct version management
  • LLM-based provision matching
  • Analysis result storage and display

Phase 5: Notification system

  • In-app notifications for all parties
  • Email notifications for important actions
  • Notification preferences

Phase 6: Appeal system

  • Appeal submission interface
  • Appeal review workflow
  • Result notifications

Phase 7: ActivityPub federation

  • Outgoing Flag activity support (opt-in by reporter)
  • Incoming Flag activity handling
  • Remote user action handling
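
An outgoing report would be expressed as an ActivityStreams 2.0 Flag activity. The sketch below is an assumption about the eventual payload shape, not a finalized design: following common fediverse practice, the actor is an instance-level actor rather than the reporter, which preserves reporter anonymity even when the report federates, and the object lists the reported account together with the flagged posts.

```typescript
// Build an outgoing ActivityStreams 2.0 `Flag` activity. All URLs are
// illustrative placeholders; the payload shape is a sketch.
function buildFlagActivity(
  instanceActor: string,
  reportedAccount: string,
  reportedPosts: string[],
  reason: string,
) {
  return {
    "@context": "https://www.w3.org/ns/activitystreams",
    type: "Flag",
    // The instance actor, not the reporter, keeps the reporter anonymous.
    actor: instanceActor,
    // Reported account plus the specific posts being flagged.
    object: [reportedAccount, ...reportedPosts],
    content: reason,
  };
}
```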

Phase 8: Advanced features

  • Automation features (with careful consideration)
  • Advanced analytics
  • Multilingual support enhancements

Terminology reference

| English | Korean | Description |
| ------- | ------ | ----------- |
| flag/report | 신고 | Notifying of suspected violation content/user |
| code of conduct | 행동 강령 | Community rules |
| moderator | 관리자 | Person with report processing authority |
| censorship | 검열 | Hiding content |
| suspension | 정지 | Restricting account activity |
| appeal | 이의 제기 | Request for action re-review |
| federation | 연합 | Connection between decentralized networks |
| post | 콘텐츠 | Collective term for articles and notes |
| article | 게시글 | Long-form blog-style post |
| note | 단문 | Short microblog-style post |
| timeline | 타임라인 | Content feed |
| fediverse | 연합우주 | ActivityPub-based decentralized social network |
| instance | 인스턴스 | Individual server in the fediverse |
