codewithsteve edited this page Jan 23, 2026 · 2 revisions

Data Model

ER Diagram of DATS SQL Database


Table Overview

Source Document

  • Tracks individual files and their processing states, such as text extraction or thumbnail generation.
  • Contains the raw file content, the HTML structure, and the token and sentence arrays required for precise annotation.
  • Records the progress of automated background tasks for each document.
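The Python sketch below models the per-document processing state as an in-memory record. All names in it (`SourceDocument`, `JobStatus`, `task_status`, `token_offsets`) are hypothetical. The whitespace tokenizer is only a stand-in for whatever preprocessing DATS actually performs. The sketch shows one way that per-token character offsets let an annotation point at an exact position in the text.

```python
import re
from dataclasses import dataclass, field
from enum import Enum


class JobStatus(Enum):
    """Hypothetical states of one background task on a document."""
    WAITING = "waiting"
    RUNNING = "running"
    FINISHED = "finished"
    ERROR = "error"


def token_offsets(text: str) -> tuple[list[int], list[int]]:
    """Return start and end character offsets of whitespace-separated tokens."""
    starts, ends = [], []
    for match in re.finditer(r"\S+", text):
        starts.append(match.start())
        ends.append(match.end())
    return starts, ends


@dataclass
class SourceDocument:
    # Hypothetical, simplified columns; the real table may differ.
    id: int
    filename: str
    content: str = ""
    html: str = ""
    token_starts: list[int] = field(default_factory=list)
    token_ends: list[int] = field(default_factory=list)
    # One status per automated task, e.g. "text_extraction" or "thumbnail".
    task_status: dict[str, JobStatus] = field(default_factory=dict)

    def is_ready(self) -> bool:
        """True once at least one task exists and every task has finished."""
        return bool(self.task_status) and all(
            status is JobStatus.FINISHED for status in self.task_status.values()
        )
```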

Annotations

  • Acts as the bridge between users and the documents they are working on.
  • Serves as the central "Unit of Work": each entry links one user to one document within a project, so multiple users can annotate the same document independently.
  • Stores the exact coordinates and content for highlights, bounding boxes, and sentence labels.
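One way to model the "unit of work" is a table with one row per (user, document) pair, which every highlight then references. The minimal SQLite sketch below uses hypothetical table and column names (`annotation_document`, `span_annotation`, `begin_char`, `end_char`). It is not the actual DATS schema, and it leaves out the project link for brevity.

```python
import sqlite3

# Hypothetical, simplified schema: one annotation document per (user, document) pair.
SCHEMA = """
CREATE TABLE annotation_document (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL,
    source_document_id INTEGER NOT NULL,
    UNIQUE (user_id, source_document_id)
);
CREATE TABLE span_annotation (
    id INTEGER PRIMARY KEY,
    annotation_document_id INTEGER NOT NULL REFERENCES annotation_document(id),
    code_id INTEGER NOT NULL,
    begin_char INTEGER NOT NULL,
    end_char INTEGER NOT NULL,
    span_text TEXT NOT NULL
);
"""


def annotation_document_id(conn: sqlite3.Connection, user_id: int, doc_id: int) -> int:
    """Return the user's unit of work for a document, creating it on first use."""
    conn.execute(
        "INSERT OR IGNORE INTO annotation_document (user_id, source_document_id) "
        "VALUES (?, ?)",
        (user_id, doc_id),
    )
    (adoc_id,) = conn.execute(
        "SELECT id FROM annotation_document "
        "WHERE user_id = ? AND source_document_id = ?",
        (user_id, doc_id),
    ).fetchone()
    return adoc_id


def add_span(conn: sqlite3.Connection, adoc_id: int, code_id: int,
             text: str, begin: int, end: int) -> int:
    """Store a highlight as a character range plus the text it covers."""
    if not 0 <= begin < end <= len(text):
        raise ValueError("span must satisfy 0 <= begin < end <= len(text)")
    cur = conn.execute(
        "INSERT INTO span_annotation "
        "(annotation_document_id, code_id, begin_char, end_char, span_text) "
        "VALUES (?, ?, ?, ?, ?)",
        (adoc_id, code_id, begin, end, text[begin:end]),
    )
    return cur.lastrowid


def spans_of(conn: sqlite3.Connection, adoc_id: int) -> list[str]:
    """Texts of all highlights in one user's unit of work, in insertion order."""
    rows = conn.execute(
        "SELECT span_text FROM span_annotation "
        "WHERE annotation_document_id = ? ORDER BY id",
        (adoc_id,),
    )
    return [span_text for (span_text,) in rows]
```

Each user gets their own `annotation_document` row, so two users' highlights on the same document never overwrite each other.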

Project & User Management

  • Acts as the top-level container for all data; everything from documents to codes is scoped to a specific project.
  • Stores essential profile information such as names, emails, and secure hashed credentials.
  • Manages the access levels and permissions that connect users to the projects they work in.
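Project scoping can be sketched with a membership table: a user sees a document only if they belong to the project that owns it. The schema below is a hypothetical SQLite simplification. The names `app_user`, `project_user`, and `visible_documents` are illustrative, and access levels are not modeled.

```python
import sqlite3

# Hypothetical, simplified schema: every document belongs to exactly one project,
# and a membership row grants a user access to that project.
SCHEMA = """
CREATE TABLE project (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE app_user (
    id INTEGER PRIMARY KEY,
    email TEXT NOT NULL UNIQUE,
    password_hash TEXT NOT NULL
);
CREATE TABLE project_user (
    project_id INTEGER NOT NULL REFERENCES project(id),
    user_id INTEGER NOT NULL REFERENCES app_user(id),
    PRIMARY KEY (project_id, user_id)
);
CREATE TABLE source_document (
    id INTEGER PRIMARY KEY,
    project_id INTEGER NOT NULL REFERENCES project(id),
    filename TEXT NOT NULL
);
"""


def visible_documents(conn: sqlite3.Connection, user_id: int) -> list[str]:
    """Filenames of all documents in projects the user is a member of."""
    rows = conn.execute(
        """
        SELECT d.filename
        FROM source_document AS d
        JOIN project_user AS pu ON pu.project_id = d.project_id
        WHERE pu.user_id = ?
        ORDER BY d.id
        """,
        (user_id,),
    ).fetchall()
    return [filename for (filename,) in rows]
```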

Code & Tag

  • Codes define the labels available for annotation, including their hierarchy and visual properties such as color.
  • Tags provide a lightweight way to organize documents (e.g., "To Review", "Phase 1") outside the formal codebook.
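A code hierarchy is commonly stored as a self-referencing `parent_id` column and traversed with a recursive query. The sketch below assumes that layout in SQLite. The table and function names are hypothetical, and tags are not modeled.

```python
import sqlite3

# Hypothetical, simplified codebook table: parent_id builds the hierarchy.
SCHEMA = """
CREATE TABLE code (
    id INTEGER PRIMARY KEY,
    project_id INTEGER NOT NULL,
    name TEXT NOT NULL,
    color TEXT NOT NULL,
    parent_id INTEGER REFERENCES code(id)
);
"""


def code_subtree(conn: sqlite3.Connection, root_id: int) -> list[str]:
    """Names of a code and all of its descendants, ordered by id."""
    rows = conn.execute(
        """
        WITH RECURSIVE subtree(id) AS (
            SELECT id FROM code WHERE id = ?
            UNION  -- UNION (not UNION ALL) drops repeats, so an accidental cycle terminates
            SELECT c.id FROM code AS c JOIN subtree AS s ON c.parent_id = s.id
        )
        SELECT code.name FROM code JOIN subtree ON code.id = subtree.id
        ORDER BY code.id
        """,
        (root_id,),
    ).fetchall()
    return [name for (name,) in rows]
```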

Classifiers

  • Manages the entire AI-assisted labeling lifecycle from training to prediction.
  • Stores the definition and training state of specific machine learning models.
  • Tracks model performance metrics such as accuracy, F1 score, and precision.
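The listed metrics follow their standard definitions. The Python sketch below computes them for a single positive label. This page does not say whether DATS stores per-label or averaged values, and `classification_metrics` is a hypothetical helper.

```python
from collections.abc import Hashable, Sequence


def classification_metrics(
    y_true: Sequence[Hashable], y_pred: Sequence[Hashable], positive: Hashable
) -> dict[str, float]:
    """Accuracy over all labels; precision, recall, and F1 for one positive label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

Returning 0.0 when a denominator is zero is a convention; other tools may handle that case differently.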

Metadata

  • Extends document information with custom fields without altering the database schema.
  • Defines the "schema" of custom fields (keys and types) available within a specific project.
  • Stores the actual values (such as integers, strings, and dates) associated with each document.
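This key/type/value split is an entity-attribute-value design. The SQLite sketch below assumes one typed column per supported type. It also uses hypothetical names (`project_metadata`, `source_document_metadata`, `metatype`, `int_value`, `set_metadata`, `get_metadata`). The real tables may support more types or use different names.

```python
import sqlite3
from datetime import date

# Hypothetical, simplified tables: project_metadata defines the fields,
# source_document_metadata stores one typed value per (document, field).
SCHEMA = """
CREATE TABLE project_metadata (
    id INTEGER PRIMARY KEY,
    project_id INTEGER NOT NULL,
    key TEXT NOT NULL,
    metatype TEXT NOT NULL CHECK (metatype IN ('INTEGER', 'STRING', 'DATE')),
    UNIQUE (project_id, key)
);
CREATE TABLE source_document_metadata (
    source_document_id INTEGER NOT NULL,
    project_metadata_id INTEGER NOT NULL REFERENCES project_metadata(id),
    int_value INTEGER,
    str_value TEXT,
    date_value TEXT,  -- ISO 8601 string, since SQLite has no DATE type
    PRIMARY KEY (source_document_id, project_metadata_id)
);
"""

# Which value column holds each metatype, and the exact Python type it accepts.
_COLUMNS = {"INTEGER": ("int_value", int), "STRING": ("str_value", str), "DATE": ("date_value", date)}


def set_metadata(conn: sqlite3.Connection, project_id: int, doc_id: int, key: str, value) -> None:
    """Validate a value against the field's declared type and store it."""
    row = conn.execute(
        "SELECT id, metatype FROM project_metadata WHERE project_id = ? AND key = ?",
        (project_id, key),
    ).fetchone()
    if row is None:
        raise KeyError(f"no metadata field {key!r} in project {project_id}")
    field_id, metatype = row
    column, expected = _COLUMNS[metatype]
    # Exact type check rejects bool for INTEGER and datetime for DATE.
    if type(value) is not expected:
        raise TypeError(f"{key!r} expects {metatype}, got {type(value).__name__}")
    stored = value.isoformat() if metatype == "DATE" else value
    # `column` comes from the fixed mapping above, never from user input.
    conn.execute(
        f"INSERT OR REPLACE INTO source_document_metadata "
        f"(source_document_id, project_metadata_id, {column}) VALUES (?, ?, ?)",
        (doc_id, field_id, stored),
    )


def get_metadata(conn: sqlite3.Connection, project_id: int, doc_id: int, key: str):
    """Return the typed value of a field for a document, or None if unset."""
    row = conn.execute(
        """
        SELECT pm.metatype, dm.int_value, dm.str_value, dm.date_value
        FROM source_document_metadata AS dm
        JOIN project_metadata AS pm ON pm.id = dm.project_metadata_id
        WHERE pm.project_id = ? AND dm.source_document_id = ? AND pm.key = ?
        """,
        (project_id, doc_id, key),
    ).fetchone()
    if row is None:
        return None
    metatype, int_value, str_value, date_value = row
    if metatype == "INTEGER":
        return int_value
    if metatype == "STRING":
        return str_value
    return date.fromisoformat(date_value)
```

The write path checks each value against the field's declared type. That keeps the typed columns consistent, and a project can add a field without any schema change.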

Memo

  • Facilitates qualitative research and detailed documentation through user-created notes.
  • Stores insights and observations; memos can be starred to mark them as important.
  • Acts as a universal link: a memo can be attached to any type of database object, such as a document or a code.
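A common way to let one memo table point at many kinds of objects is a polymorphic (type, id) pair. The SQLite sketch below assumes that design. The column names, the list of allowed types, and the helper functions are hypothetical, and DATS may instead use separate foreign-key columns.

```python
import sqlite3

# Hypothetical, simplified memo table. The (type, id) pair points at the
# annotated object; the allowed type list is an assumption for illustration.
SCHEMA = """
CREATE TABLE memo (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL,
    title TEXT NOT NULL,
    content TEXT NOT NULL,
    starred INTEGER NOT NULL DEFAULT 0,
    attached_object_type TEXT NOT NULL
        CHECK (attached_object_type IN ('project', 'source_document', 'code', 'tag')),
    attached_object_id INTEGER NOT NULL
);
CREATE INDEX ix_memo_target ON memo (attached_object_type, attached_object_id);
"""


def add_memo(conn: sqlite3.Connection, user_id: int, title: str, content: str,
             object_type: str, object_id: int, starred: bool = False) -> int:
    """Attach a new memo to any supported object and return its id."""
    cur = conn.execute(
        "INSERT INTO memo (user_id, title, content, starred, "
        "attached_object_type, attached_object_id) VALUES (?, ?, ?, ?, ?, ?)",
        (user_id, title, content, int(starred), object_type, object_id),
    )
    return cur.lastrowid


def memo_titles(conn: sqlite3.Connection, object_type: str, object_id: int,
                starred_only: bool = False) -> list[str]:
    """Titles of memos attached to one object, optionally only starred ones."""
    sql = (
        "SELECT title FROM memo "
        "WHERE attached_object_type = ? AND attached_object_id = ?"
    )
    if starred_only:
        sql += " AND starred = 1"
    sql += " ORDER BY id"
    return [title for (title,) in conn.execute(sql, (object_type, object_id))]
```

The trade-off is referential integrity. A generic (type, id) pair cannot be enforced by a foreign key, so deleting a document can leave orphaned memos unless the application removes them.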

Analyses

  • Records the settings used to track data trends over time.
  • Stores the parameters for analyzing how thematic concepts evolve over time across the document corpus.
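The Python sketch below shows what such stored parameters might drive. It counts, per month or year, how many documents carry at least one code associated with a concept. The names `TimelineConcept` and `TimelineAnalysis` are hypothetical. Defining a concept as a set of codes and bucketing documents by date are also assumptions for illustration.

```python
from collections import Counter
from collections.abc import Iterable
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class TimelineConcept:
    """One tracked concept: a name plus the codes that signal it (hypothetical)."""
    name: str
    code_ids: frozenset[int]


@dataclass(frozen=True)
class TimelineAnalysis:
    """Hypothetical stored parameters of one analysis."""
    concepts: tuple[TimelineConcept, ...]
    group_by: str = "month"  # "month" or "year"

    def __post_init__(self) -> None:
        if self.group_by not in ("month", "year"):
            raise ValueError(f"unsupported group_by: {self.group_by!r}")


def _bucket(day: date, group_by: str) -> str:
    """Label such as "2020-02" (month) or "2020" (year); sorts chronologically."""
    return f"{day.year:04d}-{day.month:02d}" if group_by == "month" else f"{day.year:04d}"


def run_timeline(
    analysis: TimelineAnalysis, documents: Iterable[tuple[date, Iterable[int]]]
) -> dict[str, dict[str, int]]:
    """Per concept, count documents in each time bucket that carry any of its codes.

    `documents` yields (document_date, code_ids) pairs, e.g. a date taken from a
    date metadata field and the codes annotated in that document.
    """
    docs = [(day, set(codes)) for day, codes in documents]
    result = {}
    for concept in analysis.concepts:
        counts = Counter(
            _bucket(day, analysis.group_by)
            for day, codes in docs
            if concept.code_ids & codes
        )
        result[concept.name] = dict(sorted(counts.items()))
    return result
```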

Cluster

  • Enables automated organization of large datasets through machine learning.
  • Defines the parameters and models used for embedding and clustering documents.
  • Represents the calculated groups, including metadata like top descriptive words and spatial coordinates.
  • Maps individual documents to these clusters and records the strength of their relationship.
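The document-to-cluster mapping can be sketched as rows that each carry a probability, from which each document's strongest cluster is derived. All names below (`Cluster`, `DocumentCluster`, `strongest_cluster`, `cluster_members`) are hypothetical. The sketch does not cover how embeddings, clusters, or top words are computed.

```python
from collections.abc import Iterable
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    """Hypothetical cluster record: descriptive words and a 2D map position."""
    id: int
    name: str
    top_words: tuple[str, ...]
    x: float
    y: float


@dataclass(frozen=True)
class DocumentCluster:
    """Hypothetical mapping row; probability is the strength of the relationship."""
    source_document_id: int
    cluster_id: int
    probability: float


def strongest_cluster(mappings: Iterable[DocumentCluster]) -> dict[int, int]:
    """Map each document id to the cluster with its highest probability (first wins ties)."""
    best: dict[int, DocumentCluster] = {}
    for m in mappings:
        current = best.get(m.source_document_id)
        if current is None or m.probability > current.probability:
            best[m.source_document_id] = m
    return {doc_id: m.cluster_id for doc_id, m in best.items()}


def cluster_members(mappings: Iterable[DocumentCluster], cluster_id: int,
                    min_probability: float = 0.0) -> list[int]:
    """Sorted ids of documents linked to a cluster at or above a probability threshold."""
    return sorted(
        m.source_document_id
        for m in mappings
        if m.cluster_id == cluster_id and m.probability >= min_probability
    )
```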

Summary

In summary, the Project and User tables control access; Source Document and Metadata organize the input data; Annotation and Memo capture the research work; and Analyses, Cluster, and Classifier provide advanced tools for automated discovery and AI integration.
