Data Model
codewithsteve edited this page Jan 23, 2026 · 2 revisions
- Tracks individual files and their various processing states such as text extraction or thumbnail generation.
- Contains the raw file content, HTML structure, and the technical token/sentence arrays required for precise annotation.
- Monitors the background progress of automated tasks for each document.
- Acts as the bridge between users and the documents they are working on.
- Serves as the central "Unit of Work," linking a specific user to a document within a project to allow independent multi-user annotation.
- Stores the exact coordinates and content for highlights, bounding boxes, and sentence labels.
- Acts as the top-level container for all data; everything from documents to codes is scoped to a specific project.
- Stores essential profile information such as names, emails, and secure hashed credentials.
- Manages the access levels and permissions connecting users to their respective workspaces.
- Defines the labels available for annotation, including their hierarchy and visual properties like color.
- Provides a lightweight way to organize documents (e.g., "To Review", "Phase 1") outside of the formal codebook.
- Manages the entire AI-assisted labeling lifecycle from training to prediction.
- Stores the definition and training state of specific machine learning models.
- Tracks model performance metrics such as accuracy, F1 score, and precision.
- Provides a robust system for extending document information without altering the database schema.
- Defines the "schema" of custom fields (keys and types) available within a specific project.
- Stores the actual values—ranging from integers and strings to dates—associated with each document.
- Facilitates qualitative research and detailed documentation through user-created notes.
- Stores insights and observations that can be starred for importance.
- Acts as a universal link, enabling a single note to be attached to any database object, such as a document or a code.
- Records settings and configurations for tracking data trends over time.
- Stores specific parameters for analyzing how thematic concepts evolve through the document corpus.
- Enables automated organization of large datasets through machine learning.
- Defines the parameters and models used for embedding and clustering documents.
- Represents the calculated groups, including metadata like top descriptive words and spatial coordinates.
- Maps individual documents to these clusters and records the strength of their relationship.
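
The custom-field items above (a per-project field "schema" plus typed per-document values) can be illustrated with a minimal sketch. Every name here (`FieldType`, `MetadataField`, `MetadataValue`) is hypothetical and chosen for illustration; the actual tables and columns may differ.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Union

# Assumption: values are limited to the kinds named in the text.
MetaValue = Union[int, str, date]


class FieldType(Enum):
    INT = "int"
    STR = "str"
    DATE = "date"


# Maps each declared field type to the Python type a value must have.
_PY_TYPES = {FieldType.INT: int, FieldType.STR: str, FieldType.DATE: date}


@dataclass(frozen=True)
class MetadataField:
    """Hypothetical field definition, scoped to one project."""
    project_id: int
    key: str
    field_type: FieldType


@dataclass(frozen=True)
class MetadataValue:
    """Hypothetical value of one field for one document."""
    document_id: int
    field: MetadataField
    value: MetaValue

    def __post_init__(self) -> None:
        expected = _PY_TYPES[self.field.field_type]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(self.value, bool) or not isinstance(self.value, expected):
            raise TypeError(
                f"field {self.field.key!r} expects {expected.__name__}, "
                f"got {type(self.value).__name__}"
            )


year = MetadataField(project_id=1, key="year", field_type=FieldType.INT)
value = MetadataValue(document_id=7, field=year, value=2024)
```

Adding a new kind of document information then means adding a `MetadataField` row rather than a new column, which is the sense in which the schema stays unchanged.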
In summary, the Project and User tables control access; Source Document and Metadata organize the input data; Annotation and Memo capture the research work; and Analyses, Cluster, and Classifier provide advanced tools for automated discovery and AI integration.
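
The "universal link" that lets a note attach to any database object can be sketched as a handle that pairs a target type with a target id. This is an in-memory illustration only; `Memo`, `ObjectHandle`, and `MemoStore` are hypothetical names, not the project's actual tables.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Memo:
    """Hypothetical user-created note that can be starred."""
    id: int
    title: str
    content: str
    starred: bool = False


@dataclass(frozen=True)
class ObjectHandle:
    """Hypothetical generic pointer to any object, e.g. a document or a code."""
    target_type: str
    target_id: int


class MemoStore:
    """In-memory stand-in for the memo and link tables."""

    def __init__(self) -> None:
        self._by_target: Dict[ObjectHandle, List[Memo]] = {}

    def attach(self, memo: Memo, target: ObjectHandle) -> None:
        # The handle is hashable (frozen dataclass), so it works as a key.
        self._by_target.setdefault(target, []).append(memo)

    def memos_for(self, target: ObjectHandle, starred_only: bool = False) -> List[Memo]:
        memos = self._by_target.get(target, [])
        return [m for m in memos if m.starred or not starred_only]


store = MemoStore()
store.attach(Memo(1, "Key passage", "Revisit this section.", starred=True),
             ObjectHandle("document", 1))
store.attach(Memo(2, "Code scope", "Definition may be too broad."),
             ObjectHandle("code", 5))
```

Because the handle carries only a type and an id, the same memo structure serves documents, codes, or any other object without a separate link table per type.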