This project investigates the landscape of mis/disinformation* and content integrity on TikTok by analyzing metadata and moderator reports on how content is flagged as "claims" versus "opinions." Using a dataset of over 19,000 TikTok videos, the dashboard visualizes the flow of content verification and examines whether video metadata (specifically duration) correlates with content integrity. The main insight is that claim-based videos tend to be shorter and less often verified, creating a high-speed environment where misleading content can thrive. By mapping these content journeys, the project aims to identify patterns in how misinformation spreads.
*mis = unintentional; dis = intentional
```
├───assets
│   └───styles.css
├───data
│   └───tiktok_dataset.csv
├───notebooks
│   └───data_extraction.ipynb
├───pages
│   ├───about.py
│   ├───duration.py
│   ├───home.py
│   ├───relations.py
│   └───wordcloud.py
├───results
│   ├───duration_dynamics.html
│   └───tiktok_sankey.html
├───src
│   ├───content_journey_sankey.py
│   └───duration_content_type_kde.py
├───.gitignore
├───.python-version
├───app.py
├───pyproject.toml
├───README.md
├───requirements.txt
└───uv.lock
```
The analysis uses a pedagogical dataset provided by Ramin Huseyn on Kaggle. It contains video metadata, verification statuses, and moderator review labels.
- Source: `raminhuseyn/dataset-from-tiktok` via the Kaggle API.
- Location: Data should be stored in the `data/` directory.
- Format: Comma-separated values (CSV).
- Access: To refresh/download the data, you must provide a Kaggle API key. Instructions to procure one are provided below. Alternatively, you can download the data directly from Kaggle and move it into the `data/` directory.
- License: The creator, Ramin Huseyn, has licensed this dataset under the Public Domain (CC0).
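Once the CSV is in place, a quick sanity check needs nothing beyond the standard library. The sketch below uses a tiny inline stand-in for the dataset; the column names (`claim_status`, `video_duration_sec`, `verified_status`) are assumptions you should verify against the real header row.

```python
import csv
import io
from statistics import median

# A tiny stand-in for data/tiktok_dataset.csv; the real file has ~19,000 rows.
# Column names are assumptions -- check them against the actual header.
sample = io.StringIO(
    "claim_status,video_duration_sec,verified_status\n"
    "claim,14,not verified\n"
    "opinion,48,verified\n"
    "claim,9,not verified\n"
    "opinion,55,verified\n"
)

durations = {}
for row in csv.DictReader(sample):
    durations.setdefault(row["claim_status"], []).append(int(row["video_duration_sec"]))

# Median duration per label: a first look at the "claims run shorter" insight.
medians = {label: median(vals) for label, vals in durations.items()}
print(medians)  # {'claim': 11.5, 'opinion': 51.5}
```

To run this against the real file, replace `sample` with `open("data/tiktok_dataset.csv")`.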
You can set up this project using either uv (recommended for its speed, and friendly to beginners) or the standard venv module.
If you don't have uv installed, follow the instructions for your OS:
```bash
# macOS / Linux / WSL
curl -LsSf https://astral.sh/uv/install.sh | sh
```

Restart your terminal or run `source ~/.bashrc` (Linux) / `source ~/.zshrc` (macOS) after installation.

```powershell
# Windows (PowerShell)
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

# Windows (winget)
winget install --id=astral-sh.uv -e
```

To verify the installation, run `uv --version`.
```bash
# Clone the repository
git clone https://github.com/lowellmonis/tiktok-dashboard.git
cd tiktok-dashboard
```

If using uv, a single command creates the environment and installs dependencies; you should not need to create an environment from scratch:

```bash
uv sync
```

If using venv:

```bash
# Create and activate a virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# To deactivate when you are done
deactivate
```

To pull the data, you need a `.env` file in the root directory:
- Go to Kaggle Settings > API > Create New Token.
- Create a file named `.env` and add:

```
KAGGLE_API_TOKEN=your_token_here
```
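The notebook reads this file at runtime (for example via `python-dotenv` or a small helper; the exact mechanism is in `data_extraction.ipynb`). As a rough sketch of what such a helper does, using only the standard library:

```python
def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env

# Example .env content; the token value is a placeholder, not a real key.
content = "# Kaggle credentials\nKAGGLE_API_TOKEN=your_token_here\n"
env = parse_env(content)
print(env["KAGGLE_API_TOKEN"])  # your_token_here
```

In practice you would call `parse_env(Path(".env").read_text())` from the project root.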
Follow these steps to go from a fresh clone to a fully interactive dashboard.
Step 1: Extract the Dataset

Follow the instructions in the extraction notebook to pull the dataset directly from Kaggle and place it in the `data/` folder.
To open JupyterLab:

```bash
# If using uv
uv run jupyter lab

# If using venv
jupyter lab
```

Once JupyterLab opens, navigate to the `notebooks/` folder and open `data_extraction.ipynb`.

- Expected Outcome: A file named `tiktok_dataset.csv` will appear in your `data/` directory.
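The notebook's extraction step also handles the fact that Kaggle downloads arrive zipped (the `PK` bytes mentioned under Troubleshooting are the ZIP magic number). Below is a minimal sketch of that unbundling, using a synthetic archive in a temporary directory rather than a real Kaggle download:

```python
import tempfile
import zipfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp) / "data"
    data_dir.mkdir()

    # Build a stand-in for the zipped Kaggle download.
    archive = Path(tmp) / "download.zip"
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("tiktok_dataset.csv", "claim_status,video_duration_sec\nclaim,14\n")

    # A ZIP file starts with the bytes b"PK" -- the signature from Troubleshooting.
    assert archive.read_bytes()[:2] == b"PK"

    # Unbundle into data/, as data_extraction.ipynb does for the real download.
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(data_dir)

    extracted = (data_dir / "tiktok_dataset.csv").read_text()
    print(extracted.splitlines()[0])  # claim_status,video_duration_sec
```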
Step 2: Generate Standalone Visualizations (optional, since the visualizations already exist; recommended for experimenting with uv)
If you want to regenerate the HTML results (Sankey and KDE plots) located in the `results/` folder, run the source scripts from the root:

```bash
# If using uv
uv run src/content_journey_sankey.py
uv run src/duration_content_type_kde.py

# If using venv
python src/content_journey_sankey.py
python src/duration_content_type_kde.py
```

- Expected Outcome: New `.html` files will be created in the `results/` directory showing content flows and duration dynamics.
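A Sankey diagram expresses its flows as parallel `source`/`target`/`value` lists over an indexed node list. `src/content_journey_sankey.py` presumably builds these with Plotly from the full dataset; the labels below are illustrative stand-ins. The aggregation itself can be sketched in plain Python:

```python
from collections import Counter

# Illustrative (claim_status, verified_status) pairs standing in for dataset rows.
rows = [
    ("claim", "not verified"), ("claim", "not verified"),
    ("claim", "verified"), ("opinion", "verified"),
    ("opinion", "verified"), ("opinion", "not verified"),
]

flows = Counter(rows)  # how many videos follow each journey
nodes = ["claim", "opinion", "verified", "not verified"]
index = {name: i for i, name in enumerate(nodes)}

# Parallel lists in the shape plotly.graph_objects.Sankey expects.
source = [index[src] for src, _ in flows]
target = [index[dst] for _, dst in flows]
value = [flows[key] for key in flows]

print(list(zip(source, target, value)))
```

With Plotly installed, these lists plug straight into `go.Sankey(node=dict(label=nodes), link=dict(source=source, target=target, value=value))`.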
Step 3: Launch the Dashboard

Run the Dash application locally:

```bash
python app.py
```

- Expected Outcome: The terminal will print a local URL (e.g., `http://127.0.0.1:8050`). Open it in your browser to engage with the dashboard.
Tip
You can also view the live deployment here.
- Pathing: This project uses relative paths (e.g., `../data/`). If you run scripts from inside the `src/` folder instead of the root, they may fail to find the CSV. Always run from the root.
- Python Version: Requires Python 3.9+ due to specific dataframe operations and `kagglehub` requirements. Always use a virtual environment to avoid errors. If `python` does not work in the terminal, try `python3`.
- File Signature Error: If you see `PK` characters when opening the CSV, the file is still zipped. Ensure you have run the extraction logic in `data_extraction.ipynb`, which handles `zipfile` unbundling.
- Memory: The KDE calculation in `src/` uses `scipy.stats.gaussian_kde`, which can be memory-intensive on very old hardware but should run fine on standard laptops.
- Naming conventions: Check the `.gitignore` for file names you can't use (like `sandbox`). If you really want to use such a name, remove it from the `.gitignore` file.
- Kaggle Auth Fail: If the download fails, ensure your `.env` file is in the root directory and your `KAGGLE_API_TOKEN` is correct.
- ModuleNotFoundError: If a package is missing in Jupyter, ensure you have selected the correct kernel (usually named `.venv` or `python3`) from the top-right corner of the notebook.
- `uv: command not found`: Your PATH may not be updated. Try restarting your terminal or manually sourcing your profile (e.g., `source ~/.bashrc`).
- PowerShell Execution Policy: If the Windows installation fails, run PowerShell as Administrator, execute `Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser`, then try again.
Contributions are welcome! To propose changes:
- Fork the repository.
- Create a feature branch (`git checkout -b feature/AmazingFeature`).
- Commit your changes (`git commit -m 'Add some AmazingFeature'`).
- Push to the branch (`git push origin feature/AmazingFeature`).
- Open a Pull Request.
For bugs, please open an Issue with a detailed description and steps to reproduce. Thank you!