AILAB-CEFET-RJ/tcene

pt-dec

This repository offers a PyTorch implementation of a variant of the Deep Embedded Clustering (DEC) algorithm. The original code can be found at vlukiyanov/pt-dec. This implementation is compatible with PyTorch 1.0.0 and supports Python 3.6 and 3.7, with optional CUDA acceleration.

This implementation follows (or attempts to follow; it is unofficial) the algorithm described in "Unsupervised Deep Embedding for Clustering Analysis" by Junyuan Xie, Ross Girshick, and Ali Farhadi (https://arxiv.org/abs/1511.06335).
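For readers new to DEC, the core clustering step in the paper can be sketched in a few lines. This is a minimal, dependency-free illustration of the paper's soft assignment, target distribution, and KL loss, not the code used in this repository (which operates on PyTorch tensors). The function names and toy data are assumptions made for the example.

```python
import math


def soft_assignment(embeddings, centroids, alpha=1.0):
    """Student's t similarity q_ij between each embedded point and each centroid."""
    q = []
    for z in embeddings:
        row = []
        for mu in centroids:
            sq_dist = sum((a - b) ** 2 for a, b in zip(z, mu))
            row.append((1.0 + sq_dist / alpha) ** (-(alpha + 1.0) / 2.0))
        total = sum(row)
        q.append([value / total for value in row])  # normalise over clusters
    return q


def target_distribution(q):
    """Targets p_ij: square q_ij, divide by the soft cluster frequency, renormalise."""
    n_clusters = len(q[0])
    freq = [sum(row[j] for row in q) for j in range(n_clusters)]
    p = []
    for row in q:
        weights = [row[j] ** 2 / freq[j] for j in range(n_clusters)]
        total = sum(weights)
        p.append([w / total for w in weights])
    return p


def kl_divergence(p, q):
    """Clustering loss KL(P || Q), summed over points and clusters."""
    return sum(
        pij * math.log(pij / qij)
        for p_row, q_row in zip(p, q)
        for pij, qij in zip(p_row, q_row)
        if pij > 0
    )


if __name__ == "__main__":
    # Toy 2-D embeddings and centroids, invented for illustration only.
    points = [[0.0, 0.1], [0.2, 0.0], [3.0, 3.1]]
    centers = [[0.0, 0.0], [3.0, 3.0]]
    q = soft_assignment(points, centers)
    p = target_distribution(q)
    print("loss:", kl_divergence(p, q))
```

In the paper, the encoder weights and the centroids are updated by gradient descent on this loss, and the target distribution is recomputed periodically from the current soft assignments.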

Installing

To set up the environment for running this code, create a new Conda environment with Python 3.11:

conda create --name your_env_name python=3.11

Replace your_env_name with a name of your choice for the environment.

Once the environment is activated (conda activate your_env_name), install the dependencies:

pip install -r requirements.txt
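After installing, a quick sanity check can confirm that the interpreter and key packages are visible from the active environment. The sketch below uses only the standard library. The package list is an assumption inferred from the steps in this README (PyTorch, Streamlit, and a Chroma vector store); requirements.txt remains the source of truth.

```python
import importlib.util
import sys

# Assumed package names, inferred from this README; adjust to match requirements.txt.
ASSUMED_PACKAGES = ("torch", "streamlit", "chromadb")


def check_environment(packages=ASSUMED_PACKAGES, expected_python=(3, 11)):
    """Report whether the Python version matches and each package can be located."""
    report = {"python_ok": tuple(sys.version_info[:2]) == tuple(expected_python)}
    for name in packages:
        # find_spec locates a top-level package without importing it.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, ok in check_environment().items():
        print(f"{key}: {'ok' if ok else 'missing'}")
```

If a package is reported as missing, the usual cause is that pip ran outside the activated Conda environment.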

Troubleshooting

If Conda is not installed on your system, follow these steps to install Miniconda:

  1. Download the Miniconda installer:

    cd ~
    wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
  2. Make the installer executable:

    chmod +x Miniconda3-latest-Linux-x86_64.sh
  3. Run the installer:

    ./Miniconda3-latest-Linux-x86_64.sh
  4. Reload your shell configuration:

    source ~/.bashrc
  5. Verify the installation:

    conda --version

Refer to the official Miniconda documentation for more details.

Usage

1. Data Preparation

  • Transform data to Parquet format:
    Run the following script to convert your data:

    python tools/transform_to_parquet.py
  • Generate embeddings:
    Create embeddings for your data with:

    python tools/generate_embeddings.py

2. Model Training

  • Train autoencoder and DEC models:
    Use the main training script with customizable options:

    python tcc.py [OPTIONS]

    Key options:

    • --cuda: Use CUDA for acceleration (default: False)
    • --testing-mode: Run in testing mode (default: False)
    • --train-autoencoder: Train the autoencoder from scratch (True) or load an existing one (False) (default: True)
    • --sort-by-elem: Split data by "ElemDespesaTCE" and cluster each part separately (default: False)

    Example usage:

        python tcc.py --train-autoencoder False --sort-by-elem True
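The example above passes explicit True/False values to each option. As an illustration of how such options can be parsed, here is a small argparse sketch that mirrors the documented names and defaults. It is an assumption for explanatory purposes and not the actual option handling in tcc.py; the str_to_bool helper is hypothetical.

```python
import argparse


def str_to_bool(value):
    """Convert 'True'/'False' style command-line text into a bool (hypothetical helper)."""
    text = value.strip().lower()
    if text in ("true", "1", "yes"):
        return True
    if text in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected True or False, got {value!r}")


def build_parser():
    """Build a parser mirroring the options and defaults listed in this README."""
    parser = argparse.ArgumentParser(prog="tcc.py")
    parser.add_argument("--cuda", type=str_to_bool, default=False,
                        help="use CUDA for acceleration")
    parser.add_argument("--testing-mode", type=str_to_bool, default=False,
                        help="run in testing mode")
    parser.add_argument("--train-autoencoder", type=str_to_bool, default=True,
                        help="train the autoencoder (True) or load an existing one (False)")
    parser.add_argument("--sort-by-elem", type=str_to_bool, default=False,
                        help='split data by "ElemDespesaTCE" and cluster each part')
    return parser


if __name__ == "__main__":
    # Same arguments as the example usage above.
    args = build_parser().parse_args(
        ["--train-autoencoder", "False", "--sort-by-elem", "True"]
    )
    print(args)
```

Parsing the value explicitly avoids a common pitfall: with type=bool, argparse would treat the string "False" as truthy.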

3. Running the Application

  • Create the vector store:
    Before running the app, generate the vector store:

    python chroma_vector_store.py
  • Launch the Streamlit app 🚀:
    Start the application with:

    streamlit run app.py
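The usage steps above run in a fixed order, so they can be chained in a small driver script. The sketch below is one possible way to do that with the standard library. The run_pipeline function and the choice of training options are assumptions for illustration; the script paths are the ones listed in this README, and the driver is assumed to run from the repository root.

```python
import subprocess
import sys

# Commands taken from the usage steps; the training options are only an example.
PIPELINE = [
    [sys.executable, "tools/transform_to_parquet.py"],
    [sys.executable, "tools/generate_embeddings.py"],
    [sys.executable, "tcc.py", "--train-autoencoder", "True"],
    [sys.executable, "chroma_vector_store.py"],
    ["streamlit", "run", "app.py"],
]


def run_pipeline(steps=PIPELINE, runner=subprocess.run):
    """Run each step in order; check=True stops the pipeline at the first failure."""
    for command in steps:
        print("Running:", " ".join(command))
        runner(command, check=True)


if __name__ == "__main__":
    run_pipeline()
```

The final step keeps running until the Streamlit server is stopped, so it is placed last.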

Other implementations of DEC
