pt-dec

This repository offers a PyTorch implementation of a variant of the Deep Embedded Clustering (DEC) algorithm. The original code can be found at vlukiyanov/pt-dec. This implementation is compatible with PyTorch 1.0.0 and supports Python 3.6 and 3.7, with optional CUDA acceleration.

This follows (or attempts to; note that this implementation is unofficial) the algorithm described in "Unsupervised Deep Embedding for Clustering Analysis" by Junyuan Xie, Ross Girshick, and Ali Farhadi (https://arxiv.org/abs/1511.06335).
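
For orientation, the core quantities of DEC as defined in the paper can be sketched in a few lines: the Student's t soft assignment between embedded points and cluster centroids, the sharpened target distribution, and the KL divergence that training minimizes. This is a dependency-free illustration, not the code in ptdec; the function names and the toy data are assumptions made for the example.

```python
import math


def soft_assignment(embeddings, centroids, alpha=1.0):
    """Soft assignment q_ij using a Student's t kernel, normalized per point."""
    q = []
    for z in embeddings:
        row = []
        for mu in centroids:
            sq_dist = sum((a - b) ** 2 for a, b in zip(z, mu))
            row.append((1.0 + sq_dist / alpha) ** (-(alpha + 1.0) / 2.0))
        total = sum(row)
        q.append([value / total for value in row])
    return q


def target_distribution(q):
    """Targets p_ij proportional to q_ij^2 / f_j, where f_j = sum_i q_ij."""
    n_clusters = len(q[0])
    f = [sum(row[j] for row in q) for j in range(n_clusters)]
    p = []
    for row in q:
        weights = [row[j] ** 2 / f[j] for j in range(n_clusters)]
        total = sum(weights)
        p.append([w / total for w in weights])
    return p


def kl_divergence(p, q):
    """KL(P || Q) summed over all points and clusters."""
    return sum(
        pij * math.log(pij / qij)
        for p_row, q_row in zip(p, q)
        for pij, qij in zip(p_row, q_row)
        if pij > 0
    )


# Toy 2-D embeddings and centroids (made up for illustration).
embeddings = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]]
centroids = [[0.0, 0.0], [5.0, 5.0]]

q = soft_assignment(embeddings, centroids)
p = target_distribution(q)
loss = kl_divergence(p, q)
```

Squaring q and renormalizing pushes each point's target toward the cluster it is already closest to, which is what drives the refinement of the clusters during training.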

Installing

To set up the environment for running this code, create a new Conda environment with Python 3.11:

```
conda create --name your_env_name python=3.11
```

Replace your_env_name with a name of your choice for the environment.

Once the environment is activated (conda activate your_env_name), run:

```
pip install -r requirements.txt
```
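
After installing, a quick check can confirm that the interpreter and the main dependency are visible from the active environment. The snippet below is a minimal sketch; the helper name check_environment is made up for this example, and it only tests whether a package can be located, not whether it works.

```python
import importlib.util
import sys


def check_environment(required=("torch",)):
    """Report the Python version and whether each required package can be found."""
    report = {"python": "{}.{}".format(*sys.version_info[:2])}
    for name in required:
        # find_spec returns None when the package is not installed.
        report[name] = importlib.util.find_spec(name) is not None
    return report


report = check_environment()
for key, value in report.items():
    print(f"{key}: {value}")
```

If torch is found, torch.cuda.is_available() reports whether the --cuda option can be used on the machine.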

Troubleshooting

If Conda is not installed on your system, follow these steps to install Miniconda on Linux (x86_64):

Download the Miniconda installer:

```
cd ~
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
```

Make the installer executable:

```
chmod +x Miniconda3-latest-Linux-x86_64.sh
```

Run the installer:
```
./Miniconda3-latest-Linux-x86_64.sh
```
Reload your shell configuration:
```
source ~/.bashrc
```
Verify the installation:
```
conda --version
```

Refer to the official Miniconda documentation for more details.

Usage

1. Data Preparation

Transform data to Parquet format:
Run the following script to convert your data:
```
python tools/transform_to_parquet.py
```
Generate embeddings:
Create embeddings for your data with:
```
python tools/generate_embeddings.py
```
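
The two preparation steps have to run in this order, and the second should not start if the first fails. A small driver along the following lines is one way to enforce that. It is a sketch that assumes both scripts run without arguments from the repository root, as shown above; the function names are hypothetical.

```python
import subprocess
import sys

# Script paths as listed in this README; run from the repository root.
PREPARATION_STEPS = [
    "tools/transform_to_parquet.py",
    "tools/generate_embeddings.py",
]


def build_commands(steps=PREPARATION_STEPS, python=sys.executable):
    """Return one command (as an argument list) per preparation step, in order."""
    return [[python, step] for step in steps]


def run_preparation(commands):
    """Run each command in order and stop at the first failure."""
    for command in commands:
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(command, check=True)


commands = build_commands()
```

Calling run_preparation(commands) then executes both steps with the same interpreter that runs the driver, so the active Conda environment is used.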

2. Model Training

Train autoencoder and DEC models:
Use the main training script with customizable options:
```
python tcc.py [OPTIONS]
```
Key options:
- --cuda: Use CUDA for acceleration (default: False)
- --testing-mode: Run in testing mode (default: False)
- --train-autoencoder: Train the autoencoder from scratch (True) or load an existing one (False) (default: True)
- --sort-by-elem: Split data by "ElemDespesaTCE" and cluster each part separately (default: False)
Example usage:
```
python tcc.py --train-autoencoder False --sort-by-elem True
```
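
To make the effect of --sort-by-elem concrete, the sketch below groups records by their "ElemDespesaTCE" value and applies a clustering function to each group independently. It illustrates the idea only and is not the logic in tcc.py; the record layout, the function names, and the placeholder clusterer are all assumptions.

```python
from collections import defaultdict


def split_by_key(records, key="ElemDespesaTCE"):
    """Group records (dicts) by the value stored under `key`."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record)
    return dict(groups)


def cluster_each_group(records, cluster_fn, key="ElemDespesaTCE"):
    """Apply `cluster_fn` to each group separately and collect the labels per group."""
    return {
        value: cluster_fn(group)
        for value, group in split_by_key(records, key).items()
    }


# Hypothetical toy records; the placeholder clusterer stands in for a trained DEC model.
records = [
    {"ElemDespesaTCE": "A", "embedding": [0.1, 0.2]},
    {"ElemDespesaTCE": "B", "embedding": [0.9, 0.8]},
    {"ElemDespesaTCE": "A", "embedding": [0.2, 0.1]},
]
labels = cluster_each_group(records, cluster_fn=lambda group: [0] * len(group))
```

Because each group is clustered on its own, cluster labels are only meaningful within a group: label 0 in one group is unrelated to label 0 in another.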

3. Running the Application

Create the vector store:
Before running the app, generate the vector store:
```
python generate_vector_store.py
```
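
Conceptually, a vector store keeps each document next to its embedding and answers queries by returning the stored items most similar to a query embedding. The in-memory sketch below shows that idea with cosine similarity. It is not the API of any particular vector-store library and not what the script above does internally; the class and method names are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    """Toy store: a list of (id, embedding, document) scanned linearly on each query."""

    def __init__(self):
        self._items = []

    def add(self, item_id, embedding, document):
        self._items.append((item_id, embedding, document))

    def query(self, embedding, n_results=1):
        ranked = sorted(
            self._items,
            key=lambda item: cosine_similarity(embedding, item[1]),
            reverse=True,
        )
        return [(item_id, document) for item_id, _, document in ranked[:n_results]]


store = InMemoryVectorStore()
store.add("a", [1.0, 0.0], "first document")
store.add("b", [0.0, 1.0], "second document")
nearest = store.query([0.9, 0.1], n_results=1)
```

A real vector store adds persistence and indexing on top of this, which is why it has to be generated once before the app starts.
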
Launch the Streamlit app 🚀:
Start the application with:
```
streamlit run app.py
```

Other implementations of DEC

- Original Caffe: https://github.com/piiswrong/dec
- PyTorch: https://github.com/CharlesNord/DEC-pytorch and https://github.com/eelxpeng/dec-pytorch
- Keras: https://github.com/XifengGuo/DEC-keras and https://github.com/fferroni/DEC-Keras
- MXNet: https://github.com/apache/incubator-mxnet/blob/master/example/deep-embedded-clustering/dec.py
- Chainer: https://github.com/ymym3412/DeepEmbeddedClustering

