A CONCEPTUAL TEXT ANALYZER-CLUSTERER

Aydin Manzouri, 2025

SUMMARY

This notebook provides a demo toolbox for conceptual analysis and clustering of text data.

Objective

To analyze and cluster texts by their conceptual load, using a hybrid concept-aggregate approach.

filepath → read_txt → nlp → token_ext → concept_matcher → concept_aggregator

b.2. Function concept_aggregator returns a data tuple (detailed, aggregated)
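
The chain above can be sketched as follows. This is a minimal, stdlib-only illustration and not the notebook's implementation: a regex tokenizer stands in for the nlp and token_ext steps, and the CONCEPTS lexicon, the function signatures, and the shapes of detailed and aggregated are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical concept lexicon: concept name -> trigger words (assumption).
CONCEPTS = {
    "time": {"time", "clock", "hour"},
    "space": {"space", "distance", "room"},
}

def token_ext(text):
    """Stand-in for the nlp + token_ext steps: lowercase word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def concept_matcher(tokens, concepts=CONCEPTS):
    """Return (token, concept) pairs for every token found in the lexicon."""
    return [(tok, name) for tok in tokens
            for name, words in concepts.items() if tok in words]

def concept_aggregator(matches):
    """Return (detailed, aggregated).

    detailed:   concept -> {token: count}   (assumed shape)
    aggregated: concept -> total count      (assumed shape)
    """
    counters = {}
    for tok, name in matches:
        counters.setdefault(name, Counter())[tok] += 1
    detailed = {name: dict(c) for name, c in counters.items()}
    aggregated = {name: sum(c.values()) for name, c in counters.items()}
    return detailed, aggregated

text = "Time passes hour by hour, and the room has little space."
detailed, aggregated = concept_aggregator(concept_matcher(token_ext(text)))
print(aggregated)  # {'time': 3, 'space': 2}
```

Keeping both levels means the per-token counts stay available for the heatmap, while the per-concept totals feed the bar chart.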

b.3. Functions json_saver and json_loader save and load the above data tuple in JSON format, respectively
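
A JSON round trip for such a tuple might look like the sketch below. The signatures and the on-disk layout (one object with "detailed" and "aggregated" keys) are assumptions; the notebook may store the data differently. JSON has no tuple type, so the loader rebuilds the tuple explicitly.

```python
import json
import os
import tempfile

def json_saver(data, path):
    """Write a (detailed, aggregated) tuple to a JSON file (assumed layout)."""
    detailed, aggregated = data
    with open(path, "w", encoding="utf-8") as fh:
        json.dump({"detailed": detailed, "aggregated": aggregated}, fh, indent=2)

def json_loader(path):
    """Read the file back and return the (detailed, aggregated) tuple."""
    with open(path, encoding="utf-8") as fh:
        payload = json.load(fh)
    return payload["detailed"], payload["aggregated"]

# Toy data in the assumed shape.
data = ({"time": {"hour": 2, "time": 1}}, {"time": 3})
path = os.path.join(tempfile.mkdtemp(), "doc1.json")
json_saver(data, path)
restored = json_loader(path)
```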

b.4. Function aggreg_visu generates and saves a bar chart from aggregated

b.5. Function concept_heatmap generates and saves a heatmap from detailed
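
Both plots can be sketched with matplotlib, as below. The plotting library, the signatures, and the dict shapes of aggregated and detailed are assumptions; the source only states that a bar chart and a heatmap are generated and saved.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # render to files without needing a display
import matplotlib.pyplot as plt

def aggreg_visu(aggregated, path):
    """Save a bar chart of per-concept totals."""
    fig, ax = plt.subplots()
    ax.bar(list(aggregated.keys()), list(aggregated.values()))
    ax.set_xlabel("concept")
    ax.set_ylabel("count")
    fig.savefig(path)
    plt.close(fig)

def concept_heatmap(detailed, path):
    """Save a concept-by-token heatmap of counts."""
    concepts = sorted(detailed)
    tokens = sorted({t for counts in detailed.values() for t in counts})
    # Rows are concepts, columns are tokens; missing pairs count as 0.
    matrix = [[detailed[c].get(t, 0) for t in tokens] for c in concepts]
    fig, ax = plt.subplots()
    im = ax.imshow(matrix, aspect="auto")
    ax.set_xticks(range(len(tokens)))
    ax.set_xticklabels(tokens, rotation=45, ha="right")
    ax.set_yticks(range(len(concepts)))
    ax.set_yticklabels(concepts)
    fig.colorbar(im, ax=ax, label="count")
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)

# Toy data in the assumed shapes.
detailed = {"time": {"hour": 2, "time": 1}, "space": {"room": 1}}
aggregated = {"time": 3, "space": 1}
out_dir = tempfile.mkdtemp()
bar_path = os.path.join(out_dir, "aggregated.png")
heat_path = os.path.join(out_dir, "detailed.png")
aggreg_visu(aggregated, bar_path)
concept_heatmap(detailed, heat_path)
```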

(C) Working with multiple documents

c.1. Function batch_preprocess loads multiple text files and prepares the data for the next steps
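
One possible shape for the batch loader is sketched here. The signature, the returned mapping (file stem to data tuple), and the toy_analyze placeholder for the single-document pipeline are hypothetical and only illustrate the idea.

```python
import tempfile
from pathlib import Path

def batch_preprocess(folder, analyze):
    """Run `analyze` on every .txt file in `folder`.

    Returns {file stem: (detailed, aggregated)} in sorted file order.
    """
    results = {}
    for path in sorted(Path(folder).glob("*.txt")):
        results[path.stem] = analyze(path.read_text(encoding="utf-8"))
    return results

def toy_analyze(text):
    """Placeholder for the single-document pipeline (assumption)."""
    words = text.lower().split()
    counts = {w: words.count(w) for w in ("hour", "clock") if w in words}
    return {"time": counts}, {"time": sum(counts.values())}

# Two throwaway files so the example runs anywhere.
folder = Path(tempfile.mkdtemp())
(folder / "a.txt").write_text("hour after hour", encoding="utf-8")
(folder / "b.txt").write_text("the clock stopped", encoding="utf-8")
batch = batch_preprocess(folder, toy_analyze)
```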

c.2. Function batch_plot generates both plot types (bar chart and heatmap) for a batch of documents

c.3. Functions batch_json_saver and batch_json_loader are batch-process analogs of their respective single-process functions

c.4. Function vectorizer converts batch-preprocessed data into vectorized format to be used in ML operations. It combines detailed and aggregated data into a single DataFrame
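
A minimal sketch of such a combined table, assuming the batch is a mapping from document name to a (detailed, aggregated) tuple. The column naming scheme ("agg:" and "det:" prefixes) is invented for the example and is not taken from the notebook.

```python
import pandas as pd

def vectorizer(batch):
    """Flatten each document's (detailed, aggregated) tuple into one row."""
    rows = {}
    for doc, (detailed, aggregated) in batch.items():
        # Aggregated features: one column per concept.
        row = {f"agg:{concept}": n for concept, n in aggregated.items()}
        # Detailed features: one column per (concept, token) pair.
        for concept, counts in detailed.items():
            for token, n in counts.items():
                row[f"det:{concept}:{token}"] = n
        rows[doc] = row
    # Features absent from a document become 0 rather than NaN.
    return pd.DataFrame.from_dict(rows, orient="index").fillna(0).astype(int)

# Toy batch in the assumed shape.
batch = {
    "a": ({"time": {"hour": 2}}, {"time": 2}),
    "b": ({"time": {"clock": 1}, "space": {"room": 1}}, {"time": 1, "space": 1}),
}
df_combo = vectorizer(batch)
```

Each row is then a fixed-length numeric vector, which is the form that clustering algorithms expect.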

c.5. Finally, function cluster performs unsupervised learning, in the form of KMeans clustering. It:

receives data in vectorized format,
performs clustering,
applies PCA to high-dimensional data,
generates and saves the resulting 2D plot,
and returns a tuple (df_combo, cluster_labels)
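
The steps above can be sketched with scikit-learn and matplotlib. The parameter names, their defaults, and the toy DataFrame are assumptions; only KMeans, PCA, the saved 2D plot, and the returned tuple come from the description.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # render to files without needing a display
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

def cluster(df_combo, n_clusters=2, plot_path="clusters.png", seed=0):
    """Cluster rows with KMeans, project to 2D with PCA, save a scatter plot."""
    features = df_combo.to_numpy(dtype=float)
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    cluster_labels = model.fit_predict(features)
    # PCA is used only for display; clustering ran on the full feature space.
    coords = PCA(n_components=2, random_state=seed).fit_transform(features)
    fig, ax = plt.subplots()
    ax.scatter(coords[:, 0], coords[:, 1], c=cluster_labels)
    for name, (x, y) in zip(df_combo.index, coords):
        ax.annotate(str(name), (x, y))
    ax.set_xlabel("PC 1")
    ax.set_ylabel("PC 2")
    fig.savefig(plot_path)
    plt.close(fig)
    return df_combo, cluster_labels

# Toy vectorized data: d1/d2 lean toward "time", d3/d4 toward "space".
df = pd.DataFrame(
    {"agg:time": [5, 4, 0, 1], "agg:space": [0, 1, 5, 4], "det:time:hour": [3, 2, 0, 0]},
    index=["d1", "d2", "d3", "d4"],
)
plot_path = os.path.join(tempfile.mkdtemp(), "clusters.png")
df_combo, cluster_labels = cluster(df, n_clusters=2, plot_path=plot_path)
```

The numeric cluster ids are arbitrary, so only the grouping of documents is meaningful, not which group is labeled 0 or 1.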

