In this research, we present a short-text classification method for digital forensic analysis that computes probability scores for target topics within a corpus of conversational texts. State-of-the-art text classification methods typically depend on a trained model, extensive training data, human input, and a large corpus for efficient inference; our approach requires none of these. We use the Sentence Transformer to generate embeddings and compare its performance with other embedding techniques, such as Word2Vec and FastText. We also evaluate our method against zero-shot and few-shot models. Our experiments use two benchmark datasets: DailyDialog {Lhoest_Datasets_A_Community_2021} and DialogSum {chen-etal-2021-dialogsum}. The empirical results show that our method outperforms traditional text classification techniques.
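The idea of scoring target topics from embeddings, without training a classifier, can be illustrated with a minimal sketch. It is not the implementation from the notebooks linked below: the scoring rule (cosine similarity followed by a temperature-scaled softmax), the function names, the temperature value, and the toy bag-of-words encoder are all assumptions made for illustration. In practice the `embed_fn` argument would be a sentence encoder, for example the `encode` method of a `SentenceTransformer` model.

```python
import math
from collections import Counter


def make_toy_embedder(vocab):
    """Return a bag-of-words encoder over a fixed vocabulary.

    Hypothetical stand-in for a real sentence encoder, used only so the
    sketch runs without downloading a model.
    """
    def embed(text):
        counts = Counter(text.lower().split())
        return [float(counts[word]) for word in vocab]
    return embed


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def softmax(scores, temperature=1.0):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp((s - peak) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def topic_probabilities(utterance, topics, embed_fn, temperature=0.1):
    """Score an utterance against each topic description.

    Assumed scoring rule: cosine similarity between the utterance embedding
    and each topic embedding, normalised into probabilities with a softmax.
    """
    utterance_vec = embed_fn(utterance)
    sims = [cosine(utterance_vec, embed_fn(desc)) for desc in topics.values()]
    return dict(zip(topics.keys(), softmax(sims, temperature)))


# Hypothetical topics and utterance, chosen only to exercise the code.
topics = {
    "travel": "flight hotel trip ticket",
    "food": "dinner restaurant menu eat",
}
vocab = sorted({word for desc in topics.values() for word in desc.split()})
embed = make_toy_embedder(vocab)

probs = topic_probabilities("I booked a flight and a hotel for the trip", topics, embed)
for topic, p in probs.items():
    print(f"{topic}: {p:.4f}")
```

Because no parameters are fitted, adding a new target topic only means adding another entry to `topics`. A lower temperature sharpens the distribution toward the most similar topic, while a higher one flattens it.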
The DialogSum {chen-etal-2021-dialogsum} data are available at:
https://drive.google.com/drive/folders/1VnW2__6D2RtI0TMP7Ggsyp20TKeevDvq?usp=sharing
The DailyDialog {Lhoest_Datasets_A_Community_2021} data are available at:
https://huggingface.co/datasets/peandrew/dialy_dialogue_with_recoginized_concept_raw
- The DailyDialog {Lhoest_Datasets_A_Community_2021} application is available in the following Google Colab notebook:
https://colab.research.google.com/drive/1QJY60RVnX5etwU0wLImPXU6ra5NpEDVk?usp=sharing
- The DialogSum {chen-etal-2021-dialogsum} application is available in the following Google Colab notebook:
https://colab.research.google.com/drive/15D03KLZTzLk0M5vSlEeUWiNhYRJvU33f?usp=sharing
- The comparison with current state-of-the-art embedding methods is available in the following Google Colab notebook:
https://colab.research.google.com/drive/1D6TUZLvSrbIJXcb3Z7l4KDYbWWZPrFJR?usp=sharing