This repository contains the code and findings for an end-to-end machine learning project on authorship attribution for Gujarati literary texts. The project conducts a comparative analysis between a traditional LSTM model and a modern, pre-trained Transformer model on a small, imbalanced, low-resource dataset.
The core finding of this project is the dramatic performance gap between a model trained from scratch and one using transfer learning. On a highly imbalanced dataset, the pre-trained Transformer achieved 97.7% accuracy, while the LSTM reached only a misleading 59.0%, failing to learn the features of the minority-class authors.
| Model | Overall Accuracy | F1-Score (Kalapi) | F1-Score (Mehta) | F1-Score (Meghani) |
|---|---|---|---|---|
| 📉 LSTM | 59.0% | 0.74 | 0.00 | 0.00 |
| 🚀 Transformer | 97.7% | 0.977 | 0.977 | 0.977 |
The Transformer model achieved a 38.7 percentage point improvement in accuracy, demonstrating its superior ability to generalize from limited data. This highlights the power of transfer learning in low-resource scenarios.
This graph clearly shows the LSTM's failure: it scored an F1 of 0.00 for two of the three authors, indicating a biased model that learned to predict only a single author. The Transformer, by contrast, performed robustly across all three classes.
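This failure mode is a classic consequence of class imbalance: a classifier that collapses to the majority class can still post respectable-looking accuracy while being useless for every other class. A minimal sketch illustrates the arithmetic; the class proportions below are hypothetical, chosen only so the majority author makes up 59% of the test set and thus mirrors the reported numbers:

```python
def per_class_f1(y_true, y_pred, label):
    """Compute F1 for one class from lists of true and predicted labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    if tp == 0:
        return 0.0  # class never correctly predicted -> F1 is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical imbalanced test set: 59 / 21 / 20 samples per author.
y_true = ["Kalapi"] * 59 + ["Mehta"] * 21 + ["Meghani"] * 20
# A degenerate classifier that always predicts the majority author.
y_pred = ["Kalapi"] * len(y_true)

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy: {accuracy:.3f}")  # 0.590 despite learning nothing useful
for author in ("Kalapi", "Mehta", "Meghani"):
    print(f"F1({author}): {per_class_f1(y_true, y_pred, author):.2f}")
# F1(Kalapi): 0.74, F1(Mehta): 0.00, F1(Meghani): 0.00
```

The per-class F1 scores expose the collapse immediately, which is why the table above reports F1 per author rather than accuracy alone.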
This project successfully demonstrates that for NLP tasks in low-resource languages like Gujarati, transfer learning with pre-trained models is a significantly more effective strategy than training simpler models from scratch, especially when dealing with real-world data imbalance.
The complete methodology, from data scraping to model training and evaluation, is available in the `src` directory for review.