A curated list of awesome resources, tools, datasets, papers, and libraries for computational linguistics and natural language processing (NLP).
Computational linguistics is an interdisciplinary field concerned with the computational aspects of human language. This list is aimed at researchers, students, and developers working on NLP, linguistics, machine translation, and AI language systems.
- Books & Reading
- Courses & Tutorials
- Conferences & Journals
- Libraries & Frameworks
- Datasets & Corpora
- Tools & Platforms
- Research Papers
- Projects & Labs
- Related Awesome Lists
- Speech and Language Processing – A widely used NLP textbook by Daniel Jurafsky and James H. Martin.
- Foundations of Statistical Natural Language Processing – A foundational book by Manning and Schütze.
- Linguistic Fundamentals for Natural Language Processing – Introduces core linguistic concepts for NLP practitioners, by Emily M. Bender.
- CS224N: Natural Language Processing with Deep Learning – Stanford's course on deep learning for NLP.
- Coursera – Natural Language Processing Specialization – A four-course series by DeepLearning.AI.
- fast.ai NLP Course – A practical, code-first course on NLP using fastai and PyTorch.
- ACL – Association for Computational Linguistics.
- NAACL – North American Chapter of the ACL.
- EMNLP – Conference on Empirical Methods in Natural Language Processing.
- Computational Linguistics Journal – Peer-reviewed journal published by MIT Press.
- spaCy – Industrial-strength NLP library in Python.
- NLTK – Classic Python toolkit for symbolic and statistical NLP.
- Stanza – The Stanford NLP Group’s official Python NLP library.
- Hugging Face Transformers – State-of-the-art NLP models and tools.
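- Apache OpenNLP – Machine learning-based Java toolkit for processing natural language text.
- Universal Dependencies – Cross-linguistically consistent treebank annotation (part-of-speech tags, morphological features, and dependency syntax) for many languages.
- Corpus of Contemporary American English (COCA) – A genre-diverse corpus of more than one billion words of American English.
- Project Gutenberg – Free e-books of public-domain works, widely used as a source of text for NLP.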
- Common Crawl – Massive web crawl dataset useful for language modeling.
- Tatoeba – Multilingual sentence and translation corpus.
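Universal Dependencies treebanks are distributed in the CoNLL-U format: one word per line with ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), comment lines starting with `#`, and a blank line after each sentence. As a minimal sketch, assuming plain Python 3 and no third-party packages, the reader below collects the words of each sentence. `Token`, `parse_conllu`, and `SAMPLE` are hypothetical names chosen for illustration, and the sample sentence is hand-written rather than drawn from a real treebank. For real projects, a maintained parser such as the `conllu` package on PyPI is likely a better choice.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class Token:
    """One syntactic word: the ten CoNLL-U columns, in file order."""
    id: str
    form: str
    lemma: str
    upos: str
    xpos: str
    feats: str
    head: Optional[int]  # ID of the governing word; 0 marks the root, None if '_'
    deprel: str
    deps: str
    misc: str


def parse_conllu(text: str) -> Iterator[list[Token]]:
    """Yield one list of Token objects per sentence (hypothetical helper).

    Comment lines ('#') are skipped, and so are multiword-token ranges
    (IDs such as '1-2') and empty nodes (IDs such as '8.1'), so each list
    holds only the words of the basic dependency tree.
    """
    sentence: list[Token] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 tab-separated columns: {line!r}")
        if "-" in cols[0] or "." in cols[0]:
            continue
        head = None if cols[6] == "_" else int(cols[6])
        # Columns 0-5, then the parsed HEAD, then DEPREL, DEPS, MISC.
        sentence.append(Token(*cols[:6], head, *cols[7:]))
    if sentence:
        # Tolerate a missing final blank line.
        yield sentence


# Illustrative, hand-written sentence (not taken from a real UD treebank).
SAMPLE = (
    "# text = The cat sleeps.\n"
    "1\tThe\tthe\tDET\tDT\tDefinite=Def|PronType=Art\t2\tdet\t_\t_\n"
    "2\tcat\tcat\tNOUN\tNN\tNumber=Sing\t3\tnsubj\t_\t_\n"
    "3\tsleeps\tsleep\tVERB\tVBZ\tNumber=Sing|Person=3|Tense=Pres\t0\troot\t_\tSpaceAfter=No\n"
    "4\t.\t.\tPUNCT\t.\t_\t3\tpunct\t_\t_\n"
    "\n"
)

for sent in parse_conllu(SAMPLE):
    for tok in sent:
        print(tok.id, tok.form, tok.upos, tok.head, tok.deprel)
```

Running it prints one line per word, for example `3 sleeps VERB 0 root` for the root verb. Skipping multiword-token ranges keeps each sentence aligned with the HEAD column, which refers to word IDs rather than to ranges.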
- brat – Web-based tool for text annotation.
- Prodigy – Annotation tool for creating training data for NLP models, by Explosion (the makers of spaCy).
- UDPipe – Trainable pipeline for tokenization, POS tagging, lemmatization, and dependency parsing.
- TextBlob – Simplified text processing for Python.
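brat stores annotations in a standoff format: the raw text stays in a `.txt` file, and a parallel `.ann` file holds one annotation per line. A text-bound annotation line has an ID such as `T1`, a tab, the annotation type followed by start and end character offsets (end exclusive), another tab, and the covered text; discontinuous annotations list several offset pairs separated by `;`. The sketch below, assuming plain Python 3, reads only these `T` lines. `TextBound`, `parse_textbound`, the sample document, and the `Acquisition` relation type are hypothetical, made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TextBound:
    id: str                       # e.g. "T1"
    type: str                     # annotation type, e.g. "Organization"
    spans: list[tuple[int, int]]  # character offsets into the .txt file; end is exclusive
    text: str                     # the annotated text as stored in the .ann line


def parse_textbound(ann: str) -> list[TextBound]:
    """Return the text-bound ('T') annotations from brat .ann content.

    Hypothetical helper: relations ('R'), events ('E'), attributes ('A'),
    notes ('#') and other line types are skipped in this sketch.
    """
    result = []
    for line in ann.splitlines():
        if not line.startswith("T"):
            continue
        ann_id, type_and_offsets, text = line.split("\t", 2)
        ann_type, offsets = type_and_offsets.split(" ", 1)
        # Discontinuous annotations list several "start end" pairs separated by ';'.
        spans = [tuple(map(int, frag.split())) for frag in offsets.split(";")]
        result.append(TextBound(ann_id, ann_type, spans, text))
    return result


# Illustrative document and annotations (made up, not from a real brat project).
DOC = "Sony acquired Columbia Pictures in 1989."
ANN = (
    "T1\tOrganization 0 4\tSony\n"
    "T2\tOrganization 14 31\tColumbia Pictures\n"
    "T3\tDate 35 39\t1989\n"
    "R1\tAcquisition Arg1:T1 Arg2:T2\n"
)

for tb in parse_textbound(ANN):
    start, end = tb.spans[0]
    # For a contiguous span, slicing the document should reproduce the stored text.
    assert DOC[start:end] == tb.text
    print(tb.id, tb.type, tb.spans, tb.text)
```

Checking that each offset pair reproduces the stored text is a cheap way to catch encoding or newline mismatches between the `.txt` and `.ann` files before training on the annotations.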
- ACL Anthology – Searchable archive of computational linguistics papers.
- Papers with Code – NLP – NLP papers, datasets, and benchmarks with linked code.
- arXiv cs.CL (Computation and Language) – Preprint archive category for computational linguistics and NLP.
- Stanford NLP Group – One of the leading research groups in computational linguistics.
- MIT CSAIL NLP – Research in language understanding, modeling, and generation.
- AllenNLP – NLP research library and ecosystem by the Allen Institute for AI.
- Awesome NLP
- Awesome Linguistics
- Awesome Language Learning
- Awesome Speech Recognition
- Awesome Hugging Face
Contributions are welcome!