This repository implements a Neural Conditional Random Field (CRF) parser for constituency parsing, developed as a course project for CSE 291E (Advanced Statistical NLP) at UC San Diego.
It reproduces the log-space inside algorithm used for structured inference over all valid parse trees, integrated with BiLSTM encoders and biaffine span scoring.
The core focus is the vectorized implementation of the inside dynamic program using PyTorch tensors (stripe, diagonal, logsumexp).
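As a hedged illustration of the `stripe` trick mentioned above, the sketch below builds a zero-copy parallelogram view over a chart tensor with `as_strided`, so a whole diagonal of spans can be updated in one vectorized step. The helper name and signature are assumptions for illustration; the repo's `Code/utils.py` version may differ.

```python
import torch

def stripe(x, n, w, offset=(0, 0), dim=1):
    """Return an n x w parallelogram "stripe" view of x without copying.

    Illustrative sketch (not necessarily the repo's exact helper).
    x: [batch, seq_len, seq_len, ...] chart tensor.
    Row i of the stripe starts at chart cell (offset[0]+i, offset[1]+i)
    and runs w cells along dim (1: along the row, 0: down the column).
    """
    x = x.contiguous()
    seq_len = x.size(1)
    stride = list(x.stride())
    numel = stride[2]                      # elements per chart cell
    stride[1] = (seq_len + 1) * numel      # step to the next diagonal cell
    stride[2] = (1 if dim == 1 else seq_len) * numel
    return x.as_strided(size=(x.size(0), n, w, *x.shape[3:]),
                        stride=stride,
                        storage_offset=(offset[0] * seq_len + offset[1]) * numel)
```

For example, on a 4×4 chart, `stripe(x, 3, 1, (0, 1))` views the width-1 spans (0,1), (1,2), (2,3) as a single tensor.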

Example: Constituency parse generated by the trained neural CRF model
| Component | Description |
|---|---|
| Goal | Efficient implementation of the Inside algorithm for Neural CRFs used in constituency parsing |
| Core Algorithm | Log-space dynamic programming (O(n³)) via vectorized tensor operations |
| Model | BiLSTM encoder + biaffine span scoring + CRF inference |
| Dataset | Penn Treebank (PTB) subsets — LE, first 2000, full |
| Framework | PyTorch |
| Performance | LF ≈ 91% (PTB_LE) • LF ≈ 84.7% (PTB_2K) • LF ≈ 87.9% (PTB_full) |
```
NeuralCRFs_for_ConstituencyParsing/
│
├── Code/                  # Core implementation
│   ├── main.py            # Entry point: training & inside algorithm
│   ├── model.py           # BiLSTM + biaffine scorer
│   ├── crf.py             # Inside algorithm & CRF logic
│   ├── utils.py           # Masking, metrics, stripe()
│   └── train.py           # Training / evaluation loops
│
├── Data/                  # Example/small PTB subsets (sample only)
├── Images/                # Visuals and parse tree examples
├── outputs/               # Training logs and evaluation metrics
├── Final_Report.pdf       # Full project report
├── requirements.txt       # Dependencies
└── README.md              # (this file)
```
```bash
# Clone the repository
git clone https://github.com/keerthanap8898/NeuralCRFs_for_ConstituencyParsing.git
cd NeuralCRFs_for_ConstituencyParsing/Code

# Create environment (Python 3.9+ recommended)
python -m venv venv
source venv/bin/activate   # or venv\Scripts\activate on Windows

# Install dependencies
pip install -r ../requirements.txt
```
Train the model:

```bash
python main.py --data ../Data/PTB_first_2000/ --epochs 20 --lr 0.001
```

Evaluate on the dev/test set:

```bash
python main.py --data ../Data/PTB_full/ --evaluate --checkpoint saved_model.pt
```

Visualize parses: example parse outputs and score visualizations are available in `Images/`.

LSTM-CRFs for Constituency Parsing: Neural parameterization overview.
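To make the neural parameterization concrete, here is a minimal sketch of a biaffine span scorer in the style of Dozat & Manning (2017): boundary representations from the encoder are projected through two MLPs, and a biaffine product scores every span (i, j). The class and parameter names are illustrative assumptions, not the repo's actual API in `Code/model.py`.

```python
import torch
import torch.nn as nn

class BiaffineSpanScorer(nn.Module):
    """Illustrative biaffine span scorer (names are assumptions)."""

    def __init__(self, hidden, mlp=500):
        super().__init__()
        self.mlp_l = nn.Sequential(nn.Linear(hidden, mlp), nn.ReLU())
        self.mlp_r = nn.Sequential(nn.Linear(hidden, mlp), nn.ReLU())
        # +1 adds a bias dimension, as in Dozat & Manning (2017)
        self.W = nn.Parameter(torch.zeros(mlp + 1, mlp + 1))

    def forward(self, h):                   # h: [batch, seq_len, hidden]
        l = self.mlp_l(h)                   # left-boundary features
        r = self.mlp_r(h)                   # right-boundary features
        one = l.new_ones(*l.shape[:-1], 1)
        l = torch.cat([l, one], -1)
        r = torch.cat([r, one], -1)
        # s[b, i, j] = l[b, i] @ W @ r[b, j], for all boundary pairs at once
        return torch.einsum('bix,xy,bjy->bij', l, self.W, r)
```

The resulting [batch, seq_len, seq_len] score tensor is exactly the chart that the inside algorithm consumes.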
The heart of this project is the vectorized `inside()` function:
- Computes the CRF log-partition function over all valid binary parse trees.
- Uses tensor diagonals and stripe slicing to compute span scores efficiently.
- Operates entirely in log-space for numerical stability.
For details, see:
• Code/main.py
• Final_Report.pdf (Section: Inside Algorithm Design)
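The computation can be sketched as follows. This is a simplified, hedged reconstruction rather than the repo's exact `inside()`: it stacks split points with a Python-level loop where the real code uses `stripe()` views, but the log-space recurrence is the same.

```python
import torch

def inside(scores, lens):
    """Log-space inside algorithm over all binary bracketings (a sketch).

    scores: [batch, n+1, n+1] span scores; scores[b, i, j] scores span (i, j).
    lens:   [batch] sentence lengths.
    Returns log Z, the log-partition function, per sentence.
    """
    batch, n1, _ = scores.shape
    n = n1 - 1
    s = torch.full_like(scores, float('-inf'))
    # width-1 spans have no split point: just the span score
    diag = torch.arange(n)
    s[:, diag, diag + 1] = scores[:, diag, diag + 1]
    for w in range(2, n + 1):
        i = torch.arange(n - w + 1)        # span starts; span ends are i + w
        # combine left child [i, i+k] with right child [i+k, i+w] over all k
        left = torch.stack([s[:, i, i + k] for k in range(1, w)], -1)
        right = torch.stack([s[:, i + k, i + w] for k in range(1, w)], -1)
        s[:, i, i + w] = torch.logsumexp(left + right, -1) + scores[:, i, i + w]
    return s[torch.arange(batch), 0, lens]
```

With all-zero scores and a 3-word sentence, both binary trees have weight 1, so `inside` returns log 2, a quick sanity check on the recurrence.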
| Dataset | UF | LF | UCM | LCM |
|---|---|---|---|---|
| PTB_LE | 93.6 | 91.0 | 69.9 | 66.9 |
| PTB_first_2000 | 86.4 | 84.7 | 22.5 | 20.7 |
| PTB_full | 89.2 | 87.9 | 28.2 | 26.4 |
These results align closely with established baselines for span-based CRF parsers.
- Durrett & Klein (2015) — Neural CRF Parsing.
- Dozat & Manning (2017) — Deep Biaffine Attention for Dependency Parsing.
- CSE 291E Assignment 3 — Neural CRFs for Constituency Parsing, UC San Diego.
Author: Keerthana Purushotham (github.com/keerthanap8898/bio)
This repository is released under the MIT License. You are free to use, modify, and distribute this work for educational and research purposes. Please cite the references above when using the code in academic work.
This project demonstrates practical mastery of structured prediction, dynamic programming, and deep neural architectures in NLP. It serves as an educational, open-source reference implementation for the Neural CRF Inside algorithm, designed for clarity, reproducibility, and pedagogical value.
“Structured learning meets neural networks — understanding trees from tensors.”
Copyright © 2025 Keerthana Purushotham <keerthanap0808@gmail.com>, <kpurusho@ucsd.edu>, <keep.consult@proton.me>.
Licensed under MIT. See LICENSE for details.