
Neural CRFs for Constituency Parsing

This repository implements a Neural Conditional Random Field (CRF) parser for constituency parsing, developed as a course project for CSE 291E (Advanced Statistical NLP) at UC San Diego.

It reproduces the log-space inside algorithm used for structured inference over all valid parse trees, integrated with a BiLSTM encoder and biaffine span scoring.
The core focus is a vectorized implementation of the inside dynamic program using PyTorch tensor operations (stripe, diagonal, logsumexp).
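As an illustration of the tensor primitives mentioned above, here is a minimal sketch of a stripe-style diagonal slice of a chart tensor. The function name and layout are illustrative assumptions, not the repository's exact implementation:

```python
import torch

def stripe(x: torch.Tensor, n: int, w: int, offset=(0, 0), horizontal=True) -> torch.Tensor:
    """Return an (n, w) diagonal stripe of an (L, L) chart as a zero-copy view.

    Row i of the result is
      x[offset[0]+i, offset[1]+i : offset[1]+i+w]   if horizontal,
      x[offset[0]+i : offset[0]+i+w, offset[1]+i]   otherwise.
    """
    x = x.contiguous()          # strides of a contiguous (L, L) tensor are (L, 1)
    L = x.size(1)
    return x.as_strided(
        size=(n, w),
        stride=(L + 1, 1 if horizontal else L),  # step L+1 moves along the diagonal
        storage_offset=offset[0] * L + offset[1],
    )
```

Because `as_strided` returns a view, the stripe can be filled in place, which is what makes the chart updates in the inside recursion cheap.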

Example: constituency parse generated by the trained neural CRF model (see Images/).

➤ Overview

| Component | Description |
| --- | --- |
| Goal | Efficient implementation of the inside algorithm for neural CRF constituency parsing |
| Core algorithm | Log-space dynamic programming (O(n³)) via vectorized tensor operations |
| Model | BiLSTM encoder + biaffine span scoring + CRF inference |
| Dataset | Penn Treebank (PTB) subsets — LE, first 2000, full |
| Framework | PyTorch |
| Performance | LF ≈ 91% (PTB_LE) • LF ≈ 84.7% (PTB_2K) • LF ≈ 87.9% (PTB_full) |
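The model row above can be sketched as follows. Layer sizes, names, and the biaffine parameterization details are illustrative assumptions, not the repository's exact code:

```python
import torch
import torch.nn as nn

class BiaffineSpanScorer(nn.Module):
    """BiLSTM encoder + biaffine scoring of span boundaries (illustrative sketch)."""

    def __init__(self, emb_dim=100, hidden=200, mlp=150):
        super().__init__()
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.mlp_l = nn.Sequential(nn.Linear(2 * hidden, mlp), nn.ReLU())  # left boundary
        self.mlp_r = nn.Sequential(nn.Linear(2 * hidden, mlp), nn.ReLU())  # right boundary
        # biaffine weight; +1 adds a bias column to the left representation
        self.W = nn.Parameter(torch.randn(mlp + 1, mlp) * 0.01)

    def forward(self, emb):                      # emb: (B, L, emb_dim)
        h, _ = self.lstm(emb)                    # (B, L, 2*hidden)
        l = self.mlp_l(h)
        r = self.mlp_r(h)
        l = torch.cat([l, torch.ones(*l.shape[:-1], 1)], dim=-1)  # (B, L, mlp+1)
        # scores[b, i, j] = l[b, i] @ W @ r[b, j] for every boundary pair (i, j)
        return torch.einsum('bim,mn,bjn->bij', l, self.W, r)
```

The resulting (B, L, L) score chart is exactly the input the CRF inside algorithm consumes.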

➤ Repository Structure

NeuralCRFs_for_ConstituencyParsing/
│
├── Code/                         # Core implementation
│   ├── main.py                    # Entry point: training & inside algorithm
│   ├── model.py                   # BiLSTM + biaffine scorer
│   ├── crf.py                     # Inside algorithm & CRF logic
│   ├── utils.py                   # Masking, metrics, stripe()
│   └── train.py                   # Training / evaluation loops
│
├── Data/                          # Example/small PTB subsets (sample only)
├── Images/                        # Visuals and parse tree examples
├── outputs/                       # Training logs and evaluation metrics
├── Final_Report.pdf               # Full project report
├── requirements.txt               # Dependencies
└── README.md                      # (this file)

➤ Installation

# Clone the repository
git clone https://github.com/keerthanap8898/NeuralCRFs_for_ConstituencyParsing.git
cd NeuralCRFs_for_ConstituencyParsing/Code

# Create environment (Python 3.9+ recommended)
python -m venv venv
source venv/bin/activate  # or venv\Scripts\activate on Windows

# Install dependencies
pip install -r ../requirements.txt

➤ Usage

Train the model

python main.py --data ../Data/PTB_first_2000/ --epochs 20 --lr 0.001

Evaluate on dev/test set

python main.py --data ../Data/PTB_full/ --evaluate --checkpoint saved_model.pt
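The commands above suggest an entry point along these lines. This is a hedged sketch: only the flags shown in the commands are taken from the repository; defaults and help strings are assumptions:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Argument parser matching the flags used in the README commands (assumed defaults)."""
    p = argparse.ArgumentParser(description="Neural CRF constituency parser")
    p.add_argument("--data", required=True, help="path to a PTB subset directory")
    p.add_argument("--epochs", type=int, default=20, help="number of training epochs")
    p.add_argument("--lr", type=float, default=0.001, help="learning rate")
    p.add_argument("--evaluate", action="store_true", help="run evaluation instead of training")
    p.add_argument("--checkpoint", default=None, help="model checkpoint to load or save")
    return p
```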

Visualize parses

Example parse outputs and score visualizations are available in Images/.

➤ Core Algorithm: Inside Function

Figure: LSTM-CRFs for constituency parsing — neural parameterization overview.

The heart of this project is the vectorized inside() function, which:

- computes the CRF log-partition function over all valid binary parse trees;
- uses tensor diagonals and stripe slicing to compute span scores efficiently;
- operates entirely in log-space for numerical stability.
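These points can be sketched as a compact, single-sentence inside recursion. Fencepost indexing and the `scores` layout are assumptions for illustration, not the repository's exact code:

```python
import torch

def inside(scores: torch.Tensor) -> torch.Tensor:
    """Log-partition over all binary trees for one sentence.

    scores: (n, n) chart with fencepost indexing; scores[i, j] is the
    log-potential of span (i, j) covering words i..j-1, for j > i.
    Returns log Z, the logsumexp over all binary trees of their summed span scores.
    """
    n = scores.size(0)
    chart = torch.full((n, n), float("-inf"))
    i = torch.arange(n - 1)
    chart[i, i + 1] = scores[i, i + 1]           # width-1 spans: base case
    for w in range(2, n):                        # widths, bottom-up: O(n^3) overall
        i = torch.arange(n - w).unsqueeze(1)     # all span starts for this width
        k = torch.arange(1, w)                   # all split offsets, vectorized
        left = chart[i, i + k]                   # (n-w, w-1) inside scores of left children
        right = chart[i + k, i + w]              # matching right children
        starts = i.squeeze(1)
        chart[starts, starts + w] = (
            torch.logsumexp(left + right, dim=1) + scores[starts, starts + w]
        )
    return chart[0, n - 1]
```

With all scores set to zero, log Z reduces to the log of the number of binary trees (the Catalan numbers), which is a convenient sanity check.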

For details, see: Code/main.py and Final_Report.pdf (Section: Inside Algorithm Design).

➤ Results Summary

| Dataset | UF | LF | UCM | LCM |
| --- | --- | --- | --- | --- |
| PTB_LE | 93.6 | 91.0 | 69.9 | 66.9 |
| PTB_first_2000 | 86.4 | 84.7 | 22.5 | 20.7 |
| PTB_full | 89.2 | 87.9 | 28.2 | 26.4 |

(UF/LF: unlabeled/labeled span F1; UCM/LCM: unlabeled/labeled complete match.)

These results align closely with established baselines for span-based CRF parsers.
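For reference, span F1 of the kind reported above can be computed as in the sketch below. This is a simplified illustration, not the repository's evaluation code; standard evalb conventions such as punctuation handling are omitted:

```python
def span_f1(golds, preds):
    """F1 over span sets, micro-averaged across sentences.

    Each tree is a set of (left, right) tuples for unlabeled F1,
    or (left, right, label) tuples for labeled F1.
    """
    tp = sum(len(g & p) for g, p in zip(golds, preds))  # matched spans
    n_gold = sum(len(g) for g in golds)
    n_pred = sum(len(p) for p in preds)
    if tp == 0:
        return 0.0
    precision, recall = tp / n_pred, tp / n_gold
    return 2 * precision * recall / (precision + recall)
```

Complete match (UCM/LCM) is simply the fraction of sentences whose predicted span set equals the gold set exactly.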

➤ References

    1. Durrett & Klein (2015) — Neural CRF Parsing.
    2. Dozat & Manning (2017) — Deep Biaffine Attention for Neural Dependency Parsing.
    3. CSE 291E Assignment 3 — Neural CRFs for Constituency Parsing, UC San Diego.

➤ Author

Keerthana Purushotham · github.com/keerthanap8898/bio

➤ License

This repository is released under the MIT License. You are free to use, modify, and distribute this work for educational and research purposes. Please cite the references above when using the code in academic work.

➤ Learning Value

This project demonstrates practical mastery of structured prediction, dynamic programming, and deep neural architectures in NLP. It serves as an educational, open-source reference implementation for the Neural CRF Inside algorithm, designed for clarity, reproducibility, and pedagogical value.

“Structured learning meets neural networks — understanding trees from tensors.”

Copyright © 2025 Keerthana Purushotham <keerthanap0808@gmail.com>, <kpurusho@ucsd.edu>, <keep.consult@proton.me>.
Licensed under MIT. See LICENSE for details.