This repository implements a Neural Conditional Random Field (CRF) parser for constituency parsing, developed as a course project for CSE 291E (Advanced Statistical NLP) at UC San Diego.
It reproduces the log-space inside algorithm used for structured inference over all valid parse trees, integrated with BiLSTM encoders and biaffine span scoring.
The core focus is the vectorized implementation of the inside dynamic program using PyTorch tensors (stripe, diagonal, logsumexp).
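As a hedged illustration of the `stripe` trick mentioned above, the sketch below builds a zero-copy parallelogram view over a chart tensor with `as_strided`, so a whole diagonal of spans can be updated in one vectorized step. The helper name and signature are assumptions for illustration; the repo's `Code/utils.py` version may differ.

```python
import torch

def stripe(x, n, w, offset=(0, 0), dim=1):
    """Return an n x w parallelogram "stripe" view of x without copying.

    Illustrative sketch (not necessarily the repo's exact helper).
    x: [batch, seq_len, seq_len, ...] chart tensor.
    Row i of the stripe starts at chart cell (offset[0]+i, offset[1]+i)
    and runs w cells along dim (1: along the row, 0: down the column).
    """
    x = x.contiguous()
    seq_len = x.size(1)
    stride = list(x.stride())
    numel = stride[2]                      # elements per chart cell
    stride[1] = (seq_len + 1) * numel      # step to the next diagonal cell
    stride[2] = (1 if dim == 1 else seq_len) * numel
    return x.as_strided(size=(x.size(0), n, w, *x.shape[3:]),
                        stride=stride,
                        storage_offset=(offset[0] * seq_len + offset[1]) * numel)
```

For example, on a 4×4 chart, `stripe(x, 3, 1, (0, 1))` views the width-1 spans (0,1), (1,2), (2,3) as a single tensor.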

Example: Constituency parse generated by the trained neural CRF model
| Component | Description |
|---|---|
| Goal | Efficient implementation of the Inside algorithm for Neural CRFs used in constituency parsing |
| Core Algorithm | Log-space dynamic programming (O(n³)) via vectorized tensor operations |
| Model | BiLSTM encoder + biaffine span scoring + CRF inference |
| Dataset | Penn Treebank (PTB) subsets — LE, first 2000, full |
| Framework | PyTorch |
| Performance | LF ≈ 91% (PTB_LE) • LF ≈ 84.7% (PTB_2K) • LF ≈ 87.9% (PTB_full) |
```
NeuralCRFs_for_ConstituencyParsing/
│
├── Code/                  # Core implementation
│   ├── main.py            # Entry point: training & inside algorithm
│   ├── model.py           # BiLSTM + biaffine scorer
│   ├── crf.py             # Inside algorithm & CRF logic
│   ├── utils.py           # Masking, metrics, stripe()
│   └── train.py           # Training / evaluation loops
│
├── Data/                  # Example/small PTB subsets (sample only)
├── Images/                # Visuals and parse tree examples
├── outputs/               # Training logs and evaluation metrics
├── Final_Report.pdf       # Full project report
├── requirements.txt       # Dependencies
└── README.md              # (this file)
```
```bash
# Clone the repository
git clone https://github.com/keerthanap8898/NeuralCRFs_for_ConstituencyParsing.git
cd NeuralCRFs_for_ConstituencyParsing/Code

# Create environment (Python 3.9+ recommended)
python -m venv venv
source venv/bin/activate   # or venv\Scripts\activate on Windows

# Install dependencies
pip install -r ../requirements.txt
```
Train the model:

```bash
python main.py --data ../Data/PTB_first_2000/ --epochs 20 --lr 0.001
```

Evaluate on the dev/test set:

```bash
python main.py --data ../Data/PTB_full/ --evaluate --checkpoint saved_model.pt
```

Visualize parses: example parse outputs and score visualizations are available in `Images/`.

LSTM-CRFs for Constituency Parsing: Neural parameterization overview.
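To make the neural parameterization concrete, here is a minimal sketch of a biaffine span scorer in the style of Dozat & Manning (2017): boundary representations from the encoder are projected through two MLPs, and a biaffine product scores every span (i, j). The class and parameter names are illustrative assumptions, not the repo's actual API in `Code/model.py`.

```python
import torch
import torch.nn as nn

class BiaffineSpanScorer(nn.Module):
    """Illustrative biaffine span scorer (names are assumptions)."""

    def __init__(self, hidden, mlp=500):
        super().__init__()
        self.mlp_l = nn.Sequential(nn.Linear(hidden, mlp), nn.ReLU())
        self.mlp_r = nn.Sequential(nn.Linear(hidden, mlp), nn.ReLU())
        # +1 adds a bias dimension, as in Dozat & Manning (2017)
        self.W = nn.Parameter(torch.zeros(mlp + 1, mlp + 1))

    def forward(self, h):                   # h: [batch, seq_len, hidden]
        l = self.mlp_l(h)                   # left-boundary features
        r = self.mlp_r(h)                   # right-boundary features
        one = l.new_ones(*l.shape[:-1], 1)
        l = torch.cat([l, one], -1)
        r = torch.cat([r, one], -1)
        # s[b, i, j] = l[b, i] @ W @ r[b, j], for all boundary pairs at once
        return torch.einsum('bix,xy,bjy->bij', l, self.W, r)
```

The resulting [batch, seq_len, seq_len] score tensor is exactly the chart that the inside algorithm consumes.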
The heart of this project is the vectorized `inside()` function:
- Computes the CRF log-partition function over all valid binary parse trees.
- Uses tensor diagonals and stripe slicing to compute span scores efficiently.
- Operates entirely in log-space for numerical stability.
For details, see:
• Code/main.py
• Final_Report.pdf (Section: Inside Algorithm Design)
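The computation can be sketched as follows. This is a simplified, hedged reconstruction rather than the repo's exact `inside()`: it stacks split points with a Python-level loop where the real code uses `stripe()` views, but the log-space recurrence is the same.

```python
import torch

def inside(scores, lens):
    """Log-space inside algorithm over all binary bracketings (a sketch).

    scores: [batch, n+1, n+1] span scores; scores[b, i, j] scores span (i, j).
    lens:   [batch] sentence lengths.
    Returns log Z, the log-partition function, per sentence.
    """
    batch, n1, _ = scores.shape
    n = n1 - 1
    s = torch.full_like(scores, float('-inf'))
    # width-1 spans have no split point: just the span score
    diag = torch.arange(n)
    s[:, diag, diag + 1] = scores[:, diag, diag + 1]
    for w in range(2, n + 1):
        i = torch.arange(n - w + 1)        # span starts; span ends are i + w
        # combine left child [i, i+k] with right child [i+k, i+w] over all k
        left = torch.stack([s[:, i, i + k] for k in range(1, w)], -1)
        right = torch.stack([s[:, i + k, i + w] for k in range(1, w)], -1)
        s[:, i, i + w] = torch.logsumexp(left + right, -1) + scores[:, i, i + w]
    return s[torch.arange(batch), 0, lens]
```

With all-zero scores and a 3-word sentence, both binary trees have weight 1, so `inside` returns log 2, a quick sanity check on the recurrence.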
| Dataset | UF | LF | UCM | LCM |
|---|---|---|---|---|
| PTB_LE | 93.6 | 91.0 | 69.9 | 66.9 |
| PTB_first_2000 | 86.4 | 84.7 | 22.5 | 20.7 |
| PTB_full | 89.2 | 87.9 | 28.2 | 26.4 |
These results align closely with established baselines for span-based CRF parsers.
- Durrett & Klein (2015) — Neural CRF Parsing.
- Dozat & Manning (2017) — Deep Biaffine Attention for Dependency Parsing.
- CSE 291E Assignment 3 — Neural CRFs for Constituency Parsing, UC San Diego.
Author: Keerthana Purushotham (github.com/keerthanap8898/bio)
This repository is released under the MIT License. You are free to use, modify, and distribute this work for educational and research purposes. Please cite the references above when using the code in academic work.
This project demonstrates practical mastery of structured prediction, dynamic programming, and deep neural architectures in NLP. It serves as an educational, open-source reference implementation for the Neural CRF Inside algorithm, designed for clarity, reproducibility, and pedagogical value.
“Structured learning meets neural networks — understanding trees from tensors.”
Copyright © 2025 Keerthana Purushotham <keerthanap0808@gmail.com>, <kpurusho@ucsd.edu>, <keep.consult@proton.me>.
Licensed under MIT. See LICENSE for details.