
Neural CRFs for Constituency Parsing

This repository implements a Neural Conditional Random Field (CRF) parser for constituency parsing, developed as a course project for CSE 291E (Advanced Statistical NLP) at UC San Diego.

It reproduces the log-space inside algorithm used for structured inference over all valid parse trees, integrated with a BiLSTM encoder and biaffine span scoring.
The core focus is a vectorized implementation of the inside dynamic program using PyTorch tensor operations (stripe, diagonal, logsumexp).
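As an illustration of the tensor primitives mentioned above, here is a minimal sketch of a stripe-style diagonal slice of a chart tensor. The function name and layout are illustrative assumptions, not the repository's exact implementation:

```python
import torch

def stripe(x: torch.Tensor, n: int, w: int, offset=(0, 0), horizontal=True) -> torch.Tensor:
    """Return an (n, w) diagonal stripe of an (L, L) chart as a zero-copy view.

    Row i of the result is
      x[offset[0]+i, offset[1]+i : offset[1]+i+w]   if horizontal,
      x[offset[0]+i : offset[0]+i+w, offset[1]+i]   otherwise.
    """
    x = x.contiguous()          # strides of a contiguous (L, L) tensor are (L, 1)
    L = x.size(1)
    return x.as_strided(
        size=(n, w),
        stride=(L + 1, 1 if horizontal else L),  # step L+1 moves along the diagonal
        storage_offset=offset[0] * L + offset[1],
    )
```

Because `as_strided` returns a view, the stripe can be filled in place, which is what makes the chart updates in the inside recursion cheap.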

Example: constituency parse generated by the trained neural CRF model (see Images/).

➤ Overview

| Component | Description |
| --- | --- |
| Goal | Efficient implementation of the inside algorithm for neural CRF constituency parsing |
| Core algorithm | Log-space dynamic programming (O(n³)) via vectorized tensor operations |
| Model | BiLSTM encoder + biaffine span scoring + CRF inference |
| Dataset | Penn Treebank (PTB) subsets — LE, first 2000, full |
| Framework | PyTorch |
| Performance | LF ≈ 91% (PTB_LE) • LF ≈ 84.7% (PTB_2K) • LF ≈ 87.9% (PTB_full) |
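The model row above can be sketched as follows. Layer sizes, names, and the biaffine parameterization details are illustrative assumptions, not the repository's exact code:

```python
import torch
import torch.nn as nn

class BiaffineSpanScorer(nn.Module):
    """BiLSTM encoder + biaffine scoring of span boundaries (illustrative sketch)."""

    def __init__(self, emb_dim=100, hidden=200, mlp=150):
        super().__init__()
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.mlp_l = nn.Sequential(nn.Linear(2 * hidden, mlp), nn.ReLU())  # left boundary
        self.mlp_r = nn.Sequential(nn.Linear(2 * hidden, mlp), nn.ReLU())  # right boundary
        # biaffine weight; +1 adds a bias column to the left representation
        self.W = nn.Parameter(torch.randn(mlp + 1, mlp) * 0.01)

    def forward(self, emb):                      # emb: (B, L, emb_dim)
        h, _ = self.lstm(emb)                    # (B, L, 2*hidden)
        l = self.mlp_l(h)
        r = self.mlp_r(h)
        l = torch.cat([l, torch.ones(*l.shape[:-1], 1)], dim=-1)  # (B, L, mlp+1)
        # scores[b, i, j] = l[b, i] @ W @ r[b, j] for every boundary pair (i, j)
        return torch.einsum('bim,mn,bjn->bij', l, self.W, r)
```

The resulting (B, L, L) score chart is exactly the input the CRF inside algorithm consumes.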

➤ Repository Structure

NeuralCRFs_for_ConstituencyParsing/
│
├── Code/                         # Core implementation
│   ├── main.py                    # Entry point: training & inside algorithm
│   ├── model.py                   # BiLSTM + biaffine scorer
│   ├── crf.py                     # Inside algorithm & CRF logic
│   ├── utils.py                   # Masking, metrics, stripe()
│   └── train.py                   # Training / evaluation loops
│
├── Data/                          # Example/small PTB subsets (sample only)
├── Images/                        # Visuals and parse tree examples
├── outputs/                       # Training logs and evaluation metrics
├── Final_Report.pdf               # Full project report
├── requirements.txt               # Dependencies
└── README.md                      # (this file)

➤ Installation

# Clone the repository
git clone https://github.com/keerthanap8898/NeuralCRFs_for_ConstituencyParsing.git
cd NeuralCRFs_for_ConstituencyParsing/Code

# Create environment (Python 3.9+ recommended)
python -m venv venv
source venv/bin/activate  # or venv\Scripts\activate on Windows

# Install dependencies
pip install -r ../requirements.txt

➤ Usage

Train the model

python main.py --data ../Data/PTB_first_2000/ --epochs 20 --lr 0.001

Evaluate on dev/test set

python main.py --data ../Data/PTB_full/ --evaluate --checkpoint saved_model.pt
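The commands above suggest an entry point along these lines. This is a hedged sketch: only the flags shown in the commands are taken from the repository; defaults and help strings are assumptions:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Argument parser matching the flags used in the README commands (assumed defaults)."""
    p = argparse.ArgumentParser(description="Neural CRF constituency parser")
    p.add_argument("--data", required=True, help="path to a PTB subset directory")
    p.add_argument("--epochs", type=int, default=20, help="number of training epochs")
    p.add_argument("--lr", type=float, default=0.001, help="learning rate")
    p.add_argument("--evaluate", action="store_true", help="run evaluation instead of training")
    p.add_argument("--checkpoint", default=None, help="model checkpoint to load or save")
    return p
```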

Visualize parses

Example parse outputs and score visualizations are available in Images/.

➤ Core Algorithm: Inside Function

Figure: LSTM-CRFs for constituency parsing — neural parameterization overview.

The heart of this project is the vectorized inside() function, which:

- computes the CRF log-partition function over all valid binary parse trees;
- uses tensor diagonals and stripe slicing to compute span scores efficiently;
- operates entirely in log-space for numerical stability.
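These points can be sketched as a compact, single-sentence inside recursion. Fencepost indexing and the `scores` layout are assumptions for illustration, not the repository's exact code:

```python
import torch

def inside(scores: torch.Tensor) -> torch.Tensor:
    """Log-partition over all binary trees for one sentence.

    scores: (n, n) chart with fencepost indexing; scores[i, j] is the
    log-potential of span (i, j) covering words i..j-1, for j > i.
    Returns log Z, the logsumexp over all binary trees of their summed span scores.
    """
    n = scores.size(0)
    chart = torch.full((n, n), float("-inf"))
    i = torch.arange(n - 1)
    chart[i, i + 1] = scores[i, i + 1]           # width-1 spans: base case
    for w in range(2, n):                        # widths, bottom-up: O(n^3) overall
        i = torch.arange(n - w).unsqueeze(1)     # all span starts for this width
        k = torch.arange(1, w)                   # all split offsets, vectorized
        left = chart[i, i + k]                   # (n-w, w-1) inside scores of left children
        right = chart[i + k, i + w]              # matching right children
        starts = i.squeeze(1)
        chart[starts, starts + w] = (
            torch.logsumexp(left + right, dim=1) + scores[starts, starts + w]
        )
    return chart[0, n - 1]
```

With all scores set to zero, log Z reduces to the log of the number of binary trees (the Catalan numbers), which is a convenient sanity check.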

For details, see: Code/main.py and Final_Report.pdf (Section: Inside Algorithm Design).

➤ Results Summary

| Dataset | UF | LF | UCM | LCM |
| --- | --- | --- | --- | --- |
| PTB_LE | 93.6 | 91.0 | 69.9 | 66.9 |
| PTB_first_2000 | 86.4 | 84.7 | 22.5 | 20.7 |
| PTB_full | 89.2 | 87.9 | 28.2 | 26.4 |

(UF/LF: unlabeled/labeled span F1; UCM/LCM: unlabeled/labeled complete match.)

These results align closely with established baselines for span-based CRF parsers.
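For reference, span F1 of the kind reported above can be computed as in the sketch below. This is a simplified illustration, not the repository's evaluation code; standard evalb conventions such as punctuation handling are omitted:

```python
def span_f1(golds, preds):
    """F1 over span sets, micro-averaged across sentences.

    Each tree is a set of (left, right) tuples for unlabeled F1,
    or (left, right, label) tuples for labeled F1.
    """
    tp = sum(len(g & p) for g, p in zip(golds, preds))  # matched spans
    n_gold = sum(len(g) for g in golds)
    n_pred = sum(len(p) for p in preds)
    if tp == 0:
        return 0.0
    precision, recall = tp / n_pred, tp / n_gold
    return 2 * precision * recall / (precision + recall)
```

Complete match (UCM/LCM) is simply the fraction of sentences whose predicted span set equals the gold set exactly.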

➤ References

    1. Durrett & Klein (2015) — Neural CRF Parsing.
    2. Dozat & Manning (2017) — Deep Biaffine Attention for Neural Dependency Parsing.
    3. CSE 291E Assignment 3 — Neural CRFs for Constituency Parsing, UC San Diego.

➤ Author

Keerthana Purushotham · github.com/keerthanap8898/bio

➤ License

This repository is released under the MIT License. You are free to use, modify, and distribute this work for educational and research purposes. Please cite the references above when using the code in academic work.

➤ Learning Value

This project demonstrates practical mastery of structured prediction, dynamic programming, and deep neural architectures in NLP. It serves as an educational, open-source reference implementation for the Neural CRF Inside algorithm, designed for clarity, reproducibility, and pedagogical value.

“Structured learning meets neural networks — understanding trees from tensors.”

Copyright © 2025 Keerthana Purushotham <keerthanap0808@gmail.com>, <kpurusho@ucsd.edu>, <keep.consult@proton.me>.
Licensed under MIT. See LICENSE for details.