Note: All Python imports in this project use absolute imports relative to the `src/` directory. Scripts and modules should be run with `src/` as the working directory or as the Python path root. For example:

```
cd src
python cli/cli.py ...
```

If you encounter `ImportError`, ensure your working directory is `src/`.
This repository contains code for training a natural language to SQL (NL2SQL) transformer model with comprehensive TensorBoard logging.
Disclaimer: This project was originally built as a learning exercise and a proof of concept for research on relation-aware attention and pointer-generator mechanisms. The codebase has not been hardened for production use and has only minimal test coverage. It is provided as a reference for educational and experimental purposes.
This means that certain parts of the training pipeline may be brittle or incomplete. There is no formal support or guarantee of backwards compatibility, and the repository is maintained mainly for illustrative purposes. Feel free to fork it and adapt pieces for your own experiments.
1. Clone the repository:

   ```
   git clone https://github.com/Shaurya-Sethi/nl2sql-rat-pointer.git
   cd nl2sql-rat-pointer
   ```

2. Install dependencies:

   ```
   pip install -r requirements.txt
   ```

3. Ensure you have the required data files and model checkpoints.
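After installing, a quick smoke test can confirm the environment is usable. This assumes PyTorch is among the dependencies in requirements.txt (the checkpoints referenced above are `.pt` files):

```python
# Quick environment check. Assumes PyTorch is installed via requirements.txt.
import torch

print(f"PyTorch {torch.__version__}; CUDA available: {torch.cuda.is_available()}")
```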
The model uses a YAML configuration file for both model architecture and training parameters. TensorBoard logging is configured in the `logging` section of the config:
```yaml
logging:
  tensorboard_log_dir: "runs"   # Base directory for TensorBoard logs
  log_every_n_steps: 10         # Log training metrics every N steps
  log_grad_norm: true           # Whether to log gradient norms
  log_grad_histogram: false     # Whether to log parameter histograms (expensive)
  log_memory: true              # Whether to log memory usage
```
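For illustration, this section can be loaded with PyYAML and handed to a `SummaryWriter` roughly as follows; the project's real wiring lives in `src/utils/training.py`, so the names below are only a sketch:

```python
# Illustrative sketch: read the logging config and open a TensorBoard writer.
# The actual setup lives in src/utils/training.py; names here are assumptions.
import os
from datetime import datetime

import yaml
from torch.utils.tensorboard import SummaryWriter

with open("src/config.yaml") as f:
    log_cfg = yaml.safe_load(f)["logging"]

# One timestamped subdirectory per run (matching run names like runs/20230601-120000_sft)
run_dir = os.path.join(log_cfg["tensorboard_log_dir"],
                       datetime.now().strftime("%Y%m%d-%H%M%S") + "_sft")
writer = SummaryWriter(log_dir=run_dir)
```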
For pretraining:

```
python src/train.py --phase pretrain --config src/config.yaml
```

For supervised fine-tuning:
```
python src/train.py --phase sft --config src/config.yaml --pretrained_model path/to/pretrained_model.pt
```

TensorBoard is integrated for comprehensive metric logging during training.
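For reference, the flags used above map onto a standard argparse interface along these lines (a sketch of the CLI surface, not the actual contents of `src/train.py`):

```python
# Sketch of the documented CLI flags; the real parser in src/train.py may differ.
import argparse

parser = argparse.ArgumentParser(description="NL2SQL training entry point")
parser.add_argument("--phase", choices=["pretrain", "sft"], required=True,
                    help="Training phase: pretraining or supervised fine-tuning")
parser.add_argument("--config", required=True, help="Path to the YAML config file")
parser.add_argument("--pretrained_model", default=None,
                    help="Checkpoint to start from (used with --phase sft)")
args = parser.parse_args()
```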
The following metrics are logged (a sketch of the corresponding `SummaryWriter` calls follows the list):

Training metrics:

- Loss (per step and per epoch)
- Perplexity
- Token accuracy
- Learning rate
- Gradient norms
- Memory usage

Validation metrics:

- Loss
- Perplexity
- Token accuracy
- Throughput (tokens per second)

Optional histogram metrics (enabled via `log_grad_histogram`):

- Weight histograms
- Gradient histograms
- Per-layer gradient norms
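As a sketch of what the per-step scalar logging might look like with `torch.utils.tensorboard` (the tag names and helper below are illustrative, not the project's exact code from `src/utils/training.py`):

```python
# Illustrative per-step logging helper; actual tags and intervals come from
# src/utils/training.py and the config's log_every_n_steps setting.
import math

from torch.utils.tensorboard import SummaryWriter

def log_step_metrics(writer: SummaryWriter, loss: float, lr: float,
                     tokens_per_sec: float, step: int) -> None:
    writer.add_scalar("train/loss", loss, step)
    writer.add_scalar("train/perplexity", math.exp(loss), step)  # exp of CE loss
    writer.add_scalar("train/learning_rate", lr, step)
    writer.add_scalar("train/tokens_per_sec", tokens_per_sec, step)
```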
To view training metrics locally:

```
# Start TensorBoard server
tensorboard --logdir=runs

# Access TensorBoard in your browser at http://localhost:6006
```

You can also compare multiple runs:
```
tensorboard --logdir_spec run1:runs/20230601-120000_sft,run2:runs/20230602-130000_sft
```

If your model is training on Google Cloud Platform (GCP), follow these steps to monitor training:
1. SSH into your GCP instance with port forwarding:

   ```
   gcloud compute ssh your-instance-name -- -L 6006:localhost:6006
   ```

2. Start TensorBoard on the remote instance:

   ```
   tensorboard --logdir=runs --bind_all
   ```

3. Access TensorBoard in your local browser at http://localhost:6006.
To share results online via tensorboard.dev:

1. Install the tensorboard plugin:

   ```
   pip install tensorboard-plugin-wit
   ```

2. Upload logs to tensorboard.dev:

   ```
   tensorboard dev upload --logdir runs \
     --name "NL2SQL Experiment" \
     --description "Training results for NL2SQL model"
   ```

3. Follow the link provided to view your results online. They'll be available for 90 days.
- Training Loss: Should steadily decrease
- Validation Loss: Should decrease but may plateau
- Perplexity: The exponentiated loss value; lower is better
- Gradient Norm: Measures the magnitude of gradients (see the sketch after this list)
  - Very high values (>10) might indicate potential instability
  - Very low values (<0.01) might indicate vanishing gradients
  - Watch for sudden spikes or drops
- Learning Rate: Should follow the expected schedule (warmup, decay, etc.)
- Memory Usage: Monitor GPU memory to detect leaks or inefficiencies
- Token Accuracy: Direction should correlate with loss improvement; useful for tracking concrete progress
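For concreteness, the two derived quantities above can be computed like this (an illustrative sketch, not taken from the project's code):

```python
# Illustrative helpers for the perplexity and gradient-norm checks described above.
import math

import torch

def perplexity(cross_entropy_loss: float) -> float:
    """Perplexity is the exponentiated (natural-log) cross-entropy loss."""
    return math.exp(cross_entropy_loss)

def global_grad_norm(model: torch.nn.Module) -> float:
    """L2 norm over all parameter gradients; compare against the ranges above."""
    norms = [p.grad.detach().norm() for p in model.parameters() if p.grad is not None]
    return torch.norm(torch.stack(norms)).item() if norms else 0.0
```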
To modify what's being logged:

- Edit the TensorBoard configuration in `src/config.yaml`
- For more detailed changes, modify the logging code in `src/utils/training.py`
- "No data found": Ensure your log directory is correct and that training has saved some data
- High memory usage: Set
log_grad_histogram: falseto reduce memory overhead - Missing GPU metrics: Install pynvml with
pip install pynvml
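For context, pynvml reads GPU memory roughly as follows (a standalone sketch, not the project's exact code):

```python
# Standalone sketch: query GPU memory via NVIDIA's NVML bindings (pynvml).
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first visible GPU
mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
print(f"GPU memory: {mem.used / 1024**2:.0f} / {mem.total / 1024**2:.0f} MiB used")
pynvml.nvmlShutdown()
```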