This directory contains user-friendly tutorial notebooks for the ALIGN (Analyzing Linguistic Alignment) package.
Tutorial 1: Preprocessing (`tutorial_1_preprocessing.ipynb`)
Purpose: Learn how to prepare raw conversational transcripts for alignment analysis
What's Included:
- Step-by-step preprocessing workflow
- Using different POS taggers (NLTK, spaCy, Stanford)
- Setup instructions for optional taggers
- Input/output format validation
- Sample data inspection
Time to Complete: ~10-15 minutes (plus download time for optional taggers)
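To illustrate the kind of cleanup the preprocessing step performs, here is a minimal stdlib-only sketch (not the ALIGN implementation, whose pipeline also handles POS tagging): lowercase an utterance, strip punctuation, and tokenize on whitespace.

```python
import re

def preprocess_utterance(text):
    """Minimal cleanup: lowercase, strip punctuation, split into tokens."""
    text = text.lower()
    text = re.sub(r"[^\w\s']", "", text)  # keep word chars, spaces, apostrophes
    return text.split()

print(preprocess_utterance("Hello there, how are you?"))
# ['hello', 'there', 'how', 'are', 'you']
```

The real preprocessing step additionally attaches POS tags from NLTK, spaCy, or Stanford, which the alignment analyzers need for the syntactic measures.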
Tutorial 2: Alignment Analysis (`tutorial_2_alignment.ipynb`)
Purpose: Learn how to analyze linguistic alignment in preprocessed conversations
What's Included:
- Lexical-syntactic alignment (word and grammar similarity)
- Semantic alignment with FastText
- Semantic alignment with BERT (optional)
- Conversation-level alignment (aggregate repertoire overlap)
- Multi-analyzer comprehensive analysis
- Visualization and interpretation
- Correlation analysis between alignment types
Time to Complete: ~15-20 minutes (plus download time for FastText on first run)
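To make the lexical side of this concrete, the sketch below (plain Python, not ALIGN's API) computes cosine similarity between the bigram count vectors of two adjacent utterances. This is the intuition behind n-gram-based lexical-syntactic alignment: the more n-grams a target utterance shares with the preceding one, the higher the score.

```python
from collections import Counter
import math

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def cosine_overlap(a, b):
    """Cosine similarity between two n-gram count vectors."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[g] * cb[g] for g in ca)
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0

prime = "i am doing well today".split()
target = "i am doing great today".split()
print(cosine_overlap(ngrams(prime, 2), ngrams(target, 2)))  # 0.5: half the bigrams match
```

Semantic alignment (FastText, BERT) follows the same comparison logic but replaces n-gram count vectors with dense embedding vectors.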
Tutorial 3: Baseline Analysis (`tutorial_3_baseline.ipynb`)
Purpose: Learn how to establish baseline alignment levels using surrogate conversations
What's Included:
- Understanding surrogate/baseline analysis
- Generating surrogate conversation pairs (cross-role pairing)
- Controlling surrogate sample sizes with `num_surrogates`
- Multi-party conversation support via dyadic decomposition
- Analyzing alignment in surrogate data
- Comparing real vs. baseline alignment
- Statistical significance testing
- Interpreting results
Why This Matters:
- Establishes what alignment occurs "by chance"
- Allows statistical testing of real alignment
- Essential for research and publication
- Helps interpret whether observed alignment is meaningful
Time to Complete: ~20-30 minutes (generates many surrogate pairs)
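The cross-role pairing idea can be sketched as follows. Everything here is hypothetical illustration (the `surrogate_pairs` helper and dyad names are made up); only the `num_surrogates` parameter name comes from the package. Each surrogate pairs a speaker from one real dyad with a speaker from a *different* real dyad, producing conversations that never actually happened.

```python
from itertools import combinations
import random

def surrogate_pairs(dyads, num_surrogates=None, seed=42):
    """Cross-pair speakers from different real dyads to build surrogate conversations."""
    pairs = list(combinations(dyads, 2))  # every cross-dyad pairing
    if num_surrogates is not None and num_surrogates < len(pairs):
        rng = random.Random(seed)         # seeded for reproducibility
        pairs = rng.sample(pairs, num_surrogates)
    return pairs

print(surrogate_pairs(["dyad1", "dyad2", "dyad3"]))
# [('dyad1', 'dyad2'), ('dyad1', 'dyad3'), ('dyad2', 'dyad3')]
```

Because surrogate partners never heard each other, any "alignment" measured in these pairs reflects chance overlap, which is exactly the baseline the real conversations are tested against.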
```bash
git clone https://github.com/your-username/align2-linguistic-alignment.git
cd align2-linguistic-alignment
pip install -r requirements.txt
pip install -e .
jupyter notebook tutorial_1_preprocessing.ipynb
```

💡 Tip: You can also open and run these notebooks in Visual Studio Code! VS Code has excellent Jupyter notebook support with features like IntelliSense, debugging, and variable inspection. Just open the `.ipynb` file in VS Code and click "Run All" or run cells individually.
- View the notebook on GitHub to see expected outputs
- Download and run locally to process your own data
- Use included CHILDES sample data to learn
```bash
jupyter notebook tutorial_2_alignment.ipynb
```

- Use preprocessed data from Tutorial 1
- Compute alignment metrics
- Visualize and interpret results
```bash
jupyter notebook tutorial_3_baseline.ipynb
```

- Generate surrogate conversation pairs
- Compute baseline alignment levels
- Test if real alignment is statistically significant
- Publish with confidence!
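One simple way to frame the significance test, sketched here with made-up numbers (the tutorial's own statistics may differ): express the observed alignment as a z-score against the distribution of surrogate baseline scores.

```python
import statistics

# Hypothetical numbers for illustration only; substitute your own results.
real_alignment = 0.42                    # observed mean alignment in real dyads
baseline = [0.21, 0.25, 0.19, 0.23,
            0.27, 0.22, 0.24, 0.20]      # mean alignment in surrogate pairs

mu = statistics.mean(baseline)
sd = statistics.stdev(baseline)
z = (real_alignment - mu) / sd           # standard deviations above the baseline
print(f"z = {z:.2f}")
```

A large positive z-score indicates the real conversations align far beyond what chance overlap produces; in practice you would pair this with a formal test (e.g. a t-test or permutation test) as the tutorial demonstrates.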
```
tutorial_output/
├── preprocessed_nltk/          # NLTK-only (fastest)
├── preprocessed_spacy/         # NLTK + spaCy (recommended)
└── preprocessed_stanford/      # NLTK + Stanford (highest accuracy)
```
```
tutorial_output/alignment_results/
├── lexsyn/                     # Lexical-syntactic alignment results
│   ├── lexsyn_alignment_ngram2_lag1_noDups_noAdd.csv
│   ├── lexsyn_alignment_ngram2_lag1_noDups_withSpacy.csv
│   └── convo_lexsyn_alignment_ngram2_noDups_noAdd.csv   # Conversation-level
├── fasttext/                   # FastText semantic alignment
│   ├── semantic_alignment_fasttext_lag1_sd3_n1.csv
│   └── convo_semantic_alignment_fasttext_sd3_n1.csv     # Conversation-level
├── bert/                       # BERT semantic alignment (optional)
│   └── semantic_alignment_bert-base-uncased_lag1.csv
├── merged/                     # Combined multi-analyzer results
│   └── merged-lag1-ngram2-noAdd-noDups-sd3-n1.csv
└── cache/                      # Model caches (FastText, BERT)
```
```
tutorial_output/baseline_results/
├── surrogates/                 # Generated surrogate conversation pairs
│   └── surrogate_run-{timestamp}/
│       ├── SurrogatePair-dyad1-dyad2-cond1.txt
│       ├── SurrogatePair-dyad1-dyad3-cond1.txt
│       └── ... (one file per surrogate pair)
├── lexsyn/                     # Baseline alignment results
│   └── baseline_alignment_lexsyn_ngram2_lag1_noDups_noAdd.csv
├── fasttext/                   # Baseline semantic alignment
│   └── baseline_alignment_fasttext_lag1_sd3_n1.csv
└── comparison/                 # Real vs. baseline comparisons
    └── alignment_comparison_lexsyn.csv
```
- Tab-delimited text files (`.txt`)
- Required columns: `participant`, `content`
- UTF-8 encoding
- One utterance per row
```
participant	content
Speaker1	Hello there
Speaker2	Hi how are you
Speaker1	I am doing well
```
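A quick stdlib check that a transcript matches this format (a hypothetical snippet, not part of ALIGN): parse it as tab-delimited and confirm the required columns are present.

```python
import csv
import io

# A minimal tab-delimited transcript in the expected format.
sample = "participant\tcontent\nSpeaker1\tHello there\nSpeaker2\tHi how are you\n"

reader = csv.DictReader(io.StringIO(sample), delimiter="\t")
assert {"participant", "content"} <= set(reader.fieldnames)  # required columns
rows = list(reader)
print(len(rows))  # one utterance per row
```

For real files, open them with `encoding="utf-8"` to match the UTF-8 requirement above.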
- Tutorial 1: Change `INPUT_DIR` to your data directory, then run preprocessing
- Tutorial 2: Update `INPUT_DIR_NLTK` to your preprocessed output, then run alignment analysis
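For example, a small sanity check before running the notebooks (the paths below are hypothetical placeholders; `INPUT_DIR` and `INPUT_DIR_NLTK` are the variable names used in the tutorials):

```python
from pathlib import Path

# Hypothetical paths; point these at your own directories.
INPUT_DIR = Path("my_transcripts")                           # raw .txt files (Tutorial 1)
INPUT_DIR_NLTK = Path("tutorial_output/preprocessed_nltk")   # Tutorial 2 input

for d in (INPUT_DIR, INPUT_DIR_NLTK):
    print(d, "exists" if d.exists() else "missing")
```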