Version Comparison Dashboard

A web-based dashboard for comparing different bot categories in the OpenChatStudio (OCS) system, including Control bot and Coaching bot versions.

Features

Summary Metrics: Overview of sessions, annotations, and ratings by bot version with "Total (All Versions)" aggregation row
Performance Analysis: Refrigerator example rates and FLW scores by method and version with "All Versions" columns and "Total (All Methods)" rows
Session Volume Analysis:
- Stacked bar chart showing session volume over time by coaching method and version
- Aggregation options: day, week, month (default: week)
- Date range filtering for interactive analysis
- Summary table with session counts by method and version
User Engagement:
- Median user words per session with progression analysis
- Median participant messages per session by method and version
- Outlier filtering for extreme sessions (>50 messages or >1000 words)
- Dynamic table updates based on outlier filter
Interactive Charts: Line graphs showing session progression over time with outlier filtering
Method Detection: Automatic detection of coaching methods (Scenario, Microlearning, etc.)
Enhanced Rating Detection: Comprehensive pattern matching for 68% rating extraction (vs. 0.07% basic)
Data Quality Filtering: Excludes split sessions (less than 3 participant messages) and test sessions for accurate analysis
Dynamic Statistics: Real-time rating coverage metrics displayed as footnotes
Outlier Detection: Advanced filtering to exclude training sessions and extreme user interactions
Aggregated Views: "All Versions" columns and "Total" rows across all method-based tables for comprehensive cross-version and cross-method analysis

Quick Start

Generate Dashboard:

python version_comparison_simple.py

View Dashboard:
- Open output/version_comparison/version_comparison_dashboard.html in your browser
- Or serve via HTTP: cd output/version_comparison && python3 -m http.server 8002

Bot Categories

The dashboard compares five bot categories:

Control bot: Experiment ID: 1027993a-40c9-4484-a5fb-5c7e034dadcd (All versions)
Coaching bot V3: Experiment ID: e2b4855f-8550-47ff-87d2-d92018676ff3 (All versions)
Coaching bot V4: Experiment ID: b7621271-da98-459f-9f9b-f68335d09ad4 (Version 13 and above)
Coaching bot V5: Experiment ID: 5d8be75e-03ff-4e3a-ab6a-e0aff6580986 (Version 1 to 4)
Coaching bot V6: Experiment ID: 5d8be75e-03ff-4e3a-ab6a-e0aff6580986 (Version 5 and above)

Data Requirements

The dashboard expects data files in the following structure:

../data/consolidated/sessions/session_*.json
../data/consolidated/messages/message_*.json

Dashboard Sections

Summary Tab

Summary metrics by version (sessions, annotations, ratings)
Summary Metrics - All Versions vs Refrigerator Examples table (aggregated comparison including "Not Refrigerator Example Sessions Only" row)

Performance Tab

Refrigerator example rate by method and version (with toggle to switch between calculation modes)
Average FLW score by method and version

User Engagement Tab

Median number of participant messages per session by method and version (with "All Versions" column and "Total (All Methods)" row)
Median user words per session by method and version (with outlier filtering, "All Versions" column, and "Total (All Methods)" row)
Session progression analysis with interactive line graphs and outlier filtering

Session Volume Tab

Stacked bar chart showing volume of sessions per coach version over time
Aggregation options: day, week, month (default: week)
Date range filtering for interactive time-based analysis
Session count summary table by method and version (with "All Versions" column and "Total (All Methods)" row)

Definitions Tab

Version definitions and experiment ID mappings
Coaching method detection criteria
Metric calculation explanations

Data Quality and Filtering

The dashboard applies comprehensive filtering to ensure data quality:

Split Sessions Excluded: Sessions with less than 3 participant messages (improved definition for better data quality)
Test Sessions Excluded: Sessions with participant IDs ending in @dimagi.com (internal testing)
Outlier Sessions Filtered: Optional filtering of extreme sessions (>50 messages or >1000 words)
Consistent Filtering: All tables, graphs, and metrics use the same exclusion criteria
Enhanced Rating Detection: Comprehensive pattern matching improves rating extraction from 0.07% to 68% of sessions
Date Range Filtering: Interactive date range filtering for Session Volume chart (client-side filtering)

Outlier Detection

The dashboard includes advanced outlier detection to identify and optionally exclude extreme sessions:

Message Count Threshold: Sessions with >50 participant messages
Word Count Threshold: Sessions with >1000 total participant words
Training Session Detection: Identifies intensive practice/training sessions
Interactive Filtering: Checkboxes allow real-time toggling between filtered and unfiltered views

Version Detection

The dashboard identifies bot versions based on:

Experiment IDs: Specific experiment identifiers
Version Tags: Extracted from the last message in each session
Version Ranges: Numeric ranges for different bot versions

Coaching Methods

Methods are detected through:

Session Tags: coach_method_* tags
Message Tags: Tags in assistant messages
Content Analysis: Keyword matching in message content

Requirements

Python 3.7+
Required packages listed in requirements.txt

Deployment

This repository is configured for GitHub Pages deployment. The dashboard will be automatically available at: https://[username].github.io/[repository-name]/output/version_comparison/version_comparison_dashboard.html

Name		Name	Last commit message	Last commit date
Latest commit History 21 Commits
gs_score_analysis		gs_score_analysis
output/version_comparison		output/version_comparison
.gitignore		.gitignore
FOCUS_GROUP_SELECTION_REPORT.md		FOCUS_GROUP_SELECTION_REPORT.md
README.md		README.md
TECHNICAL_SPECS.md		TECHNICAL_SPECS.md
TROUBLESHOOTING.md		TROUBLESHOOTING.md
USAGE.md		USAGE.md
analyze_rating_patterns.py		analyze_rating_patterns.py
analyze_rating_question.py		analyze_rating_question.py
analyze_today_yesterday_distribution.py		analyze_today_yesterday_distribution.py
analyze_unknown_messages.py		analyze_unknown_messages.py
analyze_v6_experiment.py		analyze_v6_experiment.py
comprehensive_rating_analysis.py		comprehensive_rating_analysis.py
find_today_yesterday_questions.py		find_today_yesterday_questions.py
find_v6_release_date.py		find_v6_release_date.py
find_v6_unknown_sessions.py		find_v6_unknown_sessions.py
generate_session_assignments.py		generate_session_assignments.py
investigate_spike.py		investigate_spike.py
requirements.txt		requirements.txt
run_dashboard.py		run_dashboard.py
select_focus_group_participants.py		select_focus_group_participants.py
selected_focus_group_participants.csv		selected_focus_group_participants.csv
session_assignments.csv		session_assignments.csv
verify_v6_release_date.py		verify_v6_release_date.py
version_comparison_simple.py		version_comparison_simple.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Version Comparison Dashboard

Features

Quick Start

Bot Categories

Data Requirements

Dashboard Sections

Summary Tab

Performance Tab

User Engagement Tab

Session Volume Tab

Definitions Tab

Data Quality and Filtering

Outlier Detection

Version Detection

Coaching Methods

Requirements

Deployment

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Version Comparison Dashboard

Features

Quick Start

Bot Categories

Data Requirements

Dashboard Sections

Summary Tab

Performance Tab

User Engagement Tab

Session Volume Tab

Definitions Tab

Data Quality and Filtering

Outlier Detection

Version Detection

Coaching Methods

Requirements

Deployment

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages